Cluster Analysis of Information in Complex Networks
DOI:
https://doi.org/10.47839/ijc.22.4.3360Keywords:
complex networks, information system, random matrix, graph clustering, Monte Carlo methodAbstract
The research is devoted to the study of information in complex networks, namely: calculation of statistical characteristics and cluster analysis of data. Special software (crawler) was developed for direct data collection from the web space. In addition, a new structure of information technology has been developed for the collection, processing, and storage of large volumes of data collected from the web space. With the help of this structure, statistical characteristics of different segments of the web space (Ukrainian – edu.ua, Polish – edu.pl and Israeli – ac.il) are studied and their cluster structure is studied. The study of the cluster structure of web space zones was carried out using the spectral clustering algorithm of РІС (Rower iteration clustering). The results of the search for the optimal number of clusters using the "elbow" method and the k-Core decomposition method are presented, graphs illustrating the cluster structure of the investigated subnets are drawn. The paper also proposes a new approach to solving the problem of clustering and finding the optimal number of clusters when clustering objects are given by unstructured data (graphs) based on the spectral analysis of the stochastic matrix of the given graph. On this basis, a new method developed by the authors for determining the optimal number of clusters is proposed. Model examples are given and testing of the new method based on Monte Carlo simulation is performed. The optimal number of clusters was found by four methods: the "elbow" method, the k-core decomposition method, the silhouette method, and a new method developed by the authors. A conclusion is made concerning the accuracy of the developed new method, its advantages and disadvantages.
References
Yu. Holovatch, O. Olemskoi, C. von Ferber, T. Holovatch, O. Mryglod, I. Olemskoi, V. Palchykov, “Complex networks,” Journal of Physical Studies, vol. 10, no. 4, pp. 247–289, 2006. https://doi.org/10.30970/jps.10.247
M. E. J. Newman, “The structure of scientific collaboration networks,” Proceedings of the National Academy of Sciences of the United States of America, vol. 98, no. 2, pp. 404–409, 2001. https://doi.org/10.1073/pnas.98.2.404.
M. E. J. Newman, “The structure and function of complex networks,” SIAM Review, vol. 45, issue 2, pp. 167–256, 2003. https://doi.org/10.1137/S003614450342480.
S. H. Strogatz, “Exploring complex networks,” Nature, vol. 410, pp. 268-276, 2001. https://doi.org/10.1038/35065725.
M. E. J. Newman, “Models of the small world,” J. Stat. Phys, vol. 101, pp. 819–841, 2000. https://doi.org/10.1023/A:1026485807148.
A. Broder, R. Kumar, F. Maghoul et al. “Graph structure in the web,” Proceedings of the 9th World Wide Web Conference on Computer networks, 2000, vol. 33 (1), pp. 309-320.
A. Barrat, M. Weight, “On the properties of small-world networks models,” The European Physical Journal, vol. 13, pp. 547–560, 2000. https://doi.org/10.1007/s100510050067.
D. J. Watts, S. H. Strogatz, “Collective dynamics of ‘small-world’ networks,” Nature, vol. 393, pp. 440–442, 1998. https://doi.org/10.1038/30918.
L. A. N. Amaral, A. Scala, M. Barthélémy, and H. E. Stanley, “Classes of small-world networks,” Proceedings of the National Academy of Sciences of the United States of America, vol. 97, no. 21, pp. 11149–11152, 2000. https://doi.org/10.1073/pnas.200327197.
D. J. Watts, Small Worlds: The Dynamics of Networks Between Order and Randomness, Princeton University Press, 1999, 262 p. https://doi.org/10.1515/9780691188331.
M. E. J. Newman, “The structure and function of complex networks,” SIAM Review, vol. 45, issue 2, pp. 167-256, 2003. https://doi.org/10.1137/S003614450342480
R. Baeza-Yates, C. Castillo, E. N. Efthimiadis, “Characterization of national web domains,” Journal ACM Transactions on Internet Technology, vol. 7, no. 2, pp. 9–33, 2007. https://doi.org/10.1145/1239971.1239973.
V. V. Pasichnyk, N. M. Ivanushchak, “Research and modelling of complex networks,” Eastern-European Journal of Enterprise Technologies, vol. 2, no. 3(44), pp. 43–48, 2010.
J. M. Kleinberg, “Navigation in a small world,” Nature, vol. 406, p. 845, 2000. https://doi.org/10.1038/35022643
M. E. J. Newman, S. H. Strogatz, and D. J. Watts, “Random graphs with arbitrary degree distribution and their applications,” Physical Review E 64, 026118, 2001. https://doi.org/10.1103/PhysRevE.64.026118.
N. Mishra, R. Schreiber, I. Stanton, and R. Tarjan, “Clustering social networks,” In Anthony Bonato and Fan R. K. Chung, editors, Algorithms and Models for the Web-Graph, Vol. 4863 of Lecture Notes in Computer Science, chapter 5, pp. 56–67. Springer Berlin Heidelberg, Berlin, Heidelberg, 2007.
P. Domingos and M. Richardson, “Mining the network value of customers,” Proceedings of the seventh ACM SIGKDD International Conference on Knowledge Discovery and Data Mining, KDD’01, New York, pp. 57–66, 2001. https://doi.org/10.1145/502512.502525.
S. Fortunato, “Community detection in graphs,” Physics Reports, vol. 486, issues 3-5, pp. 75–174, 2010. https://doi.org/10.1016/j.physrep.2009.11.002
R. C. Qiu, P. Antonik, Smart Grid using Big Data Analytics: A Random Matrix Theory Approach, Wiley, 2017, 632 p.
T Tao, Topics in Random Matrix Theory, American Mathematical Society, 2023. 282 p.
F. Lin and W. W. Cohen, “Power iteration clustering,” Proceedings of the 27th International Conference on Machine Learning (ICML-10), Haifa, Israel, June 21-24, 2010, pp. 1-10.
U. von Luxburg, “A tutorial on spectral clustering,” Statistics and Computing, vol. 17, pp. 395–416, 2007. https://doi.org/10.1007/s11222-007-9033-z.
T. Xiang and S. Gong, “Spectral clustering with eigenvector selection,” Pattern Recognition, vol. 41, issue 3, pp. 1012–1029, 2008. https://doi.org/10.1016/j.patcog.2007.07.023
A. Y. Ng, M. Jordan, and Y. Weiss, “On spectral clustering: Analysis and an algorithm,” Advances in Neural Information Processing Systems 14 (NIPS 2001), 2001, pp. 1-8.
T. C. Ramos, J. Mourão-Miranda and A. Fujita, “Spectral density-based clustering algorithms for complex networks,” Front. Neurosci., vol. 17, article 926321, 2023. https://doi.org/10.3389/fnins.2023.926321.
O. Kyrychenko, “Features of software architecture for collecting and analyzing statistical information on the global network,” Information Technology: Computer Science, Software Engineering and Cyber Security, no. 2, pp. 107–112, 2023. https://doi.org/10.32782/it/2023-2-13.
R. Xin, J. Gonzalez, M. Franklin, and I. Stoica “GraphX: A resilient distributed graph system on spark,” AMPLab, EECS, UC Berkeley 2013. https://doi.org/10.1145/2484425.2484427
S.-T. Cheng, Y.-C. Chen, and M.-S. Tsai, “Using k-Core decomposition to find cluster centers for k-Means algorithm in GraphX on Spark,” Proceedings of the 2017 Eighth International Conference on Cloud Computing, GRIDs, and Virtualization Cloud Computing’2017, 2017, pp. 93-98.
A. Kuraria, N. Jharbade, & M. Soni, Manish, “Centroid selection process using WCSS and elbow method for K-Mean clustering algorithm in data mining,” International Journal of Scientific Research in Science, Engineering and Technology, vol. 4, issue 11, pp. 190-195, 2018. https://doi.org/10.32628/IJSRSET21841122
Determining the number of clusters in a data set. [Online]. Available at: https://en.wikipedia.org/wiki/Determining_the_number_of_clusters_in_a_data_set
O. Kyrychenko, “Information technology for statistical cluster analysis of information in complex networks,” Computer Systems and Information Technologies, vol. 4, pp. 47–51, 2022. https://doi.org/10.31891/csit-2022-4-7
O. L. Kyrychenko, S. Ostapov, I. Kanovsky, “Investigation of the certain internet domain statistical characteristics,” Eastern-European Journal of Enterprise Technologies, vol. 6, no. 12(66), pp. 91–96, 2014. https://doi.org/10.15587/1729-4061.2013.19698
O. L. Kyrychenko, I. V. Malyk, & S. E. Ostapov, “Stochastic models in artificial intelligence development,” Bulletin of Taras Shevchenko National University of Kyiv. Physics and Mathematics, no. 2, pp. 53–57, 2021. https://doi.org/10.17721/1812-5409.2021/2.7
E. P. Wigner, “On the distribution of the roots of certain symmetric matrices,” The Annals of Mathematics, Second Series, vol. 67, no. 2, pp. 325-327, 1958. https://doi.org/10.2307/1970008
E. P. Wigner, “Characteristic vectors of bordered matrices with infinite dimensions,” The Annals of Mathematics, second series, vol. 62, no. 3, pp. 548-564, 1955. https://doi.org/10.2307/1970079
L. Lenssen, E. Schubert, Erich, “Clustering by direct optimization of the medoid silhouette,” Proceedings of the International Conference on Similarity Search and Applications (SISAP’2022), 2022, pp. 190–204. https://doi.org/10.1007/978-3-031-17849-8_15
F. Batool, C. Hennig, “Clustering with the average silhouette width,” Computational Statistics & Data Analysis, vol. 158, 2021. https://doi.org/10.1016/j.csda.2021.107190
Downloads
Published
How to Cite
Issue
Section
License
International Journal of Computing is an open access journal. Authors who publish with this journal agree to the following terms:• Authors retain copyright and grant the journal right of first publication with the work simultaneously licensed under a Creative Commons Attribution License that allows others to share the work with an acknowledgement of the work's authorship and initial publication in this journal.
• Authors are able to enter into separate, additional contractual arrangements for the non-exclusive distribution of the journal's published version of the work (e.g., post it to an institutional repository or publish it in a book), with an acknowledgement of its initial publication in this journal.
• Authors are permitted and encouraged to post their work online (e.g., in institutional repositories or on their website) prior to and during the submission process, as it can lead to productive exchanges, as well as earlier and greater citation of published work.