Using Big Data Analytics to Identify Trends and Group Crimes through Clustering
DOI:
https://doi.org/10.47839/ijc.23.3.3658
Keywords:
Big Data, crime, crime trends, clustering, crimes, data mining, security
Abstract
The incidence of crime in a city is hard to address without trend analysis, and this affects citizen security. The objective of this research was to analyze and visualize crime trends in the area using the concepts and fundamentals of Big Data Analytics, Data Mining and Clustering. The problem is addressed with a quantitative approach, using the CRISP-DM process, Principal Component Analysis (PCA) and the K-Means algorithm for clustering. Validation is performed with the Elbow method and the Average Silhouette method to check the robustness of the clustering. The results show that crimes against property, such as robbery and theft, are the most frequent. Four crime clusters are identified, each associated with a specific category, providing a detailed view of crime distribution. Comparison with previous studies highlights the effectiveness of Big Data technologies in reducing crime, providing a solid basis for more accurate security strategies.
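The validation step named in the abstract (K-Means checked with the Elbow method and the Average Silhouette method) can be illustrated with a minimal sketch. It assumes synthetic two-dimensional points rather than the study's crime records, uses a simple deterministic farthest-point seeding instead of whatever initialization the authors used, and omits the CRISP-DM and PCA steps. All function and variable names (`kmeans`, `maximin_init`, `mean_silhouette`, `results`, `best_k`) are hypothetical and chosen for illustration only.

```python
import math
import random


def sq_dist(p, q):
    """Squared Euclidean distance between two points."""
    return sum((a - b) ** 2 for a, b in zip(p, q))


def maximin_init(points, k):
    """Deterministic seeding (an assumption, not the paper's method): start at
    points[0], then add the point farthest from its nearest chosen centroid."""
    centroids = [list(points[0])]
    while len(centroids) < k:
        far = max(points, key=lambda p: min(sq_dist(p, c) for c in centroids))
        centroids.append(list(far))
    return centroids


def kmeans(points, k, iters=100):
    """Lloyd's algorithm. Returns (labels, inertia)."""
    centroids = maximin_init(points, k)
    labels = [-1] * len(points)
    for _ in range(iters):
        # Assignment step: each point goes to its nearest centroid.
        new_labels = [min(range(k), key=lambda c: sq_dist(p, centroids[c]))
                      for p in points]
        # Update step: move each centroid to the mean of its members.
        for c in range(k):
            members = [p for p, l in zip(points, new_labels) if l == c]
            if members:
                centroids[c] = [sum(col) / len(members) for col in zip(*members)]
        if new_labels == labels:
            break  # assignments stopped changing
        labels = new_labels
    # Inertia (within-cluster sum of squares) is the quantity the Elbow method plots.
    inertia = sum(sq_dist(p, centroids[l]) for p, l in zip(points, labels))
    return labels, inertia


def mean_silhouette(points, labels):
    """Average silhouette over all points (singleton clusters score 0)."""
    clusters = {}
    for p, l in zip(points, labels):
        clusters.setdefault(l, []).append(p)
    if len(clusters) < 2:
        raise ValueError("silhouette needs at least two clusters")
    total = 0.0
    for p, l in zip(points, labels):
        own = clusters[l]
        if len(own) == 1:
            continue
        # a: mean distance to the other members of the point's own cluster.
        a = sum(math.dist(p, q) for q in own) / (len(own) - 1)
        # b: mean distance to the nearest other cluster.
        b = min(sum(math.dist(p, q) for q in other) / len(other)
                for m, other in clusters.items() if m != l)
        if max(a, b) > 0:
            total += (b - a) / max(a, b)
    return total / len(points)


# Synthetic data: four well-separated blobs of 25 points each (illustrative only).
rng = random.Random(42)
centers = [(0, 0), (10, 0), (0, 10), (10, 10)]
points = [(rng.gauss(cx, 0.5), rng.gauss(cy, 0.5))
          for cx, cy in centers for _ in range(25)]

results = {}
for k in range(2, 7):
    labels, inertia = kmeans(points, k)
    results[k] = (labels, inertia, mean_silhouette(points, labels))
    print(f"k={k}  inertia={inertia:.1f}  silhouette={results[k][2]:.3f}")

# Pick the k with the highest average silhouette.
best_k = max(results, key=lambda k: results[k][2])
print("k with highest average silhouette:", best_k)
```

Reading the printed table, the inertia column gives the curve whose bend the Elbow method looks for, while the silhouette column gives a single score per candidate k. On real crime data the two criteria need not agree as cleanly as they do on separated synthetic blobs.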
License
International Journal of Computing is an open access journal. Authors who publish with this journal agree to the following terms:
• Authors retain copyright and grant the journal right of first publication with the work simultaneously licensed under a Creative Commons Attribution License that allows others to share the work with an acknowledgement of the work's authorship and initial publication in this journal.
• Authors are able to enter into separate, additional contractual arrangements for the non-exclusive distribution of the journal's published version of the work (e.g., post it to an institutional repository or publish it in a book), with an acknowledgement of its initial publication in this journal.
• Authors are permitted and encouraged to post their work online (e.g., in institutional repositories or on their website) prior to and during the submission process, as it can lead to productive exchanges, as well as earlier and greater citation of published work.