Ensemble-based Disease Outbreak Detection: Comparative Analysis of Health News Information Retrieval Techniques
DOI:
https://doi.org/10.47839/ijc.23.2.3547
Keywords:
Ensemble learning, Epidemic surveillance, Outbreak detection, Text mining, Natural language processing
Abstract
Kerala was the first state in India to report a COVID-19 case, in January 2020, in a medical student who had returned from Wuhan, China. More recently, in June 2022, Kerala also reported India's first case of monkeypox. News websites often publish articles that report disease occurrences and give live updates on outbreaks. Data gathered from such online sources makes early detection of outbreaks possible, a potential the research community has already recognized. Because web pages link to a broad collection of reports covering a wide range of themes, accurately categorizing news articles by their headlines and retrieving the health news among them is a tedious task. Hence, this paper proposes a novel and efficient news retrieval technique, grounded in an ML-based classification method with an ensemble learning approach, to identify reports of disease occurrences on web pages, focusing specifically on the health context of Kerala. The technique is compared with baseline information retrieval methods: keyword-based, phrase-based, and content-based latent semantic analysis.
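The abstract does not specify the classifiers, features, or training data behind the ensemble, so the following is only a minimal sketch of the general idea: several weak headline classifiers combined by majority vote. Everything in it is an assumption made for illustration, including the toy headlines, the `KEYWORDS` and `PHRASES` lists, the small Naive Bayes model, and the choice to use keyword and phrase rules as voters. It should not be read as the paper's method.

```python
import math
from collections import Counter

# Hypothetical toy training headlines (1 = disease report, 0 = other news).
TRAIN = [
    ("Nipah virus case confirmed in Kozhikode", 1),
    ("Dengue fever outbreak reported in Kochi", 1),
    ("Health department confirms cholera cases", 1),
    ("Monkeypox infection reported in Kollam", 1),
    ("State budget session begins on Monday", 0),
    ("Football league final ends in a draw", 0),
    ("New metro line opens in Kochi", 0),
    ("Film festival opens in Thiruvananthapuram", 0),
]

# Illustrative keyword and phrase lists; a real system would curate these.
KEYWORDS = {"virus", "fever", "outbreak", "infection", "cholera", "dengue"}
PHRASES = ("outbreak reported", "case confirmed", "tests positive")


def tokenize(text):
    """Lowercase and split on whitespace (no punctuation handling)."""
    return text.lower().split()


def keyword_vote(headline):
    """Keyword-style rule: 1 if any token is a known disease keyword."""
    return int(any(tok in KEYWORDS for tok in tokenize(headline)))


def phrase_vote(headline):
    """Phrase-style rule: 1 if any known phrase occurs in the headline."""
    text = headline.lower()
    return int(any(phrase in text for phrase in PHRASES))


class NaiveBayes:
    """Tiny multinomial Naive Bayes with add-one smoothing."""

    def fit(self, samples):
        self.word_counts = {0: Counter(), 1: Counter()}
        self.doc_counts = Counter()
        for text, label in samples:
            self.doc_counts[label] += 1
            self.word_counts[label].update(tokenize(text))
        self.vocab = set(self.word_counts[0]) | set(self.word_counts[1])
        return self

    def predict(self, headline):
        total_docs = sum(self.doc_counts.values())
        scores = {}
        for label in (0, 1):
            total_words = sum(self.word_counts[label].values())
            score = math.log(self.doc_counts[label] / total_docs)
            for tok in tokenize(headline):
                if tok in self.vocab:  # ignore out-of-vocabulary tokens
                    count = self.word_counts[label][tok]
                    score += math.log((count + 1) / (total_words + len(self.vocab)))
            scores[label] = score
        return int(scores[1] > scores[0])


def ensemble_predict(headline, nb):
    """Majority vote over the three classifiers (at least 2 of 3)."""
    votes = [keyword_vote(headline), phrase_vote(headline), nb.predict(headline)]
    return int(sum(votes) >= 2)


nb = NaiveBayes().fit(TRAIN)
print(ensemble_predict("Dengue outbreak reported in Alappuzha", nb))
print(ensemble_predict("Football fever grips Kochi final", nb))
```

In this toy setup the second headline triggers the keyword rule through the word "fever", but the phrase rule and the Naive Bayes model both vote against it, so the majority vote does not label it a disease report. That is the kind of single-rule false positive a voting ensemble is meant to dampen.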
License
International Journal of Computing is an open access journal. Authors who publish with this journal agree to the following terms:
• Authors retain copyright and grant the journal right of first publication with the work simultaneously licensed under a Creative Commons Attribution License that allows others to share the work with an acknowledgement of the work's authorship and initial publication in this journal.
• Authors are able to enter into separate, additional contractual arrangements for the non-exclusive distribution of the journal's published version of the work (e.g., post it to an institutional repository or publish it in a book), with an acknowledgement of its initial publication in this journal.
• Authors are permitted and encouraged to post their work online (e.g., in institutional repositories or on their website) prior to and during the submission process, as it can lead to productive exchanges, as well as earlier and greater citation of published work.