Using Latent Dirichlet Allocation and Text Mining Techniques for Understanding Medical Literature
DOI:
https://doi.org/10.47839/ijc.20.4.2437
Keywords:
text mining, data analysis, medical domain, trending topics, word association rules
Abstract
Over the past few years, numerous studies and research articles have been published in the medical literature review domain. The topics covered by this research include medical information retrieval, disease statistics, drug analysis, and many other fields and application domains. In this paper, we employ various text mining and data analysis techniques to discover trending topics and topic concordance in the healthcare literature and data mining field. The analysis focuses on healthcare literature and bibliometric data, with word association rules applied to 1945 research articles published between 2006 and 2019. Our aim is to help save the time and effort required to manually summarize large amounts of information in such a broad, multidisciplinary domain. To this end, we employ topic modeling based on Latent Dirichlet Allocation (LDA), together with various document and word embedding and clustering approaches. Our findings reveal that interest in healthcare big data analysis has increased significantly since 2010, as demonstrated by the five most commonly used topics in this domain.
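To make the topic modeling step more concrete, the following is a minimal sketch of LDA fitted by collapsed Gibbs sampling, using only the Python standard library. It is not the authors' pipeline: the toy corpus, the function name `lda_gibbs`, and the hyperparameter values (`alpha`, `beta`, the iteration count, and the seed) are all assumptions chosen for illustration.

```python
import random
from collections import Counter


def lda_gibbs(docs, num_topics, iterations=200, alpha=0.1, beta=0.01, seed=0):
    """Fit LDA to tokenized documents with collapsed Gibbs sampling.

    Returns per-document topic proportions and the top words per topic.
    All default values are illustrative assumptions, not tuned settings.
    """
    rng = random.Random(seed)
    vocab_size = len({w for doc in docs for w in doc})

    # Count tables: document-topic, topic-word, and tokens per topic.
    doc_topic = [[0] * num_topics for _ in docs]
    topic_word = [Counter() for _ in range(num_topics)]
    topic_total = [0] * num_topics

    # Start from a random topic assignment for every token.
    assignments = []
    for d, doc in enumerate(docs):
        z_doc = []
        for w in doc:
            z = rng.randrange(num_topics)
            z_doc.append(z)
            doc_topic[d][z] += 1
            topic_word[z][w] += 1
            topic_total[z] += 1
        assignments.append(z_doc)

    for _ in range(iterations):
        for d, doc in enumerate(docs):
            for i, w in enumerate(doc):
                # Remove the token's current assignment from the counts.
                z = assignments[d][i]
                doc_topic[d][z] -= 1
                topic_word[z][w] -= 1
                topic_total[z] -= 1

                # Unnormalized conditional probability of each topic.
                weights = [
                    (doc_topic[d][k] + alpha)
                    * (topic_word[k][w] + beta)
                    / (topic_total[k] + vocab_size * beta)
                    for k in range(num_topics)
                ]
                z = rng.choices(range(num_topics), weights=weights)[0]

                # Record the newly sampled assignment.
                assignments[d][i] = z
                doc_topic[d][z] += 1
                topic_word[z][w] += 1
                topic_total[z] += 1

    # Smoothed estimate of each document's topic proportions.
    theta = [
        [(c + alpha) / (len(doc) + num_topics * alpha) for c in counts]
        for counts, doc in zip(doc_topic, docs)
    ]
    # Most frequent words currently assigned to each topic.
    top_words = [
        [w for w, c in tw.most_common(3) if c > 0] for tw in topic_word
    ]
    return theta, top_words


# Hypothetical toy corpus; a real study would use preprocessed abstracts.
docs = [
    "patient drug dose trial drug".split(),
    "drug trial patient outcome".split(),
    "data mining cluster text data".split(),
    "text mining topic cluster".split(),
]
theta, top_words = lda_gibbs(docs, num_topics=2)
for k, words in enumerate(top_words):
    print(f"topic {k}: {words}")
```

Each row of `theta` is a document's estimated topic mixture, and `top_words` gives a rough label for each topic. For a corpus of realistic size, an established library implementation would normally be preferable to a hand-written sampler.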
License
International Journal of Computing is an open access journal. Authors who publish with this journal agree to the following terms:
• Authors retain copyright and grant the journal right of first publication with the work simultaneously licensed under a Creative Commons Attribution License that allows others to share the work with an acknowledgement of the work's authorship and initial publication in this journal.
• Authors are able to enter into separate, additional contractual arrangements for the non-exclusive distribution of the journal's published version of the work (e.g., post it to an institutional repository or publish it in a book), with an acknowledgement of its initial publication in this journal.
• Authors are permitted and encouraged to post their work online (e.g., in institutional repositories or on their website) prior to and during the submission process, as it can lead to productive exchanges, as well as earlier and greater citation of published work.