An Optimized Framework Based on Data Exploration and Dynamic Ensemble-Based Models for Breast Cancer Prediction

Ayman Alsabry; Hamzah Ali Abdulrahman Qasem; Malek Algabri; Amin Mohamed Ahsan; Mogeeb A. A. Mosleh; F. E. Hanash

doi:10.47839/ijc.23.2.3544

Keywords:

Data Exploration, Ensemble Classifier, Hyperparameters Tuning, Machine Learning

Abstract

Breast cancer (BC) is a major global health concern. Detecting BC at an early stage widens the range of treatment options and can help patients avoid more aggressive treatments. Machine learning (ML) offers significant potential for improving the accuracy and speed of BC diagnosis, personalizing treatment, and identifying high-risk patients. However, applying ML effectively poses significant challenges, including the need for high-quality data and for more flexible models with optimal parameters. In this paper, we propose an optimized framework based on multi-stage data exploration, which provides a comprehensive approach to preparing the data for ML. The framework also includes dynamic ensemble-based classifiers, which combine multiple independent classifiers and, used together with cross-validation, improve accuracy and mitigate the risk of overfitting. These classifiers are optimized with Bayesian hyperparameter tuning, which searches for the best values of the model's hyperparameters and can significantly improve the prediction accuracy of the resulting model. We evaluate the framework on the publicly available Wisconsin Diagnostic Breast Cancer (WDBC) dataset and compare our results with other state-of-the-art models. In the experiments, the best configuration reached 100% accuracy and recall with the following hyperparameters: ensemble method = AdaBoost, number of learners = 322, learning rate = 0.9350, and maximum number of splits = 1. The proposed framework achieved the highest accuracy among the compared models (mean = 99.35%, best = 100%, worst = 98.7%, standard deviation = 0.325). The framework can potentially improve the accuracy and efficiency of BC prediction, ultimately leading to better outcomes for patients.
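To make the reported configuration more concrete, the following is a minimal sketch of discrete AdaBoost over one-split decision stumps, which is one common reading of "maximum number of splits = 1". It is not the authors' implementation: the function names, the tie-breaking rule, and the six-point toy dataset are hypothetical and chosen only for illustration, and the sketch omits the data exploration, cross-validation, and Bayesian tuning stages. Only the learning rate of 0.9350 is taken from the abstract.

```python
import math


def stump_predict(stump, row):
    """Predict +1 or -1 with a single-split rule (feature, threshold, polarity)."""
    feature, threshold, polarity = stump
    return polarity if row[feature] > threshold else -polarity


def fit_stump(X, y, weights):
    """Return (weighted_error, stump) for the best one-split rule."""
    best = None
    for feature in range(len(X[0])):
        values = sorted(set(row[feature] for row in X))
        # Candidates: one threshold below the minimum (a constant rule)
        # plus the midpoints between consecutive distinct values.
        thresholds = [values[0] - 1.0]
        thresholds += [(a + b) / 2 for a, b in zip(values, values[1:])]
        for threshold in thresholds:
            for polarity in (1, -1):
                stump = (feature, threshold, polarity)
                error = sum(w for row, label, w in zip(X, y, weights)
                            if stump_predict(stump, row) != label)
                if best is None or error < best[0]:
                    best = (error, stump)
    return best


def adaboost_fit(X, y, n_learners, learning_rate):
    """Fit discrete AdaBoost; labels must be +1 or -1."""
    n = len(X)
    weights = [1.0 / n] * n
    ensemble = []
    for _ in range(n_learners):
        error, stump = fit_stump(X, y, weights)
        error = min(max(error, 1e-12), 1 - 1e-12)  # avoid log(0)
        # The learning rate shrinks each learner's vote.
        alpha = learning_rate * 0.5 * math.log((1 - error) / error)
        ensemble.append((alpha, stump))
        # Increase the weight of misclassified samples, then renormalize.
        weights = [w * math.exp(-alpha * label * stump_predict(stump, row))
                   for row, label, w in zip(X, y, weights)]
        total = sum(weights)
        weights = [w / total for w in weights]
    return ensemble


def adaboost_predict(ensemble, row):
    """Weighted vote of all stumps."""
    score = sum(alpha * stump_predict(stump, row) for alpha, stump in ensemble)
    return 1 if score >= 0 else -1


def accuracy(ensemble, X, y):
    hits = sum(adaboost_predict(ensemble, row) == label
               for row, label in zip(X, y))
    return hits / len(y)


# Hypothetical toy data (NOT the WDBC dataset): one feature, and a label
# pattern that no single split can separate.
X = [[1.0], [2.0], [3.0], [4.0], [5.0], [6.0]]
y = [1, 1, -1, -1, 1, 1]

if __name__ == "__main__":
    for n_learners in (1, 3):
        model = adaboost_fit(X, y, n_learners=n_learners, learning_rate=0.9350)
        print(n_learners, accuracy(model, X, y))
```

On this toy data a single stump classifies four of the six points correctly, while three boosted stumps classify all six, which illustrates how combining weak learners can outperform any one of them. Reproducing the reported results would require the WDBC features, 322 learners, and the evaluation protocol described in the paper.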

International Journal of Computing
