Early Detection of Breast Cancer Using Machine Learning and Ensemble Techniques


  • Disha H. Parekh
  • Vishal Dahiya




Breast cancer prediction, Ensemble Machine Learning algorithms, AdaBoost, XGBoost, F1-Score


The WHO identifies breast cancer as one of the most dangerous and most common diseases in the world. The severity of breast cancer and the importance of diagnosing it early have drawn the attention of researchers seeking to save lives from this devastating disease. Early prediction of breast cancer has advanced rapidly since the introduction of supervised machine learning algorithms. This paper demonstrates the use of various machine learning algorithms together with ensemble algorithms. The results obtained are highly accurate and can help predict cancer correctly. The paper aims at early diagnosis of breast cancer, with the goal of helping patients learn whether a diagnosed tumor is cancerous (Malignant) or non-cancerous (Benign). This paper should be useful to researchers who are new to the prediction and diagnosis of breast cancer using machine learning.
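The keywords name the F1-Score as the evaluation metric for the Malignant/Benign classification. The sketch below shows how that metric is computed from true and predicted labels. It is an illustration only, not the authors' code: the function name, the "M"/"B" label encoding, and the toy label lists are assumptions made for the example.

```python
def f1_score(y_true, y_pred, positive="M"):
    """F1 for the positive class: harmonic mean of precision and recall."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == positive and p == positive)
    fp = sum(1 for t, p in pairs if t != positive and p == positive)
    fn = sum(1 for t, p in pairs if t == positive and p != positive)
    if tp == 0:
        # No true positives: precision and recall are both zero (or undefined).
        return 0.0
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    return 2 * precision * recall / (precision + recall)


# Hypothetical labels: "M" = Malignant, "B" = Benign.
y_true = ["M", "M", "M", "B", "B", "B", "B", "M"]
y_pred = ["M", "M", "B", "B", "B", "M", "B", "M"]

score = f1_score(y_true, y_pred)
print(f"F1 (Malignant as positive class): {score:.2f}")
```

Treating Malignant as the positive class means a missed cancer counts as a false negative, which lowers recall and therefore the F1-Score.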






How to Cite

Parekh, D. H., & Dahiya, V. (2023). Early Detection of Breast Cancer Using Machine Learning and Ensemble Techniques. International Journal of Computing, 22(2), 231-237. https://doi.org/10.47839/ijc.22.2.3093