Determination of the Best Feature Subset for Learner Migration in Limpopo

Authors

  • Frans Ramphele
  • Zenghui Wang
  • Adedayo Yusuff

DOI:

https://doi.org/10.47839/ijc.23.4.3534

Keywords:

Boruta, RPART, J48, AdaBoost.M1, Mutual Information, Spearman Correlation, Feature Selection, Learner Migration

Abstract

The South African Education Management Information System (EMIS) hosts longitudinal data on school inventory, learners, and educators. Feature selection (FS) is one of the most consequential yet frequently neglected phases in machine learning, and neglecting it can adversely affect the outcome of the entire exercise. This study explores informative features in the EMIS data that can predict the likelihood of learners prematurely transitioning to alternative learning spaces in the Limpopo education system. Ravenstein's migration theory was used to assemble the initial feature set, which was then subjected to the Boruta, RPART, AdaBoost.M1, and J48 algorithms. The feature subsets generated by these FS algorithms were compared against filter-based statistical methods, namely Spearman correlation and mutual information, to aid the final selection of the best feature subset for the study. All machine-learning FS methods performed well. The feature subset generated by Boruta was considered optimal because the variance of the importance scores among its selected features was relatively low compared with those of RPART, J48, and AdaBoost.M1. It is believed that this low variance will improve the model's stability and its ability to generalize to previously unseen data.
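
The paper describes this pipeline in prose only. Below is a minimal Python sketch, an illustration rather than the authors' implementation, of the comparison it outlines: a single Boruta-style shadow-feature pass with a random forest, the low-variance-of-importance tie-breaker, and the two filter-based cross-checks. The EMIS data are not public, so a synthetic stand-in dataset is used and all parameters are assumptions.

import numpy as np
from scipy.stats import spearmanr
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.feature_selection import mutual_info_classif

rng = np.random.default_rng(42)

# Synthetic stand-in for the EMIS-derived learner records.
X, y = make_classification(n_samples=1000, n_features=10,
                           n_informative=4, random_state=42)

# One Boruta-style iteration: append column-shuffled "shadow" copies of
# every feature, fit a random forest, and keep the real features whose
# importance beats the best shadow importance. (Boruta proper repeats
# this many times and applies a statistical test.)
shadows = rng.permuted(X, axis=0)   # per-column shuffle breaks any link to y
rf = RandomForestClassifier(n_estimators=500, random_state=42)
rf.fit(np.hstack([X, shadows]), y)
real_imp = rf.feature_importances_[:X.shape[1]]
shadow_max = rf.feature_importances_[X.shape[1]:].max()
kept = np.flatnonzero(real_imp > shadow_max)
print("kept features:", kept)

# The study's tie-breaker between FS methods: low variance of the
# importance scores among the selected features.
print("importance variance among kept:", real_imp[kept].var())

# Filter-based cross-checks: Spearman correlation and mutual information.
rho = [abs(spearmanr(X[:, j], y).correlation) for j in range(X.shape[1])]
mi = mutual_info_classif(X, y, random_state=42)
print("Spearman |rho|:", np.round(rho, 3))
print("mutual information:", np.round(mi, 3))

RPART, J48, and AdaBoost.M1 would each contribute their own importance rankings to the same comparison; the shadow-feature rule above is specific to the Boruta family.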

References

A. Sîrbu et al., “Human migration: The big data perspective,” Int J Data Sci Anal, vol. 11, no. 4, pp. 341–360, 2021, https://doi.org/10.1007/s41060-020-00213-5.

R. J. Lennox et al., “Conservation physiology of animal migration,” Conserv Physiol, vol. 4, no. 1, pp. 1–15, 2016, https://doi.org/10.1093/conphys/cov072.

F. Chiororo, “Leadership for learning in Zimbabwean secondary schools: Narratives of school heads (Publication No. 18905),” Doctoral Thesis, University of KwaZulu-Natal, Durban, 2020. Accessed: Aug. 25, 2023. [Online]. Available: https://researchspace.ukzn.ac.za/handle/10413/18905

R. J. Botha and T. G. Neluvhola, “An investigation into factors that contribute to learner migration in South African schools,” The Journal of Social Sciences Research, vol. 6, no. 3, pp. 224–235, 2020, https://doi.org/10.32861/jssr.63.224.235.

I. C. Simelani, “Learner migration and its impact on rural schools: A case study of two rural schools in KwaZulu-Natal (Publication No. 12637),” Master’s Thesis, University of KwaZulu-Natal, Durban, 2016. Accessed: Aug. 25, 2023. [Online]. Available: https://researchspace.ukzn.ac.za/handle/10413/12637

H. Van der Merwe, “Migration patterns in rural schools in South Africa: Moving away from poor quality education,” Education as Change, vol. 15, no. 1, pp. 107–120, 2011, https://doi.org/10.1080/16823206.2011.576652.

H. Hanna, “Being a migrant learner in a South African primary school: Recognition and racialisation,” Child Geogr, pp. 1–16, 2022, https://doi.org/10.1080/14733285.2022.2084601.

G. Tati, “Student migration in South Africa,” Espace Popul Soc, vol. 2, no. 3, pp. 281–296, 2010, https://doi.org/10.4000/eps.4160.

A. K. Hallberg, “Student migration aspirations and mobility in the global knowledge society: The case of Ghana,” Journal of International Mobility, vol. 7, no. 1, pp. 23–43, 2020, https://doi.org/10.3917/jim.007.0023.

A. Algarni, “Data mining in education,” International Journal of Advanced Computer Science and Applications (IJACSA), vol. 7, no. 6, pp. 58–77, 2016, https://doi.org/10.4018/978-1-5225-1877-8.ch005.

L. Juan, “Analysis of the Mental Health of Urban Migrant Children Based on Cloud Computing and Data Mining Algorithm Models,” Sci Program, vol. 2021, 2021, https://doi.org/10.1155/2021/7615227.

S. M. R. Islam, N. N. Moon, M. M. Islam, R. A. Hossain, S. Sharmin, and A. Mostafiz, “Prediction of migration outcome using machine learning,” Proceedings of the International Conference on Deep Learning, Artificial Intelligence and Robotics, vol. 441, pp. 169–182, 2022, https://doi.org/10.1007/978-3-030-98531-8_17.

C. Robinson and B. Dilkina, “A machine learning approach to modeling human migration,” Proceedings of the 1st ACM SIGCAS Conference on Computing and Sustainable Societies (COMPASS 2018), 2018, https://doi.org/10.1145/3209811.3209868.

S. Gregor, “A Theory of Theories in Information Systems,” Information Systems Foundations, pp. 1–18, 2002. https://doi.org/10.3127/ajis.v10i1.439.

D. B. Grigg, “E. G. Ravenstein and the ‘laws of migration’,” J Hist Geogr, vol. 3, no. 1, pp. 41–54, 1977, https://doi.org/10.1016/0305-7488(77)90143-8.

E. S. Lee, “A Theory of Migration,” Demography, vol. 3, no. 1, pp. 47–57, 1966, https://doi.org/10.2307/2060063.

R. C. Chen, C. Dewi, S. W. Huang, and R. E. Caraka, “Selecting critical features for data classification based on machine learning methods,” J Big Data, vol. 7, no. 1, Article 52, 2020, https://doi.org/10.1186/s40537-020-00327-4.

Z. M. Hira and D. F. Gillies, “A review of feature selection and feature extraction methods applied on microarray data,” Adv Bioinformatics, vol. 2015, no. 1, pp. 1–13, 2015, https://doi.org/10.1155/2015/198363.

J. Miao and L. Niu, “A survey on feature selection,” Procedia Comput Sci, vol. 91, pp. 919–926, 2016, https://doi.org/10.1016/j.procs.2016.07.111.

S. Velliangiri, S. Alagumuthukrishnan, and S. I. Thankumar Joseph, “A review of dimensionality reduction techniques for efficient computation,” Procedia Comput Sci, vol. 165, pp. 104–111, 2019, https://doi.org/10.1016/j.procs.2020.01.079.

M. B. Kursa, A. Jankowski, and W. R. Rudnicki, “Boruta – A system for feature selection,” Fundam Inform, vol. 101, no. 4, pp. 271–285, 2010, https://doi.org/10.3233/FI-2010-288.

P. Thereza, G. Lumacad, and R. Catrambone, “Predicting Student Performance Using Feature Selection Algorithms for Deep Learning Models,” Proceedings of the 2021 XVI Latin American Conference on Learning Technologies (LACLO), 2021, pp. 1–7. https://doi.org/10.1109/LACLO54177.2021.00009.

N. Saravanan and V. Gayathri, “Performance and classification evaluation of J48 algorithm and Kendall’s based J48 algorithm (KNJ48),” International Journal of Computer Trends and Technology, vol. 59, no. 2, pp. 73–80, 2018, https://doi.org/10.14445/22312803/IJCTT-V59P112.

C. Strobl, J. Malley, and G. Tutz, “An introduction to recursive partitioning: Rationale, application, and characteristics of classification and regression trees, bagging, and random forests,” Psychol Methods, vol. 14, no. 4, pp. 323–348, 2009, https://doi.org/10.1037/a0016973.

J. Smucny, I. Davidson, and C. S. Carter, “Comparing machine and deep learning-based algorithms for prediction of clinical improvement in psychosis with functional magnetic resonance imaging,” Hum Brain Mapp, vol. 42, no. 4, pp. 1197–1205, 2021, https://doi.org/10.1002/hbm.25286.

P. Pandey and R. Prabhakar, “An analysis of machine learning techniques (J48 & AdaBoost)-for classification,” Proceedings of the 2016 1st India International Conference on Information Processing (IICIP), 2016, pp. 1–6. https://doi.org/10.1109/IICIP.2016.7975394.

B. Nithya and V. Ilango, “Evaluation of machine learning based optimized feature selection approaches and classification methods for cervical cancer prediction,” SN Appl Sci, vol. 1, no. 6, 2019. https://doi.org/10.1007/s42452-019-0645-7.

L. Breiman, “Random forests,” Machine Learning, vol. 45, no. 1, pp. 5–32, 2001, https://doi.org/10.1023/A:1010933404324.

F. Degenhardt, S. Seifert, and S. Szymczak, “Evaluation of variable selection methods for random forests and omics data sets,” Brief Bioinform, vol. 20, no. 2, pp. 492–503, 2017, https://doi.org/10.1093/bib/bbx124.

M. Schonlau and R. Y. Zou, “The random forest algorithm for statistical learning,” Stata Journal, vol. 20, no. 1, pp. 3–29, 2020, https://doi.org/10.1177/1536867X20909688.

R. Couronné, P. Probst, and A. L. Boulesteix, “Random forest versus logistic regression: A large-scale benchmark experiment,” BMC Bioinformatics, vol. 19, Article 270, 2018, https://doi.org/10.1186/s12859-018-2264-5.

T. M. Therneau and E. J. Atkinson, “An introduction to recursive partitioning using the RPART routines,” Mayo Foundation, pp. 1–60, 2022.

N. J. Tierney, F. A. Harden, M. J. Harden, and K. L. Mengersen, “Using decision trees to understand structure in missing data,” BMJ Open, vol. 5, no. 6, pp. 1–11, 2015, https://doi.org/10.1136/bmjopen-2014-007450.

R. Wang, “AdaBoost for feature selection, classification and its relation with SVM: A review,” Phys Procedia, vol. 25, pp. 800–807, 2012, https://doi.org/10.1016/j.phpro.2012.03.160.

P. Schober and L. A. Schwarte, “Correlation coefficients: Appropriate use and interpretation,” Anesth Analg, vol. 126, no. 5, pp. 1763–1768, 2018, https://doi.org/10.1213/ANE.0000000000002864.

M. M. Mukaka, “Statistics corner: A guide to appropriate use of correlation coefficient in medical research,” Malawi Med J, vol. 24, no. 3, pp. 69–71, Sep. 2012. [Online]. Available: http://www.ncbi.nlm.nih.gov/pubmed/23638278

S. Kumar and I. Chong, “Correlation analysis to identify the effective data in machine learning: Prediction of depressive disorder and emotion states,” Int J Environ Res Public Health, vol. 15, no. 12, Article 2907, 2018, https://doi.org/10.3390/ijerph15122907.

S. Greenland et al., “Statistical tests, p values, confidence intervals, and power: a guide to misinterpretations,” Eur J Epidemiol, vol. 31, no. 4, pp. 337–350, 2016, https://doi.org/10.1007/s10654-016-0149-3.

L. Song, P. Langfelder, and S. Horvath, “Comparison of co-expression measures: Mutual information, correlation, and model based indices,” BMC Bioinformatics, vol. 13, Article 328, 2012, https://doi.org/10.1186/1471-2105-13-328.

N. Barraza, S. Moro, M. Ferreyra, and A. de la Pena, “Mutual information and sensitivity analysis for feature selection in customer targeting: A comparative study,” J Inf Sci, vol. 45, no. 1, pp. 53–67, 2019, https://doi.org/10.1177/0165551518770967.

P. Laarne, M. A. Zaidan, and T. Nieminen, “ennemi: Non-linear correlation detection with mutual information,” SoftwareX, vol. 14, Article 100686, 2021, https://doi.org/10.1016/j.softx.2021.100686.

J. R. Vergara and P. A. Estévez, “A review of feature selection methods based on mutual information,” Neural Comput Appl, vol. 24, no. 1, pp. 175–186, 2014, https://doi.org/10.1007/s00521-013-1368-0.

S. Akanmu and S. Jaja, “Knowledge Discovery in Database: A knowledge management strategic approach,” Oct. 2012.

H. Patel, D. S. Rajput, G. T. Reddy, C. Iwendi, K. A. Bashir, and O. Jo, “A review on classification of imbalanced data for wireless sensor networks,” Int J Distrib Sens Netw, vol. 16, no. 4, pp. 1–15, 2020, https://doi.org/10.1177/1550147720916404.

F. Afghah, A. Razi, R. Soroushmehr, H. Ghanbari, and K. Najarian, “Game theoretic approach for systematic feature selection: Application in false alarm detection in intensive care units,” Entropy, vol. 20, no. 3, Article 190, 2018, https://doi.org/10.3390/e20030190.

M. Hossin and M. N. Sulaiman, “A review on evaluation metrics for data classification evaluations,” International Journal of Data Mining & Knowledge Management Process, vol. 5, no. 2, pp. 1–11, 2015, https://doi.org/10.5121/ijdkp.2015.5201.

S. A. Hicks et al., “On evaluation metrics for medical applications of artificial intelligence,” Sci Rep, vol. 12, no. 1, pp. 1–9, 2022, https://doi.org/10.1038/s41598-022-09954-8.

M. L. McHugh, “Interrater reliability: the kappa statistic,” Biochemia Medica, vol. 22, no. 3, pp. 276–282, 2012, https://doi.org/10.11613/BM.2012.031.

A. Venkatasubramaniam, J. Wolfson, N. Mitchell, T. Barnes, M. Jaka, and S. French, “Decision trees in epidemiological research,” Emerg Themes Epidemiol, vol. 14, no. 1, pp. 1–12, 2017, https://doi.org/10.1186/s12982-017-0064-4.

C. Porzelius, M. Schumacher, and H. Binder, “The benefit of data-based model complexity selection via prediction error curves in time-to-event data,” Comput Stat, vol. 26, pp. 293–302, 2011. https://doi.org/10.1007/s00180-011-0236-6.

G. Stiglic, S. Kocbek, I. Pernek, and P. Kokol, “Comprehensive decision tree models in bioinformatics,” PLoS One, vol. 7, no. 3, 2012, https://doi.org/10.1371/journal.pone.0033812.

M. B. Kursa and W. R. Rudnicki, “Feature selection with the Boruta package,” J Stat Softw, vol. 36, no. 11, pp. 1–13, 2010, https://doi.org/10.18637/jss.v036.i11.

K. Kang and J. Michalak, “Enhanced version of AdaBoostM1 with J48 Tree learning method,” arXiv preprint arXiv:1802.03522, Feb. 2018.

G. Eibl and K. P. Pfeiffer, “How to make AdaBoost.M1 work for weak base classifiers by changing only one line of the code,” Lecture Notes in Computer Science, vol. 2430, pp. 72–83, 2002, https://doi.org/10.1007/3-540-36755-1_7.

I. Guyon and A. Elisseeff, “An Introduction to Variable and Feature Selection,” Journal of Machine Learning Research, vol. 3, pp. 1157–1182, 2003.

Published

2024-07-01

How to Cite

Ramphele, F., Wang, Z., & Yusuff, A. (2024). Determination of the Best Feature Subset for Learner Migration in Limpopo. International Journal of Computing, 23(4), 165-176. https://doi.org/10.47839/ijc.23.4.3534

Issue

Vol. 23 No. 4 (2024)

Section

Articles