Optimizing Multi-Layer Perceptron for Predicting Learner Migration Patterns: A Methodological Exploration

Authors

  • Frans Ramphele
  • Zenghui Wang
  • Adedayo Yusuff

DOI:

https://doi.org/10.47839/ijc.23.4.3753

Keywords:

Multi-Layer Perceptron (MLP), Cultural Algorithm (CA), Social Ski-Driver (SSD), Multi-Objective Optimization, Learner Migration, Exploration-Exploitation Framework

Abstract

This research presents a novel approach to hyper-parameter optimization of the multilayer perceptron (MLP) for predicting learner migration in Limpopo, South Africa. Many hyper-parameter optimization techniques exist, but they differ in applicability, strengths, and limitations. Our approach uses meta-heuristics, which offer an efficient and adaptable way to search complex spaces and to explore candidate solutions globally. The social ski-driver (SSD) algorithm, originally designed for optimizing support vector machines (SVMs), and the cultural algorithm (CA) were used to determine the optimal hyper-parameter configuration for the MLP. The MLP was intended to predict the likelihood of a learner migrating, the reasons for migration, and the distance the learner will migrate to the next school. The two optimizers were run on sample data split into five folds, producing ten hyper-parameter sets (five pairs). The MLP was then built with each parameter set and run on a new dataset partitioned into five folds. Model performance was compared using the F1 score, variance analysis, and the Wilcoxon signed-rank test. There were no significant performance differences between the SSD-derived and CA-derived hyper-parameters, which suggests that the SSD algorithm is also effective for optimizing neural networks. The CA-derived parameter set was selected because it had the lowest variance across the datasets and aligned closely with the convergence principles of the Berger-Tal multidisciplinary framework on the exploration-exploitation trade-off.
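The comparison step described in the abstract (per-fold F1 scores, variance analysis, and a Wilcoxon signed-rank test on paired results) can be illustrated with a minimal, standard-library-only sketch. It is not the authors' implementation: the function names `f1_from_counts` and `wilcoxon_signed_rank`, the exact-enumeration p-value, and the per-fold scores in the demo are all assumptions made for illustration, and the scores are invented placeholders rather than results from the paper.

```python
from itertools import product
from statistics import variance


def f1_from_counts(tp: int, fp: int, fn: int) -> float:
    """F1 = 2*TP / (2*TP + FP + FN); returns 0.0 when the score is undefined."""
    denom = 2 * tp + fp + fn
    return 2 * tp / denom if denom else 0.0


def wilcoxon_signed_rank(a, b):
    """Exact two-sided Wilcoxon signed-rank test for small paired samples.

    Returns (W, p) with W = min(W+, W-). Zero differences are dropped and
    tied absolute differences receive average ranks. The p-value is found by
    enumerating all 2**n sign assignments, so keep n small.
    """
    # Round so that float noise does not hide zero or tied differences.
    diffs = [round(x - y, 6) for x, y in zip(a, b)]
    diffs = [d for d in diffs if d != 0]
    n = len(diffs)
    if n == 0:
        return 0.0, 1.0

    # Rank the absolute differences, averaging ranks within tie groups.
    order = sorted(range(n), key=lambda i: abs(diffs[i]))
    ranks = [0.0] * n
    i = 0
    while i < n:
        j = i
        while j + 1 < n and abs(diffs[order[j + 1]]) == abs(diffs[order[i]]):
            j += 1
        avg_rank = (i + j) / 2 + 1
        for k in range(i, j + 1):
            ranks[order[k]] = avg_rank
        i = j + 1

    w_plus = sum(r for r, d in zip(ranks, diffs) if d > 0)
    w_minus = sum(r for r, d in zip(ranks, diffs) if d < 0)
    w = min(w_plus, w_minus)

    # Exact null distribution: each rank is positive or negative with p = 1/2.
    total = sum(ranks)
    extreme = 0
    for signs in product((0, 1), repeat=n):
        wp = sum(r for r, s in zip(ranks, signs) if s)
        if min(wp, total - wp) <= w:
            extreme += 1
    return w, extreme / 2 ** n


if __name__ == "__main__":
    # Hypothetical per-fold F1 scores for one SSD/CA parameter pair
    # (placeholders only, NOT values reported in the paper).
    ssd_f1 = [0.81, 0.79, 0.84, 0.80, 0.83]
    ca_f1 = [0.82, 0.80, 0.82, 0.81, 0.82]

    print("variance SSD:", variance(ssd_f1))
    print("variance CA: ", variance(ca_f1))
    print("Wilcoxon (W, p):", wilcoxon_signed_rank(ssd_f1, ca_f1))
```

With only five paired values, the smallest two-sided p-value this exact test can return is 2/32 = 0.0625, so a comparison at the 0.05 level needs more pairs than a single five-fold run provides. The abstract does not state how the scores were paired, so that detail is left open here.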

Published

2025-01-12

How to Cite

Ramphele, F., Wang, Z., & Yusuff, A. (2025). Optimizing Multi-Layer Perceptron for Predicting Learner Migration Patterns: A Methodological Exploration. International Journal of Computing, 23(4), 536-551. https://doi.org/10.47839/ijc.23.4.3753

Issue

Section

Articles