Speech Emotion Recognition using Hybrid Architectures
DOI: https://doi.org/10.47839/ijc.23.1.3430

Keywords: Emotion recognition, Artificial Intelligence, Dendritic Layer, Capsule Networks, Ensemble

Abstract
The detection of human emotions from speech signals remains a challenging frontier in audio processing and human-computer interaction. This study introduces a novel approach to Speech Emotion Recognition (SER) that combines a Dendritic Layer with a Capsule Network (DendCaps). A hybrid of a Convolutional Neural Network (CNN) and a Long Short-Term Memory (LSTM) network, referred to as CLSTM, serves as the baseline against which the DendCaps model is compared. Integrating dendritic layers and capsule networks for speech emotion detection harnesses the complementary advantages of both architectures, potentially leading to more sophisticated and accurate models. Dendritic layers, inspired by the nonlinear processing properties of dendritic trees in biological neurons, can handle the intricate patterns and variabilities inherent in speech signals, while capsule networks, with their dynamic routing mechanism, preserve hierarchical spatial relationships within the data, enabling the model to capture subtler emotional cues in human speech. The main motivation for DendCaps is to bridge the gap between the capabilities of biological neural systems and artificial neural networks, capitalizing on the hierarchical nature of speech data, whose intricate patterns and dependencies can thereby be better captured. Finally, two ensemble methods, stacking and boosting, are used to evaluate the CLSTM and DendCaps networks; the experimental results show that stacking the CLSTM and DendCaps networks gives the best result, with 75% accuracy.
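The dendritic layer referred to in the abstract follows the general structure of the classical dendritic neuron model: a sigmoidal synaptic layer, multiplicative dendritic branches, a summing membrane, and a sigmoidal soma. As a rough illustration only (not the paper's implementation), a minimal forward pass might be sketched as follows; the function name, the steepness constant k, and the fixed soma threshold of 0.5 are illustrative assumptions.

```python
import math

def dendritic_neuron(x, w, q, k=5.0):
    """Minimal sketch of a dendritic neuron model forward pass.

    x: input features, a list of floats
    w: per-branch synaptic weights, one list of floats per dendritic branch
    q: per-branch synaptic thresholds, same shape as w
    """
    branch_outputs = []
    for wb, qb in zip(w, q):
        # Synaptic layer: sigmoid on each (branch, input) connection,
        # then a multiplicative (AND-like) interaction within the branch.
        prod = 1.0
        for xi, wi, qi in zip(x, wb, qb):
            prod *= 1.0 / (1.0 + math.exp(-k * (wi * xi - qi)))
        branch_outputs.append(prod)
    v = sum(branch_outputs)                       # membrane: sum over branches
    return 1.0 / (1.0 + math.exp(-k * (v - 0.5)))  # soma: final squashing
```

The multiplicative branch term is what gives the model its nonlinear, biologically inspired selectivity: a branch fires strongly only when all of its synaptic inputs are active, which is the kind of pattern-conjunction behavior the abstract attributes to dendritic trees.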
License
International Journal of Computing is an open access journal. Authors who publish with this journal agree to the following terms:
• Authors retain copyright and grant the journal right of first publication with the work simultaneously licensed under a Creative Commons Attribution License that allows others to share the work with an acknowledgement of the work's authorship and initial publication in this journal.
• Authors are able to enter into separate, additional contractual arrangements for the non-exclusive distribution of the journal's published version of the work (e.g., post it to an institutional repository or publish it in a book), with an acknowledgement of its initial publication in this journal.
• Authors are permitted and encouraged to post their work online (e.g., in institutional repositories or on their website) prior to and during the submission process, as it can lead to productive exchanges, as well as earlier and greater citation of published work.