Speech Emotion Recognition using Hybrid Architectures

Authors

  • Michael Norval
  • Zenghui Wang

DOI:

https://doi.org/10.47839/ijc.23.1.3430

Keywords:

Emotion recognition, Artificial Intelligence, Dendritic Layer, Capsule Networks, Ensemble

Abstract

The detection of human emotions from speech signals remains a challenging frontier in audio processing and human-computer interaction. This study introduces a novel approach to Speech Emotion Recognition (SER) that combines a Dendritic Layer with a Capsule Network (DendCaps). A hybrid of a Convolutional Neural Network (CNN) and a Long Short-Term Memory (LSTM) network, referred to as CLSTM, is used to create a baseline against which the DendCaps model is compared. Integrating dendritic layers and capsule networks for speech emotion detection can harness the unique advantages of both architectures, potentially leading to more sophisticated and accurate models. Dendritic layers, inspired by the nonlinear processing properties of dendritic trees in biological neurons, can handle the intricate patterns and variabilities inherent in speech signals, while capsule networks, with their dynamic routing mechanisms, are adept at preserving hierarchical spatial relationships within the data, enabling the model to capture more refined emotional subtleties in human speech. The main motivation for using DendCaps is to bridge the gap between the capabilities of biological neural systems and artificial neural networks. This combination aims to capitalize on the hierarchical nature of speech data, whose intricate patterns and dependencies can thus be better captured. Finally, two ensemble methods, namely stacking and boosting, are used to combine the CLSTM and DendCaps networks; the experimental results show that stacking the two networks gives the best result, with 75% accuracy.
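To make the described architecture concrete, the sketch below illustrates the general idea in PyTorch: a CLSTM-style baseline (a 1D CNN front end over MFCC frames followed by an LSTM) and a stacking combiner that feeds the class probabilities of both base models into a small meta-learner. All layer sizes, the 40-coefficient MFCC input, the seven-class output, and the linear meta-learner are illustrative assumptions, not the authors' implementation, and the DendCaps branch is represented only by a stand-in output.

# Illustrative sketch only: a CNN-LSTM ("CLSTM") baseline for SER and a
# simple stacking meta-learner. Input shape, layer sizes, and the
# meta-learner are hypothetical; the paper does not publish this code.
import torch
import torch.nn as nn

class CLSTM(nn.Module):
    """1D CNN front end over MFCC frames, followed by an LSTM."""
    def __init__(self, n_mfcc=40, n_classes=7):
        super().__init__()
        self.conv = nn.Sequential(
            nn.Conv1d(n_mfcc, 64, kernel_size=5, padding=2),
            nn.ReLU(),
            nn.MaxPool1d(2),
        )
        self.lstm = nn.LSTM(input_size=64, hidden_size=128, batch_first=True)
        self.head = nn.Linear(128, n_classes)

    def forward(self, x):                    # x: (batch, n_mfcc, time)
        h = self.conv(x)                     # (batch, 64, time/2)
        h, _ = self.lstm(h.transpose(1, 2))  # (batch, time/2, 128)
        return self.head(h[:, -1])           # logits from the last time step

# Stacking: the class-probability outputs of the two base models become
# the input features of a small meta-learner that issues the final label.
class Stacker(nn.Module):
    def __init__(self, n_classes=7):
        super().__init__()
        self.meta = nn.Linear(2 * n_classes, n_classes)

    def forward(self, p_clstm, p_dendcaps):
        return self.meta(torch.cat([p_clstm, p_dendcaps], dim=1))

x = torch.randn(8, 40, 100)              # dummy batch of MFCC features
p1 = CLSTM()(x).softmax(dim=1)           # baseline class probabilities
p2 = torch.rand(8, 7).softmax(dim=1)     # stand-in for DendCaps outputs
print(Stacker()(p1, p2).shape)           # torch.Size([8, 7])

In this stacking scheme the meta-learner is trained on the base models' out-of-fold predictions; boosting, the other ensemble evaluated in the paper, would instead reweight training examples sequentially.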

Published

2024-10-11

How to Cite

Norval, M., & Wang, Z. (2024). Speech Emotion Recognition using Hybrid Architectures. International Journal of Computing, 23(1), 1-10. https://doi.org/10.47839/ijc.23.1.3430
