A ROBUST SPEECH RECOGNITION SYSTEM USING A GENERAL REGRESSION NEURAL NETWORK

Authors

  • Abderrahmane Amrouche
  • Abdelmalik Taleb-Ahmed
  • Jean Michel Rouvaen
  • Mustapha C. E. Yagoub

DOI:

https://doi.org/10.47839/ijc.6.3.445

Keywords:

Arabic Digits, General Regression Neural Network, Hidden Markov Model, Non Parametric Density Estimation, Speech Recognition, Noisy Environment

Abstract

General Regression Neural Networks (GRNN) have been applied to phoneme identification and isolated word recognition in clean speech. In this paper, the authors extend this approach to Arabic spoken word recognition in adverse conditions. Noise robustness is one of the most challenging problems in Automatic Speech Recognition (ASR): most existing recognition methods, which have been shown to be highly efficient under noise-free conditions, fail drastically in noisy environments. The proposed system was tested on Arabic digit recognition at different Signal-to-Noise Ratio (SNR) levels and under four noise conditions taken from the NOISEX-92 database: multitalker babble, car production hall (factory), military vehicle (Leopard tank) and fighter jet cockpit (Buccaneer). The proposed scheme compared favorably with similar recognizers based on the Multilayer Perceptron (MLP), the Elman Recurrent Neural Network (RNN) and the discrete Hidden Markov Model (HMM). The experimental results show that the use of nonparametric regression with an appropriate smoothing factor (spread) improved the generalization power of the neural network and the global performance of the speech recognizer in noisy environments.
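
To illustrate the role of the smoothing factor mentioned above, the following is a minimal sketch of a GRNN in its textbook form, a Gaussian-kernel weighted average of stored training targets. It is not the authors' implementation: the function name `grnn_predict`, the toy two-dimensional feature vectors, and the spread value of 0.5 are all assumptions chosen for illustration, and the abstract does not specify the features or spread actually used.

```python
import numpy as np

def grnn_predict(train_x, train_y, query_x, spread=0.5):
    """Textbook GRNN output: Gaussian-kernel weighted average of training targets.

    `spread` is the smoothing factor; the value here is an illustrative assumption.
    """
    train_x = np.asarray(train_x, dtype=float)
    train_y = np.asarray(train_y, dtype=float)
    query_x = np.asarray(query_x, dtype=float)
    # Squared Euclidean distance between every query and every stored pattern.
    diff = query_x[:, None, :] - train_x[None, :, :]
    sq_dist = np.sum(diff ** 2, axis=2)
    # Shift each row by its minimum; the common factor cancels in the ratio
    # below and keeps the exponentials from underflowing to zero.
    sq_dist -= sq_dist.min(axis=1, keepdims=True)
    weights = np.exp(-sq_dist / (2.0 * spread ** 2))
    # Normalized weighted sum of the targets (summation and output layers).
    return weights @ train_y / weights.sum(axis=1, keepdims=True)

# Hypothetical toy data: two classes with one-hot targets.
train_x = [[0.0, 0.0], [0.0, 1.0], [5.0, 5.0], [5.0, 6.0]]
train_y = [[1, 0], [1, 0], [0, 1], [0, 1]]
query_x = [[0.2, 0.4], [4.8, 5.5]]

scores = grnn_predict(train_x, train_y, query_x, spread=0.5)
labels = scores.argmax(axis=1).tolist()
print(labels)
```

In this formulation a small spread makes the output follow the nearest stored patterns closely, while a larger spread averages over more of them, which is the trade-off behind choosing an appropriate smoothing factor.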

Published

2014-08-01

How to Cite

Amrouche, A., Taleb-Ahmed, A., Rouvaen, J. M., & Yagoub, M. C. E. (2014). A ROBUST SPEECH RECOGNITION SYSTEM USING A GENERAL REGRESSION NEURAL NETWORK. International Journal of Computing, 6(3), 6-15. https://doi.org/10.47839/ijc.6.3.445

Section

Articles