A ROBUST SPEECH RECOGNITION SYSTEM USING A GENERAL REGRESSION NEURAL NETWORK
DOI:
https://doi.org/10.47839/ijc.6.3.445
Keywords:
Arabic Digits, General Regression Neural Network, Hidden Markov Model, Non Parametric Density Estimation, Speech Recognition, Noisy Environment
Abstract
General Regression Neural Networks (GRNN) have been applied to phoneme identification and isolated word recognition in clean speech. In this paper, the authors extend this approach to Arabic spoken word recognition in adverse conditions. Noise robustness is one of the most challenging problems in Automatic Speech Recognition (ASR): most existing recognition methods, which have been shown to be highly efficient under noise-free conditions, fail drastically in noisy environments. The proposed system has been tested for Arabic digit recognition at different Signal-to-Noise Ratio (SNR) levels and under four noisy conditions taken from the NOISEX-92 database: multitalker babble background, car production hall (factory), military vehicle (Leopard tank) and fighter jet cockpit (Buccaneer). The proposed scheme compared favorably with similar recognizers based on the Multilayer Perceptron (MLP), the Elman Recurrent Neural Network (RNN) and the discrete Hidden Markov Model (HMM). The experimental results show that the use of nonparametric regression with an appropriate smoothing factor (spread) improved the generalization power of the neural network and the global performance of the speech recognizer in noisy environments.
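The role of the smoothing factor can be illustrated with a minimal sketch. It assumes the standard GRNN formulation, in which the output is a Gaussian-kernel weighted average of the training targets; it is not the authors' implementation. The function name `grnn_predict`, the toy two-dimensional feature vectors, and the one-hot class targets are all hypothetical choices made for illustration.

```python
import numpy as np

def grnn_predict(train_x, train_y, query_x, spread=0.5):
    """Kernel-weighted average of training targets (standard GRNN form).

    train_x: (n_train, n_features) stored pattern vectors
    train_y: (n_train, n_outputs) targets, e.g. one-hot class labels
    query_x: (n_query, n_features) vectors to classify
    spread:  smoothing factor (Gaussian kernel width), assumed > 0
    """
    train_x = np.asarray(train_x, dtype=float)
    train_y = np.asarray(train_y, dtype=float)
    query_x = np.atleast_2d(np.asarray(query_x, dtype=float))

    # Squared Euclidean distance between every query and every stored pattern.
    diff = query_x[:, None, :] - train_x[None, :, :]
    d2 = np.sum(diff ** 2, axis=2)

    # Shift each row by its minimum; the shift cancels in the ratio below
    # and avoids underflow when the spread is small.
    d2 = d2 - d2.min(axis=1, keepdims=True)
    weights = np.exp(-d2 / (2.0 * spread ** 2))

    # Normalized weighted sum of the targets.
    return (weights @ train_y) / weights.sum(axis=1, keepdims=True)

# Toy data (hypothetical): two clusters standing in for two word classes.
train_x = np.array([[0.0, 0.0], [0.0, 1.0], [5.0, 5.0], [5.0, 6.0]])
train_y = np.eye(2)[[0, 0, 1, 1]]          # one-hot targets
queries = np.array([[0.2, 0.4], [4.8, 5.5]])

scores = grnn_predict(train_x, train_y, queries, spread=0.5)
predicted = scores.argmax(axis=1)          # class with the largest averaged score

# A very large spread weights all stored patterns almost equally,
# so every output drifts toward the mean of the targets.
oversmoothed = grnn_predict(train_x, train_y, queries, spread=1e6)
```

In this sketch a small spread makes the output follow the nearest stored patterns closely, while a large spread averages over many of them, which is the trade-off behind choosing an appropriate smoothing factor.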
License
International Journal of Computing is an open access journal. Authors who publish with this journal agree to the following terms:
• Authors retain copyright and grant the journal right of first publication with the work simultaneously licensed under a Creative Commons Attribution License that allows others to share the work with an acknowledgement of the work's authorship and initial publication in this journal.
• Authors are able to enter into separate, additional contractual arrangements for the non-exclusive distribution of the journal's published version of the work (e.g., post it to an institutional repository or publish it in a book), with an acknowledgement of its initial publication in this journal.
• Authors are permitted and encouraged to post their work online (e.g., in institutional repositories or on their website) prior to and during the submission process, as it can lead to productive exchanges, as well as earlier and greater citation of published work.