Machine Transliteration of Handwritten MODI Script to Devanagari using Deep Neural Networks

Solley Joseph; Jossy George

doi:10.47839/ijc.23.2.3540


Keywords:

Machine Transliteration, MODI script, Calamari OCR, CRNN, Deep Neural Networks

Abstract

Transliteration transcribes words from a source language into a target language that uses a different script, and transliteration systems help overcome barriers of language and script. The existence of many languages and the growing number of multilingual speakers have created a demand for automated transliteration systems. This study focuses on the machine transliteration of handwritten MODI script to Devanagari. MODI was the official script for Marathi until 1950. Although Devanagari has since taken over as the official script of the Marathi language, the MODI script retains historical significance because large volumes of MODI manuscripts are preserved in libraries across different parts of India. However, MODI-to-Devanagari transliteration is difficult because MODI script documents are complex in nature and no standard dataset is available for experimentation. Machine transliteration can be approached either as a Natural Language Processing task or as a pattern recognition task; this work takes the pattern recognition approach. The transliteration of MODI script to Devanagari is implemented using Calamari OCR, an open-source package based on a Convolutional Recurrent Neural Network (CRNN). The system achieves an accuracy of 88.14% in character-level matching within each word, and 61% when the entire word must match. Machine Transliteration of
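The gap between the character-level and whole-word figures reported above is easier to see with a small worked example. The sketch below is illustrative only: the abstract does not state how character-level matching was computed, so it assumes a common definition (one minus the total edit distance divided by the total reference length, counted in Unicode code points). The function names and the sample words are hypothetical and are not taken from the paper's dataset.

```python
def edit_distance(a: str, b: str) -> int:
    """Levenshtein distance between two strings, counted in Unicode code points."""
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, start=1):
        curr = [i]
        for j, cb in enumerate(b, start=1):
            cost = 0 if ca == cb else 1
            curr.append(min(prev[j] + 1,          # delete a character of a
                            curr[j - 1] + 1,      # insert a character of b
                            prev[j - 1] + cost))  # substitute or match
        prev = curr
    return prev[-1]


def char_accuracy(predictions, references):
    """Assumed metric: 1 - (total edit distance / total reference length)."""
    total_dist = sum(edit_distance(p, r) for p, r in zip(predictions, references))
    total_len = sum(len(r) for r in references)
    return max(0.0, 1.0 - total_dist / total_len)


def word_accuracy(predictions, references):
    """Fraction of predicted words that match the reference exactly."""
    matches = sum(p == r for p, r in zip(predictions, references))
    return matches / len(references)


if __name__ == "__main__":
    # Hypothetical Devanagari reference words and OCR outputs (not real results).
    references = ["नमस्ते", "भारत", "मराठी"]
    predictions = ["नमस्ते", "भरत", "मराठा"]
    print(f"character-level accuracy: {char_accuracy(predictions, references):.2%}")
    print(f"word-level accuracy:      {word_accuracy(predictions, references):.2%}")
```

Under this assumed definition, a single wrong code point costs the whole word at word level but only a small fraction at character level, which is consistent with word-level accuracy being substantially lower than character-level accuracy.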


International Journal of Computing
