N-gram based Machine Translation for English-Assamese: Two Languages with High Syntactical Dissimilarity
Zakir Hussain¹, Malaya Dutta Borah², Abdul Hannan³
¹Zakir Hussain*, Research Scholar, Department of CSE, NIT Silchar, Assam, India.
²Malaya Dutta Borah, Assistant Professor, Department of CSE, NIT Silchar, Assam, India.
³Abdul Hannan, Faculty, Department of IT, Gauhati University, Assam, India.
Manuscript received on November 22, 2019. | Revised Manuscript received on December 08, 2019. | Manuscript published on December 30, 2019. | PP: 2940-2949 | Volume-9 Issue-2, December, 2019. | Retrieval Number: B2320129219/2019©BEIESP | DOI: 10.35940/ijeat.B2320.129219
© The Authors. Blue Eyes Intelligence Engineering and Sciences Publication (BEIESP). This is an open access article under the CC BY-NC-ND license (http://creativecommons.org/licenses/by-nc-nd/4.0/)
Abstract: People in the northeastern region of India need a machine translation system to overcome the language barrier. Many people in this region cannot access services because they do not understand the language in which those services are offered. Assamese is one of the major languages spoken in northeast India, yet machine translation support for Assamese is limited compared with other languages, so many Assamese speakers cannot benefit from it. This paper focuses on developing an English-to-Assamese translation system using an n-gram model. Compared with other models, the n-gram model works well for language pairs whose syntax is highly dissimilar. The value of n strongly affects the quality and efficiency of the system, and the Bilingual Evaluation Understudy (BLEU) score varies significantly as the n-gram order changes. The model uses tuples to reduce memory consumption and to speed up translation. A parallel corpus was used to train the n-gram-based decoder MARIE. The n-gram model extracts far fewer translation units than a phrase-based model, which has a large impact on system efficiency.
Keywords: Statistical Machine Translation, N-gram, MARIE, English-Assamese Translation, Tuple Extraction
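The abstract contrasts tuples with the translation units of a phrase-based model. The Python sketch below illustrates the general idea behind tuple extraction in n-gram-based SMT under stated assumptions; it is not the authors' implementation or MARIE's code. It assumes a word alignment is already available (for example, from an external word aligner). It attaches unaligned target words at a tuple boundary to the following tuple, and its phrase extractor omits the extension over unaligned boundary words that standard phrase extraction performs. The function names, the `max_len` limit, the `|||` separator, and the romanized Assamese toy sentence are hypothetical.

```python
from collections import Counter
from typing import List, Set, Tuple

Link = Tuple[int, int]                           # (source index, target index), 0-based
Unit = Tuple[Tuple[str, ...], Tuple[str, ...]]   # (source words, target words)


def is_valid_cut(i: int, j: int, alignment: Set[Link]) -> bool:
    """A cut after the first i source words and the first j target words is
    valid when no alignment link crosses it."""
    return all((s < i) == (t < j) for s, t in alignment)


def extract_tuples(src: List[str], tgt: List[str], alignment: Set[Link]) -> List[Unit]:
    """Split a word-aligned sentence pair into a minimal monotone sequence of
    tuples. Simplification (assumption): unaligned target words at a boundary
    are attached to the following tuple."""
    tuples: List[Unit] = []
    prev_i, prev_j = 0, 0
    for i in range(1, len(src) + 1):
        if i == len(src):
            cut = len(tgt)  # the final cut always closes both sentences
        else:
            cut = next((j for j in range(prev_j, len(tgt) + 1)
                        if is_valid_cut(i, j, alignment)), None)
            if cut is None:
                continue  # every candidate cut is crossed by a link: grow the tuple
        tuples.append((tuple(src[prev_i:i]), tuple(tgt[prev_j:cut])))
        prev_i, prev_j = i, cut
    return tuples


def extract_phrases(src: List[str], tgt: List[str], alignment: Set[Link],
                    max_len: int = 7) -> Set[Unit]:
    """Collect every phrase pair consistent with the alignment (simplified:
    no extension over unaligned boundary words)."""
    phrases: Set[Unit] = set()
    for s1 in range(len(src)):
        for s2 in range(s1, min(s1 + max_len, len(src))):
            linked = [t for s, t in alignment if s1 <= s <= s2]
            if not linked:
                continue
            t1, t2 = min(linked), max(linked)
            if t2 - t1 >= max_len:
                continue
            # Consistency: no target word inside [t1, t2] links outside [s1, s2].
            if any(t1 <= t <= t2 and not s1 <= s <= s2 for s, t in alignment):
                continue
            phrases.add((tuple(src[s1:s2 + 1]), tuple(tgt[t1:t2 + 1])))
    return phrases


def tuple_ngrams(tuples: List[Unit], n: int) -> Counter:
    """Count n-grams over a padded tuple sequence; a bilingual n-gram model
    would be estimated from counts like these (no smoothing here)."""
    units = [" ".join(s) + " ||| " + " ".join(t) for s, t in tuples]
    seq = ["<s>"] * (n - 1) + units + ["</s>"]
    return Counter(tuple(seq[k:k + n]) for k in range(len(seq) - n + 1))


if __name__ == "__main__":
    # Hypothetical toy pair: "he eats rice" -> romanized Assamese "xi bhat khay".
    # SVO vs SOV order produces the crossing links (1, 2) and (2, 1).
    src = ["he", "eats", "rice"]
    tgt = ["xi", "bhat", "khay"]
    links = {(0, 0), (1, 2), (2, 1)}
    tuples = extract_tuples(src, tgt, links)
    print(len(tuples), "tuples:", tuples)
    print(len(extract_phrases(src, tgt, links)), "phrase pairs")
    print(tuple_ngrams(tuples, 2))
```

On this toy pair, the crossing links between "eats rice" and "bhat khay" rule out any cut inside that span, so tuple extraction yields two units while phrase extraction yields five overlapping phrase pairs. Because tuples form a single segmentation of each sentence pair rather than every consistent span, the resulting translation table is smaller; the n in `tuple_ngrams` is the model order that the abstract reports as affecting quality and efficiency.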