A Higher-Order N-gram Model to enhance automatic Word Prediction for Assamese sentences containing ambiguous Words
M.P. Bhuyan1, S.K. Sarma2
1M.P. Bhuyan*, Department of Information Technology, Gauhati University, Guwahati, India.
2S.K. Sarma, Department of Information Technology, Gauhati University, Guwahati, India.
Manuscript received on August 03, 2019. | Revised Manuscript received on August 28, 2019. | Manuscript published on August 30, 2019. | PP: 2921-2926 | Volume-8 Issue-6, August 2019. | Retrieval Number: F8706088619/2019©BEIESP | DOI: 10.35940/ijeat.F8706.088619
© The Authors. Blue Eyes Intelligence Engineering and Sciences Publication (BEIESP). This is an open access article under the CC BY-NC-ND license (http://creativecommons.org/licenses/by-nc-nd/4.0/)
Abstract: Word prediction is a technique that suggests words to the user after the first few letters have been typed. A predictive model can also generate the next words of a sentence by observing its earlier words. This research combines two problems: word prediction and the handling of ambiguous words. A word prediction model predicts the next words of a sentence using an n-gram based model; in general, such models use unigrams, bigrams or trigrams. For sentences containing ambiguous words, a predictive model that uses only bigrams or trigrams cannot predict the next words well. To enhance prediction for ambiguous words, up to six previous input words are observed in order to predict the words that most closely fit the context following the ambiguous word. Experiments at different levels were carried out, and the results of the enhanced prediction model were compared with those of the traditional prediction model; the enhanced model improves on both accuracy and failure rate. The accuracy of the Traditional Model is 60.68%, whereas the accuracy of the Enhanced Model is 66.88%. The failure rate of the Traditional Model is 32.35%, and that of the Enhanced Model is 29.17%.
Keywords: N-gram, ambiguous words, word prediction.
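
The idea of looking at up to six previous words, rather than only one or two, can be illustrated with a minimal sketch. The abstract does not state how the longer context is combined with shorter ones, so the longest-match back-off used here is an assumption, as are the function names `train` and `predict` and the tiny English toy corpus (the paper itself works on Assamese text). The word "bank" plays the role of the ambiguous word.

```python
from collections import Counter, defaultdict


def train(sentences, max_context=6):
    """Count next-word frequencies for every context of 1..max_context words."""
    counts = defaultdict(Counter)
    for sentence in sentences:
        tokens = sentence.split()
        for i in range(1, len(tokens)):
            for k in range(1, max_context + 1):
                if i - k < 0:
                    break  # not enough preceding words for a context of length k
                context = tuple(tokens[i - k:i])
                counts[context][tokens[i]] += 1
    return counts


def predict(counts, previous_words, max_context=6, top_n=3):
    """Return the most frequent next words, backing off from the longest
    context seen in training to shorter ones (an assumed strategy)."""
    words = previous_words.split()
    for k in range(min(max_context, len(words)), 0, -1):
        context = tuple(words[-k:])
        if context in counts:
            return [word for word, _ in counts[context].most_common(top_n)]
    return []  # no context of any length was seen in training


# Hypothetical toy corpus: "bank" is ambiguous (river bank vs. money bank).
corpus = [
    "he walked to the river bank quietly",
    "she deposited cash at the bank today",
    "he deposited cash at the bank today",
]
counts = train(corpus)

# One word of context: the majority sense of "bank" wins.
short_context = predict(counts, "bank", top_n=1)
# Longer context: the preceding words select the other sense.
long_context = predict(counts, "to the river bank", top_n=1)
```

With a single word of context the model follows the more frequent continuation of "bank", whereas the longer context lets it pick the continuation that matches the river sense. This mirrors the motivation stated in the abstract, not the authors' actual implementation.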