Statistics based evaluation of English Multi-Word Expressions
Rakhi Joon1, Archana Singhal2
1Rakhi Joon*, Department of Computer Science, University of Delhi, India.
2Archana Singhal, Department of Computer Science, IP College for Women, University of Delhi, India.
Manuscript received on September 26, 2019. | Revised Manuscript received on October 15, 2019. | Manuscript published on October 30, 2019. | PP: 908-912 | Volume-9 Issue-1, October 2019 | Retrieval Number: A9421109119/2019©BEIESP | DOI: 10.35940/ijeat.A9421.109119
© The Authors. Blue Eyes Intelligence Engineering and Sciences Publication (BEIESP). This is an open access article under the CC BY-NC-ND license (http://creativecommons.org/licenses/by-nc-nd/4.0/)
Abstract: The extraction of linguistic and statistical information is an important aspect of text processing. The extraction of Multi-Word Expressions (MWEs) plays a key role in text processing, as MWEs are used to find the correct meaning of a text phrase. MWEs are lexical phrases consisting of two or more words that together convey a meaning different from that of their constituent words. The linguistic side of MWE extraction mainly concerns text information, including Part of Speech (POS) tags, grammar rules, and related literature. It is important to extract the correct MWEs for a particular language, as languages vary widely. The selection of MWEs is based on a statistical analysis of the MWE extraction process. In the proposed work, MWE extraction is performed on an English dataset. Along with the existing statistical measures, i.e. Pointwise Mutual Information (PMI), Dice Coefficient (DC) and Modified Dice Coefficient (MDC), the additional measures Lexical Fixedness (LF), Syntactic Fixedness (SF) and Relevance Measure (RM) are also evaluated. The results are compared with other existing approaches applied to English MWEs. The results show that the additional measures LF, SF and RM are more significant than the existing measures for identifying the best statistics for the MWE extraction process. The process model is generic and not tied to a particular language. It can also be applied to other languages by selecting the POS tags for that particular language.
Keywords: English MWEs, linguistics, Statistical measures, Text processing.
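As an illustration of the two most widely used existing measures named in the abstract, the sketch below computes PMI and the Dice Coefficient for adjacent word pairs using their standard textbook definitions. It is not the authors' implementation: the function name `bigram_scores`, the toy sentence, and the whitespace tokenization are assumptions made for this example, and MDC, LF, SF and RM are omitted because the abstract does not define them.

```python
import math
from collections import Counter


def bigram_scores(tokens):
    """Return {(w1, w2): (pmi, dice)} for every adjacent word pair.

    Hypothetical helper for illustration only.
    PMI(x, y)  = log2( P(x, y) / (P(x) * P(y)) )
    Dice(x, y) = 2 * f(x, y) / (f(x) + f(y))
    """
    unigrams = Counter(tokens)
    bigrams = Counter(zip(tokens, tokens[1:]))
    n_uni = sum(unigrams.values())
    n_bi = sum(bigrams.values())

    scores = {}
    for (x, y), f_xy in bigrams.items():
        # Maximum-likelihood probability estimates from raw counts.
        p_xy = f_xy / n_bi
        p_x = unigrams[x] / n_uni
        p_y = unigrams[y] / n_uni
        pmi = math.log2(p_xy / (p_x * p_y))
        dice = 2 * f_xy / (unigrams[x] + unigrams[y])
        scores[(x, y)] = (pmi, dice)
    return scores


# Toy input (assumed for this example, not taken from the paper's dataset).
tokens = "he will kick the bucket and she will kick the ball".split()
scores = bigram_scores(tokens)

# Rank candidate pairs by Dice, highest first.
ranked = sorted(scores.items(), key=lambda kv: kv[1][1], reverse=True)
for pair, (pmi, dice) in ranked[:3]:
    print(pair, round(pmi, 3), round(dice, 3))
```

In this toy input, "kick" and "the" occur only next to each other, so their Dice Coefficient is 1.0. On a real corpus, candidate pairs would typically be filtered by POS pattern before scoring, which is the language-specific step the abstract refers to.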