Acronym Disambiguation using Web Scraping
K. Premkumar1, V. Atchayaa2, P. Idayavalli3, R. Gayathri4
1K. Premkumar*, Department of Computer Science and Engineering, Sri Manakula Vinayagar Engineering College, Puducherry, India.
2V. Atchayaa, Department of Computer Science and Engineering, Sri Manakula Vinayagar Engineering College, Puducherry, India.
3P. Idayavalli, Department of Computer Science and Engineering, Sri Manakula Vinayagar Engineering College, Puducherry, India.
4R. Gayathri, Department of Computer Science and Engineering, Sri Manakula Vinayagar Engineering College, Puducherry, India.
Manuscript received on April 05, 2020. | Revised Manuscript received on April 17, 2020. | Manuscript published on April 30, 2020. | PP: 256-260 | Volume-9 Issue-4, April 2020. | Retrieval Number: D6812049420/2020©BEIESP | DOI: 10.35940/ijeat.D6812.049420
© The Authors. Blue Eyes Intelligence Engineering and Sciences Publication (BEIESP). This is an open access article under the CC BY-NC-ND license (http://creativecommons.org/licenses/by-nc-nd/4.0/)
Abstract: Web scraping uses automated tools to collect content from web pages, a task otherwise performed manually by a human reader. It is used in many applications, such as e-commerce, dataset creation for machine learning, and advertising. This work focuses on acronym disambiguation, which is a part of natural language processing and is mainly used in chatbots, named entity recognition, and similar applications. In this paper, an acronym disambiguation system is built by web scraping with Jsoup, and a cosine similarity score is used to identify the most suitable expansion. Our goal is to identify the expansion that fits an acronym based on the context of the paragraph in which it appears. To do this, we compute a cosine similarity score for each candidate expansion, and the candidate with the maximum score is selected as the suitable expansion.
Keywords: Web scraping, Jsoup, cosine similarity, acronym, abbreviation.
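The selection step described in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation: the paper scrapes candidate expansions with Jsoup, whereas here the scraping step is omitted and the candidates are hard-coded. The function names, the stop-word list, the bag-of-words representation, and the sample sentence and descriptions are all assumptions made for illustration.

```python
import math
import re
from collections import Counter

# Assumed stop-word list; the abstract does not say how text is preprocessed.
STOP_WORDS = {"the", "a", "an", "and", "or", "to", "of", "in", "by", "was", "that", "where"}


def tokenize(text):
    """Lowercase the text, split it into alphabetic tokens, and drop stop words."""
    return [t for t in re.findall(r"[a-z]+", text.lower()) if t not in STOP_WORDS]


def cosine_similarity(a, b):
    """Cosine similarity between two bag-of-words Counters."""
    dot = sum(a[t] * b[t] for t in a.keys() & b.keys())
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)


def best_expansion(context, candidates):
    """Return (expansion, score) for the candidate most similar to the context."""
    context_vec = Counter(tokenize(context))
    scores = {
        expansion: cosine_similarity(context_vec, Counter(tokenize(description)))
        for expansion, description in candidates.items()
    }
    best = max(scores, key=scores.get)
    return best, scores[best]


# Hypothetical input: a paragraph containing the acronym "ER" and two
# candidate expansions with short descriptions (stand-ins for scraped text).
context = "The patient was admitted to the ER after the accident and treated by a doctor."
candidates = {
    "Emergency Room": "hospital department where a doctor treats a patient "
                      "after an accident or sudden illness",
    "Endoplasmic Reticulum": "organelle in the cell that folds proteins "
                             "and synthesizes lipids",
}

expansion, score = best_expansion(context, candidates)
print(expansion, round(score, 3))
```

In this sketch the candidate whose description shares the most vocabulary with the paragraph, relative to the lengths of both texts, receives the highest score and is returned as the expansion.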