Comprehensive Analysis of Variants of TF-IDF Applied on LDA and LSA Topic Modelling
S. Sai Manasa Bala1, Santoshi Kumari2
1S. Sai Manasa Bala, M.Sc. in Applied Mathematics, M S Ramaiah University of Applied Sciences, Bangalore, India.
2Santoshi Kumari*, Department of Computer Science and Engineering, M S Ramaiah University of Applied Sciences, Bangalore, India.
Manuscript received on August 07, 2020. | Revised Manuscript received on August 15, 2020. | Manuscript published on August 30, 2020. | PP: 531-533 | Volume-9 Issue-6, August 2020. | Retrieval Number: D7669049420/2020©BEIESP | DOI: 10.35940/ijeat.D7669.089620
© The Authors. Blue Eyes Intelligence Engineering and Sciences Publication (BEIESP). This is an open access article under the CC BY-NC-ND license (http://creativecommons.org/licenses/by-nc-nd/4.0/)
Abstract: The present generation is connected virtually through many social media platforms. On social media, people express opinions on posts, news, and products through comments or emoticons that indicate their level of satisfaction, and market standards improve on the basis of this feedback. Online marketplaces such as Amazon, Flipkart, and Myntra use these reviews to improve their businesses. Analyzing large-scale opinions and feedback from individuals helps to identify hidden insights and to work towards customer satisfaction. This paper proposes applying different weighting schemes of TF-IDF (Term Frequency-Inverse Document Frequency) to the topic modeling methods LSA and LDA in order to cluster the topics of discussion in large-scale reviews from the booming online market ‘Amazon’. The main focus of the paper is to observe how topic modeling changes under different TF-IDF weighting schemes. In this work, the topic models LDA (Latent Dirichlet Allocation) and LSA (Latent Semantic Analysis) are applied to various TF-IDF weighting schemes, and it is observed that changing the weights leads to variation in the term frequencies of different topics with respect to their documents. The results also show that variation in term weights changes the topic modeling output. Visualizations of the topic clusters obtained with the different TF-IDF weighting schemes are presented.
Keywords: Data Analysis, LDA, LSA, TF-IDF Weights, Topic Modeling