Similarity Matching of Pairs of Text using CACT Algorithm
CH N Santhosh Kumar1, V Pavan Kumar2, K S Reddy3
1Ch. N. Santhosh Kumar*, Dept. of CSE, Anurag Engineering College, Kodada, India.
2V Pavan Kumar, Dept. of CSE, Anurag Engineering College, Kodada, India.
3Dr. K.S. Reddy, Researcher, Anurag Group of Institutions, Hyderabad, India.
Manuscript received on July 20, 2019. | Revised Manuscript received on August 10, 2019. | Manuscript published on August 30, 2019. | PP: 2296-2298 | Volume-8 Issue-6, August 2019. | Retrieval Number: F8685088619/2019©BEIESP | DOI: 10.35940/ijeat.F8685.088619
© The Authors. Blue Eyes Intelligence Engineering and Sciences Publication (BEIESP). This is an open access article under the CC BY-NC-ND license (http://creativecommons.org/licenses/by-nc-nd/4.0/)
Abstract: In data mining, short text analysis is widely used in many applications. Short text is very difficult to analyze with traditional natural language processing tools that rely on the syntax of the language, and such tools often cannot be applied correctly. Short texts also provide sparse and insufficient data, and their high noise and ambiguity make it difficult to identify semantic knowledge. In this paper, the authors propose replacing the cosine similarity coefficient with the Jaro-Winkler similarity measure to obtain the similarity match between pairs of text (source text and target text). Jaro-Winkler does a better job of determining the similarity of strings because it takes order into account, using positional indices to estimate relevance. For one-to-many data links, CACT driven by Jaro-Winkler is presumed to offer better performance than CACT driven by cosine. In this paper, the ensemble algorithm CACTS and SAE is adopted with the Jaro-Winkler similarity approach. The new algorithm is employed for short text analysis to obtain better results. An evaluation of the proposed concept serves as validation.
Keywords: Text mining, Cosine similarity coefficient, Jaro-Winkler similarity.
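The contrast the abstract draws between the two measures can be illustrated with a minimal Python sketch. It is not the authors' CACT implementation. It assumes a bag-of-words cosine over whitespace tokens and the standard Jaro-Winkler definition with scaling factor 0.1 and a prefix cap of 4; the function names `jaro`, `jaro_winkler` and `cosine_bow` are hypothetical and chosen only for illustration.

```python
import math
from collections import Counter


def jaro(s1: str, s2: str) -> float:
    """Standard Jaro similarity between two strings, in [0, 1]."""
    if s1 == s2:
        return 1.0
    len1, len2 = len(s1), len(s2)
    if len1 == 0 or len2 == 0:
        return 0.0
    # Characters match only if they are equal and lie within this window.
    window = max(max(len1, len2) // 2 - 1, 0)
    matched1 = [False] * len1
    matched2 = [False] * len2
    matches = 0
    for i, ch in enumerate(s1):
        lo = max(0, i - window)
        hi = min(len2, i + window + 1)
        for j in range(lo, hi):
            if not matched2[j] and s2[j] == ch:
                matched1[i] = matched2[j] = True
                matches += 1
                break
    if matches == 0:
        return 0.0
    # Count matched characters that appear in a different order.
    k = 0
    out_of_order = 0
    for i in range(len1):
        if matched1[i]:
            while not matched2[k]:
                k += 1
            if s1[i] != s2[k]:
                out_of_order += 1
            k += 1
    transpositions = out_of_order // 2
    return (matches / len1 + matches / len2
            + (matches - transpositions) / matches) / 3


def jaro_winkler(s1: str, s2: str, p: float = 0.1, max_prefix: int = 4) -> float:
    """Jaro similarity boosted by the length of the common prefix."""
    base = jaro(s1, s2)
    prefix = 0
    for a, b in zip(s1, s2):
        if a != b or prefix == max_prefix:
            break
        prefix += 1
    return base + prefix * p * (1 - base)


def cosine_bow(a: str, b: str) -> float:
    """Cosine similarity over word counts (assumed vectorization); ignores order."""
    va, vb = Counter(a.lower().split()), Counter(b.lower().split())
    dot = sum(va[t] * vb[t] for t in va)
    na = math.sqrt(sum(c * c for c in va.values()))
    nb = math.sqrt(sum(c * c for c in vb.values()))
    if na == 0 or nb == 0:
        return 0.0
    return dot / (na * nb)


if __name__ == "__main__":
    source, target = "dog bites man", "man bites dog"
    print("cosine      :", cosine_bow(source, target))
    print("jaro-winkler:", jaro_winkler(source, target))
    print("MARTHA/MARHTA:", round(jaro_winkler("MARTHA", "MARHTA"), 4))
```

Under these assumptions, the two example sentences contain the same words in a different order, so the bag-of-words cosine treats them as identical while Jaro-Winkler scores them below 1. The classic pair MARTHA/MARHTA, which differs by a single transposition, scores about 0.961.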