A Relative Study on Search Results Clustering Algorithms – K-means, Suffix Tree and LINGO
R.Mahalakshmi1, V. Lakshmi Praba2
1R.Mahalakshmi, Research Scholar, M.S. University Tirunelveli, (Tamil Nadu), India.
2Dr. V.Lakshmi Praba, Assistant Professor, Sivaganga Women’s College, Madurai, (Tamil Nadu), India.
Manuscript received on July 30, 2013. | Revised Manuscript received on August 17, 2013. | Manuscript published on August 30, 2013. | PP: 31-35 | Volume-2, Issue-6, August 2013. | Retrieval Number: F1942082613/2013©BEIESP
© The Authors. Blue Eyes Intelligence Engineering and Sciences Publication (BEIESP). This is an open access article under the CC BY-NC-ND license (http://creativecommons.org/licenses/by-nc-nd/4.0/)
Abstract: The performance of web search engines can be improved by properly clustering the search result documents. Most users are unable to formulate a query that retrieves exactly what they want. The search engine therefore returns a massive list of results, ranked by the PageRank algorithm (7), a relevancy algorithm, or a human judgment algorithm. Because of this ambiguity in the query, the user ends up with information unrelated to the intended search. Evaluating the performance of a clustering algorithm is not as trivial as counting the number of errors or computing the precision and recall of a supervised classification algorithm. In this paper, a comparative analysis of three common search results clustering algorithms is carried out to study the performance enhancement they offer to a web search engine. Organizing the web documents effectively with suitable clustering techniques can increase the performance of search engines. A systematic evaluation of the three clustering algorithms, viz., Suffix Tree Clustering (STC), Lingo, and K-Means, is carried out using multiple test collections and evaluation measures. It turns out that STC works well when one wants a quick overview of documents relevant to distinct subtopics, whereas clustering is more useful when one is interested in retrieving multiple documents relevant to each subtopic.
Keywords: Information retrieval, Search engines, clustering, STC, Lingo, K-Means.
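The abstract notes that evaluating a clustering is not as simple as counting classification errors. One reason is that cluster identifiers are arbitrary and do not correspond to class labels. The sketch below illustrates this with cluster purity, a common external measure. Purity is used here only as an illustrative assumption; the abstract does not say which evaluation measures the paper uses, and the function name `purity` and the toy "jaguar" data are hypothetical.

```python
from collections import Counter


def purity(cluster_ids, true_topics):
    """Fraction of documents that belong to the majority topic of their cluster.

    Illustrative measure only; not necessarily one used in the paper.
    """
    if len(cluster_ids) != len(true_topics):
        raise ValueError("cluster_ids and true_topics must have the same length")
    if not cluster_ids:
        raise ValueError("at least one document is required")

    # Count how many documents of each topic landed in each cluster.
    topic_counts = {}
    for cluster, topic in zip(cluster_ids, true_topics):
        topic_counts.setdefault(cluster, Counter())[topic] += 1

    # Each cluster contributes the size of its largest topic group.
    majority_total = sum(max(counts.values()) for counts in topic_counts.values())
    return majority_total / len(cluster_ids)


# Hypothetical results for the ambiguous query "jaguar".
topics = ["car", "car", "animal", "animal", "animal", "car"]
clusters = [0, 0, 1, 1, 0, 2]

# Same grouping with different (arbitrary) cluster identifiers.
relabeled = ["b", "b", "a", "a", "b", "c"]

score = purity(clusters, topics)
same_score = purity(relabeled, topics)
print(round(score, 3), round(same_score, 3))
```

Both calls give the same score because only the grouping matters, not the names of the clusters. A direct error count against class labels would not have this property.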