Efficient Retrieval of HTML Documents Using Hybrid Meta-Heuristic Approaches in Web Document Clustering
Manjit Singh1, Anshu Bhasin2, Surender3
1Manjit Singh, Ph.D Research Scholar, Department of Computer Applications, IKG Punjab Technical University, Kapurthala (Punjab), India.
2Anshu Bhasin, Assistant Professor, Department of Computer Science & Engineering, IKG Punjab Technical University, Main Campus, Kapurthala (Punjab), India.
3Surender, Assistant Professor, Department of Computer Science, Guru Tegh Bahadur College, Bhawani garh, Sangrur (Punjab), India.
Manuscript received on 02 September 2019 | Revised Manuscript received on 12 September 2019 | Manuscript Published on 23 September 2019 | PP: 1350-1354 | Volume-8 Issue-5C, May 2019 | Retrieval Number: E11920585C19/19©BEIESP | DOI: 10.35940/ijeat.E1192.0585C19
© The Authors. Blue Eyes Intelligence Engineering and Sciences Publication (BEIESP). This is an open access article under the CC-BY-NC-ND license (http://creativecommons.org/licenses/by-nc-nd/4.0/)
Abstract: With the rapid growth of web documents on the World Wide Web (WWW), it is becoming difficult to organize, analyze and present these documents efficiently. For a given query, web search engines return many documents to the user, some relevant to the topic and some not. Web search is usually performed using only features extracted from the web page text. HTML tags with particular meanings have been found to improve the efficiency of information retrieval systems. However, organizing documents in a way that improves search without additional cost or complexity remains a great challenge. Clustering can play an important role in organizing such a large number of documents into several groups. However, due to the limitations of existing clustering techniques, researchers have begun using meta-heuristic algorithms for the document clustering problem. In this paper, we present a document clustering method that uses HTML tags and meta-heuristic approaches. A hybrid PSO+ACO+K-means algorithm is used to cluster the documents. The results of the proposed approach are analyzed on the WebKB dataset.
Keywords: Clustering, Retrieval, Web, Algorithm, Hybrid.
Scope of the Article: Clustering
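
Illustrative sketch: The abstract combines two ideas, features drawn from HTML tags and a K-means stage inside a hybrid clustering algorithm. The Python sketch below shows one way these could fit together, and it is not the authors' implementation. The tag weights, the function names (`vectorize`, `cosine`, `kmeans`) and the use of cosine similarity are all assumptions made for illustration. The PSO and ACO stages are omitted entirely: here the caller supplies the initial centroids, which is one plausible place for a meta-heuristic search to plug in, though the abstract does not say how the three stages are combined.

```python
from collections import Counter
from html.parser import HTMLParser
import math
import re

# Hypothetical tag weights; the abstract does not state which tags or values are used.
TAG_WEIGHTS = {"title": 3.0, "h1": 2.0, "b": 1.5}
DEFAULT_WEIGHT = 1.0


class TagWeightedTerms(HTMLParser):
    """Accumulate term weights, boosting text that appears inside selected tags."""

    def __init__(self):
        super().__init__()
        self.stack = []           # tags that are currently open
        self.weights = Counter()  # term -> accumulated weight

    def handle_starttag(self, tag, attrs):
        self.stack.append(tag)

    def handle_endtag(self, tag):
        # Pop back to the most recent matching open tag, if there is one.
        if tag in self.stack:
            while self.stack.pop() != tag:
                pass

    def handle_data(self, data):
        # Use the largest weight among the enclosing tags.
        weight = max([TAG_WEIGHTS.get(t, DEFAULT_WEIGHT) for t in self.stack]
                     or [DEFAULT_WEIGHT])
        for term in re.findall(r"[a-z]+", data.lower()):
            self.weights[term] += weight


def vectorize(html):
    """Return a sparse {term: weight} vector for one HTML document."""
    parser = TagWeightedTerms()
    parser.feed(html)
    parser.close()
    return dict(parser.weights)


def cosine(a, b):
    """Cosine similarity between two sparse vectors."""
    dot = sum(v * b.get(k, 0.0) for k, v in a.items())
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def kmeans(vectors, initial_centroids, iterations=10):
    """Plain K-means using cosine similarity; returns one cluster label per vector."""
    centroids = list(initial_centroids)
    labels = []
    for _ in range(iterations):
        # Assignment step: attach each document to its most similar centroid.
        labels = [max(range(len(centroids)), key=lambda c: cosine(v, centroids[c]))
                  for v in vectors]
        # Update step: each centroid becomes the mean of its members.
        for c in range(len(centroids)):
            members = [v for v, label in zip(vectors, labels) if label == c]
            if members:
                total = Counter()
                for m in members:
                    total.update(m)
                centroids[c] = {k: x / len(members) for k, x in total.items()}
    return labels


docs = [
    "<html><title>Course syllabus</title><body>lecture notes and homework</body></html>",
    "<html><title>Course schedule</title><body>homework due dates</body></html>",
    "<html><title>Faculty profile</title><body>research publications</body></html>",
    "<html><title>Faculty research</title><body>publications and grants</body></html>",
]
vectors = [vectorize(d) for d in docs]
labels = kmeans(vectors, [vectors[0], vectors[2]])
print(labels)
```

Because title terms receive a higher weight than body terms in this sketch, the two course pages and the two faculty pages are drawn toward different centroids even though their body text overlaps only slightly.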