A Hybrid Implementation of K-Means and HAC Algorithm and Its Comparison with other Clustering Algorithms
Anita Ganpati1, Jyoti Sharma2
1Anita Ganpati, Faculty, Department of Computer Science, Himachal Pradesh University Summer Hill, Shimla (Himachal Pradesh), India.
2Jyoti Sharma, Research Scholar, Department of Computer Science, Himachal Pradesh University Summer Hill, Shimla (Himachal Pradesh), India.
Manuscript received on 15 October 2015 | Revised Manuscript received on 25 October 2015 | Manuscript Published on 30 October 2015 | PP: 136-138 | Volume-5 Issue-1, October 2015 | Retrieval Number: A4316105115/15©BEIESP
© The Authors. Blue Eyes Intelligence Engineering and Sciences Publication (BEIESP). This is an open access article under the CC-BY-NC-ND license (http://creativecommons.org/licenses/by-nc-nd/4.0/)
Abstract: A huge amount of data is produced every day in the Information Technology industry, but it is of no use until it is converted into useful information. Data mining is the process of extracting hidden predictive information from large databases. It offers an easy, time-saving way to extract useful information from a large database without going through the whole database. Clustering is one of several data mining techniques, and clustering algorithms in particular have drawn significant attention from researchers around the world because they make similar data readily available in the form of clusters. Various types of clustering algorithms are available in the literature, each with its own pros and cons. In this research paper, a hybrid implementation of the k-Means and HAC clustering algorithms is presented. The hybrid approach is also compared with four other clustering algorithms, namely k-Means, DT, HAC, and VARCHA. The hybrid implementation was written in the Python scripting language, and the open-source SCIKIT LEARN tool was used to compare the performance of the algorithms. The parameters used for comparison were accuracy, precision, recall, and f-score. The results show that the hybrid algorithm performs better than the existing algorithms on these parameters.
Keywords: Data Mining, Clustering, k-Means, DT, HAC, VARCHA, Python and SCIKIT.
Scope of the Article: Data Mining
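Illustrative sketch: The abstract does not state how k-Means and HAC are combined, so the following Python sketch shows only one common way to build such a hybrid with scikit-learn, and it should not be read as the authors' method. It assumes that k-Means first compresses the data into many small "micro-clusters" and that HAC (Ward linkage) then merges the micro-cluster centroids into the final clusters. The function names `hybrid_kmeans_hac`, `align_labels`, and `evaluate`, the parameter `n_micro`, and the synthetic data are all hypothetical choices made for illustration. Because cluster ids are arbitrary, the sketch also assumes that clusters are matched to ground-truth classes (here with the Hungarian algorithm) before accuracy, precision, recall, and f-score are computed.

```python
import numpy as np
from scipy.optimize import linear_sum_assignment
from sklearn.cluster import AgglomerativeClustering, KMeans
from sklearn.datasets import make_blobs
from sklearn.metrics import (accuracy_score, confusion_matrix,
                             precision_recall_fscore_support)


def hybrid_kmeans_hac(X, n_clusters, n_micro=20, random_state=0):
    """Assumed hybrid scheme (not taken from the paper):
    1. k-Means splits X into n_micro small clusters.
    2. HAC with Ward linkage merges the n_micro centroids into n_clusters groups.
    3. Each point inherits the group of its micro-cluster centroid.
    """
    km = KMeans(n_clusters=n_micro, n_init=10, random_state=random_state).fit(X)
    hac = AgglomerativeClustering(n_clusters=n_clusters, linkage="ward")
    centroid_groups = hac.fit_predict(km.cluster_centers_)
    return centroid_groups[km.labels_]


def align_labels(y_true, y_pred):
    """Relabel clusters so they best match the true classes.

    Assumes both label arrays use the integers 0..k-1. The Hungarian
    algorithm picks the cluster-to-class matching with maximum overlap.
    """
    cm = confusion_matrix(y_true, y_pred)      # rows: true class, cols: cluster id
    rows, cols = linear_sum_assignment(-cm)    # negate to maximise overlap
    mapping = dict(zip(cols, rows))            # cluster id -> class id
    return np.array([mapping[c] for c in y_pred])


def evaluate(y_true, y_pred):
    """Return accuracy and macro-averaged precision, recall, and f-score."""
    aligned = align_labels(y_true, y_pred)
    precision, recall, fscore, _ = precision_recall_fscore_support(
        y_true, aligned, average="macro", zero_division=0)
    return {"accuracy": accuracy_score(y_true, aligned),
            "precision": precision, "recall": recall, "f-score": fscore}


# Synthetic, well-separated data; the paper's datasets are not given in the abstract.
X, y = make_blobs(n_samples=300, centers=[[0, 0], [10, 0], [0, 10]],
                  cluster_std=0.5, random_state=0)
print(evaluate(y, hybrid_kmeans_hac(X, n_clusters=3)))
```

In this arrangement, HAC runs on only `n_micro` centroids rather than on every data point, which is the usual motivation for pairing the two algorithms. The same `evaluate` helper could be applied to the labels produced by plain `KMeans` or `AgglomerativeClustering` to obtain a like-for-like comparison.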