Identification of Valid Clusters for Datasets Whose Number of Clusters are Unknown
Gazala Yusufi1, Smita Prava Mishra2
1Gazala Yusufi, Department of Computer Science & IT, Institute of Technical Education, Siksha O Anusandhan University, Bhubaneswar, India.
2Smita Prava Mishra, Department of Computer Science & IT, Institute of Technical Education, Siksha O Anusandhan University, Bhubaneswar, India.
Manuscript received on January 25, 2014. | Revised Manuscript received on February 19, 2014. | Manuscript published on February 28, 2014. | PP: 25-29 | Volume-3, Issue-3, February 2014. | Retrieval Number: C2528023314/2013©BEIESP
© The Authors. Blue Eyes Intelligence Engineering and Sciences Publication (BEIESP). This is an open access article under the CC BY-NC-ND license (http://creativecommons.org/licenses/by-nc-nd/4.0/)
Abstract: Clustering is often not used to its full potential, because it is commonly applied to datasets whose class labels are already known. To make better use of clustering, this work attempts to find a mechanism for identifying the number of clusters in datasets whose class labels are unknown. Cluster validity techniques such as Dunn’s index, the Davies-Bouldin index, the Silhouette index, the C index and the Goodman-Kruskal index are used to validate the number of clusters generated. These techniques assess the clustering tendency and measure the quality of the clusters. They are used in conjunction with clustering algorithms such as k-means and k-medoids to measure the validity of the clusters those algorithms identify on application-specific data. The current work applies these techniques to several classified benchmark datasets as well as to unclassified datasets in order to find the number of clusters in each, and hence suggests a better use of clustering.
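The abstract does not give the authors' procedure in code. The following is a minimal sketch of the general idea: run a clustering algorithm for several candidate numbers of clusters k and let validity indices choose among them. It assumes Euclidean distance, plain Lloyd's k-means with a deterministic farthest-first seeding (a choice made here only for reproducibility), and a hypothetical toy dataset. Only two of the named indices are shown: Dunn's index (smallest between-cluster distance divided by largest cluster diameter, higher is better) and the Davies-Bouldin index (average worst-case ratio of within-cluster scatter to centroid separation, lower is better). The helper names `kmeans`, `dunn_index`, `davies_bouldin_index` and `choose_k` are illustrative and do not come from the paper.

```python
import math


def dist(a, b):
    """Euclidean distance between two points given as tuples."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def mean(points):
    """Coordinate-wise mean (centroid) of a non-empty list of points."""
    return tuple(sum(col) / len(points) for col in zip(*points))


def kmeans(points, k, max_iter=100):
    """Lloyd's k-means with deterministic farthest-first seeding.

    Returns one integer cluster label per input point.
    """
    centroids = [points[0]]
    while len(centroids) < k:
        # Next seed: the point farthest from its nearest existing seed.
        centroids.append(max(points, key=lambda p: min(dist(p, c) for c in centroids)))
    labels = None
    for _ in range(max_iter):
        # Assignment step: each point goes to its nearest centroid.
        new_labels = [min(range(k), key=lambda j: dist(p, centroids[j])) for p in points]
        if new_labels == labels:
            break  # assignments unchanged, so the algorithm has converged
        labels = new_labels
        # Update step: move each centroid to the mean of its members.
        for j in range(k):
            members = [p for p, lab in zip(points, labels) if lab == j]
            if members:  # keep the previous centroid if a cluster is empty
                centroids[j] = mean(members)
    return labels


def groups(points, labels):
    """List of non-empty clusters, each a list of member points."""
    out = {}
    for p, lab in zip(points, labels):
        out.setdefault(lab, []).append(p)
    return list(out.values())


def dunn_index(clusters):
    """Smallest between-cluster distance / largest cluster diameter (higher is better)."""
    min_between = min(
        dist(p, q)
        for i, a in enumerate(clusters)
        for b in clusters[i + 1:]
        for p in a
        for q in b
    )
    max_diameter = max(dist(p, q) for c in clusters for p in c for q in c)
    return math.inf if max_diameter == 0 else min_between / max_diameter


def davies_bouldin_index(clusters):
    """Mean over clusters of the worst (s_i + s_j) / d(c_i, c_j) ratio (lower is better)."""
    cents = [mean(c) for c in clusters]
    # s_i: average distance of cluster i's members to its centroid c_i.
    scatter = [sum(dist(p, m) for p in c) / len(c) for c, m in zip(clusters, cents)]
    total = 0.0
    for i in range(len(clusters)):
        worst = 0.0
        for j in range(len(clusters)):
            if i == j:
                continue
            d = dist(cents[i], cents[j])
            if d == 0:
                return math.inf  # coincident centroids: treat as the worst score
            worst = max(worst, (scatter[i] + scatter[j]) / d)
        total += worst
    return total / len(clusters)


def choose_k(points, k_values):
    """Cluster for every candidate k (each >= 2); return the k each index prefers."""
    scores = {}
    for k in k_values:
        clusters = groups(points, kmeans(points, k))
        scores[k] = (dunn_index(clusters), davies_bouldin_index(clusters))
    k_dunn = max(scores, key=lambda k: scores[k][0])
    k_db = min(scores, key=lambda k: scores[k][1])
    return k_dunn, k_db, scores


if __name__ == "__main__":
    # Hypothetical toy data: three well-separated 2-D groups, labels withheld.
    data = [(0, 0), (0, 1), (1, 0), (1, 1),
            (10, 10), (10, 11), (11, 10), (11, 11),
            (0, 10), (0, 11), (1, 10), (1, 11)]
    k_dunn, k_db, _ = choose_k(data, range(2, 6))
    print(k_dunn, k_db)  # prints: 3 3
```

On this toy data both indices favour k = 3. On real unclassified data the indices may disagree, so it can help to compare several of them, as the abstract does. The pairwise-distance loops are quadratic in the number of points, which is acceptable for a sketch but not for large datasets.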
Keywords: Clustering, Cluster Validity Techniques, Indexing, Internal Cluster Validation, Unclassified Dataset Validation.