An Empirical Perusal of Distance Measures for Clustering with Big Data Mining
Kamlesh Kumar Pandey1, Diwakar Shukla2

1Kamlesh Kumar Pandey, Research Scholar, Department of Computer Science and Applications, Dr. Hari Singh Gour Vishwavidyalaya, Sagar, India.
2Prof. Diwakar Shukla, Dean & HoD, Department of Computer Science and Applications, Dr. Hari Singh Gour Vishwavidyalaya, Sagar, India.
Manuscript received on July 20, 2019. | Revised Manuscript received on August 10, 2019. | Manuscript published on August 30, 2019. | PP: 606-616 | Volume-8 Issue-6, August 2019. | Retrieval Number: F8078088619/2019©BEIESP | DOI: 10.35940/ijeat.F8078.088619
© The Authors. Blue Eyes Intelligence Engineering and Sciences Publication (BEIESP). This is an open access article under the CC BY-NC-ND license (http://creativecommons.org/licenses/by-nc-nd/4.0/)

Abstract: The distance measure is a core concept in data mining techniques such as classification, clustering, and statistical analysis. All clustering taxonomies, including partition, hierarchical, density, grid, model, fuzzy, and graph-based approaches, use distance measures to assign data points to different clusters and to construct and validate clusters. Big data mining extends data mining to the dimensions of big data. When a traditional clustering algorithm is applied to big data mining, its distance measure must be scalable and must support huge datasets, heterogeneous data and sources, and the velocity characteristic of big data. From theoretical, practical, and existing-research perspectives, this paper focuses on the volume, variety, and velocity criteria of big data to identify a distance measure for big data mining and to examine how distance measures work within the clustering taxonomy. The study also analyzes the accuracy of all the distance measures considered, through clustering, with the help of a confusion matrix.
Keywords: Big Data, Big Data Mining, Big Data Characteristics, Clustering, Clustering Taxonomy, Distance Measure, Distance Measure Families.