
Density-Based Clustering Based on Probability Distribution for Uncertain Data
Pramod Patil1, Ashish Patel2, Parag Kulkarni3
1Prof. Pramod Patil, Research Scholar, University of Pune, India.
2Mr. Ashish Patel, Student, PDDYPIET, University of Pune, India.
3Dr. Parag Kulkarni, College of Engineering, Pune, India.
Manuscript received on May 27, 2014. | Revised Manuscript received on June 09, 2014. | Manuscript published on June 30, 2014. | PP: 154-158  | Volume-3, Issue-5, June 2014.  | Retrieval Number:  E3166063514/2013©BEIESP

© The Authors. Blue Eyes Intelligence Engineering and Sciences Publication (BEIESP). This is an open access article under the CC BY-NC-ND license (http://creativecommons.org/licenses/by-nc-nd/4.0/)

Abstract: Large volumes of uncertain digital data are now being produced, and handling such data is difficult. Commonly, the distance between the descriptions of two uncertain objects is expressed as a single numerical value. Clustering is one of the essential and challenging tasks in mining uncertain data. Previous methods extend partitioning clustering methods such as k-means and density-based clustering methods such as DBSCAN to uncertain data, relying on geometric distances between objects. Such methods cannot handle uncertain objects that are geometrically indistinguishable (such as weather data recorded across the world at the same time). In this paper, we model uncertain objects in both continuous and discrete domains using probability distributions. We use Kullback-Leibler divergence to measure the similarity between uncertain objects in both the continuous and discrete cases, and integrate it into partitioning and density-based clustering methods to cluster uncertain objects. We first identify the uncertain objects and cluster them using partitioning-based clustering; the remaining data are then clustered using any traditional clustering method.
Keywords: Clustering, Uncertain Data, Probabilistic Mass Function, Probabilistic Density Estimation, Fast Gaussian Transform.
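
As an illustration of the similarity measure named in the abstract, the following is a minimal sketch of Kullback-Leibler divergence between two uncertain objects in the discrete case, where each object is represented by a probability mass function over the same finite domain. The function name kl_divergence, the smoothing constant eps, and the example values are assumptions made for illustration and are not taken from the paper.

```python
import math


def kl_divergence(p, q, eps=1e-9):
    """Return D(p || q) for two discrete distributions over the same domain.

    p and q are sequences of probabilities. The small constant eps is an
    assumed smoothing term that keeps the result finite when q has zero
    mass where p does not; both inputs are renormalised after smoothing.
    """
    if len(p) != len(q):
        raise ValueError("p and q must be defined over the same domain")
    ps = [x + eps for x in p]
    qs = [x + eps for x in q]
    zp, zq = sum(ps), sum(qs)
    return sum(
        (a / zp) * math.log((a / zp) / (b / zq))
        for a, b in zip(ps, qs)
    )


# Two hypothetical uncertain objects over a two-value domain.
obj_a = [0.5, 0.5]
obj_b = [0.9, 0.1]

d_ab = kl_divergence(obj_a, obj_b)
d_ba = kl_divergence(obj_b, obj_a)
print(d_ab, d_ba)
```

KL divergence is not symmetric, so kl_divergence(obj_a, obj_b) and kl_divergence(obj_b, obj_a) generally differ. A clustering method that uses it in place of a geometric distance therefore has to fix which direction it evaluates, or combine both.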