Provisional Access For Improving Classification Accuracy On Diabetes Dataset
A.Sumathi1, S.Meganathan2, S.Revathi3
1A.Sumathi, Dept. Of Computer Science & Engineering, SRC, Sastra Deemed University, Thanjavur, Tamil Nadu, India.
2S.Meganathan, Dept. Of Computer Science & Engineering, SRC, Sastra Deemed University, Thanjavur, Tamil Nadu, India.
3S.Revathi, Dept. Of Computer Science & Engineering, SRC, Sastra Deemed University, Thanjavur, Tamil Nadu, India.
Manuscript received on July 20, 2019. | Revised Manuscript received on August 10, 2019. | Manuscript published on August 30, 2019. | PP: 5245-5249 | Volume-8 Issue-6, August 2019. | Retrieval Number: F9389088619/19©BEIESP | DOI: 10.35940/ijeat.F9389.088619
© The Authors. Blue Eyes Intelligence Engineering and Sciences Publication (BEIESP). This is an open access article under the CC BY-NC-ND license (http://creativecommons.org/licenses/by-nc-nd/4.0/)
Abstract: Data mining helps solve many problems in medical diagnosis using real-world data. However, much of this data is unreliable: it lacks desirable features and contains many gaps and errors. A complete dataset is a prerequisite for precise grouping and classification. Preprocessing is a data mining technique that transforms an unrefined dataset into reliable, useful data by resolving these issues and preparing the raw data for the next level of processing. Discretization is a necessary step in data preprocessing. It reduces a large range of numeric values to a small set of well-organized intervals, and it can offer notable improvements in classification speed and accuracy. This paper investigates the impact of preprocessing on the classification process. The work applies three classifiers, Naive Bayes, Logistic Regression, and SVM, to a Diabetes dataset. The experimental system is validated using discretization techniques and various classification algorithms.
Keywords: Data preprocessing, Discretization, Naive Bayes, Support Vector Machine (SVM), Logistic Regression.
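The discretization step described in the abstract can be illustrated with a minimal sketch. It assumes equal-width binning, which is only one of several discretization schemes; the abstract does not say which method the authors used. The function name `equal_width_bins` and the sample readings are hypothetical and chosen purely for illustration.

```python
from typing import List, Sequence


def equal_width_bins(values: Sequence[float], n_bins: int) -> List[int]:
    """Map each numeric value to a bin index in 0..n_bins-1 (equal-width intervals)."""
    if n_bins < 1:
        raise ValueError("n_bins must be at least 1")
    lo, hi = min(values), max(values)
    if hi == lo:
        # All values are identical, so everything falls into the first bin.
        return [0 for _ in values]
    width = (hi - lo) / n_bins
    # The maximum value would otherwise land in bin n_bins, so clamp to the last bin.
    return [min(int((v - lo) / width), n_bins - 1) for v in values]


# Hypothetical glucose-like readings, invented for illustration only.
readings = [85.0, 89.0, 116.0, 137.0, 148.0, 183.0, 197.0]
bins = equal_width_bins(readings, n_bins=3)
print(bins)
```

Each continuous reading is replaced by a small integer label, which is the form of input that a classifier such as Naive Bayes can treat as a categorical attribute. Whether binning improves accuracy depends on the dataset and on the number of bins chosen.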