Detection of Outliers in High Dimensional Data with Lasso Regression
Ch. Anuradha1, M. Ramesh2
1Ch. Anuradha*, Research Scholar, Dept. of CSE, ANU, Guntur, Andhra Pradesh, India.
2Dr. M. Ramesh, Research Supervisor, Dept. of CSE, ANU, Guntur, Andhra Pradesh, India.
Manuscript received on September 23, 2019. | Revised Manuscript received on October 15, 2019. | Manuscript published on October 30, 2019. | PP: 7342-7346 | Volume-9 Issue-1, October 2019 | Retrieval Number: A1478109119/2019©BEIESP | DOI: 10.35940/ijeat.A1478.109119
© The Authors. Blue Eyes Intelligence Engineering and Sciences Publication (BEIESP). This is an open access article under the CC BY-NC-ND license (http://creativecommons.org/licenses/by-nc-nd/4.0/)
Abstract: Outlier detection has become a significant research area in data mining in the last few years. The focus of this research has been to identify patterns or objects in large data sets that deviate from the normal pattern, being markedly dissimilar to and inconsistent with most of the data. As the number of personal computers and internet users has risen phenomenally, huge real-life data sets have been created, posing new challenges and opening new avenues of research in outlier detection. Many traditional outlier detection techniques are unable to yield good results in such environments, so developing a method to detect outliers has become a critical task. A method to identify anomalies in high-dimensional data based on Lasso regression has been studied in this research. This framework has been implemented in the JMP software. The parameters RSquare (0.001162), RMSE (0.031806) and Mean Response (0.007889) are calculated using the Spambase dataset. The experimental results show that the proposed method detects outliers in high-dimensional data with potentially higher accuracy.
Keywords: Outlier detection, Lasso regression, High-dimensional data, JMP, Spambase dataset.
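
Illustrative sketch (not the authors' JMP workflow): the abstract names Lasso regression and the fit statistics RSquare, RMSE and Mean Response, but does not say how outliers are derived from the fit. The Python example below is a minimal sketch under stated assumptions: it fits a Lasso model by coordinate descent on synthetic data (not Spambase), computes the three statistics with a plain 1/n denominator (statistical packages may divide by residual degrees of freedom instead), and flags observations whose residuals are large relative to a robust scale estimate. The function names, the penalty value `lam`, the threshold `k`, and the residual-based flagging rule are all hypothetical choices made for illustration.

```python
import math
import random
import statistics


def soft_threshold(z, t):
    """Soft-thresholding operator used in the lasso coordinate update."""
    if z > t:
        return z - t
    if z < -t:
        return z + t
    return 0.0


def fit_lasso(X, y, lam, n_iter=100):
    """Coordinate descent for (1/2n)*||y - b0 - X b||^2 + lam*||b||_1."""
    n, p = len(X), len(X[0])
    beta = [0.0] * p
    intercept = sum(y) / n
    resid = [yi - intercept for yi in y]  # current residuals y - b0 - X b
    col_sq = [sum(X[i][j] ** 2 for i in range(n)) / n for j in range(p)]
    for _ in range(n_iter):
        for j in range(p):
            if col_sq[j] == 0.0:
                continue
            # Covariance of column j with the residual that excludes feature j.
            rho = sum(X[i][j] * (resid[i] + X[i][j] * beta[j]) for i in range(n)) / n
            new_b = soft_threshold(rho, lam) / col_sq[j]
            delta = new_b - beta[j]
            if delta != 0.0:
                for i in range(n):
                    resid[i] -= X[i][j] * delta
                beta[j] = new_b
        # The intercept is not penalised: re-centre the residuals.
        shift = sum(resid) / n
        intercept += shift
        resid = [r - shift for r in resid]
    return intercept, beta, resid


def fit_summary(y, resid):
    """RSquare, RMSE (1/n denominator, an assumption) and mean response."""
    n = len(y)
    mean_y = sum(y) / n
    sse = sum(r * r for r in resid)
    sst = sum((yi - mean_y) ** 2 for yi in y)
    return {"rsquare": 1.0 - sse / sst,
            "rmse": math.sqrt(sse / n),
            "mean_response": mean_y}


def flag_outliers(resid, k=3.0):
    """Assumed rule: flag residuals more than k robust standard deviations out."""
    med = statistics.median(resid)
    mad = statistics.median([abs(r - med) for r in resid])
    scale = 1.4826 * mad  # MAD rescaled to match the std. dev. under normality
    return [i for i, r in enumerate(resid) if abs(r - med) > k * scale]


# Synthetic data: 200 rows, 10 features, only the first two are informative.
random.seed(0)
n_rows, n_cols = 200, 10
true_beta = [2.0, -1.5] + [0.0] * (n_cols - 2)
X = [[random.gauss(0.0, 1.0) for _ in range(n_cols)] for _ in range(n_rows)]
y = [sum(b * x for b, x in zip(true_beta, row)) + random.gauss(0.0, 0.1) for row in X]

planted = [5, 50, 120]  # rows given an artificially shifted response
for i in planted:
    y[i] += 8.0

intercept, beta, resid = fit_lasso(X, y, lam=0.05)
summary = fit_summary(y, resid)
flagged = flag_outliers(resid)

print("coefficients:", [round(b, 3) for b in beta])
print("summary:", summary)
print("flagged rows:", flagged)
```

The L1 penalty shrinks uninformative coefficients toward zero, which is what makes the approach attractive when the number of features is large; the residual threshold `k` trades off missed outliers against false alarms and would need tuning on real data.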