Rat Protein’s Enzyme Class Classification Using Machine Learning
Chhote Lal Prasad Gupta1, Anand Bihari2, Sudhakar Tripathi3
1Chhote Lal Prasad Gupta, Computer Science & Engineering, Dr. APJ Abdul Kalam Technical University, Lucknow, India.
2Anand Bihari, School of Information Technology & Engineering, VIT University, Vellore, Tamil Nadu, India.
3Sudhakar Tripathi, Department of Information Technology, Rajkiya Engineering College, Ambedkarnagar, Uttar Pradesh, India.
Manuscript received on July 20, 2019. | Revised Manuscript received on August 10, 2019. | Manuscript published on August 30, 2019. | PP: 655-663 | Volume-8 Issue-6, August 2019. | Retrieval Number: F8098088619/2019©BEIESP | DOI: 10.35940/ijeat.F8098.088619
© The Authors. Blue Eyes Intelligence Engineering and Sciences Publication (BEIESP). This is an open access article under the CC BY-NC-ND license (http://creativecommons.org/licenses/by-nc-nd/4.0/)
Abstract: Classifying enzymes from uncharacterized protein data is an emerging research area in bioinformatics. A prime goal of the field is to process protein data and develop computational techniques that identify and predict the features needed for function prediction. Several machine learning and statistical techniques have been designed for this kind of classification. Classifying protein data remains a challenging task, and it has generally been done on human protein data. In this article, we consider rat enzyme classes for classification and prediction. We apply CRT, CHAID, C5.0, neural network (NEURAL), SVM, and Bayesian classifiers to the protein data and measure model performance using accuracy, specificity, sensitivity, precision, recall, F-measure, and MCC (Matthews correlation coefficient). The experimental results show that some of the enzyme classes are imbalanced, which degrades model performance; in this experiment the imbalanced classes are Lyases, Isomerases, and Ligases. The results also show that C5.0 achieves 91.5% accuracy with a computation time of only 4 seconds, and can therefore be used for protein classification and prediction.
Keywords: Machine learning tools, enzyme class, UniProt, protein encoded sequence, protein classification.
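
The evaluation metrics named in the abstract can all be derived from the confusion counts of a single enzyme class treated one-vs-rest. The following is a minimal Python sketch under that assumption, not the authors' implementation: the function name `binary_metrics` and the example counts are hypothetical, chosen only to illustrate how a minority class can show high accuracy alongside much lower sensitivity.

```python
import math


def binary_metrics(tp, fp, tn, fn):
    """Compute standard metrics from one-vs-rest confusion counts.

    tp/fp/tn/fn are true/false positives/negatives for one class.
    Ratios with a zero denominator are reported as 0.0.
    """
    def safe_div(num, den):
        return num / den if den else 0.0

    accuracy = safe_div(tp + tn, tp + fp + tn + fn)
    sensitivity = safe_div(tp, tp + fn)   # identical to recall
    specificity = safe_div(tn, tn + fp)
    precision = safe_div(tp, tp + fp)
    f_measure = safe_div(2 * precision * sensitivity, precision + sensitivity)
    # Matthews correlation coefficient (MCC)
    mcc_den = math.sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))
    mcc = safe_div(tp * tn - fp * fn, mcc_den)

    return {
        "accuracy": accuracy,
        "sensitivity": sensitivity,
        "recall": sensitivity,
        "specificity": specificity,
        "precision": precision,
        "f_measure": f_measure,
        "mcc": mcc,
    }


# Hypothetical counts for a minority class (illustrative only, not from the paper).
m = binary_metrics(tp=8, fp=2, tn=85, fn=5)
for name, value in m.items():
    print(f"{name}: {value:.3f}")
```

With these made-up counts, accuracy is 0.93 while sensitivity is only about 0.62, which is why measures such as F-measure and MCC are usually reported next to accuracy when classes are imbalanced.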