Minimizing False Negatives of Measles Prediction Model: An Experimentation of Feature Selection Based On Domain Knowledge and Random Forest Classifier
Wan Muhamad Taufik Wan Ahmad1, Nur Laila Ab Ghani2, Sulfeeza Mohd Drus3
1Wan Muhamad Taufik Wan Ahmad*, College of Computing and Informatics, University Tenaga Nasional, Jalan IKRAM-UNITEN, Kajang, Selangor, Malaysia.
2Nur Laila Ab Ghani, College of Computing and Informatics, University Tenaga Nasional, Jalan IKRAM-UNITEN, Kajang, Selangor, Malaysia.
3Sulfeeza Mohd Drus, College of Computing and Informatics, University Tenaga Nasional, Jalan IKRAM-UNITEN, Kajang, Selangor, Malaysia.
Manuscript received on September 04, 2019. | Revised Manuscript received on September 22, 2019. | Manuscript published on October 30, 2019. | PP: 3411-3414 | Volume-9 Issue-1, October 2019 | Retrieval Number: A2640109119/2019©BEIESP | DOI: 10.35940/ijeat.A2640.109119
© The Authors. Blue Eyes Intelligence Engineering and Sciences Publication (BEIESP). This is an open access article under the CC BY-NC-ND license (http://creativecommons.org/licenses/by-nc-nd/4.0/)
Abstract: In the context of a disease prediction model, a false negative error occurs when a patient who has the disease is wrongly predicted to be free of it. Developing a prediction model involves data collection and feature selection, which extracts the relevant features from the dataset. Two commonly employed feature selection approaches, domain-knowledge-based and data-driven, each suffer from bias towards past or current knowledge when applied alone. In this research, we studied the development of a measles prediction model that incorporates both the domain knowledge and the data-driven approaches, the latter using the Random Forest classifier. The domain expert first identified the important features based on prior knowledge of measles, in order to reduce the number of features. These attributes were then used as input to the Random Forest classifier, and the least important attributes were excluded according to their Mean Decrease Gini, in order to examine the effect of their removal on the result. It was found that removing several attributes after domain knowledge consultation can provide a good model with fewer false negative errors.
Keywords: Feature selection, Classification, Random forest, Variable importance, Mean Decrease Gini.
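The two quantities the abstract relies on, the Gini-based importance of an attribute and the number of false negative errors, can be illustrated with a minimal Python sketch. It is not the authors' implementation: the function names and the toy labels are hypothetical, and it computes the Gini decrease of a single split only, whereas a Random Forest's Mean Decrease Gini aggregates such decreases over every split made on an attribute and averages them across the trees.

```python
from collections import Counter


def gini(labels):
    """Gini impurity of a list of class labels: 1 - sum of squared class shares."""
    n = len(labels)
    if n == 0:
        return 0.0
    return 1.0 - sum((count / n) ** 2 for count in Counter(labels).values())


def gini_decrease(parent, left, right):
    """Impurity decrease obtained by splitting `parent` into `left` and `right`."""
    n = len(parent)
    weighted_children = (len(left) / n) * gini(left) + (len(right) / n) * gini(right)
    return gini(parent) - weighted_children


def false_negatives(y_true, y_pred, positive=1):
    """Count cases that are truly positive but were predicted as not positive."""
    return sum(1 for t, p in zip(y_true, y_pred) if t == positive and p != positive)


# Hypothetical toy labels: 1 = measles, 0 = no measles.
parent = [1, 1, 1, 0, 0, 0]

# An informative attribute separates the classes cleanly (large decrease).
informative = gini_decrease(parent, [1, 1, 1], [0, 0, 0])

# A weak attribute leaves both child nodes mixed (small decrease).
weak = gini_decrease(parent, [1, 1, 0], [1, 0, 0])

# Two measles cases are predicted as disease-free, so there are two false negatives.
missed = false_negatives([1, 1, 0, 1, 0], [1, 0, 0, 0, 1])

print(informative, weak, missed)
```

Attributes whose splits yield consistently small decreases, like the weak one here, are the candidates for exclusion, and the false negative count is the figure to compare before and after their removal.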