Anomaly Detection Model Training for New York Times Bestseller Fiction Novels
M. Madhuram1, Chinmaya Joshi2, Ananthajith TCA3, Anoushka Dutta4
1M. Madhuram, Assistant Professor, Department of Computer Science and Engineering, SRM Institute of Science and Technology, Chennai (Tamil Nadu), India.
2Chinmaya Joshi, Department of Computer Science and Engineering, SRM Institute of Science and Technology, Chennai (Tamil Nadu), India.
3Ananthajith TCA, Department of Computer Science and Engineering, SRM Institute of Science and Technology, Chennai (Tamil Nadu), India.
4Anoushka Dutta, Department of Computer Science and Engineering, SRM Institute of Science and Technology, Chennai (Tamil Nadu), India.
Manuscript received on 18 April 2019 | Revised Manuscript received on 25 April 2019 | Manuscript published on 30 April 2019 | PP: 1301-1306 | Volume-8 Issue-4, April 2019 | Retrieval Number: D6615048419/19©BEIESP
© The Authors. Blue Eyes Intelligence Engineering and Sciences Publication (BEIESP). This is an open access article under the CC-BY-NC-ND license (http://creativecommons.org/licenses/by-nc-nd/4.0/)
Abstract: The New York Times bestseller list is one of the foremost authorities for ranking the best-selling novels in the world. The list is globally recognized and widely respected as a guide to which books are among the best. We propose using machine learning to predict, early in a book’s lifecycle, whether it has a chance of becoming a New York Times bestseller. The idea is to take a dataset of bestselling fiction novels and train a machine learning model to recognize patterns shared by the novels that appeared on the list between 2008 and 2018. The features used to train the model are the publisher’s name, the author’s name, the date of publication, and the title and description of the novel. We treat this as an unsupervised learning problem because the dataset contains only bestsellers and no examples from which the model could learn what a non-bestseller looks like. It is therefore an instance of one-class classification, a type of machine learning problem categorized as “anomaly/novelty detection”. In these problems, the model is trained to associate the training data with bestseller novels; when presented with new data, it checks whether that data resembles what it has learned to recognize as a bestseller and classifies that tuple (novel) accordingly, either as a bestseller or as an anomaly.
Keywords: New York Times Bestseller, Novels, Fiction, Data Anomaly, Unsupervised, Bag-of-Words, One Class SVC
Scope of the Article: Real-Time Information Systems
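The one-class approach described in the abstract can be illustrated with a minimal sketch, assuming scikit-learn's CountVectorizer for the bag-of-words step and OneClassSVM for the one-class model. The records, field names, and the nu and gamma settings below are invented placeholders for illustration only; they are not taken from the authors' dataset or implementation.

```python
from sklearn.feature_extraction.text import CountVectorizer
from sklearn.pipeline import make_pipeline
from sklearn.svm import OneClassSVM

# Hypothetical field names mirroring the features listed in the abstract.
FIELDS = ("publisher", "author", "date", "title", "description")


def to_text(record):
    """Join all feature fields into one string for bag-of-words encoding."""
    return " ".join(record[field] for field in FIELDS)


# Invented placeholder records standing in for the bestseller-only training set.
bestsellers = [
    {"publisher": "Example Press", "author": "A. Author", "date": "2010-05-02",
     "title": "The Silent Harbor",
     "description": "A detective hunts a killer in a coastal town."},
    {"publisher": "Example Press", "author": "B. Writer", "date": "2012-09-16",
     "title": "Midnight Verdict",
     "description": "A lawyer uncovers a conspiracy during a murder trial."},
    {"publisher": "Sample House", "author": "A. Author", "date": "2015-03-08",
     "title": "The Last Witness",
     "description": "A detective protects the only witness to a killing."},
    {"publisher": "Sample House", "author": "C. Novelist", "date": "2017-11-19",
     "title": "Harbor of Secrets",
     "description": "A family secret surfaces after a murder in a small town."},
]

# Bag-of-words features feed a one-class SVM trained on bestsellers only.
# nu and gamma are assumed values, not tuned settings from the paper.
model = make_pipeline(
    CountVectorizer(),
    OneClassSVM(kernel="rbf", gamma="scale", nu=0.1),
)
model.fit([to_text(r) for r in bestsellers])

# Invented unseen records to score against the learned bestseller profile.
candidates = [
    {"publisher": "Example Press", "author": "A. Author", "date": "2018-06-10",
     "title": "The Silent Witness",
     "description": "A detective returns to the harbor town to find a killer."},
    {"publisher": "Unknown Imprint", "author": "Z. Nobody", "date": "2018-01-01",
     "title": "Spreadsheet Tips",
     "description": "Practical formulas for quarterly budget reports."},
]

# predict returns +1 for records resembling the training data (bestseller-like)
# and -1 for records flagged as anomalies.
predictions = model.predict([to_text(r) for r in candidates]).tolist()
for record, label in zip(candidates, predictions):
    verdict = "bestseller-like" if label == 1 else "anomaly"
    print(record["title"], "->", verdict)
```

With only a handful of toy records the predicted labels carry no real meaning; the sketch is intended only to show the shape of the train-on-one-class, flag-everything-else workflow.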