Content Based Movie Scene Retrieval using Spatio-Temporal Features
Vidit Kumar1, Vikas Tripathi2, Bhaskar Pant3
1Vidit Kumar, Department of Computer Science and Engineering, Graphic Era Deemed to be University, Dehradun, India.
2Vikas Tripathi, Department of Computer Science and Engineering, Graphic Era Deemed to be University, Dehradun, India.
3Bhaskar Pant, Department of Computer Science and Engineering, Graphic Era Deemed to be University, Dehradun, India.
Manuscript received on November 22, 2019. | Revised Manuscript received on December 15, 2019. | Manuscript published on December 30, 2019. | PP: 1492-1496 | Volume-9 Issue-2, December, 2019. | Retrieval Number: B3495129219/2020©BEIESP | DOI: 10.35940/ijeat.B3495.129219
© The Authors. Blue Eyes Intelligence Engineering and Sciences Publication (BEIESP). This is an open access article under the CC BY-NC-ND license (http://creativecommons.org/licenses/by-nc-nd/4.0/)
Abstract: Thousands of movies, TV shows, and documentaries in different genres and languages are produced around the world each year. Making a movie scene both impactful and original is a challenging task for a director. Retrieving scenes similar to a user's query is also challenging, because movie scene databases are rarely maintained with proper semantic tags. To serve these two application areas (among others), a content-based retrieval system for movie scenes is needed. Content-based video retrieval is the problem of retrieving the videos most similar to a given query video by analyzing their visual content. Traditional video-level features are built from hand-engineered key-frame features, which do not exploit the rich dynamics present in video. In this paper, we propose a Content Based Movie Scene Retrieval (CB-MSR) framework using spatio-temporal features learned by deep learning. Specifically, a deep CNN together with an LSTM is deployed to learn spatio-temporal representations of video. Based on these learned features, similar movie scenes can be retrieved from a collection of movies. The Hollywood2 dataset is used to test the proposed system. Two types of features, spatial and spatio-temporal, are used to evaluate the proposed framework.
Keywords: CNN, LSTM, CB-MSR, Deep learning.
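
The retrieval step described in the abstract, ranking stored scenes by how close their learned features are to the query's features, can be illustrated with a minimal sketch. It rests on several assumptions not stated in the abstract: each scene is already summarized by one fixed-length feature vector (here, toy three-dimensional vectors standing in for CNN-LSTM outputs), cosine similarity is the ranking measure, and the names `cosine_similarity`, `retrieve_similar_scenes`, and the scene identifiers are hypothetical.

```python
import math


def cosine_similarity(a, b):
    """Cosine similarity between two equal-length feature vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0.0 or norm_b == 0.0:
        return 0.0  # treat an all-zero vector as dissimilar to everything
    return dot / (norm_a * norm_b)


def retrieve_similar_scenes(query_feature, database, top_k=3):
    """Return the top_k (scene_id, score) pairs, most similar first.

    `database` maps a scene identifier to its feature vector. The choice
    of cosine similarity is an assumption made for illustration only.
    """
    scored = [
        (scene_id, cosine_similarity(query_feature, feature))
        for scene_id, feature in database.items()
    ]
    scored.sort(key=lambda pair: pair[1], reverse=True)
    return scored[:top_k]


# Toy feature vectors (hypothetical); real ones would come from the
# learned spatial or spatio-temporal model.
database = {
    "scene_a": [0.9, 0.1, 0.0],
    "scene_b": [0.0, 1.0, 0.2],
    "scene_c": [0.8, 0.2, 0.1],
}
query = [1.0, 0.0, 0.0]

results = retrieve_similar_scenes(query, database, top_k=2)
for scene_id, score in results:
    print(f"{scene_id}: {score:.3f}")
```

In this sketch the same ranking function works for either feature type, so spatial and spatio-temporal features could be compared simply by swapping the vectors stored in `database`.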