Image Captioning for Affine Transformed Images using Image Hashing
M. Nivedita1, Asnath Victy Phamila Y2

1M. Nivedita, School of Computing Science and Engineering, Vellore Institute of Technology, Chennai, India.
2Asnath Victy Phamila Y, School of Computing Science and Engineering, Vellore Institute of Technology, Chennai, India.
Manuscript received on September 23, 2019. | Revised Manuscript received on October 15, 2019. | Manuscript published on October 30, 2019. | PP: 4736-4741 | Volume-9 Issue-1, October 2019 | Retrieval Number: A2022109119/2019©BEIESP | DOI: 10.35940/ijeat.A2022.109119
© The Authors. Blue Eyes Intelligence Engineering and Sciences Publication (BEIESP). This is an open access article under the CC BY-NC-ND license (http://creativecommons.org/licenses/by-nc-nd/4.0/)

Abstract: Image captioning is the process of generating a meaningful textual description for an image. A good caption not only names the objects in the image and their attributes, but also captures the actions in which those objects are involved. Image captioning comprises two main tasks. The first is to correctly identify the objects present in the given image; once all the objects and their attributes are identified, a dense model is trained to recognize the verbs, that is, the actions in which these objects take part. The second task is to generate a syntactically correct natural-language sentence that connects the identified objects with their attributes and actions. In this paper we generate captions for affine-transformed images using the Flickr8K dataset.
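As a concrete illustration of the affine-transformed inputs described above, the following minimal Python sketch (not the authors' code) applies a rotation-plus-scaling affine warp to an image with OpenCV. The file names, the 15-degree angle, and the 0.9 scale factor are placeholder assumptions chosen for illustration, not values from the paper.

import cv2

img = cv2.imread("example.jpg")              # placeholder input image
h, w = img.shape[:2]

# Build a 2x3 affine matrix: rotate 15 degrees about the image centre,
# scale by 0.9; any combination of rotation, scaling, and translation
# is an affine transform.
M = cv2.getRotationMatrix2D((w / 2.0, h / 2.0), 15, 0.9)

# Apply the affine warp; the output keeps the original width and height.
warped = cv2.warpAffine(img, M, (w, h))

cv2.imwrite("example_affine.jpg", warped)    # save the transformed image

Images produced this way depict the same scene as the original, so a captioning pipeline of the kind described above should ideally assign them the same caption.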
Keywords: Deep learning, Convolutional neural networks, Recurrent neural networks, dense model, language model.