Unified Deep Semantic Search on Code
Ashwin Patil1, Sonal Pachpute2, Rushika Bhattad3, Amisha Pandit4, Anita Gunjal5
1Ashwin Patil*, Department of Computer Engineering, M.I.T College of Engineering, Pune, India.
2Sonal Pachpute, Department of Computer Engineering, M.I.T College of Engineering, Pune, India.
3Rushika Bhattad, Department of Computer Engineering, M.I.T College of Engineering, Pune, India.
4Amisha Pandit, Department of Computer Engineering, M.I.T College of Engineering, Pune, India.
5Anita Gunjal, Department of Computer Engineering, M.I.T College of Engineering, Pune, India.
Manuscript received on June 01, 2020. | Revised Manuscript received on June 08, 2020. | Manuscript published on June 30, 2020. | PP: 872-876 | Volume-9 Issue-5, June 2020. | Retrieval Number: E9861069520/2020©BEIESP | DOI: 10.35940/ijeat.E9861.069520
© The Authors. Blue Eyes Intelligence Engineering and Sciences Publication (BEIESP). This is an open access article under the CC BY-NC-ND license (http://creativecommons.org/licenses/by-nc-nd/4.0/)
Abstract: A tool that searches a large code corpus directly and returns ranked snippets for natural-language queries can prove to be an invaluable resource for programmers looking for similar code. Such a tool needs a deep understanding of the semantics of both source code and queries to interpret their intent correctly. Over the years, many tools that rely on the textual similarity between source code and query have proven ineffective because they fail to capture the high-level semantics of either. Previous code search models based on deep neural networks perform well, but most of them are evaluated on a single programming language, usually Java. In this paper, we propose a novel deep neural network model called Unified Code Net that can handle the intricacies of different programming languages. The model borrows several key features from earlier models and builds on them to form a unified model that generates document vector embeddings from source code; a similarity search against the query's vector embedding then returns the most similar code snippets in any language. This tool can drastically reduce the effort programmers spend looking for an efficient, viable code snippet for the problem at hand, and ideally could replace general-purpose search engines for this task.
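The retrieval step described in the abstract (embed code snippets and the query into a shared vector space, then return snippets ranked by similarity) can be sketched roughly as follows. This is a minimal illustration under stated assumptions, not Unified Code Net itself: the embeddings are hand-made toy vectors standing in for the model's output, cosine similarity is assumed as the similarity measure (the abstract does not name one), and the names `rank_snippets`, `snippet_index`, `query_embedding`, and the snippet IDs are hypothetical.

```python
import math
from typing import Dict, List, Sequence, Tuple


def cosine_similarity(a: Sequence[float], b: Sequence[float]) -> float:
    """Cosine of the angle between two equal-length vectors (0.0 if either is all zeros)."""
    if len(a) != len(b):
        raise ValueError("vectors must have the same dimension")
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0.0 or norm_b == 0.0:
        return 0.0
    return dot / (norm_a * norm_b)


def rank_snippets(
    query_vec: Sequence[float],
    snippet_vecs: Dict[str, Sequence[float]],
    top_k: int = 3,
) -> List[Tuple[str, float]]:
    """Hypothetical helper: score every snippet against the query, highest similarity first."""
    scored = [(sid, cosine_similarity(query_vec, vec)) for sid, vec in snippet_vecs.items()]
    scored.sort(key=lambda item: item[1], reverse=True)
    return scored[:top_k]


# Toy 3-dimensional "embeddings" for illustration only; a real model would
# produce these vectors from source code in several languages.
snippet_index = {
    "python/read_file": [1.0, 0.0, 0.0],
    "java/read_file": [0.6, 0.4, 0.0],
    "js/sort_array": [0.0, 0.1, 1.0],
}
# Assumed embedding of a query such as "read a file line by line".
query_embedding = [0.9, 0.1, 0.0]

for snippet_id, score in rank_snippets(query_embedding, snippet_index, top_k=2):
    print(f"{snippet_id}: {score:.3f}")
```

Because the query and all snippets live in one vector space, snippets written in different languages can be ranked against each other directly. The sketch scans every snippet per query, which is linear in corpus size; for a large corpus, an approximate nearest-neighbour index would typically replace the scan.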
Keywords: Semantic code search, natural language processing, information retrieval