Real-Time Speech-To-Text / Text-To-Speech Converter With Automatic Text Summarizer using Natural Language Generation And Abstract Meaning Representation
K. P. Vijayakumar1, Hemant Singh2, Animesh Mohanty3
1K.P. Vijayakumar*, Department of Computer Science and Engineering, SRM Institute of Science and Technology, Kattankulathur, Chennai, India.
2Hemant Singh, Department of Computer Science and Engineering, SRM Institute of Science and Technology, Kattankulathur, Chennai, India.
3Animesh Mohanty, Department of Computer Science and Engineering, SRM Institute of Science and Technology, Kattankulathur, Chennai, India.
Manuscript received on March 25, 2020. | Revised Manuscript received on April 15, 2020. | Manuscript published on April 30, 2020. | PP: 2361-2365 | Volume-9 Issue-4, April 2020. | Retrieval Number: D7911049420/2020©BEIESP | DOI: 10.35940/ijeat.D7911.049420
© The Authors. Blue Eyes Intelligence Engineering and Sciences Publication (BEIESP). This is an open access article under the CC BY-NC-ND license (http://creativecommons.org/licenses/by-nc-nd/4.0/)
Abstract: Owing to extensive growth across various sectors, including software, telecom, healthcare, and defence, there has been a marked increase in both the number and the duration of meetings, conference calls, reconnaissance stakeouts, and financial reviews. The reports obtained from these play a significant role in defining plans of action. The proposed model converts real-time speech to its corresponding text, condenses that text into a summary using Natural Language Generation (NLG) and Abstract Meaning Representation (AMR) graphs, and then converts the obtained summary back to speech. The proposed model achieves this task using two major algorithms: 1) Deep Speech 2 and 2) AMR graphs. The recommended speech-recognition model achieves a speedup of 4x when the algorithm runs on a Central Processing Unit (CPU), and the use of dedicated Graphics Processing Units (GPUs) for running deep learning algorithms yields a speedup of 21x. The performance of the summarizer is close to that of the Lead-3-AMR-Baseline model, which is a strong baseline for the CNN/DailyMail dataset. The summarizer used achieves ROUGE scores close to those of the Lead-3-AMR-Baseline model, with an accuracy of 99.37%.
Keywords: Sequence-to-Sequence models, Neural Networks, Deep Speech 2, AMR parsing, Batch Normalization, SortaGrad, NLP, NLG, CTC.
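The abstract above describes a three-stage pipeline: speech recognition (Deep Speech 2), AMR-based summarization, and text-to-speech on the summary. The sketch below shows only the pipeline's shape; every function body is a hypothetical placeholder standing in for the paper's actual models, not the authors' implementation.

```python
# Minimal sketch of the proposed pipeline. Each stage is a stub standing in
# for the real component named in the paper (Deep Speech 2 recognizer,
# AMR-graph summarizer, TTS engine); the bodies are illustrative only.

def speech_to_text(audio_frames):
    # Stand-in for a Deep Speech 2-style recognizer, which would map raw
    # audio to a character sequence via an RNN trained with the CTC loss.
    # Placeholder: treat each "frame" as already-transcribed text.
    return " ".join(audio_frames)

def summarize(text):
    # Stand-in for the AMR-based summarizer, which would parse sentences
    # into AMR graphs, select a summary subgraph, and generate text from it.
    # Placeholder: return the leading sentence (cf. the Lead-3 baseline idea).
    sentences = [s.strip() for s in text.split(".") if s.strip()]
    return sentences[0] + "." if sentences else ""

def text_to_speech(text):
    # Stand-in for a TTS engine that would synthesize audio for the summary.
    return f"<audio:{text}>"

def pipeline(audio_frames):
    # Speech -> text -> summary -> speech, as described in the abstract.
    text = speech_to_text(audio_frames)
    summary = summarize(text)
    return text_to_speech(summary)

print(pipeline(["the meeting covered budgets.", "action items follow."]))
```

The stage boundaries, not the stub logic, are the point: each component can be swapped for the corresponding trained model without changing the pipeline's structure.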