Multiple Speaker Recognition
Deepanshi Bansal1, Pooja Gupta2
1Deepanshi Bansal, Department of CSE, Maharaja Agrasen Institute of Technology (MAIT), GGSIPU, New Delhi, India.
2Pooja Gupta, Assistant Prof. Department of CSE, Maharaja Agrasen Institute of Technology (MAIT), GGSIPU, New Delhi, India.
Manuscript received on 18 February 2019 | Revised Manuscript received on 27 February 2019 | Manuscript published on 28 February 2019 | PP: 358-362 | Volume-8 Issue-3, February 2019 | Retrieval Number: C5964028319/19©BEIESP
© The Authors. Blue Eyes Intelligence Engineering and Sciences Publication (BEIESP). This is an open access article under the CC-BY-NC-ND license (http://creativecommons.org/licenses/by-nc-nd/4.0/)
Abstract: Multiple speaker recognition determines the number of speakers in an arbitrary speech recording, the time span over which each one speaks, and the separation of each speaker’s voice signal. Speakers are distinguished by features such as MFCC, delta MFCC, and chroma features, which differ from person to person because of variation in the vibration frequency of the vocal cords, which in turn affects the placement of stress and the syllables used. The methodology used in this paper enables the user to extract and isolate the individual voice streams at the receiver end. It can be used in situations such as office meetings and parliamentary sessions to identify each speaker and his/her content in order to draw conclusions. The system may also help the user highlight a particular speaker’s voice among other speakers. Even if the number of speakers is not known in advance, the algorithm determines it by clustering; it then extracts the common features of each voice to distinguish the speakers and separate their voices from one another. The output is the duration of each speaker’s contribution to the discussion, along with the separated voices saved in WAV format. If a voice sample of a speaker is stored in the speaker recognition model, the model also shows the speaker’s name; otherwise the speakers are labelled speaker 1, speaker 2, and so on. The methodology could be used in present-day interview processes during group discussions and in admission processes for higher studies, easing the workload on recruiters by giving them an idea of the contribution of each speaker to the discussion.
Keywords: Clustering, Diarization, Voice Activity Detection
Scope of the Article: Clustering
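
The clustering idea summarized in the abstract (grouping feature vectors without knowing the number of speakers beforehand, then reporting how long each speaker talks) can be illustrated with a minimal sketch. This is not the authors' algorithm: it swaps in a simple threshold-based "leader" clustering, the two-dimensional vectors are toy stand-ins for per-frame MFCC features, and the names `cluster_frames`, `speaking_time`, `threshold`, and `frame_seconds` are hypothetical.

```python
import math


def distance(a, b):
    """Euclidean distance between two feature vectors of equal length."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def cluster_frames(features, threshold):
    """Assign each frame to the nearest cluster centroid, or open a new
    cluster when no centroid lies within `threshold` (leader clustering).
    The number of clusters, i.e. speakers, is not fixed in advance."""
    centroids = []  # running mean vector of each cluster
    counts = []     # number of frames assigned to each cluster
    labels = []     # cluster index chosen for each frame
    for vec in features:
        best, best_d = None, None
        for i, centroid in enumerate(centroids):
            d = distance(vec, centroid)
            if best_d is None or d < best_d:
                best, best_d = i, d
        if best is None or best_d > threshold:
            # No existing cluster is close enough: treat as a new speaker.
            centroids.append(list(vec))
            counts.append(1)
            labels.append(len(centroids) - 1)
        else:
            # Update the running mean of the matched cluster.
            counts[best] += 1
            n = counts[best]
            centroids[best] = [c + (x - c) / n
                               for c, x in zip(centroids[best], vec)]
            labels.append(best)
    return labels


def speaking_time(labels, frame_seconds):
    """Sum the duration attributed to each anonymous speaker label."""
    totals = {}
    for label in labels:
        name = f"speaker {label + 1}"
        totals[name] = totals.get(name, 0.0) + frame_seconds
    return totals


# Toy feature vectors (assumed values, not real MFCCs), one per frame.
frames = [(0.0, 0.1), (0.1, 0.0), (5.0, 5.1), (5.1, 4.9), (0.05, 0.05), (9.9, 0.1)]
labels = cluster_frames(frames, threshold=1.0)
print(labels)                                     # [0, 0, 1, 1, 0, 2]
print(speaking_time(labels, frame_seconds=0.5))
# {'speaker 1': 1.5, 'speaker 2': 1.0, 'speaker 3': 0.5}
```

In this sketch the number of speakers falls out of the distance threshold, so the result is sensitive to that value and to the order of the frames; a real diarization system would typically run voice activity detection first and use a more robust clustering method.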