Speaker Independent Text to Speech for Malayalam
Sajini T¹, Neetha George²
¹Sajini T, Electronics and Communication, Kerala University, College of Engineering, Trivandrum, India.
²Neetha George, Electronics and Communication, Kerala University, College of Engineering, Trivandrum, India.
Manuscript received on 13 June 2017 | Revised Manuscript received on 20 June 2017 | Manuscript Published on 30 June 2017 | PP: 96-103 | Volume-6 Issue-5, June 2017 | Retrieval Number: E5014066517/17©BEIESP
© The Authors. Blue Eyes Intelligence Engineering and Sciences Publication (BEIESP). This is an open access article under the CC-BY-NC-ND license (http://creativecommons.org/licenses/by-nc-nd/4.0/)
Abstract: Text to speech (TTS) is software that converts text into speech output. TTS has a wide range of applications, including assistive technologies such as communication devices that give a voice to people who cannot speak. These applications need the flexibility to produce a variety of speaker voices, or a unique voice, as output. Existing corpus-based TTS does not provide this flexibility, and changing the voice is time consuming, expensive and tedious because it requires hours of high-quality recorded speech. In this work we explore the speaker adaptation technology available in Hidden Markov Model based text to speech (HTS) to provide speaker variability in Malayalam TTS. Speaker adaptation (SA) within the HTS framework has been implemented successfully for foreign languages such as English and Japanese, but has not yet been tried for Indian languages. We implement SA in the HTS framework as a way to provide diverse voices while reducing the expense, time and effort that the usual approach requires for creating a variant or new TTS voice. A combination of constrained maximum likelihood linear regression (CMLLR) and maximum a posteriori (MAP) adaptation is used to generate the variant voices. A five-speaker database with one hour of speech per speaker is used for SA, of which the data from four speakers is used to train the speaker-independent (SI) average model. The SI model was trained with different numbers of speakers: the average model built from three speakers gave intelligible but noisy output, while the four-speaker model gave intelligible, good-quality output with high similarity to the target speaker and only occasional distortions. The quality of the system was assessed using perceptual scores from 15 native speakers. The average word error rates (WER) of the 3- and 4-speaker models were 15.65% and 16.2%, respectively, for paragraphs selected from different domains, and 26.82% and 21.14% for a set of 30 sentences.
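The CMLLR-then-MAP combination described above can be illustrated with a minimal sketch. It is not the authors' HTS recipe: it assumes one-dimensional features, a single regression class, adaptation frames already hard-assigned to Gaussians (HTS uses state occupancy probabilities instead), and a hypothetical MAP prior weight `tau`; the function names are illustrative. CMLLR estimates one transform shared by every Gaussian in the class, and the CMLLR-adapted means then serve as priors for a per-Gaussian MAP mean update.

```python
import math


def estimate_cmllr_1d(data, means, variances):
    """Estimate one shared feature-space transform x' = a*x + b (CMLLR).

    data[m] -- adaptation frames hard-assigned to Gaussian m (assumption)
    means[m], variances[m] -- that Gaussian's average-voice parameters
    Maximises the sum over frames of log(a) + log N(a*x + b; mean_m, var_m).
    Needs at least two distinct frame values so that G is invertible.
    """
    # Statistics over the extended vector zeta = [1, x]:
    # G = sum zeta zeta^T / var,  k = sum mean * zeta / var
    g00 = g01 = g11 = k0 = k1 = 0.0
    n_frames = 0
    for frames, mu, var in zip(data, means, variances):
        for x in frames:
            g00 += 1.0 / var
            g01 += x / var
            g11 += x * x / var
            k0 += mu / var
            k1 += mu * x / var
            n_frames += 1
    # Inverse of the symmetric 2x2 matrix G
    det = g00 * g11 - g01 * g01
    h00, h01, h11 = g11 / det, -g01 / det, g00 / det
    # A zero gradient gives w = G^-1 (T/a * p + k) with p = [0, 1];
    # its second component yields a^2 - c*a - T*h11 = 0, c = (G^-1 k)[1].
    c = h01 * k0 + h11 * k1
    a = (c + math.sqrt(c * c + 4.0 * n_frames * h11)) / 2.0  # positive root
    b = h00 * k0 + h01 * (k1 + n_frames / a)
    return a, b


def adapt_gaussian(mean, var, a, b):
    """Model-space view of the feature transform:
    a * N(a*x + b; mean, var) equals N(x; (mean - b) / a, var / a^2)."""
    return (mean - b) / a, var / (a * a)


def map_mean(prior_mean, frames, tau):
    """MAP update of a Gaussian mean; tau is a hypothetical prior weight."""
    return (tau * prior_mean + sum(frames)) / (tau + len(frames))


if __name__ == "__main__":
    # Toy average-voice model: two Gaussians sharing one regression class
    means, variances = [0.0, 4.0], [1.0, 1.0]
    # Toy "target speaker" frames, already aligned to the two Gaussians
    data = [[1.0, 1.5, 2.0], [7.0, 8.0, 9.5]]
    a, b = estimate_cmllr_1d(data, means, variances)
    for frames, mu, var in zip(data, means, variances):
        cmllr_mean, cmllr_var = adapt_gaussian(mu, var, a, b)
        # The CMLLR-adapted mean acts as the prior for the MAP update
        print(cmllr_mean, cmllr_var, map_mean(cmllr_mean, frames, tau=10.0))
```

With enough adaptation frames the MAP estimate moves from the CMLLR-adapted mean toward the speaker's own sample mean, which is why the two steps are commonly chained.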
The adapted voice model obtained mean opinion scores (MOS) of 3.39, 3.59, 3.55 and 3.38 for naturalness, intelligibility, degradation and similarity index, respectively. The results show that SA in HTS is a quick, easy and less expensive technique that can be used successfully for a phonetic language such as Malayalam to generate diverse TTS voices.
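The word error rates and mean opinion scores quoted in the abstract can be computed along the lines of the sketch below. The abstract does not describe the exact scoring procedure, so this assumes each listener's transcription is scored against the reference text with a word-level edit distance, and that MOS is the plain average of 1 to 5 listener ratings; the function names are hypothetical.

```python
def word_error_rate(reference, hypothesis):
    """WER = (substitutions + deletions + insertions) / reference length,
    computed with a word-level Levenshtein alignment."""
    ref, hyp = reference.split(), hypothesis.split()
    if not ref:
        raise ValueError("reference must contain at least one word")
    # prev holds the previous row of the edit-distance table
    prev = list(range(len(hyp) + 1))
    for i in range(1, len(ref) + 1):
        curr = [i] + [0] * len(hyp)
        for j in range(1, len(hyp) + 1):
            cost = 0 if ref[i - 1] == hyp[j - 1] else 1
            curr[j] = min(prev[j] + 1,         # deletion
                          curr[j - 1] + 1,     # insertion
                          prev[j - 1] + cost)  # substitution or match
        prev = curr
    return prev[-1] / len(ref)


def mean_opinion_score(ratings):
    """Plain average of listener ratings on a 1-5 scale."""
    return sum(ratings) / len(ratings)
```

For example, a transcription with one substituted word and one inserted word, scored against a four-word reference, gives a WER of 0.5 (50%).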
Keywords: Speaker Adaptation, HMM-Based TTS, Constrained Maximum Likelihood Linear Regression, Maximum A Posteriori, MAP.
Scope of the Article: Regression and Prediction