Learning vocal tract shapes of Thai vowels from acoustical data: A preliminary study

Conference proceedings article



Publication Details

Author list: Prom-On S.

Publisher: Hindawi

Publication year: 2013

Volume number: 3

Start page: 2526

End page: 2530

Number of pages: 5

eISSN: 1745-4557

URL: https://www.scopus.com/inward/record.uri?eid=2-s2.0-84897077431&partnerID=40&md5=8772b9cfd877037dd9cb52cdcc459a58

Languages: English-Great Britain (EN-GB)


Abstract

This paper investigates the vocal tract shapes of Thai vowels using an analysis-by-synthesis strategy for parameter estimation with VocalTractLab, an articulatory synthesizer capable of synthesizing a full range of speech sounds from articulatory movements defined by a sequence of vocal tract shapes and the target approximation process. Sentence stimuli were designed to highlight the contextual variation of Thai vowels by varying nine Thai long vowels (/a:/, /i:/, /u:/, /e:/, /ɛ:/, /ɯ:/, /ɤ:/, /o:/, /ɔ:/) across two syllables. For this preliminary study, speech data consisting of 81 disyllabic utterances were recorded from a native Thai speaker. Vocal tract shapes were estimated by optimizing the shape parameters of each vowel to minimize the sum of squared errors between the Mel-Frequency Cepstral Coefficients (MFCCs) of the original speech and those of speech produced by the articulatory synthesizer. A stochastic gradient descent algorithm was used to optimize the shape parameters. The parameters of all shapes were first initialized to those of the neutral vowel (schwa) and then iteratively and randomly adjusted toward a new articulatory target. Each new target position was accepted only when it resulted in a lower total error between the synthesized and original MFCC data. The optimization process was repeated until the error no longer changed significantly. The optimized vocal tract shapes could then be used to accurately synthesize Thai vowels, either as isolated syllables or in continuous utterances, and they closely resembled the actual pronunciation. This result indicates the potential of the analysis strategy to estimate vocal tract shapes effectively and economically without actual imaging data.
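
The optimization loop described in the abstract can be illustrated with a minimal Python sketch. It is not the author's implementation: it only mirrors the stated steps (start from a neutral shape, perturb the parameters at random, keep a candidate only when the squared error drops, stop once the error stops improving). The function `toy_features` is a hypothetical stand-in for "synthesize speech from a shape, then extract MFCCs" and does not call VocalTractLab; the parameter count, step size, and stopping rule (`patience`) are likewise assumptions chosen for illustration.

```python
import random


def sum_squared_error(a, b):
    """Sum of squared differences between two equal-length feature vectors."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def toy_features(shape):
    """Hypothetical stand-in for synthesis followed by MFCC extraction.

    A real system would synthesize audio from the shape parameters and
    compute MFCCs; here a fixed linear map keeps the sketch self-contained.
    """
    return [shape[0] + 0.5 * shape[1], shape[1] - shape[2], 2.0 * shape[2]]


def fit_shape(target_features, neutral_shape, features_fn,
              step=0.1, max_iters=5000, patience=500, seed=0):
    """Random-perturbation search that accepts only error-reducing candidates."""
    rng = random.Random(seed)
    best = list(neutral_shape)  # initialize to the neutral (schwa-like) shape
    best_err = sum_squared_error(features_fn(best), target_features)
    history = [best_err]
    stale = 0  # consecutive rejected candidates
    for _ in range(max_iters):
        # Randomly adjust every parameter around the current best shape.
        candidate = [p + rng.gauss(0.0, step) for p in best]
        err = sum_squared_error(features_fn(candidate), target_features)
        if err < best_err:
            # Accept the new target only when the total error is lower.
            best, best_err = candidate, err
            stale = 0
        else:
            stale += 1
        history.append(best_err)
        if stale >= patience:
            break  # assumed stopping rule: no improvement for a while
    return best, best_err, history


# Usage: recover a made-up "true" shape from its own features.
true_shape = [0.8, -0.3, 0.5]  # illustrative values, not measured data
target = toy_features(true_shape)
neutral = [0.0, 0.0, 0.0]
initial_error = sum_squared_error(toy_features(neutral), target)
best_shape, best_error, history = fit_shape(target, neutral, toy_features)
```

Because a candidate is kept only when it lowers the error, the recorded error never increases from one iteration to the next. In practice the expensive step would be the synthesis and feature extraction hidden behind `features_fn`.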



