Identifying underlying articulatory targets of Thai vowels from acoustic data based on an analysis-by-synthesis approach

Journal article

Authors/Editors

SANTITHAM PROM-ON

Publication Details

Author list: Prom-On S., Birkholz P., Xu Y.

Publisher: SpringerOpen

Publication year: 2014

Journal: EURASIP Journal on Audio, Speech, and Music Processing (1687-4714)

Volume number: 2014

ISSN: 1687-4714

eISSN: 1687-4722

URL: https://www.scopus.com/inward/record.uri?eid=2-s2.0-84901798657&doi=10.1186%2f1687-4722-2014-23&partnerID=40&md5=2b683392ea263890c7ef8b211e4423ad

Languages: English-Great Britain (EN-GB)

Abstract

This paper investigates the estimation of underlying articulatory targets of Thai vowels, as invariant representations of vocal tract shapes, by means of analysis-by-synthesis based on acoustic data. The basic idea is to simulate the process of learning speech production as a distal learning task, with acoustic signals of natural utterances in the form of Mel-frequency cepstral coefficients (MFCCs) as input, VocalTractLab, a 3D articulatory synthesizer controlled by target approximation models, as the learner, and stochastic gradient descent as the target training method. To test the effectiveness of this approach, a speech corpus was designed to contain contextual variations of Thai vowels by juxtaposing nine Thai long vowels in two-syllable sequences. The resulting corpus of 81 disyllabic utterances was recorded from a native Thai speaker. Nine vocal tract shapes, each corresponding to a vowel, were estimated by optimizing the vocal tract shape parameters of each vowel to minimize the sum of squared errors of MFCCs between original and synthesized speech. The stochastic gradient descent algorithm was used to iteratively optimize the shape parameters. The optimized vocal tract shapes were then used to synthesize Thai vowels both in monosyllables and in disyllabic sequences. Both numerical and perceptual results indicate that this model-based analysis strategy allows us to effectively and economically estimate the vocal tract shapes needed to synthesize accurate Thai vowels as well as smooth formant transitions between adjacent vowels. © 2014 Prom-on et al.; licensee Springer.
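
The optimization loop described in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation: `toy_synth` is a hypothetical stand-in for the articulatory synthesizer (it does not call VocalTractLab), its three-value output stands in for MFCCs, and the `WEIGHTS`, context offsets, learning rate, and step count are arbitrary values chosen for illustration. The abstract does not say how gradients were obtained, so the sketch assumes central finite differences, a common choice when the synthesizer is treated as a black box.

```python
import math
import random

# Arbitrary constants of the toy model (assumption, not from the paper).
WEIGHTS = (0.8, 1.2, 0.5)


def toy_synth(params, context):
    """Hypothetical stand-in for a synthesizer: maps shape parameters and a
    context offset to a short feature vector used in place of MFCCs."""
    return [math.tanh(w * p + context) for w, p in zip(WEIGHTS, params)]


def sse(a, b):
    """Sum of squared errors between two feature vectors."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def fit_targets(corpus, n_params, steps=2000, lr=0.2, eps=1e-4, seed=0):
    """Stochastic gradient descent: each step samples one utterance and
    follows a finite-difference estimate of the error gradient."""
    rng = random.Random(seed)
    params = [0.0] * n_params
    for _ in range(steps):
        context, observed = rng.choice(corpus)  # one utterance per step
        grad = []
        for i in range(n_params):
            up = params[:]
            up[i] += eps
            down = params[:]
            down[i] -= eps
            diff = sse(toy_synth(up, context), observed) - sse(
                toy_synth(down, context), observed
            )
            grad.append(diff / (2 * eps))  # central difference
        params = [p - lr * g for p, g in zip(params, grad)]
    return params


# Synthetic "recordings": features generated from known parameters, so the
# recovered values can be checked against the ground truth.
true_params = [0.6, -0.4, 0.9]
contexts = [-0.3, 0.0, 0.3]
corpus = [(c, toy_synth(true_params, c)) for c in contexts]

estimated = fit_targets(corpus, n_params=3)
total_error = sum(sse(toy_synth(estimated, c), obs) for c, obs in corpus)
print("estimated parameters:", [round(p, 4) for p in estimated])
print("total squared error:", total_error)
```

In this toy setting the same parameter vector has to explain the features in every context, which mirrors the idea of a single invariant target per vowel across the disyllabic sequences. A real pipeline would replace `toy_synth` with a synthesis run followed by MFCC extraction.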

Keywords

Articulatory target, Thai vowels