Learning vocal tract shapes of Thai vowels from acoustical data: A preliminary study
Conference proceedings article
Authors/Editors
Strategic Research Themes
No related information found
Publication Details
Author list: Prom-On S.
Publisher: Hindawi
Publication year (CE): 2013
Volume number: 3
Start page: 2526
End page: 2530
Number of pages: 5
eISSN: 1745-4557
Language: English-Great Britain (EN-GB)
Abstract
This paper investigates the vocal tract shapes of Thai vowels using an analysis-by-synthesis strategy for parameter estimation with VocalTractLab, an articulatory synthesizer that can synthesize a full range of speech sounds from articulatory movements defined by a sequence of vocal tract shapes and the target approximation process. Sentence stimuli were designed to highlight the contextual variation of Thai vowels by varying nine Thai long vowels (/a:/, /i:/, /u:/, /e:/, /ɛ:/, /ɯ:/, /ɤ:/, /o:/, /ɔ:/) across two syllables. For this preliminary study, speech data consisting of 81 disyllabic utterances were recorded from a native Thai speaker. Vocal tract shapes were estimated by optimizing the shape parameters of each vowel to minimize the sum of squared errors between the Mel-Frequency Cepstral Coefficients (MFCCs) of the original speech and those of speech generated by the articulatory synthesizer. A stochastic gradient descent algorithm was used to optimize the shape parameters. The parameters of all shapes were first initialized to those of the neutral vowel (schwa) and then iteratively and randomly adjusted toward new articulatory targets. Each new target position was accepted only when it resulted in a lower total error between the synthesized and original MFCC data. The optimization process was repeated until the error no longer changed significantly. The optimized vocal tract shapes can then be used to accurately synthesize Thai vowels, either as isolated syllables or in continuous utterances, and they closely resembled the actual pronunciation. This result indicates the potential of this analysis strategy to estimate vocal tract shapes effectively and economically without actual imaging data.
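The acceptance rule described in the abstract (start from a neutral shape, perturb the parameters at random, keep a candidate only when the total squared feature error drops, stop when the error stops improving) can be illustrated with a minimal Python sketch. This is not the author's implementation and does not call VocalTractLab: `toy_synthesize` is a hypothetical stand-in for "synthesize speech, then extract MFCCs", and the step size, iteration limit, and patience values are assumptions chosen only so the example runs. The sketch implements the random accept-if-better search as the abstract words it, which the abstract itself labels stochastic gradient descent.

```python
import random


def sum_squared_error(a, b):
    """Sum of squared differences between two equal-length feature vectors."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def toy_synthesize(params):
    """Hypothetical stand-in for an articulatory synthesizer plus MFCC extraction.

    Maps shape parameters to a small feature vector. A real system would
    synthesize audio and compute MFCCs here instead.
    """
    return [p * p + 0.5 * p for p in params] + [sum(params)]


def estimate_shape(target_features, n_params, synthesize,
                   step=0.1, max_iters=5000, patience=500, seed=0):
    """Random accept-if-better search over shape parameters (assumed settings)."""
    rng = random.Random(seed)
    params = [0.0] * n_params  # neutral starting shape, analogous to schwa
    best_err = sum_squared_error(synthesize(params), target_features)
    stale = 0  # consecutive rejected candidates
    for _ in range(max_iters):
        # Randomly perturb every parameter to propose a new target position.
        candidate = [p + rng.gauss(0.0, step) for p in params]
        err = sum_squared_error(synthesize(candidate), target_features)
        if err < best_err:
            # Accept only when the total error is lower.
            params, best_err = candidate, err
            stale = 0
        else:
            stale += 1
            if stale >= patience:
                break  # no recent improvement: treat as converged
    return params, best_err


# Usage: recover a made-up "true" shape from its toy features.
true_shape = [0.8, -0.3, 0.5]
target = toy_synthesize(true_shape)
shape, err = estimate_shape(target, len(true_shape), toy_synthesize)
print("estimated shape:", shape)
print("final error:", err)
```

Because only improvements are accepted, the error is non-increasing over iterations; the price is that the search can stall in a local minimum, which is one reason a preliminary study of this kind would repeat or restart the optimization.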
Keywords
No related information found