Learnability of English diphthongs: One dynamic target vs. two static targets

Journal article

Authors/Editors

Santitham Prom-on (สันติธรรม พรหมอ่อน)

Strategic research themes

Publication details

Author list: Xu A.; van Niekerk D.R.; Gerazov B.; Krug P.K.; Prom-on S.; Birkholz P.; Xu Y.

Publisher: Elsevier

Publication year (CE): 2025

Journal: Speech Communication (0167-6393)

Volume number: 170

Start page: 103225

ISSN: 0167-6393

URL: https://www.scopus.com/inward/record.uri?eid=2-s2.0-86000730783&doi=10.1016%2fj.specom.2025.103225&partnerID=40&md5=2cb9ca862867eae3aa77ee683d9ebef3

Language: English-Great Britain (EN-GB)


Abstract

As vowels with intrinsic movements, diphthongs are among the most elusive sounds of speech. Previous research has characterized diphthongs as a combination of two vowels, a vowel followed by a formant transition, or a constant rate of formant change. These accounts are based on acoustic patterns, perceptual cues, and either acoustic or articulatory synthesis, but no consensus has been reached. In this study, we explore the nature of diphthongs by examining how they can be acquired through vocal learning. The acquisition is simulated by a three-dimensional (3D) vocal tract model with built-in target approximation dynamics, which can learn articulatory targets of phonetic categories under the guidance of a speech recognizer. The simulation attempts to learn to articulate diphthong-embedded monosyllabic English words with either a single dynamic target or two static targets, and the learned synthetic words were presented to native listeners for identification. The results showed that diphthongs learned with dynamic targets were consistently more intelligible across variable durations than those learned with two static targets, with the sole exception of /aɪ/. From the perspective of learnability, therefore, English diphthongs are likely unitary vowels with dynamic targets. © 2025 Elsevier B.V.
