Lip Shape Classification of Sounds for Speech Therapy using SlowFast Networks

Conference proceedings article






Publication details

Authors: Penpicha Boonsri, Salita Eiamboonsert, Punnarai Siricharoen

Year of publication (CE): 2023

Series title: ISBN: 978-981-18-7950-0

First page: 83

Last page: 87

Number of pages: 5

Language: English-United States (EN-US)




Abstract

It is important for patients with aphasia and dysarthria to receive speech and language therapy, which includes breathing exercises, tongue-strengthening exercises, and especially the practice of speech sounds such as short vowels. To ensure clear pronunciation and correct mouth shape, these exercises must be monitored by a therapist. We propose an automated method that uses convolutional networks to classify the mouth motion of nine Thai short vowel sounds required for speech exercises. First, videos of vowel pronunciation are captured and preprocessed with the Dlib library to crop only the mouth region. The cropped image sequence is then fed into audiovisual SlowFast Networks, a convolutional architecture whose Slow and Fast visual pathways capture the spatial and temporal information of a video. We compared the selected model with a state-of-the-art transformer-based model, TimeSformer. The proposed SlowFast-based framework achieved an average accuracy of 97.3% on 9-class video classification of Thai vowel sounds. These results suggest that the framework has potential as a tool for speech-sound self-exercise and therapy.
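
The mouth-cropping and two-pathway sampling steps described in the abstract can be sketched as follows. This is a minimal illustration under stated assumptions, not the authors' implementation: it assumes facial landmarks are already available as (x, y) pairs in the common 68-point scheme (indices 48 to 67 cover the mouth, as in dlib's 68-point shape predictor). The names `mouth_box` and `slowfast_indices`, the 10-pixel margin, and the strides `tau=16`, `alpha=8` are hypothetical choices; the stride values follow a common configuration from the original SlowFast paper, not values reported in this work.

```python
"""Hypothetical sketch: mouth cropping and SlowFast-style frame sampling.

Assumes landmarks are (x, y) tuples in the 68-point scheme (e.g. from
dlib's shape predictor). Not the authors' code.
"""
from typing import List, Sequence, Tuple

# Mouth landmarks are indices 48..67 in the 68-point scheme.
MOUTH_IDX = range(48, 68)


def mouth_box(landmarks: Sequence[Tuple[int, int]], width: int, height: int,
              margin: int = 10) -> Tuple[int, int, int, int]:
    """Return (left, top, right, bottom) around the mouth, padded by
    `margin` pixels and clipped to the frame size.

    For a NumPy image, the crop would be frame[top:bottom, left:right].
    """
    xs = [landmarks[i][0] for i in MOUTH_IDX]
    ys = [landmarks[i][1] for i in MOUTH_IDX]
    left = max(min(xs) - margin, 0)
    top = max(min(ys) - margin, 0)
    right = min(max(xs) + margin, width)
    bottom = min(max(ys) + margin, height)
    return left, top, right, bottom


def slowfast_indices(num_frames: int, tau: int = 16,
                     alpha: int = 8) -> Tuple[List[int], List[int]]:
    """Return frame indices for the Slow and Fast pathways.

    Slow takes every `tau`-th frame; Fast samples `alpha` times more
    densely, i.e. every (tau // alpha)-th frame.
    """
    if tau % alpha != 0:
        raise ValueError("tau must be divisible by alpha")
    slow = list(range(0, num_frames, tau))
    fast = list(range(0, num_frames, tau // alpha))
    return slow, fast


if __name__ == "__main__":
    # Toy landmarks: mouth points placed at (100 + i, 200 + i).
    pts = [(0, 0)] * 68
    for i in MOUTH_IDX:
        pts[i] = (100 + i, 200 + i)
    print(mouth_box(pts, width=640, height=480))  # (138, 238, 177, 277)
    slow, fast = slowfast_indices(64)
    print(slow, len(fast))  # [0, 16, 32, 48] 32
```

In this sketch the Slow pathway sees only a few frames per clip, which suits modelling lip shape, while the Fast pathway samples `alpha` times more frames to follow lip motion; the cropped, sampled frames would then go to a video classifier with nine output classes.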


Keywords

No related information found.


Last updated 2024-13-02 at 23:05