Lip Shape Classification of Sounds for Speech Therapy using SlowFast Networks

Conference proceedings article

Authors/Editors

Salita Eiamboonsert (สาลิตา เอี่ยมบุญเสริฐ)

Strategic Research Themes

Digital Transformation (Strategic Research Themes)

Publication Details

Author list: Penpicha Boonsri, Salita Eiamboonsert, Punnarai Siricharoen

Publication year (A.D.): 2023

ISBN: 978-981-18-7950-0

First page: 83

Last page: 87

Number of pages: 5

Language: English-United States (EN-US)


Abstract

It is important for patients with aphasia and dysarthria to receive speech and language therapy in which they practice breathing exercises, tongue-strengthening exercises, and especially speech sounds such as short vowels. Ensuring clear pronunciation and correct mouth shape normally requires monitoring by a therapist. We propose an automated method that uses convolutional networks to identify the mouth motion of the 9 short vowel sounds required for speech exercises in the Thai language. First, videos of vowel pronunciation are captured and preprocessed with the Dlib library to crop only the mouth area. The cropped image sequence is then fed into audiovisual SlowFast Networks, a convolutional architecture whose Slow and Fast visual pathways capture the spatial and temporal information of a video. We compared our selected model with a transformer-based state-of-the-art model, TimeSformer. Our proposed framework using SlowFast networks achieved an average accuracy of 97.3% for 9-class video classification of Thai vowel sounds, which suggests that it has potential as a tool for speech sound self-exercise and therapy.
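The two preprocessing ideas in the abstract, cropping the mouth region and feeding frames to two pathways at different temporal rates, can be sketched roughly as follows. This is an illustrative sketch only, not the authors' implementation: the helper names `mouth_box` and `pathway_indices`, the `margin` padding, and the `fast_len` and `alpha` values are all assumptions, since the record does not state the paper's settings. The sketch assumes the landmarks come from a 68-point face model such as the one commonly used with Dlib, in which indices 48 to 67 outline the mouth, and it works on plain coordinate lists so that it runs without Dlib installed.

```python
from typing import List, Sequence, Tuple

Point = Tuple[int, int]

# In the common 68-point facial landmark layout, indices 48..67 outline the mouth.
MOUTH_START, MOUTH_END = 48, 68


def mouth_box(landmarks: Sequence[Point], frame_w: int, frame_h: int,
              margin: float = 0.25) -> Tuple[int, int, int, int]:
    """Return (left, top, right, bottom) around the mouth landmarks.

    The box is padded by `margin` (a hypothetical value) on each side
    and clamped so it never leaves the frame.
    """
    mouth = landmarks[MOUTH_START:MOUTH_END]
    xs = [x for x, _ in mouth]
    ys = [y for _, y in mouth]
    pad_x = int((max(xs) - min(xs)) * margin)
    pad_y = int((max(ys) - min(ys)) * margin)
    left = max(0, min(xs) - pad_x)
    top = max(0, min(ys) - pad_y)
    right = min(frame_w, max(xs) + pad_x)
    bottom = min(frame_h, max(ys) + pad_y)
    return left, top, right, bottom


def pathway_indices(num_frames: int, fast_len: int = 32,
                    alpha: int = 4) -> Tuple[List[int], List[int]]:
    """Choose frame indices for the Slow and Fast pathways.

    The Fast pathway takes `fast_len` frames spread evenly over the clip;
    the Slow pathway keeps every `alpha`-th frame of the Fast selection.
    Both defaults are illustrative assumptions.
    """
    if num_frames <= 0 or fast_len <= 0 or alpha <= 0:
        raise ValueError("num_frames, fast_len and alpha must be positive")
    if fast_len == 1:
        fast = [0]
    else:
        step = (num_frames - 1) / (fast_len - 1)
        fast = [round(i * step) for i in range(fast_len)]
    slow = fast[::alpha]
    return slow, fast
```

In a full pipeline, the box returned by `mouth_box` would be used to crop each video frame, and the two index lists would select which cropped frames go to each pathway before they are stacked into model inputs.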
