Lip Shape Classification of Sounds for Speech Therapy using SlowFast Networks

Conference proceedings article


Publication Details

Author list: Penpicha Boonsri, Salita Eiamboonsert, Punnarai Siricharoen

Publication year: 2023

ISBN: 978-981-18-7950-0

Start page: 83

End page: 87

Number of pages: 5

Languages: English-United States (EN-US)



Abstract

It is important for patients with aphasia and dysarthria to undergo speech and language therapy, in which they practice breathing exercises, tongue-strengthening exercises, and especially speech sounds such as short vowels. A therapist must monitor the patient to ensure that the sound is pronounced clearly and the mouth is shaped correctly. We propose an automated method that uses convolutional networks to identify the mouth motion when pronouncing the 9 short vowel sounds required for speech exercises in the Thai language. First, videos of vowel pronunciation are captured and preprocessed with the Dlib library to crop only the mouth area. The cropped image sequence is then fed into audiovisual SlowFast networks, which are convolutional networks with Slow and Fast visual pathways that capture the spatial and temporal information of a video. We compared our selected model with a state-of-the-art transformer-based model, TimeSformer. Our proposed framework using SlowFast networks achieved an average accuracy of 97.3% for 9-class video classification of Thai vowel sounds. This shows that the proposed framework has potential for use as a tool for speech sound self-exercise and therapy.
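The two preprocessing ideas in the abstract, cropping the mouth area and feeding Slow and Fast visual pathways, can be illustrated with a minimal sketch. It is not the authors' implementation. It assumes the landmarks come from the common 68-point facial landmark layout (the one used by Dlib's stock shape predictor), where indices 48 to 67 outline the mouth. The helper names `mouth_box` and `pathway_indices`, the 25% margin, the 32 fast frames, and the ratio `alpha = 8` are all illustrative assumptions; the abstract does not state these values, and the audio stream is not covered here.

```python
from typing import List, Sequence, Tuple

Point = Tuple[int, int]

# Assumption: 68-point landmark layout, in which indices 48..67 outline the mouth.
MOUTH_LANDMARKS = range(48, 68)


def mouth_box(
    landmarks: Sequence[Point],
    frame_w: int,
    frame_h: int,
    margin: float = 0.25,  # hypothetical padding, not taken from the abstract
) -> Tuple[int, int, int, int]:
    """Return (left, top, right, bottom) around the mouth landmarks,
    padded by `margin` of the mouth size and clipped to the frame."""
    xs = [landmarks[i][0] for i in MOUTH_LANDMARKS]
    ys = [landmarks[i][1] for i in MOUTH_LANDMARKS]
    pad_x = int((max(xs) - min(xs)) * margin)
    pad_y = int((max(ys) - min(ys)) * margin)
    left = max(0, min(xs) - pad_x)
    top = max(0, min(ys) - pad_y)
    right = min(frame_w, max(xs) + pad_x)
    bottom = min(frame_h, max(ys) + pad_y)
    return left, top, right, bottom


def pathway_indices(
    num_frames: int,
    fast_frames: int = 32,  # hypothetical clip length for the Fast pathway
    alpha: int = 8,         # hypothetical Fast-to-Slow frame-rate ratio
) -> Tuple[List[int], List[int]]:
    """Pick frame indices for the Fast pathway (dense, evenly spaced)
    and the Slow pathway (every `alpha`-th Fast frame)."""
    if num_frames < 1 or fast_frames < 1 or alpha < 1:
        raise ValueError("num_frames, fast_frames and alpha must be positive")
    if fast_frames == 1:
        fast = [0]
    else:
        step = (num_frames - 1) / (fast_frames - 1)
        fast = [round(i * step) for i in range(fast_frames)]
    slow = fast[::alpha]
    return fast, slow
```

In a full pipeline, the box from `mouth_box` would be used to crop every frame of a clip, and the two index lists would select the frames passed to each visual pathway. The Slow pathway sees fewer frames and can focus on mouth shape, while the Fast pathway sees more frames and can follow lip motion.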


Last updated on 2024-02-13 at 23:05