Improving OpenAI's Whisper Model for Transcribing Homophones in Legal News

Others


Authors/Editors

No related information found


Strategic Research Themes


Publication Details

Author list: Siriket L.; Jitkajornwanich K.; Jaiyen S.; Intakosum S.

Publisher: Institute of Electrical and Electronics Engineers Inc.

Publication year: 2024

Start page: 108

End page: 113

Number of pages: 6

ISBN: 979-835038594-6

URL: https://www.scopus.com/inward/record.uri?eid=2-s2.0-85197245952&doi=10.1109%2fICEAST61342.2024.10554018&partnerID=40&md5=e70739b29569db992e2767accbb088f8

Language: English-Great Britain (EN-GB)




Abstract

The Whisper model is a tool for transcribing human speech. It is open source and offers diverse functionality. The model can effectively transcribe speech in multiple languages, including Thai. This paper focuses on improving the Whisper model's transcription of Thai homophones in order to reduce the word error rate (WER). We focus on words in the legal news category and identify factors that lead to Whisper's incorrect sound predictions. We examined homophones in snippets of legal news video clips and compiled them into a homophone dictionary. We evaluate the words transcribed by the Whisper model by measuring the word error rate and checking their spelling. In the initial results, obtained from the original Whisper model and the homophone dictionary, 48% of a total of 94 words were transcribed incorrectly. We then propose a methodology for improving Whisper's performance, so that automatic speech recognition of Thai with the Whisper model can be fully utilized in other applications. © 2024 IEEE.
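The word error rate mentioned in the abstract is conventionally defined as the number of word substitutions, deletions, and insertions needed to turn the transcript into the reference, divided by the number of reference words. The following is a minimal sketch of that standard definition, not the authors' implementation. It assumes both texts are already split into word tokens (Thai is written without spaces, so a separate word segmenter would be needed first). The function names and the shape of the homophone dictionary (each word mapped to a set of same-sounding words) are hypothetical, and English homophones stand in for the Thai data, which the abstract does not list.

```python
from typing import Dict, List, Sequence, Set, Tuple


def word_error_rate(reference: Sequence[str], hypothesis: Sequence[str]) -> float:
    """WER = (substitutions + deletions + insertions) / number of reference words."""
    if not reference:
        raise ValueError("reference must contain at least one word")
    # prev[j] is the edit distance between reference[:i-1] and hypothesis[:j].
    prev = list(range(len(hypothesis) + 1))
    for i, ref_word in enumerate(reference, start=1):
        curr = [i] + [0] * len(hypothesis)
        for j, hyp_word in enumerate(hypothesis, start=1):
            cost = 0 if ref_word == hyp_word else 1
            curr[j] = min(
                prev[j] + 1,         # deletion of a reference word
                curr[j - 1] + 1,     # insertion of an extra hypothesis word
                prev[j - 1] + cost,  # match or substitution
            )
        prev = curr
    return prev[-1] / len(reference)


def homophone_confusions(
    reference: Sequence[str],
    hypothesis: Sequence[str],
    homophones: Dict[str, Set[str]],
) -> List[Tuple[str, str]]:
    """List (expected, transcribed) pairs where the transcribed word is a
    listed homophone of the expected word.

    Assumption: the two sequences are already aligned word by word.
    """
    return [
        (ref, hyp)
        for ref, hyp in zip(reference, hypothesis)
        if ref != hyp and hyp in homophones.get(ref, set())
    ]


if __name__ == "__main__":
    # Illustrative English data only; the paper's dictionary is Thai.
    ref = ["the", "right", "to", "appeal"]
    hyp = ["the", "write", "to", "appeal"]
    demo_dictionary = {"right": {"write", "rite"}}
    print(word_error_rate(ref, hyp))
    print(homophone_confusions(ref, hyp, demo_dictionary))
```

Separating the two functions keeps the overall error rate distinct from the narrower question of which errors are homophone substitutions, the error type the paper targets.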


Keywords

Automatic Speech Recognition; natural language processing; OpenAI; Voice to Text; Whisper Model


Last updated on 2024-25-11 at 12:00