Benchmarking Cloud-based Speech-to-Text APIs for Multilingual Meetings: A Comparative Study on English, Thai, and Malay

Conference proceedings article

Authors/Editors

Vajirasak Vanijja (วชิรศักดิ์ วานิชชา)

Strategic Research Themes

Publication Details

Author list: Ampuan Zuhairah Ampuan Hj Zainal, Vajirasak Vanijja, Arif Bramantoro

Publication year (A.D.): 2025

Start page: 1

End page: 6

Number of pages: 6

URL: https://ieeexplore.ieee.org/document/11338283

Language: English-United States (EN-US)


Abstract

This study evaluates three leading cloud-based speech-to-text transcription services (Google Cloud, Amazon Transcribe, and Azure Speech Services) for transcribing multilingual meeting audio in Thai, English, and Malay. The evaluation considers transcription accuracy, processing time, and cost-efficiency, using real-world meeting recordings processed with noise reduction, audio segmentation, and overlapping techniques to simulate practical conditions. Performance is measured by Word Error Rate (WER) and Character Error Rate (CER), with CER applied to Thai because its script does not mark word boundaries. Results show that English achieved the highest transcription accuracy (up to 90.4%), consistent with its extensive training resources; after preprocessing, Thai improved from 55.1% to 77.1% and Malay from below 50% to about 71.4%. Azure and Amazon outperformed Google for Thai, and segmentation (60 s chunks with 2 s overlaps) reduced latency by 72%. Azure Batch Transcription was the most cost-effective, averaging 0.18 USD per hour with 76.4% mean accuracy. These findings highlight both the strengths and gaps of current cloud ASR, especially for low-resource languages, and provide actionable insights for organizations seeking scalable multilingual transcription solutions.
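The two error metrics and the chunking scheme named in the abstract can be illustrated with a minimal Python sketch. It is not the authors' evaluation code. It assumes the standard definitions of WER and CER (edit distance divided by reference length), that whitespace is stripped before computing CER, and that "2 s overlaps" means each chunk begins 2 s before the previous one ends. The function names `edit_distance`, `wer`, `cer`, and `chunk_windows` are hypothetical.

```python
from typing import List, Sequence, Tuple


def edit_distance(ref: Sequence[str], hyp: Sequence[str]) -> int:
    """Levenshtein distance: minimum substitutions, deletions, and insertions."""
    prev = list(range(len(hyp) + 1))
    for i, r in enumerate(ref, start=1):
        curr = [i]
        for j, h in enumerate(hyp, start=1):
            cost = 0 if r == h else 1
            curr.append(min(prev[j] + 1,          # deletion
                            curr[j - 1] + 1,      # insertion
                            prev[j - 1] + cost))  # substitution or match
        prev = curr
    return prev[-1]


def wer(reference: str, hypothesis: str) -> float:
    """Word Error Rate over whitespace-separated tokens."""
    ref_words = reference.split()
    if not ref_words:
        raise ValueError("reference must contain at least one word")
    return edit_distance(ref_words, hypothesis.split()) / len(ref_words)


def cer(reference: str, hypothesis: str) -> float:
    """Character Error Rate; suits scripts such as Thai that lack word spacing.

    Assumption: whitespace is removed before comparison.
    """
    ref_chars = list("".join(reference.split()))
    hyp_chars = list("".join(hypothesis.split()))
    if not ref_chars:
        raise ValueError("reference must contain at least one character")
    return edit_distance(ref_chars, hyp_chars) / len(ref_chars)


def chunk_windows(duration_s: float,
                  chunk_s: float = 60.0,
                  overlap_s: float = 2.0) -> List[Tuple[float, float]]:
    """Return (start, end) times in seconds for overlapping fixed-length chunks.

    Assumption: consecutive chunks share `overlap_s` seconds of audio.
    """
    if not 0 <= overlap_s < chunk_s:
        raise ValueError("overlap must be non-negative and shorter than the chunk")
    step = chunk_s - overlap_s
    windows: List[Tuple[float, float]] = []
    start = 0.0
    while start < duration_s:
        end = min(start + chunk_s, duration_s)
        windows.append((start, end))
        if end >= duration_s:
            break
        start += step
    return windows
```

For example, `chunk_windows(150)` splits a 150 s recording into three windows, each starting 58 s after the previous one. The chunks could then be sent to a transcription service in parallel, which is one plausible way segmentation lowers latency. The overlap gives words that straddle a boundary a chance to appear whole in at least one chunk.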

Keywords

Automatic Speech Recognition, Azure Batch Transcription, Character Error Rate, cloud API benchmarking, English, low-resource languages, Malay, speech-to-text evaluation, Thai, Word Error Rate