Benchmarking Cloud-based Speech-to-Text APIs for Multilingual Meetings: A Comparative Study on English, Thai, and Malay
Conference proceedings article
Publication Details
Author list: Ampuan Zuhairah Ampuan Hj Zainal, Vajirasak Vanijja, Arif Bramantoro
Publication year: 2025
Start page: 1
End page: 6
Number of pages: 6
URL: https://ieeexplore.ieee.org/document/11338283
Languages: English-United States (EN-US)
Abstract
This study evaluates three leading cloud-based speech-to-text transcription services (Google Cloud, Amazon Transcribe, and Azure Speech Services) for transcribing multilingual meeting audio in Thai, English, and Malay. The evaluation considers transcription accuracy, processing time, and cost-efficiency, using real-world meeting recordings processed with noise reduction, audio segmentation, and overlapping techniques to simulate practical conditions. Performance is measured by Word Error Rate (WER) and Character Error Rate (CER), with CER applied to Thai because its script does not mark word boundaries. Results show that English achieved the highest transcription accuracy (up to 90.4%), consistent with its extensive training resources. Thai improved from 55.1% to 77.1%, and Malay from below 50% to about 71.4% after preprocessing. Azure and Amazon outperformed Google for Thai, and segmentation (60 s chunks with 2 s overlaps) reduced latency by 72%. Azure Batch Transcription was the most cost-effective, averaging 0.18 USD per hour with 76.4% mean accuracy. These findings highlight both the strengths and gaps of current cloud ASR, especially for low-resource languages, and provide actionable insights for organizations seeking scalable multilingual transcription solutions.
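The metrics and the segmentation scheme named in the abstract can be illustrated with a minimal sketch. This is not the authors' implementation: the function names (`edit_distance`, `wer`, `cer`, `chunk_windows`) are hypothetical, WER and CER follow the standard edit-distance definitions, and the reading of "60 s chunks with 2 s overlaps" as consecutive windows sharing 2 s of audio is an assumption. The abstract does not state how its accuracy percentages are derived from these error rates, so no conversion is shown.

```python
from typing import List, Sequence, Tuple


def edit_distance(ref: Sequence[str], hyp: Sequence[str]) -> int:
    """Levenshtein distance: minimum substitutions, deletions, and insertions."""
    prev = list(range(len(hyp) + 1))
    for i, r in enumerate(ref, start=1):
        curr = [i]
        for j, h in enumerate(hyp, start=1):
            cost = 0 if r == h else 1
            curr.append(min(prev[j] + 1,          # deletion
                            curr[j - 1] + 1,      # insertion
                            prev[j - 1] + cost))  # substitution or match
        prev = curr
    return prev[-1]


def wer(reference: str, hypothesis: str) -> float:
    """Word Error Rate: word-level edit distance over reference word count."""
    ref_words = reference.split()
    if not ref_words:
        raise ValueError("reference must contain at least one word")
    return edit_distance(ref_words, hypothesis.split()) / len(ref_words)


def cer(reference: str, hypothesis: str) -> float:
    """Character Error Rate, ignoring whitespace.

    Useful for scripts such as Thai that do not separate words with spaces.
    Characters here are Unicode code points, not grapheme clusters.
    """
    ref_chars = [c for c in reference if not c.isspace()]
    hyp_chars = [c for c in hypothesis if not c.isspace()]
    if not ref_chars:
        raise ValueError("reference must contain at least one character")
    return edit_distance(ref_chars, hyp_chars) / len(ref_chars)


def chunk_windows(duration_s: float, chunk_s: float = 60.0,
                  overlap_s: float = 2.0) -> List[Tuple[float, float]]:
    """Split [0, duration_s] into windows of chunk_s seconds.

    Assumption: each window starts overlap_s seconds before the previous
    one ends, so adjacent windows share overlap_s seconds of audio.
    """
    if chunk_s <= 0 or overlap_s < 0 or overlap_s >= chunk_s:
        raise ValueError("require chunk_s > 0 and 0 <= overlap_s < chunk_s")
    windows: List[Tuple[float, float]] = []
    start = 0.0
    step = chunk_s - overlap_s
    while start < duration_s:
        end = min(start + chunk_s, duration_s)
        windows.append((start, end))
        if end >= duration_s:
            break
        start += step
    return windows
```

For example, `wer("a b c d", "a c d")` counts one deleted word out of four reference words, and `chunk_windows(130)` yields three windows starting at 0 s, 58 s, and 116 s. In practice the overlapping transcripts would still need to be merged, which this sketch does not attempt.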
Keywords
Automatic Speech Recognition, Azure Batch Transcription, Character Error Rate, cloud API benchmarking, English, low-resource languages, Malay, speech-to-text evaluation, Thai, Word Error Rate






