Benchmarking Cloud-based Speech-to-Text APIs for Multilingual Meetings: A Comparative Study on English, Thai, and Malay

Conference proceedings article


Publication Details

Author list: Ampuan Zuhairah Ampuan Hj Zainal, Vajirasak Vanijja, Arif Bramantoro

Publication year: 2025

Start page: 1

End page: 6

Number of pages: 6

URL: https://ieeexplore.ieee.org/document/11338283

Languages: English-United States (EN-US)



Abstract

This study evaluates three leading cloud-based speech-to-text transcription services (Google Cloud, Amazon Transcribe, and Azure Speech Services) for transcribing multilingual meeting audio in Thai, English, and Malay. The evaluation considers transcription accuracy, processing time, and cost-efficiency, using real-world meeting recordings that incorporate noise reduction, audio segmentation, and overlapping techniques to simulate practical conditions. Performance is measured by Word Error Rate (WER) and Character Error Rate (CER), with CER applied to Thai due to its non-segmented script. Results show that English, which benefits from extensive training resources, achieved the highest transcription accuracy (up to 90.4%), while Thai improved from 55.1% to 77.1% and Malay from below 50% to approximately 71.4% after preprocessing. Azure and Amazon outperformed Google for Thai, with segmentation (60 s chunks, 2 s overlaps) reducing latency by 72%. Azure Batch Transcription was the most cost-effective, averaging 0.18 USD per hour with 76.4% mean accuracy. These findings highlight both the strengths and gaps of current cloud ASR, especially for low-resource languages, and provide actionable insights for organizations seeking scalable multilingual transcription solutions.
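
For readers unfamiliar with the metrics and the segmentation scheme named in the abstract, the following is a minimal Python sketch, not the authors' implementation. It computes WER and CER as Levenshtein edit distance divided by the reference length, and lists chunk boundaries for overlapping segmentation. The function names (edit_distance, wer, cer, chunk_windows) are hypothetical. Two assumptions are made here that the abstract does not confirm: CER is counted over Unicode code points with spaces removed, and a 2 s overlap means each chunk starts 2 s before the previous one ends.

```python
from typing import List, Sequence, Tuple


def edit_distance(ref: Sequence[str], hyp: Sequence[str]) -> int:
    """Levenshtein distance between two token sequences."""
    prev = list(range(len(hyp) + 1))
    for i, r in enumerate(ref, start=1):
        curr = [i]
        for j, h in enumerate(hyp, start=1):
            cost = 0 if r == h else 1
            curr.append(min(
                prev[j] + 1,         # deletion of a reference token
                curr[j - 1] + 1,     # insertion of a hypothesis token
                prev[j - 1] + cost,  # substitution or exact match
            ))
        prev = curr
    return prev[-1]


def wer(reference: str, hypothesis: str) -> float:
    """Word Error Rate: word-level edits divided by reference word count."""
    ref_words = reference.split()
    if not ref_words:
        raise ValueError("reference must contain at least one word")
    return edit_distance(ref_words, hypothesis.split()) / len(ref_words)


def cer(reference: str, hypothesis: str) -> float:
    """Character Error Rate over code points, ignoring spaces (assumption)."""
    ref_chars = list(reference.replace(" ", ""))
    hyp_chars = list(hypothesis.replace(" ", ""))
    if not ref_chars:
        raise ValueError("reference must contain at least one character")
    return edit_distance(ref_chars, hyp_chars) / len(ref_chars)


def chunk_windows(duration_s: float,
                  chunk_s: float = 60.0,
                  overlap_s: float = 2.0) -> List[Tuple[float, float]]:
    """Return (start, end) times in seconds for overlapping chunks."""
    if not 0 <= overlap_s < chunk_s:
        raise ValueError("overlap must be non-negative and shorter than the chunk")
    step = chunk_s - overlap_s  # assumed stride: chunk length minus overlap
    windows: List[Tuple[float, float]] = []
    start = 0.0
    while start < duration_s:
        end = min(start + chunk_s, duration_s)
        windows.append((start, end))
        if end >= duration_s:
            break
        start += step
    return windows


if __name__ == "__main__":
    print(wer("the meeting starts at nine", "the meeting start at nine"))
    print(cer("abcd", "abed"))
    print(chunk_windows(150))
```

Because CER does not depend on word boundaries, it can be applied directly to unsegmented Thai text, whereas WER would first require a word segmenter. Each chunk produced by chunk_windows can be transcribed independently, so chunks can be submitted in parallel.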


Keywords

Automatic Speech Recognition, Azure Batch Transcription, Character Error Rate, cloud API benchmarking, English, low-resource languages, Malay, speech-to-text evaluation, Thai, Word Error Rate


Last updated on 2026-24-01 at 00:00