Thai Question Text-To-SQL Parsing Using Transformer
Other
Publication Details
Author list: Tungruethaipak N., Prom-On S.
Publisher: Institute of Electrical and Electronics Engineers Inc.
Publication year: 2024
Start page: 631
End page: 637
Number of pages: 7
ISBN: 979-835038176-4
Languages: English-Great Britain (EN-GB)
Abstract
This paper introduces a novel approach for translating Thai natural language utterances into Structured Query Language (SQL) and establishes a baseline in this burgeoning field. SQL is a key language for communicating with databases and executing diverse tasks within them. While prior research in text-to-SQL parsing has predominantly centered on English, with some exploration in Chinese, the absence of resources for low-resource languages such as Thai presents a significant challenge. To address this gap, we constructed a Thai version of the Spider dataset (a cross-domain benchmark featuring multiple tables and complex queries), specifically tailored for Thai language processing tasks. Challenges arise from Thai's unique word segmentation (Thai text is written without spaces between words), coupled with SQL keywords and database table and column names that are expressed in English. To establish a baseline, we fine-tune mT5 [24], a transformer-based large language model developed by Google that inherently supports multiple languages. This study marks a pivotal step towards advancing natural language understanding and SQL translation for Thai, shedding light on critical research avenues in multilingual text-to-SQL parsing. The fine-tuned model achieves significant performance improvements, ranging from at least 80% up to 97% across different SQL components. © 2024 IEEE.
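The abstract notes that a Thai question must be paired with an English schema and an English SQL target. The sketch below shows one common way to serialize a Spider-style example into a (source, target) pair for a sequence-to-sequence model such as mT5. It is a minimal illustration under stated assumptions, not the paper's method: the exact input format used in the paper is not given here, and the `serialize_example` and `make_pair` helpers, the " | " separator layout, the hypothetical `singer` schema excerpt, and the sample Thai question are all illustrative.

```python
from typing import Dict, List, Tuple

# Hypothetical schema excerpt: table name -> list of English column names.
Schema = Dict[str, List[str]]


def serialize_example(question: str, db_id: str, schema: Schema) -> str:
    """Flatten a question and its database schema into one source string.

    Assumed format (modelled on common T5 text-to-SQL setups, not taken
    from the paper): "<question> | <db_id> | <table> : <col>, <col> | ..."
    """
    tables = [f"{table} : {', '.join(cols)}" for table, cols in schema.items()]
    return " | ".join([question.strip(), db_id, *tables])


def make_pair(question: str, db_id: str, schema: Schema, sql: str) -> Tuple[str, str]:
    """Return a hypothetical (source, target) pair for seq2seq fine-tuning."""
    return serialize_example(question, db_id, schema), sql.strip()


if __name__ == "__main__":
    schema = {"singer": ["singer_id", "name", "country", "age"]}
    # Thai question meaning "How many singers are there in total?"
    src, tgt = make_pair("มีนักร้องทั้งหมดกี่คน", "concert_singer", schema,
                         "SELECT count(*) FROM singer")
    print(src)  # Thai question followed by the English schema listing
    print(tgt)  # English SQL target
```

The resulting source string mixes unsegmented Thai with English identifiers. mT5 tokenizes text with a SentencePiece subword vocabulary, so it can consume such input without a separate Thai word segmenter; whether the paper applies any additional Thai segmentation is not stated in this record.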
Keywords
mT5, Spider dataset, SQL, Text to SQL