Classifying Burmese (Myanmar) Medical Reviews Using Synthetic Data Approach

Conference proceedings article

Authors/Editors

AYE HNINN KHINE

Publication Details

Author list: Hein Minn Thu, Pyae Bhone Moe, Nyein Chan Ko Ko, Aye Hninn Khine

Publication year: 2026

Title of series: The 11th International Conference on Digital Arts, Media and Technology (DAMT) and 9th ECTI Northern Section Conference on Electrical, Electronics, Computer and Telecommunications Engineering (NCON)

Start page: 520

End page: 525

Number of pages: 6

Abstract

Burmese (also known as the Myanmar language) remains a low-resource language for Natural Language Processing, particularly in the healthcare domain, where labeled corpora are limited due to privacy and ethical concerns. This study investigates the feasibility of using synthetic data for medical review sentiment classification in Burmese. We construct synthetic medical review corpora by translating English medical review datasets into Burmese using both large language models and machine translation models. The resulting datasets are evaluated on downstream sentiment classification tasks using Logistic Regression and Support Vector Machine models with both syllable-level and word-level features. Experimental results show that synthetic reviews translated by Google Translate achieved the highest F1-score of 0.71 on the Medication Review Dataset, while Gemini-2.5-pro yielded an F1-score of 0.45 on the Medical Condition Dataset. These findings demonstrate that, despite occasional translation errors and hallucinations, synthetic data can serve as a foundational resource for Burmese healthcare NLP. This work provides the first benchmark for Burmese medical review classification and highlights the importance of combining synthetic data generation with expert validation for reliable low-resource digital health applications.
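
The abstract mentions syllable-level features but does not say how syllables are segmented. As a rough illustration only, the sketch below uses a simplified regular-expression rule (an assumption, not the authors' method): start a new syllable at a Myanmar consonant unless it is stacked under a virama or closes the previous syllable with an asat. The function names `syllable_tokens` and `syllable_ngrams` are hypothetical.

```python
import re
from collections import Counter

# Assumption: a simplified syllable-break rule for illustration;
# the paper does not specify its segmenter.
_CONSONANT = "\u1000-\u1021"  # Myanmar consonants KA..A
_ASAT = "\u103a"              # marks a syllable-final consonant
_VIRAMA = "\u1039"            # marks a stacked (subjoined) consonant

# A consonant starts a syllable unless it follows a virama
# or is itself followed by an asat or a virama.
_BREAK = re.compile(
    f"(?<!{_VIRAMA})([{_CONSONANT}])(?![{_ASAT}{_VIRAMA}])"
)


def syllable_tokens(text):
    """Split Burmese text into approximate syllables."""
    return _BREAK.sub(r" \1", text).split()


def syllable_ngrams(text, n=2):
    """Count syllable n-grams, usable as bag-of-features input."""
    tokens = syllable_tokens(text)
    return Counter(
        tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1)
    )
```

Tokens of this kind could then feed a standard bag-of-features classifier; for example, scikit-learn's `TfidfVectorizer` accepts a custom callable through its `tokenizer` parameter. Whether the study used such a setup is not stated in the abstract.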
