Classifying Burmese (Myanmar) Medical Reviews Using Synthetic Data Approach

Conference proceedings article


Publication Details

Author list: Hein Minn Thu, Pyae Bhone Moe, Nyein Chan Ko Ko, Aye Hninn Khine

Publication year: 2026

Title of series: The 11th International Conference on Digital Arts, Media and Technology (DAMT) and 9th ECTI Northern Section Conference on Electrical, Electronics, Computer and Telecommunications Engineering (NCON)

Start page: 520

End page: 525

Number of pages: 6


Abstract

Burmese (also known as the Myanmar language) remains a low-resource language for Natural Language Processing, particularly in the healthcare domain, where labeled corpora are limited due to privacy and ethical concerns. This study investigates the feasibility of using synthetic data for medical review sentiment classification in Burmese. We construct synthetic medical review corpora by translating English medical review datasets into Burmese using both large language models and machine translation models. The resulting datasets are evaluated on downstream sentiment classification tasks using Logistic Regression and Support Vector Machine models with both syllable-level and word-level features. Experimental results show that synthetic reviews translated by Google Translate achieved the highest F1-score of 0.71 on the Medication Review Dataset, while Gemini-2.5-pro yielded an F1-score of 0.45 on the Medical Condition Dataset. These findings demonstrate that, despite occasional translation errors and hallucinations, synthetic data can serve as a foundational resource for Burmese healthcare NLP. This work provides the first benchmark for Burmese medical review classification and highlights the importance of combining synthetic data generation with expert validation for reliable low-resource digital health applications.
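
The abstract mentions syllable-level features without describing how they are extracted. The following is a minimal illustrative sketch, not the authors' implementation: it assumes a simplified, commonly used rule in which a Burmese consonant starts a new syllable unless it is stacked or carries an asat, and it counts syllable unigrams and bigrams. The function names segment_syllables and syllable_ngrams are hypothetical, and the rule does not cover every orthographic case (for example, digits, independent vowels, or a consonant followed by a tone mark before the asat).

```python
import re
from collections import Counter

# Assumed simplified rule: a consonant (U+1000..U+1021) opens a new syllable
# unless it is preceded by the stacking sign (virama, U+1039) or is directly
# followed by an asat (U+103A) or a virama, in which case it belongs to the
# previous syllable.
_SYLLABLE_START = re.compile(r"(?<!\u1039)([\u1000-\u1021])(?![\u103A\u1039])")


def segment_syllables(text):
    """Split Burmese text into syllables using the simplified rule above."""
    marked = _SYLLABLE_START.sub(r" \1", text)
    return marked.split()


def syllable_ngrams(text, max_n=2):
    """Count syllable n-grams (n = 1..max_n) as bag-of-features input."""
    syllables = segment_syllables(text)
    counts = Counter()
    for n in range(1, max_n + 1):
        for i in range(len(syllables) - n + 1):
            counts[" ".join(syllables[i:i + n])] += 1
    return counts


if __name__ == "__main__":
    # "Myanmar", written with escapes so the code points are unambiguous.
    word = "\u1019\u103c\u1014\u103a\u1019\u102c"
    print(segment_syllables(word))
    print(syllable_ngrams(word))
```

Counts of this kind could then be vectorized and passed to a linear classifier such as Logistic Regression or a Support Vector Machine. The n-gram range and any feature weighting are assumptions here, since the abstract does not specify them.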



Last updated on 2026-10-02 at 00:00