Classifying Burmese (Myanmar) Medical Reviews Using Synthetic Data Approach

Conference proceedings article

Authors/Editors

Aye Hninn Khine

Strategic Research Themes

Publication Details

Author list: Hein Minn Thu, Pyae Bhone Moe, Nyein Chan Ko Ko, Aye Hninn Khine

Publication year (CE): 2026

Series title: The 11th International Conference on Digital Arts, Media and Technology (DAMT) and 9th ECTI Northern Section Conference on Electrical, Electronics, Computer and Telecommunications Engineering (NCON)

Start page: 520

End page: 525

Number of pages: 6

Abstract

Burmese (also known as the Myanmar language) remains a low-resource language for Natural Language Processing, particularly in the healthcare domain, where labeled corpora are limited due to privacy and ethical concerns. This study investigates the feasibility of using synthetic data for medical review sentiment classification in Burmese. We construct synthetic medical review corpora by translating English medical review datasets into Burmese using both large language models and machine translation models. The resulting datasets are evaluated on downstream sentiment classification tasks using Logistic Regression and Support Vector Machine models with both syllable-level and word-level features. Experimental results show that synthetic reviews translated by Google Translate achieved the highest F1-score of 0.71 on the Medication Review Dataset, while Gemini-2.5-pro yielded an F1-score of 0.45 on the Medical Condition Dataset. These findings demonstrate that, despite occasional translation errors and hallucinations, synthetic data can serve as a foundational resource for Burmese healthcare NLP. This work provides the first benchmark for Burmese medical review classification and highlights the importance of combining synthetic data generation with expert validation for reliable low-resource digital health applications.
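
As a rough illustration of the classification setup the abstract describes, the sketch below pairs syllable-level TF-IDF features with Logistic Regression in scikit-learn. The abstract does not specify the segmenter, feature weighting, or hyperparameters, so everything here is an assumption: the regex is a simplified rule-based syllable breaker, the names `syllables`, `syllable_ngrams`, and `build_classifier` are hypothetical, and the two toy reviews are placeholders rather than data from the paper.

```python
import re

from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.linear_model import LogisticRegression
from sklearn.pipeline import make_pipeline

# Simplified rule (an assumption, not the paper's segmenter): a syllable starts
# at a Myanmar consonant (U+1000-U+1021) unless that consonant is stacked under
# a virama (U+1039) or is itself followed by asat (U+103A) or virama (U+1039),
# in which case it closes the previous syllable.
_SYLLABLE_START = re.compile(r"(?<!\u1039)([\u1000-\u1021])(?![\u103A\u1039])")


def syllables(text):
    """Split Burmese text into approximate syllables."""
    return _SYLLABLE_START.sub(r" \1", text).split()


def syllable_ngrams(text):
    """Return syllable unigrams plus adjacent-syllable bigrams."""
    units = syllables(text)
    bigrams = ["".join(pair) for pair in zip(units, units[1:])]
    return units + bigrams


def build_classifier():
    """TF-IDF over syllable n-grams, followed by Logistic Regression."""
    return make_pipeline(
        TfidfVectorizer(analyzer=syllable_ngrams),
        LogisticRegression(max_iter=1000),
    )


# Hypothetical toy reviews for illustration only (not from the paper's corpora).
TOY_TEXTS = [
    "ဆေးကောင်းတယ်",    # roughly "the medicine is good"
    "ဆေးမကောင်းဘူး",   # roughly "the medicine is not good"
]
TOY_LABELS = ["positive", "negative"]

if __name__ == "__main__":
    model = build_classifier().fit(TOY_TEXTS, TOY_LABELS)
    print(syllables(TOY_TEXTS[0]))
    print(model.predict(TOY_TEXTS))
```

Replacing `LogisticRegression` with `sklearn.svm.LinearSVC` would give a Support Vector Machine variant of the same pipeline. Word-level features would need a Burmese word segmenter passed as the `analyzer` in place of `syllable_ngrams`.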

Keywords

No related information found