Probabilistic learning models for topic extraction in Thai language
Conference proceedings article
Authors/Editors
Strategic Research Themes
No matching items found
Publication Details
Author list: Asawaroengchai C., Chaisangmongkon W., Laowattana D.
Publisher: Hindawi
Publication year: 2018
Start page: 35
End page: 40
Number of pages: 6
ISBN: 9781538652541
ISSN: 0146-9428
eISSN: 1745-4557
Language: English-Great Britain (EN-GB)
Abstract
Natural language processing (NLP) in the Thai language is notoriously complicated. One major problem is the lack of word boundaries within a sentence, which introduces ambiguity in word tokenization. For topic extraction, semantic ambiguity adds another layer of complexity. Topic models that disregard word order, such as Latent Dirichlet Allocation (LDA), perform poorly on Thai text. In this paper, we tested a probabilistic language model equipped with word location information, the so-called Topic N-grams model (TNG). We deployed several testing tasks to assess TNG's ability to model the generative process of Thai text, and we established benchmarks that compare the performance of LDA and TNG on various Thai NLP tasks. To our knowledge, this paper is the first to explore a word-order model for Thai language topic extraction. We concluded that TNG can help boost the performance of Thai language processing in word cutting, semantic checking, word prediction, and document generation tasks. We also explored how perplexity can be used to measure the performance of LDA and TNG on these tasks. © 2018 IEEE.
Keywords
LDA, TNG, Topic Modeling, Topic N-grams, Word Cutting