A hybrid approach for Thai word segmentation with crowdsourcing feedback system

Conference proceedings article


Authors/Editors


Strategic Research Themes

No matching items found.


Publication Details

Author list: Chaonithi K., Prom-On S.

Publisher: Hindawi

Publication year: 2016

ISBN: 9781467397490

ISSN: 0146-9428

eISSN: 1745-4557

URL: https://www.scopus.com/inward/record.uri?eid=2-s2.0-84988893977&doi=10.1109%2fECTICon.2016.7561298&partnerID=40&md5=48c35f448cb25b74b992828e81bcb181

Language: English-Great Britain (EN-GB)




Abstract

This paper proposes a new hybrid method for Thai word segmentation that integrates a crowd-sourced dictionary with a word bi-gram model. The main dictionary is split into basic-word and compound-word dictionaries to improve the performance of the dictionary-based algorithm. Segmentation begins with a heuristic exhaustive matching algorithm that uses the basic-word dictionary to generate all possible basic-word sequence candidates from an input string. The word bi-gram model then selects the best candidate, resolving segmentation ambiguity. Finally, the sequence of basic words is combined into compound words using the compound-word dictionary. The work also applies the crowdsourcing paradigm: we implemented a web application that trains the bi-gram model and updates the dictionary from user feedback, improving the lexical knowledge of the platform over time. The algorithm was evaluated on two corpora. On the InterBEST 2009 corpus, it yields average precision, recall, and F-measure of 97.52%, 97.70%, and 97.63%, respectively. On a social network corpus, it yields average precision, recall, and F-measure of 98.47%, 98.59%, and 98.54%, respectively. © 2016 IEEE.
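The three stages named in the abstract (exhaustive matching, bi-gram selection, compound merging) can be illustrated with a minimal sketch. It is not the authors' implementation: the function names, the `<s>` start token, the add-one smoothing, the greedy longest-run compound merge, and the toy dictionaries and counts are all assumptions made for illustration, and the paper's heuristics for pruning candidates are not reproduced.

```python
import math
from collections import defaultdict

START = "<s>"  # hypothetical sentence-start token (an assumption)


def candidates(text, basic_words):
    """Return every way to split `text` into words from `basic_words`."""
    if not text:
        return [[]]
    results = []
    for word in basic_words:
        if word and text.startswith(word):
            for rest in candidates(text[len(word):], basic_words):
                results.append([word] + rest)
    return results


def make_scorer(bigram_counts, vocab_size):
    """Build a log-probability scorer with add-one smoothing (assumed)."""
    context_totals = defaultdict(int)
    for (prev, _), count in bigram_counts.items():
        context_totals[prev] += count

    def score(words):
        total, prev = 0.0, START
        for word in words:
            numerator = bigram_counts.get((prev, word), 0) + 1
            denominator = context_totals[prev] + vocab_size
            total += math.log(numerator / denominator)
            prev = word
        return total

    return score


def merge_compounds(words, compound_words):
    """Greedily join the longest run of 2+ words found in `compound_words`."""
    merged, i = [], 0
    while i < len(words):
        for j in range(len(words), i + 1, -1):
            joined = "".join(words[i:j])
            if joined in compound_words:
                merged.append(joined)
                i = j
                break
        else:
            # No compound starts here; keep the basic word as-is.
            merged.append(words[i])
            i += 1
    return merged


def segment(text, basic_words, bigram_counts, compound_words):
    """Exhaustive matching, then bi-gram selection, then compound merging."""
    options = candidates(text, basic_words)
    if not options:
        return [text]  # fallback for input the dictionary cannot cover
    scorer = make_scorer(bigram_counts, len(basic_words))
    return merge_compounds(max(options, key=scorer), compound_words)


# Toy data (invented for this example, not taken from the paper).
BASIC = {"ตา", "กลม", "ตาก", "ลม", "แม่", "น้ำ"}
COMPOUND = {"แม่น้ำ"}
COUNTS = {
    (START, "ตา"): 1, ("ตา", "กลม"): 3,
    (START, "ตาก"): 1, ("ตาก", "ลม"): 1,
}

if __name__ == "__main__":
    print(segment("ตากลม", BASIC, COUNTS, COMPOUND))
    print(segment("แม่น้ำตากลม", BASIC, COUNTS, COMPOUND))
```

The string "ตากลม" is a standard ambiguous case: it can be read as "ตา|กลม" or "ตาก|ลม", and both splits are valid against the toy dictionary, so the bi-gram scores decide between them. The recursive candidate generator is exponential in the worst case, so a practical system would need pruning or dynamic programming.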


Keywords

bi-gram, crowdsourcing, exhaustive matching, Hybrid method, Machine Learning, web service, word segmentation


Last updated on 2023-26-09 at 07:36