Thai Named Entity Recognition Using Bi-LSTM-CRF with Word and Character Representation

Conference proceedings article

Authors/Editors

สันติธรรม พรหมอ่อน

Strategic Research Themes

No matching items found

Publication Details

Author list: Thattinaphanich S., Prom-On S.

Publisher: Hindawi

Publication year: 2019

Start page: 149

End page: 154

Number of pages: 6

ISBN: 9781728110196

ISSN: 0146-9428

eISSN: 1745-4557

URL: https://www.scopus.com/inward/record.uri?eid=2-s2.0-85076750254&doi=10.1109%2fINCIT.2019.8912091&partnerID=40&md5=3ac28221cbd7d6f87bad32146624f274

Language: English-Great Britain (EN-GB)

Abstract

Named Entity Recognition (NER) is a handy tool for many natural language processing tasks to identify and extract a unique entity such as person, location, organization and time. In English and Chinese, NER has been thoroughly researched and is able to be applied in more practical settings. Its development in Thai is still limited because of rare resources and language difficulties such as the lack of boundary indicator for words, phrases and sentences. In this paper, we present an application of Bi-LSTM-CRF with word/character level representation, to solve this problem. Firstly, we prepared texts by tokenizing a sentence to a bunch of words. We then prepared word representation and Bi-LSTM character representation. In the end, we built a recurrent neural network combined with CRF to learn the sequence of text and extract the knowledge to build NER recognition to overcome this problem. Our model was evaluated by the NER opensource corpus from a Facebook group ThaiNLP. The results of our model yielded precision, recall, and F1 at 91.79%, 91.51% and 91.65% respectively. © 2019 IEEE.

Keywords

Bi-LSTM, Conditional Random Field