Thai Named Entity Recognition Using Bi-LSTM-CRF with Word and Character Representation
Conference proceedings article
Publication Details
Author list: Thattinaphanich S., Prom-On S.
Publisher: Hindawi
Publication year: 2019
Start page: 149
End page: 154
Number of pages: 6
ISBN: 9781728110196
ISSN: 0146-9428
eISSN: 1745-4557
Languages: English-Great Britain (EN-GB)
Abstract
Named Entity Recognition (NER) is a useful tool in many natural language processing tasks for identifying and extracting specific entities such as persons, locations, organizations and times. In English and Chinese, NER has been thoroughly researched and can be applied in practical settings. Its development in Thai is still limited because of scarce resources and language difficulties such as the lack of boundary indicators for words, phrases and sentences. In this paper, we present an application of Bi-LSTM-CRF with word- and character-level representations to address this problem. First, we prepared the texts by tokenizing each sentence into words. We then prepared a word representation and a Bi-LSTM character representation. Finally, we built a recurrent neural network combined with a CRF to learn the text sequence and recognize named entities. Our model was evaluated on the open-source NER corpus from the ThaiNLP Facebook group. The model achieved precision, recall, and F1 of 91.79%, 91.51% and 91.65%, respectively. © 2019 IEEE.
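As a reading aid, the three reported scores can be checked against each other, because F1 is conventionally the harmonic mean of precision and recall. The sketch below assumes that standard definition. The abstract does not say whether the scores are entity-level or token-level, or how they are averaged, and the function and variable names are illustrative rather than taken from the paper.

```python
def f1_score(precision: float, recall: float) -> float:
    """Return the harmonic mean of precision and recall.

    The result is in the same units as the inputs (here, percent).
    """
    if precision + recall == 0:
        return 0.0
    return 2 * precision * recall / (precision + recall)


# Figures as reported in the abstract, in percent.
reported_precision = 91.79
reported_recall = 91.51
reported_f1 = 91.65

# Recompute F1 from the reported precision and recall.
computed_f1 = f1_score(reported_precision, reported_recall)
print(round(computed_f1, 2))  # 91.65
```

Under that assumption, the recomputed value agrees with the reported F1 to two decimal places.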
Keywords
Bi-LSTM, Conditional Random Field