Thai Named Entity Recognition Using Bi-LSTM-CRF with Word and Character Representation

Conference proceedings article

Authors/Editors

SANTITHAM PROM-ON

Publication Details

Author list: Thattinaphanich S., Prom-On S.

Publisher: Hindawi

Publication year: 2019

Start page: 149

End page: 154

Number of pages: 6

ISBN: 9781728110196

ISSN: 0146-9428

eISSN: 1745-4557

URL: https://www.scopus.com/inward/record.uri?eid=2-s2.0-85076750254&doi=10.1109%2fINCIT.2019.8912091&partnerID=40&md5=3ac28221cbd7d6f87bad32146624f274

Languages: English-Great Britain (EN-GB)

Abstract

Named Entity Recognition (NER) is a useful component in many natural language processing tasks: it identifies and extracts named entities such as persons, locations, organizations and times. For English and Chinese, NER has been researched thoroughly and can be applied in practical settings. Its development for Thai is still limited because resources are scarce and the language poses difficulties such as the lack of boundary indicators for words, phrases and sentences. In this paper, we present an application of Bi-LSTM-CRF with word-level and character-level representations to address this problem. First, we prepared the text by tokenizing each sentence into words. We then prepared a word representation and a Bi-LSTM character representation. Finally, we built a recurrent neural network combined with a CRF to learn the text sequence and recognize named entities. Our model was evaluated on the open-source NER corpus from the ThaiNLP Facebook group. The model achieved precision, recall, and F1 of 91.79%, 91.51% and 91.65%, respectively. © 2019 IEEE.
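
As a quick consistency check on the reported scores, the sketch below recomputes F1 as the harmonic mean of precision and recall. It assumes the paper uses this standard definition of F1; the function name `f1_score` is illustrative and is not taken from the paper.

```python
def f1_score(precision: float, recall: float) -> float:
    """Harmonic mean of precision and recall (both on the same scale)."""
    if precision + recall == 0:
        return 0.0  # avoid division by zero when both scores are zero
    return 2 * precision * recall / (precision + recall)


# Precision and recall as reported in the abstract, in percent.
reported_precision = 91.79
reported_recall = 91.51

f1 = f1_score(reported_precision, reported_recall)
print(round(f1, 2))  # 91.65, matching the reported F1
```

Under that assumption, the three reported figures are mutually consistent.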

Keywords

Bi-LSTM, Conditional Random Field