Thai person name recognition (PNR) using likelihood probability of tokenized words
Conference proceedings article
Publication Details
Author list: Saetiew N., Achalakul T., Prom-On S.
Publisher: Hindawi
Publication year: 2017
ISBN: 9781509046669
ISSN: 0146-9428
eISSN: 1745-4557
Languages: English-Great Britain (EN-GB)
Abstract
Named Entity Recognition (NER) is important in many natural language processing tasks, especially information extraction. Named entity extraction is considerably harder in Thai than in English because written Thai lacks orthographic cues and explicit boundary indicators between words. In this paper, we present research on NER with an emphasis on person name recognition (PNR) in Thai text. Our proposed method consists of four steps. First, the text is tokenized into a set of words. Second, a part-of-name probability is computed for each word using odds with Laplace smoothing and a logistic function. Third, name candidates are selected based on the likelihood probability. Finally, the end point of each name is identified using a set of rules and a drop-rate threshold. We evaluated our method on 1,700 online news articles from the InterBEST 2009 corpus. The results show that the proposed method yields average precision, recall, F-measure, and accuracy of 75.21%, 98.10%, 85.15%, and 81.05%, respectively. © 2017 IEEE.
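The abstract does not give the formulas, so the following is only a minimal sketch of how steps two to four might look, not the authors' implementation. It assumes that the odds are a ratio of Laplace-smoothed counts (occurrences inside versus outside person names), that the logistic function is applied to the log-odds, and that a name ends where the probability falls by more than a fixed fraction of the previous token's value. The function names, the `alpha`, `start_threshold`, and `drop_rate` parameters, and the romanized example tokens and counts are all hypothetical.

```python
def part_of_name_probability(name_count, other_count, alpha=1.0):
    """Probability that a token is part of a person name (assumed formulation).

    name_count:  times the token was seen inside a person name
    other_count: times the token was seen outside any person name
    alpha:       Laplace smoothing constant (hypothetical default)
    """
    # Laplace-smoothed odds: adding alpha keeps unseen tokens well defined.
    odds = (name_count + alpha) / (other_count + alpha)
    # The logistic function applied to log(odds) simplifies to odds / (1 + odds).
    return odds / (1.0 + odds)


def extract_name_spans(probs, start_threshold=0.5, drop_rate=0.5):
    """Return (start, end) token index pairs, end exclusive, for name candidates.

    A candidate starts at a token whose probability reaches start_threshold
    and is extended while each next probability stays at or above
    (1 - drop_rate) times the previous one. Both thresholds are assumptions.
    """
    spans = []
    i = 0
    while i < len(probs):
        if probs[i] >= start_threshold:
            j = i + 1
            while j < len(probs) and probs[j] >= probs[j - 1] * (1.0 - drop_rate):
                j += 1
            spans.append((i, j))
            i = j
        else:
            i += 1
    return spans


# Hypothetical romanized tokens with made-up (inside-name, outside-name) counts.
tokens = ["nai", "somchai", "jaidee", "pai", "talad"]
counts = [(90, 10), (50, 0), (30, 5), (1, 200), (0, 150)]
probs = [part_of_name_probability(n, o) for n, o in counts]
spans = extract_name_spans(probs)
names = [" ".join(tokens[start:end]) for start, end in spans]
print(names)
```

In this sketch a token never seen in training gets a probability of exactly 0.5, which is one reason a real system would need the additional end-point rules the abstract mentions rather than a threshold alone.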
Keywords
Laplace smoothing, Odds, Person name recognition, Thai text analytics