Topic Modeling Enhancement using Word Embeddings
Conference proceedings article
Authors/Editors
Strategic Research Areas
Publication details
Authors: Limwattana, Siriwat; Prom-On, Santitham
Publisher: Elsevier
Publication year (C.E.): 2021
ISBN: 9781665438315
ISSN: 0928-4931
eISSN: 1873-0191
Language: English-Great Britain (EN-GB)
Abstract
Latent Dirichlet Allocation (LDA) is one of the powerful techniques for extracting topics from a document. The original LDA takes the Bag-of-Words representation as input and produces topic distributions over documents as output. The drawback of Bag-of-Words is that it represents each word with a plain one-hot encoding, which does not capture word-level information. Later research in Natural Language Processing (NLP) demonstrates that word embedding techniques such as the Skip-gram model can provide a good representation for capturing the relationships and semantic information between words. In recent studies, many NLP tasks have gained better performance by applying word embeddings as the representation of words. In this paper, we propose Deep Word-Topic Latent Dirichlet Allocation (DWT-LDA), a new process for training LDA with word embeddings. A neural network with word embeddings is applied to the Collapsed Gibbs Sampling process as an alternative mechanism for word-topic assignment. To quantitatively evaluate our model, we use the topic coherence framework and topic diversity as metrics to compare our approach with the original LDA. The experimental results show that our method generates more coherent and diverse topics. © 2021 IEEE.
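For context, the Collapsed Gibbs Sampling procedure that DWT-LDA modifies is the standard per-token resampling loop of vanilla LDA. The sketch below is an illustrative NumPy implementation of that baseline step only (not the authors' DWT-LDA or released code; all function and variable names are ours):

```python
import numpy as np

def collapsed_gibbs_lda(docs, n_topics, n_vocab, alpha=0.1, beta=0.01,
                        n_iters=50, seed=0):
    """Minimal collapsed Gibbs sampler for standard LDA.

    docs: list of documents, each a list of word ids in [0, n_vocab).
    Returns the (document x topic) and (topic x word) count matrices.
    """
    rng = np.random.default_rng(seed)
    doc_topic = np.zeros((len(docs), n_topics))   # n_{d,k}: topic counts per doc
    topic_word = np.zeros((n_topics, n_vocab))    # n_{k,w}: word counts per topic
    topic_total = np.zeros(n_topics)              # n_k: total tokens per topic

    # Random initial topic assignment for every token.
    z = [rng.integers(n_topics, size=len(d)) for d in docs]
    for d, doc in enumerate(docs):
        for i, w in enumerate(doc):
            k = z[d][i]
            doc_topic[d, k] += 1; topic_word[k, w] += 1; topic_total[k] += 1

    for _ in range(n_iters):
        for d, doc in enumerate(docs):
            for i, w in enumerate(doc):
                k = z[d][i]
                # Remove the current assignment from the counts.
                doc_topic[d, k] -= 1; topic_word[k, w] -= 1; topic_total[k] -= 1
                # Full conditional p(z_i = k | rest); this is the word-topic
                # assignment step that DWT-LDA augments with an embedding network.
                p = (doc_topic[d] + alpha) * (topic_word[:, w] + beta) \
                    / (topic_total + n_vocab * beta)
                k = rng.choice(n_topics, p=p / p.sum())
                z[d][i] = k
                doc_topic[d, k] += 1; topic_word[k, w] += 1; topic_total[k] += 1
    return doc_topic, topic_word
```

In the baseline above, the sampling probability for a token depends only on count statistics; the paper's contribution is to offer a neural network over word embeddings as another choice for this assignment step.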
Keywords
Latent Dirichlet Allocation, Word Embedding