A combination of text mining techniques for relevant literature search and extractive summarization

Conference proceedings article

Authors/Editors

JONATHAN HOYIN CHAN

Strategic Research Themes

No matching items found

Publication Details

Author list: Phongwattana T., Chan J.H.

Publisher: Hindawi

Publication year: 2018

Start page: 7

End page: 11

Number of pages: 5

ISBN: 9781450365512

ISSN: 0146-9428

eISSN: 1745-4557

URL: https://www.scopus.com/inward/record.uri?eid=2-s2.0-85058683915&doi=10.1145%2f3278293.3278300&partnerID=40&md5=fb40e1412a29a4211e115d1f6df6e2bc

Language: English-Great Britain (EN-GB)

Abstract

Over the past few years, the number of research papers published has increased dramatically. Consequently, researchers spend a lot of time reviewing relevant literature in order to better understand their domain of interest and keep up with new developments. After reviewing the literature in the area of text mining, we found many works proposing sentence representations for machine learning to measure sentence similarity. These include average bag of words, weighted average of word vectors, bag of n-grams, and matrix-vector operations. However, these techniques are limited in their ability to capture word order and semantics. This paper proposes a framework that combines two text mining techniques, paragraph vectors and TextRank, for the selection of relevant research papers and extractive summarization, respectively. Our training corpus includes over 20 million research papers. The aim of this work is to build a supplementary research tool that helps researchers save time when conducting literature reviews. As a result, we can rank potentially relevant research papers within the corpus and use the outputs in our literature reviews. Moreover, the tool can also extract all potential keywords in a single task. © 2018 Association for Computing Machinery.
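
To illustrate the extractive summarization step named in the abstract, the following is a minimal sketch of TextRank-style sentence ranking in plain Python. It is not the authors' implementation: the function names (`tokenize`, `similarity`, `textrank`, `summarize`), the regular-expression tokenizer, the damping factor of 0.85, and the fixed iteration count are all assumptions made for illustration. The sentence similarity used here is the word-overlap measure normalized by log sentence lengths from the original TextRank formulation; the paper's actual configuration is not stated in this record.

```python
import math
import re


def tokenize(sentence):
    """Lowercase a sentence and return the set of its alphabetic tokens."""
    return set(re.findall(r"[a-z]+", sentence.lower()))


def similarity(a, b):
    """Word overlap between two token sets, normalized by log set sizes."""
    if len(a) < 2 or len(b) < 2:
        # Guard for very short sentences, where the log denominator can be zero.
        return 0.0
    return len(a & b) / (math.log(len(a)) + math.log(len(b)))


def textrank(sentences, damping=0.85, iterations=50):
    """Return one TextRank score per sentence (assumed hyperparameters)."""
    tokens = [tokenize(s) for s in sentences]
    n = len(sentences)
    # Symmetric weighted graph: edge weight = similarity, no self-loops.
    weights = [
        [similarity(tokens[i], tokens[j]) if i != j else 0.0 for j in range(n)]
        for i in range(n)
    ]
    out_sum = [sum(row) for row in weights]
    scores = [1.0] * n
    for _ in range(iterations):
        # Weighted PageRank update: each neighbor j passes on its score
        # in proportion to the weight of the edge (j, i).
        scores = [
            (1 - damping)
            + damping
            * sum(
                weights[j][i] / out_sum[j] * scores[j]
                for j in range(n)
                if out_sum[j] > 0
            )
            for i in range(n)
        ]
    return scores


def summarize(sentences, k=2):
    """Pick the k highest-scoring sentences, kept in document order."""
    scores = textrank(sentences)
    top = sorted(range(len(sentences)), key=lambda i: scores[i], reverse=True)[:k]
    return [sentences[i] for i in sorted(top)]
```

Calling `summarize(sentences, k=2)` on a list of sentence strings returns the two sentences that share the most vocabulary with the rest of the text. Sentences with no overlapping words receive only the baseline score of `1 - damping`, so they are the first to be dropped.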

Keywords

Document similarity, Extractive summarization, Literature review, Paragraph vectors, TextRank