A combination of text mining techniques for relevant literature search and extractive summarization

Conference proceedings article



Publication Details

Author list: Phongwattana T., Chan J.H.

Publisher: Hindawi

Publication year: 2018

Start page: 7

End page: 11

Number of pages: 5

ISBN: 9781450365512

ISSN: 0146-9428

eISSN: 1745-4557

URL: https://www.scopus.com/inward/record.uri?eid=2-s2.0-85058683915&doi=10.1145%2f3278293.3278300&partnerID=40&md5=fb40e1412a29a4211e115d1f6df6e2bc

Languages: English-Great Britain (EN-GB)



Abstract

Over the past few years, the number of research papers published has increased dramatically. Consequently, researchers spend a lot of time reviewing relevant literature in order to better understand their domain of interest and keep up with new developments. In reviewing the text mining literature, we found many works that propose sentence representations for measuring sentence similarity with machine learning. These include averaged bag of words, weighted average word vectors, bag of n-grams, and matrix-vector operations. However, these techniques are limited in how well they capture word order and semantics. This paper proposes a framework that combines two text mining techniques, paragraph vectors and TextRank, for the selection of relevant research papers and for extractive summarization, respectively. Our training corpus includes over 20 million research papers. The aim of this work is to build a supplementary research tool that helps researchers save time when conducting literature reviews. As a result, we can rank all potentially relevant research papers within the corpus and use the outputs in our literature reviews. Moreover, the tool can extract all potential keywords in a single task as well. © 2018 Association for Computing Machinery.
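The extractive summarization step named in the abstract can be illustrated with a minimal TextRank sketch. This is not the authors' implementation: it assumes the word-overlap sentence similarity from the original TextRank formulation, a damping factor of 0.85, and a fixed number of power iterations, and the function names (tokenize, overlap_similarity, textrank, summarize) are hypothetical. The paragraph-vector retrieval step is not shown.

```python
import math
import re


def tokenize(sentence):
    """Lowercase a sentence and return its set of word tokens."""
    return set(re.findall(r"[a-z0-9]+", sentence.lower()))


def overlap_similarity(a, b):
    """Shared-word count normalized by the log lengths of both sentences."""
    if not a or not b:
        return 0.0
    denom = math.log(len(a)) + math.log(len(b))
    # Two one-word sentences give log(1) + log(1) = 0; treat as unrelated.
    return len(a & b) / denom if denom > 0 else 0.0


def textrank(sentences, damping=0.85, iterations=50):
    """Return one score per sentence using weighted PageRank iteration."""
    tokens = [tokenize(s) for s in sentences]
    n = len(sentences)
    # Symmetric weight matrix of the sentence graph, no self-loops.
    weights = [
        [overlap_similarity(tokens[i], tokens[j]) if i != j else 0.0
         for j in range(n)]
        for i in range(n)
    ]
    out_sum = [sum(row) for row in weights]
    scores = [1.0] * n
    for _ in range(iterations):
        scores = [
            (1 - damping) + damping * sum(
                weights[j][i] / out_sum[j] * scores[j]
                for j in range(n) if out_sum[j] > 0
            )
            for i in range(n)
        ]
    return scores


def summarize(sentences, k=2):
    """Pick the k highest-scoring sentences, returned in document order."""
    scores = textrank(sentences)
    top = sorted(range(len(sentences)),
                 key=lambda i: scores[i], reverse=True)[:k]
    return [sentences[i] for i in sorted(top)]
```

Calling summarize(list_of_sentences, k=3) returns the three sentences that are most strongly connected to the rest of the text. A sentence that shares no words with any other sentence receives only the baseline score of 1 - damping, so it is unlikely to be selected.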


Keywords

Document similarity, Extractive summarization, Literature review, Paragraph vectors, TextRank


Last updated on 2023-06-10 at 07:36