A combination of text mining techniques for relevant literature search and extractive summarization
Conference proceedings article
Authors/Editors
Strategic Research Themes
No matching items found.
Publication Details
Author list: Phongwattana T., Chan J.H.
Publisher: Hindawi
Publication year: 2018
Start page: 7
End page: 11
Number of pages: 5
ISBN: 9781450365512
ISSN: 0146-9428
eISSN: 1745-4557
Languages: English-Great Britain (EN-GB)
View in Web of Science | View on publisher site | View citing articles in Web of Science
Abstract
Over the past few years, the amount of research papers published has dramatically increased. Consequently, researchers spend a lot of time reviewing relevant literature in order to better understand their domain of interest and keep up with new developments. After doing literature reviews in the area of text mining, we found many works proposing the means of sentence representation in machine learning for finding sentence similarity. These include average bag of words, weight average word vectors, bag of n-grams, and matrix-vector operations. However, these techniques are limited in word ordering and semantic analysis. This paper proposes a framework that combines two text mining techniques, paragraph vectors and TextRank, for the selection of relevant research paper and extractive summarization, respectively. Our training corpus includes over 20 million research papers. The aim of this work is to build a supplementary research tool that assists researchers in saving time conducting literature reviews. As the result, we can rank all relevant research papers potentially within the corpus, and utilize the outputs in our literature reviews. Moreover, the tool can extract all potential keywords in a single task as well. ฉ 2018 Association for Computing Machinery.
Keywords
Document similarity, Extractive summarization, Literature Review, paragraph vectors, TextRank