Application of Random Forest in Limited Size Human Long Non-coding RNAs Identification with Secondary Structure Features

Conference proceedings article




Publication Details

Author list: Anuntakarun S., Wattanapornprom W., Lertampaiporn S.

Publisher: Hindawi

Publication year: 2019

Start page: 65

End page: 69

Number of pages: 5

ISBN: 9781728125442

ISSN: 0146-9428

eISSN: 1745-4557

URL: https://www.scopus.com/inward/record.uri?eid=2-s2.0-85081652685&doi=10.1109%2fICSEC47112.2019.8974749&partnerID=40&md5=bd2d33a292ce76486f6904a9cca50c8d

Languages: English-Great Britain (EN-GB)




Abstract

In this work, preliminary experiments were performed using diverse machine learning algorithms and multiple relevant features to discriminate between human lncRNAs and coding or partially coding sequences. The study was restricted to human lncRNAs shorter than 1000 nucleotides. Various features that describe RNA sequences were used, including sequence-based features, secondary structure features, base-pair features and structural robustness features. The top 20 significant features were then selected using the Wilcoxon rank-sum test, which revealed that secondary structure features are distinctive characteristics for identifying human lncRNAs and differ considerably from those of shorter and longer types of ncRNAs. Such features are well suited to rule-based classifiers such as Random Forest. Under 10-fold cross-validation, the Random Forest model showed the highest accuracy, sensitivity and specificity, as well as the lowest false positive rate, among all competitors. Furthermore, the model was compared with other state-of-the-art approaches such as CPC, CPAT and RNAcon, and achieved the highest accuracy of 84.5% among all of them. © 2019 IEEE.
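
The feature-selection step mentioned in the abstract can be illustrated with a small sketch. This record does not give the authors' implementation, so everything below is an assumption for illustration: the function names (`average_ranks`, `rank_sum_z`, `top_k_features`) are hypothetical, the score is the normal-approximation z-statistic of the Wilcoxon rank-sum test without a tie correction to the variance, and features are ranked by the absolute value of that statistic rather than by an exact p-value.

```python
import math


def average_ranks(values):
    """Return 1-based ranks of `values`, giving tied values their mean rank."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        # Extend j over the run of values tied with the one at position i.
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        mean_rank = (i + j) / 2 + 1
        for k in range(i, j + 1):
            ranks[order[k]] = mean_rank
        i = j + 1
    return ranks


def rank_sum_z(group_a, group_b):
    """Wilcoxon rank-sum z-statistic (normal approximation, no tie correction)."""
    n_a, n_b = len(group_a), len(group_b)
    ranks = average_ranks(list(group_a) + list(group_b))
    w = sum(ranks[:n_a])                      # rank sum of the first group
    mean_w = n_a * (n_a + n_b + 1) / 2        # expected rank sum under H0
    sd_w = math.sqrt(n_a * n_b * (n_a + n_b + 1) / 12)
    return (w - mean_w) / sd_w


def top_k_features(rows, labels, k):
    """Indices of the k features whose values differ most between classes 1 and 0."""
    scores = []
    for f in range(len(rows[0])):
        pos = [r[f] for r, y in zip(rows, labels) if y == 1]
        neg = [r[f] for r, y in zip(rows, labels) if y == 0]
        scores.append((abs(rank_sum_z(pos, neg)), f))
    # Largest |z| first; ties broken by feature index for determinism.
    scores.sort(key=lambda s: (-s[0], s[1]))
    return [f for _, f in scores[:k]]


# Toy data (made up): feature 0 separates the classes, feature 1 does not.
rows = [[1, 5], [2, 3], [3, 6], [10, 4], [11, 5.5], [12, 3.5]]
labels = [1, 1, 1, 0, 0, 0]
selected = top_k_features(rows, labels, k=1)
```

With real data, `k=20` would mirror the "top 20" selection described above, and the selected columns could then be passed to a Random Forest evaluated by 10-fold cross-validation.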


Keywords

Prediction methods


Last updated on 2023-02-10 at 07:36