Feature selection in GSNFS-based marker identification
Conference proceedings article
Publication Details
Author list: Sivakorn Kozuevanich, Jonathan H Chan, Asawin Meechai
Publication year: 2019
Title of series: CSBio '19: Proceedings of the Tenth International Conference on Computational Systems-Biology and Bioinformatics
URL: https://dl.acm.org/doi/10.1145/3365953.3365964
Abstract
Gene Sub-Network-based Feature Selection (GSNFS) is a method for identifying gene sub-network biomarkers in both case-control and multiclass studies through an integrated analysis of gene expression, gene-set and network data. It has previously been shown to identify sub-network markers for lung cancer reasonably well. However, previous studies have not assessed the importance of each sub-network identified by GSNFS. In this work, we applied correlation-based and information gain feature selection techniques to rank the identified sub-network biomarkers (gene-sets). First, the top-5 and bottom-5 ranked gene-sets were selected and their classification performance was investigated. As expected, the top-ranked gene-sets provided excellent performance, while the bottom-ranked gene-sets performed poorly. Top-ranked gene-sets such as the MAPK signalling pathway are known to be related to cancer. Furthermore, combining the top-ranked gene-sets, from the top 2 up to the top 30, further improved performance compared with using individual gene-sets. These results are promising: significantly fewer sub-networks were needed to build a classifier, and that classifier achieved performance comparable to a classifier built on the full data set.
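The ranking step summarized in the abstract can be illustrated with a small sketch. This is not the authors' implementation, and several pieces are assumptions: each GSNFS-identified gene-set is assumed to have already been reduced to one numeric score per sample (the hypothetical `scores` dict), scores are discretized by equal-width binning before information gain is computed, and absolute Pearson correlation with a binary (0/1) label stands in as a simple univariate form of correlation-based ranking. The helper names (`rank_gene_sets`, `nested_top_subsets`) and the gene-set names are hypothetical.

```python
import math
from collections import Counter


def entropy(labels):
    """Shannon entropy (in bits) of a sequence of class labels."""
    n = len(labels)
    return -sum((c / n) * math.log2(c / n) for c in Counter(labels).values())


def discretize(values, bins=3):
    """Equal-width binning of a continuous score (an assumed preprocessing step)."""
    lo, hi = min(values), max(values)
    if hi == lo:
        return [0] * len(values)
    width = (hi - lo) / bins
    # min() keeps the maximum value in the last bin despite rounding.
    return [min(int((v - lo) / width), bins - 1) for v in values]


def information_gain(feature, labels, bins=3):
    """IG = H(class) - H(class | binned feature)."""
    binned = discretize(feature, bins)
    n = len(labels)
    conditional = 0.0
    for b in set(binned):
        subset = [y for x, y in zip(binned, labels) if x == b]
        conditional += len(subset) / n * entropy(subset)
    return entropy(labels) - conditional


def abs_correlation(feature, labels):
    """|Pearson r| between a feature and 0/1 class labels (binary case only)."""
    n = len(feature)
    mx, my = sum(feature) / n, sum(labels) / n
    sxy = sum((x - mx) * (y - my) for x, y in zip(feature, labels))
    sxx = sum((x - mx) ** 2 for x in feature)
    syy = sum((y - my) ** 2 for y in labels)
    if sxx == 0 or syy == 0:
        return 0.0  # a constant feature (or label) carries no linear signal
    return abs(sxy) / math.sqrt(sxx * syy)


def rank_gene_sets(scores, labels, scorer):
    """Return gene-set names sorted from most to least relevant under `scorer`."""
    merit = {name: scorer(col, labels) for name, col in scores.items()}
    return sorted(merit, key=merit.get, reverse=True)


def nested_top_subsets(ranked, k_min=2, k_max=30):
    """Cumulative top-k gene-set combinations (top 2, top 3, ..., top k_max)."""
    return [ranked[:k] for k in range(k_min, min(k_max, len(ranked)) + 1)]


# Hypothetical toy data: 6 samples, 3 gene-sets, binary labels.
labels = [0, 0, 0, 1, 1, 1]
scores = {
    "GS_A": [0.1, 0.2, 0.15, 0.9, 0.8, 0.95],  # separates the classes
    "GS_B": [0.5, 0.1, 0.9, 0.4, 0.6, 0.2],    # weakly informative
    "GS_C": [1.0] * 6,                         # constant, uninformative
}

ranked_ig = rank_gene_sets(scores, labels, information_gain)
ranked_corr = rank_gene_sets(scores, labels, abs_correlation)
print(ranked_ig)    # → ['GS_A', 'GS_B', 'GS_C']
print(ranked_corr)  # → ['GS_A', 'GS_B', 'GS_C']
print(ranked_ig[:1], ranked_ig[-1:])  # top-1 and bottom-1 (top-5/bottom-5 on real data)
print(nested_top_subsets(ranked_ig))  # subsets that would each feed a classifier
```

Three equal-width bins is an arbitrary choice made for the sketch; tools that implement information gain ranking may discretize differently (for example, with supervised, class-aware binning), so their scores would differ. The classifier trained on each selected subset is not shown, and extending the correlation ranking to multiclass labels would need a different association measure.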