Predicting the effect of variants on splicing using convolutional neural networks

Journal article


Authors/Editors


Strategic Research Themes


Publication Details

Author list: Thanapattheerakul, Thanyathorn; Engchuan, Worrawat; Chan, Jonathan H.

Publisher: PeerJ

Publication year: 2020

Journal: PeerJ – the Journal of Life & Environmental Sciences (2167-8359)

Volume number: 8

ISSN: 2167-8359

eISSN: 2167-8359

URL: https://www.scopus.com/inward/record.uri?eid=2-s2.0-85090176703&doi=10.7717%2fpeerj.9470&partnerID=40&md5=f98821b4f5696f5d85e5ccdb78331bd7

Language: English-Great Britain (EN-GB)




Abstract

Mutations that cause an error in the splicing of a messenger RNA (mRNA) can lead to diseases in humans. Various computational models have been developed to recognize the sequence pattern of the splice sites. In recent studies, Convolutional Neural Network (CNN) architectures were shown to outperform other existing models in predicting the splice sites. However, insufficient effort has been put into extending the CNN model to predict the effect of genomic variants on the splicing of mRNAs. This study proposes a framework that uses CNNs to assess the effect of splice variants and thereby identify potential disease-causing variants that disrupt the RNA splicing process. Five models, three CNN-based and two based on non-CNN machine learning methods, were trained and compared using two existing splice site datasets: Genome Wide Human splice sites (GWH) and a dataset provided at the Deep Learning and Artificial Intelligence winter school 2018 (DLAI). The donor sites were also tested on the HSplice tool to evaluate the predictive models. To improve the effectiveness of the predictive models, the two datasets were combined. The CNN model with four convolutional layers showed the best splice site prediction performance, with an AUPRC of 93.4% and 88.8% for donor and acceptor sites, respectively. The effects of variants on splicing were estimated by applying the best model to variant data from the ClinVar database. Based on this estimation, the framework could effectively differentiate pathogenic variants from benign variants (p = 5.9 × 10⁻⁷). These promising results suggest that the proposed framework could be applied in future genetic studies to identify disease-causing loci involving the splicing mechanism. The datasets and Python scripts used in this study are available on the GitHub repository at https://github.com/smiile8888/rna-splice-sites-recognition. © 2020 Thanapattheerakul et al.
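
The variant-effect step described in the abstract (scoring a sequence with and without a variant and comparing the two predictions) can be illustrated with a minimal Python sketch. It is not taken from the authors' repository. The function names, the 9-base donor window, and in particular `toy_donor_score`, which stands in for a trained CNN, are assumptions made only for illustration.

```python
from typing import Callable, List

BASES = "ACGT"
Encoded = List[List[int]]


def one_hot(seq: str) -> Encoded:
    """Encode a DNA string as a len(seq) x 4 one-hot matrix (columns A, C, G, T).

    Any other symbol (for example N) becomes an all-zero row.
    """
    return [[1 if base == b else 0 for b in BASES] for base in seq.upper()]


def apply_snv(seq: str, pos: int, ref: str, alt: str) -> str:
    """Return seq with the base at 0-based index pos changed from ref to alt."""
    if seq[pos].upper() != ref.upper():
        raise ValueError(f"reference mismatch at index {pos}: {seq[pos]} != {ref}")
    return seq[:pos] + alt + seq[pos + 1:]


def variant_effect(seq: str, pos: int, ref: str, alt: str,
                   score: Callable[[Encoded], float]) -> float:
    """Change in predicted splice-site score: score(alt) - score(ref)."""
    return score(one_hot(apply_snv(seq, pos, ref, alt))) - score(one_hot(seq))


def toy_donor_score(encoded: Encoded) -> float:
    """Hypothetical stand-in for a trained model (assumption, not a CNN).

    Rewards the canonical donor dinucleotide GT at indices 3 and 4 of a
    9-base window laid out as 3 exonic bases followed by 6 intronic bases.
    """
    has_g = encoded[3][BASES.index("G")]
    has_t = encoded[4][BASES.index("T")]
    return 0.5 * has_g + 0.5 * has_t


window = "CAGGTAAGT"  # consensus-like donor window: CAG | GTAAGT
disrupting = variant_effect(window, 3, "G", "A", toy_donor_score)
neutral = variant_effect(window, 6, "A", "C", toy_donor_score)
print(disrupting, neutral)
```

In this toy setting, a substitution that destroys the GT dinucleotide lowers the score, while a substitution elsewhere in the window leaves it unchanged. With a real model, `score` would wrap the trained network's prediction call, and the resulting score differences for pathogenic and benign ClinVar variants could then be compared statistically.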


Keywords

Binding sites; Convolutional neural networks (CNN); Deep learning; Genomic variants; RNA Splice Sites; Splice site; Splicing events


Last updated 2023-09-25 at 07:36