Unsupervised algorithms for population classification and ancestry informative marker selection
Conference proceedings article
Publication Details
Author list: Rodpan A., Wangkumhang P., Assawamakin A., Prom-On S., Tongsima S.
Publisher: Springer Verlag (Germany): Computer Proceedings
Publication year: 2010
Volume number: 115 CCIS
Start page: 208
End page: 216
Number of pages: 9
ISBN: 3642167497; 9783642167492
ISSN: 1865-0929
Languages: English-Great Britain (EN-GB)
Abstract
Single nucleotide polymorphisms (SNPs) can be used to identify differences among populations. However, in higher organisms, a very large number of SNPs are distributed throughout the genome. Animal breeders can use these genetic markers to differentiate subpopulations. For economic reasons, it is desirable to find a minimum number of SNPs that can accurately identify different breeds. In this paper, given a set of SNP genotyping samples whose breeds are unknown (unlabeled samples), we develop a framework that classifies the samples into different animal groups (breeds) based on their genotyping profiles. The framework further identifies a small set of SNPs, called ancestry informative markers (AIMs), that can accurately assign the samples to these groups. It uses Principal Component Analysis (PCA) to cluster the unlabeled genotype data and Student's t-test to determine the AIMs. This unsupervised approach can avoid potential ascertainment biases caused by mislabeled samples or by unlabeled data that must be classified. © 2010 Springer-Verlag Berlin Heidelberg.
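
The two stages named in the abstract, PCA-based grouping followed by a per-SNP Student's t-test, can be illustrated with a minimal sketch. This is not the authors' implementation. It assumes genotypes coded as 0/1/2 allele counts, exactly two groups, and a simple split on the sign of the first principal component as a stand-in for the clustering step, which the abstract does not specify. The simulated data and all function names and parameters are hypothetical.

```python
import numpy as np


def simulate_genotypes(n_per_group=30, n_snps=100, n_informative=20, seed=0):
    """Toy data (hypothetical): two breeds, genotypes coded as 0/1/2 allele counts.

    The first n_informative SNPs differ strongly in allele frequency between
    the two breeds; the remaining SNPs share the same frequency.
    """
    rng = np.random.default_rng(seed)
    freq_a = np.full(n_snps, 0.5)
    freq_b = freq_a.copy()
    freq_a[:n_informative] = 0.9
    freq_b[:n_informative] = 0.1
    geno_a = rng.binomial(2, freq_a, size=(n_per_group, n_snps))
    geno_b = rng.binomial(2, freq_b, size=(n_per_group, n_snps))
    return np.vstack([geno_a, geno_b]).astype(float)


def cluster_by_first_pc(genotypes):
    """Assumed clustering rule: split samples on the sign of their PC1 score."""
    centered = genotypes - genotypes.mean(axis=0)
    # PCA via SVD of the column-centered matrix; u[:, 0] * s[0] are PC1 scores.
    u, s, _ = np.linalg.svd(centered, full_matrices=False)
    return u[:, 0] * s[0] > 0


def student_t(genotypes, in_group):
    """Pooled-variance (Student's) two-sample t statistic for every SNP."""
    a, b = genotypes[in_group], genotypes[~in_group]
    n_a, n_b = len(a), len(b)
    pooled_var = (
        (n_a - 1) * a.var(axis=0, ddof=1) + (n_b - 1) * b.var(axis=0, ddof=1)
    ) / (n_a + n_b - 2)
    std_err = np.sqrt(pooled_var * (1.0 / n_a + 1.0 / n_b))
    with np.errstate(divide="ignore", invalid="ignore"):
        t = (a.mean(axis=0) - b.mean(axis=0)) / std_err
    # SNPs constant across all samples give 0/0; treat them as uninformative.
    return np.nan_to_num(t, nan=0.0)


def select_aims(genotypes, n_markers):
    """Cluster unlabeled samples, then rank SNPs by |t| between the clusters."""
    in_group = cluster_by_first_pc(genotypes)
    t = student_t(genotypes, in_group)
    ranked = np.argsort(-np.abs(t))
    return in_group, ranked[:n_markers]


if __name__ == "__main__":
    genotypes = simulate_genotypes()
    in_group, aims = select_aims(genotypes, n_markers=10)
    print("cluster sizes:", int(in_group.sum()), int((~in_group).sum()))
    print("candidate AIM indices:", sorted(aims.tolist()))
```

In this sketch no breed labels enter either stage: the groups used by the t-test come from the PCA split itself. Handling more than two groups would need a different clustering rule and pairwise or multi-group tests, which this example does not attempt.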
Keywords
AIMs, ancestry informative markers, population structure, Student's t-test