Genotype imputation using LD-based Weighted K Nearest Neighbor
Date Issued
2014
Date
2014
Author(s)
Zeng, Jhih-Wun
Abstract
Detection of single nucleotide polymorphisms (SNPs) with high-throughput sequencing technologies has become an efficient and robust strategy for SNP discovery and genome-wide association studies. However, conventional high-throughput genotyping techniques often produce a certain proportion of missing calls. It has long been recognized that failing to account for these missing data can dramatically reduce the power of detecting SNPs. A variety of imputation methods have been developed to impute the missing genotypes. Methods based on K-nearest neighbors (KNN) and weighted K-nearest neighbors (wtKNN) have received some attention because they consider similarities in haplotype structure. More recently, a number of powerful methods based on hidden Markov models (HMMs) have become popular for SNP imputation. However, these methods are time-consuming or mostly suitable for imputing small marker sets, and they cannot exploit the structure of indirect association among tightly linked SNPs. In this study, we will propose a novel but computationally simple imputation method based on weighted K-nearest neighbors (wtKNN) that takes linkage disequilibrium (LD) into account. We will demonstrate the performance of our method in imputing missing SNPs using both genotyping-by-sequencing (GBS) data and simulation studies. In addition, we will compare the accuracy and performance of our method with those of competing imputation methods.
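The abstract does not spell out the algorithm, so the following is only an illustrative sketch of the general idea of LD-weighted KNN imputation, not the thesis's actual method. It assumes genotypes coded 0/1/2 with NaN for missing calls, uses squared Pearson correlation (r²) between SNP columns as the LD measure, and predicts by an inverse-distance-weighted vote among the K nearest samples. The function names `ld_r2` and `impute_ld_wtknn`, the parameter `k`, and the distance definition are all hypothetical choices made for illustration.

```python
import numpy as np


def ld_r2(G, j):
    """Assumed LD measure: r^2 between SNP j and every other SNP,
    computed on samples where both SNPs are observed."""
    n_snps = G.shape[1]
    r2 = np.zeros(n_snps)
    for m in range(n_snps):
        if m == j:
            continue
        ok = ~np.isnan(G[:, j]) & ~np.isnan(G[:, m])
        if ok.sum() < 3:
            continue  # too few shared observations to estimate LD
        a, b = G[ok, j], G[ok, m]
        if a.std() == 0 or b.std() == 0:
            continue  # monomorphic column: correlation undefined
        r2[m] = np.corrcoef(a, b)[0, 1] ** 2
    return r2


def impute_ld_wtknn(G, k=3, eps=1e-6):
    """Fill NaN genotypes (coded 0/1/2) by a weighted vote of the k nearest
    samples, where sample distance is weighted by LD with the target SNP."""
    G = np.asarray(G, dtype=float)
    out = G.copy()
    for j in range(G.shape[1]):
        missing = np.where(np.isnan(G[:, j]))[0]
        donors = np.where(~np.isnan(G[:, j]))[0]
        if missing.size == 0 or donors.size == 0:
            continue
        w = ld_r2(G, j)
        for i in missing:
            dists = np.full(donors.size, np.inf)
            for pos, d in enumerate(donors):
                # Compare only SNPs observed in both samples and in LD with SNP j.
                shared = ~np.isnan(G[i]) & ~np.isnan(G[d]) & (w > 0)
                if shared.any():
                    diff = np.abs(G[i, shared] - G[d, shared])
                    dists[pos] = np.sum(w[shared] * diff) / np.sum(w[shared])
            votes = np.zeros(3)
            for pos in np.argsort(dists)[:k]:
                if np.isfinite(dists[pos]):
                    votes[int(G[donors[pos], j])] += 1.0 / (dists[pos] + eps)
            if votes.sum() > 0:
                out[i, j] = float(np.argmax(votes))
            else:
                # Fallback: most common observed genotype at this SNP.
                counts = np.bincount(G[donors, j].astype(int), minlength=3)
                out[i, j] = float(np.argmax(counts))
    return out
```

Calling `impute_ld_wtknn(G, k=3)` on a samples-by-SNPs array returns a copy with the missing entries filled and the observed entries unchanged. Weighting the distance by r² means that SNPs tightly linked to the target SNP dominate the choice of neighbors, which is one plausible way to exploit LD within a KNN framework.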
Subjects
imputation
Genome-wide association study
linkage disequilibrium
missing
K-nearest neighbor
single nucleotide polymorphism
Type
thesis
File(s)
Name
ntu-103-R01621206-1.pdf
Size
23.32 KB
Format
Adobe PDF
Checksum
(MD5):b131a8a579134346634b554f1b01eeda
