Selecting additional tag SNPs for tolerating missing data in genotyping

Huang, Yao-Ting; Zhang, Kui; Chen, Ting; Chao, Kun-Mao

doi:10.1186/1471-2105-6-263

Resource

BMC Bioinformatics 6: 263

Journal

BMC Bioinformatics

Journal Volume

6

Journal Issue

263

Date Issued

2005

Date

2005

Author(s)

Huang, Yao-Ting

Zhang, Kui

Chen, Ting

Chao, Kun-Mao

DOI

10.1186/1471-2105-6-263

URI

http://ntur.lib.ntu.edu.tw//handle/246246/154529

https://www.scopus.com/inward/record.uri?eid=2-s2.0-27744576805&doi=10.1186%2f1471-2105-6-263&partnerID=40&md5=d2c8e42a32a2e01c0da85699227e421f

Abstract

Background: Recent studies have shown that the patterns of linkage disequilibrium observed in human populations have a block-like structure, and that a small subset of SNPs (called tag SNPs) is sufficient to distinguish each pair of haplotype patterns in a block. In practice, some tag SNPs may be missing from the genotyping data, and the resulting ambiguity can make two distinct haplotypes indistinguishable. Results: We show that there exists a subset of SNPs (referred to as robust tag SNPs) that can still distinguish all distinct haplotypes even when some SNPs are missing. The problem of finding a minimum set of robust tag SNPs is shown to be NP-hard. To find robust tag SNPs efficiently, we propose two greedy algorithms and one linear programming relaxation algorithm. The experimental results indicate that (1) the solutions found by these algorithms are quite close to the optimal solution; (2) the genotyping cost saved by using tag SNPs can be as high as 80%; and (3) genotyping additional tag SNPs for tolerating missing data is still cost-effective. Conclusion: Genotyping robust tag SNPs is more practical than genotyping only the minimum set of tag SNPs when missing data cannot be avoided. Our theoretical analysis and experimental results show that our algorithms are efficient and that the solutions they find are close to optimal. © 2005 Huang et al., licensee BioMed Central Ltd.
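The greedy idea mentioned in the abstract can be illustrated with a minimal sketch. It is not the authors' published procedure; it assumes one common formulation in which tolerating up to m missing SNPs means every pair of haplotype patterns must differ at no fewer than m + 1 of the selected SNPs. The function name `greedy_robust_tag_snps`, the string encoding of haplotypes, and the tie-breaking rule are all hypothetical choices made for illustration.

```python
from itertools import combinations


def greedy_robust_tag_snps(haplotypes, m=0):
    """Greedy sketch (assumed formulation, not the paper's exact algorithm).

    haplotypes: list of equal-length strings, one allele character per SNP.
    m: number of missing tag SNPs to tolerate.
    Returns SNP indices such that every pair of haplotypes differs at
    m + 1 or more of the chosen SNPs.
    """
    n_snps = len(haplotypes[0])
    # need[pair] = how many more distinguishing SNPs this pair still requires
    need = {pair: m + 1 for pair in combinations(range(len(haplotypes)), 2)}
    remaining = set(range(n_snps))
    chosen = []

    def gain(snp):
        # Number of still-unsatisfied pairs that this SNP distinguishes
        return sum(
            1
            for (a, b), left in need.items()
            if left > 0 and haplotypes[a][snp] != haplotypes[b][snp]
        )

    while any(left > 0 for left in need.values()):
        if not remaining:
            raise ValueError("no SNP subset tolerates this many missing SNPs")
        # Ties are broken by the lowest SNP index (an arbitrary choice)
        best = max(sorted(remaining), key=gain)
        if gain(best) == 0:
            raise ValueError("no SNP subset tolerates this many missing SNPs")
        chosen.append(best)
        remaining.remove(best)
        for (a, b), left in need.items():
            if left > 0 and haplotypes[a][best] != haplotypes[b][best]:
                need[(a, b)] = left - 1
    return chosen


if __name__ == "__main__":
    patterns = ["0000", "1100", "0011"]  # toy block: 3 haplotypes, 4 SNPs
    print(greedy_robust_tag_snps(patterns, m=0))
    print(greedy_robust_tag_snps(patterns, m=1))
```

With m = 0 this reduces to ordinary tag SNP selection; raising m forces extra SNPs into the set, which is the trade-off between genotyping cost and robustness that the abstract describes.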

SDGs

SDG3

Other Subjects

Genotyping; Greedy algorithms; Haplotypes; Human population; Linear programming relaxation; Linkage disequilibrium; Missing data; Optimal solutions; Optimal systems; Algorithms; algorithm; article; controlled study; cost control; cost effectiveness analysis; genotype; haplotype; single nucleotide polymorphism; theoretical study; biological model; biology; chromosome map; computer program; computer simulation; DNA sequence; gene frequency; genetic database; genetic marker; genetic predisposition; human; human genome; methodology; nucleotide sequence; procedures; statistical analysis; Chromosome Mapping; Computational Biology; Computer Simulation; Data Interpretation, Statistical; Databases, Genetic; DNA Mutational Analysis; Gene Frequency; Genetic Markers; Genetic Predisposition to Disease; Genome, Human; Genotype; Humans; Models, Genetic; Polymorphism, Single Nucleotide; Research Design; Sequence Analysis, DNA; Software

Type

journal article

File(s)

Name

09.pdf

Size

510.18 KB

Format

Adobe PDF

Checksum

(MD5):5f673c4f73bb4e5e6f48cac0eb91211b
