https://scholars.lib.ntu.edu.tw/handle/123456789/511133
Title: | Using Hamming distance as information for SNP-sets clustering and testing in disease association studies | Authors: | CHARLOTTE WANG; Kao W.-H.; CHUHSING KATE HSIAO; Wei Z. |
Issue Date: | 2015 | Publisher: | Public Library of Science | Volume: | 10 | Issue: | 8 | Source Publication: | PLoS ONE | Abstract: | The availability of high-throughput genomic data has created several challenges for recent genetic association studies, including the large number of genetic variants that must be considered and the computational complexity of the statistical analyses. A marker-set approach such as SNP-set analysis can address these problems efficiently. To construct SNP-sets, we first propose a clustering algorithm that uses Hamming distance to measure the similarity between strings of SNP genotypes and evaluates whether given SNPs or SNP-sets should be clustered. A dendrogram can then be constructed from this distance measure, and the number of clusters can be determined. With the resulting SNP-sets, we next develop an association test, HDAT, to examine susceptibility to the disease of interest. Based on Hamming distance, the proposed test assesses whether the similarity between a diseased and a normal individual differs from the similarity between two individuals with the same disease status. The proposed methodology requires only genotype information: no haplotype inference is needed, and the SNPs under consideration need not be located in nearby regions. The clustering algorithm and association test are illustrated with applications and simulation studies. Compared with other existing methods, the clustering algorithm is faster and better at identifying sets containing SNPs that exert a similar effect. In addition, the simulation studies demonstrated that the proposed test works well for SNP-sets containing a large proportion of neutral SNPs. Furthermore, applying the clustering algorithm before testing a large data set helps narrow down the genetic regions that harbor susceptibility markers. © 2015 Wang et al.
This is an open access article distributed under the terms of the Creative Commons Attribution License, which permits unrestricted use, distribution, and reproduction in any medium, provided the original author and source are credited. |
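The abstract describes two steps: grouping SNPs into SNP-sets by Hamming distance between genotype strings, and testing each set by comparing case-control similarity with same-status similarity. The Python sketch below illustrates both ideas under stated assumptions; it is not the authors' HDAT implementation. Assumptions: genotypes are coded 0/1/2; the distance between two SNPs is taken as the normalized Hamming distance between their genotype strings across individuals; average linkage with a fixed cut threshold stands in for the paper's dendrogram and cluster-number rule; and the statistic (mean between-group minus mean within-group Hamming distance) with a label-permutation p-value is an illustrative choice. The helper names `hamming`, `cluster_snps`, `between_minus_within`, and `permutation_test` are hypothetical.

```python
import itertools
import random
from statistics import mean


def hamming(a, b):
    """Count positions where two equal-length genotype strings differ."""
    if len(a) != len(b):
        raise ValueError("genotype strings must have equal length")
    return sum(x != y for x, y in zip(a, b))


def cluster_snps(genotypes, threshold):
    """Average-linkage agglomerative clustering of SNPs (hypothetical helper).

    genotypes: one list per individual, genotypes coded 0/1/2.
    threshold: stop merging once the closest pair of clusters is farther
    apart than this normalized Hamming distance (an assumed cut rule).
    Returns SNP-sets as sorted lists of SNP (column) indices.
    """
    n_ind = len(genotypes)
    columns = list(zip(*genotypes))  # genotype string of each SNP across individuals

    def dist(i, j):
        # Fraction of individuals whose genotypes differ at SNPs i and j.
        return hamming(columns[i], columns[j]) / n_ind

    clusters = [[j] for j in range(len(columns))]
    while len(clusters) > 1:
        best = None
        for a, b in itertools.combinations(range(len(clusters)), 2):
            d = mean(dist(i, j) for i in clusters[a] for j in clusters[b])
            if best is None or d < best[0]:
                best = (d, a, b)
        d, a, b = best
        if d > threshold:
            break
        clusters[a] = sorted(clusters[a] + clusters[b])
        del clusters[b]  # b > a, so index a is unaffected
    return clusters


def between_minus_within(genotypes, status, snp_set):
    """Mean case-control Hamming distance minus mean same-status distance."""
    rows = [[g[j] for j in snp_set] for g in genotypes]
    between, within = [], []
    for i, k in itertools.combinations(range(len(rows)), 2):
        d = hamming(rows[i], rows[k])
        (within if status[i] == status[k] else between).append(d)
    return mean(between) - mean(within)


def permutation_test(genotypes, status, snp_set, n_perm=999, seed=0):
    """One-sided permutation p-value for the statistic above (hypothetical helper)."""
    rng = random.Random(seed)
    observed = between_minus_within(genotypes, status, snp_set)
    labels = list(status)
    hits = 0
    for _ in range(n_perm):
        rng.shuffle(labels)  # permute disease status, keeping group sizes fixed
        if between_minus_within(genotypes, labels, snp_set) >= observed:
            hits += 1
    return observed, (hits + 1) / (n_perm + 1)


# Toy data: 6 individuals x 4 SNPs; SNPs 0/1 and 2/3 carry identical genotype strings.
genotypes = [
    [0, 0, 2, 2],
    [0, 0, 1, 1],
    [1, 1, 0, 0],
    [2, 2, 0, 0],
    [2, 2, 1, 1],
    [1, 1, 2, 2],
]
status = [0, 0, 0, 1, 1, 1]  # 0 = control, 1 = case

snp_sets = cluster_snps(genotypes, threshold=0.5)
print(snp_sets)  # [[0, 1], [2, 3]]
observed, p_value = permutation_test(genotypes, status, snp_sets[0])
print(observed, p_value)
```

Only genotype data enter the calculation, in line with the abstract's point that no haplotype inference is needed. The one-sided comparison reflects the assumption that a disease-associated SNP-set makes cases and controls less similar to each other than individuals of the same status; a two-sided variant would compare absolute values of the statistic. Each test needs at least two cases and two controls so that both distance lists are non-empty.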
URI: | https://www.scopus.com/inward/record.uri?eid=2-s2.0-84943144973&doi=10.1371%2fjournal.pone.0135918&partnerID=40&md5=0e2f595c6ac5a53e37bf92079b1ed1de https://scholars.lib.ntu.edu.tw/handle/123456789/511133 |
ISSN: | 1932-6203 | DOI: | 10.1371/journal.pone.0135918 | SDG/Keywords: | algorithm; Article; clustering algorithm; coronary artery disease; disease association; disease predisposition; gene linkage disequilibrium; genetic association; genetic susceptibility; genotype; Hamming distance; haplotype; human; intermethod comparison; mathematical computing; phylogenetic tree; simulation; single nucleotide polymorphism; chromosomal mapping; genetic association study; genetic predisposition; genetics; signal noise ratio; single nucleotide polymorphism; statistics and numerical data; Algorithms; Chromosome Mapping; Genetic Association Studies; Genetic Predisposition to Disease; Genotype; Haplotypes; Humans; Linkage Disequilibrium; Polymorphism, Single Nucleotide; Signal-To-Noise Ratio |
Appears in Collections: | Institute of Epidemiology and Preventive Medicine |