https://scholars.lib.ntu.edu.tw/handle/123456789/520974
Title: | A composite model for subgroup identification and prediction via bicluster analysis | Authors: | Chen H.-C.; Zou W.; TZU-PIN LU; Chen J.J. |
Publication date: | 2014 | Publisher: | Public Library of Science | Volume: | 9 | Issue: | 10 | Source publication: | PLoS ONE | Abstract: | Background: A major challenge in the analysis of large and complex biomedical data is to develop an approach for 1) identifying distinct subgroups in the sampled populations, 2) characterizing the relationships among subgroups, and 3) developing a prediction model to classify subgroup memberships of new samples by finding a set of predictors. Each subgroup can represent different pathogen serotypes of microorganisms, different tumor subtypes in cancer patients, or different genetic makeups of patients related to treatment response. Methods: This paper proposes a composite model for subgroup identification and prediction using biclusters. A biclustering technique is first used to identify a set of biclusters from the sampled data. For each bicluster, a subgroup-specific binary classifier is built to determine whether a particular sample is inside or outside the bicluster. A composite model, which consists of all binary classifiers, is constructed to classify samples into several disjoint subgroups. The proposed composite model depends neither on any specific biclustering algorithm or pattern of biclusters nor on any particular classification algorithm. Results: The composite model was shown to have an overall accuracy of 97.4% for a synthetic dataset consisting of four subgroups. The model was applied to two datasets where the samples' subgroup memberships were known. The procedure showed 83.7% accuracy in discriminating lung cancer adenocarcinoma and squamous carcinoma subtypes, and was able to identify 5 serotypes and several subtypes with about 94% accuracy in a pathogen dataset. Conclusion: The composite model presents a novel approach to developing a biclustering-based classification model from unlabeled sampled data. The proposed approach combines unsupervised biclustering and supervised classification techniques to classify samples into disjoint subgroups based on their associated attributes, such as genotypic factors, phenotypic outcomes, efficacy/safety measures, or responses to treatments. The procedure is useful for identification of unknown species or new biomarkers for targeted therapy. |
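The procedure described in the Methods can be illustrated with a minimal sketch. It is not the authors' implementation. It assumes the biclustering step has already produced, for each bicluster, a set of sample indices and a list of feature columns. It substitutes a simple nearest-centroid rule for the binary inside/outside classifier, since the abstract states that the model does not depend on a specific classification algorithm. The rule for resolving the binary outputs into disjoint subgroups (highest positive score wins, otherwise "unassigned") is likewise an assumption, as are all class and variable names and the toy data.

```python
from statistics import mean


def centroid(rows, cols):
    """Mean of each selected feature column over the given rows."""
    return [mean(r[c] for r in rows) for c in cols]


def sq_dist(row, cols, center):
    """Squared Euclidean distance restricted to the selected columns."""
    return sum((row[c] - m) ** 2 for c, m in zip(cols, center))


class BiclusterClassifier:
    """Binary inside/outside classifier for one bicluster.

    Nearest-centroid is a stand-in; any binary classifier could be used.
    """

    def __init__(self, name, cols):
        self.name = name
        self.cols = cols

    def fit(self, samples, inside_idx):
        inside = [s for i, s in enumerate(samples) if i in inside_idx]
        outside = [s for i, s in enumerate(samples) if i not in inside_idx]
        self.c_in = centroid(inside, self.cols)
        self.c_out = centroid(outside, self.cols)
        return self

    def score(self, row):
        # Positive when the sample is closer to the "inside" centroid.
        return (sq_dist(row, self.cols, self.c_out)
                - sq_dist(row, self.cols, self.c_in))


class CompositeModel:
    """Combines all per-bicluster classifiers into one subgroup predictor."""

    def __init__(self, classifiers):
        self.classifiers = classifiers

    def predict(self, row):
        scores = {clf.name: clf.score(row) for clf in self.classifiers}
        best = max(scores, key=scores.get)
        # Assumed tie-breaking rule: no positive score means no subgroup.
        return best if scores[best] > 0 else "unassigned"


# Toy data: 6 samples x 4 features, with two hypothetical biclusters.
samples = [
    [5.0, 5.1, 0.1, 0.0],
    [4.8, 5.2, 0.2, 0.1],
    [5.1, 4.9, 0.0, 0.2],
    [0.1, 0.0, 5.0, 4.9],
    [0.2, 0.1, 5.2, 5.1],
    [0.0, 0.2, 4.9, 5.0],
]
# name -> (sample indices inside the bicluster, feature columns it spans)
biclusters = {"A": ({0, 1, 2}, [0, 1]), "B": ({3, 4, 5}, [2, 3])}

model = CompositeModel([
    BiclusterClassifier(name, cols).fit(samples, idx)
    for name, (idx, cols) in biclusters.items()
])

print(model.predict([4.9, 5.0, 0.1, 0.1]))
print(model.predict([0.1, 0.1, 5.0, 5.0]))
```

Each classifier sees only the feature columns of its own bicluster, which mirrors the idea of a subgroup-specific predictor set. Replacing `BiclusterClassifier` with another binary classifier would leave `CompositeModel` unchanged.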
URI: | https://www.scopus.com/inward/record.uri?eid=2-s2.0-84908635215&doi=10.1371%2fjournal.pone.0111318&partnerID=40&md5=bdb072aea9eda30e2a942616c6a864ab |
ISSN: | 1932-6203 | DOI: | 10.1371/journal.pone.0111318 | SDG/Keywords: | biological marker; tumor marker; Article; breast cancer; cancer classification; cancer patient; classification algorithm; cluster analysis; diagnostic accuracy; diagnostic test accuracy study; diagonal linear discriminant analysis; genotype; human; lung adenocarcinoma; lung cancer; lung squamous cell carcinoma; nonhuman; phenotype; prediction; random forest; Salmonella; sensitivity and specificity; serotype; support vector machine; algorithm; classification; cluster analysis; information processing; Algorithms; Biomarkers, Tumor; Cluster Analysis; Datasets as Topic; Humans |
Appears in collections: | Institute of Epidemiology and Preventive Medicine |