https://scholars.lib.ntu.edu.tw/handle/123456789/521835
Title: | One-step extrapolation of the prediction performance of a gene signature derived from a small study | Authors: | Wang L.-Y.; WEN-CHUNG LEE |
Issue Date: | 2015 | Publisher: | BMJ Publishing Group | Volume: | 5 | Issue: | 4 | Source Publication: | BMJ Open | Abstract: | Objective: Microarray-related studies often involve a very large number of genes and small sample size. Cross-validating or bootstrapping is therefore imperative to obtain a fair assessment of the prediction/classification performance of a gene signature. A deficiency of these methods is the reduced training sample size because of the partition process in cross-validation and sampling with replacement in bootstrapping. To address this problem, we aim to obtain a prediction performance estimate that strikes a good balance between bias and variance and has a small root mean squared error. Methods: We propose to make a one-step extrapolation from the fitted learning curve to estimate the prediction/classification performance of the model trained by all the samples. Results: Simulation studies show that the method strikes a good balance between bias and variance and has a small root mean squared error. Three microarray data sets are used for demonstration. Conclusions: Our method is advocated to estimate the prediction performance of a gene signature derived from a small study. © 2015, BMJ Publishing Group. All rights reserved. |
URI: | https://www.scopus.com/inward/record.uri?eid=2-s2.0-84928264580&doi=10.1136%2fbmjopen-2014-007170&partnerID=40&md5=2e8c314623c7247a00338850dfbeccf7 https://scholars.lib.ntu.edu.tw/handle/123456789/521835 |
ISSN: | 2044-6055 | DOI: | 10.1136/bmjopen-2014-007170 | SDG/Keywords: | area under the curve; Article; bootstrapping; breast cancer; cancer tissue; colon; colon cancer; controlled study; gene; gene expression; gene signature; genetic database; genetic procedures; human; human tissue; learning curve; machine learning; major clinical study; measurement error; microarray analysis; Monte Carlo method; prediction; random forest; sample size; small root mean squared error; support vector machine; variance; breast tumor; colon tumor; gene expression profiling; gene expression regulation; genetics; reproducibility; statistical model; statistics; Area Under Curve; Breast Neoplasms; Colonic Neoplasms; Gene Expression Profiling; Gene Expression Regulation, Neoplastic; Humans; Learning Curve; Linear Models; Monte Carlo Method; Reproducibility of Results; Sample Size; Statistics as Topic |
Appears in Collections: | Institute of Epidemiology and Preventive Medicine |
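The abstract's Methods sentence (extrapolate one step along a fitted learning curve to the full sample size) can be illustrated with a minimal sketch. This is not the authors' procedure: the record does not specify the curve's functional form or how the learning-curve points are obtained, so the polynomial fit, the function name `one_step_extrapolate`, the clipping to [0, 1], and the example numbers below are all assumptions made for illustration only.

```python
import numpy as np


def one_step_extrapolate(train_sizes, scores, n_full, degree=1):
    """Fit a polynomial learning curve and evaluate it at n_full.

    train_sizes: reduced training-set sizes at which performance was
                 estimated (e.g. by cross-validation or bootstrapping).
    scores:      the performance estimate (e.g. AUC) at each size.
    n_full:      the full sample size to extrapolate to.
    degree:      degree of the fitted polynomial (assumed, not from the paper).
    """
    sizes = np.asarray(train_sizes, dtype=float)
    perf = np.asarray(scores, dtype=float)
    if sizes.shape != perf.shape or sizes.size <= degree:
        raise ValueError("need more (size, score) pairs than the polynomial degree")

    # Least-squares fit of performance as a polynomial in training size.
    coeffs = np.polyfit(sizes, perf, deg=degree)

    # Evaluate the fitted curve at the full sample size.
    estimate = np.polyval(coeffs, n_full)

    # Assumption: the metric lives in [0, 1] (true for AUC or accuracy).
    return float(np.clip(estimate, 0.0, 1.0))


# Hypothetical learning-curve points, invented purely for illustration.
sizes = [30, 40, 50]
aucs = [0.70, 0.74, 0.78]
print(one_step_extrapolate(sizes, aucs, n_full=60))
```

A linear fit is the simplest choice that still captures the upward trend in performance as the training set grows; a different curve family can be swapped in by replacing the `np.polyfit` and `np.polyval` calls.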