A comparison of statistical methods for identifying differentially expressed genes using Affymetrix oligonucleotide arrays
Date Issued
2005
Author(s)
Lee, Hsin-I
Language
zh-TW
Abstract
Microarray technology has made it possible to measure the abundance of mRNA transcripts for thousands of genes simultaneously. In particular, the Affymetrix high-density oligonucleotide array, patented as the Affymetrix GeneChip™, is widely used in the scientific community because of its high specificity and reproducibility.
In this study, we first review three statistical methods currently used for background correction, normalization and expression transformation: Affymetrix Microarray Suite 5.0 (MAS 5.0) (Affymetrix GeneChip™, 2002), the Model Based Expression Index (MBEI) (Li and Wong, 2001) and Robust Multi-array Average (RMA) (Irizarry et al., 2003). We then evaluate their performance through significance tests on the fold-change estimates each method produces. Three tests are implemented: Student's t-test, the penalized t-statistic of Efron et al. (2002), and a nonparametric test (the Mann-Whitney test or the Wilcoxon signed-rank test; Conover, 1999). The results show that MAS 5.0, MBEI and RMA can lead to quite different conclusions about which probe sets are differentially expressed.
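As a rough illustration of the kinds of test statistic compared here, the following Python/NumPy sketch computes, for each probe set, Student's t, a penalized t and the Mann-Whitney U statistic from two matrices of log2 expression values (rows are probe sets, columns are arrays). It is not the thesis's code: the function name `two_group_statistics`, the penalized form d/(s + s0), and the choice of s0 as the median standard error are assumptions, ties are broken arbitrarily rather than given average ranks, and p-values are omitted.

```python
import numpy as np


def two_group_statistics(x, y, s0=None):
    """Per-probe-set statistics for a two-group comparison (illustrative sketch).

    x, y: arrays of shape (n_probe_sets, n_x) and (n_probe_sets, n_y) holding
    log2 expression values, e.g. as produced by MAS 5.0, MBEI or RMA.
    Returns (Student t, penalized t, Mann-Whitney U for group x).
    """
    x = np.asarray(x, dtype=float)
    y = np.asarray(y, dtype=float)
    n_x, n_y = x.shape[1], y.shape[1]
    diff = x.mean(axis=1) - y.mean(axis=1)  # estimated log2 fold change

    # Pooled-variance standard error of the difference (Student's t).
    pooled_var = ((n_x - 1) * x.var(axis=1, ddof=1) +
                  (n_y - 1) * y.var(axis=1, ddof=1)) / (n_x + n_y - 2)
    se = np.sqrt(pooled_var * (1.0 / n_x + 1.0 / n_y))
    t_student = diff / se

    # Penalized t: a positive constant s0 in the denominator keeps probe sets
    # with very small variances from receiving huge statistics.
    # ASSUMPTION: s0 defaults to the median standard error across probe sets.
    if s0 is None:
        s0 = np.median(se)
    t_penalized = diff / (se + s0)

    # Mann-Whitney U for group x: rank all arrays of a probe set jointly and
    # subtract the smallest possible rank sum. Ties are broken arbitrarily.
    combined = np.concatenate([x, y], axis=1)
    ranks = combined.argsort(axis=1).argsort(axis=1) + 1
    u_x = ranks[:, :n_x].sum(axis=1) - n_x * (n_x + 1) / 2
    return t_student, t_penalized, u_x
```

Because the three statistics weight mean differences, variances and ranks differently, they can order the same probe sets differently, which is one source of the disagreements described above.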
We therefore develop a simulation mechanism for generating replicated experiments, adapted from the method recently proposed by Hess and Iyer (2004). Our modified method can mimic naturally occurring data and is based on a real “temperate” array data set. For each simulated data set, we apply the selection probability function proposed by Pepe et al. (2003) directly, using the Student t statistic to rank the probe sets. We compute the sensitivity and false discovery rate (FDR) of the three methods from 100 simulated data sets under various scenarios. We recommend RMA for routine applications because it appears to have higher sensitivity and lower FDR in all the scenarios studied, although MBEI is competitive with RMA in most scenarios. In addition, we develop a practical algorithm for determining the sample size of experiments that use Affymetrix oligonucleotide arrays.
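A minimal sketch of how one simulated data set might be scored, assuming Python with NumPy. The selection probability of Pepe et al. (2003), the probability that a probe set ranks among the top k, is estimated here by resampling arrays with replacement within each group and ranking probe sets by |t|; probe sets whose estimated probability is at least 0.5 are then called differentially expressed, and sensitivity and FDR are computed against the known truth. The bootstrap scheme, the |t| ranking, the 0.5 cut-off, the function names and the toy simulation settings are all hypothetical illustrations, not the thesis's procedure, and the toy data are plain normal noise rather than the Hess and Iyer based simulation.

```python
import numpy as np


def student_t(x, y):
    """Row-wise two-sample Student t statistic with pooled variance."""
    n_x, n_y = x.shape[1], y.shape[1]
    pooled = ((n_x - 1) * x.var(axis=1, ddof=1) +
              (n_y - 1) * y.var(axis=1, ddof=1)) / (n_x + n_y - 2)
    return (x.mean(axis=1) - y.mean(axis=1)) / np.sqrt(pooled * (1 / n_x + 1 / n_y))


def selection_probability(x, y, k, n_boot=200, seed=None):
    """Bootstrap estimate of P(probe set is ranked in the top k by |t|).

    HYPOTHETICAL estimator: arrays are resampled with replacement within
    each group and the probe sets are re-ranked on every resample.
    """
    rng = np.random.default_rng(seed)
    counts = np.zeros(x.shape[0])
    done = 0
    while done < n_boot:
        xb = x[:, rng.integers(0, x.shape[1], size=x.shape[1])]
        yb = y[:, rng.integers(0, y.shape[1], size=y.shape[1])]
        with np.errstate(divide="ignore", invalid="ignore"):
            t = student_t(xb, yb)
        if not np.all(np.isfinite(t)):
            continue  # degenerate resample (zero pooled variance): draw again
        counts[np.argsort(-np.abs(t))[:k]] += 1
        done += 1
    return counts / n_boot


def sensitivity_and_fdr(selected, truly_de):
    """Sensitivity = TP / #truly DE; FDR = FP / #selected (0 if none selected)."""
    selected = np.asarray(selected, dtype=bool)
    truly_de = np.asarray(truly_de, dtype=bool)
    tp = np.sum(selected & truly_de)
    fp = np.sum(selected & ~truly_de)
    sensitivity = tp / truly_de.sum() if truly_de.any() else np.nan
    fdr = fp / selected.sum() if selected.any() else 0.0
    return sensitivity, fdr


if __name__ == "__main__":
    # Toy data (hypothetical settings): 100 probe sets, the first 10 shifted
    # by 2 log2 units, 5 arrays per group, Gaussian noise with SD 0.3.
    rng = np.random.default_rng(1)
    truly_de = np.arange(100) < 10
    x = rng.normal(0.0, 0.3, size=(100, 5)) + 2.0 * truly_de[:, None]
    y = rng.normal(0.0, 0.3, size=(100, 5))
    p = selection_probability(x, y, k=10, seed=2)
    print(sensitivity_and_fdr(p >= 0.5, truly_de))
```

Repeating the scoring step over many simulated data sets and averaging gives per-method estimates of sensitivity and FDR of the kind the study reports over its 100 replicates.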
Subjects
oligonucleotide arrays
gene expression
sample size (number of replicates)
Type
thesis