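廖振鐸
National Taiwan University: Graduate Institute of Agronomy
李欣怡 (Lee, Hsin-I)
2007-11-28
2018-07-11
2005
http://ntur.lib.ntu.edu.tw//handle/246246/59154

Microarrays can measure the mRNA levels of thousands of genes simultaneously and in parallel, which indirectly indicates how strongly each gene is expressed. The Affymetrix high-density oligonucleotide array, a patented product of Affymetrix GeneChip™, is a DNA microarray with relatively high accuracy and reproducibility. Affymetrix Microarray Suite 5.0 (MAS 5.0) (Affymetrix GeneChip™, 2002), the Model Based Expression Index (MBEI) of Li and Wong (2001), and the Robust Multi-array Average (RMA) of Irizarry et al. (2003) are the three expression transformation methods in common use today. In this study we use Student's t-test, the penalized t-statistic of Efron et al. (2001), nonparametric tests (the Mann-Whitney test or the Wilcoxon signed rank test) (Conover, 1999), and the selection probability function of Pepe et al. (2003) combined with either the Student t statistic or a nonparametric statistic to examine how the three expression transformation methods identify significantly differentially expressed probe sets. We find that the three methods give different results.

In addition, we modify the simulation method of Hess and Iyer (2004) and simulate experimental data in the R environment. Through statistical simulation we indirectly demonstrate the reproducibility of Affymetrix high-density oligonucleotide arrays. Using the selection probability function combined with the Student t statistic or a nonparametric statistic, we compare the three expression transformation methods in terms of sensitivity, specificity, and false discovery rate. Based on our study, we recommend the RMA expression transformation method, with MBEI a close and competitive second. We recommend between 3 and 7 replicates (sample size) for experiments using the Rat 230A array. As the statistic for the selection probability function, the Student t statistic is more efficient and more stable than the Mann-Whitney statistic. Finally, we consolidate the discussion of replicate number into a practical algorithm that researchers can use as a reference when choosing the number of replicates, so as to balance experimental cost against efficiency.

Microarray technology has made it possible to measure the abundance of mRNA transcripts for thousands of genes simultaneously. In particular, the Affymetrix high-density oligonucleotide array, a patented product of Affymetrix GeneChip™, is very popular in the scientific community because of its high specificity and reproducibility. In this study, we first review three statistical methods that are currently in use for background correction, normalization and expression transformation: Affymetrix Microarray Suite 5.0 (MAS 5.0) (Affymetrix GeneChip™, 2002), Model Based Expression Index (MBEI) (Li and Wong, 2001) and Robust Multi-array Average (RMA) (Irizarry et al., 2003).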
Then we evaluate their performance based on significance tests of the fold change estimates obtained from these methods. The Student t-test, the penalized t-statistic of Efron et al. (2001), and nonparametric tests (the Mann-Whitney test or the Wilcoxon signed rank test) (Conover, 1999) are used for the significance tests. We show that MAS 5.0, MBEI and RMA can lead to quite different conclusions about which probe sets are differentially expressed. Therefore, we develop a simulation mechanism to generate replicated experiments. The simulation study is modified from the method recently proposed by Hess and Iyer (2004). Our modified method can mimic naturally occurring data and is based on real “template” array data. For each simulated data set, we directly use the selection probability function proposed by Pepe et al. (2003) with the Student t statistic to rank the expression levels of probe sets. We calculate the sensitivity and false discovery rate (FDR) of the three methods based on 100 simulated data sets for various scenarios. We recommend RMA for routine applications because it appears to have higher sensitivity and smaller FDR in all the scenarios under study. Note that MBEI is competitive with RMA in most scenarios. In addition, we develop a practical algorithm to determine the sample size of experiments using Affymetrix oligonucleotide arrays.

Chapter 1 Introduction
1.1 Overview of Affymetrix high-density oligonucleotide arrays
1.2 Motivation and objectives
Chapter 2 Expression measures for probe sets on Affymetrix high-density oligonucleotide arrays
2.1 Background correction
2.2 Normalization
2.3 Expression transformation
2.4 Analysis of real experimental data
Chapter 3 Identifying differentially expressed probe sets
3.1 Test methods
3.2 The selection probability function of Pepe et al. (2003)
Chapter 4 Simulation-based comparison and determination of the number of replicates
4.1 The simulation method of Hess and Iyer (2004)
4.2 Simulating Affymetrix high-density oligonucleotide array data
4.3 Simulation results and discussion
4.4 An algorithm for choosing an appropriate number of replicates
Chapter 5 Conclusions and future research
5.1 Summary
5.2 Future research
References
Appendix A Downloading and using R and Bioconductor
Appendix B R programs for the discussion of replicate number

en-US
high-density oligonucleotide arrays; gene expression; sample size
Affymetrix 高密度寡聚核苷酸晶片試驗統計分析方法之比較
A comparison of statistical methods for identifying differentially expressed genes using Affymetrix oligonucleotide arrays
thesis
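
The evaluation loop described in the abstract (simulate replicated experiments, rank probe sets by a Student t statistic, then score the selection by sensitivity, specificity and FDR averaged over many simulated data sets) can be illustrated with a minimal sketch. Everything in it is an assumption made for illustration: it is written in Python with the standard library only, whereas the thesis used R; the data generator is a toy Gaussian model, not the modified Hess and Iyer (2004) method; a plain top-k ranking stands in for the selection probability function of Pepe et al. (2003); and all function names and default parameters are hypothetical.

```python
import math
import random
import statistics


def student_t(x, y):
    """Two-sample Student t statistic (pooled-variance form) for y versus x."""
    nx, ny = len(x), len(y)
    pooled = ((nx - 1) * statistics.variance(x)
              + (ny - 1) * statistics.variance(y)) / (nx + ny - 2)
    return (statistics.mean(y) - statistics.mean(x)) / math.sqrt(pooled * (1 / nx + 1 / ny))


def simulate(n_probes, n_de, n_rep, shift, rng):
    """Toy generator (assumption): Gaussian log2 expression values.

    The first n_de probe sets are truly differentially expressed by `shift`.
    """
    control, treated = [], []
    for i in range(n_probes):
        base = rng.uniform(6.0, 12.0)          # arbitrary baseline level
        effect = shift if i < n_de else 0.0
        control.append([rng.gauss(base, 0.3) for _ in range(n_rep)])
        treated.append([rng.gauss(base + effect, 0.3) for _ in range(n_rep)])
    return control, treated, set(range(n_de))


def evaluate(control, treated, truth, top_k):
    """Select the top_k probe sets by |t| and score the selection against truth."""
    scores = [abs(student_t(c, t)) for c, t in zip(control, treated)]
    ranked = sorted(range(len(scores)), key=lambda i: scores[i], reverse=True)
    selected = set(ranked[:top_k])
    tp = len(selected & truth)
    fp = len(selected) - tp
    fn = len(truth) - tp
    tn = len(scores) - tp - fp - fn
    return {
        "sensitivity": tp / len(truth),
        "specificity": tn / (tn + fp),
        "fdr": fp / len(selected) if selected else 0.0,
    }


def average_metrics(n_sets=100, n_probes=500, n_de=25, n_rep=5,
                    shift=1.0, top_k=25, seed=1):
    """Average the three metrics over n_sets simulated data sets."""
    rng = random.Random(seed)
    totals = {"sensitivity": 0.0, "specificity": 0.0, "fdr": 0.0}
    for _ in range(n_sets):
        control, treated, truth = simulate(n_probes, n_de, n_rep, shift, rng)
        for name, value in evaluate(control, treated, truth, top_k).items():
            totals[name] += value
    return {name: total / n_sets for name, total in totals.items()}


if __name__ == "__main__":
    # Compare a few replicate numbers, as one would when choosing a sample size.
    for n_rep in (3, 5, 7):
        print(n_rep, average_metrics(n_rep=n_rep))
```

Running `average_metrics` for several values of `n_rep` gives a rough picture of how the metrics change with the number of replicates, which is the kind of comparison a sample-size algorithm would rely on. The numbers from this toy model say nothing about MAS 5.0, MBEI or RMA themselves.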