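廖振鐸
National Taiwan University: Graduate Institute of Agronomy
余亭琬 (Yu, Ting-Wan)
2007-11-28
2018-07-11
2004
http://ntur.lib.ntu.edu.tw//handle/246246/59121

With the rapid development of biotechnology, two-color microarray experiments are being applied ever more widely. The experimental process, however, often involves many systematic errors, so the resulting data are frequently not accurate enough. How to remove the systematic errors contained in the data and identify the genes whose expression truly differs significantly is therefore a question that many researchers are actively debating.

This study focuses on two-color microarray data from dye-swap experiments and analyzes them with log-ratio models. Log-ratio models differ from the ANOVA models commonly used for microarray data in the choice of response variable: the usual ANOVA model takes the logarithm of each individual intensity as the response, whereas the log-ratio models used here take the logarithm of the ratio of the paired red and green intensities. Log-ratio models can be further divided into one-stage and two-stage models. The one-stage log-ratio model assumes that the response variable has homogeneous variance across all genes. Because most genes in a microarray experiment show no significant difference in expression, this assumption may cause the variance of the differentially expressed genes to be underestimated. The two-stage log-ratio model, as its name suggests, splits the model into two parts. The first stage mainly performs normalization. The second stage, depending on the assumptions made, is either a gene-by-gene model or a mixture probability distribution model. The gene-by-gene model assumes heterogeneous variances among genes, so each gene has its own model. The mixture probability distribution model divides the genes into two populations, those with and those without a significant difference in expression. To identify significant genes, the one-stage log-ratio model uses the ordinary Student's t test, the gene-by-gene model uses the Sg statistic proposed by Efron et al. (2001), and the mixture probability distribution model uses the posterior odds to decide whether a gene is differentially expressed.

Finally, this study uses the one-stage log-ratio model and the gene-by-gene model to discuss how to determine the number of replicates. Because the one-stage log-ratio model assumes homogeneous variance, the estimated variance may be too small, so the number of replicates the experiment requires is underestimated. In the gene-by-gene model, N genes have N variances, and we suggest using the 90th percentile of these N variances as the basis for determining the number of replicates.

Microarray technology is a powerful tool for measuring the expression levels of many thousands of genes. However, many sources of systematic variation can bias the estimates of gene expression. How to remove this systematic variation and estimate gene expression correctly is therefore an important topic in microarray experiments. In this study, we focus on dye-swap two-color spotted DNA microarray experiments. We analyze the data from such experiments with log-ratio models, which can be classified as one-stage and two-stage log-ratio models. One-stage log-ratio models assume that the variance of gene expression is homogeneous across genes. However, most genes in a microarray experiment are not differentially expressed, so the estimate of this common variance may be smaller than the true variance of the remaining, differentially expressed genes.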
In the two-stage log-ratio models, the first stage can be regarded as a global normalization model and the second stage as a gene-specific model. Depending on the assumptions made, the gene-specific model is either a gene-by-gene model or a mixture probability density function model. Gene-by-gene models assume that the variance of gene expression differs from gene to gene, so every gene has its own model. Mixture probability density function models divide the genes in a microarray experiment into a differentially expressed population and a non-differentially expressed population, each with its own probability density function. The Student's t statistic is used in the one-stage log-ratio models to identify differentially expressed genes. The Sg statistic proposed by Efron et al. (2001) is used in the two-stage gene-by-gene models. In the mixture probability density function models, differentially expressed genes are identified by their posterior odds, which is a Bayesian approach.
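The contrast between a common variance and gene-specific variances can be illustrated with a minimal Python sketch. It is not the S-plus code from the thesis appendices, and it does not implement the Sg statistic or the mixture model. It assumes that each replicate is a dye-swap pair of arrays, that the treatment is labeled red on the first array and green on the second, and that the dye bias is the same on both arrays so that it cancels in the half-difference of the two log ratios. The function names `dye_swap_effects` and `t_statistics` are hypothetical.

```python
import math
from statistics import mean, variance


def dye_swap_effects(pairs):
    """Return one log2 treatment effect per dye-swap pair.

    Each pair is ((R1, G1), (R2, G2)). Assumption: the treatment is
    labeled red on array 1 and green on array 2 (the swapped array).
    """
    effects = []
    for (r1, g1), (r2, g2) in pairs:
        m1 = math.log2(r1 / g1)  # treatment effect plus dye bias
        m2 = math.log2(r2 / g2)  # minus treatment effect plus dye bias
        effects.append((m1 - m2) / 2)  # a shared dye bias cancels
    return effects


def t_statistics(data):
    """Map each gene to (t with pooled variance, t with its own variance).

    `data` maps a gene name to its list of dye-swap pairs. Every gene
    needs at least two pairs and a nonzero sample variance.
    """
    effects = {gene: dye_swap_effects(p) for gene, p in data.items()}

    # Common variance: per-gene sample variances pooled over all genes.
    df_total = sum(len(e) - 1 for e in effects.values())
    pooled_var = sum((len(e) - 1) * variance(e) for e in effects.values()) / df_total

    result = {}
    for gene, e in effects.items():
        n = len(e)
        t_pooled = mean(e) / math.sqrt(pooled_var / n)
        t_gene = mean(e) / math.sqrt(variance(e) / n)
        result[gene] = (t_pooled, t_gene)
    return result


# Illustrative intensities only, not data from the thesis.
example = {
    "gene_a": [((2.0, 1.0), (1.0, 2.0)), ((8.0, 1.0), (1.0, 8.0))],
    "gene_b": [((1.0, 2.0), (2.0, 1.0)), ((8.0, 1.0), (1.0, 8.0))],
}
stats = t_statistics(example)
```

In this toy input the second gene is far more variable than the first, so pooling the variances lowers the statistic for the stable gene and raises it for the noisy one. This is the kind of distortion that motivates gene-specific variances.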
Finally, we consider sample size determination based on the one-stage log-ratio models and on the gene-by-gene models.

Table of Contents
Chapter 1 Introduction
1.1 Introduction to two-color microarray experiments
1.2 Research motivation and objectives
1.3 Literature review
Chapter 2 Normalization and identification of differentially expressed genes
2.1 Normalization methods
2.2 Identifying significant genes
2.3 Log-ratio models
2.4 Application to real data
Chapter 3 Study of the number of replicates
3.1 Multiple comparisons
3.2 Study of the number of replicates
3.3 Results and discussion
Chapter 4 Conclusions and future research
4.1 Summary
4.2 Future research and discussion
References
Appendix A S-plus programs for the log-ratio models
Appendix B S-plus programs for the number of replicates

16311096 bytes
application/pdf
en-US
split-plot design
number of replicates
dye-swap
spotted microarray experiment
microarray
sample size
染劑對調雙染色點印微陣列試驗之研究
Data Analysis and Determination of Sample Size for the Dye-Swap Two-Color Spotted Microarray Experiments
thesis
http://ntur.lib.ntu.edu.tw/bitstream/246246/59121/1/ntu-93-R91621208-1.pdf