Contributor: 莊曜宇
Institution: National Taiwan University, Graduate Institute of Electrical Engineering
Author: 陳鴻毅 (Chen, Hung-I Harry)
Record dates: 2007-11-26, 2018-07-06
Year: 2007
Handle: http://ntur.lib.ntu.edu.tw//handle/246246/53527

Abstract:
Genomic instability is one of the fundamental factors in tumorigenesis and tumor progression. Many studies have shown that copy-number abnormalities at the DNA level are important in the pathogenesis of cancer. Array comparative genomic hybridization (array CGH), which was developed from expression microarray technology, can reveal segmental copy-number aberrations along chromosomes at high resolution. However, because of the nature of array CGH data, many standard tools for processing expression data, such as normalization methods, often fail to yield satisfactory results. We present a novel array CGH normalization algorithm. It normalizes array CGH data accurately by exploiting the dependency between measurements from probes at neighboring chromosomal positions. To validate its performance, we developed a Hidden Markov Model (HMM) that simulates a series of array CGH experiments with random DNA copy number alterations. We also applied the algorithm to real array CGH data from the CL1-0, CL1-1 and CL1-5 cell lines. These are closely related lung cancer cell lines that are classified by their differing invasiveness.
The normalization significantly improved data quality and enhanced the reliability of the experimental results. With the new algorithm, the normalized data showed distinct patterns of DNA copy number alterations among these lung cancer cell lines. Finally, building on this work, we are establishing a user-friendly web-based system for convenient online array CGH data analysis.

Table of Contents:
Oral Examination Committee Approval
Acknowledgements
Abstract (Chinese)
Abstract
Chapter 1 Introduction
1.1 Background and Motivation of the Study
1.2 The Purpose and Framework of the Study
Chapter 2 Introduction to Array Comparative Genomic Hybridization
2.1 Comparative Genomic Hybridization Analysis
2.2 Array Comparative Genomic Hybridization
Chapter 3 Materials and Methods
3.1 Ridge-tracing Normalization Algorithm
3.1.1 Quantile Normalization
3.1.2 2D Kernel Smoothing Algorithm
3.1.3 Regression Methods
3.2 Probe Ratio Distribution and Mode Detection
3.3 Array CGH Simulation by Hidden Markov Model
3.4 Validation Using Real aCGH Data
3.4.1 CL1-0, CL1-1, and CL1-5 Cell Lines
3.4.2 HEEBO Microarray
3.4.3 Agilent Microarray
Chapter 4 Results
4.1 Performance of Ridge-Tracing and Normalization Algorithm
4.2 Probe Ratio Mode Determination and aCGH Data Centralization
4.3 Comparison of Four Regression Methods
4.4 Applications of Normalization Algorithm to aCGH Hybridization
4.4.1 Results of HEEBO Microarray
4.4.2 Results of Agilent Microarray
Chapter 5 Discussion
Chapter 6 Conclusion
References

List of Figures:
Figure 1-1 The architecture of our development
Figure 2-1 Scheme of the basic steps of CGH analysis
Figure 2-2 Scheme of array CGH experiment
Figure 2-3 Examples of array CGH profiles on Chromosome 1
Figure 2-4 Factors which influence the success of array CGH
Figure 3-1 Array CGH data and visualization
Figure 3-2 An example and the flow chart of quantile normalization
Figure 3-3 The identical distribution normalized by quantile normalization
Figure 3-4 Logistic function (with …

File size: 1747918 bytes
Format: application/pdf
Language: en-US
Keywords: Array CGH; DNA copy numbers; Normalization; Centralization; Hidden Markov Model
SDGs: SDG3
Title: Development of a Normalization Algorithm for Array Comparative Genomic Hybridization
Type: thesis
Full text: http://ntur.lib.ntu.edu.tw/bitstream/246246/53527/1/ntu-96-R94921059-1.pdf
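The abstract states that a Hidden Markov Model was used to simulate array CGH experiments with random DNA copy number alterations. The following Python sketch illustrates that general idea. It is not the thesis's own simulator. The following are all illustrative assumptions:

- the three copy-number states (loss, normal, gain);
- the transition probabilities;
- the Gaussian noise model;
- the constant offset;
- the names `simulate_acgh` and `TRANSITIONS`.

```python
import math
import random

# Assumed hidden states: copy number of a segment in a diploid genome,
# i.e. loss (1), normal (2) and gain (3).
# Assumed transition probabilities: high self-transition probabilities
# produce contiguous altered segments along the chromosome.
TRANSITIONS = {
    1: {1: 0.98, 2: 0.02, 3: 0.00},
    2: {1: 0.01, 2: 0.98, 3: 0.01},
    3: {1: 0.00, 2: 0.02, 3: 0.98},
}


def simulate_acgh(n_probes, noise_sd=0.2, offset=0.0, seed=None):
    """Simulate one array CGH profile ordered by chromosomal position.

    Returns (copy_numbers, log2_ratios). The expected log2 ratio of a probe
    is log2(copy_number / 2). Gaussian noise (noise_sd) models measurement
    error, and a constant offset mimics an array that still needs
    normalization. All names and parameters here are hypothetical.
    """
    rng = random.Random(seed)
    state = 2  # begin in the normal (diploid) state
    copy_numbers, log2_ratios = [], []
    for _ in range(n_probes):
        copy_numbers.append(state)
        expected = math.log2(state / 2)
        log2_ratios.append(rng.gauss(expected, noise_sd) + offset)
        # Draw the next probe's hidden state from the current state's row.
        row = TRANSITIONS[state]
        state = rng.choices(list(row), weights=list(row.values()))[0]
    return copy_numbers, log2_ratios


if __name__ == "__main__":
    cn, ratios = simulate_acgh(1000, noise_sd=0.2, offset=0.1, seed=42)
    print(sorted(set(cn)), round(sum(ratios) / len(ratios), 3))
```

The true copy number of every simulated probe is known. A normalization method could therefore be judged by how closely its output recovers log2(copy_number / 2) once the offset has been removed.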