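劉清
National Taiwan University, Graduate Institute of Agronomy
劉妙宜 (Liu, Miao-Yi)
2007-11-28
2018-07-11
2004
http://ntur.lib.ntu.edu.tw//handle/246246/59129

This study focuses on constructing a linkage map from dominant and missing genetic markers. An accurate linkage map is a key factor in the results of quantitative trait loci (QTL) mapping and analysis. When the linkage map is unknown and the genome is densely covered with markers, the markers that belong to the same linkage group must first be grouped together. The maximum likelihood method is then used to estimate simultaneously the recombination frequencies between the markers in each linkage group, which yields the best marker order and the genetic distances between markers, and hence the linkage map.

When the marker genotypes in a linkage group are fully informative, estimating the recombination frequencies of several markers simultaneously with the multilocus likelihood gives the same result as estimating each one separately by two-point analysis. When the marker genotypes are completely or partially missing, however, the recombination frequencies can only be estimated simultaneously with the multilocus likelihood, using the information from all markers in the linkage group. In that case the likelihood function is complicated and has no closed-form solution, so an approximate solution must be obtained iteratively by numerical methods such as the Newton-Raphson method or the EM (Expectation Maximization) algorithm. Because the EM algorithm does not use the second derivatives of the likelihood function, it converges more slowly and cannot provide the asymptotic covariance matrix of the maximum likelihood estimates. This thesis uses the Newton-Raphson method to obtain the maximum likelihood estimates of the recombination frequencies between markers simultaneously, because the method yields not only the maximum likelihood estimates but also their asymptotic covariance matrix, which makes it possible to assess the reliability of inferences based on those estimates.

Marker data for BC progeny and F2 progeny were first simulated. The Newton-Raphson method was then used to solve iteratively for the maximum likelihood estimates of the recombination frequencies between markers, and Haldane's mapping function was used to convert the estimated recombination frequencies into genetic distances. These distances were close to the distances originally specified, and the asymptotic covariance matrix showed that the standard errors were very small. In addition, the maximum likelihood estimates of the recombination frequencies obtained by the Newton-Raphson method and by the EM algorithm were almost exactly the same.

The purpose of this study is to construct a linkage map from dominant and missing markers. An accurate linkage map is vital for mapping and analysing quantitative trait loci (QTL). If the linkage map is unknown for a set of markers in the genome, we must first divide the markers into linkage groups and then determine the most likely marker order and the distances between neighboring markers within each linkage group. This is done by the maximum likelihood (ML) method. When the markers within a linkage group are fully observed, using the multilocus likelihood function to estimate the recombination frequencies for all markers simultaneously is equivalent to using two-point analysis to estimate the recombination frequency for each pair of markers independently.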
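In the fully observed case, the two-point estimate for a backcross has a closed form (recombinants divided by progeny), which makes it a convenient check on a Newton-Raphson iteration. The following is a minimal Python sketch under that assumption. It is not the thesis's code: the function name `newton_two_point`, the starting value, and the counts (30 recombinants among 200 progeny) are hypothetical and chosen only for illustration.

```python
import math

def newton_two_point(k, n, r0=0.25, tol=1e-10, max_iter=50):
    """Newton-Raphson ML estimate of a recombination frequency r.

    Assumes fully observed backcross data: k recombinants among n progeny,
    with 0 < k < n. Log-likelihood: k*log(r) + (n - k)*log(1 - r).
    Returns (r_hat, standard_error).
    """
    r = r0
    for _ in range(max_iter):
        score = k / r - (n - k) / (1.0 - r)                # first derivative
        hessian = -k / r**2 - (n - k) / (1.0 - r)**2       # second derivative
        r_new = r - score / hessian                        # Newton-Raphson update
        r_new = min(max(r_new, 1e-9), 1.0 - 1e-9)          # keep r inside (0, 1)
        converged = abs(r_new - r) < tol
        r = r_new
        if converged:
            break
    # Asymptotic variance: negative inverse of the second derivative at r_hat.
    hessian = -k / r**2 - (n - k) / (1.0 - r)**2
    std_error = math.sqrt(-1.0 / hessian)
    return r, std_error

# Hypothetical counts, for illustration only.
r_hat, se = newton_two_point(k=30, n=200)
print(r_hat, se)
```

For this one-parameter case the iteration should agree with the closed-form estimate k/n, and the standard error with the binomial form sqrt(r(1 - r)/n). The multilocus problem with missing genotypes replaces the scalar derivatives with a gradient vector and a Hessian matrix.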
However, when some markers are partially observed or missing, the recombination frequencies can only be estimated simultaneously, using the multilocus likelihood function and the information from all markers within the linkage group. The multilocus likelihood function is usually too complicated to admit a closed-form solution, so numerical methods such as the Newton-Raphson method or the EM algorithm must be used to obtain an approximate solution by iteration. Because the EM algorithm does not use the second-order derivatives of the likelihood function, it converges more slowly and does not provide the asymptotic covariance matrix of the ML estimates. This study simulated backcross and F2 intercross data and used the Newton-Raphson method to calculate simultaneously the ML estimates of the recombination frequencies of all markers within a linkage group. The Newton-Raphson method yields not only the ML estimates but also their asymptotic covariance matrix, which allows us to evaluate the reliability of statistical inference based on those estimates. Haldane's mapping function was then applied to transform the estimated recombination frequencies into genetic distances. The calculated distances were close to those originally assigned, and the asymptotic covariance matrix showed that the standard errors were very small.
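The conversion from recombination frequency to genetic distance can be sketched in Python as follows. The formula is the standard form of Haldane's mapping function, d = -(1/2) ln(1 - 2r) in Morgans, which assumes no crossover interference. The helper names `haldane_cm` and `haldane_r` are hypothetical and not taken from the thesis.

```python
import math

def haldane_cm(r):
    """Map a recombination frequency r (0 <= r < 0.5) to distance in centiMorgans."""
    if not 0.0 <= r < 0.5:
        raise ValueError("r must lie in [0, 0.5)")
    return -50.0 * math.log(1.0 - 2.0 * r)   # 100 cM/Morgan * (-1/2) * ln(1 - 2r)

def haldane_r(d_cm):
    """Inverse mapping: distance in centiMorgans back to recombination frequency."""
    if d_cm < 0.0:
        raise ValueError("distance must be non-negative")
    return 0.5 * (1.0 - math.exp(-2.0 * d_cm / 100.0))

# Round trip on a hypothetical estimate.
d = haldane_cm(0.15)
print(d, haldane_r(d))
```

Because the map is nonlinear, a standard error for r does not carry over to the distance unchanged. One common approximation is the delta method, which scales it by the derivative 1/(1 - 2r) in Morgans.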
In addition, the ML estimates obtained by the Newton-Raphson method are virtually identical to those obtained by the EM algorithm.

Chinese Abstract
English Abstract
Table of Contents
List of Figures
List of Tables
Chapter 1 Introduction
Chapter 2 Previous Research
    Section 1 Evolution of Linkage Maps
    Section 2 Computing Likelihood Estimates Given a Gene Order
Chapter 3 Methods
    Section 1 Study Subjects
    Section 2 Basic Linkage Analysis
        I. EM Algorithm
        II. Newton-Raphson Method
    Section 3 Recombination Frequency and Genetic Distance
        I. Introduction to Haldane's Mapping Function
        II. Plot of Recombination Frequency Against Genetic Distance
    Section 4 Constructing the Linkage Map
        I. The Most Likely Gene Order
        II. Computing Likelihood Estimates Given a Gene Order
    Section 5 Multilocus Likelihood Function
    Section 6 EM Solution for the Maximum Likelihood Estimates
    Section 7 Newton-Raphson Solution for the Maximum Likelihood Estimates
Chapter 4 Data Simulation
    Section 1 Simulating Progeny Data
        I. BC Progeny
        II. F2 Progeny
    Section 2 Finding the Most Likely Gene Order by Three-Point Analysis
    Section 3 Simulation Results
        I. BC Progeny
        II. F2 Progeny
Chapter 5 Results and Discussion
    Section 1 Findings
        I. Determining the Most Likely Gene Order
        II. Computing Likelihood Estimates Given a Gene Order
            (A) Estimating Total Chromosome Length
            (B) Fully Informative Marker Genotypes
            (C) Partially or Completely Missing Marker Genotypes
            (D) Comparison of Equal and Unequal Genetic Distances
            (E) Comparison of Coefficients of Variation
    Section 2 Suggestions for Future Research
References
Appendix A Three-Point Analysis
Appendix B Markov Chain Process
Appendix C Recombination Frequency Between Two Loci

761643 bytes
application/pdf
en-US
genetic markers; linkage map; asymptotic covariance matrix; Newton-Raphson method; maximum likelihood estimate
以牛頓法建立具顯性及缺失標識資料之遺傳連鎖圖
Linkage map construction with dominant and missing markers by Newton-Raphson method
thesis
http://ntur.lib.ntu.edu.tw/bitstream/246246/59129/1/ntu-93-R91621202-1.pdf