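2013-08-01
2024-05-17
https://scholars.lib.ntu.edu.tw/handle/123456789/679845

Abstract (Chinese original, translated): Understanding gene regulation is one of the most important problems in computational biology. As high-throughput data and systematic analysis methods have been applied to the study of gene regulatory networks under various cellular conditions, researchers have found that gene regulation is not simply a matter of transcription factors (TFs) binding to cis-elements on the DNA sequence; many post-transcriptional and post-translational regulatory mechanisms and factors have gradually been discovered and have gained attention. Although the number of factors known to affect gene expression keeps growing, the initiation of transcription remains one of the most critical steps in expressing a gene as a protein. What directly determines the efficiency of transcription is the DNA sequence immediately upstream and downstream of the transcription start site (TSS). These sequences, called promoters, carry the most direct information about when the downstream gene should be expressed and at what level. Promoter activity prediction attempts to answer one question: can the expression level of a downstream gene be predicted from the sequence features of its promoter? Many important published studies have shown this idea to be feasible, but after years of effort the accuracy of promoter activity prediction still has considerable room for improvement. This project therefore proposes to introduce two novel concepts to further improve that accuracy. First, continuing our laboratory's research of the past three years, it will use protein structure information to improve the accuracy of the predicted affinity between TFs and cis-elements. Second, it will incorporate the possible influence of long non-coding RNAs (lncRNAs) into the prediction model. An accurate promoter prediction model will help confirm the relationships between regulators (whether TFs or lncRNAs) and the genes they regulate. The project plans to apply the developed methods to all promoter sequences in human, mouse, fruit fly, and yeast, to build a database of possible regulators for all coding and non-coding genes, and to make the prediction tools available to researchers after publication in international journals.

Abstract: Understanding gene regulation remains one of the most important issues in computational biology. With the advance of high-throughput technologies and systematic analysis methods, many post-transcriptional and post-translational regulators have been found. Although more and more factors have to be considered in constructing gene regulatory networks, promoter activity plays the major role in determining the intensity of gene expression. It is the promoter regions that control the efficiency of transcription, and they have been widely studied over the past decade. This project aims to develop a new methodology for predicting promoter activity by including two novel strategies. First, the structural information of the DNA-binding domains will be considered when quantifying the binding affinity between transcription factors (TFs) and cis-elements. Second, the regulatory role of long non-coding RNAs (lncRNAs) will be considered.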
By constructing prediction models with higher accuracy, we will be able to further confirm the regulatory relationships between the key factors (whether TFs or lncRNAs) and the regulated genes. The proposed method will be applied to all promoter sequences in the human, mouse, fruit fly, and yeast genomes, for both coding and non-coding genes. Afterward, the predicted information will be deposited in local databases and the developed computational tools will be released for public use.

Keywords: gene regulation; high-throughput data analysis; transcription factors; long non-coding RNA; lncRNA; promoter activity prediction

Laurel Research Project (桂冠型研究計畫): "Exploring the Possibility That Long Non-coding RNAs (lncRNAs) Regulate Promoter Activity and Building Prediction Models"