高成炎
National Taiwan University: Graduate Institute of Computer Science and Information Engineering
張耀霖 (Chang, Yao-Lin)
2010-06-02
2018-07-05
2008
U0001-1707200823320900
http://ntur.lib.ntu.edu.tw//handle/246246/184825

Protein-DNA interactions play a critical role in the mechanisms of living organisms, including gene transcription, recombination, replication, and DNA repair. Finding potential protein-DNA binding pairs helps in understanding the regulatory networks of the cell, which is an important task of the post-genomic era. Experimental methods for finding such pairs are usually expensive and time-consuming, so we propose a 3D-regulog method to predict potential binding pairs. Our method also provides the binding model of each pair and the interacting amino acids and nucleotides. We propose a new scoring method to accompany the 3D-regulog. It combines the evolutionary conservation of DNA-interacting residues with the binding preferences between amino acids and nucleotides. Our method achieves high precision and recall in identifying 66 DNA-binding protein families. It is also reasonably accurate in predicting the energies of protein hotspots, and it discriminates well within multi-specific protein families. We further propose a knowledge-based scoring matrix to improve the performance of the original scoring method. This new scoring matrix allows us to predict protein-DNA binding affinity, and it performs well on several different test sets, including crystal structures of protein-DNA complexes, alanine-scanning data, and experimental data on zinc finger proteins. We also used this method to scan the promoter of the yeast HO gene and identify potential transcription factor binding sites.

Protein-DNA interactions play a key role in many genetic activities of living organisms, such as transcription, recombination, DNA replication, and repair. Finding binding pairs of proteins and DNA helps us understand the regulatory pathways of a cell, which is an important task of the post-genomic era. Experimental approaches for finding such pairs are usually expensive and time-consuming. We propose a computational approach called “3D-regulogs” to infer protein-DNA binding partners on a large scale, using the concept of regulogs and the crystal structures of protein-DNA complexes as templates. The method also provides the binding model and the interacting amino acids and DNA bases of the predicted partners. The 3D-regulogs approach uses a scoring method that combines the evolutionary conservation of DNA-contact residues with the preference of interacting residues and nucleotides to evaluate protein-DNA binding partners. By applying the scoring method, we achieve high precision and recall for 66 families of DNA-binding domains, with a false positive rate of less than 5% for 250 non-DNA-binding proteins.
We also obtained high accuracy in predicting the binding free energy of hotspot mutation sets. In tests of the regulog mapping of multi-specific families, our method performed well at identifying proteins with distinct DNA-binding specificities. To further enhance the interaction term of the scoring function, we propose a novel knowledge-based scoring matrix. With this scoring method, we achieved high correlation with the binding affinities of several test sets, including complexes extracted from PRONIT, the alanine-scanning set, and the base mutation set of zinc finger proteins. We also used the scoring method to scan the promoter region of the yeast HO gene and obtained potential transcription factor binding sites.

Abstract
Chapter 1 Introduction
1.1 Motivation
1.2 Biological Importance of Protein-DNA Interaction
1.2.1 Transcription
1.2.2 Replication
1.2.3 Recombination
1.3 Background
1.4 Template-based approach to model protein-DNA interactions
1.5 Interologs and Regulogs
1.6 Thesis organization
Chapter 2 Evolutionary conservation of DNA-contact residues in DNA-binding domains
2.1 Introduction
2.2 Method
2.2.1 Template library
2.2.3 Scoring method
2.3 Results
2.3.1 Positive and negative set for each contact domain
2.3.2 Determining the threshold of similar DNA-binding function of a contact domain
2.3.3 Non-DNA-binding proteins
2.4 Discussion
2.5 Summary
Chapter 3 Evolutionary conservation and Interacting preference for identifying protein-DNA interactions
3.1 Introduction
3.2 Method
3.2.1 Template preparation
3.2.2 Alignment Tools
3.2.3 Scoring function
3.3 Result
3.3.1 Identifying DNA-binding domains
3.3.2 Free energy prediction between proteins and DNAs
3.4 Discussion
3.4.1 Hormone receptor family
Chapter 4 Knowledge-based Scoring Function for Binding Affinity Prediction
4.1 Introduction
4.1.1 Residue-based binding model of protein-DNA complexes
4.1.2 Scoring matrix construction
4.2 Scoring Method
4.3 General prediction of protein-DNA binding affinities
4.4 Energy evaluation on Alanine-scanning proteins
4.5 Binding affinity prediction of zinc finger proteins
4.5.1 Zinc finger domain
4.5.2 Experimental binding affinities of zinc finger proteins
4.5.3 Evaluation with experimental data
4.6 Transcription factor binding sites detection
4.7 Summary
Chapter 5 Regulogs mapping of DNA-binding protein families
5.1 Classes of DNA-binding protein families
5.2 Dataset
5.3 Result
5.3.1 Identification of positive proteins
5.3.2 Determination of Z-score thresholds
5.4 Summary
Chapter 6 Conclusion
6.1 Summary
6.2 Future work
Bibliography
Appendix A
List of Publications

application/pdf
2006127 bytes
en-US
regulatory network; evolutionary conservation; hotspot; zinc finger protein; promoter; regulogs; binding affinity; Alanine-scanning
以三維regulog之方法預測蛋白質與DNA之交互作用及模型
A 3D-regulog approach to predict protein-DNA binding partners and binding model
thesis
http://ntur.lib.ntu.edu.tw/bitstream/246246/184825/1/ntu-97-F89922083-1.pdf