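Contributors: 黃乾綱; National Taiwan University, Graduate Institute of Engineering Science and Ocean Engineering (臺灣大學:工程科學及海洋工程學研究所)
Author: 黃俊欽 (Huang, Chun-Chin)
Dates: 2010-07-14; 2018-06-28; 2009
Identifier: U0001-1208200922061900
URL: http://ntur.lib.ntu.edu.tw//handle/246246/188998

Abstract (translated from the Chinese):

Protein-DNA interactions are typically involved in important biochemical processes such as DNA transcription, replication, transmission of genetic information, and genetic recombination. Protein-DNA binding can be divided into sequence-specific binding and non-specific binding. Sequence-specific binding recognizes particular DNA base pairs, whereas non-specific binding mainly involves the sugar-phosphate portion of the DNA.

The first stage of the thesis addresses binding-residue prediction. For sequence-specific binding residues, the classifier achieves 96.45% accuracy, 50.14% sensitivity, 99.31% specificity, 81.70% precision, and an F-measure of 62.15%. The non-specific binding predictor achieves 89.14% accuracy, 53.06% sensitivity, 95.25% specificity, 65.47% precision, and an F-measure of 58.62%. Combining the two predictions with an OR operation yields 89.26% accuracy, 56.86% sensitivity, 95.63% specificity, 71.92% precision, and an F-measure of 63.51%. The second stage addresses prediction of the protein-DNA binding mode; the multi-class support vector machine designed for this purpose achieves 75.83% accuracy.

The thesis presents a sequence-based classifier that predicts sequence-specific and non-specific binding residues in transcription factors, taking the DNA-binding mechanism into account. The binding-mode predictor is intended to give biochemists additional structural information and to further improve residue prediction. The experience gained from this work will also be applied to binding-residue prediction for protein types other than transcription factors.

Abstract (English):

Protein-DNA interactions are essential for fundamental biochemical activities including DNA transcription, replication, packaging, repair, and rearrangement. Proteins that interact with DNA can be classified into two modes, sequence-specific binding and non-specific binding. Specific binding provides a mechanism for recognizing the correct nucleotide base pairs, that is, sequence-specific identification. Non-specific binding, in contrast, shows relatively little base-sequence preference and involves interaction with the DNA backbone.

In this thesis, we present a two-stage approach to protein-DNA binding prediction. In the first stage, DNA-binding residue prediction, the predictor for specific binding residues achieves 96.45% accuracy with 50.14% sensitivity, 99.31% specificity, 81.70% precision, and 62.15% F-measure. The predictor for non-specific binding residues achieves 89.14% accuracy with 53.06% sensitivity, 95.25% specificity, 65.47% precision, and 58.62% F-measure.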
In addition, we combine the sequence-specific and non-specific binding-residue predictions from the first stage with an OR operation; the combined predictor achieves 89.26% accuracy with 56.86% sensitivity, 95.63% specificity, 71.92% precision, and 63.51% F-measure. In the second stage, a protein-DNA interaction mode predictor is proposed. Using a support vector machine for multi-class prediction, it achieves 75.83% accuracy.

This article presents the design of a sequence-based predictor that identifies the sequence-specific and non-specific DNA-binding residues in a transcription factor, taking the DNA-binding mechanism into account. The protein-DNA interaction mode prediction was introduced to give biochemists additional structural hints and to help improve the preceding DNA-binding residue prediction. In addition, we will draw on the experience gained in this study to design binding-mechanism-aware predictors for other types of DNA-contacting proteins.

Table of contents:

Acknowledgements
Abstract (Chinese)
Abstract (English)
Glossary of Terms
Contents
List of Figures
List of Tables
Chapter 1 Introduction
Chapter 2 Related Work
  2.1 Specific and Non-specific Binding
  2.2 Review of the Literature on Prediction Methods
  2.3 Obtaining the Dataset
  2.4 Defining Binding Residues
  2.5 Defining Protein-DNA Binding Modes
  2.6 Classifier Package: LIBSVM
  2.7 Other Tools and Terminology
Chapter 3 Experimental Methods
  3.1 Experimental Framework and Classifiers
  3.2 Feature Selection and Vector Encoding
  3.3 Data Normalization
  3.4 Validation Method
  3.5 Independent Testing
Chapter 4 Experimental Results and Discussion
  4.1 Optimal Parameter Selection
  4.2 Performance Evaluation
Chapter 5 Conclusion
References
Appendix

File: 1856387 bytes, application/pdf
Language: en-US
Keywords: DNA-binding residues prediction; sequence-specific binding; non-specific binding; support vector machine; transcription factor
Title (Chinese): 以去氧核糖核酸作用之專一性及非專一性結合殘基預測結果為基礎進而推論蛋白質序列上蛋白質-核酸結合類型
Title (English): Prediction of Transcription Factor Domain based on Analysis of Specific and non-Specific DNA-Binding Residues on the Protein Sequence
Type: thesis
Full text: http://ntur.lib.ntu.edu.tw/bitstream/246246/188998/1/ntu-98-R96525072-1.pdf
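
Note on the reported metrics: the abstract quotes accuracy, sensitivity, specificity, precision, and F-measure, and combines the two residue predictors with an OR operation. The following Python sketch shows how these quantities are conventionally computed and how an element-wise OR of two binary predictions works. It is an illustration only, not the thesis's code: the function names and the toy label vectors are hypothetical, and the F-measure is assumed to be the standard harmonic mean of precision and sensitivity, which is consistent with the figures quoted in the abstract up to rounding.

```python
def confusion_metrics(y_true, y_pred):
    """Compute standard binary-classification metrics from 0/1 labels.

    Assumes both classes occur in y_true and at least one positive is
    predicted, so none of the denominators is zero.
    """
    tp = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 1)
    tn = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 0)
    fp = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 1)
    fn = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 0)
    return {
        "accuracy": (tp + tn) / (tp + tn + fp + fn),
        "sensitivity": tp / (tp + fn),   # also called recall
        "specificity": tn / (tn + fp),
        "precision": tp / (tp + fp),
    }


def f_measure(precision, sensitivity):
    """Harmonic mean of precision and sensitivity (assumed F1 definition)."""
    return 2 * precision * sensitivity / (precision + sensitivity)


def or_combine(specific_pred, nonspecific_pred):
    """Mark a residue as binding if either predictor marks it as binding."""
    return [int(a or b) for a, b in zip(specific_pred, nonspecific_pred)]


# Hypothetical toy data: 1 = DNA-binding residue, 0 = non-binding residue.
y_true = [1, 1, 0, 0, 1, 0, 0, 0]
specific_pred = [1, 0, 0, 0, 0, 0, 0, 0]
nonspecific_pred = [0, 1, 0, 1, 0, 0, 0, 0]

combined = or_combine(specific_pred, nonspecific_pred)
metrics = confusion_metrics(y_true, combined)
metrics["f_measure"] = f_measure(metrics["precision"], metrics["sensitivity"])
print(combined, metrics)

# Consistency check against the (precision, sensitivity) pairs quoted in
# the abstract; small differences are expected because inputs are rounded.
for precision, sensitivity in [(0.8170, 0.5014), (0.6547, 0.5306), (0.7192, 0.5686)]:
    print(round(100 * f_measure(precision, sensitivity), 2))
```

Because an OR can only add positive predictions, it tends to raise sensitivity at some cost in precision relative to the stricter of the two predictors.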