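劉長遠
臺灣大學:資訊工程學研究所 (National Taiwan University: Graduate Institute of Computer Science and Information Engineering)
何政融 (Ho, Cheng-Jung)
2007-11-26; 2018-07-05
2007
http://ntur.lib.ntu.edu.tw//handle/246246/53676

The analysis and alignment of protein structures is a research area that has drawn wide attention because of its important applications in biology and medicine. In this article we introduce a neural network protein structure search method that uses the adhesion relations between features. The purpose of the method is to find, in a protein structure database, the proteins that contain a specific structural block. We designed this protein structure search method on the basis of a handwritten Chinese character recognition method, and it inherits that method's performance. It is a coarse method: it can point out which proteins in the database may contain the specific structure, but other techniques are still needed to find the exact position of the structure within the protein.
We designed a feature extraction method that is unaffected by rotation and translation, and we use the Procrustes algorithm to find the potentially important blocks in protein structures. The Procrustes algorithm was originally a method for linguistic analysis, and we had to modify it, without changing its characteristics, so that it can be applied to protein structures. Evaluating the compatibility between a protein and its important blocks is a key part of the method; we cast it as an optimization problem and use a Hopfield network to find the correspondence between them that achieves the best compatibility. The search itself is carried out by a trained backpropagation network: when we feed it the compatibility scores of a given protein against all important blocks, it tells us which proteins in the database have structures similar to the input protein. Finally, we compare this method with another algorithm that uses a suffix tree.
We hope this work can contribute to research on treating unknown viruses with known drugs. Based on the theory that similar protein structures have similar functions, we can use this method to find known viruses similar to an unknown virus, and the drugs that work against those known viruses may also be effective against the unknown virus.
This thesis draws on National Science Council project NSC 94-2213-E-002-105; the topic and solution were provided by the advisor, and the programming and website design were completed by the student.

Protein structure analysis and alignment is a topic receiving wide attention because of its important applications in the biological and medical fields. In this article we introduce a neural network based protein structure search method that uses the cell-to-cell adhesion property of the feature cells. The method aims to find, in a protein structure database, the protein structures that contain a target segment structure of interest. By extending the basic idea of a handprinted Chinese character recognition method, we design this method for protein structure matching so that it takes advantage of the speed and performance of the character recognition method. The method is a coarse one: it identifies the protein structures that may contain the substructure of interest, and the candidates need further post-processing to locate the exact position of the substructure. We design a new feature extraction method that generates rotation and translation invariant features, and we use the Procrustes algorithm to find the key substructures, which are denoted by radicals.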
The Procrustes algorithm was originally designed for linguistic analysis, and we modify it, without loss of its characteristics, to apply it to protein structures. Compatibility measurement between the protein structure and the radical substructures is of key importance in the process and is formulated as an optimization problem solvable by a Hopfield network. The search is performed by a trained backpropagation network that takes in the compatibility scores of the query substructure and outputs references to the candidate protein structures. Finally, we compare this method with another, index-based method that uses a modified suffix tree algorithm. We hope that this work can contribute to biomedical research on finding cures for a new virus by searching for similar viral protein structures among viruses with existing cures. Since similar protein structures may have similar functions, the existing cures found by this method may be effective against the new virus too. This work was partially supported by National Science Council, ROC under contract number NSC 94-2213-E-002-105.

1 Introduction
2 Related work
2.1 Comparison-based structure search
2.1.1 Feature Extraction
2.1.2 Comparison
2.1.3 Discovery
2.2 Index-based structure search
2.2.1 Feature extraction
2.2.2 Building index
2.2.3 Search
3 Methods to compare
3.1 The geometric suffix tree
3.1.1 Feature extraction
3.1.2 Tree construction
3.1.3 Query
3.2 CTCA
3.2.1 Original CTCA for Handprinted Chinese Character Recognition
3.2.2 CTCA for Protein Structure Matching
4 Experiments
5 Conclusion

1948139 bytes
application/pdf
en-US
蛋白質結構搜尋; 類神經網路; 霍普菲爾網路; 倒傳遞網路; 特徵相連關係
protein structure search; neural network; Hopfield network; backpropagation; cell-to-cell adhesion
使用特徵相連關係的類神經網路蛋白質結構搜尋方法
Neural Network Method for Protein Structure Search using Cell-to-Cell Adhesion
thesis
http://ntur.lib.ntu.edu.tw/bitstream/246246/53676/1/ntu-96-R94922164-1.pdf