劉長遠
National Taiwan University: Graduate Institute of Computer Science and Information Engineering
Manna, Sukanya (馬舒雅)
2007-11-26
2018-07-05
2006
http://ntur.lib.ntu.edu.tw//handle/246246/53848

This dissertation presents a systematic methodology for retrieving information from digital media, drawing on knowledge from both biology and computer science. The work proceeds at the genomic level to identify the evolutionary trends of enzyme proteins. The thesis also reports biological findings on how enzymes can be related to one another through co-occurrence in the digital literature. We developed a pseudo-reverse mechanism to compare the behaviour of enzyme proteins with existing standard concepts. The work rests on a strong assumption from evolutionary theory about the rates of nucleotide substitution; the pseudo-reverse approach is used to test how far that assumption can be justified. We employed the standard model of Nei and Gojobori, in a generalized form, to determine the nucleotide substitutions, and Jukes and Cantor's model to estimate their rates. We also embedded comparative genomics in this model to calculate the lineages of these enzyme proteins among species such as human, mouse and rat.
From this study we predicted that enzymes mutate more slowly than ordinary proteins, and that the divergence time of these enzymes between human and mouse or rat is almost five times longer, around 400 million years. The appendix of the thesis describes a study of enzyme co-citation. It involves the automated extraction of explicit and implicit biomedical knowledge about existing work on enzymes from digital documents. The work is presented on a small-scale data set, to give an overall idea of how much enzyme data is available in a generic digital library such as CiteSeer. We built an enzyme-to-enzyme co-citation network from digital documents covering 4950 pairs of enzymes. The study distinguishes three basic statuses of enzyme research in a generic database such as CiteSeer: some topics are well established, some are half developed, and others are still unknown or unclear. Our goal is simple: to focus on the relation between two enzymes that occur in the same document. We validated the concepts of this work against the related references and found that the approach can help detect diseases that are caused or cured by certain enzymes. It can even help recover, from the literature, the detailed molecular reactions underlying the enzymes. The thesis explains these investigations and their methods in detail.

1 Introduction
1.1 Motivation
1.2 Contribution
1.3 Biological Background of this Work
1.3.1 Enzymes and their Roles in Living Body
1.3.2 Molecular Evolution Concepts
1.3.3 Relation between DNA, RNA and Proteins
1.3.4 Mutations
1.4 Research Background
1.5 Dissertation Organization
2 Pseudo-Reverse Approach in Genetic Evolution
2.1 Comparative Genomics and Evolutionary Studies
2.2 Nucleotide substitutions
2.2.1 Nucleotide Substitution Model: Jukes and Cantor's one parameter model
2.2.2 Number of substitution between two sequences
2.2.3 Number of substitutions between two non-coding sequences
2.2.4 Number of substitution between two protein-coding sequences
2.3 Rates of evolutionary changes
2.4 Pseudo-Reverse approach
2.5 Overview of Our Approach
2.6 Implementation
2.6.1 Assumptions
2.6.2 Generalized Algorithm
2.7 Simulations and results
2.7.1 Data Source
2.7.2 Experimental Results and Observation
3 Conclusion
A Enzyme Co-citation: A Case Study with CiteSeer
A.1 Role of information extraction (IE)
A.2 Enzyme-enzyme co-occurrence concept
A.3 Methods and Results
A.3.1 Reason for choosing CiteSeer over other databases
A.3.2 Dataset and co-indexing of enzymes
B Glossary

568486 bytes
application/pdf
en-US
comparative genomics; enzymes; evolution; nucleotide substitutions; co-citation
[SDGs]SDG3
應用比較基因體學於尋找酵素演化趨勢之研究
Comparative Genomics in Determining the Evolutionary Trends of Enzymes
thesis
http://ntur.lib.ntu.edu.tw/bitstream/246246/53848/1/ntu-95-R93922143-1.pdf