ProtExt: a Web-based Protein-protein Interaction Extraction System for PubMed Abstracts (從文獻摘要擷取蛋白質交互作用之系統)

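Contributors: 高成炎; 阮雪芬
Institution: 臺灣大學 (National Taiwan University)
Author: 彭沁璘 (Peng, Chin-Lin)
Record dates: 2007-11-26; 2018-07-05
Year: 2004
URI: http://ntur.lib.ntu.edu.tw//handle/246246/53605

Abstract (translated from the Chinese)

Functional proteomics, which seeks to understand interactions between proteins and between proteins and genes, has become an important topic. Information about protein-protein interactions allows the function of an unknown protein to be determined more accurately. In addition, the networks formed by these interactions give an initial understanding of cellular mechanisms. For a protein whose role in the cell is unknown, interaction data make it possible to predict its function from the functions of the characterized proteins linked to it. Interaction networks also offer a new perspective on the meaning of protein function and a deeper understanding of how cells work.

The system, named ProtExt, aims to extract protein-protein interactions from literature abstracts. It uses the literature database maintained by the National Center for Biotechnology Information (NCBI), processes abstracts automatically to extract interaction information, and provides a web service and graphical tools for constructing protein-protein interaction networks. The current system is built on a machine learning method, the Naïve Bayes algorithm, and efficiently processes the sentences in abstracts to produce candidate protein-protein interactions. In addition, a system based on Link Grammar is proposed that can generate correct candidate interaction networks more effectively. Through the web page, users can obtain interaction information and a graphical predicted network for the proteins they are interested in. The system is valuable both for finding possible protein-protein interactions and for studying specific proteins through literature abstracts.

Abstract (English)

Functional proteomics aims to understand protein-protein and gene-protein interactions. The function of a protein can be characterized more precisely through knowledge of its interactions with other proteins. Moreover, networks of interacting proteins provide a first level of understanding of cellular mechanisms. The cellular functions of uncharacterized proteins are revealed through their linkages to characterized proteins. These networks of linkages offer a new view of the meaning of protein function and a deeper understanding of how cells function. ProtExt is a web-based software package that automatically extracts information on protein-protein interactions from the literature abstracts available through the NCBI Entrez-PubMed system and provides a visualization tool for constructing protein-protein interaction networks. The engine of ProtExt is based on a Naïve Bayes learning method that efficiently processes sentences from abstracts and generates possible protein-protein interactions. In addition, a more accurate extraction system based on link grammar is proposed. Users can specify the proteins they are interested in, and a network of the predicted interacting proteins is generated and visualized on the web page.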
This system will provide a valuable resource for discovering protein-protein interactions and for further study of the related proteins from PubMed abstracts.

Table of contents

Chinese Abstract
Abstract
Chapter 1 Introduction
  1.1 Motivation
  1.2 Related Work
Chapter 2 Materials and Methods
  2.1 Naïve Bayes Based Extraction System
    2.1.1 System Architecture
    2.1.2 Dictionary
      2.1.2.1 The Dictionary of Protein Names
      2.1.2.2 The Dictionary of Functional Keywords
    2.1.3 Extraction Process
    2.1.4 Naïve Bayes Classification
    2.1.5 Training Data Set
    2.1.6 Mutual Information
  2.2 Link Grammar Based Extraction System
    2.2.1 System Architecture
    2.2.2 Sentence Segmentation and Tokenization
    2.2.3 Named Entity Recognition and Conversion
    2.2.4 Simple Filtering
    2.2.5 Parsing and Template Matching
    2.2.6 PETL: ProtExt Template Language
      2.2.6.1 Operators
  2.3 Visualization
    2.3.1 GraphViz
    2.3.2 Java Applet
  2.4 Web Implementation
    2.4.1 Input Query
    2.4.2 Dictionary Checking
    2.4.3 Results Output
    2.4.4 Visualization
      2.4.4.1 Static Visualization
      2.4.4.2 Dynamic Visualization
Chapter 3 Experiments and Discussions
  3.1 Case Study I: Cofilin
  3.2 Case Study II: Multiple Input Proteins
Chapter 4 Conclusions and Future Work
References
Appendix

File size: 590665 bytes
Format: application/pdf
Language: en-US
Keywords: 交互作用 (interaction); 蛋白質 (protein); 擷取系統 (extraction system); 文獻摘要 (literature abstract); extraction system; abstract; protein; interaction
Title: 從文獻摘要擷取蛋白質交互作用之系統 / ProtExt: a Web-based Protein-protein Interaction Extraction System for PubMed Abstracts
Type: thesis
Full text: http://ntur.lib.ntu.edu.tw/bitstream/246246/53605/1/ntu-93-R91922060-1.pdf
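
The abstract states that the ProtExt engine uses a Naïve Bayes learning method to decide which sentences in an abstract describe a possible protein-protein interaction. The record does not give the features, training data, or smoothing used in the thesis, so the following is only a minimal sketch of a generic multinomial Naïve Bayes sentence classifier. The class name, the tokenizer, the `PROTEIN1`/`PROTEIN2` placeholders, the Laplace smoothing, and the toy training sentences are all hypothetical and are not taken from the thesis.

```python
import math
from collections import Counter


def tokenize(sentence):
    """Lowercase whitespace tokens with surrounding punctuation removed."""
    tokens = (w.strip(".,;:()").lower() for w in sentence.split())
    return [t for t in tokens if t]


class NaiveBayesSentenceClassifier:
    """Multinomial Naive Bayes with Laplace smoothing (illustrative only)."""

    def __init__(self, alpha=1.0):
        self.alpha = alpha            # smoothing constant (assumed value)
        self.word_counts = {}         # label -> Counter of word frequencies
        self.sentence_counts = Counter()  # label -> number of training sentences
        self.vocab = set()

    def fit(self, sentences, labels):
        for sentence, label in zip(sentences, labels):
            self.sentence_counts[label] += 1
            counts = self.word_counts.setdefault(label, Counter())
            for word in tokenize(sentence):
                counts[word] += 1
                self.vocab.add(word)
        return self

    def log_scores(self, sentence):
        """Return log P(label) + sum of log P(word | label) for each label."""
        total_sentences = sum(self.sentence_counts.values())
        vocab_size = len(self.vocab)
        words = [w for w in tokenize(sentence) if w in self.vocab]
        scores = {}
        for label, n in self.sentence_counts.items():
            counts = self.word_counts[label]
            total_words = sum(counts.values())
            score = math.log(n / total_sentences)
            for word in words:
                score += math.log(
                    (counts[word] + self.alpha)
                    / (total_words + self.alpha * vocab_size)
                )
            scores[label] = score
        return scores

    def predict(self, sentence):
        scores = self.log_scores(sentence)
        return max(scores, key=scores.get)


# Hypothetical toy training set: True means the sentence reports an interaction.
# Protein names are assumed to have been replaced by placeholders beforehand.
TRAIN = [
    ("PROTEIN1 binds to PROTEIN2 in vitro", True),
    ("PROTEIN1 interacts with PROTEIN2", True),
    ("PROTEIN1 phosphorylates PROTEIN2", True),
    ("PROTEIN1 and PROTEIN2 were purified separately", False),
    ("Expression of PROTEIN1 was measured in PROTEIN2 mutants", False),
    ("PROTEIN1 levels were unrelated to PROTEIN2", False),
]

clf = NaiveBayesSentenceClassifier().fit(
    [sentence for sentence, _ in TRAIN],
    [label for _, label in TRAIN],
)

print(clf.predict("PROTEIN1 binds PROTEIN2"))
print(clf.predict("PROTEIN1 and PROTEIN2 were purified"))
```

Working in log space avoids numerical underflow on long sentences, and words never seen in training are skipped so that a single unknown token does not dominate the score. Any real system would need a far larger labeled corpus than this toy set.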