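Contributor: 翁昭旼 (Jau-Min Wong)
Institution: National Taiwan University, Graduate Institute of Biomedical Engineering (臺灣大學:醫學工程學研究所)
Author: 鄭景旭 (Zheng, Jing-Xu)
Dates: 2010-06-02; 2018-06-29; 2010-06-02; 2018-06-29; 2008
Identifier: U0001-2507200811165500
URI: http://ntur.lib.ntu.edu.tw//handle/246246/184644

Abstract (translated from the Chinese):
With the rapid development of biomedicine and of many analysis methods, using text mining tools to find protein-protein interactions has become increasingly important. Researchers today obtain important information by reading the biomedical literature, but the volume of that literature is growing at a striking pace. Extracting the information manually would consume a great deal of labor and time, so demand for automatically extracting key information from documents has increased.
Using a shallow parser and taking sentence structure into account, we developed an information system that automatically extracts protein-protein interactions from the literature. The way our system matches the syntactic patterns of sentences differs from traditional approaches. We designed an efficient algorithm and, taking sentence semantics into account, formulated a set of rules for extracting protein interaction relations; each relation distinguishes the acting protein from the protein acted upon. The system consists of the following steps: preprocessing of the medical literature, sentence splitting, tokenization, part-of-speech tagging, protein name recognition, tagging of interaction keywords, prepositions, and conjunctions, and extraction of protein-protein interactions. Finally, the system is evaluated on two test sets, the LLL05 challenge and BioCreAtIvE-PPI.

Abstract (English):
With the rapid progress of biomedical science and the proliferation of analysis methods, many researchers now obtain knowledge about protein-protein interactions from PubMed abstracts, but the biomedical literature is enormous and continues to grow at an exponential rate. Demand for automatic extraction of information from text has therefore been increasing, and text mining tools that find knowledge such as protein-protein interactions, which is useful for specific analysis tasks, have become critical. We develop a system that automatically extracts protein-protein interactions from free text using a shallow parser and sentence structure analysis techniques. Our system matches sentences against syntax patterns that typically describe protein-protein interactions. We design an efficient algorithm and develop a set of rules that extract protein-protein interactions from their syntactic roles. Each protein-protein interaction includes an ACTOR (the donor of the action) and an OBJECT (the receiver of the action). The system comprises the following steps: preprocessing, sentence splitting, tokenization, part-of-speech tagging, protein name recognition, tagging of interaction keywords, prepositions, and conjunctions, and extraction of protein-protein interactions.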
Finally, we evaluate our system on two test sets, one derived from the LLL05 challenge and the other from BioCreAtIvE-PPI.

Table of contents:
Oral Examination Committee Certification
Acknowledgments
Abstracts (Chinese and English)
Contents
Lists of Figures and Tables
Chapter 1 Introduction
1.1 Research Background and Motivation
1.2 Research Objectives
1.3 Thesis Organization
Chapter 2 Related Literature
2.1 Research on Pattern Matching
2.2 Protein Interaction Methodology
2.3 Natural Language Processing
Chapter 3 Materials and Methods
3.1 Materials
3.2 System Design
3.3 System Architecture and Workflow
3.4 Extraction of Protein-Protein Interactions
Chapter 4 Visualization of Protein Interactions
Chapter 5 Results and Discussion
5.1 LLL05 Challenge
5.2 BioCreAtIvE-PPI
5.3 Discussion
Chapter 6 Conclusion
6.1 Conclusion
6.2 Limitations
6.3 Future Work
References

Format: application/pdf
Size: 750538 bytes
Language: en-US
Keywords: protein-protein interaction (蛋白質和蛋白質間的交互作用); text mining (文字探勘); syntax patterns (語意型樣); sentence splitting (斷句); tokenization (斷字); part-of-speech (詞類標記); protein names recognition (生醫名詞的辨識)
Title: Extracting Interactions between Proteins from PubMed Abstract (從醫學文獻摘要擷取蛋白質之間的交互作用)
Type: thesis
File: http://ntur.lib.ntu.edu.tw/bitstream/246246/184644/1/ntu-97-R95548053-1.pdf
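
Illustrative sketch (not part of the original record): the pipeline named in the abstract (sentence splitting, tokenization, tagging of protein names and interaction keywords, then rule-based extraction of ACTOR and OBJECT) might look roughly like the Python below. Everything in it is an assumption made for illustration: the protein names `ProtA` to `ProtD`, the keyword list, the dictionary lookup that stands in for the part-of-speech tagger and shallow parser, and the single passive-voice rule. The thesis's actual patterns and rules are not given in this record.

```python
import re

# Hypothetical lexicons; the thesis's real dictionaries are not in this record.
PROTEINS = {"ProtA", "ProtB", "ProtC", "ProtD"}
KEYWORDS = {"activates", "inhibits", "binds", "activated", "inhibited", "bound"}


def split_sentences(text):
    """Naive sentence splitter: break after '.', '!' or '?' followed by whitespace."""
    return [s for s in re.split(r"(?<=[.!?])\s+", text.strip()) if s]


def tokenize(sentence):
    """Split into word tokens (hyphenated words kept whole) and punctuation."""
    return re.findall(r"[A-Za-z0-9]+(?:-[A-Za-z0-9]+)*|[^\sA-Za-z0-9]", sentence)


def tag(tokens):
    """Label each token by dictionary lookup (a stand-in for POS tagging and NER)."""
    tagged = []
    for tok in tokens:
        if tok in PROTEINS:
            label = "PROTEIN"
        elif tok.lower() in KEYWORDS:
            label = "KEYWORD"
        elif tok.lower() == "by":
            label = "BY"
        else:
            label = "O"
        tagged.append((tok, label))
    return tagged


def nearest_protein(tagged_span):
    """Return the first protein token in the given span, or None."""
    for tok, label in tagged_span:
        if label == "PROTEIN":
            return tok
    return None


def extract_interactions(text):
    """Return (ACTOR, keyword, OBJECT) triples found by one simple rule.

    Active voice:  PROTEIN keyword PROTEIN     -> left protein is the ACTOR.
    Passive voice: PROTEIN keyword by PROTEIN  -> right protein is the ACTOR.
    """
    triples = []
    for sentence in split_sentences(text):
        tagged = tag(tokenize(sentence))
        for i, (tok, label) in enumerate(tagged):
            if label != "KEYWORD":
                continue
            left = nearest_protein(reversed(tagged[:i]))
            right = nearest_protein(tagged[i + 1:])
            if left is None or right is None:
                continue
            passive = i + 1 < len(tagged) and tagged[i + 1][1] == "BY"
            actor, obj = (right, left) if passive else (left, right)
            triples.append((actor, tok.lower(), obj))
    return triples


if __name__ == "__main__":
    sample = "ProtA inhibits ProtB. ProtC is activated by ProtD."
    for triple in extract_interactions(sample):
        print(triple)
```

The passive-voice check is the only place where the sketch uses a preposition to decide which protein is the ACTOR; a fuller rule set along the lines the abstract describes would also consult conjunctions and phrase boundaries from a shallow parser.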