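Development of an Information Retrieval and Text Mining System for Studying the Relationships among Genes, Methylation, and Diseases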

2006-08-01
2024-05-17
https://scholars.lib.ntu.edu.tw/handle/123456789/695237

Abstract: DNA methylation, a covalent chemical modification at the carbon 5 position of the cytosine ring, is an important epigenetic feature associated with various physiological functions, such as the cell cycle and DNA repair. Aberrant DNA methylation is linked to numerous human diseases, such as cancer. One of the major contributors to the development of human tumors is the silencing of tumor suppressor genes by hypermethylation of the CpG islands in their promoters. Some genes are preferentially methylated relative to others in certain tumor types. Text mining in biology is the process of automatically extracting and combining information about genes, proteins, and their functional associations from text documents. Several information technologies, such as databases, information retrieval (IR), machine learning, and natural language processing (NLP), support text mining research. For instance, IR is used to select the documents relevant to a user's needs, and NLP can be applied to analyze a sentence and determine its structure. In the last few years, the biological literature describing disease candidate genes with putative aberrant methylation has grown exponentially. It is necessary to summarize this literature automatically and to uncover potentially meaningful relations among genes, methylation, and cancers. So far, however, no attempt has been made to analyze and retrieve the available DNA methylation information, especially on disease candidate genes, from this large body of literature.
We will present an information retrieval and text mining system, MeInfoText (MIT), for DNA methylation information in text. The goals of this system are to discover novel relationships between aberrantly methylated genes and diseases, infer new disease candidate genes, fill a gap in the DNA methylation resources already available, and facilitate research on epigenetics. The proposed approach is as follows. First, the documents will be gathered, preprocessed, and indexed. Second, the text will be classified using machine learning techniques, and several methods, such as statistics and syntactic parsing, will be integrated to extract relations among genes, methylation, and diseases. Third, public databases and methylation-related tools will be integrated into the system for cross-referencing and text analysis.

Keywords: tumor suppressor genes, CpG island, 5-methylcytosine, methylation, information retrieval, text mining, index, natural language processing, machine learning
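
As a rough illustration of the indexing and relation extraction steps described in the abstract, the following minimal Python sketch builds an inverted index over a few documents and counts gene and disease names that appear in the same sentence as a methylation term. It is a sketch under stated assumptions, not the MeInfoText implementation: sentence-level co-occurrence stands in for the machine learning classification and syntactic parsing mentioned above, and the dictionaries (`GENES`, `DISEASES`, `METHYLATION_TERMS`), the toy sentences in `DOCS`, and all function names are hypothetical.

```python
import re
from collections import Counter, defaultdict

# Hypothetical dictionaries for illustration only; a real system would load
# curated gene and disease vocabularies.
GENES = {"MLH1", "CDKN2A", "BRCA1"}
DISEASES = {"colorectal cancer", "lung cancer", "breast cancer"}
METHYLATION_TERMS = ("methylation", "methylated", "hypermethylation")

# Toy sentences written for this sketch, not taken from any corpus.
DOCS = {
    "doc1": "MLH1 promoter hypermethylation was observed in colorectal cancer. "
            "BRCA1 expression was measured in breast cancer.",
    "doc2": "CDKN2A is frequently methylated in lung cancer. "
            "MLH1 is methylated in colorectal cancer.",
}


def split_sentences(text):
    """Naive splitter: break after '.', '!' or '?' followed by whitespace."""
    return [s.strip() for s in re.split(r"(?<=[.!?])\s+", text) if s.strip()]


def build_index(docs):
    """Inverted index mapping each lowercase token to the ids of documents containing it."""
    index = defaultdict(set)
    for doc_id, text in docs.items():
        for token in re.findall(r"[a-z0-9]+", text.lower()):
            index[token].add(doc_id)
    return index


def extract_relations(docs):
    """Count (gene, disease) pairs that share a sentence with a methylation term."""
    counts = Counter()
    for text in docs.values():
        for sentence in split_sentences(text):
            lower = sentence.lower()
            # Skip sentences that never mention methylation.
            if not any(term in lower for term in METHYLATION_TERMS):
                continue
            genes = [g for g in GENES if re.search(rf"\b{re.escape(g)}\b", sentence)]
            diseases = [d for d in DISEASES if d in lower]
            for gene in genes:
                for disease in diseases:
                    counts[(gene, disease)] += 1
    return counts


index = build_index(DOCS)
relations = extract_relations(DOCS)
```

Here `index["mlh1"]` lists the documents that mention the gene, and `relations.most_common()` ranks gene and disease pairs by how often they co-occur with a methylation term. Raw co-occurrence counts of this kind are only a starting point; a statistical association measure or a parser-based filter would be needed to separate genuine relations from incidental mentions.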