Contributor: Chen, Chung-Ming (陳中明)
Affiliation: National Taiwan University: Graduate Institute of Biomedical Engineering
Author: Tsai, Kun-Nan (蔡昆男)
Dates: 2010-06-02; 2018-06-29
Year: 2009
Identifier: U0001-2207200917264200
URL: http://ntur.lib.ntu.edu.tw//handle/246246/184652

Abstract:
Identification and prediction of RNA signal expression is an important issue in genome research. This dissertation explores three aspects of RNA signal expression: identification of splice sites in RNA viruses, identification of cryptic 5’ splice site activation, and identification of crucial pathways.

Various methods have been developed for identifying splice sites in species such as human, Drosophila, and Arabidopsis thaliana, but not for RNA viruses. Splice site identification in an RNA virus faces two difficulties that seriously degrade the performance of most conventional splice site predictors. One is the limited number of genome strains available for a virus species; the other is the diversity of sequence patterns around the splice sites caused by the high mutation frequency. Moreover, most splice site prediction methods do not take into account the effect of mutations on splice sites, so they cannot effectively identify cryptic 5’ splice site activation when mutations occur around splice sites. For the identification of crucial pathways, this dissertation focuses on the cytotoxic effect of recombinant Mycobacterium tuberculosis CFP-10/ESAT-6 protein (rCFES) on the crucial pathways of WI-38 cells, which has not previously been studied in depth.

To address these three issues, three new methods are proposed: the genomic splice site prediction (GSSP) algorithm, the gapped-dinucleotide patterns with logarithmic frequency (GDLF) algorithm, and the crucial pathway analysis approach. The GSSP algorithm uses eigen-patterns with a cross-species strategy to identify splice sites in RNA viruses.
The GSSP algorithm was shown to be effective and superior to NNsplice and SplicePredictor in predicting the splice sites of five RNA virus species in the orthomyxovirus family. The sensitivity and specificity achieved by the GSSP algorithm were all higher than 92% for splice sites. Furthermore, the method was successfully applied to splice site prediction for human immunodeficiency virus type 1 (HIV-1).

The GDLF algorithm combines gapped-dinucleotide patterns with logarithmic frequency to identify cryptic 5’ splice site activation when mutations occur around splice sites. Based on the analyzed results, the GDLF algorithm was shown to be more effective than the Ri value and free energy (ΔG) for identifying activated cryptic 5’ splice sites. The specificity achieved by the GDLF algorithm was 83% for cryptic 5’ splice sites when the sensitivity was fixed at 85%. The GDLF algorithm was also successfully applied to the identification of alternative 5’ splice site selection in influenza A virus.

The crucial pathway analysis approach is an integrated analysis combining time-course microarray data and annotated pathway databases, proposed with an emphasis on identifying potentially crucial pathways. These pathways were selected based on a composite criterion characterizing the average significance and topological properties of important genes. The analysis results suggested that the regulatory effect of rCFES was involved at least in cell proliferation, cell motility, cell survival, and metabolism of WI-38 cells. In particular, the survival rate of WI-38 cells decreased significantly, to 62%, at 12.5 μM rCFES. Furthermore, the focal adhesion pathway was identified as the potentially most crucial pathway, and 58 of the 65 important genes in this pathway were down-regulated by rCFES treatment.
Using qRT-PCR, we confirmed the changes in the expression levels of LAMA4, PIK3R3, BIRC3, and NFKBIA, suggesting that these proteins may play an essential role in the cytotoxic process in rCFES-treated WI-38 cells. The analysis results corroborate that the three proposed methods are effective in resolving the three underlying issues of RNA signal expression, respectively.

Table of contents:
Oral examination committee certification
Acknowledgments (Chinese)
Abstract (Chinese)
Abstract
Contents
List of figures
List of tables
Chapter 1 Introduction
  1.1 Genomic project
  1.2 RNA splicing
  1.3 Research status of mutations at splice sites
  1.4 Gene expression and gene regulation
  1.5 Motivation
Chapter 2 Related works
  2.1 Splice site identification
  2.2 Activated cryptic 5’ splice site identification
  2.3 Gene regulation pathway analysis
Chapter 3 Materials and methods
  3.1 GSSP algorithm
    3.1.1 Datasets
    3.1.2 Proposed algorithm
    3.1.3 Sequence binarization
    3.1.4 Consensus sequence
    3.1.5 Sequence pattern mining
  3.2 GDLF algorithm
    3.2.1 Datasets
    3.2.2 Proposed algorithm
  3.3 Crucial pathway analysis approach
    3.3.1 Cell cultures
    3.3.2 Expression and purification of rCFES
    3.3.3 Cell survival assay
    3.3.4 Microarray analysis
    3.3.5 Significance analysis of gene expression
    3.3.6 Pathway topology analysis
    3.3.7 RTQ-PCR analysis
Chapter 4 Results and discussions
  4.1 GSSP algorithm
    4.1.1 Performance analysis
    4.1.2 Results
    4.1.3 Comparisons with existing methods
    4.1.4 Application
  4.2 GDLF algorithm
    4.2.1 Performance analysis
    4.2.2 Results
    4.2.3 Comparisons with existing methods
    4.2.4 Application
  4.3 Crucial pathway analysis approach
    4.3.1 Cytotoxic effect of rCFES on WI-38
    4.3.2 Identification of important genes
    4.3.3 Potentially crucial pathways in rCFES-induced WI-38 cells
    4.3.4 Validation of important genes by RTQ-PCR
    4.3.5 Application
Chapter 5 Conclusions and future works
Reference
Abbreviations
Appendix
  List of test data for GDLF algorithm
  List index of the 500 GDLF scores

Format: application/pdf
Size: 1318919 bytes
Language: en-US
Keywords (translated from Chinese): eigen-pattern; cross-species; orthomyxovirus; cryptic 5’ splice site; dinucleotide pattern; mutation; Mycobacterium tuberculosis; recombinant CFP-10/ESAT-6 protein; crucial pathway; fibroblast
Keywords (English): RNA virus; Eigen-pattern; Cross-species; Orthomyxovirus; Cryptic 5’ splice site; Gapped-dinucleotide pattern; Mutation; tuberculosis; recombinant CFP-10/ESAT-6 protein; crucial pathway; fibroblast
Subject: [SDGs]SDG3
Title (Chinese): 物種的RNA訊息表現之鑑別與預測:以RNA病毒和人類的剪接位選擇與WI-38細胞的重要路徑受肺結核菌之rCFES重組蛋白的毒性影響為例
Title (English): Identification and prediction of RNA signal expression of species: splice site selections of RNA virus and human, and cytotoxic effect of recombinant Mycobacterium tuberculosis CFP-10/ESAT-6 protein on the crucial pathways of WI-38 cells
Type: thesis
File URL: http://ntur.lib.ntu.edu.tw/bitstream/246246/184652/1/ntu-98-D91548013-1.pdf
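
Note on the reported metrics: the abstract quotes a specificity "when the sensitivity was fixed at 85%". A minimal Python sketch of how such a figure can be computed from classifier scores is given below. It is not the thesis's evaluation code. The function name `specificity_at_sensitivity`, the convention that higher scores mean "more likely a true site", and the toy scores are all assumptions made for illustration; only the standard definitions sensitivity = TP / (TP + FN) and specificity = TN / (TN + FP) are relied on.

```python
import math


def specificity_at_sensitivity(pos_scores, neg_scores, target_sensitivity):
    """Pick the highest score threshold whose sensitivity reaches the target.

    Hypothetical helper: assumes higher scores indicate true sites and that
    a site is called positive when score >= threshold.
    Returns (threshold, sensitivity, specificity).
    """
    if not pos_scores or not neg_scores:
        raise ValueError("both score lists must be non-empty")
    if not 0.0 < target_sensitivity <= 1.0:
        raise ValueError("target_sensitivity must be in (0, 1]")

    ranked = sorted(pos_scores, reverse=True)
    # Minimum number of true sites that must be recovered; the small
    # epsilon guards against floating-point noise in the product.
    k = max(1, math.ceil(target_sensitivity * len(ranked) - 1e-9))
    threshold = ranked[k - 1]

    true_pos = sum(1 for s in pos_scores if s >= threshold)
    true_neg = sum(1 for s in neg_scores if s < threshold)
    sensitivity = true_pos / len(pos_scores)   # TP / (TP + FN)
    specificity = true_neg / len(neg_scores)   # TN / (TN + FP)
    return threshold, sensitivity, specificity


if __name__ == "__main__":
    # Toy scores, invented for illustration only.
    true_sites = [4.0, 3.0, 2.0, 1.0]
    decoys = [2.5, 1.5, 0.5, 0.0]
    print(specificity_at_sensitivity(true_sites, decoys, 0.5))
```

Fixing the sensitivity first and then reading off the specificity puts different scoring methods (here, GDLF score, Ri value, and ΔG) on the same operating point, which is why the abstract can compare them with a single number.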