Contributor: 高成炎
Institution: National Taiwan University, Graduate Institute of Computer Science and Information Engineering
Author: 蔡其杭 (Tsai, Chi-Hung)
Record dates: 2007-11-26; 2018-07-05
Year: 2006
Record URL: http://ntur.lib.ntu.edu.tw//handle/246246/54000

Abstract:
The support vector machine (SVM) is widely adopted in machine learning and pattern recognition, and its recent application to bioinformatics is also very promising. In this dissertation, we applied SVM to two important problems in bioinformatics: protein disulfide connectivity prediction and quantitative structure-activity relationship (QSAR) model construction. For disulfide connectivity prediction, we implemented an algorithm that infers pair-wise bonding probabilities with an SVM, and introduced a descriptor derived from the sequential distance between oxidized cysteines (DOC). Analysis of the predictions revealed that accuracy improves when the DOC descriptor is added. Furthermore, we developed a two-level prediction model to integrate local and global protein information. The experimental results showed that this greatly enhances prediction accuracy. These results are compared with those of previous studies, and a prediction web service is also provided on the internet. For QSAR model construction, we developed an approach that builds QSAR models by selecting the hypothetical descriptor pharmacophore (HDP) with the generic evolutionary method (GEM) and correlating the descriptors with activities using SVM. Experimental results on five public datasets indicated that our approach performs comparably to those of previous studies. Additionally, we incorporated k-means and hierarchical clustering methods to cluster compounds into subsets and construct a specific QSAR model for each cluster.
The experimental results showed that compounds with particular structural features were successfully clustered into the same subset, and that prediction accuracy improved when using the specific models built from these clusters.

Table of contents:
Chapter 1 Introduction
  1.1 Background Knowledge
    1.1.1 Support Vector Machine in bioinformatics
    1.1.2 Disulfide Connectivity Prediction
    1.1.3 Quantitative Structure-Activity Relationships (QSAR)
  1.2 Thesis Overview
Chapter 2 Improving Disulfide Connectivity Prediction
  2.1 Introduction
  2.2 Prediction of the Disulfide Connectivity Pattern
    2.2.1 Support Vector Machine
    2.2.2 Data Encoding
    2.2.3 Maximum Weight Matching
    2.2.4 Evaluation Criteria
  2.3 Dataset and Results
    2.3.1 Cross-Validation of SP39
  2.4 PreCys Web Server
  2.5 Discussion and Conclusion
Chapter 3 Two-level Models for Disulfide Connectivity Prediction
  3.1 Pair-wise and Pattern-wise Methods
  3.2 Two-level Framework
    3.2.1 Level-1: Pair-wise
    3.2.2 Level-2: Pattern-wise
    3.2.3 Reduction for Imbalance
  3.3 Results and Discussion
    3.3.1 Dataset Preparation
    3.3.2 Validation with SP39 and SP43
  3.4 Effects of Descriptors
    3.4.1 Pair-wise Relation from Level-1
    3.4.2 CSP implication
    3.4.3 Global Information
    3.4.4 Effect of Candidate Selection
  3.5 Conclusion
Chapter 4 GEMSVM for QSAR Model Construction
  4.1 Introduction
  4.2 Material and Methods
    4.2.1 Screen Features by Mahalanobis Distance
    4.2.2 Feature Selection by Generic Evolutionary Method
    4.2.3 GEMSVM
    4.2.4 GEMPLS
    4.2.5 GEMkNN
    4.2.6 Performance Evaluation
    4.2.7 Dataset Preparation
  4.3 Results and Discussion
    4.3.1 Validation with Artificial Data Set
    4.3.2 Validation with Public Data Sets
  4.4 Conclusions
Chapter 5 Ligand Clustering and Specific QSAR Model
  5.1 Introduction
  5.2 Material and Methods
    5.2.1 Identify Activity-Correlated Features
    5.2.2 Ligand Clustering
    5.2.3 Specific Model Construction and Prediction
    5.2.4 Dataset Preparation
  5.3 Results and Discussion
    5.3.1 PDGFR dataset
  5.4 Conclusions
Chapter 6 Conclusions
  6.1 Summary
  6.2 Future works
Bibliography
Appendix A. List of Publications

File size: 1979287 bytes
Format: application/pdf
Language: en-US
Keywords: support vector machine (支援向量機); disulfide bond (雙硫鍵); disulfide bond prediction (雙硫鍵預測); drug structure-activity regression model (藥物結構活性迴歸模型); SVM; disulfide-bond; disulfide connectivity prediction; QSAR
Title (Chinese): 應用支援向量機解蛋白質雙硫鍵預測及藥物結構活性量化回歸模型建構
Title (English): Applying Support Vector Machines to Protein Disulfide Connectivity Prediction and QSAR Model Construction
Type: thesis
Full text: http://ntur.lib.ntu.edu.tw/bitstream/246246/54000/1/ntu-95-D90922008-1.pdf
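
Illustrative note: the abstract describes inferring pair-wise bonding probabilities for cysteines, and the table of contents lists a maximum weight matching step. The Python sketch below shows, under stated assumptions, how a connectivity pattern could be chosen from such pair-wise scores. It is not the thesis implementation. The cysteine positions and scores are made up, the SVM step is replaced by a hard-coded score table, the matching is found by brute-force enumeration rather than a dedicated matching algorithm, and `doc` assumes DOC is the plain sequence separation between two cysteines (the thesis may define or normalize it differently). All function names are hypothetical.

```python
def perfect_matchings(items):
    """Yield every way to split an even-length list into unordered pairs."""
    if not items:
        yield []
        return
    first, rest = items[0], items[1:]
    for i, partner in enumerate(rest):
        remaining = rest[:i] + rest[i + 1:]
        for tail in perfect_matchings(remaining):
            yield [(first, partner)] + tail


def doc(pos_a, pos_b):
    """Sequence separation between two cysteines (assumed reading of DOC)."""
    return abs(pos_a - pos_b)


def best_connectivity(positions, pair_score):
    """Return the pairing whose summed pair-wise scores are highest."""
    return max(
        perfect_matchings(sorted(positions)),
        key=lambda pattern: sum(pair_score[pair] for pair in pattern),
    )


# Made-up cysteine positions and pair scores, standing in for SVM outputs.
positions = [3, 14, 20, 41]
pair_score = {
    (3, 14): 0.10, (3, 20): 0.70, (3, 41): 0.20,
    (14, 20): 0.15, (14, 41): 0.80, (20, 41): 0.05,
}

pattern = best_connectivity(positions, pair_score)
print(pattern)                            # chosen cysteine pairs
print([doc(a, b) for a, b in pattern])    # DOC value of each chosen pair
```

Brute-force enumeration is only practical for a handful of cysteines, since the number of candidate patterns grows quickly with the number of bonds (15 patterns for three bonds, 105 for four). A polynomial-time maximum weight matching routine would be the usual choice for larger cases.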