Advisor: 賴飛羆
National Taiwan University, Graduate Institute of Biomedical Electronics and Bioinformatics
Author: 陳偉昕 (Chen, Wei-Hsin)
Record dates: 2014-11-26; 2018-07-05
Year: 2013
http://ntur.lib.ntu.edu.tw//handle/246246/261839

Abstract (translated from Chinese): Newborn screening identifies neonatal metabolic diseases early. Blood is collected from each newborn and the sample is analyzed by tandem mass spectrometry, so that the disease can be prevented or treated early. To support this, we developed a newborn screening data processing system at National Taiwan University Hospital that handles sample collection, upload and analysis of test data, treatment, and patient follow-up. In this study, we applied data mining to improve the identification rate of neonatal metabolic diseases. First, we digitized the newborn screening laboratory's paper records from 2002 to July 2007 and compiled the screening data of all newborns into a database. We then applied machine learning to phenylketonuria, hypermethioninemia, and 3-methylcrotonyl-CoA carboxylase deficiency. By trying new feature combinations together with optimal feature selection, we obtained the best model for each disease; these models greatly reduce the number of false-positive cases while still correctly identifying all positive patients. The system can therefore accurately identify the diseases covered by newborn screening and make more efficient use of medical resources.

ABSTRACT: A hospital information system that integrates screening data with their interpretation is routinely requested. However, disease classification accuracy may be low because of the characteristics of the diseases and of the analytes used for classification. The objective of this study is to describe a system that enhances the neonatal screening system of the Newborn Screening Center at National Taiwan University Hospital (NTUH). The system was designed and deployed on a Service-Oriented Architecture framework in the .NET Web Services environment. It provides sample collection, testing, diagnosis, evaluation, treatment, and follow-up services among collaborating hospitals. To improve the accuracy of newborn screening, machine learning and optimal feature selection mechanisms were investigated for screening newborns for inborn errors of metabolism. Machine learning classification was used to predict phenylketonuria (PKU), hypermethioninemia, and 3-methylcrotonyl-CoA carboxylase (3-MCC) deficiency. The classifiers were built from 435,682 newborn samples collected at the Center between 2006 and 2012. These include 229 newborns with values above the diagnostic cutoffs and 1,822 with values above the screening cutoffs but below the diagnostic cutoffs. The feature selection strategy was as follows. First, the original 35 analytes and the manifested features were ranked by F-score.
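The F-score ranking step can be sketched as below. The record does not give the formula, so this assumes the common two-class F-score used for SVM feature selection: the squared distances of the positive-class and negative-class means from the overall mean, divided by the sum of the two class sample variances. The helper names `f_score` and `rank_features`, the feature names, and the toy values are hypothetical illustrations, not thesis data.

```python
from statistics import mean, variance

def f_score(pos, neg):
    """Two-class F-score of a single feature (assumed definition).

    pos / neg: the feature's values in positive / negative samples;
    each list needs at least two values for the sample variance.
    """
    m_all = mean(pos + neg)
    # Between-class term: how far each class mean lies from the overall mean.
    between = (mean(pos) - m_all) ** 2 + (mean(neg) - m_all) ** 2
    # Within-class term: sum of the two unbiased (n - 1) sample variances.
    within = variance(pos) + variance(neg)
    if within == 0:
        # Degenerate case: no spread inside either class.
        return float("inf") if between > 0 else 0.0
    return between / within

def rank_features(rows, labels, names):
    """Return (name, F-score) pairs sorted from most to least discriminative.

    rows: one list of feature values per newborn; labels: 1 = positive, 0 = negative.
    """
    scored = []
    for j, name in enumerate(names):
        pos = [row[j] for row, y in zip(rows, labels) if y == 1]
        neg = [row[j] for row, y in zip(rows, labels) if y == 0]
        scored.append((name, f_score(pos, neg)))
    return sorted(scored, key=lambda pair: pair[1], reverse=True)

# Toy example with made-up values: feat_a separates the classes, feat_b barely does.
rows = [[5.0, 1.0], [6.0, 1.2], [7.0, 0.9], [1.0, 1.1], [2.0, 1.0], [1.5, 0.95]]
labels = [1, 1, 1, 0, 0, 0]
ranked = rank_features(rows, labels, ["feat_a", "feat_b"])
print([name for name, _ in ranked])  # ['feat_a', 'feat_b']
```

Because the F-score scores each feature in isolation, it cannot reveal features that are useful only in combination, which is one reason to follow the ranking with a classifier-based search over feature subsets.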
Next, combinations of the top 20 ranked features were used as input features to Support Vector Machine (SVM) classifiers to obtain optimal feature sets. Finally, the feature sets were evaluated with 5-fold cross-validation, and the optimal models were generated. The datasets collected in 2011 and 2012 were used as the prediction cases. Adopting the resulting models could dramatically reduce the number of suspected cases. The results were also compared with those of other methods.

CONTENTS
Chinese Abstract
ABSTRACT
CONTENTS
LIST OF FIGURES
LIST OF TABLES
Chapter 1 Introduction
  1.1 Newborn Screening Program
  1.2 NTUH Newborn Screening Data
  1.3 Cutoff Method
  1.4 Aim of This Study
Chapter 2 Data Collection
  2.1 Data Collection and Digitization
  2.2 Data Collection and Correction
    2.2.1 Data Transfer Algorithm and Process
    2.2.2 Error Identifications and Corrections
  2.3 Error Statistics
  2.4 Data Statistics
Chapter 3 Materials and Methods for Metabolic Diseases
  3.1 System Architecture
  3.2 Data Preparation
  3.3 Feature Selection Strategies
    3.3.1 Support Vector Machines
    3.3.2 Post-Analytical Tools
    3.3.3 Data Training and Prediction
Chapter 4 Results
  4.1 Newborn Screening Hospital Information System
  4.2 Training Results
    4.2.1 Optimal Feature Sets
  4.3 Prediction Results
Chapter 5 Discussion
  5.1 NTUH Newborn Screening Hospital Information System
  5.2 Proposed Approach
  5.3 Limitations
  5.4 Future Work
Chapter 6 Conclusion
References
Appendix

LIST OF FIGURES
Figure 1. Neonatal screening data digitization process
Figure 2. Data transformation algorithm
Figure 3. Error statistics
Figure 4. Trend of MS/MS analytes
Figure 5. Workflow of newborn screening processes in NTUH
Figure 6. The system architecture of the Web-based newborn screening system
Figure 7. The SVM methodology
Figure 8. Post-analytical tools example for GA-II
Figure 9. Post-analytical tools example for GA-II
Figure 10. Training and prediction strategies
Figure 11. Boxplot of Phe
Figure 12. Feature selection strategies by relevant features
Figure 13. Boxplot of the selected markers of PKU
Figure 14. Boxplot of the selected markers of hypermethioninemia
Figure 15. Boxplot of the selected markers of 3-MCC deficiency
Figure 16. Snapshot of the newborn screening hospital information system
Figure 17. Collaboration and interoperability among NSHIS subsystems

LIST OF TABLES
Table I. Definition of noisy data
Table II. Accumulated species concentrations over the period 2002-2012
Table III. Summary of the disease data
Table IV. Manifested disease features
Table V. Selected markers of three diseases
Table VI. Comparison of the current method vs. the proposed method in 2011
Table VII. Comparison of the current method vs. the proposed method in 2011
Table VIII. Comparison of the current method vs. the proposed method in 2012
Table IX. Comparison of the current method vs. SVM without feature selection, SVM with F-score, the proposed method, and post-analytical tools in 2011

Format: application/pdf, 5,205,390 bytes
Thesis release date: 2017/01/27
Usage rights: authorized for paid licensing (royalties returned to the author)
Keywords: Web services; newborn screening; tandem mass spectrometry; medical information system; neonatal metabolic diseases; data mining
Title: 以網路服務為基礎的新生兒代謝疾病篩檢系統 / A Web-Service-Based Newborn Screening System for Metabolic Diseases
Type: thesis
http://ntur.lib.ntu.edu.tw/bitstream/246246/261839/1/ntu-102-D97945014-1.pdf
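The training step summarized in the English abstract (combinations of the top 20 F-score-ranked features fed to SVM classifiers and compared by 5-fold cross-validation) can be sketched as below. Several assumptions are made because the record does not specify them: "combinations" is read as nested prefixes of the ranking (top 1, top 2, and so on up to top 20); the SVM is a linear one trained with the Pegasos subgradient method in plain Python, swapped in for whatever SVM package the thesis used; plain cross-validated accuracy is the selection criterion; and features are assumed to be already scaled. The helpers `train_linear_svm`, `predict`, `cv_accuracy`, and `select_top_k` and the toy data are hypothetical.

```python
import random

def train_linear_svm(X, y, lam=0.01, epochs=200, seed=0):
    """Linear SVM (hinge loss + L2 penalty) trained with Pegasos updates.

    X: list of feature rows; y: labels in {+1, -1}. A constant 1.0 is appended
    to every row so the last weight acts as a bias (regularized too, for brevity).
    """
    rng = random.Random(seed)
    Xb = [row + [1.0] for row in X]
    w = [0.0] * len(Xb[0])
    t = 0
    for _ in range(epochs):
        order = list(range(len(Xb)))
        rng.shuffle(order)
        for i in order:
            t += 1
            eta = 1.0 / (lam * t)  # Pegasos step size
            margin = y[i] * sum(wj * xj for wj, xj in zip(w, Xb[i]))
            w = [(1.0 - eta * lam) * wj for wj in w]  # step for the L2 term
            if margin < 1.0:  # hinge loss is active: move toward y * x
                w = [wj + eta * y[i] * xj for wj, xj in zip(w, Xb[i])]
    return w

def predict(w, row):
    score = sum(wj * xj for wj, xj in zip(w, row + [1.0]))
    return 1 if score >= 0 else -1

def cv_accuracy(X, y, n_folds=5, seed=0):
    """Fraction of samples classified correctly across n_folds held-out folds."""
    idx = list(range(len(X)))
    random.Random(seed).shuffle(idx)
    folds = [idx[f::n_folds] for f in range(n_folds)]
    correct = 0
    for fold in folds:
        held_out = set(fold)
        train = [i for i in idx if i not in held_out]
        w = train_linear_svm([X[i] for i in train], [y[i] for i in train])
        correct += sum(predict(w, X[i]) == y[i] for i in fold)
    return correct / len(X)

def select_top_k(X, y, ranked_idx, max_k=20, n_folds=5):
    """Try the top-1, top-2, ... prefixes of a feature ranking; keep the best."""
    best_k, best_acc = 0, -1.0
    for k in range(1, min(max_k, len(ranked_idx)) + 1):
        cols = ranked_idx[:k]
        acc = cv_accuracy([[row[c] for c in cols] for row in X], y, n_folds)
        if acc > best_acc:  # strict ">" keeps the smallest set on ties
            best_k, best_acc = k, acc
    return ranked_idx[:best_k], best_acc

# Toy data (made up): column 0 separates the classes; columns 1-2 are noise.
pos0 = [2.0, 2.5, 3.0, 2.2, 2.8, 3.1, 2.4, 2.9, 2.6, 3.3]
neg0 = [-2.0, -2.6, -3.0, -2.3, -2.7, -3.2, -2.1, -2.9, -2.5, -3.4]
noise = [0.3, -0.1, 0.5, -0.4, 0.2, 0.0, -0.3, 0.4, -0.2, 0.1]
X = [[a, n, -n] for a, n in zip(pos0 + neg0, noise + noise)]
y = [1] * 10 + [-1] * 10
# ranked_idx would come from an F-score ranking; here it is given directly.
selected, acc = select_top_k(X, y, ranked_idx=[0, 1, 2])
print(selected, acc)  # [0] 1.0
```

With only a few hundred true positives among hundreds of thousands of newborns, plain accuracy rewards a model that calls everyone negative, so a real screening pipeline would more likely select on sensitivity at an acceptable false-positive rate; accuracy is used here only to keep the sketch short.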