Advisor: 陳正剛 (National Taiwan University, Graduate Institute of Industrial Engineering)
Author: 劉兆文 (Liu, Chao-Wen)
Year: 2004
URI: http://ntur.lib.ntu.edu.tw//handle/246246/51187

Abstract:
The well-known regression trees use variance reduction as the measure to select attributes and split the data set when building a decision tree model. Conventional tree splitting, however, depletes the sample size rapidly: after only a few levels of splitting, the remaining samples are too small to support reliable splitting decisions. To overcome this sample-depletion problem, Sample-Efficient Regression Trees (SERT) were proposed to avoid unnecessary splits. When many interaction effects exist, however, the select-and-split construction of SERT is still not effective at preventing sample-size depletion. In this research, we propose Enhanced Sample-Efficient Regression Trees (ESERT), which extend SERT with attribute combination selection and the MaxF selection criterion. We first show how to apply the MaxF selection criterion to the regression tree's attribute selection and to the stopping of tree construction. With the MaxF selection criterion in place, methodologies for attribute combination selection are introduced, and a complete select-and-split tree construction procedure with model estimation is described. ESERT procedures are developed for both binary and continuous attributes. Using three simulation scenarios, we demonstrate the contributions of the MaxF selection criterion, the sample-efficient method, and attribute combination selection to tree construction. Two real cases, semiconductor bad-tool selection and differentially expressed gene selection, are also used to illustrate and validate the proposed ESERT.

Table of Contents:
Abstract
中文摘要 (Chinese Abstract)
Contents
Contents of Figures
Contents of Tables
Chapter 1 Introduction
Chapter 2 SERT with MaxF Selection Criterion and Attribute Combination Selection for Binary Attributes
  2.1 MaxF Selection Criterion for Regression Trees
  2.2 Attribute Combination Selection
  2.3 Complete Select-and-Split Tree Construction
  2.4 Estimation
Chapter 3 SERT with MaxF Selection Criterion and Attribute Combination Selection for Continuous Attributes
  3.1 MaxF Selection Criterion for Regression Trees
  3.2 Attribute Combination Selection
  3.3 Complete Select-and-Split Tree Construction
  3.4 Estimation
Chapter 4 Validation with Simulation and Real Case Study
  4.1 Validation with Simulation
    4.1.1 Scenario One
    4.1.2 Scenario Two
    4.1.3 Scenario Three
  4.2 Validation with Real Case: Semiconductor Bad Tool Selection
  4.3 Validation with Real Case: Differentially Expressed Gene Selection
Chapter 5 Conclusions
Reference
Appendix: C++ Code for Attribute Combination Selection

Keywords: Regression Trees (迴歸樹)
Title (Chinese): 利用MaxF篩選變數組合之迴歸樹
Title (English): Enhanced Sample-Efficient Regression Trees with MaxF Selection Criterion and Attribute Combination Selection
Type: thesis
Full text: http://ntur.lib.ntu.edu.tw/bitstream/246246/51187/1/ntu-93-R91546016-1.pdf
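The abstract refers to variance reduction as the conventional split-selection measure that standard regression trees use and that SERT/ESERT builds upon. The following is a minimal C++ sketch (C++ being the language of the thesis appendix) of that conventional measure for a single binary split; the function names, data layout, and example values are illustrative assumptions and are not taken from the thesis.

#include <iostream>
#include <vector>

// Population variance of a set of response values.
double variance(const std::vector<double>& y) {
    if (y.empty()) return 0.0;
    double mean = 0.0;
    for (double v : y) mean += v;
    mean /= y.size();
    double var = 0.0;
    for (double v : y) var += (v - mean) * (v - mean);
    return var / y.size();
}

// Variance reduction achieved by splitting a parent node's responses into a
// left and a right child: Var(parent) - [w_L * Var(left) + w_R * Var(right)],
// the conventional regression-tree splitting measure mentioned in the abstract.
double varianceReduction(const std::vector<double>& parent,
                         const std::vector<double>& left,
                         const std::vector<double>& right) {
    double n = static_cast<double>(parent.size());
    if (n == 0.0) return 0.0;
    double wLeft  = left.size()  / n;   // fraction of samples going left
    double wRight = right.size() / n;   // fraction of samples going right
    return variance(parent) - (wLeft * variance(left) + wRight * variance(right));
}

int main() {
    // Hypothetical responses at a node, split on a binary attribute.
    std::vector<double> parent = {1.0, 2.0, 8.0, 9.0};
    std::vector<double> left   = {1.0, 2.0};   // samples with attribute = 0
    std::vector<double> right  = {8.0, 9.0};   // samples with attribute = 1
    std::cout << "variance reduction: "
              << varianceReduction(parent, left, right) << "\n";
    return 0;
}

A conventional tree would evaluate this quantity for every candidate attribute and keep the split with the largest reduction; each split then halves (roughly) the sample available to the children, which is the sample-depletion behavior the thesis addresses with SERT and ESERT.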