2006-08-01
2024-05-17
https://scholars.lib.ntu.edu.tw/handle/123456789/692033

Abstract (translated from the Chinese): Building a sample classification rule from the characteristics of the experimental units can be divided into two steps: (1) screening several variables that clearly discriminate among the groups, and (2) using the selected variables to build the optimal classification rule. Step (1) is commonly called feature selection. When many variables are available as a basis for classification, choosing a few that are sufficient to classify the samples appropriately helps reduce cost, and choosing appropriate variables is also the key to classifying samples accurately (Sima et al., 2005). Variable selection criteria fall into two broad categories: association measures and misclassification-rate measures. Misclassification-rate measures depend on the classification model, and different models give different results. Association measures...

Abstract: The problem of classification is to assign each object to one of the mutually exclusive subgroups of a population based on the object's characteristics. To build a precise classification rule, a two-step procedure is usually performed on the training dataset: (1) selecting a few features that are most informative for decision making; (2) deriving the formula that yields the optimal allocation of objects. Selecting appropriate features is essential for successful classification. Recent feature selection methods consider either the misclassification rate of objects given the information in a set of variables, or the association between the variables and the class label. The former yields inconsistent results across different classifier settings. The latter depends on the choice of association measure. Hsing et al. (2005) have proposed a new measure of association, the coefficient of intrinsic dependence, or CID. The CID captures not only linear but also general association among variables. It was also demonstrated that the CID can rank variables appropriately by their degree of association with the target variable, even when the sample size is small. This research will extend the work of Hsing et al. (2005) by applying the CID to feature selection, followed by the construction of Bayes classifiers and comparisons with conventional methods. An analysis of actual data from a study of breast cancer gene expression is included.

Keywords: coefficient of intrinsic dependence; classification rule; feature selection; microarray; CID; classification

Title: Feasibility Assessment of Building Sample Classification Rules Using the Coefficient of Intrinsic Dependence (利用本質相關係數建立樣本分類法則之可行性評估)