Advisor: 劉力瑜 (Liu, Li-yu)
Affiliation: 臺灣大學:農藝學研究所 (National Taiwan University, Graduate Institute of Agronomy)
Author: 馬梓豪 (Ma, Tzu-hao)
Record dates: 2010-05-05; 2018-07-11
Year: 2008
Identifier: U0001-2307200818411400
URI: http://ntur.lib.ntu.edu.tw//handle/246246/180066

Abstract

Statisticians have proposed many methods for identifying gene signatures in microarray data, with the aim of selecting the genes that are most accurate and most representative. Previous research indicates that feature selection, that is, finding the representative genes, is the key to building a classifier with high accuracy. We therefore propose using the coefficient of intrinsic dependence (CID) to identify gene signatures. The simulation results show that CID has appropriate and stable detection power for differences in location or scale, under different distributional assumptions, and for different kinds of dependence.

CID is also applied to a breast cancer microarray data set. In this analysis, a comparison of four measures shows that the genes selected by subCID, an extension of CID, are more accurate and more powerful in class estimation than the genes selected by four existing statistical methods.

Overall, the results of our study indicate that gene signatures obtained with CID and its variants are accurate and powerful in feature selection, both in detecting dependence and in how the selected genes perform in subsequent classification studies such as class estimation.

TABLE OF CONTENTS

TABLE OF CONTENTS
LIST OF TABLES
LIST OF FIGURES
CHAPTER
I INTRODUCTION
II THE COEFFICIENT OF INTRINSIC DEPENDENCE
  2.1 Introduction
  2.2 Definition of CID
  2.3 Definition of subCID
  2.4 The properties of CID and subCID
  2.5 Hypothesis test of dependence
III COMPARISON OF FEATURE SELECTION STATISTICS
  3.1 Data generation
  3.2 Definition of test statistics
  3.3 Definition of power
  3.4 Parameter setting
  3.5 Simulation results
IV BREAST CANCER DATA ANALYSIS
  4.1 Introduction
  4.2 Description of the data set
  4.3 Feature selection
  4.4 Evaluations
V CONCLUSION
  5.1 Introduction
  5.2 Article review
  5.3 Future study
REFERENCE
APPENDIX

Format: application/pdf
Size: 592597 bytes
Language: en-US
Keywords: coefficient of intrinsic dependence (本質相關係數, CID); microarray (生物晶片); identification (辨識); gene signature (基因標誌); classification (分類法); [SDGs] SDG3
Title: 利用本質相關係數辨識生物晶片資料的基因標誌
Title (English): Identification of the Gene Signatures in Microarray Data by CID
Type: thesis
File: http://ntur.lib.ntu.edu.tw/bitstream/246246/180066/1/ntu-97-R95621201-1.pdf