Advisor: 林守德 (Shou-De Lin)
Author: 顏君釗 (Yen, Chun-Chao)
Institution: 臺灣大學 資訊網路與多媒體研究所 (National Taiwan University, Graduate Institute of Networking and Multimedia)
Year: 2009
Dates: 2010-05-05; 2018-07-05
Record ID: U0001-3107200909424200
URI: http://ntur.lib.ntu.edu.tw//handle/246246/180793
Type: thesis
Format: application/pdf (1008813 bytes)
Language: en-US
Title: 非監督式特徵選擇:最小化特徵的資訊冗餘 (Unsupervised Feature Selection: Minimize Information Redundancy of Features)
Keywords: Unsupervised Feature Selection; Machine Learning; Linear Dependency; Principal Component Analysis
Full text: http://ntur.lib.ntu.edu.tw/bitstream/246246/180793/1/ntu-98-R96944016-1.pdf

Abstract:
In this thesis, we propose an unsupervised feature selection method that removes redundant features from a dataset. The contributions are twofold. First, we propose an eigen-decomposition method that ranks hyperplanes (which describe the relations between features) by how well they capture near-linear dependencies in the data, and then design an efficient Gaussian-elimination procedure that removes, one at a time, the feature best represented by the remaining features. Second, we prove that our method is closely related to removing the features that contribute most to the principal components with the smallest eigenvalues, while additionally accounting for the effect each removal has on the remaining features. We evaluate the method on an artificial dataset with known feature dependencies and on two real-world datasets with different characteristics. The experiments show that on the artificial dataset our method removes almost all of the dependent features without discarding any independent dimension, and on the real-world datasets it outperforms two competitive algorithms.

Table of Contents:
Acknowledgements
摘要
Abstract
Table of Contents
List of Figures
List of Tables
1. Introduction
1.1. Outline of the Proposed Solution
1.2. Thesis Organization
2. Related Works
2.1. Supervised Feature Selection
2.1.1. Goodness Measurement
2.1.1.1. Filter
2.1.1.2. Wrapper
2.1.1.3. Hybrid
2.1.2. Search Strategy
2.2. Unsupervised Feature Selection
2.3. Discarding Redundant Variables
2.3.1. Principal-Component-Analysis-Based
2.3.2. Clustering-Based
3. Methodology
3.1. Redundancy Examination
3.2. Geometrical Interpretation
3.3. Feature Removal
3.4. Connection to PCA-Based Methods
4. Experiment
4.1. Artificial Dataset
4.2. Real-World Dataset
5. Conclusion
6. Reference
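The abstract relates the proposed method to a PCA-based heuristic: repeatedly find the covariance eigenvector with the smallest eigenvalue (a direction along which the data is nearly flat, i.e. a near-linear dependency among features) and drop the feature carrying the largest absolute weight on it. The sketch below illustrates that baseline heuristic only, not the thesis's actual Gaussian-elimination algorithm; the function name and the synthetic dataset are illustrative assumptions.

```python
import numpy as np

def remove_redundant_features(X, k):
    """Illustrative PCA-style redundancy removal (not the thesis's algorithm).

    At each step, take the eigenvector of the sample covariance matrix
    with the smallest eigenvalue -- the direction along which the data
    is nearly flat -- and drop the feature with the largest absolute
    weight on it, i.e. the feature best explained by the others.
    Returns the indices of the k-fewer features that are kept.
    """
    keep = list(range(X.shape[1]))
    for _ in range(k):
        cov = np.cov(X[:, keep], rowvar=False)      # sample covariance
        eigvals, eigvecs = np.linalg.eigh(cov)      # eigenvalues ascending
        smallest = eigvecs[:, 0]                    # near-null direction
        drop = int(np.argmax(np.abs(smallest)))     # most redundant feature
        keep.pop(drop)
    return keep

# Toy data: feature 2 is exactly 2*x0 + x1, feature 3 is independent noise.
rng = np.random.default_rng(0)
a, b = rng.normal(size=(200,)), rng.normal(size=(200,))
X = np.column_stack([a, b, 2 * a + b, rng.normal(size=200)])
print(remove_redundant_features(X, 1))  # → [1, 2, 3]
```

Because of the exact dependency, the smallest eigenvalue is (numerically) zero and its eigenvector is proportional to (2, 1, -1, 0), so the heuristic drops one member of the dependent triple, here feature 0. Per the abstract, the thesis's method refines this idea by also considering how each removal affects the remaining features.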