Applications of Kernel-Partition FLD to Document Classification
Date Issued
2006
Author(s)
Hsu, Chung-Jui
Language
en-US
Abstract
Attribute interpretation is important in classification, but nonlinear classifiers do not provide it. The objective of this research is to develop a methodology that enables two nonlinear classification methods, KFD (Kernel Fisher Discriminant) and KMSE (Kernel Minimum Squared Error), to provide attribute interpretation. The proposed methodology is called Kernel-Partition FLD. KFD and KMSE are both kernel-based classifiers that transform instances from the original attribute space to a feature space. The feature space is efficient for feature extraction and pattern recognition but loses the meanings of the original attributes. To recover attribute interpretation, we partition instances with nonlinear structure into several groups such that each group has its own linear structure. We can then apply FLD (Fisher Linear Discriminant) to each group to provide attribute interpretation. In addition, this study also attempts to provide attribute interpretation for the Kernel-Partition itself. We then apply the methodology to document classification. Classifying a large number of documents described by a great number of terms is a challenge for all learning algorithms and is the focus of this research. A novel approach is needed so that a text dataset can be classified better through nonlinear classification while also revealing which terms (attributes) are more important for classifying certain types of documents. Moreover, the high computation cost and the sparsity of document vectors in document classification are also issues addressed in this research. Thus, a dimension-reduction methodology is developed to effectively diminish the computation requirement and reduce the dimensions without losing much information. With Kernel-Partition FLD, the attribute interpretation can be further developed into classification rules for kernel-based classification approaches.
In this research, the proposed methodologies are shown, through simulated cases and real-world text datasets, to successfully combine the advantages of both linear and nonlinear classifiers.
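The partition-then-FLD idea described in the abstract can be illustrated with a minimal sketch. This is not the thesis's implementation: the abstract does not specify how the kernel-based partition is computed, so plain k-means in the input space is used here as a stand-in, and the toy data and all function names (`make_toy_data`, `kmeans_partition`, `fld_weights`, `partition_then_fld`) are hypothetical. The sketch only shows how a dataset that is not linearly separable as a whole can be split into groups, each of which yields an FLD weight vector that can be read attribute by attribute.

```python
import numpy as np

def make_toy_data(n=100, seed=0):
    """Hypothetical XOR-like data: two clusters whose class rule flips sign."""
    rng = np.random.default_rng(seed)
    x0 = np.concatenate([rng.normal(-5, 0.5, n), rng.normal(5, 0.5, n)])
    x1 = rng.uniform(-1, 1, 2 * n)
    # Left cluster: class 1 when x1 > 0; right cluster: class 1 when x1 < 0.
    y = np.where(x0 < 0, x1 > 0, x1 < 0).astype(int)
    return np.column_stack([x0, x1]), y

def kmeans_partition(X, k, n_iter=50, seed=0):
    """Plain k-means; an assumed stand-in for the kernel-based partition."""
    rng = np.random.default_rng(seed)
    centers = X[rng.choice(len(X), size=k, replace=False)]
    for _ in range(n_iter):
        # Squared distance from every instance to every center.
        d = ((X[:, None, :] - centers[None, :, :]) ** 2).sum(axis=2)
        labels = d.argmin(axis=1)
        for j in range(k):
            if np.any(labels == j):
                centers[j] = X[labels == j].mean(axis=0)
    return labels

def fld_weights(X, y, ridge=1e-6):
    """Two-class Fisher Linear Discriminant direction w = Sw^-1 (m1 - m0)."""
    X0, X1 = X[y == 0], X[y == 1]
    m0, m1 = X0.mean(axis=0), X1.mean(axis=0)
    # Within-class scatter, plus a small ridge term to keep it invertible.
    Sw = (X0 - m0).T @ (X0 - m0) + (X1 - m1).T @ (X1 - m1)
    Sw += ridge * np.eye(X.shape[1])
    return np.linalg.solve(Sw, m1 - m0)

def partition_then_fld(X, y, k):
    """Partition the instances, then fit one FLD per group with both classes."""
    groups = kmeans_partition(X, k)
    weights = {}
    for g in range(k):
        mask = groups == g
        if len(np.unique(y[mask])) == 2:
            weights[g] = fld_weights(X[mask], y[mask])
    return weights

X, y = make_toy_data()
for g, w in partition_then_fld(X, y, k=2).items():
    print(f"group {g}: FLD weights per attribute = {w}")
```

In this toy setting, the weight on the second attribute dominates within each group and has opposite signs in the two groups, which is the kind of per-group attribute reading that a single global linear discriminant cannot give.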
Subjects
Document Classification
Attribute Interpretation
Kernel-Partition FLD
Kernel Fisher Discriminant
Kernel Minimum Squared Error
Type
thesis
File(s)
Name
ntu-95-R93546026-1.pdf
Size
23.53 KB
Format
Adobe PDF
Checksum
(MD5):966034e35f8e66c8246e30e8f846dcde