Support Vector Machines: Classification with Coding and Regression for Gene Selection
Date Issued
2008
Author(s)
Chen, Pei-Chun
Abstract
This thesis has two major themes: multiclass support vector machines and support vector regression for gene selection. In the first part, we propose a regression approach to multiclass support vector classification. We bring several existing coding schemes into support vector classification by coding the class labels as multivariate responses. Regressing these multivariate responses on kernelized input data extracts a low-dimensional feature subspace for discrimination. We unify the coding schemes by showing that they are equivalent, in the sense that they lead to the same low-dimensional discriminant feature subspace. Classification is then carried out in this subspace using a linear discriminant algorithm, which can be any reasonable choice. Combining the regression approach for extracting the low-dimensional discriminant subspace with a user-specified linear algorithm yields a simple yet powerful toolkit for multiclass support vector classification. Issues of encoding and decoding and notions of equivalence of codes are discussed. Experimental results, including prediction ability and CPU time, show that our approach is a competitive alternative for the multiclass support vector machine problem.

In the second part, we propose a support vector regression approach to gene selection and use the selected genes for disease classification. Current gene selection methods based on microarray data give every subject equal weight with respect to the disease of interest. However, tissues collected from different patients can come from different disease stages and may differ in their strength of association with the disease. To reflect this circumstance, our proposed method accounts for subject variation by assigning different weights to subjects. The weights are calculated via support vector regression. Significant genes are then selected based on the cumulative sum of weighted expressions. The proposed gene selection procedure is illustrated and evaluated using acute leukemia and colon cancer data. Its results and performance are compared with those of four other approaches in terms of classification accuracy.
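The gene-scoring step described in the second part can be illustrated with a minimal sketch. This is not the thesis's implementation. The abstract says the subject weights come from support vector regression, but here they are simply supplied as hand-set, hypothetical values. The function names, the toy data, the use of signed weights, and the rule of keeping the k genes with the largest absolute weighted sum are all assumptions made for illustration.

```python
def weighted_gene_scores(expression, weights):
    """Return one score per gene: the sum over subjects of weight * expression.

    expression[i][j] is the expression of gene j in subject i.
    weights[i] is the weight of subject i (assumed given here; the thesis
    computes the weights via support vector regression).
    """
    if len(expression) != len(weights):
        raise ValueError("one weight per subject is required")
    n_genes = len(expression[0])
    scores = [0.0] * n_genes
    for row, w in zip(expression, weights):
        for j, value in enumerate(row):
            scores[j] += w * value  # accumulate the weighted expression
    return scores


def select_genes(expression, weights, k):
    """Return the indices of the k genes with the largest absolute score.

    Ranking by absolute value is an assumption of this sketch.
    """
    scores = weighted_gene_scores(expression, weights)
    ranked = sorted(range(len(scores)), key=lambda j: abs(scores[j]), reverse=True)
    return ranked[:k]


# Hypothetical toy data: 4 subjects (rows) by 3 genes (columns).
expression = [
    [2.0, 0.1, 1.0],
    [1.5, 0.2, 1.1],
    [0.2, 0.1, 0.9],
    [0.1, 0.3, 1.0],
]
# Hypothetical signed weights: the sign marks the class, the magnitude marks
# how strongly a subject is associated with the disease.
weights = [1.0, 0.5, -0.5, -1.0]

scores = weighted_gene_scores(expression, weights)
selected = select_genes(expression, weights, k=2)
```

In this toy example, gene 0 is expressed strongly in the positively weighted subjects and weakly in the negatively weighted ones, so it gets the largest score. Gene 2 is expressed at about the same level in every subject, so its weighted contributions nearly cancel.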
Subjects
coding
gene selection
kernel
linear discriminant subspace
machine learning
microarray data analysis
support vector machine
support vector regression
Type
thesis
File(s)
Name
ntu-97-D93842005-1.pdf
Size
23.32 KB
Format
Adobe PDF
Checksum
(MD5):80957e6372b3373fd4eaa7746c39ff16
