Recursive feature selection with significant variables of support vectors
Journal
Computational and Mathematical Methods in Medicine
Journal Volume
2012
Date Issued
2012
Author(s)
Abstract
The development of DNA microarrays enables researchers to screen thousands of genes simultaneously, and it also helps determine high- and low-expression level genes in normal and diseased tissues. Selecting relevant genes for cancer classification is an important issue. Most gene selection methods use univariate ranking criteria and arbitrarily choose a threshold for selecting genes. However, the parameter setting may not be compatible with the selected classification algorithms. In this paper, we propose a new gene selection method (SVM-t) based on the use of t-statistics embedded in a support vector machine. We compared its performance to that of two similar SVM-based methods: SVM recursive feature elimination (SVMRFE) and recursive support vector machine (RSVM). The three methods were compared through extensive simulation experiments and analyses of two published microarray datasets. In the simulation experiments, we found that the proposed method is more robust than SVMRFE and RSVM in selecting informative genes and is capable of attaining good classification performance when the variations of informative and noninformative genes are different. In the analysis of the two microarray datasets, the proposed method performs better than SVMRFE and RSVM, identifying fewer genes while retaining good prediction accuracy. © 2012 Chen-An Tsai et al.
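The abstract does not spell out how the t-statistics are embedded in the SVM. The sketch below shows one plausible reading of the title ("significant variables of support vectors"), offered as an assumption rather than the authors' algorithm: after a linear SVM has been trained (not shown here), a two-sample t-statistic is computed for each gene using only the support vectors, and the genes with the smallest |t| are eliminated before the SVM would be retrained on the surviving genes. The function names `welch_t` and `svm_t_step`, the use of Welch's unequal-variance t-statistic, and the `drop_fraction` parameter are hypothetical choices for illustration.

```python
import math
from statistics import mean, variance


def welch_t(a, b):
    """Two-sample Welch t-statistic (assumed variant; each group needs >= 2 values)."""
    se = math.sqrt(variance(a) / len(a) + variance(b) / len(b))
    if se == 0.0:
        return 0.0  # hypothetical convention: a constant gene carries no signal
    return (mean(a) - mean(b)) / se


def svm_t_step(X, y, is_sv, drop_fraction=0.5):
    """One elimination step of a hypothetical SVM-t-style procedure.

    X: list of samples, each a list of gene expression values.
    y: class labels in {+1, -1}.
    is_sv: booleans marking which samples are support vectors of an
           already-trained SVM (the SVM solver itself is not implemented here).
    Returns the indices of the genes kept, ordered by decreasing |t|.
    """
    # Restrict the t-test to support vectors, split by class label.
    sv_pos = [x for x, label, sv in zip(X, y, is_sv) if sv and label == 1]
    sv_neg = [x for x, label, sv in zip(X, y, is_sv) if sv and label == -1]
    n_genes = len(X[0])
    abs_t = [abs(welch_t([x[g] for x in sv_pos], [x[g] for x in sv_neg]))
             for g in range(n_genes)]
    # Stable sort: genes with larger |t| first, ties keep the lower index first.
    ranked = sorted(range(n_genes), key=lambda g: -abs_t[g])
    n_keep = max(1, math.ceil(n_genes * (1 - drop_fraction)))
    return ranked[:n_keep]


# Toy data: gene 0 separates the classes strongly, gene 1 weakly, gene 2 not at all.
X = [[5.0, 1.0, 0.0], [5.2, 1.4, 1.0], [4.8, 0.9, 0.5],
     [1.0, 0.8, 0.4], [1.1, 0.5, 0.6], [0.9, 0.6, 0.5]]
y = [1, 1, 1, -1, -1, -1]
print(svm_t_step(X, y, [True] * 6, drop_fraction=0.5))  # [0, 1]
```

In a full recursive procedure, this step would alternate with retraining the SVM on the kept genes, so the support vectors (and hence the t-statistics) are recomputed at every iteration.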
Other Subjects
Data reduction; Genes; Cancer classification; Classification algorithm; Classification performance; Extensive simulations; Microarray data sets; Prediction accuracy; Recursive feature elimination; Significant variables; Support vector machines; acyltransferase; ADP ribosylation factor interacting protein 2; basal cell adhesion molecule; cell adhesion molecule; complement component C4a; complement component C4b; estrogen receptor; hepatocyte nuclear factor 3alpha; integrin; intestinal trefoil factor; membrane protein; sodium channel; thrombospondin; unclassified drug; article; comparative study; genetic analysis; genetic engineering and gene technology; genetic procedures; lung cancer; nucleotide sequence; recursive support vector machine; simulation; support vector machine; support vector machine recursive feature elimination; algorithm; artificial intelligence; automated pattern recognition; biological model; biology; breast tumor; computer simulation; DNA microarray; factual database; gene expression profiling; gene expression regulation; genetic database; genetics; human; lung tumor; methodology; normal distribution; reproducibility; statistical model; support vector machine; Algorithms; Artificial Intelligence; Breast Neoplasms; Computational Biology; Computer Simulation; Databases, Factual; Databases, Genetic; Gene Expression Profiling; Gene Expression Regulation, Neoplastic; Humans; Lung Neoplasms; Models, Genetic; Models, Statistical; Normal Distribution; Oligonucleotide Array Sequence Analysis; Pattern Recognition, Automated; Reproducibility of Results; Support Vector Machines
Type
journal article
