A Robust Re-Rank Approach with Application to Pooling-Based GWA Study Data
Date Issued
2012
Date
2012
Author(s)
Liu, Jia-Rou
Abstract
Researchers increasingly encounter data in which each object has an extremely large number of variables while the available sample size is relatively small. For detecting a difference between two populations in this setting, the widely used two-sample t-test is unreliable because its variance estimates are unstable. Its non-parametric counterpart, the AUC, suffers from tied values and also fails. To improve detection power while retaining robustness, the idea of "rank-over-variable" is better suited to analyzing large-p-small-n datasets. In this study, we propose a robust re-rank approach that overcomes these difficulties and reduces the influence of the enormous number of features in the large-p-small-n situation. In particular, we obtain a rank-based statistic for each feature based on the concept of "rank-over-variable". The techniques of "random subset" and "re-rank" are then applied iteratively to rank the features. Finally, the leading features in the resulting ranking list are selected for further research. To evaluate the performance of the proposed re-rank approach, we conduct several simulation studies based on the GAIN-MDD dataset. Compared with the t-statistic and the AUC, our re-rank approach identifies more of the pre-defined truly relevant SNPs and is robust to the number of pools and to pooling error. Furthermore, we present a real data analysis to explore markers associated with bipolar disorder.
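The abstract does not spell out the statistic or the iteration scheme, so the following is only an illustrative sketch of how "rank-over-variable", "random subset", and "re-rank" could fit together, not the procedure used in the thesis. It assumes that ranks are taken across features within each sample, that a feature's score is the absolute difference in mean within-sample rank between the two groups, and that scores are averaged over random feature subsets. All function names and parameters (`ranks_within_sample`, `subset_scores`, `rerank`, `subset_size`, `n_iter`) are hypothetical.

```python
import random


def ranks_within_sample(values):
    """Return 1-based ranks of `values`; tied values share their mean rank."""
    order = sorted(range(len(values)), key=lambda j: values[j])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        # Extend j over the run of values tied with the one at position i.
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        mean_rank = (i + j) / 2 + 1
        for k in range(i, j + 1):
            ranks[order[k]] = mean_rank
        i = j + 1
    return ranks


def subset_scores(X, y, features):
    """Score each feature in `features` (assumed statistic, see lead-in).

    Each sample (row of X) is ranked across the chosen features only, and a
    feature's score is the absolute between-group difference of its mean
    rank, divided by the subset size.
    """
    ranked = [ranks_within_sample([row[f] for f in features]) for row in X]
    n1 = sum(y)
    n0 = len(y) - n1
    scores = {}
    for pos, f in enumerate(features):
        m1 = sum(r[pos] for r, lab in zip(ranked, y) if lab == 1) / n1
        m0 = sum(r[pos] for r, lab in zip(ranked, y) if lab == 0) / n0
        scores[f] = abs(m1 - m0) / len(features)
    return scores


def rerank(X, y, subset_size, n_iter, seed=0):
    """Average subset scores over random feature subsets; return feature
    indices ordered from highest to lowest average score."""
    rng = random.Random(seed)
    p = len(X[0])
    total = [0.0] * p
    count = [0] * p
    for _ in range(n_iter):
        features = rng.sample(range(p), subset_size)
        for f, s in subset_scores(X, y, features).items():
            total[f] += s
            count[f] += 1
    # Features never drawn into a subset get an average score of 0.
    avg = [total[f] / count[f] if count[f] else 0.0 for f in range(p)]
    return sorted(range(p), key=lambda f: avg[f], reverse=True)
```

Here `X` is a list of samples (each a list of feature values) and `y` holds group labels 0 or 1. The leading entries of the list returned by `rerank(X, y, subset_size, n_iter)` would play the role of the "leading features" mentioned in the abstract, under the assumptions stated above.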
Subjects
large-p-small-n
dimension reduction
feature selection
filter method
Type
thesis
File(s)
Name
ntu-101-R99849024-1.pdf
Size
23.32 KB
Format
Adobe PDF
Checksum
(MD5):7a71e646bcbc326973dfe4ff15bd632b
