Robust self-tuning semiparametric PCA for contaminated elliptical distribution
Journal
IEEE Transactions on Signal Processing
Journal Volume
70
Pages
5885
Date Issued
2022-06-08
Author(s)
Abstract
Principal component analysis (PCA) is one of the most popular dimension
reduction methods. The usual PCA is known to be sensitive to the presence of
outliers, and thus many robust PCA methods have been developed. Among them,
Tyler's M-estimator has been shown to be the most robust scatter estimator under the
elliptical distribution. However, when the underlying distribution is
contaminated and deviates from ellipticity, Tyler's M-estimator might not work
well. In this article, we apply the semiparametric theory to propose a robust
semiparametric PCA. The merits of our proposal are twofold. First, it is robust
both to heavy-tailed elliptical distributions and to non-elliptical
outliers. Second, it pairs well with a data-driven tuning procedure, which is
based on the active ratio and can adapt to different degrees of data outlyingness.
Theoretical properties are derived, including the influence functions for
various statistical functionals and asymptotic normality. Simulation studies
and a data analysis demonstrate the superiority of our method.
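
For orientation, the baseline that the abstract compares against, Tyler's M-estimator followed by an eigendecomposition, can be sketched as below. This is only an illustrative sketch of the standard fixed-point iteration for centered data. It is not the semiparametric method or the active-ratio tuning procedure proposed in the article, and the function names `tyler_scatter` and `leading_directions`, the iteration cap, and the tolerance are assumptions made for this example.

```python
import numpy as np


def tyler_scatter(X, n_iter=200, tol=1e-8):
    """Tyler's M-estimator of scatter for centered data X of shape (n, p).

    Iterates sigma <- (p / n) * sum_i x_i x_i^T / (x_i^T sigma^{-1} x_i)
    and rescales to trace p, since the estimator is defined only up to scale.
    Assumes no row of X is exactly zero.
    """
    n, p = X.shape
    sigma = np.eye(p)
    for _ in range(n_iter):
        # d[i] = x_i^T sigma^{-1} x_i for every observation
        d = np.einsum("ij,ij->i", X @ np.linalg.inv(sigma), X)
        new = (p / n) * (X / d[:, None]).T @ X
        new *= p / np.trace(new)  # fix the scale
        converged = np.linalg.norm(new - sigma, ord="fro") < tol
        sigma = new
        if converged:
            break
    return sigma


def leading_directions(scatter, k):
    """Return the k eigenvectors of a symmetric scatter matrix with the largest eigenvalues."""
    vals, vecs = np.linalg.eigh(scatter)
    order = np.argsort(vals)[::-1][:k]
    return vecs[:, order]
```

Calling `leading_directions(tyler_scatter(X), k)` on centered data `X` gives estimated principal directions that depend only on the directions `x_i / ||x_i||`, which is why this baseline is insensitive to heavy tails within the elliptical family.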
Subjects
Active ratio; elliptical distributions; influence function; PCA; robustness; semiparametric theory; Tyler's M-estimator; MULTIVARIATE LOCATION; M-ESTIMATORS; COMPONENT ANALYSIS; OUTLIER DETECTION; R-ESTIMATION; PRINCIPAL; SCATTER; COVARIANCE; SHAPE; REGRESSION; Statistics - Methodology
Publisher
IEEE-INST ELECTRICAL ELECTRONICS ENGINEERS INC
Type
journal article