Combining attention with spectrum to handle missing values on time series data without imputation
Journal
INFORMATION SCIENCES
Journal Volume
609
Pages
1271
Date Issued
2022
Author(s)
Abstract
In the development of predictive models, missing data is a critical issue that traditionally requires a two-step analysis. Data scientists analyze the patterns of missing values, select variables, impute missing values on the basis of domain knowledge, and then train a model. Models typically have their input sizes hardcoded and are limited in handling data with high missing rates or changes in the available variables. We propose an attention-based neural network combined with a novel real number representation, which requires little manual variable selection and in which missing data can be overlooked, making imputation unnecessary. With the proposed model, data analysis becomes a single step, because the initial step of imputing missing values is omitted. The study included data on 32,709 intensive care unit (ICU) admissions and 60 healthcare variables from the Medical Information Mart for Intensive Care (MIMIC)-IV. The proposed algorithm yielded an area under the receiver operating characteristic curve (AUC) of 0.842 (95% CI: 0.828–0.856) when predicting prolonged length of stay in the ICU, outperforming current approaches that use imputation methods. The proposed algorithm can be applied to a range of problems in data science, as it addresses the issue of incomplete data with automatic variable selection.
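The idea that missing data "can be overlooked" by an attention mechanism can be illustrated with a small masking sketch. This is not the authors' model: the abstract does not specify the real number representation or the architecture, so the value-scaled embedding below, the function name `masked_attention_pool`, and the parameters `var_emb` and `query` are all hypothetical stand-ins. The sketch only shows how masked attention can pool a variable-size set of observed measurements without filling in the missing ones.

```python
import numpy as np


def masked_attention_pool(values, observed, var_emb, query):
    """Pool the observed measurements into one vector, skipping missing ones.

    values:   (n_vars,) floats; entries at missing positions are ignored
    observed: (n_vars,) bool; False marks a missing value
    var_emb:  (n_vars, d) one embedding per variable (hypothetical)
    query:    (d,) query vector (hypothetical; learned in a real model)
    """
    if not observed.any():
        raise ValueError("at least one variable must be observed")
    # Replace missing entries (e.g. NaN) with 0 so they cannot propagate.
    safe_values = np.where(observed, values, 0.0)
    # One token per variable: its embedding scaled by the measured value.
    # Assumption: this stands in for the paper's real number representation.
    tokens = var_emb * safe_values[:, None]
    scores = tokens @ query / np.sqrt(var_emb.shape[1])
    # Missing variables get a score of -inf, so their softmax weight is 0.
    scores = np.where(observed, scores, -np.inf)
    weights = np.exp(scores - scores[observed].max())
    weights /= weights.sum()
    return weights @ tokens, weights


# Toy input: four variables, two of them missing (NaN), no imputation.
rng = np.random.default_rng(0)
var_emb = rng.normal(size=(4, 8))
query = rng.normal(size=8)
values = np.array([0.3, np.nan, -1.2, np.nan])
observed = ~np.isnan(values)
pooled, weights = masked_attention_pool(values, observed, var_emb, query)
```

Because the mask removes missing variables from the softmax, the pooled vector depends only on the observed measurements, and the same function accepts any subset of available variables.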
Subjects
Missing value; Incomplete data; Attention neural network; Deep learning; Electronic health record; Imputation; INTENSIVE-CARE-UNIT; LENGTH-OF-STAY; NEURAL-NETWORK; CLASSIFICATION; MORTALITY; SEVERITY
Publisher
ELSEVIER SCIENCE INC
Type
journal article