Missing value imputation on multiple measurements for prediction of liver cancer recurrence: A comparative study
Journal
Frontiers in Artificial Intelligence and Applications
Journal Volume
274
Pages
1930-1939
Date Issued
2015
Author(s)
Abstract
Missing values occur frequently in data analysis, and imputation is one way to handle them. Clinical data often contain multiple measurements, such as laboratory test results, taken at different time points. In this study, we compared three imputation methods and their effects on multiple-measurement data sets with different sampling time periods. Liver cancer data sets were used to classify liver cancer recurrence with two types of classification models, built with support vector machines (SVM) and random forests. The results identify combinations of imputation method and sampling time period that achieve better classification results than the other methods and periods. The leading imputation method with SVM differs significantly (P < 0.001) from mean imputation with SVM, a method frequently applied to data sets with missing values. © 2015 The authors and IOS Press. All rights reserved.
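The abstract names mean imputation as the common baseline but does not specify the other two methods. The following minimal sketch of column-wise mean imputation therefore assumes a layout that the record does not state: one row per patient, one column per measurement at one time point, and `None` for a missing value. The function name `mean_impute` is hypothetical and is not taken from the paper.

```python
from statistics import mean
from typing import List, Optional

# Hypothetical layout: each row is one patient, each column is one
# measurement (e.g. a lab test) at one sampling time point; None = missing.
Row = List[Optional[float]]


def mean_impute(rows: List[Row]) -> List[List[float]]:
    """Replace each missing value with the mean of the observed values in its column."""
    n_cols = len(rows[0])
    col_means = []
    for j in range(n_cols):
        observed = [r[j] for r in rows if r[j] is not None]
        if not observed:
            # No observed value in this column, so no mean can be computed.
            raise ValueError(f"column {j} has no observed values")
        col_means.append(mean(observed))
    # Keep observed values; fill gaps with the column mean.
    return [
        [col_means[j] if r[j] is None else r[j] for j in range(n_cols)]
        for r in rows
    ]


data = [[1.0, None, 3.0], [None, 4.0, 5.0], [2.0, 6.0, None]]
print(mean_impute(data))
# [[1.0, 5.0, 3.0], [1.5, 4.0, 5.0], [2.0, 6.0, 4.0]]
```

Each gap is filled from other patients' values at the same time point, and the patient's own measurements at other time points are ignored. This is one reason mean imputation serves as a simple baseline when comparing methods on multiple-measurement data. The imputed matrix would then be passed to a classifier such as an SVM or a random forest.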
Other Subjects
Autocorrelation; Classification (of information); Decision trees; Diseases; Intelligent control; Intelligent systems; Classification models; Classification results; Comparative studies; imputation; Missing value imputation; Missing values; Multiple measurements; Random forests; Support vector machines
Publisher
IOS Press
Type
conference paper
