Cost-sensitive learning for recurrence prediction of breast cancer
Journal
PACIS 2010 - 14th Pacific Asia Conference on Information Systems
Pages
1218-1228
Date Issued
2010
Author(s)
Abstract
Breast cancer is one of the leading causes of cancer death and accounts for 10.4% of all cancer incidence among women. Predicting breast cancer recurrence remains a challenging research problem. Data mining techniques have recently received considerable attention, especially for constructing prognosis models from survival data. However, existing data mining techniques may not handle censored data effectively: censored instances are often discarded when classification techniques are applied to prognosis. In this paper, we propose a cost-sensitive learning approach that incorporates censored data into prognostic assessment to improve recurrence prediction. The proposed approach employs an outcome inference mechanism to infer the probabilistic outcome of each censored instance, and adopts cost-proportionate rejection sampling and a committee machine strategy to take these instances with probabilistic outcomes into account during classification model learning. We empirically evaluate the proposed approach for breast cancer recurrence prediction, using as performance benchmarks a censored-data-discarding method (i.e., building the recurrence prediction model from uncensored data only) and the Kaplan-Meier method (a common prognosis method). Overall, our evaluation results suggest that the proposed approach outperforms both benchmark techniques in terms of precision, recall, and F1 score.
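The sampling-and-committee idea named in the abstract can be illustrated with a minimal Python sketch. It is not the authors' implementation: the single numeric feature, the toy data, the inferred recurrence probabilities, the threshold base learner, and every function name below are assumptions made for illustration. The sketch assumes each censored instance is expanded into a "recurrence" and a "no recurrence" example weighted by its inferred probability, that each example is kept with probability cost / max cost (cost-proportionate rejection sampling), and that committee members trained on independent samples vote by majority.

```python
import random
from statistics import mean

def expand_censored(x, p_recur):
    # Assumption: a censored instance with inferred recurrence probability
    # p_recur becomes two weighted examples (feature, label, cost).
    return [(x, 1, p_recur), (x, 0, 1.0 - p_recur)]

def rejection_sample(weighted, rng):
    # Cost-proportionate rejection sampling: keep each example with
    # probability cost / max_cost, then drop the cost.
    max_cost = max(c for _, _, c in weighted)
    return [(x, y) for x, y, c in weighted if rng.random() < c / max_cost]

def train_threshold(sample):
    # Hypothetical base learner: cut halfway between the class means,
    # assuming recurrence (label 1) goes with larger feature values.
    pos = [x for x, y in sample if y == 1]
    neg = [x for x, y in sample if y == 0]
    if not pos or not neg:
        label = 1 if pos else 0  # only one class sampled: constant predictor
        return lambda x: label
    cut = (mean(pos) + mean(neg)) / 2.0
    return lambda x: 1 if x >= cut else 0

def train_committee(weighted, n_members=11, seed=0):
    # Each member is trained on its own independent rejection sample.
    rng = random.Random(seed)
    return [train_threshold(rejection_sample(weighted, rng))
            for _ in range(n_members)]

def committee_predict(committee, x):
    # Majority vote over the committee members.
    votes = sum(member(x) for member in committee)
    return 1 if 2 * votes > len(committee) else 0

# Toy data (made up): uncensored instances carry full cost 1.0.
uncensored = [(1.0, 0, 1.0), (2.0, 0, 1.0), (8.0, 1, 1.0), (9.0, 1, 1.0)]
censored = expand_censored(3.0, 0.2) + expand_censored(7.0, 0.9)
committee = train_committee(uncensored + censored)
print(committee_predict(committee, 0.0), committee_predict(committee, 10.0))
```

In this sketch the uncensored instances are always retained because they carry the maximum cost, while the two copies of each censored instance enter a given member's training sample only as often as their inferred probabilities allow.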
Subjects
Breast cancer; Cost-sensitive learning; Data mining; Recurrence prediction; Survival analysis
Other Subjects
Breast Cancer; Censored data; Classification models; Classification technique; Committee machines; Cost-sensitive learning; Data mining techniques; Evaluation results; Inference mechanism; Kaplan-Meier method; Prediction capability; Prediction model; Prognosis models; Research problems; Survival analysis; Survival data; Benchmarking; Costs; Data mining; Forecasting; Information systems; Mathematical models; Diseases
Type
conference paper