Explainable Machine Learning Models for Predicting FEV1 in Non-Smoking Taiwanese Men Aged 45-55 Years.

Authors: Chang, Chih-Yueh; Pei, Dee; Kuo, Yen-Liang; LI-NA LEE; Wu, Chung-Ze; Chu, Ta-Wei; Shen, Hsiang-Shi; Huang, Chun-Yen; Liang, Yao-Jen
Record dates: 2026-04-22; 2026-04-22; 2025-12-11
URL: https://scholars.lib.ntu.edu.tw/handle/123456789/737439

Abstract

Background: Traditional regression explains only part of the variation in forced expiratory volume in one second (FEV1). Machine learning (ML) methods may capture nonlinear patterns that lie beyond linear assumptions.

Methods: We analyzed 23,943 non-smoking Taiwanese men aged 45-55 years from the MJ Health Screening Cohort. Random Forest (RF), Stochastic Gradient Boosting (SGB), and XGBoost were compared with multiple linear regression (MLR) using repeated train-test splits. Model performance was evaluated with root mean square error (RMSE), relative absolute error (RAE), root relative squared error (RRSE), and symmetric mean absolute percentage error (SMAPE). Shapley additive explanations (SHAP) were used to interpret variable effects.

Results: ML models achieved slightly lower prediction errors than MLR. The most influential predictors across models were lactate dehydrogenase (LDH), body weight (BW), education level, leukocyte count, total bilirubin, and sport area. SHAP indicated negative effects of LDH and leukocyte count and positive associations for BW, bilirubin, education, and physical activity.

Conclusions: ML approaches provided modest accuracy gains and clearer interpretability compared with MLR. Biochemical and lifestyle factors, including LDH, BW, education, inflammation markers, and physical activity, contribute meaningfully to FEV1 among healthy middle-aged men.

Language: en
Keywords: FEV1; SHAP; health examination cohort; lactate dehydrogenase; lung function; machine learning
Type: journal article
DOI: 10.3390/diagnostics15243152
PMID: 41464153
Scopus EID: 2-s2.0-105025810429
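
The abstract names four error metrics (RMSE, RAE, RRSE, SMAPE) without giving their formulas. The following is a minimal sketch of the commonly used textbook definitions, not necessarily the exact variants the authors computed. SMAPE in particular has several conventions; the one assumed here averages 2|y - ŷ| / (|y| + |ŷ|) and returns a fraction rather than a percentage. The function names and the sample values are hypothetical and chosen only for illustration.

```python
import math

def rmse(y_true, y_pred):
    """Root mean square error."""
    n = len(y_true)
    return math.sqrt(sum((t - p) ** 2 for t, p in zip(y_true, y_pred)) / n)

def rae(y_true, y_pred):
    """Relative absolute error: total absolute error divided by the
    total absolute error of always predicting the mean of y_true."""
    mean_t = sum(y_true) / len(y_true)
    num = sum(abs(t - p) for t, p in zip(y_true, y_pred))
    den = sum(abs(t - mean_t) for t in y_true)
    return num / den  # assumes y_true is not constant

def rrse(y_true, y_pred):
    """Root relative squared error: same idea as RAE, with squared errors
    and a square root taken at the end."""
    mean_t = sum(y_true) / len(y_true)
    num = sum((t - p) ** 2 for t, p in zip(y_true, y_pred))
    den = sum((t - mean_t) ** 2 for t in y_true)
    return math.sqrt(num / den)  # assumes y_true is not constant

def smape(y_true, y_pred):
    """Symmetric mean absolute percentage error, as a fraction in [0, 2].
    Assumes |t| + |p| is never zero, which holds for positive volumes."""
    n = len(y_true)
    return sum(2 * abs(t - p) / (abs(t) + abs(p))
               for t, p in zip(y_true, y_pred)) / n

# Hypothetical observed and predicted FEV1 values in litres (not study data).
y_true = [3.0, 3.5, 2.5, 4.0]
y_pred = [3.2, 3.3, 2.7, 3.8]

for name, fn in [("RMSE", rmse), ("RAE", rae), ("RRSE", rrse), ("SMAPE", smape)]:
    print(f"{name}: {fn(y_true, y_pred):.4f}")
```

RAE and RRSE are scaled against a predict-the-mean baseline, so values below 1 indicate a model that beats that baseline. This scaling makes them convenient for comparing MLR with the tree-based models on the same repeated splits.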