Comparison and interpretation of data-driven models for simulating site-specific human-impacted groundwater dynamics in the North China Plain
Journal
Journal of Hydrology
Journal Volume
616
Date Issued
2023-01-01
Author(s)
Jing, Hao
He, Xin
Tian, Yong
Lancia, Michele
Cao, Guoliang
Guo, Zhilin
Zheng, Chunmiao
Abstract
Data-driven models (DDMs) have gained increasing popularity in groundwater hydrology in recent years due to the advancement of machine learning algorithms and the flexibility of easily accessible data. For groundwater purposes, the differences in deep learning (DL) algorithms compared with traditional tree-based machine learning (TB) algorithms have not been fully investigated, and the importance of different input features for groundwater level simulation has rarely been addressed. In this study, we test and validate six DDMs for simulating the groundwater levels of the North China Plain (NCP) at selected boreholes. The NCP is a large alluvial aquifer system (144,000 km²) overexploited by massive water withdrawals since the 1960s. In our simulations, four DDMs were tree-based (random forest, XGBoost, gradient boosting regression, LightGBM), and two were deep learning algorithms (Vanilla-LSTM and encoder-decoder-LSTM). The results showed that deep-learning-based DDMs provided a better correlation to observed data than tree-based models. Additionally, encoder-decoder-LSTM had the best model performance among all DDMs, and it had the ability to generate compelling results (R² = 0.61, RMSE = 0.73 m), although each individual driving factor had a low correlation to the simulation target. GINI coefficient analysis and permutation feature importance analysis were used to determine the ranking of different model driving factors for the interpretable results. The results showed that the factors related to human activities had a much stronger impact on groundwater level variation than other factors. A preprocessing procedure of the driving factors helps produce satisfactory simulations aimed at sustainable water management and aquifer restoration, especially in data-scarce areas.
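The abstract names two evaluation metrics (R² and RMSE) and permutation feature importance as a way to rank driving factors. The following is a minimal Python sketch of those ideas under stated assumptions, not the authors' implementation: the data are synthetic, the names `toy_model`, `pumping`, and `precip` are hypothetical stand-ins, and importance is measured here as the mean increase in RMSE after shuffling one feature column.

```python
import math
import random


def rmse(y_true, y_pred):
    """Root-mean-square error between observed and simulated values."""
    return math.sqrt(sum((t - p) ** 2 for t, p in zip(y_true, y_pred)) / len(y_true))


def r2(y_true, y_pred):
    """Coefficient of determination: 1 - SS_res / SS_tot."""
    mean = sum(y_true) / len(y_true)
    ss_res = sum((t - p) ** 2 for t, p in zip(y_true, y_pred))
    ss_tot = sum((t - mean) ** 2 for t in y_true)
    return 1.0 - ss_res / ss_tot


def permutation_importance(predict, X, y, n_repeats=20, seed=0):
    """Mean increase in RMSE when a single feature column is shuffled."""
    rng = random.Random(seed)
    baseline = rmse(y, [predict(row) for row in X])
    importances = []
    for j in range(len(X[0])):
        increases = []
        for _ in range(n_repeats):
            # Shuffle only column j, breaking its link to the target.
            column = [row[j] for row in X]
            rng.shuffle(column)
            X_perm = [row[:j] + [v] + row[j + 1:] for row, v in zip(X, column)]
            increases.append(rmse(y, [predict(row) for row in X_perm]) - baseline)
        importances.append(sum(increases) / n_repeats)
    return importances


# Synthetic toy data (assumption): feature 0 stands in for a human-activity
# factor such as pumping, feature 1 for a climatic factor such as precipitation.
data_rng = random.Random(42)
X = [[data_rng.uniform(0, 1), data_rng.uniform(0, 1)] for _ in range(200)]
y = [-3.0 * pumping + 0.5 * precip for pumping, precip in X]


def toy_model(row):
    """Hypothetical fitted model; here it simply reproduces the generating rule."""
    return -3.0 * row[0] + 0.5 * row[1]


importances = permutation_importance(toy_model, X, y)
```

In this toy setup the feature with the larger coefficient receives the larger importance, which mirrors the kind of ranking the abstract describes. A real analysis would apply the same procedure to a trained model and held-out observations.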
Subjects
Data-driven models | Groundwater dynamics | LSTM | Machine learning | North China Plain
Type
journal article
