Soil moisture controls surface evaporation, runoff, and the exchange of energy between the land and the atmosphere. During droughts, soil moisture remains persistently low; before heavy rainfall, the initial soil water content directly influences flood formation. However, traditional data sources have inherent limitations: ground observation stations are sparse, satellite remote sensing is susceptible to cloud interference, and numerical modeling is computationally expensive and subject to systematic biases.
A new study published in Advances in Atmospheric Sciences used machine learning to develop a high-precision, 1 km resolution soil moisture dataset for China spanning 2000 to 2025. The dataset enables daily monitoring of soil dryness and wetness across the country, providing critical support for drought early warning, flood forecasting, and agricultural management.
The research team from Nanjing University trained a CatBoost machine learning model on daily data from more than 2,300 automated soil moisture observation stations operated by the China Meteorological Administration (CMA), incorporating feature selection and automated hyperparameter optimization. The final fused dataset (abbreviated as CSMX) outperforms the vast majority of existing products in terms of bias correction. Notably, it significantly mitigates the long-standing "wet bias" in reanalysis data, particularly in southern China.
"Our model significantly reduces soil moisture estimation errors while preserving the temporal evolution characteristics of soil humidity," says Prof. Huiling Yuan, the corresponding author.
The dataset has been made publicly available at the Tibetan Plateau Data Center and can be applied widely.
"This dataset is particularly well-suited for capturing extreme events such as 'rapid transitions between droughts and floods,'" says Yifan Dong, a PhD candidate and the lead author of the study.
Advances in Atmospheric Sciences
China’s 1 km Daily Surface Soil Moisture Fusion Dataset (2000–2025) Based on Explainable Machine Learning
15-Apr-2026