Abstract:
Photovoltaic Groundwater Pumping Systems (PVGWPSs) have experienced growing interest, particularly in two
key regions. In Africa, they offer a means to improve water availability for millions. In northern India, they could
help decarbonize the agricultural sector. However, large-scale deployment must be approached carefully to avoid
risks such as groundwater overextraction or widespread unmet irrigation demand. To support informed
deployment, a large-scale, physics-based, dynamic PVGWPS model is introduced that simulates the pumping
capacities of PVGWPSs. Given the computational intensity of this model, machine learning-based emulators are
explored to replicate its results more efficiently without significant loss in accuracy. The emulator operates in
two stages. First, it predicts whether the motor-pump will stop because the water level drops below the operational threshold. Among the models tested for this first task, the Gradient Boosting Classifier performed best. Second, when no
stoppage is predicted, the emulator estimates the pumping capacity of the PVGWPS. Among the models tested for this second task, the Random Forest Regressor gave the most accurate results. Applied to datasets from Africa and the Indo-Gangetic Basin within India, the emulator achieved high accuracy (R² ≥ 0.99, NRMSE ≤ 5 %) while reducing computation time by a factor of more than 1500. The emulators thus offer high computational speed and sufficient accuracy to open the way to addressing large-scale dispatch problems, such as the optimal positioning and pre-sizing of PVGWPSs at regional, national, or even continental scales, while considering a large number of possible climate scenarios. Coupled with sustainability analyses (not explored in this study), they could serve as powerful upstream decision-support tools for PVGWPS planning, complementing more detailed, site-specific analyses.
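
As an illustration only, the two-stage logic and the two reported metrics can be sketched as follows. This is a minimal sketch under stated assumptions, not the study's implementation: the function names (`emulate`, `toy_stops`, `toy_capacity`), the two-element feature layout, and the toy stand-in models are hypothetical. Returning zero capacity when a stoppage is predicted and normalising the RMSE by the observed range are also assumptions, since the abstract specifies neither.

```python
from typing import Callable, Sequence

Features = Sequence[float]


def emulate(
    features: Features,
    stops: Callable[[Features], bool],
    capacity: Callable[[Features], float],
) -> float:
    """Two-stage emulator: classify stoppage first, regress capacity second."""
    # Stage 1: does the motor-pump stop (water level below threshold)?
    if stops(features):
        # Assumption: a stopped motor-pump is assigned zero pumping capacity.
        return 0.0
    # Stage 2: only reached when no stoppage is predicted.
    return capacity(features)


def r2(y_true: Sequence[float], y_pred: Sequence[float]) -> float:
    """Coefficient of determination, 1 - SS_res / SS_tot."""
    mean = sum(y_true) / len(y_true)
    ss_res = sum((t - p) ** 2 for t, p in zip(y_true, y_pred))
    ss_tot = sum((t - mean) ** 2 for t in y_true)
    return 1.0 - ss_res / ss_tot


def nrmse(y_true: Sequence[float], y_pred: Sequence[float]) -> float:
    """RMSE divided by the observed range (assumed normalisation)."""
    mse = sum((t - p) ** 2 for t, p in zip(y_true, y_pred)) / len(y_true)
    return mse ** 0.5 / (max(y_true) - min(y_true))


# Hypothetical stand-ins for the trained classifier and regressor.
# features = (irradiance, water depth); units and values are illustrative.
def toy_stops(features: Features) -> bool:
    return features[1] > 50.0


def toy_capacity(features: Features) -> float:
    return 0.01 * features[0]


if __name__ == "__main__":
    print(emulate((800.0, 60.0), toy_stops, toy_capacity))
    print(emulate((800.0, 20.0), toy_stops, toy_capacity))
```

In practice the two callables would wrap the fitted classifier and regressor. Gating the regressor on the classifier's output means the second model only needs to be accurate on operating points where the pump actually runs.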