Enforcing Equity in Neural Climate Emulators

William Yik
Harvey Mudd College; Dept. of Earth Sciences, University of Southern California
Corresponding author: [email protected]
Now at Department of Atmospheric Sciences, University of Washington

Sam J. Silva
Dept. of Earth Sciences, University of Southern California; Dept. of Civil and Environmental Engineering, University of Southern California; Dept. of Population and Public Health Sciences, University of Southern California
(June 28, 2024)
Abstract

Neural network emulators have become an invaluable tool for a wide variety of climate and weather prediction tasks. While showing incredibly promising results, these networks do not have an inherent ability to produce equitable predictions. That is, they are not guaranteed to provide a uniform quality of prediction across any particular class or group of people. This potential for inequitable predictions motivates the need for explicit representations of fairness in these neural networks. To that end, we draw on methods for enforcing analytical physical constraints in neural networks to bias networks towards more equitable predictions. We demonstrate the promise of this methodology using the task of climate model emulation. Specifically, we propose a custom loss function which punishes emulators with unequal quality of predictions across any prespecified region or category, here defined using the Human Development Index (HDI). This loss function weighs a standard loss metric such as mean squared error against another metric which captures inequity along the equity category (HDI), allowing us to adjust the priority of each term before training. Importantly, the loss function does not specify a particular definition of equity to bias the neural network towards, opening the door for custom fairness metrics. Our results show that neural climate emulators trained with our loss function provide more equitable predictions and that the equity metric improves with greater weighting in the loss function. We empirically demonstrate that while there is a tradeoff between accuracy and equity when prioritizing the latter during training, an appropriate selection of the equity priority hyperparameter can minimize loss of performance.

Keywords AI for climate · machine learning · equity · fairness

1 Introduction

Modern Earth system models (ESMs) are key in characterizing the Earth’s response to anthropogenic forcing, and their predictions have been widely used to guide global climate policy such as the United Nations Paris Agreement. As such, the impact of the predictions from such models has the potential to reach every human being on Earth. By numerically solving equations which describe our understanding of the climate system, ESMs are able to predict the state of the planet under various future scenarios. The large computational cost of such ESMs [1, 2], however, has motivated recent applications of new machine learning (ML) techniques for climate prediction, in particular deep learning with neural networks [3, 4, 5, 6, 7, 8]. While successive iterations of these neural climate emulators have each pushed the frontier of stable climate predictions using machine learning, the metrics used to assess these models focus only on accuracy or, more rarely, sensitivity to forcing and physical consistency/plausibility [9, 10, 11].

One important aspect of climate predictions which is discussed even less frequently is their fairness or equity. This consideration is of particular importance for emulators given their increasing application across a wide variety of climate prediction tasks including atmospheric forecasting [5, 12, 4], subgrid-scale parameterization [13, 14, 15, 16], precipitation nowcasting [17, 18, 19], equation and knowledge discovery [20, 21, 22], data assimilation [23, 24], downscaling [25, 26, 27], bias correction [28, 29, 30], and more [31]. Such principles also align themselves with the explicit fairness goals of many climate policies as well as recent literature on policy development [32, 33]. Despite this, there has been little work to date discussing equity in neural climate emulator predictions. Nevertheless, the inherent process of training a neural network provides a unique opportunity to address this issue. Since researchers already define advanced loss functions to bias their emulators towards specific goals, such as long-term stability and forecast sharpness [12, 34], the loss function itself could open a path towards more equitable predictions if notions of fairness could be integrated within it.

In this work, we draw on methods for enforcing physical constraints (e.g., conservation of mass and energy) [35] in neural networks via the loss function to bias neural climate emulators of global temperature towards more equitable predictions. Specifically, we use a two-part loss function which weighs a standard error metric such as MSE against a measure of the model’s fairness on a sliding scale. In doing so we can adjust the model’s preference for predictions with equal error throughout different prespecified regions of the globe, defined here using the Human Development Index [36]. Importantly, given the many existing quantitative measures of equity, our method does not prescribe a particular fairness metric and allows for any such metric to be weighed against error in the loss function. This is in contrast to previous methods which focus specifically on spatial biases in neural network predictions [37] or non-spatial tabular data [38]. To demonstrate our method’s capability, we train a neural climate emulator to predict surface air temperature and the diurnal temperature range through the year 2100 using our custom loss function. We show that emulators trained using the loss function behave as desired, providing more equitable predictions and improving the equity metric as it is given greater weighting in the loss function. Additionally, we demonstrate that the sliding scale nature of the loss function can give rise to a tradeoff between accuracy and equity when the latter is weighted heavily during training. We demonstrate that a sufficiently small equity weighting can substantially improve the fairness metric at only a small cost in prediction accuracy.

2 Equitable Loss Functions

The ultimate goal of this work is to bias neural climate emulators towards more equitable predictions by integrating notions of fairness into their objective functions. This is accomplished in a manner similar to that of previous methods for enforcing analytical physical constraints in neural networks [35, 39, 26]. The key idea is to balance a traditional measure of error, in our case mean squared error, against a penalty term which captures some representation of fairness in the neural networks. Specifically, the loss function is formulated as

$$\mathcal{L} = \alpha\mathcal{P} + (1-\alpha)\,\text{MSE},$$

where $\mathcal{P}$ is the equity penalty, MSE is global mean squared error, and $\alpha$ is the equity weighting coefficient. Under this construction, $\mathcal{P}$ may be based on any quantitative fairness metric. Given the lack of a singular definition for fairness, this flexibility is particularly important. We illustrate the effect of our loss function by defining fairness based on prediction accuracy in various predefined regions across the globe. Specifically, we divide the land of Earth into $n$ equally sized regions based on Human Development Index, a composite measure of a country’s development [36, 40]. Then for the term $\mathcal{P}$, we punish the coefficient of variation, or relative standard deviation, between the errors across those $n$ regions, encouraging the emulator to have equal predictive error in each region. While this is the only equity penalty presented in this work, we found that other definitions such as simple standard deviation and performance deviation from the most well predicted region lead to similar conclusions, with modestly worse overall performance.

Figure 1: The 5 HDI regions over which equity is defined in the penalty $\mathcal{P}$.

The particular penalty explored in this work seeks to equalize the loss function for different groups, which is akin to parity-based fairness metrics in statistical applications [41, 42]. More precisely, if $\text{MSE}_i$ is the MSE of the emulator’s prediction in region $i$, then the emulator’s MSE over all land is

$$\text{MSE}_l = \frac{1}{n}\sum_{i=0}^{n-1}\text{MSE}_i.$$

Furthermore, the standard deviation of the regional MSEs is

$$\sigma_R = \sqrt{\frac{\sum_{i=0}^{n-1}(\text{MSE}_i - \text{MSE}_l)^2}{n}}.$$

Finally, our equity penalty for the climate emulator is

$$\mathcal{P} = \frac{\sigma_R}{\text{MSE}_l}.$$
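As a rough illustration of how the three quantities above combine into the loss, the following is a minimal standard-library Python sketch. The function names `equity_penalty` and `equitable_loss` and the example error values are hypothetical and chosen only for illustration. An actual training loop would presumably express the same arithmetic with differentiable operations from a deep learning framework.

```python
import math


def equity_penalty(region_mse):
    """Coefficient of variation of the per-region MSEs.

    Uses the population standard deviation (divide by n), matching the
    definition of sigma_R in the text.
    """
    n = len(region_mse)
    mse_land = sum(region_mse) / n
    sigma_r = math.sqrt(sum((m - mse_land) ** 2 for m in region_mse) / n)
    return sigma_r / mse_land


def equitable_loss(global_mse, region_mse, alpha):
    """Weighted sum alpha * P + (1 - alpha) * MSE."""
    if not 0.0 <= alpha <= 1.0:
        raise ValueError("alpha must lie in [0, 1]")
    return alpha * equity_penalty(region_mse) + (1.0 - alpha) * global_mse


# Hypothetical per-region errors for n = 5 regions (illustration only).
uniform = [0.2, 0.2, 0.2, 0.2, 0.2]
uneven = [0.1, 0.1, 0.2, 0.3, 0.3]

print(equity_penalty(uniform))  # equal errors give a zero penalty
print(equity_penalty(uneven))   # unequal errors give a positive penalty
print(equitable_loss(0.25, uneven, alpha=0.1))
```

Because the penalty is a ratio, it is insensitive to the overall scale of the error: doubling every regional MSE leaves it unchanged, so only the relative spread between regions is penalized.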

We split the land grid cells into quantiles each containing 20% of the cells (i.e., $n = 5$) based on the prespecified equity measure of interest, HDI. The spatial distribution of these 5 HDI regions is shown in Fig. 1.
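The quantile split described above can be sketched as a rank-based labeling of land cells. This is only a sketch under stated assumptions: every land cell counts equally (no area weighting), ties in HDI are broken by cell order, and the helper names `assign_regions` and `per_region_mse` and the HDI values are hypothetical.

```python
def assign_regions(hdi_values, n_regions=5):
    """Label each land cell 0..n_regions-1 by HDI rank (0 = lowest HDI)."""
    n_cells = len(hdi_values)
    # Sort cell indices by HDI; ties keep their original order (stable sort).
    order = sorted(range(n_cells), key=lambda k: hdi_values[k])
    labels = [0] * n_cells
    for rank, cell in enumerate(order):
        # Integer arithmetic places an equal share of cells in each region.
        labels[cell] = rank * n_regions // n_cells
    return labels


def per_region_mse(squared_errors, labels, n_regions=5):
    """Mean squared error within each region (assumes no region is empty)."""
    totals = [0.0] * n_regions
    counts = [0] * n_regions
    for err, lab in zip(squared_errors, labels):
        totals[lab] += err
        counts[lab] += 1
    return [t / c for t, c in zip(totals, counts)]


# Hypothetical HDI values for ten land cells (illustration only).
hdi = [0.9, 0.3, 0.5, 0.7, 0.4, 0.8, 0.35, 0.95, 0.6, 0.55]
labels = assign_regions(hdi)
print(labels)
```

With ten cells and five regions, each region receives exactly two cells. The per-region MSEs returned by `per_region_mse` are the inputs that the equity penalty expects.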

3 Results

To demonstrate our equitable loss function methodology, we train an ensemble of neural networks to predict one of either surface air temperature (TAS) or diurnal temperature range (DTR) for the years 2080-2100 under an intermediate climate forcing scenario [2] using the ClimateBench dataset [3]. We emphasize again, however, that our method is agnostic to the training dataset and chosen definition of quantitative fairness. These neural climate emulators take as input annual means of carbon dioxide (CO2), methane (CH4), sulfur dioxide (SO2), and black carbon (BC) on a 96 latitude $\times$ 144 longitude global grid and predict TAS and DTR at the same spatiotemporal resolution. (See Materials and Methods for further details.)

We generate ensembles of 15 models for each of $\alpha = 0, 0.01, 0.05, 0.1, 0.25, 0.5, 0.75$, and $1$. We then compute the MSE of each trained ensemble member’s prediction, as well as the equity penalty. The results are shown in Fig. 2. Immediately apparent is the nearly monotonic increase in MSE with increasing $\alpha$ and a corresponding nearly monotonic decrease in the equity penalty $\mathcal{P}$. Given the construction of the loss function with equity weighting coefficient $\alpha$, this tradeoff between accuracy and fairness is expected. Interestingly, for low values of $\alpha$, there appear to be large relative gains in the equity penalty in exchange for very little relative loss in predictive power. This is highlighted in Fig. 3, which shows the same plot as Fig. 2 but only for $\alpha \leq 0.25$. For example, compared to $\alpha = 0$, the ensemble trained with $\alpha = 0.1$ sees a 32% improvement in the equity penalty for only an 8% loss in accuracy.
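The ensemble statistics and relative changes quoted above can be computed along the following lines. This is a sketch, not the authors’ analysis code. It assumes the error bars use the sample standard deviation and that “loss in accuracy” means a relative increase in MSE. The function names and all numbers are hypothetical placeholders, not results from this work.

```python
import math
import statistics


def ensemble_summary(values):
    """Mean and standard error of the mean across ensemble members."""
    mean = statistics.fmean(values)
    # Sample standard deviation (n - 1 in the denominator) is an assumption.
    sem = statistics.stdev(values) / math.sqrt(len(values))
    return mean, sem


def relative_change(baseline, value):
    """Fractional change of `value` relative to `baseline`."""
    return (value - baseline) / baseline


# Placeholder ensemble-mean values chosen only to illustrate the arithmetic.
penalty_baseline, penalty_weighted = 0.50, 0.34
mse_baseline, mse_weighted = 1.00, 1.08

print(relative_change(penalty_baseline, penalty_weighted))  # negative: penalty fell
print(relative_change(mse_baseline, mse_weighted))          # positive: MSE rose
print(ensemble_summary([1.0, 2.0, 3.0]))
```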

Figure 2: MSE (black) and $\mathcal{P}$ (blue) for each neural network ensemble trained to predict DTR with varying $\alpha$. Error bars represent standard error of the ensemble mean.
Figure 3: As in Fig. 2, but only for $\alpha \leq 0.25$.
Figure 4: MSE (black) and $\mathcal{P}$ (blue) for each neural network ensemble trained to predict TAS with varying $\alpha$. Error bars represent standard error of the ensemble mean.
Figure 5: As in Fig. 4, but only for $\alpha \leq 0.25$.

Figs. 4 and 5 show a similar tradeoff between performance and equity for the TAS prediction task. It seems, however, that the $\alpha$ value for which large gains in the equity penalty come at low cost in performance is between $\alpha = 0.25$ and $\alpha = 0.5$, as opposed to around $\alpha = 0.1$ for DTR. We do not necessarily expect that this “sweet spot” value for $\alpha$ should be the same for both prediction tasks, as the errors for each will have different scales which can affect both the MSE and $\mathcal{P}$ values in the loss function.

For both DTR and TAS, certain ensembles trained with low $\alpha$ values exhibit interesting behavior where both the accuracy and equity penalty of the ensemble are modestly improved compared to the $\alpha = 0$ ensemble, though the $\alpha$ value where this occurs is not consistent between DTR and TAS. This is especially surprising since the MSE component of the loss function is defined globally while the equity penalty is only defined over land, meaning that the emulator is not simply achieving more equitable predictions by significantly raising error over the ocean. This is illustrated in Fig. 6, which shows the emulator’s MSE over both land and sea, as well as the equity penalty, for $\alpha \leq 0.25$. From this, it is seen that the emulator’s performance over the ocean does not degrade with increasing $\alpha$ and in fact improves for certain values.
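The land and ocean breakdown discussed above amounts to averaging squared errors under a mask. The sketch below illustrates this on flattened grid cells. It assumes an unweighted mean over cells (any latitude or area weighting is omitted), and the name `masked_mse` and all values are hypothetical.

```python
def masked_mse(pred, truth, mask):
    """Mean squared error over the cells where `mask` is True."""
    sq = [(p - t) ** 2 for p, t, m in zip(pred, truth, mask) if m]
    return sum(sq) / len(sq)


# Hypothetical flattened grid of four cells (illustration only).
pred = [1.0, 2.0, 3.0, 4.0]
truth = [1.0, 1.0, 1.0, 1.0]
land = [True, False, True, False]
ocean = [not m for m in land]
everywhere = [True] * len(pred)

print(masked_mse(pred, truth, land))        # error over land cells only
print(masked_mse(pred, truth, ocean))       # error over ocean cells only
print(masked_mse(pred, truth, everywhere))  # global error
```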

Figure 6: MSE (black) and $\mathcal{P}$ (blue) for each neural network ensemble trained to predict DTR with varying $\alpha$. The square and triangular points represent MSE over land and ocean, respectively. Error bars represent standard error of the ensemble mean.

Overall, Figs. 2-5 highlight that it is indeed possible to bias neural climate emulators towards more equitable predictions with a loss function that explicitly encodes a fairness metric. While this comes at the cost of accuracy, an appropriate choice of a small equity weighting coefficient $\alpha$ (here, $\alpha \lessapprox 0.25$) can mitigate losses in performance or even improve global predictions, narrowing the $\alpha$ hyperparameter search space for those wishing to make use of equitable loss functions. This finding for low $\alpha$ values aligns with those of prior work [35] for enforcing analytical physical constraints in neural networks.

Figure 7: DTR MSE in each HDI region for several values of $\alpha$. The black dashed line shows the mean of the colored bars, or simply the MSE over land.
Figure 8: As in Fig. 7, but for TAS.
Figure 9: MSE of the neural network’s DTR predictions averaged over the test data for $\alpha = 0$, $0.05$, $0.1$, and $0.25$.

To further understand the performance of neural climate emulators trained with the equitable loss function, we break down the predictive error at several small $\alpha$ values by the HDI regions (see Figure 1). This is illustrated in Fig. 7, which shows the emulator’s predictive error in each HDI region for five increasing values of $\alpha$ (0, 0.01, 0.05, 0.1, and 0.25) for DTR, and Fig. 8, which shows the same results but for the set of emulators trained to predict TAS. These results highlight how error is redistributed between the HDI regions in order to lower the equity penalty as $\alpha$ increases. Specifically, the ensemble achieves more equitable predictions at higher values of $\alpha$ by lowering the relative deviation of the error in each region from the mean (black dashed line). This is accomplished by simultaneously lowering the error in poorly predicted regions and raising the error in well predicted regions. For both predicted variables, the $\alpha = 0.1$ neural network ensemble achieves similar MSE performance to the $\alpha = 0$ case while also successfully reducing the spread in error between the HDI regions.

By redistributing error throughout the HDI regions as shown in Figs. 7 and 8, neural climate emulators trained with the equitable loss function make the spatial distribution of predictive error more uniform. That is, they make predictions of more nearly equal quality throughout the globe. This is illustrated in Fig. 9, which shows the DTR MSE averaged over the test period (2080-2100) for four increasing values of $\alpha$ (0, 0.05, 0.1, and 0.25). Most obviously for the $\alpha = 0.25$ ensemble compared to $\alpha = 0$, the error in the poorly predicted regions over the Tibetan plateau and southeast Asia decreases while error in other regions of the globe increases, effectively lowering the equity penalty. For the regions where predictions become worse at higher $\alpha$ values, the ensemble appears to distribute error randomly within them. This trend towards a random uniform distribution of error across the globe becomes increasingly apparent with increasing $\alpha$.

4 Discussion and Conclusion

As weather and climate predictions from both pure ML and hybrid physics-ML models continue to improve according to global accuracy metrics, it remains equally important to ensure that their predictions are accurate for everyone, regardless of their location on the globe. By utilizing a novel loss function which balances a traditional error metric with a penalty for inequitable predictions, our method shows that neural climate emulators can be biased towards more fair predictions, regardless of the chosen definition of quantitative equity. Moreover, with proper weighting of the equity penalty term in the loss function, this comes at little cost to global accuracy. Overall, our results highlight that neural climate emulators can achieve fair predictions when trained towards an appropriate goal.

While we focus on one particular definition of equity in this work, another may easily be substituted in this proposed loss function penalty framework. There are interesting implications for the loss landscape when using different definitions of numerical equity. Just as mean squared error is preferred over mean absolute error when optimizing neural networks for certain problems [43, 44], some equity penalties will be more favorable from an optimization standpoint than others. Similarly, our choice to define equity regions based on HDI is not the only possible option. A more spatially coherent choice (e.g., using the continents as regions) is easily achievable using our method, and there are interesting open questions regarding the effects this may have on training and performance. Furthermore, many recent AI weather models tend to be optimized towards multi-objective loss functions, often to achieve accuracy on both short and long lead times. In a similar manner, future work may investigate the effects of training a neural climate emulator towards both equity and another goal, such as obeying a physical constraint or maintaining sharpness in predictions at long lead times.
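To make the idea of substituting a different equity definition concrete, the sketch below treats the penalty as an interchangeable function. The alternatives loosely follow those mentioned earlier in the text (simple standard deviation, and deviation from the best-predicted region). Their exact forms here, in particular `gap_to_best_penalty`, are assumptions for illustration rather than the definitions used in this work, and all names and numbers are hypothetical.

```python
import math


def std_penalty(region_mse):
    """Population standard deviation of the per-region MSEs."""
    mean = sum(region_mse) / len(region_mse)
    return math.sqrt(sum((m - mean) ** 2 for m in region_mse) / len(region_mse))


def cv_penalty(region_mse):
    """Coefficient of variation: standard deviation divided by the mean."""
    return std_penalty(region_mse) / (sum(region_mse) / len(region_mse))


def gap_to_best_penalty(region_mse):
    """Average excess error relative to the best-predicted region.

    One possible reading of "deviation from the most well predicted region".
    """
    best = min(region_mse)
    return sum(m - best for m in region_mse) / len(region_mse)


def equitable_loss(global_mse, region_mse, alpha, penalty=cv_penalty):
    """alpha * P + (1 - alpha) * MSE for any penalty function P."""
    return alpha * penalty(region_mse) + (1.0 - alpha) * global_mse


# Hypothetical per-region errors (illustration only).
errors = [0.1, 0.1, 0.2, 0.3, 0.3]
for fn in (cv_penalty, std_penalty, gap_to_best_penalty):
    print(fn.__name__, equitable_loss(0.25, errors, alpha=0.1, penalty=fn))
```

Note that the penalties differ in scale: the coefficient of variation is dimensionless while the other two carry the units of the MSE, so a given $\alpha$ would not necessarily mean the same thing across definitions.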

5 Materials and Methods

5.1 Data

We train neural networks to emulate the outputs of the Norwegian Earth System Model version 2 (NorESM2) [45] using the ClimateBench dataset [3]. For several shared socioeconomic pathways (SSPs) covering a wide range of future emissions scenarios [2], the ClimateBench dataset provides four key inputs to NorESM2, namely carbon dioxide (CO2), methane (CH4), sulfur dioxide (SO2), and black carbon (BC). These inputs are given as annual means on a 96 latitude $\times$ 144 longitude global grid. We use these inputs to predict the surface air temperature (TAS) and diurnal temperature range (DTR) outputs of NorESM2 at the same spatiotemporal resolution, i.e., as annual means on the same 96 latitude $\times$ 144 longitude global grid.

5.2 Neural Network Model and Training

In this work, we use the best performing model analyzed in [3], a CNN-LSTM. This type of neural network combines a convolutional neural network (CNN) with a long short-term memory (LSTM) network, which respectively capture spatial and temporal relations in the data, making the combined model well suited for climate prediction.

For each of the temperature variables we wish to predict (TAS and DTR) and $\alpha$ value explored ($0, 0.01, 0.05, 0.1, 0.25, 0.5, 0.75$, and $1$), we train an ensemble of 15 CNN-LSTMs on data provided by ClimateBench for a total of $2 \times 8 \times 15 = 240$ neural networks. The training data includes NorESM2’s historical experiment as well as its output for the SSP126, SSP370, and SSP585 experiments of the Scenario Model Intercomparison Project (ScenarioMIP) [2]. We also include the hist-GHG and hist-aer experiments from the Detection and Attribution Model Intercomparison Project (DAMIP) [46] in our training data. The 3 SSP experiments each contain data from 2015-2100 and the 3 historical experiments each contain data from 1850-2014, for a total of 753 training data points. From this training dataset, we reserve the first two years of data from every decade in each experiment as validation data. This is done because climate data is highly autocorrelated in time, so it is best practice to form the validation data from contiguous subsets of the dataset rather than at random. Finally, to evaluate the CNN-LSTM’s performance, we use the SSP245 scenario as a holdout test dataset because it represents intermediate future anthropogenic forcing.
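The counts quoted above, and one way the validation years might be selected, can be checked with a few lines of Python. Interpreting “the first two years of every decade” as calendar years ending in 0 or 1 is an assumption: the text does not say whether decades are counted from the calendar or from the start of each experiment. The helper names are hypothetical.

```python
# Number of trained networks: 2 variables x 8 alpha values x 15 members.
N_VARIABLES, N_ALPHAS, ENSEMBLE_SIZE = 2, 8, 15
n_networks = N_VARIABLES * N_ALPHAS * ENSEMBLE_SIZE

# Annual data points: 3 SSP experiments and 3 historical-type experiments.
ssp_years = range(2015, 2101)   # 2015-2100 inclusive
hist_years = range(1850, 2015)  # 1850-2014 inclusive
n_points = 3 * len(ssp_years) + 3 * len(hist_years)


def is_validation_year(year):
    """Assumed rule: calendar years ending in 0 or 1 (e.g., 2020 and 2021)."""
    return year % 10 in (0, 1)


def split_years(years):
    """Split years into (training, validation) lists under the assumed rule."""
    train = [y for y in years if not is_validation_year(y)]
    val = [y for y in years if is_validation_year(y)]
    return train, val


print(n_networks, n_points)
train, val = split_years(ssp_years)
print(val[:4])
```

Holding out pairs of consecutive years, rather than randomly sampled years, keeps each validation block contiguous in time, in keeping with the autocorrelation concern described above.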

5.3 Data, Materials, and Software Availability

The complete codebase for processing the ClimateBench data, defining the equitable loss function, training the neural networks, and analyzing the results will be made available soon.

Acknowledgements

The authors thank Joseph Hardin for helpful discussions and feedback.

References

  • Collins et al. [2012] Matthew Collins, Richard E Chandler, Peter M Cox, John M Huthnance, Jonathan Rougier, and David B Stephenson. Quantifying future climate change. Nature Climate Change, 2(6):403–409, 2012.
  • O’Neill et al. [2016] Brian C O’Neill, Claudia Tebaldi, Detlef P Van Vuuren, Veronika Eyring, Pierre Friedlingstein, George Hurtt, Reto Knutti, Elmar Kriegler, Jean-Francois Lamarque, Jason Lowe, et al. The Scenario Model Intercomparison Project (ScenarioMIP) for CMIP6. Geoscientific Model Development, 9(9):3461–3482, 2016.
  • Watson-Parris et al. [2022] Duncan Watson-Parris, Yuhan Rao, Dirk Olivié, Øyvind Seland, Peer Nowack, Gustau Camps-Valls, Philip Stier, Shahine Bouabid, Maura Dewey, Emilie Fons, et al. ClimateBench v1.0: A benchmark for data-driven climate projections. Journal of Advances in Modeling Earth Systems, 14(10):e2021MS002954, 2022.
  • Kochkov et al. [2023] Dmitrii Kochkov, Janni Yuval, Ian Langmore, Peter Norgaard, Jamie Smith, Griffin Mooers, James Lottes, Stephan Rasp, Peter Düben, Milan Klöwer, et al. Neural general circulation models. arXiv preprint arXiv:2311.07222, 2023.
  • Watt-Meyer et al. [2023] Oliver Watt-Meyer, Gideon Dresdner, Jeremy McGibbon, Spencer K Clark, Brian Henn, James Duncan, Noah D Brenowitz, Karthik Kashinath, Michael S Pritchard, Boris Bonev, et al. ACE: A fast, skillful learned global atmospheric model for climate prediction. arXiv preprint arXiv:2310.02074, 2023.
  • El-Habil and Abu-Naser [2022] Basel Y El-Habil and Samy S Abu-Naser. Global climate prediction using deep learning. Journal of Theoretical and Applied Information Technology, 100(24):4824–4838, 2022.
  • Bauer et al. [2023] Peter Bauer, Peter Dueben, Matthew Chantry, Francisco Doblas-Reyes, Torsten Hoefler, Amy McGovern, and Bjorn Stevens. Deep learning and a changing economy in weather and climate prediction. Nature Reviews Earth & Environment, 4(8):507–509, 2023.
  • Yik et al. [2023a] William Yik, Sam J Silva, Andrew Geiss, and Duncan Watson-Parris. Exploring randomly wired neural networks for climate model emulation. Artificial Intelligence for the Earth Systems, 2(4):220088, 2023a.
  • Rasp et al. [2020] Stephan Rasp, Peter D Dueben, Sebastian Scher, Jonathan A Weyn, Soukayna Mouatadid, and Nils Thuerey. WeatherBench: A benchmark data set for data-driven weather forecasting. Journal of Advances in Modeling Earth Systems, 12(11):e2020MS002203, 2020.
  • Rasp et al. [2023] Stephan Rasp, Stephan Hoyer, Alexander Merose, Ian Langmore, Peter Battaglia, Tyler Russel, Alvaro Sanchez-Gonzalez, Vivian Yang, Rob Carver, Shreya Agrawal, et al. WeatherBench 2: A benchmark for the next generation of data-driven global weather models. arXiv preprint arXiv:2308.15560, 2023.
  • Yu et al. [2024] Sungduk Yu, Walter Hannah, Liran Peng, Jerry Lin, Mohamed Aziz Bhouri, Ritwik Gupta, Björn Lütjens, Justus C Will, Gunnar Behrens, Julius Busecke, et al. ClimSim: A large multi-scale dataset for hybrid physics-ML climate emulation. Advances in Neural Information Processing Systems, 36, 2024.
  • Pathak et al. [2022] Jaideep Pathak, Shashank Subramanian, Peter Harrington, Sanjeev Raja, Ashesh Chattopadhyay, Morteza Mardani, Thorsten Kurth, David Hall, Zongyi Li, Kamyar Azizzadenesheli, et al. FourCastNet: A global data-driven high-resolution weather model using adaptive Fourier neural operators. arXiv preprint arXiv:2202.11214, 2022.
  • Wang et al. [2022a] Peidong Wang, Janni Yuval, and Paul A O’Gorman. Non-local parameterization of atmospheric subgrid processes with neural networks. Journal of Advances in Modeling Earth Systems, 14(10):e2022MS002984, 2022a.
  • Rasp et al. [2018] Stephan Rasp, Michael S Pritchard, and Pierre Gentine. Deep learning to represent subgrid processes in climate models. Proceedings of the National Academy of Sciences, 115(39):9684–9689, 2018.
  • Beucler et al. [2024] Tom Beucler, Pierre Gentine, Janni Yuval, Ankitesh Gupta, Liran Peng, Jerry Lin, Sungduk Yu, Stephan Rasp, Fiaz Ahmed, Paul A O’Gorman, et al. Climate-invariant machine learning. Science Advances, 10(6):eadj7250, 2024.
  • Zanna and Bolton [2021] Laure Zanna and Thomas Bolton. Deep learning of unresolved turbulent ocean processes in climate models. Deep Learning for the Earth Sciences: A Comprehensive Approach to Remote Sensing, Climate Science, and Geosciences, pages 298–306, 2021.
  • Shi et al. [2017] Xingjian Shi, Zhihan Gao, Leonard Lausen, Hao Wang, Dit-Yan Yeung, Wai-kin Wong, and Wang-chun Woo. Deep learning for precipitation nowcasting: A benchmark and a new model. Advances in neural information processing systems, 30, 2017.
  • Espeholt et al. [2022] Lasse Espeholt, Shreya Agrawal, Casper Sønderby, Manoj Kumar, Jonathan Heek, Carla Bromberg, Cenk Gazen, Rob Carver, Marcin Andrychowicz, Jason Hickey, et al. Deep learning for twelve hour precipitation forecasts. Nature communications, 13(1):1–10, 2022.
  • Li et al. [2022] Weide Li, Xi Gao, Zihan Hao, and Rong Sun. Using deep learning for precipitation forecasting based on spatio-temporal information: a case study. Climate Dynamics, 58(1):443–457, 2022.
  • Zanna and Bolton [2020] Laure Zanna and Thomas Bolton. Data-driven equation discovery of ocean mesoscale closures. Geophysical Research Letters, 47(17):e2020GL088376, 2020.
  • Grundner et al. [2024] Arthur Grundner, Tom Beucler, Pierre Gentine, and Veronika Eyring. Data-driven equation discovery of a cloud cover parameterization. Journal of Advances in Modeling Earth Systems, 16(3):e2023MS003763, 2024.
  • Yik et al. [2023b] William Yik, Maike Sonnewald, Mariana CA Clare, and Redouane Lguensat. Southern ocean dynamics under climate change: New knowledge through physics-guided machine learning. arXiv preprint arXiv:2310.13916, 2023b.
  • Arcucci et al. [2021] Rossella Arcucci, Jiangcheng Zhu, Shuang Hu, and Yi-Ke Guo. Deep data assimilation: integrating deep learning with data assimilation. Applied Sciences, 11(3):1114, 2021.
  • Wang et al. [2022b] Yueya Wang, Xiaoming Shi, Lili Lei, and Jimmy Chi-Hung Fung. Deep learning augmented data assimilation: Reconstructing missing information with convolutional autoencoders. Monthly Weather Review, 150(8):1977–1991, 2022b.
  • Wang and Tian [2022] Fang Wang and Di Tian. On deep learning-based bias correction and downscaling of multiple climate models simulations. Climate dynamics, 59(11):3451–3468, 2022.
  • Harder et al. [2023] Paula Harder, Alex Hernandez-Garcia, Venkatesh Ramesh, Qidong Yang, Prasanna Sattegeri, Daniela Szwarcman, Campbell Watson, and David Rolnick. Hard-constrained deep learning for climate downscaling. Journal of Machine Learning Research, 24(365):1–40, 2023.
  • Geiss et al. [2022] Andrew Geiss, Sam J Silva, and Joseph C Hardin. Downscaling atmospheric chemistry simulations with physically consistent deep learning. Geoscientific Model Development, 15(17):6677–6694, 2022.
  • Han et al. [2021] Lei Han, Mingxuan Chen, Kangkai Chen, Haonan Chen, Yanbiao Zhang, Bing Lu, Linye Song, and Rui Qin. A deep learning method for bias correction of ECMWF 24–240 h forecasts. Advances in Atmospheric Sciences, 38(9):1444–1459, 2021.
  • Hess et al. [2023] Philipp Hess, Stefan Lange, Christof Schötz, and Niklas Boers. Deep learning for bias-correcting CMIP6-class Earth system models. Earth’s Future, 11(10):e2023EF004002, 2023.
  • Kim et al. [2021] Ham Kim, YG Ham, YS Joo, and SW Son. Deep learning for bias correction of MJO prediction. Nature Communications, 12(1):3087, 2021.
  • Lai et al. [2024] Ching-Yao Lai, Pedram Hassanzadeh, Aditi Sheshadri, Maike Sonnewald, Raffaele Ferrari, and Venkatramani Balaji. Machine learning for climate physics and simulations. arXiv preprint arXiv:2404.13227, 2024.
  • Rogna and Vogt [2022] Marco Rogna and Carla J Vogt. Optimal climate policies under fairness preferences. Climatic Change, 174(3-4):25, 2022.
  • Giang et al. [2024] Amanda Giang, Morgan R Edwards, Sarah M Fletcher, Rivkah Gardner-Frolick, Rowenna Gryba, Jean-Denis Mathias, Camille Venier-Cambron, John M Anderies, Emily Berglund, Sanya Carley, et al. Equity and modeling in sustainability science: Examples and opportunities throughout the process. Proceedings of the National Academy of Sciences, 121(13):e2215688121, 2024.
  • Lam et al. [2023] Remi Lam, Alvaro Sanchez-Gonzalez, Matthew Willson, Peter Wirnsberger, Meire Fortunato, Ferran Alet, Suman Ravuri, Timo Ewalds, Zach Eaton-Rosen, Weihua Hu, et al. Learning skillful medium-range global weather forecasting. Science, 382(6677):1416–1421, 2023.
  • Beucler et al. [2021] Tom Beucler, Michael Pritchard, Stephan Rasp, Jordan Ott, Pierre Baldi, and Pierre Gentine. Enforcing analytic constraints in neural networks emulating physical systems. Physical Review Letters, 126(9):098302, 2021.
  • Anand and Sen [1994] Sudhir Anand and Amartya Sen. Human development index: Methodology and measurement. Human Development Report, 1994.
  • He et al. [2023] Erhu He, Yiqun Xie, Licheng Liu, Weiye Chen, Zhenong Jin, and Xiaowei Jia. Physics guided neural networks for time-aware fairness: an application in crop yield prediction. In Proceedings of the AAAI Conference on Artificial Intelligence, volume 37, pages 14223–14231, 2023.
  • Pérez-Suay et al. [2017] Adrián Pérez-Suay, Valero Laparra, Gonzalo Mateo-García, Jordi Muñoz-Marí, Luis Gómez-Chova, and Gustau Camps-Valls. Fair kernel learning. In Joint European Conference on Machine Learning and Knowledge Discovery in Databases, pages 339–355. Springer, 2017.
  • Harder et al. [2022] Paula Harder, Duncan Watson-Parris, Philip Stier, Dominik Strassel, Nicolas R Gauger, and Janis Keuper. Physics-informed learning of aerosol microphysics. Environmental Data Science, 1:e20, 2022.
  • Kummu et al. [2018] Matti Kummu, Maija Taka, and Joseph HA Guillaume. Gridded global datasets for gross domestic product and human development index over 1990–2015. Scientific data, 5(1):1–15, 2018.
  • Agarwal et al. [2019] Alekh Agarwal, Miroslav Dudík, and Zhiwei Steven Wu. Fair regression: Quantitative definitions and reduction-based algorithms. In International Conference on Machine Learning, pages 120–129. PMLR, 2019.
  • Caton and Haas [2020] Simon Caton and Christian Haas. Fairness in machine learning: A survey. ACM Computing Surveys, 2020.
  • Hodson [2022] Timothy O Hodson. Root mean square error (RMSE) or mean absolute error (MAE): When to use them or not. Geoscientific Model Development Discussions, 2022:1–10, 2022.
  • Chai and Draxler [2014] Tianfeng Chai and Roland R Draxler. Root mean square error (RMSE) or mean absolute error (MAE)?–Arguments against avoiding RMSE in the literature. Geoscientific Model Development, 7(3):1247–1250, 2014.
  • Seland et al. [2020] Øyvind Seland, Mats Bentsen, Dirk Olivié, Thomas Toniazzo, Ada Gjermundsen, Lise Seland Graff, Jens Boldingh Debernard, Alok Kumar Gupta, Yan-Chun He, Alf Kirkevåg, et al. Overview of the Norwegian Earth System Model (NorESM2) and key climate response of CMIP6 DECK, historical, and scenario simulations. Geoscientific Model Development, 13(12):6165–6200, 2020.
  • Gillett et al. [2016] Nathan P Gillett, Hideo Shiogama, Bernd Funke, Gabriele Hegerl, Reto Knutti, Katja Matthes, Benjamin D Santer, Daithi Stone, and Claudia Tebaldi. The Detection and Attribution Model Intercomparison Project (DAMIP v1.0) contribution to CMIP6. Geoscientific Model Development, 9(10):3685–3697, 2016.