License: CC BY-NC-SA 4.0
arXiv:2403.08154v1 [cs.LG] 13 Mar 2024

Proceedings of the IISE Annual Conference & Expo 2023
K. Babski-Reeves, B. Eksioglu, D. Hampton, eds.

The Effect of Different Optimization Strategies to Physics-Constrained Deep Learning for Soil Moisture Estimation

Jianxin Xie    Bing Yao    Zheyu Jiang
Abstract

Soil moisture is a key hydrological parameter of significant importance to human society and the environment. Accurate modeling and monitoring of soil moisture in crop fields, especially in the root zone (top 100 cm of soil), is essential for improving agricultural production and crop yield with the help of precision irrigation and farming tools. Realizing the full potential of sensor data depends greatly on advanced analytical and predictive domain-aware models. In this work, we propose a physics-constrained deep learning (P-DL) framework that integrates physics-based principles of water transport with water sensing signals for effective reconstruction of the soil moisture dynamics. We adopt three different optimizers, namely Adam, RMSprop, and gradient descent (GD), to minimize the loss function of P-DL during training. In the illustrative case study, we demonstrate that the Adam optimizer empirically converges better than the other optimization methods in both mini-batch and full-batch training.

Keywords

Richards equation, soil moisture, physics-informed neural network, optimization

1 Introduction

Soil moisture is a vital hydrological state variable that has a significant impact on the global environment and human society. Detailed monitoring and modeling of soil moisture spatiotemporal dynamics are crucial to numerous applications, including freshwater allocation, weather forecasting, and natural disaster prediction (e.g., floods, landslides, and droughts). Soil moisture dynamics are governed by a hydrological model in the form of a partial differential equation (PDE), called the Richardson-Richards equation (RRE) [1]. The RRE combines the Buckingham-Darcy law [2], which describes both saturated and unsaturated water flow in soil, with the continuity equation, which helps to describe the water flux. The RRE captures the nonlinear functional relationship among three important soil moisture variables: the soil volumetric water content $\theta$, the pressure head $\psi$, and the hydraulic conductivity $K$. The relations between $\psi$ and $\theta$ and between $\psi$ and $K$ are characterized by the water retention curve and the hydraulic conductivity function, respectively, which delineate the characteristics of water and solute movement in soils. Many parametric models that combine careful experimental work with perceptive theoretical insights have been developed to describe these two soil hydraulic relations (also known as constitutive relationships). To quantitatively simulate and visualize soil moisture dynamics, scientists have applied mesh-based algorithms, such as the finite element method (FEM), to numerically solve the Richardson-Richards equation. However, mesh-based methods require discretizing both the spatial and temporal domains of the soil moisture evolution process. 
The computational complexity of solving the RRE at each time step is proportional to the number of discretized nodes within the targeted soil field, making detailed modeling computationally expensive. More importantly, real-world sensor measurements cannot be readily assimilated into the FEM numerical procedure, which limits the applicability of mesh-based simulation in practice. It is worth noting that the movement of water in the soil and the interrelationships among the soil moisture variables are already well summarized by the RRE and various parametric constitutive models. To take full advantage of both physics knowledge and empirical sensor observations, Raissi et al. [3] built a physics-informed neural network (PINN) framework that integrates well-established physical laws with deep learning to reduce the model's dependence on training data. The efficacy of PINNs has been verified in numerous physical systems, such as liquid flow simulation, elastodynamic problems, nonlinear structural systems, cardiovascular flow modeling, and cardiac electrodynamics simulation [4]. Researchers have previously applied PINNs to model soil moisture dynamics governed by the RRE. Notably, Bandai et al. [5] embedded the RRE into a PINN to inversely learn the soil moisture dynamics from volumetric water content observations alone, without any prior assumptions on the soil hydraulic functions, and thereby obtained a free-form representation of the constitutive relationships. Here, we propose a physics-constrained deep learning (P-DL) framework based on PINN that accommodates sensor measurements of the directly observed primary variable, i.e., the water pressure head $\psi$, in a three-dimensional (3D) soil geometry. 
The van Genuchten model is used to provide explicit nonlinear relations between the pressure head and the other variables of interest, which also facilitates embedding the physics as a model constraint in the PINN. Moreover, the performance of predictive modeling depends to a great extent on the proper selection of the optimization technique. In this study, we further investigate the effect of three commonly used optimizers, i.e., Adaptive Moment Estimation (Adam), Root Mean Square Propagation (RMSprop), and Gradient Descent (GD), in both mini-batch and full-batch training, on minimizing the P-DL loss function, which is designed not only to match the pressure head sensor measurements but also to respect the RRE with its constitutive relations.

2 Research Methodology

2.1 Richardson-Richards Equation

In this paper, we use the nonlinear RRE to describe the flow of water in 3D unsaturated, homogeneous, rigid soil and ignore the sink term [1]: $\frac{\partial\theta(\psi)}{\partial t}=-\nabla\cdot q$, where $\theta$ is the soil volumetric water content, $\psi$ is the pressure head, $t$ denotes time, and $q$ represents the soil water flux density. This relation is also known as the continuity equation, which expresses the mass balance of the soil water. The Buckingham-Darcy law [2] defines the relationship between $q$ and $\psi$ and describes both saturated and unsaturated water flow in the soil: $q=-K(\psi)\cdot\nabla(\psi+z)$, where $K$ is the hydraulic conductivity. Combining the two preceding relations gives the RRE:

$$\frac{\partial\theta(\psi)}{\partial t}=\nabla\cdot\left(K(\psi)\nabla(\psi+z)\right) \tag{1}$$

It is worth noting that the pressure head $\psi$ is the primal variable, which depends on time $t$ and spatial location $s=[x,y,z]$. Thus, the left-hand side of Eq. (1) can be reformulated as $\frac{\partial\theta}{\partial\psi}\frac{\partial\psi}{\partial t}$ based on the chain rule. The functional relationships $\theta(\psi)$ and $K(\psi)$ are usually referred to as the water retention curve (WRC) and the hydraulic conductivity function (HCF), respectively, and are commonly specified by parametric models. Here, without loss of generality, we choose the commonly used van Genuchten model [6] to characterize the WRC and HCF:

$$\begin{aligned}
\theta(\psi) &= \frac{\theta_{s}-\theta_{r}}{\left[1+(\alpha|\psi|)^{n}\right]^{m}}+\theta_{r} \\
K(\psi) &= K_{s}\frac{\left\{1-(\alpha|\psi|)^{n-1}\left[1+(\alpha|\psi|)^{n}\right]^{-m}\right\}^{2}}{\left[1+(\alpha|\psi|)^{n}\right]^{m/2}}
\end{aligned} \tag{2}$$

where $K_{s}$, $\theta_{s}$, and $\theta_{r}$ are the saturated hydraulic conductivity, saturated volumetric moisture content, and residual moisture content, respectively. The parameters $n$, $m$, and $\alpha$ are curve-fitting soil hydraulic parameters, where $m=1-1/n$. These soil hydraulic parameters determine the soil properties of the field; their values are taken from [7].
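As a minimal sketch, the van Genuchten relations in Eq. (2) and the chain-rule factor $\partial\theta/\partial\psi$ can be written as plain Python functions. The parameter values below are loam-like numbers assumed purely for illustration (the values from [7] are not reproduced here), and the function names are hypothetical.

```python
import math

# Illustrative soil parameters (assumed, NOT the values from [7]).
THETA_S = 0.43   # saturated volumetric water content
THETA_R = 0.078  # residual volumetric water content
ALPHA = 3.6      # curve-fitting parameter, 1/m
N = 1.56         # curve-fitting parameter
K_S = 0.25       # saturated hydraulic conductivity, m/day
M = 1.0 - 1.0 / N


def theta(psi):
    """Water retention curve theta(psi), first relation of Eq. (2)."""
    a = ALPHA * abs(psi)
    return (THETA_S - THETA_R) / (1.0 + a ** N) ** M + THETA_R


def conductivity(psi):
    """Hydraulic conductivity function K(psi), second relation of Eq. (2)."""
    a = ALPHA * abs(psi)
    base = 1.0 + a ** N
    return K_S * (1.0 - a ** (N - 1.0) * base ** (-M)) ** 2 / base ** (M / 2.0)


def dtheta_dpsi(psi):
    """Analytic d(theta)/d(psi) for psi < 0, obtained by differentiating Eq. (2)."""
    a = ALPHA * abs(psi)
    return ((THETA_S - THETA_R) * M * N * ALPHA * a ** (N - 1.0)
            * (1.0 + a ** N) ** (-(M + 1.0)))
```

At $\psi=0$ these functions return $\theta_s$ and $K_s$, and both decrease as the soil dries (more negative $\psi$), which is a quick sanity check on any reimplementation.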

2.2 Physics-constrained deep learning (P-DL)

The soil moisture dynamics characterized by the WRC and HCF depend primarily on accurate modeling of $\psi(s,t)$. We use a feedforward fully-connected deep neural network (DNN) to approximate the nonlinear relationship between the input spatiotemporal coordinates $(s,t)$ and the decision variable, i.e., the pressure head $\psi$. The DNN is trained not only to fit the sensor measurements of pressure head $\psi_{m}$ but also to respect the underlying physics principles (RRE). We model the spatiotemporal soil water pressure head distribution as $[s,t]\xrightarrow{\mathcal{N}\left(s,t;\Theta_{NN}\right)}\hat{\psi}(s,t)$, where $\mathcal{N}\left(s,t;\Theta_{NN}\right)$ denotes the DNN model and $\Theta_{NN}$ stands for the neural network parameters. The DNN is constructed with an input layer composed of the spatiotemporal coordinates $[s,t]$, multiple hidden layers to approximate nonlinear functional relationships, and one output layer for the prediction $\hat{\psi}(s,t,\Theta_{NN})$. The RRE-based physics principles are embedded into the DNN along with the sensor data constraint through a loss function defined as:

$$\mathcal{L}(\Theta_{NN})=\mathcal{L}_{D}+\mathcal{L}_{RRE} \tag{3}$$
Figure 1: Illustration of the proposed P-DL framework for soil moisture prediction.
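The network mapping $[s,t]\rightarrow\hat{\psi}(s,t)$ can be sketched in pure Python as below. The layer sizes follow the case study (four inputs, five hidden layers of ten neurons, one output); the tanh activation, the Glorot-style initialization, and all function names are assumptions made for illustration, since the text does not specify them. The paper's actual model is built in TensorFlow.

```python
import math
import random


def init_mlp(layer_sizes, seed=0):
    """Create (weights, biases) per layer with a Glorot-style scale (assumed)."""
    rng = random.Random(seed)
    params = []
    for n_in, n_out in zip(layer_sizes[:-1], layer_sizes[1:]):
        scale = math.sqrt(2.0 / (n_in + n_out))
        weights = [[rng.gauss(0.0, scale) for _ in range(n_in)]
                   for _ in range(n_out)]
        biases = [0.0] * n_out
        params.append((weights, biases))
    return params


def forward(params, x, y, z, t):
    """Map one spatiotemporal coordinate (x, y, z, t) to a scalar psi_hat."""
    h = [x, y, z, t]
    for k, (weights, biases) in enumerate(params):
        a = [sum(w * v for w, v in zip(row, h)) + b
             for row, b in zip(weights, biases)]
        is_output = (k == len(params) - 1)
        # tanh on hidden layers (assumed); linear output layer
        h = a if is_output else [math.tanh(v) for v in a]
    return h[0]


def count_parameters(params):
    """Total number of trainable weights and biases."""
    return sum(len(w) * len(w[0]) + len(b) for w, b in params)


LAYERS = [4] + [10] * 5 + [1]
net = init_mlp(LAYERS)
psi_hat = forward(net, 0.1, 0.2, 0.3, 0.0)
n_params = count_parameters(net)
```

With these layer sizes the network has only 501 trainable parameters, so the optimizer comparison concerns a small model.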

The total loss $\mathcal{L}(\Theta_{NN})$ consists of the following two key components. (1) Data-driven loss $\mathcal{L}_{D}$: The pressure head signals can be recorded at multiple locations on the horizontal plane ($xy$-plane) at different fixed depths using soil moisture sensors. In this work, the pressure head is sampled at 30 discrete time instances at 5 equally spaced depths in the vertical direction. As such, each sensor records a time series of pressure heads at a specific location in the soil geometry, denoted as $\bm{\psi}_{m}(s,t)$. The DNN needs to be trained to generate predictions $\hat{\psi}$ that closely match the sensor measurements. Hence, the data-driven loss $\mathcal{L}_{D}$, which enforces agreement between the measured and predicted pressure head signals, is defined as:

$$\mathcal{L}_{D}=\frac{1}{N_{m}}\sum_{i=1}^{N_{m}}\left(\psi_{m}(s_{i},t_{i})-\hat{\psi}_{m}(s_{i},t_{i})\right)^{2} \tag{4}$$

where $N_{m}$ denotes the number of spatiotemporal measurements. (2) RRE-based loss $\mathcal{L}_{RRE}$: To further enhance the estimation accuracy and the model robustness, physics-based regularization is imposed over the spatiotemporal collocation points $[s_{i},t_{i}]$, which are randomly chosen from the spatiotemporal domain of the hydraulic process in the target soil field. The physics constraint enforces the hydraulic model given by the RRE (see Eq. (1)) by encouraging the physics-based residuals to be close to zero. Specifically, the RRE-based residual is defined as:

$$r_{\psi}(s,t,\Theta_{NN}) := \frac{\partial\theta}{\partial t}-\nabla\cdot\left(K(\hat{\psi})\nabla(\hat{\psi}+z)\right) \tag{5}$$

It is worth noting that the gradient and partial derivatives in Eq. (5) can be reformulated in terms of $\psi$ and its derivatives:

$$r_{\psi}(s,t,\Theta_{NN}) := \frac{\partial\theta}{\partial\psi}\frac{\partial\psi}{\partial t}-\left(\frac{\partial K}{\partial\psi}\frac{\partial\psi}{\partial x}\frac{\partial\psi}{\partial x}+K\frac{\partial^{2}\psi}{\partial x^{2}}+\frac{\partial K}{\partial\psi}\frac{\partial\psi}{\partial y}\frac{\partial\psi}{\partial y}+K\frac{\partial^{2}\psi}{\partial y^{2}}+\frac{\partial K}{\partial\psi}\frac{\partial\psi}{\partial z}\left(\frac{\partial\psi}{\partial z}+1\right)+K\frac{\partial^{2}\psi}{\partial z^{2}}\right) \tag{6}$$
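The expansion from Eq. (5) to Eq. (6) is the product rule applied to each direction. A minimal numerical check of the $z$-direction term, which carries the extra gravity contribution, is sketched below. The pressure-head profile, the parameter values, and the helper names are hypothetical, and central finite differences stand in for the automatic differentiation a PINN would use.

```python
import math

# Illustrative van Genuchten parameters (assumed for this check only).
ALPHA, N, K_S = 1.0, 2.0, 1.0
M = 1.0 - 1.0 / N


def K(psi):
    """Hydraulic conductivity function from Eq. (2)."""
    a = ALPHA * abs(psi)
    base = 1.0 + a ** N
    return K_S * (1.0 - a ** (N - 1.0) * base ** (-M)) ** 2 / base ** (M / 2.0)


def psi(z):
    """Hypothetical smooth, strictly negative pressure-head profile."""
    return -1.0 - 0.5 * math.sin(z)


def dpsi_dz(z):
    return -0.5 * math.cos(z)


def d2psi_dz2(z):
    return 0.5 * math.sin(z)


def central_diff(f, x, h=1e-5):
    """Second-order central finite difference of f at x."""
    return (f(x + h) - f(x - h)) / (2.0 * h)


def conservative_term(z):
    """d/dz [ K(psi) (dpsi/dz + 1) ], the z-part of the divergence in Eq. (5)."""
    return central_diff(lambda s: K(psi(s)) * (dpsi_dz(s) + 1.0), z)


def expanded_term(z):
    """dK/dpsi * dpsi/dz * (dpsi/dz + 1) + K * d2psi/dz2, the z-part of Eq. (6)."""
    dK_dpsi = central_diff(K, psi(z))
    return dK_dpsi * dpsi_dz(z) * (dpsi_dz(z) + 1.0) + K(psi(z)) * d2psi_dz2(z)
```

The two functions agree up to finite-difference error at any depth where $\psi<0$; the $x$ and $y$ terms follow the same pattern without the $+1$.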

The derivatives $\frac{\partial\theta}{\partial\psi}$ and $\frac{\partial K}{\partial\psi}$ can be computed directly from the explicit functional relationships in Eq. (2). The first and second derivatives of $\psi$ can be computed using automatic differentiation, a widely used technique in deep learning that is more efficient, accurate, and reliable than the numerical differentiation used in FEM-based modeling [8]. The physics-based constraint is then realized by encouraging the values of $r_{\psi}(s,t;\Theta_{NN})$ to be close to zero. Thus, the RRE-based loss is defined as:

$$\mathcal{L}_{RRE}=\frac{1}{N_{f}}\sum_{i=1}^{N_{f}}\left(r_{\psi}(s_{i},t_{i};\Theta_{NN})\right)^{2} \tag{7}$$

where $N_{f}$ represents the total number of spatiotemporal collocation points selected in the hydrological process to encode the RRE into the DNN. The physics-based loss is combined with the data-driven loss in the overall loss function in Eq. (3), so that the model respects both the sensor data and the underlying soil moisture physics for reliable modeling of the spatiotemporal soil moisture dynamics.
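Given predictions at the sensor points and residuals at the collocation points, the loss in Eqs. (3), (4), and (7) reduces to two mean-squared terms. The sketch below assumes these quantities have already been computed and are passed in as plain lists; the function names and the sample numbers are hypothetical.

```python
def data_loss(psi_measured, psi_predicted):
    """Eq. (4): mean squared mismatch over the N_m sensor measurements."""
    n_m = len(psi_measured)
    return sum((m - p) ** 2 for m, p in zip(psi_measured, psi_predicted)) / n_m


def rre_loss(residuals):
    """Eq. (7): mean squared RRE residual over the N_f collocation points."""
    return sum(r ** 2 for r in residuals) / len(residuals)


def total_loss(psi_measured, psi_predicted, residuals):
    """Eq. (3): unweighted sum of the data-driven and RRE-based losses."""
    return data_loss(psi_measured, psi_predicted) + rre_loss(residuals)


# Toy numbers, for illustration only.
loss = total_loss(
    psi_measured=[-1.0, -0.8],
    psi_predicted=[-0.9, -0.8],
    residuals=[0.1, -0.1, 0.0, 0.2],
)
```

Here the data term is $0.005$ and the physics term is $0.015$, so the total is $0.02$. Note that the two terms are averaged over different point sets ($N_m$ and $N_f$).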

2.3 Optimization Techniques

2.3.1 Mini-batch/batch Gradient Descent (GD)

Mini-batch gradient descent can be viewed as a variant of the gradient descent algorithm that introduces stochasticity. Batch GD uses the entire training dataset to update the neural network parameters at each iteration. In contrast, mini-batch GD replaces the exact gradient computed from the whole dataset with an estimate computed from a randomly selected subset of the data, which makes each iteration faster. The introduced stochasticity can help mini-batch GD escape from saddle points. However, because of the inherent variance of the gradient estimate, the path taken by the algorithm toward the minimum is usually noisier than that of batch gradient descent, and asymptotic convergence is slow. The updating rule is given as follows:

$$\Theta_{t}=\Theta_{t-1}-\eta\frac{1}{B}\sum_{i=1}^{B}\nabla_{\Theta}\mathcal{L}(s_{i},t_{i},\psi_{i};\Theta) \tag{8}$$

where $\eta$ is the learning rate and $\{s_{i},t_{i};\psi_{i}\}$ is one pair of a spatiotemporal coordinate and a pressure head sensor measurement. For classic batch GD, $B$ is the total number of training samples, i.e., $B=|\psi_{m}|$, whereas for mini-batch stochastic GD, $B$ denotes the number of samples in a subset of the training data, i.e., $1<B<|\psi_{m}|$.
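The update in Eq. (8) can be illustrated on a toy problem. The sketch below fits a single weight $w$ in $y=wx$ by least squares, which stands in for the P-DL loss; the data, learning rate, and function names are assumptions for illustration. Setting the batch size to the dataset size gives batch GD, and a smaller value gives mini-batch GD.

```python
import random


def grad_sample(w, x, y):
    """Gradient of the per-sample loss (w*x - y)^2 with respect to w."""
    return 2.0 * (w * x - y) * x


def gd_step(w, batch, lr):
    """One update of Eq. (8): average the gradient over the batch, then step."""
    g = sum(grad_sample(w, x, y) for x, y in batch) / len(batch)
    return w - lr * g


def train(data, batch_size, lr=0.05, epochs=200, seed=0):
    """Run GD with reshuffled batches each epoch; batch_size=len(data) is batch GD."""
    rng = random.Random(seed)
    w = 0.0
    for _ in range(epochs):
        order = data[:]
        rng.shuffle(order)
        for i in range(0, len(order), batch_size):
            w = gd_step(w, order[i:i + batch_size], lr)
    return w


# Noise-free toy data generated from y = 3x.
data = [(k / 10.0, 3.0 * k / 10.0) for k in range(1, 11)]
w_full = train(data, batch_size=len(data))  # one update per epoch
w_mini = train(data, batch_size=2)          # five updates per epoch
```

For the same number of epochs, the mini-batch run performs five times as many parameter updates, which is the per-iteration speed advantage described above.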

2.3.2 Root Mean Square Propagation (RMSProp)

One limitation of traditional gradient descent is that it uses the same step size for every neural network parameter throughout the training process. To overcome this shortcoming, RMSProp was proposed as an extension of gradient descent that automatically adapts the step size in each dimension based on a decaying moving average of the squared partial gradients. The neural network is updated by:

$$E[g^{2}]_{t}=\beta E[g^{2}]_{t-1}+(1-\beta)\left(\frac{\delta\mathcal{L}}{\delta\Theta}\right)^{2},\quad \Theta_{t}=\Theta_{t-1}-\frac{\eta}{\sqrt{E[g^{2}]_{t}}}\frac{\delta\mathcal{L}}{\delta\Theta} \tag{9}$$

where $E[g^{2}]_{t}$ is the moving average of squared gradients at step $t$, which depends on the previous average $E[g^{2}]_{t-1}$ and the current gradient $\frac{\delta\mathcal{L}}{\delta\Theta}$. $\beta$ is the weight given to the previous moving average in the current update, with a default setting of 0.9. The adaptive step size is then calculated by dividing the step size $\eta$ by the square root of this exponentially decaying average of squared gradients. The use of a decaying moving average allows the algorithm to discard early gradients and focus on the most recently observed partial gradient information as the search progresses.
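A single RMSProp update following Eq. (9) can be sketched per parameter as below. The small constant `eps` added to the denominator does not appear in Eq. (9); it is an assumption commonly made to avoid division by zero. The function name and default learning rate are likewise illustrative.

```python
import math


def rmsprop_step(theta, grad, avg_sq, lr=0.01, beta=0.9, eps=1e-8):
    """Apply Eq. (9) element-wise; returns (new_theta, new_avg_sq)."""
    new_theta, new_avg_sq = [], []
    for p, g, v in zip(theta, grad, avg_sq):
        v = beta * v + (1.0 - beta) * g * g            # decaying average of g^2
        new_avg_sq.append(v)
        new_theta.append(p - lr * g / (math.sqrt(v) + eps))  # eps is assumed
    return new_theta, new_avg_sq


# Two parameters whose gradients differ by a factor of 100.
theta0 = [1.0, 1.0]
theta1, avg1 = rmsprop_step(theta0, grad=[2.0, 200.0], avg_sq=[0.0, 0.0])
```

Both parameters move by nearly the same amount in this first step even though their gradients differ by two orders of magnitude, which is the per-dimension step-size adaptation described above.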

2.3.3 Adaptive Moment Estimation (Adam)

Adam is another method that computes adaptive learning rates for each parameter in the neural network [9]. Like RMSProp, Adam adapts the learning rate of each parameter automatically by storing an exponentially decaying average of past squared gradients. In addition, Adam smooths the search process by introducing an exponentially decaying moving average of past gradients $m_{t}$. This is similar in idea to momentum: by carrying a fraction of the gradients from past steps into the current update, it helps accelerate mini-batch GD in the relevant direction and dampens oscillations. The decaying average of the gradient history $m_{t}$ and its squared version $v_{t}$ at step $t$ are computed as:

$$m_{t}=\beta_{1}m_{t-1}+(1-\beta_{1})g_{t},\quad v_{t}=\beta_{2}v_{t-1}+(1-\beta_{2})g_{t}^{2} \tag{10}$$

where $g_{t}$ is the gradient of the loss at step $t$, and $\beta_{1}$ and $\beta_{2}$ are hyperparameters whose values are usually chosen as 0.9 for $\beta_{1}$ and 0.999 for $\beta_{2}$. The authors of [9] observed that $m_{t}$ and $v_{t}$ are biased towards zero during the first few steps, especially when the decay rates are small (i.e., $\beta_{1}$ and $\beta_{2}$ are close to 1). They counteract these biases by computing bias-corrected versions of $m_{t}$ and $v_{t}$:

$$\hat{m}_{t}=\frac{m_{t}}{1-\beta_{1}^{t}},\quad \hat{v}_{t}=\frac{v_{t}}{1-\beta_{2}^{t}} \tag{11}$$

Then the neural network updating rule is formulated as:

$$\Theta_{t+1}=\Theta_{t}-\frac{\eta}{\sqrt{\hat{v}_{t}}+\epsilon}\hat{m}_{t} \tag{12}$$
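Eqs. (10) to (12) combine into one update, sketched below for a list of parameters. Here $\epsilon$ is a small constant for numerical stability; its value of $10^{-8}$, the default learning rate, and the function name are assumptions for illustration, and the step counter `t` is taken to start at 1.

```python
import math


def adam_step(theta, grad, m, v, t, lr=0.001, beta1=0.9, beta2=0.999, eps=1e-8):
    """One Adam update; t is the 1-based step index. Returns (theta, m, v)."""
    new_theta, new_m, new_v = [], [], []
    for p, g, m_i, v_i in zip(theta, grad, m, v):
        m_i = beta1 * m_i + (1.0 - beta1) * g        # Eq. (10), first moment
        v_i = beta2 * v_i + (1.0 - beta2) * g * g    # Eq. (10), second moment
        m_hat = m_i / (1.0 - beta1 ** t)             # Eq. (11), bias correction
        v_hat = v_i / (1.0 - beta2 ** t)
        new_theta.append(p - lr * m_hat / (math.sqrt(v_hat) + eps))  # Eq. (12)
        new_m.append(m_i)
        new_v.append(v_i)
    return new_theta, new_m, new_v


# First step from zero-initialized moments with gradient 2.0.
theta1, m1, v1 = adam_step([1.0], [2.0], m=[0.0], v=[0.0], t=1)
```

In this first step the raw moments are $m_1=0.2$ and $v_1=0.004$, far below the gradient scale, but the bias correction restores $\hat{m}_1=2$ and $\hat{v}_1=4$, so the parameter moves by almost exactly $\eta$.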

3 Experimental Results

We evaluate the performance of different gradient descent methods, i.e., mini-batch/batch GD, RMSProp, and Adam, applied to the proposed P-DL framework in a 3D soil moisture system to estimate the pressure head from sparse sensor measurements. The neural network is implemented in TensorFlow-GPU through its Python application programming interface (API). The CPU used for the computation is an Intel(R) Xeon(R) W-2265 CPU @ 3.50GHz, and the GPU is an NVIDIA RTX A4500. The soil geometry is a cuboid formed by 20 nodes in each of the $x$ and $y$ directions and 10 nodes in the $z$ direction. The geometry information and the benchmark system dynamics are obtained from [7]. Because sensor measurement noise is inevitable in real-world practice, we add noise with level $\sigma_{\epsilon}=0.005$ to the simulation data. In other words, the sensor observation is denoted as $\psi_{m}(s,t)=\psi(s,t)+\epsilon(s,t)$, where the noise follows a Gaussian distribution, $\epsilon(s,t)\sim\mathcal{N}(0,\sigma_{\epsilon}^{2})$. The pressure head is measured at 15 randomly selected locations on the horizontal $xy$-plane, and at each location, sensors are evenly distributed at 5 different depths. Thus, 75 pressure head time series are collected and serve as the sensor observations, i.e., $\psi_{m}$, in the loss function $\mathcal{L}$ to train the P-DL. 
The temporal domain of 0.9 hours is uniformly discretized into 30 time instances. The modeling result is quantified over the whole spatiotemporal domain of the soil using the relative error ($re$), which is defined as:

reψ=ψ^(s,t)ψ(s,t)2ψ(s,t)2reθ=θ^(s,t)θ(s,t)2θ(s,t)2formulae-sequence𝑟subscript𝑒𝜓subscriptnorm^𝜓𝑠𝑡𝜓𝑠𝑡2subscriptnorm𝜓𝑠𝑡2𝑟subscript𝑒𝜃subscriptnorm^𝜃𝑠𝑡𝜃𝑠𝑡2subscriptnorm𝜃𝑠𝑡2\displaystyle re_{\psi}=\frac{\|\hat{\psi}(s,t)-\psi(s,t)\|_{2}}{\|\psi(s,t)\|% _{2}}\qquad re_{\theta}=\frac{\|\hat{\theta}(s,t)-\theta(s,t)\|_{2}}{\|\theta(% s,t)\|_{2}}italic_r italic_e start_POSTSUBSCRIPT italic_ψ end_POSTSUBSCRIPT = divide start_ARG ∥ over^ start_ARG italic_ψ end_ARG ( italic_s , italic_t ) - italic_ψ ( italic_s , italic_t ) ∥ start_POSTSUBSCRIPT 2 end_POSTSUBSCRIPT end_ARG start_ARG ∥ italic_ψ ( italic_s , italic_t ) ∥ start_POSTSUBSCRIPT 2 end_POSTSUBSCRIPT end_ARG italic_r italic_e start_POSTSUBSCRIPT italic_θ end_POSTSUBSCRIPT = divide start_ARG ∥ over^ start_ARG italic_θ end_ARG ( italic_s , italic_t ) - italic_θ ( italic_s , italic_t ) ∥ start_POSTSUBSCRIPT 2 end_POSTSUBSCRIPT end_ARG start_ARG ∥ italic_θ ( italic_s , italic_t ) ∥ start_POSTSUBSCRIPT 2 end_POSTSUBSCRIPT end_ARG (13)

where $\hat{\psi},\hat{\theta}$ and $\psi,\theta$ denote the predicted and true pressure head and water content dynamics, respectively. In the present investigation, a feedforward fully-connected DNN is constructed and combined with the physics law in the P-DL approach to model the soil moisture dynamics. The DNN consists of five hidden layers with ten neurons in each layer. We randomly choose $N_{f}=10{,}000$ collocation points among the 120,000 spatiotemporal instances of the soil moisture dynamic domain to encode the RRE into the DNN. The neural network is implemented in TensorFlow-GPU with the Python application programming interface (API), and we found that the computation is around 6.5 times faster than running on the CPU only. Note that the CPU used for the computation is an Intel(R) Xeon(R) W-2223 CPU @ 3.6GHz, and the GPU is an NVIDIA Quadro P2200.
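As a small illustration of the relative error metric and of the collocation-point sampling, the following Python sketch uses only the standard library. The function name `relative_error`, the random seed, and the two-element toy vectors are hypothetical and serve only to make the arithmetic easy to check by hand; they are not data from the case study.

```python
import math
import random


def relative_error(pred, true):
    """Relative L2 error ||pred - true||_2 / ||true||_2 over flattened values."""
    num = math.sqrt(sum((p - t) ** 2 for p, t in zip(pred, true)))
    den = math.sqrt(sum(t ** 2 for t in true))
    return num / den


# 20 x 20 x 10 spatial nodes times 30 time instances, as described in the text.
n_total = 20 * 20 * 10 * 30
rng = random.Random(0)  # assumed seed, for reproducibility only
# Draw N_f = 10,000 distinct collocation indices from the spatiotemporal domain.
collocation_idx = rng.sample(range(n_total), 10_000)

# Toy check: ||(0, 0.5)|| / ||(3, 4)|| = 0.5 / 5
true = [3.0, 4.0]
pred = [3.0, 4.5]
print(n_total, len(collocation_idx), relative_error(pred, true))
```

The product of the grid dimensions and the number of time instances reproduces the 120,000 spatiotemporal instances from which the collocation points are drawn.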

3.1 Convergence and performance observations

In this subsection, we compare the convergence of three optimization methods, GD, RMSProp, and Adam, with mini-batch and full-batch training of the proposed P-DL model. The batch size for mini-batch training is 128. Fig. 2 (a) and (b) show the training convergence of the three optimization techniques using mini-batch and full batch, respectively. When mini-batch training is used (Fig. 2 (a)), GD presents the slowest convergence rate. RMSProp shows the fastest loss drop at the beginning and a convergence speed similar to that of Adam after 5000 iterations. Adam demonstrates the best convergence performance, with its loss dropping to 0.0026 by the end of training. When the P-DL model is trained with full batch (Fig. 2 (b)), for the first 1000 epochs GD shows the slowest convergence and RMSProp the fastest loss reduction among the three optimization strategies. As training continues, RMSProp and Adam each reach a local minimum and then manage to escape from it. After 5000 epochs, the losses of the P-DL model optimized by RMSProp and Adam are reduced to 0.0211 and 0.0010, respectively. In contrast, with full-batch training GD is stuck at a local minimum and fails to further optimize the neural network. Table 1 reports the relative error ($re$) of the pressure head $\psi$ and the volumetric water content $\theta$ generated from the P-DL model optimized by GD, RMSProp, and Adam using mini-batch and full batch. As shown in Fig. 2(a-b), full-batch training converges in fewer backpropagation passes, i.e., iterations, than mini-batch training. Mini-batch training requires about 10000 iterations for RMSProp and Adam to converge, and 32000 iterations for GD to converge. In contrast, Adam and RMSProp take fewer than 3000 iterations to converge with full-batch training.
If the Adam optimizer is used, one backpropagation pass with a mini-batch requires approximately 94 milliseconds (ms), whereas full-batch training needs about 124 ms per iteration. Thus, mini-batch training with batch size 128 requires about 16 min to reach convergence, whereas full-batch training needs only about 6.4 min. Furthermore, the predictive dynamics generated with full-batch training also outperform those from mini-batch training when Adam is used to optimize the P-DL, as shown in Table 1. Therefore, in this paper, the Adam optimizer with full-batch training is recommended for optimizing the soil moisture P-DL model.
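The wall-clock figures for Adam follow from the per-iteration cost and the approximate iteration counts. The short Python check below reproduces that arithmetic; the helper name `minutes` is hypothetical, and the iteration counts are the rounded values quoted in the text, so the results are estimates rather than measured timings.

```python
def minutes(ms_per_iter, n_iters):
    """Total training time in minutes for a given per-iteration cost in milliseconds."""
    return ms_per_iter * n_iters / 1000.0 / 60.0


# Mini-batch Adam: about 94 ms per iteration, about 10000 iterations to converge.
mini_batch_min = minutes(94, 10_000)
# Full-batch Adam: about 124 ms per iteration, at most about 3000 iterations.
full_batch_min = minutes(124, 3_000)
# Iteration count implied by the reported 6.4 min at 124 ms per iteration.
implied_iters = 6.4 * 60 * 1000 / 124

print(round(mini_batch_min, 1), round(full_batch_min, 1), round(implied_iters))
```

Under these rounded inputs the mini-batch estimate is about 15.7 min, in line with the reported 16 min, and the full-batch estimate is about 6.2 min; the reported 6.4 min would correspond to roughly 3100 iterations at 124 ms each.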

Figure 2: Comparison of the training loss $\mathcal{L}$ of the P-DL model using mini-batch (a) and full batch (b) with different optimization methods. (c) Hydraulic conductivity function (HCF) and (d) water retention curve (WRC) generated from the P-DL model and the van Genuchten model.
Table 1: Comparison of $re$ of $\psi$ and $\theta$ using different optimization algorithms with mini-batch and full batch.

| Variable | Training   | GD     | RMSProp | Adam   |
|----------|------------|--------|---------|--------|
| $\psi$   | Mini-batch | 0.0649 | 0.0106  | 0.0079 |
| $\psi$   | Full batch | 0.4251 | 0.0324  | 0.0049 |
| $\theta$ | Mini-batch | 0.0140 | 0.0029  | 0.0022 |
| $\theta$ | Full batch | 0.1319 | 0.0090  | 0.0009 |

3.2 P-DL performance using full batch training

In this subsection, we compare the soil moisture dynamics predicted by the P-DL model optimized with GD, RMSProp, and Adam using full-batch training only. The predictions of the volumetric water content $\hat{\theta}$ and the hydraulic conductivity $\hat{K}$ are inferred by plugging the neural network prediction, i.e., the pressure head $\hat{\psi}$, into the van Genuchten model detailed in Eq. (2). Fig. 2(c,d) shows the constitutive relationships, i.e., HCF and WRC, estimated by the P-DL model using different optimization methods. The P-DL model optimized by GD predicts $\hat{\psi}$ over only a tiny range, which confines the GD-based estimates of both WRC and HCF to a small interval and results in an unsuccessful prediction. Among the three optimization strategies, Adam generates the WRC and HCF curves most similar to the ground truth, which indicates a robust estimation of the pressure head $\hat{\psi}$. Fig. 3 (a) and (b) show the ground truth water content distribution and its predicted counterpart generated from the P-DL model with the full-batch Adam optimizer in the soil geometry, respectively. Because the soil moisture dynamics change over time, Fig. 3 only depicts the mappings at one specific time step, i.e., $t=15$. The estimated water content pattern is remarkably similar to the ground truth mapping, indicating that the P-DL model, combined with a proper optimization strategy, is able to effectively predict the spatiotemporal soil moisture dynamics by harnessing the physics-based principles and sensor observations. Fig. 3 (c) displays the mapping of the absolute difference between the prediction and the ground truth, which remains low, on the order of $10^{-4}$.
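The step of inferring water content and conductivity from a predicted pressure head can be sketched with the standard van Genuchten-Mualem closed form [6]. This is a minimal sketch, not the exact implementation used in the case study: the parameter values below are illustrative placeholders rather than the soil parameters of the benchmark, the pressure head is assumed to be in cm with $\alpha$ in 1/cm, and the pore-connectivity exponent $l=0.5$ is the commonly used default, which may differ from the form written in Eq. (2).

```python
def effective_saturation(psi, alpha, n):
    """Effective saturation S_e(psi); the soil is treated as saturated for psi >= 0."""
    if psi >= 0.0:
        return 1.0
    m = 1.0 - 1.0 / n
    return (1.0 + (alpha * abs(psi)) ** n) ** (-m)


def water_content(psi, theta_r, theta_s, alpha, n):
    """Water retention curve: theta = theta_r + (theta_s - theta_r) * S_e."""
    return theta_r + (theta_s - theta_r) * effective_saturation(psi, alpha, n)


def hydraulic_conductivity(psi, k_s, alpha, n, l=0.5):
    """Hydraulic conductivity: K = K_s * S_e^l * (1 - (1 - S_e^(1/m))^m)^2."""
    m = 1.0 - 1.0 / n
    se = effective_saturation(psi, alpha, n)
    return k_s * se ** l * (1.0 - (1.0 - se ** (1.0 / m)) ** m) ** 2


# Illustrative placeholder parameters (hypothetical, not taken from the paper).
params = dict(alpha=0.036, n=1.56)
theta_r, theta_s, k_s = 0.078, 0.43, 24.96

# Hypothetical predicted pressure heads in cm, from dry to saturated.
psi_hat = [-1000.0, -100.0, -10.0, 0.0]
theta_hat = [water_content(p, theta_r, theta_s, **params) for p in psi_hat]
k_hat = [hydraulic_conductivity(p, k_s, **params) for p in psi_hat]
print(theta_hat)
print(k_hat)
```

Both curves increase monotonically as the pressure head approaches zero, so a network that predicts $\hat{\psi}$ over only a narrow range can recover only a short segment of the WRC and HCF, consistent with the GD result described above.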

Figure 3: (a) The benchmark water content distribution $\theta$, (b) the predicted $\hat{\theta}$ obtained from the P-DL model trained with the full-batch Adam optimizer, (c) the discrepancy mapping between the ground truth and the prediction.

4 Conclusions

We propose a physics-constrained deep learning (P-DL) framework to solve the Richardson-Richards equation, reconstruct the soil moisture dynamics, and recover the WRCs and HCFs. A feed-forward neural network is used to model the non-linear relationship between the spatiotemporal instances and the pressure head given the sensor measurements. The DNN is trained not only to match the sensor measurements satisfactorily but also to respect the Richardson-Richards equation with the empirical correlations. Since successful and efficient optimization is of vital importance for obtaining the optimal solution to the Richardson-Richards equation, this work investigates different optimization algorithms for training the neural network. Three optimizers, Adam, RMSProp, and GD, are used to minimize the loss function of the P-DL model, and their performance is compared. The experimental results show that the predictive model optimized with Adam using full-batch training achieves the best performance among the optimization strategies considered.

References

  • [1] L. A. Richards, “Capillary conduction of liquids through porous mediums,” Physics, vol. 1, no. 5, pp. 318–333, 1931.
  • [2] E. Buckingham, “Studies on the movement of soil moisture,” 1907.
  • [3] M. Raissi, P. Perdikaris, and G. E. Karniadakis, “Physics-informed neural networks: A deep learning framework for solving forward and inverse problems involving nonlinear partial differential equations,” Journal of Computational Physics, vol. 378, pp. 686–707, 2019.
  • [4] J. Xie and B. Yao, “Physics-constrained deep learning for robust inverse ECG modeling,” IEEE Transactions on Automation Science and Engineering, 2022.
  • [5] T. Bandai and T. A. Ghezzehei, “Physics-informed neural networks with monotonicity constraints for Richardson-Richards equation: Estimation of constitutive relationships and soil water flux density from volumetric water content measurements,” Water Resources Research, vol. 57, no. 2, p. e2020WR027642, 2021.
  • [6] M. T. Van Genuchten, “A closed-form equation for predicting the hydraulic conductivity of unsaturated soils,” Soil Science Society of America Journal, vol. 44, no. 5, pp. 892–898, 1980.
  • [7] K.-A. Lie, An introduction to reservoir simulation using MATLAB/GNU Octave: User guide for the MATLAB Reservoir Simulation Toolbox (MRST). Cambridge University Press, 2019.
  • [8] A. Paszke, S. Gross, S. Chintala, G. Chanan, E. Yang, Z. DeVito, Z. Lin, A. Desmaison, L. Antiga, and A. Lerer, “Automatic differentiation in PyTorch,” 2017.
  • [9] D. P. Kingma and J. Ba, “Adam: A method for stochastic optimization,” arXiv preprint arXiv:1412.6980, 2014.