Gaussian Process Regression with Soft Inequality and Monotonicity Constraints

Didem Kochan
Department of Industrial and Systems Engineering
Lehigh University
Bethlehem, PA 18015
[email protected]
Xiu Yang
Department of Industrial and Systems Engineering
Lehigh University
Bethlehem, PA 18015
[email protected]
Abstract

Gaussian process (GP) regression is a non-parametric, Bayesian framework for approximating complex models. Standard GP regression can lead to an unbounded model in which some points take infeasible values. We introduce a new GP method that enforces physical constraints in a probabilistic manner. This GP model is trained by the quantum-inspired Hamiltonian Monte Carlo (QHMC) method, an efficient way to sample from a broad class of distributions. Unlike the standard Hamiltonian Monte Carlo algorithm, in which a particle has a fixed mass, QHMC allows a particle to have a random mass matrix with a probability distribution. By introducing the QHMC method to inequality- and monotonicity-constrained GP regression in the probabilistic sense, our approach improves the accuracy and reduces the variance of the resulting GP model. In our experiments on several datasets, the proposed approach is efficient: it accelerates the sampling process while maintaining accuracy, and it is applicable to high-dimensional problems.

Keywords Gaussian process · Quantum-inspired Monte Carlo · Soft constraints · Inequality constraints

1 Introduction

In many real-world applications, measuring complex systems or evaluating computational models can be time-consuming, costly or computationally intensive. Gaussian process regression (GPR) is a Bayesian technique that addresses this problem by building a surrogate model. It is a supervised machine learning framework that has been widely used in regression and classification tasks. A GP can be interpreted as a probability distribution on a set of functions, which can be conditioned on observations using Bayes’ rule [13]. A GP regression model makes predictions that incorporate prior information (through kernels) and provides uncertainty measures for those predictions [22]. However, prior knowledge often includes physical laws, and the standard GP regression framework may lead to an unbounded model in which some points take infeasible values that violate those laws [13]. For example, non-negativity is a requirement for various physical properties such as temperature, density and viscosity [20]. Therefore, incorporating physical information into the GP framework can regularize the behaviour of the model and provide more realistic uncertainties [26].

A significant amount of research has been conducted on incorporating physical information into the GP framework, resulting in various techniques and methodologies [26]. For example, a probit model for the likelihood of derivative information can be employed to enforce monotonicity constraints [23]. Although this approach can also be used to enforce convexity in one dimension, an additional requirement on the Hessian is needed in higher dimensions [5]. In [16], an additive GP approach is introduced to account for monotonicity constraints. Although the posterior sampling step can be challenging, the additive GP framework makes it possible to satisfy the constraints everywhere in the input space, and it is scalable to higher dimensions.

Enforcing inequality constraints in a GP, on the other hand, is more challenging, as the conditional process, subject to these constraints, does not retain the properties of a GP [18]. One way to handle this problem is data augmentation, in which the inequality constraints are enforced at various locations and approximate samples are drawn from the predictive distribution [1]; another is to use a block covariance kernel [21]. The implicitly constrained GP regression method proposed in [24] shows that the mean prediction of a GP implicitly satisfies linear constraints if the constraints are satisfied by the training data. A similar approach shows that when linear inequality constraints are imposed on a finite set of points in the domain, the resulting process is a compound Gaussian process with a truncated Gaussian mean [2].

Most of these approaches assume that the inequalities are satisfied on a finite set of input locations and, based on that assumption, approximate the posterior distribution given those constraint points. The approach introduced in [5] is an example, in which maximum likelihood estimation of the GP hyperparameters is investigated under the constraint assumptions. In practice, this should also limit the number of constraint points needed for an effective discrete-location approximation. In addition, the method is not efficient on high-dimensional datasets, as training the GP model takes a large amount of time.

The first GP-based method that satisfies certain inequalities over the entire input space was proposed by Maatouk and Bay [18]. The GP approximation of the samples is performed in a finite-dimensional space of functions, and a rejection sampling method is used to approximate the posterior. The convergence properties of the method are investigated in [19]. Although using rejection sampling to obtain the posterior helps convergence, it can be computationally expensive. Like the earlier approaches in which a set of inputs satisfies the constraints, this method also suffers from the curse of dimensionality. Later, the truncated Gaussian approach [17] extended the framework in [18] to general sets of linear inequalities. Building upon the approaches in [18] and [19], the work presented in [17] introduces a finite-dimensional approach that incorporates inequalities for both data interpolation and covariance parameter estimation. In that work, the posterior distribution is expressed as a truncated multinormal distribution, and several Markov Chain Monte Carlo (MCMC) methods and exact sampling methods are used to obtain it. Among the MCMC sampling techniques considered, including Gibbs, Metropolis-Hastings (MH) and Hamiltonian Monte Carlo (HMC), the results indicate that HMC sampling is the most efficient. The truncated Gaussian approaches offer several advantages, including high accuracy and flexibility in satisfying multiple inequality conditions. However, although these methods address the limitations in [18], they can be time-consuming, particularly in applications with large datasets or high-dimensional spaces.

In this work, we use the QHMC algorithm to train the GP model and enforce the inequality and monotonicity constraints in a probabilistic manner. Our work addresses the computational limitations caused by high dimensions or large datasets. Unlike the truncated Gaussian methods in [17] for inequality constraints, or the additive GP [16] with monotonicity constraints, the proposed method can maintain its efficiency in higher dimensions. Further, we adopt an adaptive learning algorithm that selects the constraint locations. The efficiency and accuracy of the QHMC algorithms are demonstrated on inequality- and monotonicity-constrained problems. The inequality-constrained examples include lower- and higher-dimensional synthetic problems, a conservative tracer distribution from sparse tracer concentration measurements, and a three-dimensional heat transfer problem, while the monotonicity-constrained examples comprise lower- and higher-dimensional synthetic problems. Our contributions can be summarized in three key points: (i) QHMC reduces the difference between the posterior mean and the ground truth, (ii) utilizing QHMC in a probabilistic sense decreases variance and uncertainty, and (iii) the proposed algorithm is a robust, efficient and flexible method applicable to a wide range of problems. We also implemented QHMC sampling within the truncated Gaussian approach to enhance accuracy and efficiency when working with the QHMC algorithm.

2 Gaussian Process under inequality constraints

2.1 Standard GP regression framework

Suppose we have a target function represented by values $\mathbf{y}=(y^{(1)},y^{(2)},\dots,y^{(N)})^{\mathrm{T}}$, where $y^{(i)}\in\mathbb{R}$ are observations at locations $\mathbf{X}=\{x^{(i)}\}_{i=1}^{N}$. Here, each $x^{(i)}$ is a $d$-dimensional vector in the domain $\mathcal{D}\subseteq\mathbb{R}^{d}$. Using the framework provided in [12], we approximate the target function by a GP, denoted as $Y(\cdot,\cdot):\mathcal{D}\times\Omega\rightarrow\mathbb{R}$. We can express $Y$ as

$$Y(x):=GP[\mu(x),K(x,x^{\prime})],$$

where $\mu(\cdot)$ is the mean function and $K(x,x^{\prime})$ is the covariance function, defined as

$$\mu(x)=\mathbb{E}[Y(x)],\quad\text{and}\quad K(x,x^{\prime})=\mathbb{E}\big[(Y(x)-\mu(x))(Y(x^{\prime})-\mu(x^{\prime}))\big].$$

Typically, the standard squared exponential covariance kernel is used as the kernel function:

$$K(x,x^{\prime})=\sigma^{2}\exp\left(-\frac{\|x-x^{\prime}\|_{2}^{2}}{2l^{2}}\right)+\sigma^{2}_{n}\delta_{x,x^{\prime}},$$

where $\sigma^{2}$ is the signal variance, $\delta_{x,x^{\prime}}$ is the Kronecker delta function and $l$ is the length-scale. We assume that each observation includes an additive independent and identically distributed (i.i.d.) Gaussian noise term $\epsilon$ with zero mean and variance $\sigma^{2}_{n}$. We denote the hyperparameters by $\theta=(\sigma,l,\sigma_{n})$ and estimate them from the training data by minimizing the negative marginal log-likelihood [12, 25, 29]:

$$-\log[p(\mathbf{Y}|\mathbf{X},\theta)]=\frac{1}{2}\left[(\mathbf{y}-\mathbf{\mu})^{\mathrm{T}}K^{-1}(\mathbf{y}-\mathbf{\mu})+\log|K|+N\log(2\pi)\right]. \tag{1}$$

In the following section, we will show how the parameter updates are performed using the QHMC method.
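As a minimal sketch of Equation 1, the following plain-Python code evaluates the negative marginal log-likelihood under the squared exponential kernel. The function names, the constant-mean argument `mean` and the hand-written Cholesky factorization are illustrative assumptions, not the authors' implementation; in practice a numerical linear algebra library would take their place.

```python
import math

def sq_exp_kernel(x, x_prime, sigma, length, sigma_n):
    """Squared exponential kernel; the noise term applies only when x equals x'."""
    sq_dist = sum((a - b) ** 2 for a, b in zip(x, x_prime))
    noise = sigma_n ** 2 if x == x_prime else 0.0  # Kronecker delta term
    return sigma ** 2 * math.exp(-sq_dist / (2.0 * length ** 2)) + noise

def cholesky(A):
    """Lower-triangular L with L L^T = A, for a symmetric positive-definite A."""
    n = len(A)
    L = [[0.0] * n for _ in range(n)]
    for i in range(n):
        for j in range(i + 1):
            s = A[i][j] - sum(L[i][k] * L[j][k] for k in range(j))
            L[i][j] = math.sqrt(s) if i == j else s / L[j][j]
    return L

def neg_log_marginal_likelihood(X, y, sigma, length, sigma_n, mean=0.0):
    """Evaluate Equation 1 for points X (tuples) and observations y (floats)."""
    n = len(X)
    K = [[sq_exp_kernel(xi, xj, sigma, length, sigma_n) for xj in X] for xi in X]
    L = cholesky(K)
    r = [yi - mean for yi in y]
    # Forward substitution: solve L z = r, so that r^T K^{-1} r = z^T z.
    z = []
    for i in range(n):
        z.append((r[i] - sum(L[i][k] * z[k] for k in range(i))) / L[i][i])
    quad = sum(v * v for v in z)
    log_det = 2.0 * sum(math.log(L[i][i]) for i in range(n))  # log|K|
    return 0.5 * (quad + log_det + n * math.log(2.0 * math.pi))
```

For a single observation $y=0$ at one location with $\sigma=1$ and $\sigma_n=0$, the quadratic and log-determinant terms vanish and the function returns $\frac{1}{2}\log(2\pi)$, which gives a quick sanity check of the implementation.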

2.2 Quantum-inspired Hamiltonian Monte Carlo

QHMC is an enhanced version of the Hamiltonian Monte Carlo (HMC) algorithm that incorporates a random mass matrix for the particles, following a probability distribution. In conventional HMC, the position is represented by the original variables ($x$), while the Gaussian momentum is represented by auxiliary variables ($q$). Utilizing the energy-time uncertainty relation of quantum mechanics, QHMC allows a particle to have a random mass matrix with a probability distribution. Consequently, in addition to the position and momentum variables, a mass variable ($m$) is introduced within the QHMC framework. Having a third variable offers the advantage of exploring various landscapes in the state-space. As a result, unlike standard HMC, QHMC can perform well on discontinuous, non-smooth and spiky distributions [4, 15].

The quantum nature of QHMC can be understood by considering a one-dimensional harmonic oscillator example provided in [15]. Let us consider a ball with a fixed mass $m$ attached to a spring at the origin. Assuming $x$ is the displacement and $k$ is the spring constant, the restoring force that pulls the ball back to the origin is $F=-kx$, and the ball oscillates around the origin with period $T=2\pi\sqrt{\frac{m}{k}}$. In contrast to standard HMC, where the mass $m$ is fixed at 1, QHMC incorporates a time-varying mass, allowing the ball to experience acceleration and explore various distribution landscapes. That is, QHMC can employ a short time period $T$, corresponding to a small mass $m$, to efficiently explore broad but flat regions. Conversely, in spiky regions, it can switch to a larger time period $T$, i.e. a larger $m$, to ensure thorough exploration of all corners of the landscape [15].

The implementation of QHMC is straightforward: we construct a stochastic process $M(t)$ for the mass, and at each time $t$, we sample $M(t)$ from a distribution $P_{M}(M)$. Resampling the positive-definite mass matrix is the only additional step to the standard HMC procedure. In practice, assuming that $P_{M}(M)$ is independent of $x$ and $q$, a convenient choice of mass density is the log-normal form $\log m\sim\mathcal{N}(\mu_{m},\sigma^{2}_{m}),~M=mI$, where $\mu_{m}$ and $\sigma^{2}_{m}$ are the mean and variance of $\log m$ and $I$ is the identity matrix. The QHMC framework simulates the following dynamical system:

$$d\begin{pmatrix}x\\ q\end{pmatrix}=dt\begin{pmatrix}M(t)^{-1}q\\ -\nabla U(x)\end{pmatrix}.$$

In this setting, the potential energy function of the QHMC system is $U(x)=-\log[p(\mathbf{Y}|\mathbf{X},\theta)]$, i.e., the negative marginal log-likelihood. The procedure is summarized in Algorithm 1, where the location variables $\{x^{(i)}\}_{i=1}^{N}$ of the GP model play the role of the position variables $x$. The method evolves the QHMC dynamics to update the locations $x$. In this work, we implement the QHMC method for inequality constrained GP regression in a probabilistic manner.
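To illustrate how the mass controls the time scale of these dynamics, the sketch below integrates the system for the harmonic oscillator potential $U(x)=\frac{1}{2}kx^{2}$ with a standard leapfrog scheme and a scalar mass. The function names and the choice of integrator are assumptions made for illustration; the point is only that a smaller mass gives a shorter period $T=2\pi\sqrt{m/k}$.

```python
import math

def period(mass, k):
    """Oscillation period T = 2 * pi * sqrt(m / k)."""
    return 2.0 * math.pi * math.sqrt(mass / k)

def leapfrog_oscillator(x, q, mass, k, eps, n_steps):
    """Integrate dx = (q / mass) dt, dq = -k x dt with the leapfrog scheme."""
    q -= 0.5 * eps * k * x                      # initial half step for momentum
    for step in range(n_steps):
        x += eps * q / mass                     # full step for position
        scale = eps if step < n_steps - 1 else 0.5 * eps
        q -= scale * k * x                      # full step, or final half step
    return x, q

# Integrating over exactly one period returns the particle close to its start,
# whatever the mass; only the length of that period changes.
for mass in (0.25, 1.0, 4.0):
    T = period(mass, k=1.0)
    x_end, q_end = leapfrog_oscillator(1.0, 0.0, mass, 1.0, T / 1000, 1000)
    print(f"m={mass}: T={T:.3f}, x(T)={x_end:.5f}, q(T)={q_end:.5f}")
```

Quadrupling the mass doubles the period, so for a fixed step size and number of steps a light particle completes more of an oscillation than a heavy one.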

2.3 Proposed method

Instead of enforcing all constraints strictly, the approach introduced in [20] minimizes the negative marginal log-likelihood function in Equation 1 while allowing constraint violations with a small probability. For example, for non-negativity constraints, the following requirement is imposed on the problem:

$$P[(\mathbf{Y}(x)|x,\theta)<0]\leq\eta,\quad\text{for all}\quad x\in\mathcal{D},$$

where $0<\eta\ll 1$.

In contrast to enforcing the constraint via a truncated Gaussian assumption [18] or performing inference based on the Laplace approximation and expectation propagation [11], the proposed method preserves the Gaussian posterior of standard GP regression. The method uses a slight modification of the existing cost function. Given a model that follows a Gaussian distribution, we can re-express the constraint in terms of the posterior mean and posterior standard deviation:

$$y^{*}(x)+\phi^{-1}(\eta)s(x)\geq 0,\quad\text{for all}\quad x\in\mathcal{D}, \tag{2}$$

where $y^{*}(x)$ stands for the posterior mean, $s(x)$ is the posterior standard deviation and $\phi$ is the cumulative distribution function of a standard Gaussian random variable [20]. A practical choice for $\eta$ might be $\eta=2.2\%$, resulting in $\phi^{-1}(\eta)\approx-2$. Then, we can formulate the optimization problem as

$$\text{argmin}_{\theta}\quad -\log[p(\mathbf{Y}|\mathbf{X},\theta)]\quad\text{such that} \tag{3}$$
$$y^{*}(x)-2s(x)\geq 0. \tag{4}$$
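The factor of 2 in Equation 4 can be checked numerically. The short sketch below uses the standard library's `statistics.NormalDist` to evaluate $\phi^{-1}(\eta)$ and to test the soft bound at a single location; the helper names `quantile_factor` and `satisfies_soft_bound` are hypothetical and serve only as an illustration.

```python
from statistics import NormalDist

def quantile_factor(eta):
    """Return phi^{-1}(eta), the eta-quantile of the standard normal distribution."""
    return NormalDist().inv_cdf(eta)

def satisfies_soft_bound(post_mean, post_std, eta=0.022):
    """Check y*(x) + phi^{-1}(eta) * s(x) >= 0 at one location."""
    return post_mean + quantile_factor(eta) * post_std >= 0.0

print(quantile_factor(0.022))            # close to -2
print(satisfies_soft_bound(3.0, 1.0))    # mean is three standard deviations above zero
print(satisfies_soft_bound(1.0, 1.0))    # mean is only one standard deviation above zero
```

With $\eta=2.2\%$ the quantile is slightly below $-2$, so requiring the posterior mean to lie at least two posterior standard deviations above zero is a close approximation to the probabilistic constraint.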

In this particular form of the optimization problem, we encounter the functional constraint in Equation 4. It can be prohibitive or impossible to satisfy this constraint at all points across the entire domain. Therefore, we adopt a strategy where we enforce Equation 4 only on a selected set of $m$ constraint points denoted as $\mathbf{X}_{c}=\{x_{c}^{(i)}\}_{i=1}^{m}$. We reformulate the optimization problem as

$$\text{argmin}_{\theta}\quad -\log[p(\mathbf{Y}|\mathbf{X},\theta)]\quad\text{such that} \tag{5}$$
$$y^{*}(x_{c}^{(i)})-2s(x_{c}^{(i)})\geq 0\quad\text{for all}\quad i=1,2,\dots,m, \tag{6}$$

where we estimate hyperparameters to enforce bounds. Solving this optimization problem can be very challenging, and hence additional regularization terms are added in [20]. Rather than directly solving the optimization problem, we adopt the soft-QHMC method, which introduces inequality constraints with a high probability (e.g., 95%) by selecting a specific set of constraint points in the domain. We then enforce non-negativity on the posterior Gaussian process at these selected points. We minimize the negative marginal log-likelihood in Equation 1 using the QHMC algorithm. Leveraging Bayesian estimation [8], we can approximate the posterior distribution by the log-likelihood function and the prior probability distribution as follows:

$$p(\mathbf{X},\theta|\mathbf{Y})\propto p(\mathbf{X},\theta,\mathbf{Y})=p(\theta)p(\mathbf{X}|\theta)p(\mathbf{Y}|\mathbf{X},\theta).$$

The QHMC training flow starts with this Bayesian learning and proceeds with an MCMC procedure for drawing samples generated by the Bayesian framework. A general sampling procedure at step $t$ is given as

$$X^{(t+1)}\sim\pi(X|\theta)=p(X|\theta^{(t)},Y),$$
$$\theta^{(t+1)}\sim\pi(\theta|X)=p(\theta|X^{(t+1)},Y).$$

Input: Initial point $x_{0}$, step size $\epsilon$, number of simulation steps $L$, mass distribution parameters $\mu_{m}$ and $\sigma_{m}$.

  for $t=1,2,\dots$ do
     Resample $M_{t}\sim P_{M}(M)$
     Resample $q_{t}\sim N(0,M_{t})$
     $(x_{0},q_{0})=(x^{(t)},q^{(t)})$
     $q_{0}\leftarrow q_{0}-\frac{\epsilon}{2}\nabla U(x_{0})$
     for $i=1,2,\dots,L-1$ do
        $x_{i}\leftarrow x_{i-1}+\epsilon M_{t}^{-1}q_{i-1}$
        $q_{i}\leftarrow q_{i-1}-\epsilon\nabla U(x_{i})$
     end for
     $x_{L}\leftarrow x_{L-1}+\epsilon M_{t}^{-1}q_{L-1}$
     $q_{L}\leftarrow q_{L-1}-\frac{\epsilon}{2}\nabla U(x_{L})$
     $(\hat{x},\hat{q})=(x_{L},q_{L})$

     MH step: $u\sim$ Uniform$[0,1]$;
     $\rho=e^{-H(\hat{x},\hat{q})+H(x^{(t)},q^{(t)})}$;
     if $u<\min(1,\rho)$ then
        $(x^{(t+1)},q^{(t+1)})=(\hat{x},\hat{q})$
     else
        $(x^{(t+1)},q^{(t+1)})=(x^{(t)},q^{(t)})$
     end if
  end for
Output:
$\{x^{(1)},x^{(2)},\dots\}$
Algorithm 1 QHMC Training for GP with Inequality Constraints
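The steps of Algorithm 1 can be sketched in a few lines of plain Python. This is a minimal illustration under stated assumptions, not the authors' implementation: the mass is a scalar with $\log m\sim\mathcal{N}(\mu_m,\sigma_m^2)$ and $M=mI$, the integrator is the standard leapfrog scheme (half momentum step, alternating full steps, final half momentum step), and the function name `qhmc`, its default parameter values and the toy Gaussian target in the usage lines are hypothetical.

```python
import math
import random

def qhmc(U, grad_U, x0, n_samples, eps=0.2, L=10, mu_m=0.0, sigma_m=0.5, seed=0):
    """Draw samples from p(x) proportional to exp(-U(x)) with a resampled scalar mass."""
    rng = random.Random(seed)
    x = list(x0)
    samples = []
    for _ in range(n_samples):
        m = math.exp(rng.gauss(mu_m, sigma_m))            # M_t = m * I
        q = [rng.gauss(0.0, math.sqrt(m)) for _ in x]     # q_t ~ N(0, M_t)
        h_old = U(x) + 0.5 * sum(v * v for v in q) / m    # H(x^(t), q^(t))
        x_new = list(x)
        g = grad_U(x_new)
        q_new = [qi - 0.5 * eps * gi for qi, gi in zip(q, g)]   # initial half step
        for step in range(L):
            x_new = [xi + eps * qi / m for xi, qi in zip(x_new, q_new)]
            g = grad_U(x_new)
            scale = eps if step < L - 1 else 0.5 * eps    # last update is a half step
            q_new = [qi - scale * gi for qi, gi in zip(q_new, g)]
        h_new = U(x_new) + 0.5 * sum(v * v for v in q_new) / m
        # Metropolis-Hastings step: accept with probability min(1, exp(h_old - h_new)).
        if rng.random() < math.exp(min(0.0, h_old - h_new)):
            x = x_new
        samples.append(list(x))
    return samples

# Toy usage: a one-dimensional standard normal target, U(x) = x^2 / 2.
draws = qhmc(lambda x: 0.5 * x[0] ** 2, lambda x: [x[0]], [0.0], 4000)
values = [d[0] for d in draws]
mean = sum(values) / len(values)
var = sum((v - mean) ** 2 for v in values) / len(values)
print(f"sample mean {mean:.3f}, sample variance {var:.3f}")
```

In the GP setting, `U` and `grad_U` would be the negative marginal log-likelihood of Equation 1 and its gradient; the toy target is used here only because its mean and variance are known, which makes the sampler easy to check.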

2.3.1 Enforcing Monotonicity Constraints

We enforce the monotonicity constraints on a GP using the likelihood of derivative observations. We select active constraints and enforce the non-negativity on the partial derivative, i.e.

fxi(𝐱𝐢)0,𝑓subscript𝑥𝑖subscript𝐱𝐢0\displaystyle\frac{\partial f}{\partial x_{i}}(\mathbf{x_{i}})\geq 0,divide start_ARG ∂ italic_f end_ARG start_ARG ∂ italic_x start_POSTSUBSCRIPT italic_i end_POSTSUBSCRIPT end_ARG ( bold_x start_POSTSUBSCRIPT bold_i end_POSTSUBSCRIPT ) ≥ 0 , (7)

where f𝑓fitalic_f is a vector of N𝑁Nitalic_N latent values. In our soft-constrained method, we introduce the non-negativity information in 7 on a set of selected points, and apply the same procedure as in 5. Since the derivative is also a GP with with mean and covariance matrix [23]:

μ(x)=𝔼[Y(x)xi],andK(x,x)=xixiK(x,x),formulae-sequence𝜇superscript𝑥𝔼delimited-[]𝑌𝑥subscript𝑥𝑖and𝐾𝑥superscript𝑥subscript𝑥𝑖subscriptsuperscript𝑥𝑖𝐾𝑥superscript𝑥\mu(x^{\prime})=\mathbb{E}\left[\frac{\partial Y(x)}{\partial x_{i}}\right],% \quad\text{and}\quad K(x,x^{\prime})=\frac{\partial}{\partial x_{i}}\frac{% \partial}{\partial x^{\prime}_{i}}K(x,x^{\prime}),italic_μ ( italic_x start_POSTSUPERSCRIPT ′ end_POSTSUPERSCRIPT ) = blackboard_E [ divide start_ARG ∂ italic_Y ( italic_x ) end_ARG start_ARG ∂ italic_x start_POSTSUBSCRIPT italic_i end_POSTSUBSCRIPT end_ARG ] , and italic_K ( italic_x , italic_x start_POSTSUPERSCRIPT ′ end_POSTSUPERSCRIPT ) = divide start_ARG ∂ end_ARG start_ARG ∂ italic_x start_POSTSUBSCRIPT italic_i end_POSTSUBSCRIPT end_ARG divide start_ARG ∂ end_ARG start_ARG ∂ italic_x start_POSTSUPERSCRIPT ′ end_POSTSUPERSCRIPT start_POSTSUBSCRIPT italic_i end_POSTSUBSCRIPT end_ARG italic_K ( italic_x , italic_x start_POSTSUPERSCRIPT ′ end_POSTSUPERSCRIPT ) ,

we can state the new posterior distribution as

$$\begin{aligned}
p(\mathbf{y}^{*},\theta|\mathbf{y},\mathbf{x},\mathbf{x}^{*})&=\int p(\mathbf{y}^{*},\theta|f^{*})\,p(f^{*}|\mathbf{y},\mathbf{x},\mathbf{x}^{*})\,df^{*},\\
p(f^{*}|\mathbf{y},\mathbf{x},\mathbf{x}^{*})&=\int\int p(f^{*}|\mathbf{x}^{*},f,f^{\prime})\,p(f,f^{\prime}|\mathbf{x},\mathbf{y})\,df\,df^{\prime},
\end{aligned}$$

where $\mathbf{y}^{*}$ and $\mathbf{f}^{*}$ denote the predictions at location $\mathbf{x}^{*}$.
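To make the derivative covariance concrete, the following minimal sketch assumes a one-dimensional squared-exponential kernel. The kernel choice, the names `rbf`, `rbf_dxdx` and `mixed_fd`, and the hyperparameter values are assumptions made for illustration and are not taken from the text. The sketch compares the closed-form mixed derivative of the kernel with a central finite-difference estimate.

```python
import math


def rbf(x, xp, sigma=1.0, ell=0.5):
    """Squared-exponential kernel K(x, x') in one dimension (assumed kernel)."""
    return sigma**2 * math.exp(-((x - xp) ** 2) / (2.0 * ell**2))


def rbf_dxdx(x, xp, sigma=1.0, ell=0.5):
    """Closed-form mixed derivative d^2 K / (dx dx') of the kernel above."""
    r = x - xp
    return (sigma**2 / ell**2) * (1.0 - r**2 / ell**2) * math.exp(-(r**2) / (2.0 * ell**2))


def mixed_fd(k, x, xp, h=1e-4):
    """Central finite-difference estimate of d^2 k / (dx dx')."""
    return (k(x + h, xp + h) - k(x + h, xp - h)
            - k(x - h, xp + h) + k(x - h, xp - h)) / (4.0 * h * h)


# Compare the analytic derivative covariance with the numerical estimate.
for x, xp in [(0.0, 0.0), (0.3, 0.1), (1.0, 0.2)]:
    print(x, xp, rbf_dxdx(x, xp), mixed_fd(rbf, x, xp))
```

The two columns should agree closely, which is a quick sanity check before a derivative kernel of this kind is used in a monotonicity-constrained model.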

3 Theoretical analysis of the method

In this section, using Bayes' theorem, we first show that QHMC can generate a steady-state distribution that approaches the true posterior distribution. Then, we present the convergence properties of the probabilistic approach for the optimization problem of the form (6).

3.1 Convergence of QHMC training

The study presented in [15] demonstrates that the QHMC framework can effectively capture a correct steady-state distribution that describes the desired posterior distribution $p(x)\propto\exp(-U(x))$ via Bayes' rule. The joint probability density of $(x,q,M)$ can be calculated by Bayes' theorem:

$$p(x,q,M)=p(x,q|M)P_{M}(M),$$

where the conditional distribution is approximated as follows:

$$p(x,q|M)\propto\exp\left(-U(x)-K(q)\right)=\exp(-U(x))\exp\left(-\frac{1}{2}q^{T}M^{-1}q\right).$$

Then, marginalizing over $q$ and $M$, we obtain

$$p(x)=\int_{q}\int_{M}dq\,dM\,p(x,q,M)\propto\exp(-U(x)),$$

which shows that the marginal steady distribution approaches the true posterior distribution [15].
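The marginalization argument can be illustrated with a minimal one-dimensional sketch. It assumes a scalar mass redrawn for every trajectory with $\log_{10} m$ normally distributed, a leapfrog integrator, and a Metropolis correction as in standard HMC. The function name `qhmc_sample`, the mass parametrization, and all parameter values are assumptions for illustration, not the authors' implementation.

```python
import math
import random


def qhmc_sample(U, grad_U, x0, n_samples, eps=0.2, n_leapfrog=20,
                mu_m=0.0, sigma_m=0.5, seed=0):
    """1D QHMC-style sampler: the mass is redrawn for every trajectory.

    Assumption: log10(m) ~ N(mu_m, sigma_m^2), a scalar stand-in for the
    random mass matrix M with distribution P_M(M).
    """
    rng = random.Random(seed)
    x, samples = x0, []
    for _ in range(n_samples):
        m = 10.0 ** rng.gauss(mu_m, sigma_m)   # random mass, M ~ P_M
        q = rng.gauss(0.0, math.sqrt(m))       # momentum, q | M ~ N(0, M)
        x_new, q_new = x, q
        # Leapfrog integration of the Hamiltonian dynamics with mass m.
        q_new -= 0.5 * eps * grad_U(x_new)
        for step in range(n_leapfrog):
            x_new += eps * q_new / m
            if step < n_leapfrog - 1:
                q_new -= eps * grad_U(x_new)
        q_new -= 0.5 * eps * grad_U(x_new)
        # Metropolis correction with H = U(x) + q^2 / (2 m).
        h_old = U(x) + 0.5 * q * q / m
        h_new = U(x_new) + 0.5 * q_new * q_new / m
        if rng.random() < math.exp(min(0.0, h_old - h_new)):
            x = x_new
        samples.append(x)
    return samples


# Target: standard normal, i.e. U(x) = x^2 / 2, so p(x) is proportional to exp(-U(x)).
draws = qhmc_sample(lambda x: 0.5 * x * x, lambda x: x, x0=0.0, n_samples=2000)
mean = sum(draws) / len(draws)
var = sum((d - mean) ** 2 for d in draws) / len(draws)
print(mean, var)
```

For this target the sample mean and variance should be close to 0 and 1, consistent with the marginal in $x$ being independent of how the mass is drawn.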

3.2 Convergence properties of probabilistic approach

In this section, we show that satisfying the constraints on a set of locations $x$ in the domain $\mathcal{D}$ will preserve convergence. Recall that we solve the following inequality-constrained optimization problem:

$$\begin{aligned}
\operatorname*{arg\,min}_{\theta}\quad&-\log[p(\mathbf{Y}|\mathbf{X},\theta)]\quad\text{such that}\\
&y^{*}(x_{c}^{(i)})-2s(x_{c}^{(i)})\geq 0\quad\text{for all}\quad i=1,2,\dots,m.
\end{aligned}$$

Now, we need to show that the result obtained by using a selected set of input locations converges to the value of the regression model's output. This convergence ensures that the probabilistic approach will eventually result in a model that satisfies the desired conditions.
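As a small illustration of the constraint in the optimization problem above, the sketch below interprets $s$ as the posterior standard deviation and assumes a Gaussian predictive distribution at each constraint point. Under that assumption, $y^{*}-2s\geq 0$ bounds the probability of a negative value by $\Phi(-2)\approx 2.3\%$. The function names and the numerical values are hypothetical.

```python
import math


def soft_constraint_ok(mean, std, k=2.0):
    """True when y*(x_c) - k * s(x_c) >= 0 at a constraint point."""
    return mean - k * std >= 0.0


def violation_probability(mean, std):
    """P(Y < 0) for a Gaussian prediction N(mean, std^2)."""
    return 0.5 * (1.0 + math.erf(-mean / (std * math.sqrt(2.0))))


# Hypothetical posterior means and standard deviations at three constraint points.
predictions = [(1.0, 0.4), (0.5, 0.3), (0.2, 0.05)]
for mean, std in predictions:
    print(mean, std, soft_constraint_ok(mean, std), violation_probability(mean, std))
```

Points at which `soft_constraint_ok` returns `False` are the ones where the constraint would still be active during training.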

We assume that $\mathcal{D}$ is finite throughout the proof. The proof can be constructed for both the case in which the domain is countable and the case in which it is uncountable.

(i) Assume that the domain $\mathcal{D}$ is a countable set containing $N$ elements. We select a subset $\mathcal{D}_{m}\subset\mathcal{D}$ with $m$ points, where $x_{c}^{(1)},x_{c}^{(2)},\dots,x_{c}^{(m)}\in\mathcal{D}_{m}$. Each point $x\in\mathcal{D}$ has an associated Gaussian probability distribution, and we define the set of distributions of $x\in\mathcal{D}$ as $\mathcal{P}$. For the constraint points $x\in\mathcal{D}_{m}$, we have $m$ constraints and their corresponding probability distributions, which we define as $\mathcal{P}_{m}$. Additionally, we introduce a set $H(x)$ such that

$$H(x):=\{\theta\,|\,p(\mathbf{Y}|\mathbf{X},\theta)<0\},$$

which covers the locations where the non-negativity constraint is violated. For each fixed $x\in\mathcal{D}$, we define

$$\begin{aligned}
v(x)&:=\inf_{P\in\mathcal{P}}P(\mathbf{Y}|\mathbf{X},\theta)<0\equiv\inf_{P\in\mathcal{P}}P(H(x)),\quad\text{and}\\
v_{m}(x)&:=\inf_{P\in\mathcal{P}_{m}}P(\mathbf{Y}|\mathbf{X},\theta)<0\equiv\inf_{P\in\mathcal{P}_{m}}P(H(x)).
\end{aligned}$$

(ii) Assume that the domain $\mathcal{D}$ is a finite but uncountable set. In this case, we can construct a countable subset $\tilde{\mathcal{D}}$ such that $x\in\tilde{\mathcal{D}}$. We define the set of probability distributions as in case (i). Since $\mathcal{D}$ is finite, the set $\mathcal{D}\cup\{x\}$ is also finite. Consequently, the set $H(x)$ and the quantities $v(x)$ and $v_{m}(x)$ can be constructed as in the first case. Now, we will establish the convergence of $v_{m}$ to $v$ as $\mathcal{P}_{m}$ converges to $\mathcal{P}$.

First, let us provide the distance metrics used throughout the proof. Following the definitions in [9], let

$$d(x,A):=\inf_{x^{\prime}\in A}\|x-x^{\prime}\|$$

denote the distance from a point $x$ to a set $A$. Using this, we can define the distance between two compact sets $A$ and $B$ as

$$\mathbb{D}(A,B):=\sup_{x\in A}d(x,B).$$

Then, the Hausdorff distance between $A$ and $B$ is defined as $\mathbb{H}(A,B):=\max\{\mathbb{D}(A,B),\mathbb{D}(B,A)\}$. Finally, we define a pseudo-metric $\mathbf{d}$ to describe the distance between two probability distributions $P$ and $\tilde{P}$ as

$$\mathbf{d}(P,\tilde{P}):=\sup_{x\in\mathcal{D}}|P(H(x))-\tilde{P}(H(x))|,$$

where $\mathcal{D}$ is the domain specified in Section 3.2.
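The distance definitions above can be sketched for finite sets of real numbers, in which case the infimum and supremum reduce to a minimum and a maximum. The function names and the example sets are illustrative assumptions.

```python
def point_to_set(x, A):
    """d(x, A): smallest distance from the point x to the finite set A."""
    return min(abs(x - a) for a in A)


def directed_distance(A, B):
    """D(A, B): largest point-to-set distance d(x, B) over x in A."""
    return max(point_to_set(x, B) for x in A)


def hausdorff(A, B):
    """H(A, B) = max{D(A, B), D(B, A)}."""
    return max(directed_distance(A, B), directed_distance(B, A))


A = [0.0, 0.5, 1.0]
B = [0.0, 1.0]
print(directed_distance(A, B), directed_distance(B, A), hausdorff(A, B))
```

The example shows that $\mathbb{D}$ is not symmetric, which is why the Hausdorff distance takes the maximum of the two directed distances.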

Assumption 1.

We assume that the probability distributions $\mathcal{P}$ and $\mathcal{P}_{m}$ satisfy the following conditions:

  1. There exists a weakly compact set $\tilde{\mathcal{P}}$ such that $\mathcal{P}\subset\tilde{\mathcal{P}}$ and $\mathcal{P}_{m}\subset\tilde{\mathcal{P}}$.

  2. $\lim_{m\rightarrow N}\mathbf{d}(\mathcal{P},\mathcal{P}_{m})=0$ with probability 1.

  3. $\lim_{m\rightarrow N}\mathbf{d}(\mathcal{P}_{m},\mathcal{P})=0$ with probability 1.

Now, we show that Theorem 1 holds under the conditions in Assumption 1. Recall that we have

$$\mathbb{H}(\text{conv}V,\text{conv}V_{m})=\max\left\{\left|\sup_{P\in\mathcal{P}_{m}}P(H(x))-\sup_{P\in\mathcal{P}}P(H(x))\right|,\left|\inf_{P\in\mathcal{P}_{m}}P(H(x))-\inf_{P\in\mathcal{P}}P(H(x))\right|\right\}.$$

Based on the definition and property of the Hausdorff distance [10], we also have

$$\mathbb{H}(\text{conv}V,\text{conv}V_{m})\leq\mathbb{H}(V,V_{m})\leq\max\{\mathbb{D}(V,V_{m}),\mathbb{D}(V_{m},V)\}.$$

Here we have

$$\begin{aligned}
\mathbb{D}(V,V_{m})&=\sup_{v\in V}\inf_{v^{\prime}\in V_{m}}\|v-v^{\prime}\|\\
&=\sup_{P\in\mathcal{P}}\inf_{\tilde{P}\in\mathcal{P}_{m}}\|P(H(x))-\tilde{P}(H(x))\|\\
&\leq\sup_{P\in\mathcal{P}}\inf_{\tilde{P}\in\mathcal{P}_{m}}\sup_{x\in\mathcal{D}}\|P(H(x))-\tilde{P}(H(x))\|\\
&=\mathbf{d}(\mathcal{P},\mathcal{P}_{m}).
\end{aligned}$$

Applying the same procedure, we also obtain $\mathbb{D}(V_{m},V)\leq\mathbf{d}(\mathcal{P}_{m},\mathcal{P})$. Hence,

$$\mathbb{H}(\text{conv}V,\text{conv}V_{m})\leq\mathbb{H}(V,V_{m})\leq\mathbb{H}(\mathcal{P}_{m},\mathcal{P}).$$

Consequently, we obtain

$$\begin{aligned}
|v_{m}(x)-v(x)|&\leq\left|\inf_{P\in\mathcal{P}_{m}}P(H(x))-\inf_{P\in\mathcal{P}}P(H(x))\right|\\
&\leq\mathbb{H}(\text{conv}V,\text{conv}V_{m})\\
&\leq\mathbb{H}(\mathcal{P}_{m},\mathcal{P}).
\end{aligned}$$

Theorem 1.

$v_{m}$ converges to $v$ as $\mathcal{P}_{m}$ converges to $\mathcal{P}$, that is,

$$\lim_{m\rightarrow N}\sup_{x\in\mathcal{D}}|v_{m}(x)-v(x)|=0.$$

Proof.

Let us assume that $x\in\mathcal{D}$ is fixed, and define

$$V:=\{P(H(x)):P\in\text{cl}\,\mathcal{P}\},\quad\text{and}\quad V_{m}:=\{P(H(x)):P\in\text{cl}\,\mathcal{P}_{m}\},$$

where cl represents the closure. Note that both $V$ and $V_{m}$ are bounded subsets in $\mathbb{R}^{d}$. Let us define $a$, $b$, $a_{m}$ and $b_{m}$ such that

$$a:=\inf_{v\in V}v,\quad b:=\sup_{v\in V}v,\quad a_{m}:=\inf_{v\in V_{m}}v,\quad b_{m}:=\sup_{v\in V_{m}}v.$$

The Hausdorff distance between the convex hulls (conv) of the sets $V$ and $V_{m}$ is calculated as [10]

$$\mathbb{H}(\text{conv}V,\text{conv}V_{m})=\max\{|b_{m}-b|,|a-a_{m}|\}.$$

Since we know that

$$b_{m}-b=\sup_{v\in V_{m}}v-\sup_{v\in V}v,\quad\text{and}\quad a_{m}-a=\inf_{v\in V_{m}}v-\inf_{v\in V}v,$$

we have

$$\mathbb{H}(\text{conv}V,\text{conv}V_{m})=\max\left\{\left|\sup_{P\in\mathcal{P}_{m}}P(H(x))-\sup_{P\in\mathcal{P}}P(H(x))\right|,\left|\inf_{P\in\mathcal{P}_{m}}P(H(x))-\inf_{P\in\mathcal{P}}P(H(x))\right|\right\}.$$

Based on the definition and property of the Hausdorff distance [10], we have

$$\mathbb{H}(\text{conv}V,\text{conv}V_{m})\leq\mathbb{H}(V,V_{m}),$$

resulting in [9]

$$|v_{m}(x)-v(x)|\leq\mathbb{H}(V,V_{m})\leq\mathbb{H}(\mathcal{P},\mathcal{P}_{m}).$$

In this setting, $x$ can be any point in $\mathcal{D}$, and the right-hand side of the inequality is independent of $x$. We can complete the proof by taking the supremum of each side with respect to $x$ [9]. ∎
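As a small worked illustration of the Hausdorff inequality used in the proof, consider the scalar sets $V=\{0.1,0.5,0.9\}$ and $V_{m}=\{0.1,0.8\}$. These values are hypothetical and chosen only for arithmetic convenience.

```latex
\begin{aligned}
a &= 0.1,\quad b = 0.9,\quad a_{m} = 0.1,\quad b_{m} = 0.8,\\
\mathbb{H}(\text{conv}V,\text{conv}V_{m}) &= \max\{|b_{m}-b|,\,|a-a_{m}|\} = \max\{0.1,\,0\} = 0.1,\\
\mathbb{D}(V,V_{m}) &= \max\{0,\,0.3,\,0.1\} = 0.3,\qquad
\mathbb{D}(V_{m},V) = \max\{0,\,0.1\} = 0.1,\\
\mathbb{H}(V,V_{m}) &= \max\{0.3,\,0.1\} = 0.3 \;\geq\; \mathbb{H}(\text{conv}V,\text{conv}V_{m}),\\
|a_{m}-a| &= 0 \;\leq\; \mathbb{H}(\text{conv}V,\text{conv}V_{m}).
\end{aligned}
```

Here the difference of the infima is bounded by the Hausdorff distance between the convex hulls, which is in turn bounded by the Hausdorff distance between the sets themselves, matching the chain of inequalities in the proof.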

4 Numerical examples

In this section, we evaluate the performance of the proposed algorithms on various examples including synthetic and real data. We conduct experiments on datasets of different sizes and dimensions. We introduce several versions of the QHMC algorithm, depending on the selection of constraint point locations and on the probabilistic approach.

Rather than randomly locating $m$ constraint points, we start with an empty constraint set and determine the locations of the constraint points one by one adaptively. Throughout this process, we employ various strategies for adding the constraints. The specific approaches are outlined as follows:

  1. Constraint-adaptive approach: This approach examines whether the constraint is satisfied at a location. The function value is calculated, and if the constraint is violated at that location, then a constraint point is added.

  2. Variance-adaptive approach: We calculate the prediction variance on the test set and select constraint points at the locations where we observe the largest variance values. The goal is to reduce the variance in the predictions and increase stability.

  3. Combination of constraint and variance adaptation: In this approach, we determine a threshold value (e.g. $0.20$) for the variance, and the algorithm places constraint points at the locations where the largest prediction variance is observed. Once the variance drops to the threshold value, the algorithm switches to the first strategy, in which it places constraint points where a violation occurs.

We represent the constraint-adaptive, hard-constrained approach as QHMCad and its soft-constrained counterpart as QHMCsoftad. Similarly, QHMCvar refers to the method focusing on variance, while QHMCsoftvar corresponds to its soft-constrained version. We denote the combination of the two approaches with hard and soft constraints by QHMCboth and QHMCsoftboth, respectively. For the sake of comparison, we implement the truncated Gaussian algorithms using an HMC sampler (tnHMC) and a QHMC sampler (tnQHMC) for inequality-constrained examples, while we implement additive GP (additiveGP) algorithm for monotonicity-constrained examples.
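The three selection strategies can be sketched as a single function that proposes the next constraint location. This is a minimal sketch under stated assumptions: `predict` is a hypothetical callable returning the mean and variance of the current GP at a location, non-negativity is the assumed constraint, and choosing the most violated location first is an illustrative choice not specified in the text.

```python
def next_constraint_point(candidates, predict, strategy, var_threshold=0.20):
    """Return the next constraint location from `candidates`, or None.

    `predict(x)` is a hypothetical callable returning (mean, variance) of the
    current GP at x; non-negativity of the mean is the assumed constraint.
    """
    stats = [(x, *predict(x)) for x in candidates]
    if not stats:
        return None
    if strategy == "both":
        # Variance-driven until the largest variance drops to the threshold,
        # then constraint-driven.
        largest_var = max(var for _, _, var in stats)
        strategy = "variance" if largest_var > var_threshold else "constraint"
    if strategy == "variance":
        return max(stats, key=lambda s: s[2])[0]
    if strategy == "constraint":
        violating = [s for s in stats if s[1] < 0.0]
        # Assumption: among violating locations, take the most negative mean.
        return min(violating, key=lambda s: s[1])[0] if violating else None
    raise ValueError(f"unknown strategy: {strategy}")


# Toy predictions standing in for a trained GP (illustrative values only).
toy = {0.0: (0.5, 0.30), 0.5: (-0.2, 0.10), 1.0: (-0.1, 0.05)}
for name in ("constraint", "variance", "both"):
    print(name, next_constraint_point(list(toy), toy.get, name))
```

In an adaptive loop, the returned location would be appended to the constraint set and the model retrained before the next call.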

For the synthetic examples, we evaluate the time and accuracy performance of the algorithms while simultaneously changing the dataset size and the noise level in the data. Following [20], as our metric, we calculate the relative $l_{2}$ error between the posterior mean $y^{*}$ and the true value of the target function $f(x)$ on a set of test points $\mathbf{X}_{t}=\{x_{T}^{(i)}\}_{i=1}^{N_{t}}$:

$$E=\sqrt{\frac{\sum_{i=1}^{N_{t}}[y^{*}(x_{T}^{(i)})-f(x_{T}^{(i)})]^{2}}{\sum_{i=1}^{N_{t}}f(x_{T}^{(i)})^{2}}}.$$
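The error metric above translates directly into a few lines of code. The function name `relative_l2_error` and the sample values are illustrative assumptions.

```python
import math


def relative_l2_error(y_pred, f_true):
    """Relative l2 error between predictions and true values at the test points."""
    num = sum((yp - ft) ** 2 for yp, ft in zip(y_pred, f_true))
    den = sum(ft ** 2 for ft in f_true)
    return math.sqrt(num / den)


# Hypothetical posterior means and true function values at two test points.
print(relative_l2_error([1.1, 1.9], [1.0, 2.0]))
```

Because the error is normalized by the magnitude of the true function values, it is comparable across examples with different output scales.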

Additionally, in order to highlight the advantage of QHMC over HMC, we implement our approach using the standard HMC sampling procedure. The relative error, posterior variance and execution time of each version of QHMC and HMC algorithms are presented.

4.1 Inequality Constraints

In this section, we provide two synthetic examples and two real-life application examples to demonstrate the effectiveness of our algorithms under inequality constraints. In the synthetic examples, we compare the performance of our approach with truncated Gaussian methods on a 2-dimensional and a 10-dimensional problem. For the 2-dimensional example, our primary focus is on enforcing non-negativity constraints within the GP model. In the 10-dimensional example, we generalize our analysis to satisfy a different inequality constraint. We evaluate the performance of the truncated Gaussian, QHMC and soft-QHMC methods. In the third example, we consider conservative transport in a steady-state velocity field in heterogeneous porous media. Despite being a two-dimensional problem, the non-homogeneous structure of the solute concentration introduces complexity and increases the level of difficulty. The last example is a 3-dimensional heat transfer problem in a hollow sphere.

4.1.1 Example 1

We consider the following 2D function under non-negativity constraints:

$$f(x)=\arctan(5x_{1})+\arctan(x_{2}),$$

where $(x_{1},x_{2})\in[0,1]^{2}$. We train our GP model via QHMC over $20$ randomly selected locations.
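The setup of this example can be sketched as follows. The uniform sampling of the training locations, the additive Gaussian noise model, and the names `f_example1` and `sample_training_set` are assumptions for illustration; the text only specifies the function, the domain, and the number of locations.

```python
import math
import random


def f_example1(x1, x2):
    """Target function of Example 1 on [0, 1]^2."""
    return math.atan(5.0 * x1) + math.atan(x2)


def sample_training_set(n=20, noise_std=0.0, seed=0):
    """Draw n uniform locations in [0, 1]^2 and (optionally noisy) observations."""
    rng = random.Random(seed)
    X = [(rng.random(), rng.random()) for _ in range(n)]
    y = [f_example1(a, b) + rng.gauss(0.0, noise_std) for a, b in X]
    return X, y


X_train, y_train = sample_training_set()
print(len(X_train), min(y_train), max(y_train))
```

On this domain the noise-free function is non-negative, so any negative prediction of the GP is a constraint violation that the adaptive strategies can target.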

Figure 1 presents the relative error values of the algorithms with respect to two parameters: the size of the dataset and the signal-to-noise ratio (SNR). It can be seen that the most accurate results without adding any noise are provided by the QHMCboth and tnQHMC algorithms, with around 10% relative error. However, upon introducing noise to the data and increasing its magnitude, we observe a distinct pattern. The QHMC methods exhibit relative error values of approximately 15% within the SNR range of 15% to 20%. In contrast, the relative error of the truncated Gaussian methods increases to 25% within the same noise range. This pattern demonstrates that the QHMC methods can tolerate noise and maintain higher accuracy under these conditions.

Table 1 compares the QHMC and HMC algorithms for a dataset size of 200. The relative error values indicate that QHMC yields approximately 20% more accurate results than HMC, and it does so with a shorter processing time. Consequently, QHMC demonstrates both higher accuracy and efficiency compared to HMC.

Figure 1: Relative error of the algorithms with different data sizes and signal to noise ratios (SNR) for Example 1 (2D), inequality.
Table 1: Comparison of QHMC and HMC on 2D, inequality.

| Method | Error | Posterior Var | Time | Method | Error | Posterior Var | Time |
|---|---|---|---|---|---|---|---|
| QHMC-ad | 0.10 | 0.14 | 46s | HMC-ad | 0.12 | 0.17 | 52s |
| QHMC-soft-ad | 0.11 | 0.16 | 39s | HMC-soft-ad | 0.13 | 0.19 | 48s |
| QHMC-var | 0.11 | 0.12 | 40s | HMC-var | 0.13 | 0.14 | 46s |
| QHMC-soft-var | 0.12 | 0.15 | 34s | HMC-soft-var | 0.15 | 0.14 | 42s |
| QHMC-both | 0.08 | 0.13 | 48s | HMC-both | 0.10 | 0.14 | 53s |
| QHMC-soft-both | 0.09 | 0.13 | 39s | HMC-soft-both | 0.12 | 0.15 | 44s |

Further, we compare the time performance of the algorithms in Figure 2, which demonstrates that the QHMC methods, especially the probabilistic QHMC approaches, can run much faster than the truncated Gaussian methods. In this simple 2D example, the presence of noise does not significantly impact the running times of the QHMC algorithms. In contrast, the truncated Gaussian algorithms are slower in noisy settings even when the dataset is small. We can also observe in Figure 3 that the QHMC algorithms, especially QHMCvar and QHMCboth, are the most robust, as their small relative error comes with a small posterior variance. In contrast, the posterior variances of the truncated Gaussian methods are higher than those of the QHMC methods even when there is no noise, and they grow along with the relative error (see Figure 1) as the SNR level increases. Combining all of these experiments, we can conclude that the QHMC methods achieve higher accuracy within a shorter time frame. Consequently, these methods prove to be more efficient and robust, as they can effectively tolerate changes in parameters. Additionally, we observed a slight improvement in the performance of the truncated Gaussian algorithms when implementing tnQHMC. Based on the numerical results, tnQHMC not only yields higher accuracy but also saves some computational time compared to tnHMC.

Figure 2: Execution times (in seconds) of the algorithms with different signal to noise ratios (SNR) and data sizes for Example 1 (2D), inequality.
Figure 3: Posterior variances of the algorithms with different signal to noise ratios (SNR) and data sizes for Example 1 (2D), inequality.

4.1.2 Example 2

Next, we consider the 10D Ackley function [7] defined as follows:

\[
f(x) = -a\exp\left(-b\sqrt{\frac{1}{d}\sum_{i=1}^{d}x_i^2}\right) - \exp\left(\frac{1}{d}\sum_{i=1}^{d}\cos(c x_i)\right) + a + \exp(1),
\]

where $d=10$, $a=20$, $b=0.2$ and $c=2\pi$. We study the performance of the algorithms on the domain $[-10,10]^{10}$ while enforcing the function to be greater than $5$.
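As a minimal sketch of this test problem, the snippet below evaluates the Ackley function in its standard form (second exponential $\exp(\frac{1}{d}\sum_i \cos(c x_i))$) with the parameters stated above, and estimates how often the true function itself falls at or below the bound of $5$ on uniform samples from $[-10,10]^{10}$. The function name `ackley`, the sample size and the random seed are illustrative assumptions; this is only the target function and a bound check, not the constrained GP.

```python
import numpy as np

def ackley(x, a=20.0, b=0.2, c=2.0 * np.pi):
    """Standard Ackley function, evaluated row-wise on an (n, d) array."""
    x = np.atleast_2d(np.asarray(x, dtype=float))
    d = x.shape[1]
    term1 = -a * np.exp(-b * np.sqrt(np.sum(x**2, axis=1) / d))
    term2 = -np.exp(np.sum(np.cos(c * x), axis=1) / d)
    return term1 + term2 + a + np.e

# Illustrative assumptions: 1000 uniform samples, arbitrary seed.
rng = np.random.default_rng(0)
X = rng.uniform(-10.0, 10.0, size=(1000, 10))   # points in [-10, 10]^10
y = ackley(X)

# Fraction of sampled points where the true function does not exceed 5.
violation_rate = np.mean(y <= 5.0)
```

The global minimum of the standard form is $f(0)=0$, so the bound $f>5$ cannot hold near the origin; the check simply indicates how rare such points are under uniform sampling.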

Table 2: Comparison of QHMC and HMC on 10D, inequality.
Method          Error  Posterior Var  Time     |  Method         Error  Posterior Var  Time
QHMC-ad         0.10   0.13           39m 17s  |  HMC-ad         0.12   0.15           43m 33s
QHMC-soft-ad    0.11   0.14           36m 21s  |  HMC-soft-ad    0.13   0.15           41m 10s
QHMC-var        0.11   0.11           37m 4s   |  HMC-var        0.13   0.12           41m 31s
QHMC-soft-var   0.12   0.11           34m 23s  |  HMC-soft-var   0.14   0.12           37m 42s
QHMC-both       0.09   0.12           40m 8s   |  HMC-both       0.10   0.14           44m 23s
QHMC-soft-both  0.10   0.12           37m 53s  |  HMC-soft-both  0.12   0.14           42m 5s

Figure 4 illustrates that QHMCboth, QHMCsoftboth and the truncated Gaussian algorithms yield the lowest error when there is no noise in the data. However, as the noise level increases, the truncated Gaussian methods fall behind all QHMC approaches. Specifically, both the QHMCboth and QHMCsoftboth algorithms tolerate noise levels up to $15\%$ with an associated relative error of approximately $15\%$. The other QHMC variants display greater noise tolerance when dealing with larger datasets. With fewer than $100$ data points, the error rate reaches around $25\%$, but it decreases to between $15\%$ and $20\%$ when the number of data points exceeds $100$.

Figure 4: Relative error of the algorithms with different data sizes and signal to noise ratios (SNR) for Example 2 (10D), inequality.

Figure 5 illustrates the time comparison of the algorithms, where we can observe that the QHMC methods provide time savings of around $30\%$ to $35\%$ for datasets larger than $150$ points. Combining this time advantage with the higher accuracy of QHMC indicates that both the soft- and hard-constrained QHMC algorithms outperform the truncated Gaussian methods across various criteria. The QHMC methods offer the flexibility to choose an algorithm according to the priority of the experiment. For example, if speed is the primary consideration, QHMCsoftvar is the fastest method while maintaining a good level of accuracy. If accuracy is the most important metric, QHMCboth is the better choice, as it still offers significant time savings compared to the other methods.

Figure 6 shows that the posterior variances of the truncated Gaussian methods are significantly higher than those of the QHMC algorithms, especially when the noise level is higher than $5\%$. As expected, the QHMCvar and QHMCsoftvar algorithms offer the lowest variance, followed by QHMCboth and QHMCsoftboth. The figure shows a clear pattern in which the QHMC approaches tolerate higher noise levels, especially when the dataset is large.

Figure 5: Execution times (in minutes) of the algorithms with different signal to noise ratios (SNR) and data sizes for Example 2 (10D), inequality.

It is notable that our method demonstrates a significant increase in efficiency as the dimension increases. Comparing this 10D example to the 2D case, the execution times of the truncated Gaussian methods are notably affected by the dimension, even in the absence of noise in the datasets. Although their relative error can remain low without noise, they take $1.5$ times longer than the QHMC methods to reach that accuracy. Moreover, this observation holds only when the data is noise-free. As soon as noise is present, the accuracy of the truncated Gaussian methods deteriorates, whereas the QHMC methods withstand the noise and yield good results in a shorter time span.

Figure 6: Posterior variances of the algorithms with different signal to noise ratios (SNR) and data sizes for Example 2 (10D), inequality.

4.1.3 Example 3: Solute transport in heterogeneous porous media

Following the example in [27], we examine conservative transport within a constant velocity field in heterogeneous porous media. Let us denote the solute concentration by $C(\mathbf{x},t)$, with $\mathbf{x}=(x,y)^{\text{T}}$, and suppose that measurements of $C(\mathbf{x},t)$ are available at various locations at different times. Conservation laws can be used to describe the processes of flow and transport. Specifically, we can describe the flow using the Darcy flow equation [27]

\[
\begin{cases}
\nabla\cdot(K\nabla h)=0, & \mathbf{x}\in\mathbb{D},\\
\frac{\partial h}{\partial\mathbf{n}}=0, & y=0\text{ or }y=L_2,\\
h=H_1, & x=0,\\
h=H_2, & x=L_1,
\end{cases}
\tag{8}
\]

where $h(\mathbf{x},w)$ is the hydraulic head, $\mathbb{D}=[0,L_1]\times[0,L_2]$ is the simulation domain with $L_1=256$ and $L_2=128$, $H_1$ and $H_2$ are known boundary head values, and $K(\mathbf{x},w)$ is the unknown hydraulic conductivity field. The field is represented as a stochastic process with a log-normal distribution. Specifically, it is expressed as $K(\mathbf{x},w)=\exp Z(\mathbf{x},w)$, where $Z(\mathbf{x},w)$ is a second-order stationary GP with a known exponential covariance function, $\text{Cov}\{Z(\mathbf{x}),Z(\mathbf{x}')\}=\sigma_Z^2\exp(-|\mathbf{x}-\mathbf{x}'|/l_z)$, with variance $\sigma_Z^2=2$ and correlation length $l_z=5$.
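To make the log-normal conductivity model concrete, the following sketch draws one realization of $K=\exp Z$ on a coarse grid by factorizing the exponential covariance matrix with a Cholesky decomposition. The grid resolution, the zero mean of $Z$, the jitter term, the seed and the function name `sample_log_conductivity` are assumptions made for illustration only; the source does not specify how the field was generated.

```python
import numpy as np

def sample_log_conductivity(nx=32, ny=16, L1=256.0, L2=128.0,
                            sigma2=2.0, ell=5.0, seed=0):
    """Draw one realization of K = exp(Z) on a coarse nx-by-ny grid.

    Z is assumed to be a zero-mean GP with covariance
    sigma2 * exp(-|x - x'| / ell), matching the exponential kernel in the text.
    """
    xs = np.linspace(0.0, L1, nx)
    ys = np.linspace(0.0, L2, ny)
    gx, gy = np.meshgrid(xs, ys, indexing="ij")
    pts = np.column_stack([gx.ravel(), gy.ravel()])            # (nx*ny, 2)

    # Pairwise Euclidean distances between all grid points.
    dist = np.linalg.norm(pts[:, None, :] - pts[None, :, :], axis=-1)
    cov = sigma2 * np.exp(-dist / ell)
    cov += 1e-10 * np.eye(cov.shape[0])                        # jitter (assumption)

    chol = np.linalg.cholesky(cov)
    rng = np.random.default_rng(seed)
    z = chol @ rng.standard_normal(cov.shape[0])               # sample of Z
    return np.exp(z).reshape(nx, ny)                           # K = exp(Z) > 0

K = sample_log_conductivity()
```

Because $K$ is the exponential of a Gaussian field, every sampled value is strictly positive, which is the property that makes the log-normal model a natural choice for conductivity.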
We can describe the solute transport by the advection-dispersion equation [6, 14, 27] as

\[
\begin{cases}
\frac{\partial C}{\partial t}+\nabla\cdot(\mathbf{v}C)=\nabla\cdot\left[\left(\frac{D_w}{\tau}+\boldsymbol{\alpha}\|\mathbf{v}\|_2\right)\nabla C\right], & \mathbf{x}\text{ in }\mathbb{D},\\
C=Q\delta(\mathbf{x}-\mathbf{x}^*), & t=0,\\
\frac{\partial C}{\partial\mathbf{n}}=0, & y=0\text{ or }y=L_2\text{ or }x=L_1,\\
C=0, & x=0.
\end{cases}
\tag{9}
\]

In this context, $C(\mathbf{x},t;w)$ represents the solute concentration defined over $\mathbb{D}\times[0,T]\times\Omega$; $\mathbf{v}$ denotes the fluid velocity given by $\mathbf{v}(\mathbf{x};w)=-K(\mathbf{x};w)\nabla h(\mathbf{x};w)/\phi$, with $\phi$ being the porosity; $D_w$ is the diffusion coefficient; $\tau$ stands for the tortuosity; and $\boldsymbol{\alpha}$ is the dispersivity tensor with diagonal components $\alpha_L$ and $\alpha_T$. In this study, the transport parameters are $\phi=0.317$, $\tau=\phi^{1/3}$, $D_w=2.5\times10^{-5}$, $\alpha_L=5$ and $\alpha_T=0.5$. Lastly, the solute is instantaneously injected at $\mathbf{x}^*=(50,64)$ at $t=0$ with intensity $Q=1$ [27]. Figure 7 presents the ground truth together with the observation and constraint locations, to provide insight into the structure of the solute concentration.

Figure 7: Observation locations (black squares) and constraint locations (black stars).

Table 3 presents a comparison of all versions of the QHMC and HMC methods, along with the truncated Gaussian algorithms. Similar to the results observed with the synthetic examples, the QHMCboth, QHMCsoftboth, and tnQHMC algorithms produce the most accurate predictions, with a relative error of $13\%$ to $15\%$. Notably, QHMCsoftboth is the fastest of these three methods while remaining accurate. For instance, the error obtained by QHMCsoftboth is $0.14$, whereas tnQHMC's error is $0.15$, so QHMCsoftboth delivers a time saving of roughly $20\%$ together with slightly better accuracy. Figure 8 presents a comprehensive comparison of the algorithms. We can observe the decrease in the relative error as the constraints are added step by step, according to the adopted adaptive approach. Initially, the error is $0.5$, and it gradually decreases to approximately $0.13$. Furthermore, the QHMCboth and QHMCsoftboth methods consistently deliver the most accurate results at each step, whereas QHMCsoftvar is outperformed by the other approaches.

Table 3: Comparison of QHMC and HMC on solute transport with nonnegativity.
Method          Error  Posterior Var  Time  |  Method         Error  Posterior Var  Time
QHMC-ad         0.18   0.13           83s   |  HMC-ad         0.20   0.14           89s
QHMC-soft-ad    0.19   0.13           75s   |  HMC-soft-ad    0.22   0.15           83s
QHMC-var        0.20   0.12           80s   |  HMC-var        0.23   0.13           91s
QHMC-soft-var   0.21   0.13           71s   |  HMC-soft-var   0.24   0.14           79s
QHMC-both       0.13   0.12           86s   |  HMC-both       0.15   0.14           97s
QHMC-soft-both  0.14   0.13           74s   |  HMC-soft-both  0.15   0.15           82s
tnQHMC          0.15   0.13           96s   |  tnHMC          0.16   0.16           103s
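The time saving quoted for QHMCsoftboth relative to tnQHMC can be checked directly from the entries of Table 3. The short sketch below copies the error and time values for the three most accurate methods and computes the fractional reduction in running time; the dictionary layout and the helper name `time_saving` are illustrative choices, not part of the original study.

```python
# (relative error, running time in seconds), copied from Table 3.
table3 = {
    "QHMC-both": (0.13, 86),
    "QHMC-soft-both": (0.14, 74),
    "tnQHMC": (0.15, 96),
}

def time_saving(method, baseline, table=table3):
    """Fractional reduction in running time of `method` relative to `baseline`."""
    t_method = table[method][1]
    t_base = table[baseline][1]
    return (t_base - t_method) / t_base

saving = time_saving("QHMC-soft-both", "tnQHMC")
print(f"{saving:.1%}")   # prints 22.9%
```

The result, $(96-74)/96\approx 22.9\%$, is consistent with the roughly $20\%$ saving stated in the discussion of Table 3.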
Figure 8: The change in relative error while adding constraints, solute transport.

4.1.4 Example 4: Heat Transfer in a Hollow Sphere

In this 3-dimensional example, we consider a heat transfer problem in a hollow sphere. Let $B_r(0)$ represent a ball centered at $0$ with radius $r$. Defining the hollow sphere as $D=B_4(0)-B_2(0)$, the equations are given as [28]

\[
\begin{cases}
\frac{\partial u(\mathbf{x},t)}{\partial t}-\nabla\cdot(\kappa\nabla u(\mathbf{x},t))=0, & \mathbf{x}\in D,\\
\kappa\frac{\partial u(\mathbf{x},t)}{\partial\mathbf{n}}=\theta^2(\pi-\theta)^2\phi^2(\pi-\phi)^2, & \text{if }\|\mathbf{x}\|_2=4\text{ and }\phi\geq 0,\\
\kappa\frac{\partial u(\mathbf{x},t)}{\partial\mathbf{n}}=0, & \text{if }\|\mathbf{x}\|_2=4\text{ and }\phi<0,\\
u(\mathbf{x},t)=0, & \text{if }\|\mathbf{x}\|_2=2.
\end{cases}
\tag{10}
\]

In this context, $\mathbf{n}$ denotes the outward-pointing normal vector, while $\theta$ and $\phi$ represent the azimuthal and elevation angles, respectively, of points within the sphere. We determine the precise heat conductivity using $\kappa=1.0+\exp(0.05u)$. Quadratic elements with 12,854 degrees of freedom are employed to solve the PDEs, and we set $y(\mathbf{x})=u(\mathbf{x},10)$. Starting with 6 initial locations at 0 on the surface, 6 new constraint locations are introduced based on the active learning approach of the QHMC version. In Figure 9, we can observe the decrease in relative error as the constraints are added step by step. In addition, Figure 10 shows the ground truth and the GP result obtained by the QHMCsoftboth algorithm, where we see that the QHMCsoftboth prediction $y^*(x)$ matches the reference model. Moreover, its posterior variance is small according to the results in Table 4. The table provides the error, posterior variance and running time of the QHMC and HMC algorithms, and we can see the advantages of QHMC over HMC in almost every category, including for the truncated Gaussian algorithm; the only exception in Table 4 is the running time of HMCvar (27s, versus 30s for QHMCvar). Although all of the algorithms complete the GP regression in less than $1$ minute, comparing the truncated Gaussian method with the QHMC-based algorithms, we observe time savings of $40\%$ to $60\%$ along with comparable accuracy for the QHMC algorithms. In addition to the time and accuracy results, the posterior variances are smallest with the QHMCvar and QHMCboth approaches, followed by the tnQHMC and QHMCad approaches. Using HMC sampling generates larger posterior variances in all methods.

Table 4: Comparison of QHMC and HMC on heat transfer with nonnegativity.
Method          Error  Posterior Var  Time  |  Method         Error  Posterior Var  Time
QHMC-ad         0.04   0.04           34s   |  HMC-ad         0.06   0.07           40s
QHMC-soft-ad    0.05   0.04           30s   |  HMC-soft-ad    0.07   0.07           32s
QHMC-var        0.05   0.02           30s   |  HMC-var        0.09   0.05           27s
QHMC-soft-var   0.06   0.03           26s   |  HMC-soft-var   0.10   0.05           29s
QHMC-both       0.02   0.03           32s   |  HMC-both       0.04   0.05           37s
QHMC-soft-both  0.03   0.03           27s   |  HMC-soft-both  0.05   0.06           35s
tnQHMC          0.04   0.05           51s   |  tnHMC          0.06   0.07           56s
Figure 9: The change in relative error while adding constraints, heat equation.
(a) Heat equation data, ground truth $y(x)$.
(b) QHMCsoftboth prediction $y^*(x)$.
Figure 10: Comparison of the ground truth and QHMCsoftboth result.
(a) Initial locations
(b) Constraint locations added by QHMC.
Figure 11: Initial locations and adaptively added constraint locations.

4.2 Monotonicity Constraints

In this section, we provide two numerical examples to investigate the effectiveness of our algorithms under monotonicity constraints. We enforce the monotonicity constraints in the direction of the active variables. Similar to the comparisons in the previous section, we illustrate the advantages of QHMC over HMC, and then compare the performance of the QHMC algorithms with the additive GP approach introduced in [16], using the same criteria as in the previous section.

4.2.1 Example 1

We consider the following 5D function with monotonicity constraints [16]:

\[
f(x)=\arctan(5x_1)+\arctan(2x_2)+x_3+2x_4^2+\frac{2}{1+\exp\left(-10\left(x_5-\frac{1}{2}\right)\right)}.
\]
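As a rough sanity check on the monotonicity of this target, the sketch below implements the 5D function and evaluates forward differences along each coordinate at random points. The domain $[0,1]^5$ is an assumption (the text above does not state it, and the $2x_4^2$ term is only nondecreasing for $x_4\geq 0$); the step size, sample count, seed and helper names are likewise illustrative.

```python
import numpy as np

def f5(x):
    """5D monotone test function, evaluated row-wise on an (n, 5) array."""
    x = np.atleast_2d(np.asarray(x, dtype=float))
    x1, x2, x3, x4, x5 = x.T
    return (np.arctan(5.0 * x1) + np.arctan(2.0 * x2) + x3 + 2.0 * x4**2
            + 2.0 / (1.0 + np.exp(-10.0 * (x5 - 0.5))))

def min_forward_difference(func, dim, n=2000, h=1e-4, seed=0):
    """Smallest value of func(x + h*e_dim) - func(x) over random points.

    Points are drawn from [0, 1 - h]^5 so that the shifted points stay
    inside the assumed domain [0, 1]^5.
    """
    rng = np.random.default_rng(seed)
    X = rng.uniform(0.0, 1.0 - h, size=(n, 5))
    Xh = X.copy()
    Xh[:, dim] += h
    return np.min(func(Xh) - func(X))

# One value per input dimension; positive values indicate that no sampled
# point violates monotonicity along that coordinate.
min_diffs = [min_forward_difference(f5, k) for k in range(5)]
```

A sampled check like this cannot prove monotonicity, but it is a cheap way to confirm that the constraint direction chosen for each active variable agrees with the target function.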

Table 5 shows the performance of the HMC and QHMC algorithms, where we observe that QHMC achieves higher accuracy with lower variance in a shorter amount of time. The comparison shows that each version of QHMC is more efficient than its HMC counterpart. In addition, Figure 12 shows the relative error of the QHMC and additive GP algorithms with respect to the SNR and the dataset size. Based on the results, QHMCboth and QHMCsoftboth provide the most accurate results under every condition, and the difference is more pronounced when the noise is higher. Although QHMCboth and QHMCsoftboth provide the most accurate results, the other QHMC versions also generate more accurate results than the additive GP method. Moreover, Figure 13 shows that the soft-constrained QHMC approaches are faster than the hard-constrained QHMC approaches, while the hard-constrained QHMC versions are still faster than the additive GP algorithm.

Table 5: Comparison of QHMC and HMC on 5D, monotonicity.
Method          Error  Posterior Var  Time    |  Method         Error  Posterior Var  Time
QHMC-ad         0.11   0.16           2m 23s  |  HMC-ad         0.13   0.17           3m 14s
QHMC-soft-ad    0.14   0.18           1m 57s  |  HMC-soft-ad    0.17   0.20           2m 48s
QHMC-var        0.12   0.15           2m 13s  |  HMC-var        0.15   0.17           2m 58s
QHMC-soft-var   0.15   0.17           1m 42s  |  HMC-soft-var   0.18   0.19           2m 16s
QHMC-both       0.10   0.13           2m 25s  |  HMC-both       0.12   0.15           2m 58s
QHMC-soft-both  0.12   0.14           1m 55s  |  HMC-soft-both  0.14   0.15           2m 39s
Figure 12: Relative error of the algorithms with different data sizes and signal to noise ratios (SNR) for Example 1 (5D), monotonicity.
Figure 13: Time comparison of the algorithms with different data sizes and signal to noise ratios (SNR) for Example 1 (5D), monotonicity.
Figure 14: Posterior variances of the algorithms with different data sizes and signal to noise ratios (SNR) for Example 1 (5D), monotonicity.

4.2.2 Example 2

We consider the target function used in [16, 3]

\[
f(x_1,x_2,\ldots,x_d)=\sum_{i=1}^{d}\arctan\left(5\left[1-\frac{i}{d+1}\right]x_i\right),\quad\text{where}\quad d=20.
\]
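The target above is monotone in every coordinate because each weight $5[1-i/(d+1)]$ is strictly positive. The following sketch evaluates the function and its analytic gradient for an arbitrary dimension read from the input array; the function names and the evaluation point are illustrative assumptions.

```python
import numpy as np

def _weights(d):
    """Weights 5 * (1 - i / (d + 1)) for i = 1, ..., d (all strictly positive)."""
    i = np.arange(1, d + 1)
    return 5.0 * (1.0 - i / (d + 1))

def arctan_sum(x):
    """Sum of arctan(w_i * x_i), evaluated row-wise on an (n, d) array."""
    x = np.atleast_2d(np.asarray(x, dtype=float))
    return np.sum(np.arctan(_weights(x.shape[1]) * x), axis=1)

def grad_arctan_sum(x):
    """Analytic gradient: d/dx_i arctan(w_i x_i) = w_i / (1 + (w_i x_i)^2)."""
    x = np.atleast_2d(np.asarray(x, dtype=float))
    w = _weights(x.shape[1])
    return w / (1.0 + (w * x) ** 2)

x0 = np.full((1, 20), 0.3)        # illustrative evaluation point in 20 dimensions
g0 = grad_arctan_sum(x0)          # every entry is positive
```

Since every partial derivative is positive for all inputs, monotonicity constraints can be imposed along each active variable without contradicting the target, while the decreasing weights make the later coordinates progressively less influential.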

In Table 6, we illustrate the accuracy and time advantages of QHMC over HMC. For each version of the algorithm, using QHMC sampling instead of HMC sampling accelerates the process while increasing the accuracy. The overall comparison shows that, among all versions with QHMC and HMC sampling, QHMCboth is the most accurate approach, while QHMCsoftboth is among the fastest and ranks second in accuracy. In this set of experiments, we included the results of HMCsoftboth in the comparison of the QHMC-based methods and additive GP. Figure 15 and Figure 16 show the relative error and running time, respectively, of the QHMC-based algorithms, HMCsoftboth and the additive GP algorithm. In this last example with the highest dimension, we observe the same phenomena as in the previous results: the soft-constrained versions are more efficient, while the hard-constrained QHMC approaches are still faster than additive GP under various conditions such as high noise. Based on Figure 15 and Figure 17, we can state that QHMCboth tolerates noise levels up to $10\%$ with the smallest error and posterior variance, and it still provides good accuracy (an error of around $0.15$) even when the SNR is higher than $10\%$. It is also worth mentioning that although the error values generated by HMCsoftboth and additiveGP are quite close, HMCsoftboth runs faster than additiveGP, especially when the dataset is larger and the noise level is higher.

Table 6: Comparison of QHMC and HMC on 20D, monotonicity.
Method          Error  Posterior Var  Time     |  Method         Error  Posterior Var  Time
QHMC-ad         0.13   0.18           33m 1s   |  HMC-ad         0.15   0.21           35m 38s
QHMC-soft-ad    0.15   0.19           31m 21s  |  HMC-soft-ad    0.18   0.22           33m 41s
QHMC-var        0.14   0.16           32m 53s  |  HMC-var        0.17   0.17           34m 21s
QHMC-soft-var   0.16   0.17           29m 42s  |  HMC-soft-var   0.19   0.18           31m 17s
QHMC-both       0.11   0.14           33m 45s  |  HMC-both       0.14   0.16           36m 21s
QHMC-soft-both  0.12   0.15           29m 48s  |  HMC-soft-both  0.15   0.17           33m 11s
Figure 15: Relative error of the algorithms with different data sizes and signal to noise ratios (SNR) for Example 2 (20D), monotonicity.
Figure 16: Time comparison of the algorithms with different data sizes and signal to noise ratios (SNR) for Example 2 (20D), monotonicity.
Figure 17: Posterior variances of the algorithms with different data sizes and signal to noise ratios (SNR) for Example 2 (20D), monotonicity.

5 Conclusion and Discussion

Leveraging the accuracy of QHMC training and the efficiency of the probabilistic approach, we introduced a soft-constrained QHMC algorithm to enforce inequality and monotonicity constraints on the GP. The proposed algorithm reduces the difference between the ground truth and the posterior mean of the resulting GP model, while increasing efficiency by attaining accurate results in a short amount of time. To further enhance the performance of the QHMC algorithms across various scenarios, we have implemented modified versions that adopt adaptive learning. These versions provide flexibility in selecting the most suitable algorithm based on the specific priorities of a given problem.

We established the convergence of QHMC by showing that its steady-state distribution approaches the true posterior density, and theoretically justified that the probabilistic approach preserves convergence. Finally, we applied our methods to solve several types of optimization problems. In each experiment, we first outlined the benefits of QHMC sampling in comparison to HMC sampling. These advantages remained consistent across all cases, resulting in approximately $20\%$ time savings and $15\%$ higher accuracy. Having demonstrated the advantages of QHMC sampling, we proceeded to evaluate the performance of the algorithms across various scenarios. Our examples cover higher-dimensional problems featuring both inequality and monotonicity constraints. Furthermore, our evaluations include real-world applications where injecting physical properties is essential, particularly in cases involving inequality constraints.

In the context of inequality-constrained Gaussian processes (GPs), we explored 2-dimensional and 10-dimensional synthetic problems, along with two real applications involving 2-dimensional and 3-dimensional data. For the synthetic examples, we observed the relative error, posterior variance and execution time of the algorithms while gradually increasing the noise level and dataset size. Overall, the QHMC-based algorithms outperformed the truncated Gaussian methods. Although the truncated Gaussian methods provide high accuracy in the absence of noise and are comparable to the QHMC approaches in that setting, their relative error and posterior variance increase once noise appears and grows. Moreover, the advantages of soft-constrained QHMC became more evident in higher-dimensional cases, when compared to truncated Gaussian and even hard-constrained QHMC. The time comparison of the algorithms underscores that the truncated Gaussian methods are significantly affected by the curse of dimensionality and by large datasets, exhibiting slower performance under these conditions. In the real-world application scenarios featuring 2-dimensional and 3-dimensional data, the findings were consistent with those observed in the synthetic examples. Although the accuracy for the solute transport problem does not reach the levels observed in the synthetic examples and the 3-dimensional heat equation problem, the observed trend remains consistent. The lower accuracy in the solute transport problem can be attributed to the non-homogeneous structure of the solute concentration.

In the case of monotonicity-constrained GP, we addressed 5-dimensional and 20-dimensional examples, using the same configuration as for inequality-constrained GP. A comprehensive comparison was conducted between all versions of the QHMC algorithms and the additive GP method. The results indicate that the QHMC-based approaches hold a notable advantage, particularly in scenarios involving noise and large datasets. While additive GP proves to be a strong method suitable for high-dimensional cases, the QHMC algorithms ran faster and yielded lower variances.

In conclusion, this work has demonstrated that soft-constrained QHMC is a robust, efficient and flexible method that is applicable to higher-dimensional cases and large datasets. The numerical results suggest that soft-constrained QHMC can be generalized to various applications with different physical properties.

References

  • [1] Petter Abrahamsen and Fred Espen Benth. Kriging with inequality constraints. Mathematical Geology, 33:719–744, 2001.
  • [2] Christian Agrell. Gaussian processes with linear operator inequality constraints. arXiv preprint arXiv:1901.03134, 2019.
  • [3] François Bachoc, Andrés F López-Lopera, and Olivier Roustant. Sequential construction and dimension reduction of Gaussian processes under inequality constraints. SIAM Journal on Mathematics of Data Science, 4(2):772–800, 2022.
  • [4] Adrian Barbu and Song-Chun Zhu. Monte Carlo Methods, volume 35. Springer, 2020.
  • [5] Sébastien Da Veiga and Amandine Marrel. Gaussian process modeling with inequality constraints. In Annales de la Faculté des sciences de Toulouse: Mathématiques, volume 21, pages 529–555, 2012.
  • [6] Simon Emmanuel and Brian Berkowitz. Mixing-induced precipitation and porosity evolution in porous media. Advances in Water Resources, 28(4):337–344, 2005.
  • [7] David Eriksson and Matthias Poloczek. Scalable constrained Bayesian optimization. In International Conference on Artificial Intelligence and Statistics, pages 730–738. PMLR, 2021.
  • [8] Andrew Gelman, John B. Carlin, Hal S. Stern, David B. Dunson, Aki Vehtari, and Donald B. Rubin. Bayesian Data Analysis. Taylor & Francis Group, Inc., 2014.
  • [9] Shaoyan Guo, Huifu Xu, and Liwei Zhang. Stability analysis for mathematical programs with distributionally robust chance constraint. SIAM J. Optim (to appear), 2015.
  • [10] Christian Hess. Conditional expectation and martingales of random sets. Pattern Recognition, 32(9):1543–1567, 1999.
  • [11] Bjørn Sand Jensen, Jens Brehm Nielsen, and Jan Larsen. Bounded Gaussian process regression. In 2013 IEEE International Workshop on Machine Learning for Signal Processing (MLSP), pages 1–6. IEEE, 2013.
  • [12] Malte Kuss and Carl Rasmussen. Gaussian processes in reinforcement learning. Advances in neural information processing systems, 16, 2003.
  • [13] Markus Lange-Hegermann. Linearly constrained Gaussian processes with boundary conditions. In International Conference on Artificial Intelligence and Statistics, pages 1090–1098. PMLR, 2021.
  • [14] Guang Lin and Alexandre M Tartakovsky. An efficient, high-order probabilistic collocation method on sparse grids for three-dimensional flow and solute transport in randomly heterogeneous porous media. Advances in Water Resources, 32(5):712–722, 2009.
  • [15] Ziming Liu and Zheng Zhang. Quantum-inspired Hamiltonian Monte Carlo for Bayesian sampling, 2020.
  • [16] Andrés López-Lopera, François Bachoc, and Olivier Roustant. High-dimensional additive Gaussian processes under monotonicity constraints. Advances in Neural Information Processing Systems, 35:8041–8053, 2022.
  • [17] Andrés F López-Lopera, François Bachoc, Nicolas Durrande, and Olivier Roustant. Finite-dimensional Gaussian approximation with linear inequality constraints. SIAM/ASA Journal on Uncertainty Quantification, 6(3):1224–1255, 2018.
  • [18] Hassan Maatouk and Xavier Bay. Gaussian process emulators for computer experiments with inequality constraints. Mathematical Geosciences, 49:557–582, 2017.
  • [19] Hassan Maatouk, Olivier Roustant, and Yann Richet. Cross-validation estimations of hyper-parameters of Gaussian processes with inequality constraints. Procedia Environmental Sciences, 27:38–44, 2015.
  • [20] Andrew Pensoneault, Xiu Yang, and Xueyu Zhu. Nonnegativity-enforced Gaussian process regression. Theoretical and Applied Mechanics Letters, 10(3):182–187, 2020.
  • [21] Maziar Raissi, Paris Perdikaris, and George Em Karniadakis. Machine learning of linear differential equations using Gaussian processes. Journal of Computational Physics, 348:683–693, 2017.
  • [22] Carl Edward Rasmussen, Christopher KI Williams, et al. Gaussian processes for machine learning, volume 1. Springer, 2006.
  • [23] Jaakko Riihimäki and Aki Vehtari. Gaussian processes with monotonicity information. In Proceedings of the Thirteenth International Conference on Artificial Intelligence and Statistics, pages 645–652. JMLR Workshop and Conference Proceedings, 2010.
  • [24] Mathieu Salzmann and Raquel Urtasun. Implicitly constrained Gaussian process regression for monocular non-rigid pose estimation. Advances in Neural Information Processing Systems, 23, 2010.
  • [25] Michael L Stein. Asymptotically efficient prediction of a random field with a misspecified covariance function. The Annals of Statistics, 16(1):55–63, 1988.
  • [26] Laura P Swiler, Mamikon Gulian, Ari L Frankel, Cosmin Safta, and John D Jakeman. A survey of constrained Gaussian process regression: Approaches and implementation challenges. Journal of Machine Learning for Modeling and Computing, 1(2), 2020.
  • [27] Xiu Yang, David Barajas-Solano, Guzel Tartakovsky, and Alexandre M Tartakovsky. Physics-informed cokriging: A Gaussian-process-regression-based multifidelity method for data-model convergence. Journal of Computational Physics, 395:410–431, 2019.
  • [28] Xiu Yang, Guzel Tartakovsky, and Alexandre M Tartakovsky. Physics information aided kriging using stochastic simulation models. SIAM Journal on Scientific Computing, 43(6):A3862–A3891, 2021.
  • [29] Hao Zhang. Inconsistent estimation and asymptotically equal interpolations in model-based geostatistics. Journal of the American Statistical Association, 99(465):250–261, 2004.