Bayesian grey-box identification of nonlinear convection effects in heat transfer dynamics

Wouter M. Kouw, Caspar Gruijthuijsen, Lennart Blanken, Enzo Evers, Timothy Rogers

Kouw and Blanken are with TU Eindhoven. Gruijthuijsen, Evers and Blanken are with Sioux Technologies B.V., Eindhoven, the Netherlands, and are supported by the Intelligent Motion Control consortium project. Rogers is with Sheffield University, Sheffield, United Kingdom, and is supported by EPSRC u/w/002140/1.

Abstract

We propose a computational procedure for identifying convection in heat transfer dynamics. The procedure is based on a Gaussian process latent force model, consisting of a white-box component (i.e., known physics) for the conduction and linear convection effects and a Gaussian process that acts as a black-box component for the nonlinear convection effects. States are inferred through Bayesian smoothing, and we obtain approximate posterior distributions for the kernel covariance function's hyperparameters using Laplace's method. The nonlinear convection function is recovered from the Gaussian process states using a Bayesian regression model. We validate the procedure by the simulation error obtained with the identified nonlinear convection function, both on data from a simulated system and on measurements from a physical assembly.

I Introduction

Motion control systems are becoming ever more demanding in terms of throughput and accuracy. It has now become highly important to consider thermally induced deformations that cause slow drifts during positioning [1, 2]. The challenge in modelling these deformations lies in capturing the effects of convection on heat transfer. A pure model-driven approach would describe the airflow surrounding a motion control system in detail, but this requires extensive expert knowledge and considerable computational resources. Recently, hybrid model- (white-box) / data-driven (black-box) approaches have been proposed to approximate airflow dynamics, such as physics-informed neural networks [3]. We propose such a hybrid model- / data-driven method, i.e., a grey-box, where the effects of conduction, linear convection and heat input are expressed explicitly and the nonlinear effects of convection are captured by a regression model.

Our method is based on Gaussian process latent force models (GPLFMs), which were originally developed to estimate unmeasured forces in mechanical systems [4, 5, 6, 7]. For example, one could identify the strength of the restoring force in a nonlinear oscillator [8]. GPLFMs have been applied to other domains as well, such as identifying the thermal dynamics of buildings [9]. We extend this line of work by tackling the effects of convection in heat transfer dynamics. GPLFMs are based on the conversion of temporal Gaussian processes (GPs) to stochastic differential equations, which allows them to be incorporated into state-space models [10, 11]. Inference then reduces to Kalman filtering and smoothing, whose cost grows linearly with the length of the time series. As a result, GPLFMs can be much faster grey-box identification methods than physics-informed neural networks, and uncertainty estimates on predictions are readily available.

Our contribution consists of the application of a GPLFM to identify convection in heat transfer dynamics, quantification of uncertainty over the identified nonlinear convection function and validation of the proposed procedure on both simulated data and physical measurements.

II Model specification

We briefly review heat transfer dynamics and GPLFMs. We then demonstrate how the two may be combined.

II-A Heat transfer dynamics

Consider a lumped-element model of a thermal system with $D$ components. Let $T_{i}(t)\in\mathbb{R}$ be the temperature of component $i$ at time $t$, $T_{a}(t)\in\mathbb{R}$ the ambient temperature and $u_{i}(t)\in\mathbb{R}^{+}$ the heat input to component $i$; the vectors $T$ and $u$ collect these quantities over all components. The evolution of the temperatures is assumed to be governed predominantly by conduction, convection and heat input:

$$M\dot{T}=\underbrace{KT}_{\textrm{conduction}}+\underbrace{h\big(T,T_{a}\big)}_{\textrm{convection}}+\underbrace{u}_{\textrm{input}}\,. \tag{1}$$

The dependence on $t$ will often be omitted for brevity in the remainder of the article. The diagonal matrix $M$ contains the mass $m_{i}\in\mathbb{R}^{+}$ of each component multiplied by the specific heat capacity $\mathrm{c}_{p,i}\in\mathbb{R}^{+}$ of the component's material. The conductance matrix $K$ describes how heat is shared between components: its off-diagonal entries $k_{ij}\in\mathbb{R}^{+}$ indicate how much heat is conducted between components $i$ and $j$ per unit of temperature difference.

Convection is the loss of heat due to exchange with the medium surrounding the mechanical system. It can be split into a linear and a nonlinear term:

$$\underbrace{h\big(T_{i},T_{a}\big)}_{\textrm{total cooling}}=\underbrace{h_{a}a_{i}\big(T_{a}-T_{i}\big)}_{\textrm{linear convection}}+\underbrace{r\big(T_{i},T_{a}\big)}_{\textrm{nonlinear convection}}\,. \tag{2}$$

The linear convection term describes a heat flow proportional to the surface area $a_{i}$ times the difference between the ambient temperature $T_{a}$ and the temperature of the material $T_{i}$. We assume uniform cooling over the surface with heat transfer coefficient $h_{a}$. The remainder is a nonlinear function of the temperature of the material and the ambient temperature, which we call the nonlinear convection function $r(T_{i},T_{a})$. Proper physics-driven modelling would employ computational fluid dynamics and include explicit dependencies on the airflow in the surrounding environment. Here, we do not model these terms but instead capture the effect of the nonlinear convection function, i.e., the output of $r(T_{i},T_{a})$, using a black-box function approximator (cf. Section II-C).

The ambient temperature is assumed to be measured and is treated as an input. Substituting the linear convection term of Eq. 2 into Eq. 1, the governing equations become:

$$\dot{T}=M^{-1}FT+M^{-1}r\big(T,T_{a}\big)+M^{-1}G\bar{u}\,, \tag{3}$$

where $\bar{u}=[T_{a}\ u_{1}\ \dots\ u_{D}]^{\intercal}$ and

$$F=K-\begin{bmatrix}h_{a}a_{1}&&\\ &\ddots&\\ &&h_{a}a_{D}\end{bmatrix},\quad G=\begin{bmatrix}\begin{matrix}h_{a}a_{1}\\ \vdots\\ h_{a}a_{D}\end{matrix}&I\end{bmatrix}. \tag{4}$$
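As a minimal sketch of Eqs. 1 to 4, the Python snippet below (assuming NumPy) builds $F$ and $G$ from a conductance matrix and checks numerically that Eq. 3 reproduces Eqs. 1 and 2. The three-block conductance matrix, the parameter values and the cubic stand-in for $r$ are borrowed from Sec. IV-A purely for illustration, and all function names are hypothetical.

```python
import numpy as np

def conductance_matrix(k12, k23):
    """Three-block conductance matrix of Eq. 42; each row sums to zero."""
    return np.array([[-k12, k12, 0.0],
                     [k12, -(k12 + k23), k23],
                     [0.0, k23, -k23]])

def build_F_G(K, h_a, a):
    """Eq. 4: F = K - diag(h_a a_i) and G = [h_a a | I]."""
    ha = h_a * np.asarray(a, dtype=float)
    F = K - np.diag(ha)
    G = np.hstack([ha[:, None], np.eye(ha.size)])
    return F, G

def r_surrogate(T, T_a):
    """Hypothetical stand-in for r(T_i, T_a): the cubic surrogate of Sec. IV-A."""
    return (T_a - T) ** 3 / 100.0

def dTdt_eq1(T, T_a, u, M, K, h_a, a):
    """Right-hand side of Eq. 1 with h(T, T_a) expanded as in Eq. 2."""
    h = h_a * np.asarray(a, dtype=float) * (T_a - T) + r_surrogate(T, T_a)
    return np.linalg.solve(M, K @ T + h + u)

def dTdt_eq3(T, T_a, u, M, F, G):
    """Right-hand side of Eq. 3 with u_bar = [T_a, u_1, ..., u_D]."""
    u_bar = np.concatenate([[T_a], u])
    return np.linalg.solve(M, F @ T + r_surrogate(T, T_a) + G @ u_bar)

# Illustrative values borrowed from the simulated system of Sec. IV-A.
a = [1.0, 1.0, 1.0]
M = 1000.0 * np.eye(3)
K = conductance_matrix(10.0, 10.0)
F, G = build_F_G(K, h_a=2.0, a=a)
T = np.array([35.0, 28.0, 24.0])
u = np.array([100.0, 0.0, 0.0])
print(dTdt_eq1(T, 21.0, u, M, K, 2.0, a))
print(dTdt_eq3(T, 21.0, u, M, F, G))  # same values as the line above
```

Folding the linear convection into $F$ and $G$ keeps Eq. 3 linear in $T$ and $\bar{u}$ apart from $r$, which is what allows the augmentation with GP states in Sec. II-C.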

II-B Temporal Gaussian Processes

Temporal Gaussian processes describe distributions over functions of time [12]. We shall use these to estimate the state $\rho(t)=r(T(t),T_{a}(t))$ over time, and later fit a regression model from $T(t)$ and $T_{a}(t)$ to $\rho(t)$ (see Section III-C) [5, 6]. To do so, we must construct a dynamical form of a temporal Gaussian process. Consider a GP prior distribution over functions $\rho(t)$

$$p\big(\rho(t);\psi\big)=\mathcal{GP}\big(\rho(t)\mid 0,\kappa_{\psi}(t,t^{\prime})\big)\,, \tag{5}$$

with kernel covariance function $\kappa$ with hyperparameters $\psi$ . We have chosen a zero-mean prior distribution because we have no reason to believe our function of interest has a systematic offset. For the kernel covariance function, we select the lowest order of the Whittle-Matérn class, namely the exponential covariance function

$$\kappa_{\psi}(t,t^{\prime})=\gamma^{2}\exp\Big(-\frac{\sqrt{3}}{l}\,|t-t^{\prime}|\Big)\,, \tag{6}$$

with magnitude $\gamma$ and length scale $l$, collected in the hyperparameters $\psi=(\gamma,l)$ [12]. Note that this kernel is stationary, i.e., only a function of $t-t^{\prime}$. The choice for an exponential covariance is based on two qualitative features: it is flexible (we make no strong smoothness assumptions) and it leads to a scalar dynamical system (see also Sec. V).

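The exponential kernel covariance function and its power spectral density form a Fourier transform pair [10]. With $\tau=t-t^{\prime}$ and $\mathcal{K}_{\psi}(\omega)=\int\kappa_{\psi}(\tau)e^{-i\omega\tau}\,\mathrm{d}\tau$,

$$\mathcal{K}_{\psi}(\omega)=\frac{2\lambda\gamma^{2}}{\lambda^{2}+\omega^{2}}\,, \tag{7}$$

where $\lambda=\sqrt{3}/l$. Factorizing the denominator as $(\lambda+i\omega)(\lambda-i\omega)$ and keeping the stable factor yields the transfer function $1/(\lambda+i\omega)$, whose state-space realization is the stochastic differential equation (SDE) [10]: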

$$\dot{\rho}(t)=-\lambda\rho(t)+w(t)\,. \tag{8}$$

The white noise process $w(t)$ has a spectral density equal to the numerator of Eq. 7, $v_{c}=2\lambda\gamma^{2}$ .
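The Fourier pair in Eq. 7 and the stationary variance implied by Eq. 8 can be checked numerically. The sketch below (Python with NumPy; the values of $\gamma$ and $l$ are arbitrary choices) integrates the exponential kernel of Eq. 6 against $\cos(\omega\tau)$ with the trapezoid rule, compares the result with the closed form, and then verifies that $v_{c}/(2\lambda)=\gamma^{2}$.

```python
import numpy as np

def exp_kernel(tau, gamma, ell):
    """Exponential covariance of Eq. 6, with lambda = sqrt(3) / ell."""
    lam = np.sqrt(3.0) / ell
    return gamma**2 * np.exp(-lam * np.abs(tau))

def psd_closed_form(omega, gamma, ell):
    """Spectral density of Eq. 7."""
    lam = np.sqrt(3.0) / ell
    return 2.0 * lam * gamma**2 / (lam**2 + omega**2)

def psd_numerical(omega, gamma, ell, half_width=200.0, n=400001):
    """Fourier transform of the kernel by the trapezoid rule.
    The kernel is even, so only the cosine part of exp(-i omega tau) survives."""
    tau = np.linspace(-half_width, half_width, n)
    f = exp_kernel(tau, gamma, ell) * np.cos(omega * tau)
    dtau = tau[1] - tau[0]
    return dtau * (f.sum() - 0.5 * (f[0] + f[-1]))

gamma, ell = 2.0, 5.0
lam = np.sqrt(3.0) / ell
for omega in (0.0, 0.1, 0.5):
    print(omega, psd_numerical(omega, gamma, ell), psd_closed_form(omega, gamma, ell))

# Stationary variance of Eq. 8: the scalar Lyapunov equation
# -2 lam P + v_c = 0 with v_c = 2 lam gamma^2 gives P = gamma^2.
v_c = 2.0 * lam * gamma**2
print(v_c / (2.0 * lam))
```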

II-C Augmented discrete-time model

In this section, we augment the system of differential equations (Eq. 3) with the SDE representation of the Gaussian process. Note that the nonlinear convection function $r(T,T_{a})$ is vector-valued (one entry per component), whereas Eq. 8 describes a scalar process. We therefore pose an independent GP SDE $\rho_{i}(t)$ for every component:

$$r(T(t),T_{a}(t))=\begin{bmatrix}r(T_{1}(t),T_{a}(t))\\ \vdots\\ r(T_{D}(t),T_{a}(t))\end{bmatrix}\approx\begin{bmatrix}\rho_{1}(t)\\ \vdots\\ \rho_{D}(t)\end{bmatrix}\,. \tag{9}$$

Using $\rho(t)=[\rho_{1}(t)\dots\rho_{D}(t)]^{\intercal}$ , we can reformulate Eq. 3 as an augmented system:

$$\begin{bmatrix}\dot{T}\\ \dot{\rho}\end{bmatrix}=\begin{bmatrix}M^{-1}F&M^{-1}\\ 0&-\lambda I\end{bmatrix}\begin{bmatrix}T\\ \rho\end{bmatrix}+\begin{bmatrix}M^{-1}G\\ 0\end{bmatrix}\bar{u}+\begin{bmatrix}0\\ I\end{bmatrix}w\,, \tag{10}$$

where $\lambda$ and $\gamma$ are shared across all GP states, and $w(t)=[w_{1}(t)\dots w_{D}(t)]^{\intercal}$ collects $D$ independent white noise processes, each with spectral density $v_{c}$.

We shall discretize the system using a regular sampling interval $\Delta t=t_{k}-t_{k-1}$ for all steps $k$ . Let $x_{k}=[T_{1k}\dots T_{Dk}\ \rho_{1k}\dots\rho_{Dk}]^{\intercal}$ . The system then becomes:

$$x_{k}=Ax_{k-1}+B\bar{u}_{k}+w_{k}\,, \tag{11}$$

with state transition matrix $A$ and control matrix $B$,

$$A=\exp\Big(\Delta t\begin{bmatrix}M^{-1}F&M^{-1}\\ 0&-\lambda I\end{bmatrix}\Big)\,,\quad B=\Delta t\begin{bmatrix}M^{-1}G\\ 0\end{bmatrix}\,, \tag{12}$$

where $B$ is a first-order approximation in $\Delta t$ of the exact zero-order-hold input matrix.

The discrete-time noise $w_{k}$ is zero-mean Gaussian distributed, with covariance matrix

$$Q=\int_{0}^{\Delta t}\exp(A_{c}s)\begin{bmatrix}0\\ I\end{bmatrix}(v_{c}I)\begin{bmatrix}0\\ I\end{bmatrix}^{\intercal}\exp(A_{c}s)^{\intercal}\,\mathrm{d}s\,, \tag{13}$$

where $A_{c}$ denotes the continuous-time system matrix of Eq. 10, so that $A=\exp(\Delta t A_{c})$. We approximate this integral using a first-order Taylor approximation of the matrix exponential, $\exp(A_{c}s)\approx I+A_{c}s$, under which $\exp(A_{c}s)\,[0\ \ I]^{\intercal}\approx[sM^{-1}\ \ (1-\lambda s)I]^{\intercal}$. This provides an analytic expression that can be easily differentiated (important for Sec. III-B). Under this approximation, Eq. 13 evaluates to

$$Q\approx\begin{bmatrix}Q_{11}&Q_{12}\\ Q_{21}&Q_{22}\end{bmatrix}\,, \tag{14}$$

with block matrices

\begin{align}
Q_{11} &= \tfrac{1}{3}\Delta t^{3}v_{c}M^{-2} \tag{15a}\\
Q_{12} &= Q_{21}=\big(\tfrac{1}{2}\Delta t^{2}-\tfrac{1}{3}\lambda\Delta t^{3}\big)v_{c}M^{-1} \tag{15b}\\
Q_{22} &= \big(\Delta t-\lambda\Delta t^{2}+\tfrac{1}{3}\lambda^{2}\Delta t^{3}\big)v_{c}I\,. \tag{15c}
\end{align}

Note that $A$ depends on the hyperparameters through $\lambda$ (i.e., through $l$), while $Q$ depends on both $\lambda$ and $\gamma$ (through $v_{c}=2\lambda\gamma^{2}$). We will henceforth refer to them as $A_{\psi}$ and $Q_{\psi}$.
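The discretization in Eqs. 12 to 15 can be sketched as follows, assuming NumPy. The helper `expm` is a simple scaling-and-squaring stand-in for a library matrix exponential (such as `scipy.linalg.expm`), and the parameter values are illustrative. The sketch compares the closed-form $Q$ of Eq. 15 with Simpson quadrature of Eq. 13 under the first-order Taylor approximation, which should agree to machine precision because that integrand is quadratic in $s$, and with quadrature using the exact matrix exponential, which shows the size of the approximation error.

```python
import numpy as np

def expm(X, terms=25):
    """Matrix exponential by scaling and squaring with a truncated Taylor series."""
    norm = np.linalg.norm(X, 1)
    s = max(0, int(np.ceil(np.log2(norm))) + 1) if norm > 0 else 0
    Y = X / 2.0**s
    E = np.eye(X.shape[0])
    term = np.eye(X.shape[0])
    for n in range(1, terms):
        term = term @ Y / n
        E = E + term
    for _ in range(s):
        E = E @ E
    return E

def augmented_matrices(M, F, G, lam, dt):
    """A_c of Eq. 10, A = expm(dt A_c) and the first-order B of Eq. 12."""
    D = M.shape[0]
    Minv = np.linalg.inv(M)
    Ac = np.block([[Minv @ F, Minv], [np.zeros((D, D)), -lam * np.eye(D)]])
    B = dt * np.vstack([Minv @ G, np.zeros((D, G.shape[1]))])
    return Ac, expm(dt * Ac), B

def q_taylor(M, lam, v_c, dt):
    """Closed-form process noise covariance of Eqs. 14-15."""
    D = M.shape[0]
    Minv = np.linalg.inv(M)
    Q11 = dt**3 / 3.0 * v_c * Minv @ Minv
    Q12 = (dt**2 / 2.0 - lam * dt**3 / 3.0) * v_c * Minv
    Q22 = (dt - lam * dt**2 + lam**2 * dt**3 / 3.0) * v_c * np.eye(D)
    return np.block([[Q11, Q12], [Q12.T, Q22]])

def q_quadrature(Ac, v_c, dt, n=201, taylor=False):
    """Simpson's rule on Eq. 13 with the exact exponential or I + Ac s (n odd)."""
    D = Ac.shape[0] // 2
    L = np.vstack([np.zeros((D, D)), np.eye(D)])
    def transition(si):
        return np.eye(2 * D) + Ac * si if taylor else expm(Ac * si)
    s = np.linspace(0.0, dt, n)
    vals = np.array([transition(si) @ L @ L.T @ transition(si).T * v_c for si in s])
    w = np.ones(n)
    w[1:-1:2], w[2:-1:2] = 4.0, 2.0
    return (s[1] - s[0]) / 3.0 * np.tensordot(w, vals, axes=1)

# Illustrative three-block system (values borrowed from Sec. IV-A).
M = 1000.0 * np.eye(3)
K = np.array([[-10.0, 10.0, 0.0], [10.0, -20.0, 10.0], [0.0, 10.0, -10.0]])
F = K - 2.0 * np.eye(3)
G = np.hstack([2.0 * np.ones((3, 1)), np.eye(3)])
lam, gamma, dt = np.sqrt(3.0) / 10.0, 1.0, 1.0
v_c = 2.0 * lam * gamma**2
Ac, A, B = augmented_matrices(M, F, G, lam, dt)
Q_closed = q_taylor(M, lam, v_c, dt)
Q_check = q_quadrature(Ac, v_c, dt, n=3, taylor=True)
Q_exact = q_quadrature(Ac, v_c, dt, n=201)
print(np.max(np.abs(Q_closed - Q_check)))        # round-off level
print(np.diag(Q_closed)[3:], np.diag(Q_exact)[3:])  # Taylor vs exact GP block
```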

II-D Probabilistic state-space model

Our goal is to infer the temperature and GP states, for which we require a probabilistic state-space model. Since the process noise $w_{k}$ in Eq. 11 is Gaussian, the distribution of the state $x_{k}$ given the previous state $x_{k-1}$ is Gaussian:

$$p(x_{k}\mid x_{k-1},\bar{u}_{k};\psi)=\mathcal{N}(x_{k}\mid A_{\psi}x_{k-1}+B\bar{u}_{k},Q_{\psi})\,. \tag{16}$$

We assume that noisy measurements of the temperatures are available:

$$p(y_{k}\mid x_{k})=\mathcal{N}(y_{k}\mid Cx_{k},R)\,, \tag{17}$$

where $C$ indicates which components are measured and $R$ is the measurement noise covariance matrix.

The prior distribution of the initial temperatures is assumed to be Gaussian:

$$p(T_{0})=\mathcal{N}\big(T_{0}\mid\hat{m}_{0},\hat{S}_{0}\big)\,. \tag{18}$$

For the GP states to be stationary from the first time step, so that their covariance matches Eq. 6, they must be initialized from the stationary distribution of the SDE in Eq. 8. Setting the time derivatives of the mean and variance of $\rho(t)$ to $0$ yields a stationary mean of $0$ and, through the scalar Lyapunov equation $-2\lambda P_{\infty}+v_{c}=0$, a stationary variance of $P_{\infty}=v_{c}/(2\lambda)=\gamma^{2}$ [13]. The prior state distribution for the GP states thus becomes:

$$p(\rho_{i0};\psi)=\mathcal{N}\big(\rho_{i0}\mid 0,\gamma^{2}\big)\,. \tag{19}$$

Combining the heat transfer model and GP priors gives:

$$p(x_{0};\psi)=\mathcal{N}\Big(\begin{bmatrix}T_{0}\\ \rho_{0}\end{bmatrix}\,\Big|\,\underbrace{\begin{bmatrix}\hat{m}_{0}\\ 0\end{bmatrix}}_{m_{0}},\underbrace{\begin{bmatrix}\hat{S}_{0}&0\\ 0&\gamma^{2}I\end{bmatrix}}_{S_{0}}\Big)\,. \tag{20}$$

The complete grey-box probabilistic model for a time-series of length $N$ is:

$$p(y_{1:N},x_{0:N}\mid\bar{u}_{1:N};\psi)=p(x_{0};\psi)\prod_{k=1}^{N}p(y_{k}\mid x_{k})\,p(x_{k}\mid x_{k-1},\bar{u}_{k};\psi)\,. \tag{21}$$

We shall use this model to infer marginal posterior distributions over states $x_{k}$ and hyperparameters $\psi$ .

III Inference

The inference procedure has two phases: firstly, states and hyperparameters are estimated, and secondly, nonlinear convection is estimated as a function of temperature.

III-A State estimation

States are inferred using the Bayesian smoothing equations [14]. These start with a filtering step, moving from $k=1$ to $k=N$ . Let $\mathcal{D}_{k}=\{y_{i},\bar{u}_{i}\}_{i=1}^{k}$ be the input-output pairs up to time $k$ . The prediction step is the marginalization of the Gaussian state transition over the previous Gaussian marginal state posterior:

\begin{align}
p(x_{k}\mid\bar{u}_{k},\mathcal{D}_{k-1};\hat{\psi}) &= \int p(x_{k}\mid x_{k-1},\bar{u}_{k};\hat{\psi})\,p(x_{k-1}\mid\mathcal{D}_{k-1};\hat{\psi})\,\mathrm{d}x_{k-1} \tag{22}\\
&= \mathcal{N}\big(x_{k}\mid\bar{m}_{k},\bar{S}_{k}\big)\,, \tag{23}
\end{align}

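where the predictive mean and covariance are

$$\bar{m}_{k}=A_{\hat{\psi}}\tilde{m}_{k-1}+B\bar{u}_{k}\,,\quad\bar{S}_{k}=A_{\hat{\psi}}\tilde{S}_{k-1}A_{\hat{\psi}}^{\intercal}+Q_{\hat{\psi}}\,, \tag{24}$$

with $\tilde{m}_{k-1}$ and $\tilde{S}_{k-1}$ the parameters of the previous filtered posterior $p(x_{k-1}\mid\mathcal{D}_{k-1};\hat{\psi})$, initialized at $\tilde{m}_{0}=m_{0}$ and $\tilde{S}_{0}=S_{0}$ (Eq. 20).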

Note that the kernel hyperparameters are fixed to a point estimate, $\hat{\psi}$ (see Sec. III-B).

In the correction step, we apply Bayes’ rule using the predicted marginal state as prior distribution:

$$\underbrace{p(x_{k}\mid\mathcal{D}_{k};\hat{\psi})}_{\text{posterior}}=\frac{\overbrace{p(y_{k}\mid x_{k})}^{\text{likelihood}}}{\underbrace{p(y_{k}\mid\bar{u}_{k},\mathcal{D}_{k-1})}_{\text{evidence}}}\,\underbrace{p(x_{k}\mid\bar{u}_{k},\mathcal{D}_{k-1};\hat{\psi})}_{\text{prior}}\,, \tag{25}$$

where the evidence is

$$p(y_{k}\mid\bar{u}_{k},\mathcal{D}_{k-1})=\int p(y_{k}\mid x_{k})\,p(x_{k}\mid\bar{u}_{k},\mathcal{D}_{k-1};\hat{\psi})\,\mathrm{d}x_{k}=\mathcal{N}\big(y_{k}\mid C\bar{m}_{k},\,C\bar{S}_{k}C^{\intercal}+R\big)\,. \tag{26}$$

Because the likelihood and the predicted state distribution are both Gaussian, the state posterior is Gaussian as well, $\mathcal{N}(x_{k}\mid\tilde{m}_{k},\tilde{S}_{k})$, with parameters [14]:

\begin{align}
\tilde{m}_{k} &= \bar{m}_{k}+\bar{S}_{k}C^{\intercal}(C\bar{S}_{k}C^{\intercal}+R)^{-1}(y_{k}-C\bar{m}_{k}) \tag{27a}\\
\tilde{S}_{k} &= \bar{S}_{k}-\bar{S}_{k}C^{\intercal}(C\bar{S}_{k}C^{\intercal}+R)^{-1}C\bar{S}_{k}\,. \tag{27b}
\end{align}

Smoothing consists of correcting these state estimates based on future data [14]:

\begin{align}
p(x_{k}\mid\mathcal{D}_{N};\hat{\psi}) &= p(x_{k}\mid\mathcal{D}_{k};\hat{\psi}) \notag\\
&\quad\cdot\int\frac{p(x_{k+1}\mid x_{k},\bar{u}_{k+1};\hat{\psi})\,p(x_{k+1}\mid\mathcal{D}_{N};\hat{\psi})}{p(x_{k+1}\mid\bar{u}_{k+1},\mathcal{D}_{k};\hat{\psi})}\,\mathrm{d}x_{k+1}\,. \tag{28}
\end{align}

These corrections are executed by the following updates, initialized at $m_{N}=\tilde{m}_{N}$ and $S_{N}=\tilde{S}_{N}$ and running backwards for $k=N-1,\dots,1$:

\begin{align}
G_{k} &= \tilde{S}_{k}A_{\hat{\psi}}^{\intercal}\bar{S}_{k+1}^{-1} \tag{29a}\\
m_{k} &= \tilde{m}_{k}+G_{k}\big(m_{k+1}-\bar{m}_{k+1}\big) \tag{29b}\\
S_{k} &= \tilde{S}_{k}+G_{k}\big(S_{k+1}-\bar{S}_{k+1}\big)G_{k}^{\intercal}\,, \tag{29c}
\end{align}

where the smoother gain $G_{k}$ determines how strongly future observations correct the filtered estimate.

Obtaining the state estimates costs $O(D^{3}N)$, because every time step involves inverting covariance matrices whose size grows linearly with $D$ (the augmented state has dimension $2D$). The algorithm is therefore well suited to situations where $D$ is small (few components in the lumped-element model) and $N$ is large (long time series).
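A compact NumPy sketch of the filter (Eqs. 24 and 27) and the backward pass (Eq. 29) is given below. It also accumulates the log-evidence of Eq. 26, whose sum over $k$ is the log marginal likelihood used in Sec. III-B. This is an illustrative re-implementation, not the RxInfer.jl code used in the experiments; the function name `kalman_rts` is hypothetical, and the backward pass runs down to $k=0$ so that the initial state is smoothed as well.

```python
import numpy as np

def kalman_rts(y, u, A, B, C, Q, R, m0, S0):
    """Kalman filter (Eqs. 24, 27) followed by an RTS smoother (Eq. 29).
    y: (N, dy) measurements and u: (N, du) inputs for k = 1..N.
    Returns smoothed means (N+1, dx) and covariances (N+1, dx, dx) for
    k = 0..N, plus the log marginal likelihood sum_k log p(y_k | u_k, D_{k-1})."""
    N, dx = y.shape[0], A.shape[0]
    mf = np.zeros((N + 1, dx)); Sf = np.zeros((N + 1, dx, dx))  # filtered
    mp = np.zeros((N + 1, dx)); Sp = np.zeros((N + 1, dx, dx))  # predicted
    mf[0], Sf[0] = m0, S0
    loglik = 0.0
    for k in range(1, N + 1):
        # Prediction step, Eq. 24
        mp[k] = A @ mf[k - 1] + B @ u[k - 1]
        Sp[k] = A @ Sf[k - 1] @ A.T + Q
        # Correction step, Eq. 27, with gain Sp C^T (C Sp C^T + R)^{-1}
        e = y[k - 1] - C @ mp[k]
        Sy = C @ Sp[k] @ C.T + R
        K = np.linalg.solve(Sy, C @ Sp[k]).T
        mf[k] = mp[k] + K @ e
        Sf[k] = Sp[k] - K @ Sy @ K.T
        # Log evidence, Eq. 26: N(y_k | C mp, Sy)
        _, logdet = np.linalg.slogdet(Sy)
        loglik += -0.5 * (e @ np.linalg.solve(Sy, e) + logdet + e.size * np.log(2 * np.pi))
    # Backward pass, Eq. 29, starting from the last filtered estimate
    ms, Ss = mf.copy(), Sf.copy()
    for k in range(N - 1, -1, -1):
        G = np.linalg.solve(Sp[k + 1], A @ Sf[k]).T  # Sf A^T Sp^{-1}
        ms[k] = mf[k] + G @ (ms[k + 1] - mp[k + 1])
        Ss[k] = Sf[k] + G @ (Ss[k + 1] - Sp[k + 1]) @ G.T
    return ms, Ss, loglik

# Minimal usage on a scalar random walk observed in noise.
rng = np.random.default_rng(0)
y = np.cumsum(rng.normal(scale=0.3, size=50))[:, None] + rng.normal(scale=0.5, size=(50, 1))
u = np.zeros((50, 1))
ms, Ss, ll = kalman_rts(y, u, np.eye(1), np.zeros((1, 1)), np.eye(1),
                        0.09 * np.eye(1), 0.25 * np.eye(1), np.zeros(1), np.eye(1))
print(ms[-1], Ss[-1], ll)
```

Each step solves linear systems with the innovation and predicted covariances, which is where the cubic dependence on the state dimension comes from.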

III-B Hyperparameter estimation

Tuning GP kernel hyperparameters can be challenging when the objective landscape is multi-modal or contains regions of divergence, and maximum likelihood estimation may then lead to poor solutions [15]. Here we use a Laplace approximation of the posterior distribution, for two reasons: first, the prior distributions regularize the objective, which can aid convergence; second, the approximate posterior variance provides a quick way to assess the quality of the selected hyperparameters.

We assume the hyperparameters are a priori independent, so $p(\psi)=p(\gamma)p(l)$. Since both hyperparameters are strictly positive, we employ Gamma prior distributions:

$$p(\gamma)=\mathcal{G}(\gamma\mid\alpha_{\gamma},\beta_{\gamma})\,,\quad p(l)=\mathcal{G}(l\mid\alpha_{l},\beta_{l})\,. \tag{30}$$

We then form a Gaussian approximation of the posterior distribution whose mean $\hat{\psi}$ is the maximum a posteriori (MAP) estimate and whose precision matrix $\Lambda$ is the negative Hessian of the log posterior at that maximum [16]:

\begin{align}
\hat{\psi} &= \underset{\psi\in\Psi}{\arg\max}\ \ln p(y_{1:N}\mid\bar{u}_{1:N};\psi)\,p(\psi) \tag{31}\\
\Lambda_{ij} &= -\frac{\partial^{2}}{\partial\psi_{i}\,\partial\psi_{j}}\ln p(y_{1:N}\mid\bar{u}_{1:N};\psi)\,p(\psi)\Big|_{\psi=\hat{\psi}}\,. \tag{32}
\end{align}

The marginal likelihood $p(y_{1:N}\mid\bar{u}_{1:N};\psi)$ is the product of the evidence terms in Eq. 26, which are computed during filtering. Note that the Hessian only has to be computed once, at $\hat{\psi}$.
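The Laplace approximation of Eqs. 30 to 32 can be sketched with finite differences, as below (Python with NumPy). Several choices here are assumptions: the Gamma densities use a shape-rate parameterization (the text does not state which parameterization is used), the maximization uses plain Newton steps rather than the optimizer used in the experiments, and `toy_loglik` is a hypothetical stand-in for the log marginal likelihood that the Kalman filter would provide.

```python
import math
import numpy as np

def log_gamma_pdf(x, alpha, beta):
    """Gamma log-density, shape-rate parameterization (an assumption)."""
    if x <= 0:
        return -np.inf
    return alpha * math.log(beta) - math.lgamma(alpha) + (alpha - 1) * math.log(x) - beta * x

def fd_gradient(f, x, h=1e-5):
    """Central-difference gradient."""
    g = np.zeros(x.size)
    for i in range(x.size):
        e = np.zeros(x.size); e[i] = h
        g[i] = (f(x + e) - f(x - e)) / (2.0 * h)
    return g

def fd_hessian(f, x, h=1e-4):
    """Central-difference Hessian."""
    n = x.size
    H = np.zeros((n, n))
    for i in range(n):
        for j in range(n):
            ei = np.zeros(n); ei[i] = h
            ej = np.zeros(n); ej[j] = h
            H[i, j] = (f(x + ei + ej) - f(x + ei - ej)
                       - f(x - ei + ej) + f(x - ei - ej)) / (4.0 * h * h)
    return H

def laplace(log_post, psi0, iters=50, tol=1e-9):
    """Newton ascent to the MAP (Eq. 31), then Lambda = -Hessian there (Eq. 32)."""
    psi = np.asarray(psi0, dtype=float)
    for _ in range(iters):
        step = np.linalg.solve(fd_hessian(log_post, psi), fd_gradient(log_post, psi))
        psi = psi - step
        if np.max(np.abs(step)) < tol:
            break
    return psi, -fd_hessian(log_post, psi)

def toy_loglik(psi):
    """Hypothetical stand-in for ln p(y_{1:N} | u_{1:N}; psi), psi = (gamma, l)."""
    gamma, ell = psi
    return -0.5 * ((gamma - 20.0) / 0.5) ** 2 - 0.5 * ((ell - 60.0) / 15.0) ** 2

def log_post(psi):
    return toy_loglik(psi) + log_gamma_pdf(psi[0], 5.0, 0.1) + log_gamma_pdf(psi[1], 5.0, 0.1)

psi_hat, Lam = laplace(log_post, [15.0, 40.0])
print(psi_hat, np.sqrt(np.diag(np.linalg.inv(Lam))))  # MAP and approximate posterior std
```

A log-transform of $\psi$ would keep the Newton iterates positive; it is omitted here to stay close to Eqs. 31 and 32.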

III-C Static nonlinearity estimation

State estimation yields estimates of $T_{k}$ and $\rho_{k}$. However, $\rho_{k}$ only represents the value of $r(T_{i},T_{a})$ at time $t_{k}$. To obtain a functional form for $r(\cdot,\cdot)$, we employ a regression model whose inputs are the estimates of $T_{ik}$ (i.e., the smoothed means $m_{ik}$ for $i=1,\dots,D$) together with the measurements of $T_{ak}$, and whose outputs are the estimates of $\rho_{ik}$ (i.e., the smoothed means $m_{jk}$ with $j=i+D$). We specifically consider a Bayesian polynomial regression model, because low-order polynomials typically suffice to capture the relatively slow variation of convection with temperature, and because we aim to quantify uncertainty in the function estimate. Let $\phi:\mathbb{R}^{2}\rightarrow\mathbb{R}^{D_{\phi}}$ be a polynomial basis expansion mapping a component temperature and the ambient temperature to a $D_{\phi}$-dimensional space, and let $\theta_{i}\in\mathbb{R}^{D_{\phi}}$ be regression coefficients. We consider a likelihood of the form

$$p(m_{jk}\mid m_{ik},T_{ak},\theta_{i})=\mathcal{N}\big(m_{jk}\mid\theta_{i}^{\intercal}\phi(m_{ik},T_{ak}),\sigma_{j}^{2}\big)\,, \tag{33}$$

where $\sigma_{j}^{2}=\frac{1}{N}\sum_{k=1}^{N}S_{jjk}$ is the time-averaged $j$-th diagonal entry of the smoothed covariance $S_{k}$, i.e., the average variance of the estimated GP state. Our prior distribution on the regression weights $\theta_{i}$ is:

$$p(\theta_{i})=\mathcal{N}(\theta_{i}\mid\mu_{i0},\Sigma_{i0})\,. \tag{34}$$

This Gaussian prior distribution is conjugate to the Gaussian likelihood, so the posterior distribution is available in closed form [16]:

\begin{align}
p(\theta_{i}\mid\mathcal{D}_{N}) &= \frac{p(\theta_{i})\prod_{k=1}^{N}p(m_{jk}\mid m_{ik},T_{ak},\theta_{i})}{\int p(\theta_{i})\prod_{k=1}^{N}p(m_{jk}\mid m_{ik},T_{ak},\theta_{i})\,\mathrm{d}\theta_{i}} \tag{35}\\
&= \mathcal{N}(\theta_{i}\mid\mu_{i},\Sigma_{i})\,, \tag{36}
\end{align}

where, using $\phi_{k}=\phi(m_{ik},T_{ak})$ , the parameters are:

\begin{align}
\Sigma_{i} &= \Big(\Sigma_{i0}^{-1}+\sigma_{j}^{-2}\sum_{k=1}^{N}\phi_{k}\phi_{k}^{\intercal}\Big)^{-1} \tag{37}\\
\mu_{i} &= \Sigma_{i}\Big(\sigma_{j}^{-2}\sum_{k=1}^{N}\phi_{k}m_{jk}+\Sigma_{i0}^{-1}\mu_{i0}\Big)\,. \tag{38}
\end{align}

Given the posterior distribution over the regression coefficients, we can derive a predictive distribution over new values of the nonlinear convection function. These are predictions of $r(T_{i},T_{a})$, with a variance that reflects the uncertainty originating from the estimated values of $\rho_{i}$ and $\theta_{i}$ [16]. Let $\phi_{*}=\phi(m_{i*},T_{a*})$ for a new input pair $(m_{i*},T_{a*})$. Then the posterior predictive is:

\begin{align}
p(m_{j*}\mid m_{i*},T_{a*},\mathcal{D}_{N}) &= \int p(m_{j*}\mid m_{i*},T_{a*},\theta_{i})\,p(\theta_{i}\mid\mathcal{D}_{N})\,\mathrm{d}\theta_{i} \tag{39}\\
&= \mathcal{N}\big(m_{j*}\mid\mu_{i}^{\intercal}\phi_{*},\,\phi_{*}^{\intercal}\Sigma_{i}\phi_{*}+\sigma_{j}^{2}\big)\,. \tag{40}
\end{align}

For ease of reference, we denote the mean of the posterior predictive by

$$\hat{r}(T_{i*},T_{a*})=\mu_{i}^{\intercal}\phi(T_{i*},T_{a*}) \tag{41}$$

for arbitrary component and ambient temperatures $T_{i*}$ and $T_{a*}$.
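Eqs. 37, 38 and 40 translate directly into a few lines of NumPy. In the sketch below, the cubic basis in $T_{a}-T_{i}$ (without intercept), the noise variance $\sigma_{j}^{2}$ and the synthetic training pairs generated from the surrogate of Sec. IV-A are all assumptions made for illustration; in the procedure above, the inputs and targets would be the smoothed means $m_{ik}$ and $m_{jk}$, and $\sigma_{j}^{2}$ the averaged smoothed variance.

```python
import numpy as np

def poly_basis(T_i, T_a):
    """Hypothetical cubic basis in d = T_a - T_i (no intercept, so r = 0 at T_i = T_a)."""
    d = np.asarray(T_a, dtype=float) - np.asarray(T_i, dtype=float)
    return np.stack([d, d**2, d**3], axis=-1)

def blr_posterior(Phi, targets, sigma2, mu0, Sigma0):
    """Conjugate Gaussian update of Eqs. 37-38."""
    P0 = np.linalg.inv(Sigma0)
    precision = P0 + Phi.T @ Phi / sigma2
    Sigma = np.linalg.inv(precision)
    mu = np.linalg.solve(precision, Phi.T @ targets / sigma2 + P0 @ mu0)
    return mu, Sigma

def blr_predictive(Phi_star, mu, Sigma, sigma2):
    """Posterior predictive mean and variance of Eq. 40 (one row per test point)."""
    mean = Phi_star @ mu
    var = np.einsum("ni,ij,nj->n", Phi_star, Sigma, Phi_star) + sigma2
    return mean, var

# Synthetic stand-ins for (m_ik, m_jk), generated from r = (T_a - T_i)^3 / 100.
T_a = 21.0
m_i = np.linspace(21.0, 40.0, 200)
m_j = (T_a - m_i) ** 3 / 100.0
sigma2 = 1e-4
mu, Sigma = blr_posterior(poly_basis(m_i, T_a), m_j, sigma2, np.zeros(3), 1e3 * np.eye(3))
mean, var = blr_predictive(poly_basis(np.array([30.0]), T_a), mu, Sigma, sigma2)
print(mu, mean, var)  # coefficients near [0, 0, 0.01]; mean near -7.29
```

Solving with the posterior precision rather than multiplying by its inverse is a small numerical safeguard, since cubic features on a wide temperature range are poorly conditioned.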

IV Experiments

We perform experiments (code: github.com/biaslab/CCTA2024-BIDconvection) on a heated rod demonstrator consisting of 3 aluminum blocks separated by insulating discs (see Figure 1, top). The first experiment involves a simulation of the system and the second involves measurements from the physical device. Only the first block receives heat input. The conductance matrix is of the form:

$$K=\begin{bmatrix}-k_{12}&k_{12}&0\\ k_{12}&-(k_{12}+k_{23})&k_{23}\\ 0&k_{23}&-k_{23}\end{bmatrix}\,, \tag{42}$$

where the $k_{ij}$ are assumed to be known or are estimated separately using a maximum likelihood or MAP estimator. Further details are given in the following two subsections.

Figure 1: (Top) Photo of the heated rod demonstrator, with 3 blocks, 2 insulation discs, 13 temperature sensors and 1 heater. (Bottom left) Data simulated according to Eq. 3 with parameters described in Sec. IV-A. (Bottom right) Data measured from the demonstrator with parameters described in Sec. IV-B.

IV-A Simulated data

In the simulated system, each aluminum block is modeled as a uniform component. The conductances of the nylon pads are set to $k_{12}=k_{23}=10\,\mathrm{W}/\mathrm{K}$ and the product of mass and specific heat capacity is set to $m_{i}\mathrm{c}_{p,i}=1000\,\mathrm{J}/\mathrm{K}$ for all components. The ambient temperature is fixed to $21\,^{\circ}\mathrm{C}$, the heat transfer coefficient $h_{a}$ is set to $2.0\,\mathrm{W}/(\mathrm{m}^{2}\mathrm{K})$ and the outer surface areas of all blocks are set to $a_{i}=1.0\,\mathrm{m}^{2}$. The first block receives $100\,\mathrm{W}$ of heating after $100$ seconds. We used a simple surrogate for the nonlinear convection function: $r(T_{i},T_{a})=(T_{a}-T_{i})^{3}/100$. Its main feature is that when the block temperature is much higher than the ambient temperature, convection causes the block to cool rapidly. We simulated the temperatures forward in time for $1000$ steps of $\Delta t=1$ second, using DifferentialEquations.jl [17]. Finally, we added zero-mean Gaussian noise with a variance of $10^{-3}$ to the simulated states to generate artificial measurements. The parameters of the state prior distribution (Eq. 18) are $\hat{m}_{0}=[21\ 21\ 21]^{\intercal}$ and $\hat{S}_{0}=I$. Bayesian smoothing was implemented with RxInfer.jl, which also produces the marginal likelihood in Eq. 31 [18]. Derivatives were computed with Optim.jl [19].
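For readers without a Julia setup, a rough Python sketch of this simulation is given below (assuming NumPy). It swaps in forward Euler with step $\Delta t$ for the DifferentialEquations.jl solver, applies the heating from $t=100$ s onward, and treats the conductances directly as the entries of $K$ in Eq. 42; the function name and its defaults are hypothetical.

```python
import numpy as np

def simulate_rod(n_steps=1000, dt=1.0, T_a=21.0, k=10.0, mcp=1000.0,
                 h_a=2.0, area=1.0, power=100.0, t_on=100.0,
                 noise_var=1e-3, seed=0):
    """Forward-Euler simulation of Eq. 3 for the three-block rod of Sec. IV-A,
    returning true temperatures and noisy measurements for t = 0..n_steps*dt."""
    K = np.array([[-k, k, 0.0], [k, -2.0 * k, k], [0.0, k, -k]])  # Eq. 42
    ha = h_a * area * np.ones(3)
    F = K - np.diag(ha)                               # Eq. 4
    G = np.hstack([ha[:, None], np.eye(3)])
    Minv = np.eye(3) / mcp
    T = np.full((n_steps + 1, 3), T_a)                # start at ambient temperature
    for n in range(1, n_steps + 1):
        t_prev = (n - 1) * dt
        u = np.array([power if t_prev >= t_on else 0.0, 0.0, 0.0])
        u_bar = np.concatenate([[T_a], u])
        r = (T_a - T[n - 1]) ** 3 / 100.0             # surrogate r(T_i, T_a)
        T[n] = T[n - 1] + dt * Minv @ (F @ T[n - 1] + r + G @ u_bar)
    rng = np.random.default_rng(seed)
    y = T + rng.normal(scale=np.sqrt(noise_var), size=T.shape)
    return T, y

T_sim, y_sim = simulate_rod()
print(T_sim[-1])  # final temperatures, decreasing from block 1 to block 3
```

Forward Euler is adequate here because the slowest-to-fastest time constants of $M^{-1}F$ are on the order of tens to hundreds of seconds, well above $\Delta t=1$ s; a stiff or higher-order solver would be the safer choice for other parameter values.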

The top row of Figure 2 shows results of the proposed method. The first three subplots (from left to right) show the value of $r(T_{i},\,T_{a}=21)$ for the temperatures that the blocks experienced during the simulation (solid lines) as well as the GPLFM's state estimates (red = $T_{1}$, blue = $T_{2}$ and orange = $T_{3}$; marker size proportional to state variance). The black dotted line with ribbon shows the mean and standard deviation of the posterior predictive distribution in Eq. 40, under a third-order Bayesian regression model with $\mu_{i0}=[0\,0\,0]^{\intercal}$ and $\Sigma_{i0}=10^{3}I$. To test this estimate of the convection function, we simulated the system for $1000$ seconds from starting temperatures of $25\,^{\circ}\mathrm{C}$, with a heat input of $100\,\mathrm{W}$ from $120$ to $600$ seconds. The fourth subplot shows the true simulation (dotted lines) and a simulation using the mean of the posterior predictive (i.e., Eq. 41; solid lines). The fifth subplot shows the error between the two simulations, with an overall mean-squared error over time of $0.011$.

We compare the GPLFM with an offline estimate of the effect of the nonlinear convection function. First, we run the state-space model without a nonlinear convection term on the simulated temperature measurements (note that this requires a small noise injection of $Q=10^{-8}I$). Then, we calculate the post-fit residuals, i.e., $y_{ik}-m_{ik}$ for all blocks $i$ and times $k$. Lastly, we fit a standard third-order polynomial regression model to the residuals. The first three subplots on the bottom row of Figure 2 show the residuals as a function of the temperature state estimates (red = block 1, blue = block 2, orange = block 3), with the estimated polynomial overlaid (black dotted line). Note that the polynomials fit the residuals well. The fourth subplot shows the effect of simulating the temperatures forward with the polynomial regression instead of the true convection function, and the fifth subplot shows the error between the simulation with the true convection function and that with the regression function. The error is much larger in this case (a mean-squared error over time of $6.473$), which highlights the value of estimating the effect of nonlinear convection dynamically with the GPLFM.

For the hyperparameter optimization of the GPLFM, the prior distributions had parameters $\alpha_{l}=\alpha_{\gamma}=5.0$ and $\beta_{l}=\beta_{\gamma}=0.1$, which favour estimates in the range of $1$ to $100$. Figure 3 shows the prior and posterior distributions for each parameter. For $l$, the posterior distribution moves to much larger length scales but also becomes wider, indicating a high degree of uncertainty in this estimate. For $\gamma$, the posterior concentrates sharply around roughly $20$, with high certainty.

IV-B Measured data

In the physical device, block 1 weighs $m_{1}=0.276$ kg, block 2 weighs $m_{2}=0.160$ kg and block 3 weighs $m_{3}=0.74$ kg. The specific heat capacity of aluminum is $\mathrm{c}_{p,i}=910\,\mathrm{J}/(\mathrm{kg}\,\mathrm{K})$. There are 13 thermistors: #1 to #5 on block 1, #6 to #10 on block 2 (sensor #7 malfunctioned), and #11 to #13 on block 3. The measurements within a block are highly similar, so we select one sensor per block as representative of the three components (#4 = block 1, #9 = block 2, and #12 = block 3). The conductances of the pads and the heat transfer coefficient $h_{a}$ were identified in a separate experiment at steady state, yielding $k_{12}=0.272\,\mathrm{W}/\mathrm{K}$, $k_{23}=0.218\,\mathrm{W}/\mathrm{K}$ and $h_{a}=7.75\,\mathrm{W}/(\mathrm{m}^{2}\mathrm{K})$. The surface areas of the blocks are $a_{1}=0.0066\,\mathrm{m}^{2}$, $a_{2}=0.0055\,\mathrm{m}^{2}$ and $a_{3}=0.0037\,\mathrm{m}^{2}$.

Figure 4 (top row) shows the GP states (marker size proportional to state variance) as a function of the temperature states (block 1 = red, block 2 = blue, block 3 = orange), with the mean of the posterior predictive distribution of the polynomial regression model overlaid (dotted black line). In general, the GP states vary fairly smoothly with temperature and the polynomial fits well. Only the first block shows a feature that is not captured by the polynomial regression model: a rapid drop when the system first starts deviating from the ambient temperature. The bottom row compares open-loop simulations of the state-space model with (GPLFM; using the mean posterior predictive of the polynomial regression) and without (SSM) an identified nonlinear convection term. The error is the difference between the temperature measurements and the simulated temperatures. Note that the error of the simulation with the identified nonlinear convection term is closer to zero than that of the simulation without it.

V Discussion

The value of identifying nonlinear convection quickly and with quantified uncertainty is that it enables optimizing subsequent steps in a control process, for example maintaining precision in a motion control system or optimizing the placement of cooling fans.

We chose the exponential kernel covariance function because it generates a scalar SDE. One could alternatively choose a higher-order Whittle-Matérn kernel, but that leads to vector-valued SDEs: a smoothness parameter of $3/2$ generates a two-dimensional system and $5/2$ a three-dimensional one [10]. Recall that the smoothing procedure scales cubically in the state dimension. With $D$ components and an $L$-dimensional SDE per GP, the state has dimension $(1+L)D$, so the cost $O((1+L)^{3}D^{3}N)$ increases drastically with $L$.
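To make the state-dimension argument concrete, the sketch below (Python with NumPy) writes down the two-dimensional SDE of the Matérn kernel with smoothness $3/2$, in the companion form commonly used for temporal GPs (see [10]), and recovers its stationary covariance $\mathrm{diag}(\gamma^{2},\lambda^{2}\gamma^{2})$ by solving a Lyapunov equation; the same solver returns $\gamma^{2}$ for the scalar SDE of Eq. 8. The helper names are hypothetical, and the count $(1+L)D$ assumes one $L$-dimensional SDE per component.

```python
import numpy as np

def matern32_sde(gamma, ell):
    """Companion-form SDE of the Matern kernel with smoothness 3/2:
    d/dt [f, f'] = F [f, f'] + L w, with white-noise spectral density q."""
    lam = np.sqrt(3.0) / ell
    F = np.array([[0.0, 1.0], [-lam**2, -2.0 * lam]])
    L = np.array([[0.0], [1.0]])
    q = 4.0 * lam**3 * gamma**2
    return F, L, q

def stationary_covariance(F, L, q):
    """Solve the Lyapunov equation F P + P F^T + q L L^T = 0 via Kronecker products."""
    n = F.shape[0]
    I = np.eye(n)
    lhs = np.kron(I, F) + np.kron(F, I)
    rhs = -(q * L @ L.T).reshape(-1)
    return np.linalg.solve(lhs, rhs).reshape(n, n)

def state_dimension(D, sde_dim):
    """D temperatures plus one sde_dim-dimensional GP state per component."""
    return (1 + sde_dim) * D

gamma, ell = 2.0, 5.0
lam = np.sqrt(3.0) / ell
# Exponential kernel (Eq. 8): scalar SDE with q = v_c, stationary variance gamma^2.
print(stationary_covariance(np.array([[-lam]]), np.array([[1.0]]), 2.0 * lam * gamma**2))
# Matern-3/2: two-dimensional SDE, stationary covariance diag(gamma^2, lam^2 gamma^2).
print(stationary_covariance(*matern32_sde(gamma, ell)))
for sde_dim in (1, 2, 3):
    n = state_dimension(3, sde_dim)
    print(sde_dim, n, n**3)  # state size and the n^3 factor in the smoothing cost
```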

VI Conclusions

We proposed a procedure to estimate the effects of nonlinear convection in a lumped-element heat transfer dynamics model. The procedure is based on a state-space model with known conductance and linear convection effects, augmented with a Gaussian process. Through Bayesian smoothing, we obtain estimates of the temperature and GP states, and through Laplace's method, we obtain approximate posterior distributions for the GP kernel hyperparameters. We fitted a Bayesian polynomial regression model that predicts the GP states from the temperature states and ambient temperature measurements, and used it to simulate the effects of nonlinear convection.

References

[1] E. Evers, N. van Tuijl, R. Lamers, and T. Oomen, “Identifying thermal dynamics for precision motion control,” IFAC-PapersOnLine, vol. 52, no. 15, pp. 73–78, 2019.
[2] E. Evers, N. van Tuijl, R. Lamers, B. de Jager, and T. Oomen, “Fast and accurate identification of thermal dynamics for precision motion control: Exploiting transient data and additional disturbance inputs,” Mechatronics, vol. 70, p. 102401, 2020.
[3] S. Cai, Z. Wang, S. Wang, P. Perdikaris, and G. E. Karniadakis, “Physics-informed neural networks for heat transfer problems,” Journal of Heat Transfer, vol. 143, no. 6, p. 060801, 2021.
[4] M. Alvarez, D. Luengo, and N. D. Lawrence, “Latent force models,” in Artificial Intelligence and Statistics, pp. 9–16, PMLR, 2009.
[5] M. A. Alvarez, D. Luengo, and N. D. Lawrence, “Linear latent force models using Gaussian processes,” IEEE Transactions on Pattern Analysis and Machine Intelligence, vol. 35, pp. 2693–2705, 2013.
[6] S. Särkkä, M. A. Alvarez, and N. D. Lawrence, “Gaussian process latent force models for learning and stochastic control of physical systems,” IEEE Transactions on Automatic Control, vol. 64, pp. 2953–2960, 2018.
[7] T. J. Rogers, K. Worden, and E. J. Cross, “Bayesian joint input-state estimation for nonlinear systems,” Vibration, vol. 3, no. 3, pp. 281–303, 2020.
[8] T. J. Rogers and T. Friis, “A latent restoring force approach to nonlinear system identification,” Mechanical Systems and Signal Processing, vol. 180, p. 109426, 2022.
[9] S. Ghosh, S. Reece, A. Rogers, S. Roberts, A. Malibari, and N. R. Jennings, “Modeling the thermal dynamics of buildings: A latent-force-model-based approach,” ACM Transactions on Intelligent Systems and Technology, vol. 6, no. 1, pp. 1–27, 2015.
[10] J. Hartikainen and S. Särkkä, “Kalman filtering and smoothing solutions to temporal Gaussian process regression models,” in IEEE International Workshop on Machine Learning for Signal Processing, pp. 379–384, 2010.
[11] J. Hartikainen and S. Särkkä, "Sequential inference for latent force models," arXiv:1202.3730, 2012.
[12] C. E. Rasmussen and C. K. Williams, Gaussian processes for machine learning. MIT Press, 2006.
[13] S. Särkkä and A. Solin, Applied stochastic differential equations, vol. 10. Cambridge University Press, 2019.
[14] S. Särkkä and L. Svensson, Bayesian filtering and smoothing, vol. 17. Cambridge University Press, 2023.
[15] A. Svensson, J. Dahlin, and T. B. Schön, “Marginalizing Gaussian process hyperparameters using sequential Monte Carlo,” in IEEE International Workshop on Computational Advances in Multi-Sensor Adaptive Processing, pp. 477–480, 2015.
[16] K. P. Murphy, Machine learning: a probabilistic perspective. MIT Press, 2012.
[17] C. Rackauckas and Q. Nie, “DifferentialEquations.jl–a performant and feature-rich ecosystem for solving differential equations in Julia,” Journal of Open Research Software, vol. 5, no. 1, 2017.
[18] D. Bagaev, A. Podusenko, and B. de Vries, "RxInfer: A Julia package for reactive real-time Bayesian inference," Journal of Open Source Software, vol. 8, no. 84, p. 5161, 2023.
[19] P. Mogensen and A. Riseth, “Optim: A mathematical optimization package for Julia,” Journal of Open Source Software, vol. 3, 2018.
