Bayesian grey-box identification
of nonlinear convection effects in heat transfer dynamics

Wouter M. Kouw, Caspar Gruijthuijsen, Lennart Blanken, Enzo Evers, Timothy Rogers

Kouw and Blanken are with TU Eindhoven. Gruijthuijsen, Evers and Blanken are with Sioux Technologies B.V., Eindhoven, the Netherlands, and are supported by the Intelligent Motion Control consortium project. Rogers is with Sheffield University, Sheffield, United Kingdom, and is supported by EPSRC u/w/002140/1. Corresponding email: [email protected]
Abstract

We propose a computational procedure for identifying convection in heat transfer dynamics. The procedure is based on a Gaussian process latent force model, consisting of a white-box component (i.e., known physics) for the conduction and linear convection effects and a Gaussian process that acts as a black-box component for the nonlinear convection effects. States are inferred through Bayesian smoothing and we obtain approximate posterior distributions for the kernel covariance function’s hyperparameters using Laplace’s method. The nonlinear convection function is recovered from the Gaussian process states using a Bayesian regression model. We validate the procedure by simulation error using the identified nonlinear convection function, on both data from a simulated system and measurements from a physical assembly.

I Introduction

Motion control systems are becoming ever more demanding in terms of throughput and accuracy. It has now become highly important to consider thermally induced deformations that cause slow drifts during positioning [1, 2]. The challenge in modelling these deformations lies in capturing the effects of convection on heat transfer. A pure model-driven approach would describe the airflow surrounding a motion control system in detail, but this requires extensive expert knowledge and considerable computational resources. Recently, hybrid model- (white-box) / data-driven (black-box) approaches have been proposed to approximate airflow dynamics, such as physics-informed neural networks [3]. We propose such a hybrid model- / data-driven method, i.e., a grey-box, where the effects of conduction, linear convection and heat input are expressed explicitly and the nonlinear effects of convection are captured by a regression model.

Our method is based on Gaussian process latent force models (GPLFM), which were originally developed to estimate unmeasured forces in mechanical systems [4, 5, 6, 7]. For example, one could identify the strength of the restoring force in a nonlinear oscillator [8]. But GPLFMs have been applied to other domains as well, such as identifying thermal dynamics in structures [9]. We extend this work by tackling the effects of convection in heat transfer dynamics. GPLFMs are based on the conversion of temporal Gaussian processes (GP) to stochastic differential equations, allowing them to be incorporated into state-space models [10, 11]. This step ensures that GPLFMs are much faster grey-box identification methods than physics-informed neural networks and that uncertainty estimates on predictions can be calculated easily.

Our contribution consists of the application of a GPLFM to identify convection in heat transfer dynamics, quantification of uncertainty over the identified nonlinear convection function and validation of the proposed procedure on both simulated data and physical measurements.

II Model specification

We briefly review heat transfer dynamics and GPLFMs. We then demonstrate how the two may be combined.

II-A Heat transfer dynamics

Consider a lumped-element model of a thermal system with $D$ components. Let $T_i(t) \in \mathbb{R}$ be the temperatures at time $t$ in each of the components, $T_a(t) \in \mathbb{R}$ be the ambient temperature and $u_i(t) \in \mathbb{R}^{+}$ be heat input. The evolution of these temperatures is assumed to be governed predominantly by conduction, convection and heat input:

$$M\dot{T} = \underbrace{KT}_{\text{conduction}} + \underbrace{h\big(T, T_a\big)}_{\text{convection}} + \underbrace{u}_{\text{input}}\,. \tag{1}$$

The dependence on $t$ will often be omitted for the sake of brevity in the remainder of the article. The diagonal matrix $M$ represents the mass $m_i \in \mathbb{R}^{+}$ of each component multiplied by the specific heat capacity $\mathrm{c}_{p,i} \in \mathbb{R}^{+}$ of the component's material. The conductance matrix $K$ describes how heat is shared between components, i.e., $k_{i,j} \in \mathbb{R}^{+}$ indicates how much heat is conducted from component $i$ to component $j$.

Convection is the loss of heat due to exchange with the medium surrounding the mechanical system. It can be split into a linear and a nonlinear term:

$$\underbrace{h\big(T_i, T_a\big)}_{\text{total cooling}} = \underbrace{h_a a_i \big(T_a - T_i\big)}_{\text{linear convection}} + \underbrace{r\big(T_i, T_a\big)}_{\text{nonlinear convection}}\,. \tag{2}$$

The linear convection term describes the loss of heat proportional to the surface area $a_i$ times the difference between the temperature of the material $T_i$ and the ambient temperature $T_a$. We assume uniform cooling over the surface with a heat transfer coefficient $h_a$. The remainder is a nonlinear function of the temperature of the material and the ambient temperature. We call this $r(T_i, T_a)$ the nonlinear convection function. Proper physics-driven modelling would employ computational fluid dynamics and include explicit dependencies on the airflow of the surrounding environment. Here, we do not model these terms but will instead capture the effect of the nonlinear convection function, i.e., the output of $r(T, T_a)$, using a black-box function approximator (cf. Section II-C).

The ambient temperature is assumed to be measured and will be treated as an input. If we incorporate the linear convection term into the governing equations, they become:

$$\dot{T} = M^{-1}FT + M^{-1}r\big(T, T_a\big) + M^{-1}G\bar{u}\,, \tag{3}$$

where $\bar{u} = [T_a\ u_1\ \dots\ u_D]^{\intercal}$ and

$$F = K - \begin{bmatrix} h_a a_1 & & \\ & \ddots & \\ & & h_a a_D \end{bmatrix}, \quad G = \begin{bmatrix} \begin{matrix} h_a a_1 \\ \vdots \\ h_a a_D \end{matrix} & I \end{bmatrix}. \tag{4}$$
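As a minimal sketch of how Eqs. 3 and 4 fit together, the following NumPy snippet assembles $F$ and $G$ and evaluates $\dot{T}$ with the nonlinear convection term switched off. The three-component chain, all numerical values, the sign convention chosen for $K$ (negative diagonal, rows summing to zero) and the helper name `build_system` are illustrative assumptions, not taken from the experiments in this paper.

```python
import numpy as np

def build_system(K, h_a, areas):
    """Return F and G of Eq. 4 for conductance matrix K, coefficient h_a and surface areas a_i."""
    areas = np.asarray(areas, dtype=float)
    D = areas.size
    lin = h_a * areas                           # h_a * a_i for every component
    F = K - np.diag(lin)                        # conduction minus linear convection loss
    G = np.hstack([lin[:, None], np.eye(D)])    # first column acts on T_a, the rest on u
    return F, G

# Hypothetical 3-component chain (assumed values, for illustration only).
K = np.array([[-2.0,  2.0,  0.0],
              [ 2.0, -4.0,  2.0],
              [ 0.0,  2.0, -2.0]])
M = np.diag([10.0, 20.0, 10.0])                 # mass times specific heat capacity
F, G = build_system(K, h_a=0.5, areas=[1.0, 2.0, 1.0])

T = np.array([25.0, 25.0, 25.0])                # component temperatures
T_a = 20.0                                      # measured ambient temperature
u = np.zeros(3)                                 # no heat input
u_bar = np.concatenate([[T_a], u])              # augmented input of Eq. 3
r = np.zeros(3)                                 # nonlinear convection set to zero here

T_dot = np.linalg.solve(M, F @ T + r + G @ u_bar)   # right-hand side of Eq. 3
print(T_dot)
```

With uniform component temperatures the conduction term vanishes, so the printed derivative reflects linear convection towards the ambient temperature only.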

II-B Temporal Gaussian Processes

Temporal Gaussian processes describe distributions over functions of time [12]. We shall use these to estimate the state $\rho(t) = r(T(t), T_a(t))$ over time, and later fit a regression model from $T(t)$ and $T_a(t)$ to $\rho(t)$ (see Section III-C) [5, 6]. To do so, we must construct a dynamical form of a temporal Gaussian process. Consider a GP prior distribution over functions $\rho(t)$

$$p\big(\rho(t); \psi\big) = \mathcal{GP}\big(\rho(t) \mid 0, \kappa_{\psi}(t, t')\big)\,, \tag{5}$$

with kernel covariance function $\kappa_{\psi}$ and hyperparameters $\psi$. We have chosen a zero-mean prior distribution because we have no reason to believe our function of interest has a systematic offset. For the kernel covariance function, we select the lowest order of the Whittle-Matérn class, namely the exponential covariance function

$$\kappa_{\psi}(t, t') = \gamma^{2} \exp\Big(-\frac{\sqrt{3}}{l}\,|t - t'|\Big)\,, \tag{6}$$

with scale hyperparameters $\psi = (\gamma, l)$ [12]. Note that this kernel is stationary, i.e., only a function of $t - t'$. The choice for an exponential covariance is based on two qualitative features: it is flexible (we make no strong smoothness assumptions) and it leads to a scalar dynamical system (see also Sec. V).

The exponential covariance function has a Fourier dual, the power spectral density [10]:

$$\mathcal{K}_{\psi}(\omega) \propto \frac{2\lambda\gamma^{2}}{\lambda^{2} + \omega^{2}}\,, \tag{7}$$

where $\lambda = \sqrt{3}/l$. Factorization of the denominator produces a transfer function that serves as the state transition in the stochastic differential equation (SDE) [10]:

$$\dot{\rho}(t) = -\lambda\rho(t) + w(t)\,. \tag{8}$$

The white noise process $w(t)$ has a spectral density equal to the numerator of Eq. 7, $v_c = 2\lambda\gamma^{2}$.
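The correspondence between the kernel of Eq. 6 and the SDE of Eq. 8 can be checked numerically. The sketch below uses the standard exact discretization of a scalar linear SDE (transition $e^{-\lambda\Delta t}$ and noise variance $v_c(1 - e^{-2\lambda\Delta t})/(2\lambda)$), which is not derived in the text above, and compares the resulting stationary autocovariance with the kernel. The hyperparameter values and function names are assumptions chosen for illustration.

```python
import numpy as np

def exp_kernel(tau, gamma, ell):
    """Exponential covariance of Eq. 6 as a function of the lag tau = t - t'."""
    lam = np.sqrt(3.0) / ell
    return gamma**2 * np.exp(-lam * np.abs(tau))

gamma, ell, dt = 0.7, 2.0, 0.1                  # assumed values, for illustration only
lam = np.sqrt(3.0) / ell
v_c = 2.0 * lam * gamma**2                      # spectral density of w(t)

# Exact discretization of Eq. 8: rho_k = a * rho_{k-1} + w_k, with Var[w_k] = q.
a = np.exp(-lam * dt)
q = v_c * (1.0 - np.exp(-2.0 * lam * dt)) / (2.0 * lam)

# Stationary variance solves P = a^2 * P + q.
P_inf = q / (1.0 - a**2)

# Autocovariance of the stationary discrete process at lags 0, dt, 2*dt, ...
lags = np.arange(50)
cov_sde = a**lags * P_inf
cov_kernel = exp_kernel(lags * dt, gamma, ell)

print(P_inf, gamma**2)
print(np.max(np.abs(cov_sde - cov_kernel)))
```

Under these assumptions the stationary variance equals $\gamma^{2}$ and the two autocovariance sequences agree to numerical precision.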

II-C Augmented discrete-time model

In this section, we will augment the system of differential equations (Eq. 3) with the SDE representation of the Gaussian process. Note that the nonlinear convection function $r(T, T_a)$ is a vector and that Eq. 8 describes a scalar function. We therefore pose an independent GP SDE $\rho_i(t)$ for every component:

$$r(T(t), T_a(t)) = \begin{bmatrix} r(T_1(t), T_a(t)) \\ \vdots \\ r(T_D(t), T_a(t)) \end{bmatrix} \approx \begin{bmatrix} \rho_1(t) \\ \vdots \\ \rho_D(t) \end{bmatrix}\,. \tag{9}$$

Using $\rho(t) = [\rho_1(t) \dots \rho_D(t)]^{\intercal}$, we can reformulate Eq. 3 as an augmented system:

$$\begin{bmatrix} \dot{T} \\ \dot{\rho} \end{bmatrix} = \begin{bmatrix} M^{-1}F & M^{-1} \\ 0 & -\lambda I \end{bmatrix} \begin{bmatrix} T \\ \rho \end{bmatrix} + \begin{bmatrix} M^{-1}G \\ 0 \end{bmatrix} \bar{u} + \begin{bmatrix} 0 \\ I \end{bmatrix} w\,, \tag{10}$$

where $w$ stacks the $D$ independent white noise processes, and $\lambda$ and $\gamma$ are shared across all GP states.

We shall discretize the system using a regular sampling interval $\Delta t = t_k - t_{k-1}$ for all steps $k$. Let $x_k = [T_{1k} \dots T_{Dk}\ \rho_{1k} \dots \rho_{Dk}]^{\intercal}$. The system then becomes:

$$x_k = Ax_{k-1} + B\bar{u}_k + w_k\,, \tag{11}$$

with state transition matrix $A$ and control matrix $B$

$$A = \exp\Big(\Delta t \begin{bmatrix} M^{-1}F & M^{-1} \\ 0 & -\lambda I \end{bmatrix}\Big)\,, \quad B = \Delta t \begin{bmatrix} M^{-1}G \\ 0 \end{bmatrix}\,. \tag{12}$$
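A minimal sketch of the discretization in Eq. 12 is given below. To stay dependency-free it uses a small scaling-and-squaring Taylor routine, `expm_series`, as a stand-in for a library matrix exponential (such as `scipy.linalg.expm`). That helper, the function name `discretize` and the single-component example values are assumptions made for illustration.

```python
import numpy as np

def expm_series(X, terms=20, squarings=8):
    """Matrix exponential via scaling-and-squaring of a truncated Taylor series
    (illustrative stand-in for a library routine)."""
    Y = X / 2.0**squarings
    E = np.eye(X.shape[0])
    term = np.eye(X.shape[0])
    for n in range(1, terms + 1):
        term = term @ Y / n              # accumulates Y^n / n!
        E = E + term
    for _ in range(squarings):
        E = E @ E                        # exp(X) = exp(Y)^(2^squarings)
    return E

def discretize(M, F, G, lam, dt):
    """Return (A, B) of Eq. 12 for the augmented state [T; rho]."""
    D = M.shape[0]
    Minv = np.linalg.inv(M)
    A_c = np.block([[Minv @ F, Minv],
                    [np.zeros((D, D)), -lam * np.eye(D)]])   # system matrix of Eq. 10
    A = expm_series(dt * A_c)
    B = dt * np.vstack([Minv @ G, np.zeros((D, G.shape[1]))])
    return A, B

# Hypothetical single-component system (D = 1); values are illustrative only.
M = np.array([[10.0]])
F = np.array([[-0.5]])                   # K = 0 and h_a * a_1 = 0.5
G = np.array([[0.5, 1.0]])               # columns act on T_a and u_1
lam, dt = np.sqrt(3.0) / 2.0, 0.1
A, B = discretize(M, F, G, lam, dt)
print(A)
print(B)
```

Because the continuous-time matrix is block upper-triangular, the lower-left block of $A$ stays zero: the GP states drive the temperatures, but not the other way around.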

The discrete-time noise $w_k$ is zero-mean Gaussian distributed, with covariance matrix

$$Q = \int_{0}^{\Delta t} \exp(At) \begin{bmatrix} 0 \\ I \end{bmatrix} (v_c I) \begin{bmatrix} 0 \\ I \end{bmatrix}^{\intercal} \exp(At)^{\intercal}\, \mathrm{d}t\,. \tag{13}$$

We approximate this integral using a first-order Taylor approximation of the matrix exponential: $\exp(At) \approx I + At$, where $A$ in Eq. 13 denotes the continuous-time system matrix of Eq. 10 rather than the discretized transition matrix of Eq. 12. This provides an analytic expression that can be easily differentiated (important for Sec. III-B). Under this approximation, Eq. 13 evaluates to

$$Q \approx \begin{bmatrix} Q_{11} & Q_{12} \\ Q_{21} & Q_{22} \end{bmatrix}\,, \tag{14}$$

with block matrices

\begin{align}
Q_{11} &= \tfrac{1}{3}\Delta t^{3}\, v_c\, M^{-2} \tag{15a}\\
Q_{12} &= Q_{21} = \big(\tfrac{1}{2}\Delta t^{2} - \tfrac{1}{3}\lambda\Delta t^{3}\big)\, v_c\, M^{-1} \tag{15b}\\
Q_{22} &= \big(\Delta t - \lambda\Delta t^{2} + \tfrac{1}{3}\lambda^{2}\Delta t^{3}\big)\, v_c\, I\,, \tag{15c}
\end{align}

where $M^{-2} = M^{-1}M^{-\intercal}$ because $M$ is diagonal.

Note that $A$ depends on the hyperparameter $\lambda$ and $Q$ depends on both $\lambda$ and $\gamma$; they will henceforth be referred to as $A_{\psi}$ and $Q_{\psi}$.
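The block expressions for $Q$ can be cross-checked by integrating Eq. 13 numerically under the same first-order approximation. In the sketch below, evaluating the integrand block by block gives a top-left block proportional to $M^{-1}M^{-\intercal}$ (i.e. $M^{-2}$ for diagonal $M$). The Gauss-Legendre quadrature, the two-component example values and the function names are assumptions made for illustration.

```python
import numpy as np

def q_first_order(M, lam, gamma, dt):
    """Closed-form blocks of Q under exp(A t) ~ I + A t."""
    D = M.shape[0]
    v_c = 2.0 * lam * gamma**2
    Minv = np.linalg.inv(M)
    Q11 = dt**3 / 3.0 * v_c * Minv @ Minv.T
    Q12 = (dt**2 / 2.0 - lam * dt**3 / 3.0) * v_c * Minv
    Q22 = (dt - lam * dt**2 + lam**2 * dt**3 / 3.0) * v_c * np.eye(D)
    return np.block([[Q11, Q12], [Q12.T, Q22]])

def q_quadrature(M, F, lam, gamma, dt, nodes=4):
    """Gauss-Legendre quadrature of Eq. 13 with the first-order approximation."""
    D = M.shape[0]
    v_c = 2.0 * lam * gamma**2
    Minv = np.linalg.inv(M)
    A_c = np.block([[Minv @ F, Minv],
                    [np.zeros((D, D)), -lam * np.eye(D)]])   # continuous-time matrix
    L = np.vstack([np.zeros((D, D)), np.eye(D)])             # noise input matrix
    x, w = np.polynomial.legendre.leggauss(nodes)
    t = 0.5 * dt * (x + 1.0)                                 # map nodes to [0, dt]
    Q = np.zeros((2 * D, 2 * D))
    for ti, wi in zip(t, w):
        Phi = np.eye(2 * D) + A_c * ti                       # first-order exp(A_c t)
        Q += 0.5 * dt * wi * v_c * (Phi @ L @ L.T @ Phi.T)
    return Q

# Hypothetical two-component system; values are illustrative only.
M = np.diag([10.0, 20.0])
F = np.array([[-2.5, 2.0], [2.0, -3.0]])
lam, gamma, dt = np.sqrt(3.0) / 2.0, 0.7, 0.1
Q_closed = q_first_order(M, lam, gamma, dt)
Q_quad = q_quadrature(M, F, lam, gamma, dt)
print(np.max(np.abs(Q_closed - Q_quad)))
print(np.linalg.eigvalsh(Q_closed).min())
```

Since the integrand is a quadratic polynomial in $t$, a four-node rule integrates it exactly, so any discrepancy would point to an algebra error rather than quadrature error. The smallest eigenvalue is printed as a check that the approximate $Q$ remains a valid covariance matrix.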

II-D Probabilistic state-space model

Our goal will be to infer the temperature and GP states, for which we require a probabilistic state-space model. If we integrate out the process noise instance $w_k$, then the distribution of the state $x_k$ given the previous state $x_{k-1}$ is Gaussian:

$$p(x_k \mid x_{k-1}, \bar{u}_k; \psi) = \mathcal{N}(x_k \mid A_{\psi}x_{k-1} + B\bar{u}_k, Q_{\psi})\,. \tag{16}$$

We assume that noisy measurements of the temperatures are available:

$$p(y_k \mid x_k) = \mathcal{N}(y_k \mid Cx_k, R)\,, \tag{17}$$

where $C$ indicates which components are measured and $R$ is the measurement noise covariance matrix.

The prior distribution of the temperatures is assumed to be Gaussian:

$$p(T_0) = \mathcal{N}\big(T_0 \mid \hat{m}_0, \hat{S}_0\big)\,. \tag{18}$$

For the SDE representation to match the stationary temporal Gaussian process, it must start from the steady-state solution of the process. Setting the time derivatives of the state distribution's parameters to $0$ yields a stationary mean of $0$ and, through Lyapunov's equation, a stationary variance of $\gamma^{2}$ [13]. The prior state distribution for the GP states thus becomes:

$$p(\rho_{i0}; \psi) = \mathcal{N}\big(\rho_{i0} \mid 0, \gamma^{2}\big)\,. \tag{19}$$

Combining the heat transfer model and GP priors gives:

$$p(x_0; \psi) = \mathcal{N}\Big(\begin{bmatrix} T_0 \\ \rho_0 \end{bmatrix} \,\Big|\, \underbrace{\begin{bmatrix} \hat{m}_0 \\ 0 \end{bmatrix}}_{m_0}, \underbrace{\begin{bmatrix} \hat{S}_0 & 0 \\ 0 & \gamma^{2}I \end{bmatrix}}_{S_0}\Big)\,. \tag{20}$$
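The construction of the joint prior can be sketched as follows. The snippet first recovers the stationary variance of Eq. 8 from the scalar Lyapunov equation $-2\lambda P + v_c = 0$ and then stacks the two priors into $m_0$ and $S_0$. The function name `build_prior` and all numerical values are assumptions made for illustration.

```python
import numpy as np

def build_prior(m_hat0, S_hat0, gamma):
    """Stack the temperature prior and the stationary GP prior as in Eq. 20."""
    m_hat0 = np.asarray(m_hat0, dtype=float)
    D = m_hat0.size
    m0 = np.concatenate([m_hat0, np.zeros(D)])               # GP states start at zero mean
    S0 = np.block([[S_hat0, np.zeros((D, D))],
                   [np.zeros((D, D)), gamma**2 * np.eye(D)]])
    return m0, S0

# Stationary variance of the scalar SDE: dP/dt = -2 * lam * P + v_c = 0.
gamma, ell = 0.7, 2.0                   # assumed hyperparameter values
lam = np.sqrt(3.0) / ell
v_c = 2.0 * lam * gamma**2
P_stationary = v_c / (2.0 * lam)        # reduces to gamma**2

# Hypothetical two-component temperature prior (illustrative values).
m0, S0 = build_prior(m_hat0=[21.0, 21.0], S_hat0=0.25 * np.eye(2), gamma=gamma)
print(P_stationary)
print(m0)
print(np.diag(S0))
```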

The complete grey-box probabilistic model for a time-series of length $N$ is:

$$p(y_{1:N}, x_{0:N} \mid \bar{u}_{1:N}; \psi) = p(x_0; \psi) \prod_{k=1}^{N} p(y_k \mid x_k)\, p(x_k \mid x_{k-1}, \bar{u}_k; \psi)\,. \tag{21}$$

We shall use this model to infer marginal posterior distributions over states $x_k$ and hyperparameters $\psi$.
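To make the factorization of the joint model concrete, the sketch below draws one trajectory by ancestral sampling: first $x_0$ from the prior, then each $x_k$ from the state transition and each $y_k$ from the measurement model. The function name `sample_trajectory` and all matrices (a single component with state $[T\ \rho]^{\intercal}$) are hypothetical values chosen for illustration, not the identified system.

```python
import numpy as np

def sample_trajectory(A, B, Q, C, R, m0, S0, u_bar, rng):
    """Draw x_{0:N} and y_{1:N} following the factorization of the joint model."""
    N = u_bar.shape[0]
    n, p = A.shape[0], C.shape[0]
    x = np.zeros((N + 1, n))
    y = np.zeros((N, p))
    x[0] = rng.multivariate_normal(m0, S0)                   # p(x_0; psi)
    for k in range(1, N + 1):
        mean_x = A @ x[k - 1] + B @ u_bar[k - 1]             # row k-1 stores the input at step k
        x[k] = rng.multivariate_normal(mean_x, Q)            # p(x_k | x_{k-1}, u_k; psi)
        y[k - 1] = rng.multivariate_normal(C @ x[k], R)      # p(y_k | x_k)
    return x, y

# Hypothetical D = 1 system (illustrative values only).
A = np.array([[0.995, 0.0096], [0.0, 0.917]])
B = np.array([[0.005, 0.01], [0.0, 0.0]])
Q = np.array([[3e-6, 4e-4], [4e-4, 7.8e-2]])
C = np.array([[1.0, 0.0]])                                   # only the temperature is measured
R = np.array([[0.01]])
m0 = np.array([21.0, 0.0])
S0 = np.diag([0.25, 0.49])
N = 200
u_bar = np.tile([20.0, 5.0], (N, 1))                         # constant ambient temperature and heat input

x, y = sample_trajectory(A, B, Q, C, R, m0, S0, u_bar, np.random.default_rng(0))
print(x.shape, y.shape)
```

Sampling in this order mirrors the product structure of the joint distribution, which is also what makes recursive filtering and smoothing possible.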

III Inference

The inference procedure has two phases: firstly, states and hyperparameters are estimated, and secondly, nonlinear convection is estimated as a function of temperature.

III-A State estimation

States are inferred using the Bayesian smoothing equations [14]. These start with a filtering step, moving from k=1𝑘1k=1italic_k = 1 to k=N𝑘𝑁k=Nitalic_k = italic_N. Let 𝒟k={yi,u¯i}i=1ksubscript𝒟𝑘superscriptsubscriptsubscript𝑦𝑖subscript¯𝑢𝑖𝑖1𝑘\mathcal{D}_{k}=\{y_{i},\bar{u}_{i}\}_{i=1}^{k}caligraphic_D start_POSTSUBSCRIPT italic_k end_POSTSUBSCRIPT = { italic_y start_POSTSUBSCRIPT italic_i end_POSTSUBSCRIPT , over¯ start_ARG italic_u end_ARG start_POSTSUBSCRIPT italic_i end_POSTSUBSCRIPT } start_POSTSUBSCRIPT italic_i = 1 end_POSTSUBSCRIPT start_POSTSUPERSCRIPT italic_k end_POSTSUPERSCRIPT be the input-output pairs up to time k𝑘kitalic_k. The prediction step is the marginalization of the Gaussian state transition over the previous Gaussian marginal state posterior:

\begin{align}
p(x_k \mid \bar{u}_k, \mathcal{D}_{k-1}; \hat{\psi})
&= \int p(x_k \mid x_{k-1}, \bar{u}_k; \hat{\psi})\, p(x_{k-1} \mid \mathcal{D}_{k-1}; \hat{\psi})\, \mathrm{d}x_{k-1} \tag{22} \\
&= \mathcal{N}\big(x_k \mid \bar{m}_k, \bar{S}_k\big)\,, \tag{23}
\end{align}

where the predictive mean and covariance are

\begin{align}
\bar{m}_k = A_{\hat{\psi}} m_{k-1} + B \bar{u}_k\,, \quad \bar{S}_k = A_{\hat{\psi}} S_{k-1} A_{\hat{\psi}}^{\intercal} + Q_{\hat{\psi}}\,. \tag{24}
\end{align}

Note that the kernel hyperparameters are fixed to a point estimate, $\hat{\psi}$ (see Sec. III-B).
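The prediction step in Eq. 24 can be sketched in a few lines. The following is a minimal illustration in plain Python, not the authors' implementation (which uses RxInfer.jl); the helper names `matmul`, `transpose` and `kalman_predict`, as well as the two-state example values, are hypothetical and chosen only for demonstration.

```python
def matmul(a, b):
    """Multiply two matrices given as lists of rows."""
    return [[sum(a[i][k] * b[k][j] for k in range(len(b)))
             for j in range(len(b[0]))] for i in range(len(a))]


def transpose(a):
    """Transpose a matrix given as a list of rows."""
    return [list(col) for col in zip(*a)]


def kalman_predict(m_prev, S_prev, A, B, u, Q):
    """Prediction step (Eq. 24): m_bar = A m + B u, S_bar = A S A^T + Q.

    m_prev and u are column vectors (lists of one-element rows);
    S_prev, A, B and Q are matrices (lists of rows).
    """
    Am = matmul(A, m_prev)
    Bu = matmul(B, u)
    m_bar = [[Am[i][0] + Bu[i][0]] for i in range(len(Am))]
    ASA = matmul(matmul(A, S_prev), transpose(A))
    S_bar = [[ASA[i][j] + Q[i][j] for j in range(len(Q))]
             for i in range(len(Q))]
    return m_bar, S_bar


# Hypothetical two-state example (illustrative values, not from the paper).
A = [[1.0, 1.0], [0.0, 1.0]]
B = [[0.5], [1.0]]
Q = [[0.1, 0.0], [0.0, 0.1]]
m_bar, S_bar = kalman_predict([[1.0], [2.0]],
                              [[1.0, 0.0], [0.0, 1.0]],
                              A, B, [[2.0]], Q)
```

In the procedure described here, the matrices $A_{\hat{\psi}}$ and $Q_{\hat{\psi}}$ would be rebuilt whenever the hyperparameter estimate changes, while the prediction itself stays the same.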

In the correction step, we apply Bayes’ rule using the predicted marginal state as prior distribution:

\begin{align}
\underbrace{p(x_k \mid \mathcal{D}_k; \hat{\psi})}_{\text{posterior}}
= \frac{\overbrace{p(y_k \mid x_k)}^{\text{likelihood}}}{\underbrace{p(y_k \mid \bar{u}_k, \mathcal{D}_{k-1})}_{\text{evidence}}}\, \underbrace{p(x_k \mid \bar{u}_k, \mathcal{D}_{k-1}; \hat{\psi})}_{\text{prior}}\,, \tag{25}
\end{align}

where the evidence is

\begin{align}
p(y_k \mid \bar{u}_k, \mathcal{D}_{k-1}) = \int p(y_k \mid x_k)\, p(x_k \mid \bar{u}_k, \mathcal{D}_{k-1}; \hat{\psi})\, \mathrm{d}x_k\,. \tag{26}
\end{align}

For our Gaussian distributed likelihood and Gaussian distributed predicted state distribution, this yields a Gaussian state posterior $\mathcal{N}(x_k \mid \tilde{m}_k, \tilde{S}_k)$ with parameters [14]:

\begin{align}
\tilde{m}_k &= \bar{m}_k + \bar{S}_k C^{\intercal} (C \bar{S}_k C^{\intercal} + R)^{-1} (y_k - C \bar{m}_k) \tag{27a} \\
\tilde{S}_k &= \bar{S}_k - \bar{S}_k C^{\intercal} (C \bar{S}_k C^{\intercal} + R)^{-1} C \bar{S}_k^{\intercal}\,. \tag{27b}
\end{align}
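As an illustration of Eqs. 27a and 27b, the sketch below performs one correction step under the simplifying assumption of a single scalar measurement, so that $C \bar{S}_k C^{\intercal} + R$ is a scalar and no matrix inversion is required. The function name `kalman_correct_scalar` and the numerical values are hypothetical.

```python
def kalman_correct_scalar(m_bar, S_bar, c, r, y):
    """Correction step (Eqs. 27a-27b) for one scalar measurement.

    m_bar: predicted mean (list of D floats)
    S_bar: predicted covariance (D x D list of rows, symmetric)
    c:     measurement row vector C (list of D floats)
    r:     measurement noise variance R (float)
    y:     observed measurement (float)
    """
    D = len(m_bar)
    # S_bar C^T, a vector of length D
    Sc = [sum(S_bar[i][j] * c[j] for j in range(D)) for i in range(D)]
    # Innovation variance C S_bar C^T + R (a scalar in this sketch)
    s = sum(c[i] * Sc[i] for i in range(D)) + r
    # Innovation y - C m_bar
    v = y - sum(c[i] * m_bar[i] for i in range(D))
    gain = [Sc[i] / s for i in range(D)]
    m_tilde = [m_bar[i] + gain[i] * v for i in range(D)]
    # S_bar - S_bar C^T s^{-1} C S_bar^T, written element-wise
    S_tilde = [[S_bar[i][j] - gain[i] * Sc[j] for j in range(D)]
               for i in range(D)]
    return m_tilde, S_tilde


# Hypothetical example: only the first state is observed.
m_tilde, S_tilde = kalman_correct_scalar(
    m_bar=[4.0, 4.0],
    S_bar=[[2.1, 1.0], [1.0, 1.1]],
    c=[1.0, 0.0], r=0.9, y=5.0)
```

With several sensors, as in the demonstrator, the scalar division would be replaced by a linear solve against the innovation covariance matrix.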

Smoothing consists of correcting these state estimates based on future data [14]:

\begin{align}
p(x_k \mid \mathcal{D}_N; \hat{\psi}) = {} & p(x_k \mid \mathcal{D}_k; \hat{\psi}) \tag{28} \\
& \cdot \int \frac{p(x_{k+1} \mid x_k, \bar{u}_{k+1}; \hat{\psi})\, p(x_{k+1} \mid \mathcal{D}_N; \hat{\psi})}{p(x_{k+1} \mid \mathcal{D}_k; \hat{\psi})}\, \mathrm{d}x_{k+1}\,. \notag
\end{align}

These corrections are executed by the following updates, running backwards from $k = N-1$ to $k = 1$ and initialized with $m_N = \tilde{m}_N$ and $S_N = \tilde{S}_N$:

\begin{align}
G_k &= \tilde{S}_k A_{\hat{\psi}}^{\intercal} \bar{S}_{k+1}^{-1} \tag{29a} \\
m_k &= \tilde{m}_k + G_k \big(m_{k+1} - \bar{m}_{k+1}\big) \tag{29b} \\
S_k &= \tilde{S}_k + G_k \big(S_{k+1} - \bar{S}_{k+1}\big) G_k^{\intercal}\,, \tag{29c}
\end{align}

where $G_k$ represents the strength of the correction by future observations.
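To make the interplay between the forward and backward passes concrete, the following sketch runs a filter and smoother for a scalar state ($D = 1$), where all matrices reduce to numbers. It is an illustrative stand-in rather than the RxInfer.jl implementation used in the experiments; the function name, the random-walk model and the data are assumptions.

```python
def kalman_filter_smoother(ys, us, a, b, c, q, r, m0, s0):
    """Scalar forward filter (Eqs. 24, 27) and backward smoother (Eq. 29).

    Model: x_k = a x_{k-1} + b u_k + process noise with variance q,
           y_k = c x_k + measurement noise with variance r.
    Returns smoothed means, smoothed variances,
    filtered means and filtered variances.
    """
    n = len(ys)
    m_bar, s_bar, m_til, s_til = [], [], [], []
    m_prev, s_prev = m0, s0
    for k in range(n):
        # Prediction step
        mb = a * m_prev + b * us[k]
        sb = a * s_prev * a + q
        # Correction step
        gain = sb * c / (c * sb * c + r)
        mt = mb + gain * (ys[k] - c * mb)
        st = sb - gain * c * sb
        m_bar.append(mb)
        s_bar.append(sb)
        m_til.append(mt)
        s_til.append(st)
        m_prev, s_prev = mt, st
    # Backward pass, starting from the last filtered estimate
    m_s, s_s = m_til[:], s_til[:]
    for k in range(n - 2, -1, -1):
        g = s_til[k] * a / s_bar[k + 1]
        m_s[k] = m_til[k] + g * (m_s[k + 1] - m_bar[k + 1])
        s_s[k] = s_til[k] + g * (s_s[k + 1] - s_bar[k + 1]) * g
    return m_s, s_s, m_til, s_til


# Hypothetical noisy observations of a slowly varying quantity.
ys = [1.0, 1.2, 0.9, 1.1, 1.0]
us = [0.0] * len(ys)
m_s, s_s, m_til, s_til = kalman_filter_smoother(
    ys, us, a=1.0, b=0.0, c=1.0, q=0.01, r=0.1, m0=0.0, s0=1.0)
```

The smoothed variances are never larger than the filtered ones, which reflects that every smoothed estimate also uses the observations that arrive after time $k$.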

To obtain state estimates, the runtime is $O(D^3 N)$ due to inversions of $D \times D$ covariance matrices. This algorithm therefore excels in situations where $D$ is small (i.e., few components in a lumped-element model) and $N$ is large (long time-series).

III-B Hyperparameter estimation

Tuning GP kernel hyperparameters can be challenging when the likelihood landscape is multi-modal or contains regions of divergence. Maximum likelihood estimation may lead to poor solutions in those cases [15]. Here we propose a Laplace approximation of the posterior distribution, for two reasons: first, the use of prior distributions may enforce convergence, and second, the approximate posterior variance provides a quick method for assessing the quality of the selected hyperparameters.

We assume the hyperparameters are independent of each other, so $p(\psi) = p(\gamma)\,p(l)$. Since the length scales are strictly positive, we choose to employ Gamma distributed prior distributions:

\begin{align}
p(\gamma) = \mathcal{G}(\gamma \mid \alpha_{\gamma}, \beta_{\gamma})\,, \quad p(l) = \mathcal{G}(l \mid \alpha_l, \beta_l)\,. \tag{30}
\end{align}

We then form a Gaussian approximation of the posterior distribution whose mean $\hat{\psi}$ is the maximum a posteriori (MAP) estimate and whose precision matrix $\Lambda$ is based on the curvature at the maximum [16]:

\begin{align}
\hat{\psi} &= \underset{\psi \in \Psi}{\arg\max}\ \ln p(y_{1:N} \mid \bar{u}_{1:N}; \psi)\, p(\psi) \tag{31} \\
\Lambda_{ij} &= -\frac{\partial^2}{\partial \psi_i\, \partial \psi_j} \ln p(y_{1:N} \mid \bar{u}_{1:N}; \psi)\, p(\psi) \Big|_{\psi = \hat{\psi}}\,. \tag{32}
\end{align}

Note that the Hessian only has to be computed once.
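The mechanics of Eqs. 31 and 32 can be illustrated on a one-dimensional toy problem. In the sketch below, the filter-based marginal likelihood is replaced by an exponential likelihood with rate $\psi$ (an assumption made only so that the exact answer is known), combined with a Gamma prior in shape-rate form. The function `laplace_1d`, the finite-difference derivatives and all numerical values are hypothetical and do not reflect the Optim.jl-based implementation.

```python
import math


def laplace_1d(log_joint, x0, h=1e-4, tol=1e-8, max_iter=100):
    """Laplace approximation of a one-dimensional posterior.

    Finds the mode of log_joint with Newton's method, using central
    finite differences with step h, and returns (mode, precision),
    where precision is minus the second derivative at the mode.
    """
    x = x0
    for _ in range(max_iter):
        grad = (log_joint(x + h) - log_joint(x - h)) / (2 * h)
        hess = (log_joint(x + h) - 2 * log_joint(x) + log_joint(x - h)) / h**2
        step = grad / hess
        x -= step
        if abs(step) < tol:
            break
    hess = (log_joint(x + h) - 2 * log_joint(x) + log_joint(x - h)) / h**2
    return x, -hess


# Toy stand-in for ln p(y | u; psi) p(psi): exponential observations
# with rate psi and a Gamma(ALPHA, BETA) prior (shape, rate).
ALPHA, BETA = 2.0, 1.0
DATA = [0.5, 1.5, 1.0, 1.0]


def log_joint(psi):
    log_prior = (ALPHA * math.log(BETA) - math.lgamma(ALPHA)
                 + (ALPHA - 1) * math.log(psi) - BETA * psi)
    log_lik = sum(math.log(psi) - psi * x for x in DATA)
    return log_prior + log_lik


psi_hat, precision = laplace_1d(log_joint, x0=0.5)
posterior_std = 1.0 / math.sqrt(precision)
```

For this toy problem the exact posterior is a Gamma distribution with shape 6 and rate 5, so the mode is 1 and the curvature-based precision is 5. The approximate standard deviation $1/\sqrt{\Lambda}$ is the quantity that gives a quick indication of how well a hyperparameter is determined.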

III-C Static nonlinearity estimation

State estimation will give us estimates of $T_k$ and $\rho_k$. But $\rho_k$ only represents the value of $r(T_i, T_a)$ at time $t_k$. To obtain a functional form for $r(\cdot, \cdot)$, we employ a regression model that takes as inputs the estimates of $T_{ik}$ (i.e., means $m_{ik}$ for $i = 1, \dots, D$) as well as the measurements of $T_{ak}$, and as outputs takes the estimates of $\rho_{ik}$ (i.e., the means $m_{jk}$ for $j = i + D$). We consider specifically a Bayesian polynomial regression model because low-order polynomials typically suffice to capture the relatively slow change in convection over temperatures and because we aim to quantify uncertainty on the function estimate.
Let $\phi: \mathbb{R}^2 \rightarrow \mathbb{R}^{D_{\phi}}$ be a polynomial basis expansion function, mapping a cell and ambient temperature to a $D_{\phi}$-dimensional space, and $\theta_i \in \mathbb{R}^{D_{\phi}}$ be regression coefficients. We consider a likelihood of the form,

\begin{align}
p(m_{jk} \mid m_{ik}, T_{ak}, \theta_i) = \mathcal{N}\big(m_{jk} \mid \theta_i^{\intercal} \phi(m_{ik}, T_{ak}),\, \sigma_j^2\big)\,, \tag{33}
\end{align}

where $\sigma_j^2 = \frac{1}{N} \sum_{k=1}^{N} S_{jjk}$ is the average variance of the GP estimated states. Our prior distribution on the regression weights $\theta_i$ is:

\begin{align}
p(\theta_i) = \mathcal{N}(\theta_i \mid \mu_{i0}, \Sigma_{i0})\,. \tag{34}
\end{align}

This Gaussian prior distribution is conjugate and we may thus obtain a posterior distribution exactly [16],

\begin{align}
p(\theta_i \mid \mathcal{D}_N)
&= \frac{p(\theta_i) \prod_{k=1}^{N} p(m_{jk} \mid m_{ik}, T_{ak}, \theta_i)}{\int p(\theta_i) \prod_{k=1}^{N} p(m_{jk} \mid m_{ik}, T_{ak}, \theta_i)\, \mathrm{d}\theta_i} \tag{35} \\
&= \mathcal{N}(\theta_i \mid \mu_i, \Sigma_i)\,, \tag{36}
\end{align}

where, using $\phi_k = \phi(m_{ik}, T_{ak})$, the parameters are:

\begin{align}
\Sigma_i &= \Big(\Sigma_{i0}^{-1} + \sigma_j^{-2} \sum_{k=1}^{N} \phi_k \phi_k^{\intercal}\Big)^{-1} \tag{37} \\
\mu_i &= \Sigma_i \Big(\sigma_j^{-2} \sum_{k=1}^{N} \phi_k m_{jk} + \Sigma_{i0}^{-1} \mu_{i0}\Big)\,. \tag{38}
\end{align}
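The conjugate update in Eqs. 37 and 38 can be sketched as follows. This is a minimal pure-Python illustration under stated assumptions: the cubic basis `phi` in the temperature difference is a hypothetical choice (the exact basis is not specified in this section), the helper `mat_inv` is a small Gauss-Jordan routine written for this example, and the inputs are noise-free values of a cubic surrogate rather than actual GP state estimates.

```python
def mat_inv(a):
    """Invert a small square matrix (Gauss-Jordan with partial pivoting)."""
    n = len(a)
    aug = [list(row) + [1.0 if i == j else 0.0 for j in range(n)]
           for i, row in enumerate(a)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda row: abs(aug[row][col]))
        aug[col], aug[pivot] = aug[pivot], aug[col]
        p = aug[col][col]
        aug[col] = [v / p for v in aug[col]]
        for row in range(n):
            if row != col:
                f = aug[row][col]
                aug[row] = [vr - f * vc for vr, vc in zip(aug[row], aug[col])]
    return [row[n:] for row in aug]


def phi(T, Ta):
    """Hypothetical cubic basis in the temperature difference (assumption)."""
    d = Ta - T
    return [d, d**2, d**3]


def regression_posterior(T_means, Ta_vals, rho_means, sigma2, mu0, Sigma0):
    """Posterior mean and covariance of the weights (Eqs. 37-38)."""
    D = len(mu0)
    prec0 = mat_inv(Sigma0)
    # Posterior precision: Sigma0^{-1} + sigma^{-2} sum_k phi_k phi_k^T
    prec = [row[:] for row in prec0]
    # Right-hand side: sigma^{-2} sum_k phi_k m_jk + Sigma0^{-1} mu0
    rhs = [sum(prec0[a][b] * mu0[b] for b in range(D)) for a in range(D)]
    for T, Ta, rho in zip(T_means, Ta_vals, rho_means):
        f = phi(T, Ta)
        for a in range(D):
            rhs[a] += f[a] * rho / sigma2
            for b in range(D):
                prec[a][b] += f[a] * f[b] / sigma2
    Sigma = mat_inv(prec)
    mu = [sum(Sigma[a][b] * rhs[b] for b in range(D)) for a in range(D)]
    return mu, Sigma


# Illustrative inputs: temperatures between 21 and 26 with ambient 21.
T_means = [21.0 + 0.5 * k for k in range(11)]
Ta_vals = [21.0] * 11
rho_means = [(Ta - T) ** 3 / 100.0 for T, Ta in zip(T_means, Ta_vals)]
mu, Sigma = regression_posterior(
    T_means, Ta_vals, rho_means, sigma2=1e-3,
    mu0=[0.0, 0.0, 0.0],
    Sigma0=[[1e3, 0.0, 0.0], [0.0, 1e3, 0.0], [0.0, 0.0, 1e3]])
```

With a weak prior and noise-free inputs, the posterior mean should recover a cubic coefficient close to $1/100$ and near-zero lower-order coefficients.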

Given a posterior distribution over the regression coefficients, we can derive a predictive distribution over new values of the nonlinear convection function. These are essentially predictions for the values of $r(T_i, T_a)$ but with a variance parameter indicating the amount of uncertainty originating from the estimated values of $\rho_i$ and $\theta_i$ [16]. Let $\phi_* = \phi(m_{i*}, T_{a*})$. Then, the posterior predictive is:

\begin{align}
p(m_{j*} \mid m_{i*}, T_{a*}, \mathcal{D}_N)
&= \int p(m_{j*} \mid m_{i*}, T_{a*}, \theta_i)\, p(\theta_i \mid \mathcal{D}_N)\, \mathrm{d}\theta_i \tag{39} \\
&= \mathcal{N}\big(m_{j*} \mid \mu_i^{\intercal} \phi_*,\, \phi_*^{\intercal} \Sigma_i \phi_* + \sigma_j^2\big)\,. \tag{40}
\end{align}

For ease of reference, we refer to the mean of the posterior predictive as:

\begin{align}
\hat{r}(T_{i*}, T_{a*}) = \mu_i^{\intercal} \phi(T_{i*}, T_{a*}) \tag{41}
\end{align}

for an arbitrary numerical temperature $T_{i*}$.
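Evaluating Eqs. 40 and 41 at a new temperature amounts to two inner products. The sketch below assumes a hypothetical cubic basis in the temperature difference and uses made-up posterior parameters; neither is taken from the experiments.

```python
def phi(T, Ta):
    """Hypothetical cubic basis in the temperature difference (assumption)."""
    d = Ta - T
    return [d, d**2, d**3]


def posterior_predictive(phi_star, mu, Sigma, sigma2):
    """Mean and variance of the Gaussian posterior predictive (Eq. 40)."""
    D = len(mu)
    # Mean mu^T phi_*, which is also r_hat in Eq. 41
    mean = sum(mu[a] * phi_star[a] for a in range(D))
    # Variance phi_*^T Sigma phi_* + sigma^2
    var = sum(phi_star[a] * Sigma[a][b] * phi_star[b]
              for a in range(D) for b in range(D)) + sigma2
    return mean, var


# Made-up posterior parameters, for illustration only.
mu = [0.0, 0.0, 0.01]
Sigma = [[1e-4, 0.0, 0.0], [0.0, 1e-5, 0.0], [0.0, 0.0, 1e-6]]
r_hat, r_var = posterior_predictive(phi(31.0, 21.0), mu, Sigma, sigma2=1e-3)
```

Because the basis functions grow with the temperature difference, the predictive variance grows as well, so extrapolation beyond the visited temperatures is reported with wider uncertainty.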

IV Experiments

We perform experiments (code: github.com/biaslab/CCTA2024-BIDconvection) on a heated rod demonstrator involving 3 aluminum blocks separated by insulating discs (see Figure 1 top). The first experiment involves a simulation of the system and the second experiment involves measurements from the physical device. Only the first block receives heat input. The conductance matrix is of the form:

\begin{align}
K = \begin{bmatrix} -k_{12} & k_{12} & 0 \\ k_{12} & -(k_{12} + k_{23}) & k_{23} \\ 0 & k_{23} & -k_{23} \end{bmatrix}\,, \tag{42}
\end{align}

whose $k_{ij}$ are assumed to be known or estimated separately using a maximum likelihood or MAP estimator. Further details are explained in the following two subsections.
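The structure of Eq. 42 generalizes naturally to any number of blocks connected in a line. As a small sketch (the function name `chain_conductance_matrix` and the generalization beyond three blocks are assumptions made for illustration), the matrix can be assembled link by link:

```python
def chain_conductance_matrix(k_links):
    """Build K for blocks in a line; k_links[i] couples block i and i + 1."""
    n = len(k_links) + 1
    K = [[0.0] * n for _ in range(n)]
    for i, k in enumerate(k_links):
        # Each link removes heat from one block and adds it to its neighbor.
        K[i][i] -= k
        K[i + 1][i + 1] -= k
        K[i][i + 1] += k
        K[i + 1][i] += k
    return K


# Three blocks with k12 = k23 = 10, matching the structure of Eq. 42.
K = chain_conductance_matrix([10.0, 10.0])
```

Every row of the resulting matrix sums to zero, which expresses that conduction only redistributes heat between blocks and vanishes when all blocks share the same temperature.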

Figure 1: (Top) Photo of heated rod demonstrator, with 3 blocks, 2 insulation discs, 13 temperature sensors and 1 heater. (Bottom left) Data simulated according to Eq. 3 with parameters described in Sec. IV-A. (Bottom right) Data measured from demonstrator with parameters described in Sec. IV-B.
Figure 2: Experiment on data from simulated system. Top row: the first three subplots (left-to-right) show the value of the nonlinear convection function over visited temperature states, the GP estimates of this function and the posterior predictive distribution. The fourth subplot shows a forward simulation with the mean posterior predictive and the fifth subplot shows the error between simulation with the true and identified convection function. Bottom row: first three subplots show standard polynomials fitted to residuals left by a state-space model without nonlinear convection terms. The fourth subplot shows a forward simulation using the regression function fit to the residuals, which deviates far from the forward simulation with the true convection function. The error between the two is shown in the fifth subplot.

IV-A Simulated data

In the simulated system, each aluminum block is modeled as a uniform component. Conductance of the nylon pads is set to $k_{12} = k_{23} = 10\,\mathrm{W}/(\mathrm{m}\,\mathrm{K})$ and the mass times specific heat capacity parameter is set to $m_i c_{p,i} = 1000\,\mathrm{J}/\mathrm{K}$ for all components. The ambient temperature is fixed to $21\,^{\circ}\mathrm{C}$, the heat transfer coefficient $h_a$ is set to $2.0\,\mathrm{W}/(\mathrm{m}^2\,\mathrm{K})$ and the outer surface areas of all the blocks are set to $a_i = 1.0\,\mathrm{m}^2$. The first block receives $100\,\mathrm{W}$ of heating after $100$ seconds. We used a simple surrogate for the nonlinear convection function: $r(T_a, T_i) = (T_a - T_i)^3 / 100$. Its main feature is that when the block temperature is much higher than the ambient temperature, convection will cause the block to cool rapidly.
We simulated the temperatures forward in time for $1000$ steps of $\Delta t = 1$ second, using DifferentialEquations.jl [17]. Finally, we added zero-mean Gaussian distributed noise with a variance of $10^{-3}$ to the simulated states, to generate artificial measurements. The state prior distribution's parameters are $\bar{m}_0 = [21\ 21\ 21]$ and $\bar{S}_0 = I$. Bayesian smoothing was implemented with RxInfer.jl, which also produces the marginal likelihood in Eq. 31 [18]. Derivatives were computed with Optim.jl [19].
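A simulation of this kind can be sketched in plain Python. The block below is not the authors' code: it substitutes a fixed-step fourth-order Runge-Kutta integrator for DifferentialEquations.jl, and it assumes that the heat balance (Eq. 3, not reproduced in this section) has the lumped form $m_i c_{p,i}\, \dot{T}_i = \sum_j K_{ij} T_j + h_a a_i (T_a - T_i) + r(T_a, T_i) + u_i$. All function and constant names are hypothetical.

```python
import random

M_CP = 1000.0   # J/K, mass times specific heat capacity per block
K_LINK = 10.0   # conductance between neighboring blocks
H_A = 2.0       # W/(m^2 K), linear heat transfer coefficient
AREA = 1.0      # m^2, outer surface area per block
T_AMB = 21.0    # degrees Celsius, ambient temperature
K = [[-K_LINK, K_LINK, 0.0],
     [K_LINK, -2 * K_LINK, K_LINK],
     [0.0, K_LINK, -K_LINK]]


def r(T_a, T_i):
    """Surrogate nonlinear convection function from the text."""
    return (T_a - T_i) ** 3 / 100.0


def dTdt(t, T):
    """Assumed lumped heat balance (a stand-in for Eq. 3)."""
    u = [100.0 if t >= 100.0 else 0.0, 0.0, 0.0]  # only block 1 is heated
    out = []
    for i in range(3):
        conduction = sum(K[i][j] * T[j] for j in range(3))
        convection = H_A * AREA * (T_AMB - T[i]) + r(T_AMB, T[i])
        out.append((conduction + convection + u[i]) / M_CP)
    return out


def rk4_step(f, t, T, dt):
    """One classical fourth-order Runge-Kutta step."""
    k1 = f(t, T)
    k2 = f(t + dt / 2, [x + dt / 2 * k for x, k in zip(T, k1)])
    k3 = f(t + dt / 2, [x + dt / 2 * k for x, k in zip(T, k2)])
    k4 = f(t + dt, [x + dt * k for x, k in zip(T, k3)])
    return [x + dt / 6 * (a + 2 * b + 2 * c + d)
            for x, a, b, c, d in zip(T, k1, k2, k3, k4)]


def simulate(n_steps=1000, dt=1.0, noise_var=1e-3, seed=0):
    """Return noise-free states and noisy artificial measurements."""
    rng = random.Random(seed)
    T = [T_AMB] * 3
    states = [T]
    for k in range(n_steps):
        T = rk4_step(dTdt, k * dt, T, dt)
        states.append(T)
    std = noise_var ** 0.5
    measurements = [[x + rng.gauss(0.0, std) for x in s] for s in states]
    return states, measurements


states, measurements = simulate()
```

Under these assumptions the blocks remain at ambient temperature until the heater switches on, after which the heated block stays warmest and the block furthest from the heater stays coolest.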

The top row in Figure 2 shows results using the proposed method. The first three subplots (from left to right) show the value of $r(T_{a}=21,\,T_{i})$ for the temperatures that the blocks experienced during the simulation (solid lines) as well as the GPLFM's state estimates (red = $T_{1}$, blue = $T_{2}$ and orange = $T_{3}$; marker size proportional to state variance). The black dotted line with ribbon shows the mean and standard deviation of the posterior predictive distribution in Eq. 40, under a third-order Bayesian regression model with $\mu_{i0}=[0\,0\,0]^{\intercal}$, $\Sigma_{i0}=10^{3}I$. To test this estimate of the convection function, we simulated the system for $1000$ seconds under starting temperatures of $25\,^{\circ}\mathrm{C}$ and with heat input of $100\,\mathrm{W}$ from $120$ to $600$ seconds. The fourth subplot shows the true simulation (dotted lines) and a simulation using the mean of the posterior predictive (i.e., Eq. 41; solid lines). The fifth subplot shows the error between the simulations, producing an overall mean-squared error over time of $0.011$.
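To illustrate the kind of regression step described here, the following Python sketch fits a conjugate Gaussian linear regression with a zero prior mean and prior covariance $10^{3}I$. Several choices are assumptions made for illustration and may differ from Eq. 40: the basis is taken to be $[d,\,d^{2},\,d^{3}]$ with $d=T_{a}-T_{i}$ (three coefficients, no intercept), the noise variance is treated as known, and the training targets are synthetic samples of the cubic surrogate rather than GP state estimates. All function names are hypothetical.

```python
import random


def solve(a, b):
    """Solve a x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    m = [row[:] + [b[i]] for i, row in enumerate(a)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda idx: abs(m[idx][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for row in range(col + 1, n):
            factor = m[row][col] / m[col][col]
            for c in range(col, n + 1):
                m[row][c] -= factor * m[col][c]
    x = [0.0] * n
    for i in reversed(range(n)):
        s = sum(m[i][j] * x[j] for j in range(i + 1, n))
        x[i] = (m[i][n] - s) / m[i][i]
    return x


def features(d):
    """ASSUMED third-order basis in d = T_a - T_i, without intercept."""
    return [d, d ** 2, d ** 3]


def fit(ds, ys, noise_var, prior_mean=(0.0, 0.0, 0.0), prior_var=1e3):
    """Posterior mean and precision of the regression coefficients."""
    p = len(prior_mean)
    precision = [[1.0 / prior_var if i == j else 0.0 for j in range(p)]
                 for i in range(p)]
    rhs = [mu / prior_var for mu in prior_mean]
    for d, y in zip(ds, ys):
        phi = features(d)
        for i in range(p):
            rhs[i] += phi[i] * y / noise_var
            for j in range(p):
                precision[i][j] += phi[i] * phi[j] / noise_var
    return solve(precision, rhs), precision


def predict(d, mean, precision, noise_var):
    """Mean and variance of the posterior predictive at d."""
    phi = features(d)
    pred_mean = sum(f * w for f, w in zip(phi, mean))
    cov_phi = solve(precision, phi)  # posterior covariance times phi
    pred_var = noise_var + sum(f * c for f, c in zip(phi, cov_phi))
    return pred_mean, pred_var


# Synthetic demonstration data: block up to 20 degrees above ambient.
rng = random.Random(1)
noise_var = 1e-3
ds = [-20.0 * k / 199 for k in range(200)]
ys = [d ** 3 / 100.0 + rng.gauss(0.0, noise_var ** 0.5) for d in ds]

mean, precision = fit(ds, ys, noise_var)
pred_mean, pred_var = predict(-10.0, mean, precision, noise_var)
```

With a prior this wide, the posterior mean is dominated by the data, so the cubic coefficient should land close to the $1/100$ used in the surrogate.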

We compare the GPLFM with an offline estimate of the effect of the nonlinear convection function. First, we run the state-space model without a nonlinear convection term on the simulated temperature measurements (note that this requires a small noise injection of $Q=10^{-8}I$). Then, we calculate the post-fit residuals, i.e., $y_{ik}-m_{ik}$ for all blocks $i$ and time steps $k$. Lastly, we fit a standard third-order polynomial regression model to the residuals. The first three subplots on the bottom row of Figure 2 show the residuals as a function of the temperature state estimates (red = block 1, blue = block 2, orange = block 3), with the estimated polynomial overlaid (black dotted line). Note that the polynomials fit well. The fourth subplot shows the effect of simulating the temperatures forward with the polynomial regression instead of the true convection function and the fifth subplot shows the error between the simulation with the true convection function and the regression function. The error is much larger in this case (a mean-squared error over time of $6.473$), which highlights the value of the GPLFM in estimating the effect of nonlinear convection dynamically.

For the hyperparameter optimization of the GPLFM, the prior distributions had parameters $\alpha_{l}=\alpha_{\gamma}=5.0$ and $\beta_{l}=\beta_{\gamma}=0.1$. These favour estimates in the region of $1$ to $100$. Figure 3 shows the prior and posterior distributions for each parameter. For $l$, the posterior distribution moves to much larger length scales but also becomes wider, indicating that there is a high degree of uncertainty on this estimate. For $\gamma$, the posterior concentrates sharply around roughly $20$, with high certainty.
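As a quick check on the stated prior range, the sketch below summarizes a prior with these parameters. It assumes the priors are Gamma distributions in the shape-rate parameterization, which is not stated in this section; with a scale parameterization the numbers would differ. Because the shape is an integer, the cumulative distribution can be written in closed form (Erlang), so only the standard library is needed. The function names are hypothetical.

```python
import math


def gamma_summary(shape, rate):
    """Mean, mode and standard deviation of Gamma(shape, rate), shape >= 1."""
    mean = shape / rate
    mode = (shape - 1.0) / rate
    std = math.sqrt(shape) / rate
    return mean, mode, std


def erlang_cdf(x, shape, rate):
    """CDF of a Gamma distribution with integer shape (Erlang)."""
    z = rate * x
    partial = sum(z ** k / math.factorial(k) for k in range(shape))
    return 1.0 - math.exp(-z) * partial


# ASSUMPTION: alpha is the shape and beta is the rate.
alpha, beta = 5, 0.1
mean, mode, std = gamma_summary(alpha, beta)
mass_1_to_100 = erlang_cdf(100.0, alpha, beta) - erlang_cdf(1.0, alpha, beta)

print(f"mean={mean:.1f}, mode={mode:.1f}, std={std:.1f}")
print(f"prior mass between 1 and 100: {mass_1_to_100:.3f}")
```

Under this reading, nearly all prior mass lies between $1$ and $100$, which is consistent with the description in the text.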

Figure 3: Prior and approximate posterior distributions under Laplace's method for the kernel hyperparameters $l$ and $\gamma$.
Figure 4: (Top row) GP states as a function of temperature states for sensors $4$ (red), $9$ (blue), and $12$ (orange), with third-order polynomial fits (mean posterior predictive distributions; black dashed). (Bottom row) Simulation error (measurement - mean prediction) for GPLFM vs pure SSM methods.

IV-B Measured data

In the physical device, block 1 weighs $m_{1}=0.276$ kg, block 2 weighs $m_{2}=0.160$ kg and block 3 weighs $m_{3}=0.74$ kg. The specific heat capacity of aluminum is $c_{ip}=910\,\mathrm{J}/(\mathrm{kg}\mathrm{K})$. There are 13 thermistors, with #1 to #5 on block 1, #6 to #10 on block 2 (sensor #7 malfunctioned), and #11 to #13 on block 3. The measurements within a block are highly similar, and we thus consider the middle ones as representative of the three components (#4 = block 1, #9 = block 2, and #12 = block 3). The conductance parameters of the pads as well as the heat transfer coefficient $h_{a}$ are identified during a separate experiment after reaching a steady-state, yielding $k_{12}=0.272\,\mathrm{W}/(\mathrm{m}\mathrm{K})$, $k_{23}=0.218\,\mathrm{W}/(\mathrm{m}\mathrm{K})$ and $h_{a}=7.75\,\mathrm{W}/(\mathrm{m}^{2}\mathrm{K})$.
The surface areas of the blocks are $a_{1}=0.0066\,\mathrm{m}^{2}$, $a_{2}=0.0055\,\mathrm{m}^{2}$ and $a_{3}=0.0037\,\mathrm{m}^{2}$.
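For orientation, the reported quantities can be combined into the lumped parameters that enter a heat balance for each block. The short Python sketch below takes the reported values at face value and computes the thermal capacity $m_{i}c_{ip}$ and the linear convective conductance $h_{a}a_{i}$ per block. The final ratio, a time constant for an isolated block cooling by linear convection only, is an illustrative quantity that is not used in the paper.

```python
# Reported values for the physical device.
masses = {1: 0.276, 2: 0.160, 3: 0.74}       # kg
areas = {1: 0.0066, 2: 0.0055, 3: 0.0037}    # m^2
c_p = 910.0                                  # J/(kg K), aluminum
h_a = 7.75                                   # W/(m^2 K)

# Thermal capacity of each block in J/K.
capacity = {i: masses[i] * c_p for i in masses}

# Linear convective conductance of each block in W/K.
convective = {i: h_a * areas[i] for i in areas}

# ILLUSTRATIVE ONLY: time constant (s) of an isolated block that cools
# through linear convection alone, ignoring the pads and the nonlinear term.
time_constant = {i: capacity[i] / convective[i] for i in masses}

for i in sorted(masses):
    print(f"block {i}: m*c_p = {capacity[i]:.2f} J/K, "
          f"h_a*a = {convective[i]:.5f} W/K, "
          f"tau = {time_constant[i]:.0f} s")
```

The capacities differ considerably from the $1000\,\mathrm{J}/\mathrm{K}$ used for every block in the simulated system, so the two experiments operate on different time scales.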

Figure 4 (top row) shows the GP states (marker size proportional to state variance) as a function of the temperature states (block 1 = red, block 2 = blue, block 3 = orange), with the mean of the posterior predictive distribution of the polynomial regression model overlaid (dotted black line). In general, the GP states are fairly smooth as a function of temperature and the polynomial fits well. Only the first block shows a feature that is not picked up by the polynomial regression model: a rapid drop when the system first starts deviating from the ambient temperature. The bottom row in the figure compares the open-loop simulation of the state-space model with (GPLFM; using the mean posterior predictive of the polynomial regression) and without an identified nonlinear convection term (SSM). The error is the difference between the temperature measurements and the predicted temperatures. Note that the error of the simulation with the identified nonlinear convection term is closer to zero than that of the simulation without it.

V Discussion

The value of being able to identify nonlinear convection quickly and with quantified uncertainty lies in the fact that it enables optimizing a subsequent step in a control process. Examples include maintaining precision in a motion control system and optimizing the placement of cooling fans.

We chose the exponential kernel covariance function as it generates a scalar SDE. One could alternatively choose a higher-order Whittle-Matérn kernel, but that leads to vector SDEs. For example, a smoothness parameter of $3/2$ generates a two-dimensional system and $5/2$ generates a three-dimensional one [10]. Recall that the smoothing procedure scales cubically in state-space dimensionality, $O(D^{3}N)$. $D$ is related to the dimensionality of the SDE $L$ through $D=(2+L)M$. Increasing $L$ increases the computational cost drastically.
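The scaling argument can be made concrete with a small calculation. The Python sketch below evaluates $D=(2+L)M$ and the leading-order cost $D^{3}N$ for the three kernels mentioned. It assumes $M=3$ (taken here to be the number of blocks) and $N=1000$ time steps, and it ignores constant factors, so only the ratios are meaningful.

```python
def state_dimension(sde_dim, n_components):
    """D = (2 + L) M, as given in the text."""
    return (2 + sde_dim) * n_components


def smoothing_cost(sde_dim, n_components, n_steps):
    """Leading-order cost D^3 N of the smoother, up to a constant factor."""
    return state_dimension(sde_dim, n_components) ** 3 * n_steps


# ASSUMPTIONS: M = 3 components and N = 1000 time steps.
M, N = 3, 1000
kernels = {"exponential": 1, "Matern 3/2": 2, "Matern 5/2": 3}

baseline = smoothing_cost(kernels["exponential"], M, N)
relative_cost = {name: smoothing_cost(L, M, N) / baseline
                 for name, L in kernels.items()}

for name, L in kernels.items():
    print(f"{name}: L={L}, D={state_dimension(L, M)}, "
          f"relative cost={relative_cost[name]:.2f}")
```

Under these assumptions, each increment of $L$ adds $M$ state dimensions, and the cubic dependence on $D$ is what makes the higher-order kernels more expensive.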

VI Conclusions

We proposed a procedure to estimate the effects of nonlinear convection in a lumped-element heat transfer dynamics model. The procedure is based on a state-space model with known conductance and linear convection effects, augmented with a Gaussian process. Through Bayesian smoothing, we obtain states for temperatures and through Laplace’s method, we obtain approximate posterior distributions for the GP kernel hyperparameters. We fitted a Bayesian polynomial regression model that predicts the GP states from the temperature states and ambient temperature measurements, and used it to simulate the effects of nonlinear convection.

References

  • [1] E. Evers, N. van Tuijl, R. Lamers, and T. Oomen, “Identifying thermal dynamics for precision motion control,” IFAC-PapersOnLine, vol. 52, no. 15, pp. 73–78, 2019.
  • [2] E. Evers, N. van Tuijl, R. Lamers, B. de Jager, and T. Oomen, “Fast and accurate identification of thermal dynamics for precision motion control: Exploiting transient data and additional disturbance inputs,” Mechatronics, vol. 70, p. 102401, 2020.
  • [3] S. Cai, Z. Wang, S. Wang, P. Perdikaris, and G. E. Karniadakis, “Physics-informed neural networks for heat transfer problems,” Journal of Heat Transfer, vol. 143, no. 6, p. 060801, 2021.
  • [4] M. Alvarez, D. Luengo, and N. D. Lawrence, “Latent force models,” in Artificial Intelligence and Statistics, pp. 9–16, PMLR, 2009.
  • [5] M. A. Alvarez, D. Luengo, and N. D. Lawrence, “Linear latent force models using Gaussian processes,” IEEE Transactions on Pattern Analysis and Machine Intelligence, vol. 35, pp. 2693–2705, 2013.
  • [6] S. Särkkä, M. A. Alvarez, and N. D. Lawrence, “Gaussian process latent force models for learning and stochastic control of physical systems,” IEEE Transactions on Automatic Control, vol. 64, pp. 2953–2960, 2018.
  • [7] T. J. Rogers, K. Worden, and E. J. Cross, “Bayesian joint input-state estimation for nonlinear systems,” Vibration, vol. 3, no. 3, pp. 281–303, 2020.
  • [8] T. J. Rogers and T. Friis, “A latent restoring force approach to nonlinear system identification,” Mechanical Systems and Signal Processing, vol. 180, p. 109426, 2022.
  • [9] S. Ghosh, S. Reece, A. Rogers, S. Roberts, A. Malibari, and N. R. Jennings, “Modeling the thermal dynamics of buildings: A latent-force-model-based approach,” ACM Transactions on Intelligent Systems and Technology, vol. 6, no. 1, pp. 1–27, 2015.
  • [10] J. Hartikainen and S. Särkkä, “Kalman filtering and smoothing solutions to temporal Gaussian process regression models,” in IEEE International Workshop on Machine Learning for Signal Processing, pp. 379–384, 2010.
  • [11] J. Hartikainen and S. Särkkä, “Sequential inference for latent force models,” arXiv:1202.3730, 2012.
  • [12] C. E. Rasmussen and C. K. I. Williams, Gaussian processes for machine learning. MIT Press, 2006.
  • [13] S. Särkkä and A. Solin, Applied stochastic differential equations, vol. 10. Cambridge University Press, 2019.
  • [14] S. Särkkä and L. Svensson, Bayesian filtering and smoothing, vol. 17. Cambridge University Press, 2023.
  • [15] A. Svensson, J. Dahlin, and T. B. Schön, “Marginalizing Gaussian process hyperparameters using sequential Monte Carlo,” in IEEE International Workshop on Computational Advances in Multi-Sensor Adaptive Processing, pp. 477–480, 2015.
  • [16] K. P. Murphy, Machine learning: a probabilistic perspective. MIT press, 2012.
  • [17] C. Rackauckas and Q. Nie, “DifferentialEquations.jl–a performant and feature-rich ecosystem for solving differential equations in Julia,” Journal of Open Research Software, vol. 5, no. 1, 2017.
  • [18] D. Bagaev, A. Podusenko, and B. de Vries, “RxInfer: A Julia package for reactive real-time Bayesian inference,” Journal of Open Source Software, vol. 8, no. 84, p. 5161, 2023.
  • [19] P. Mogensen and A. Riseth, “Optim: A mathematical optimization package for Julia,” Journal of Open Source Software, vol. 3, 2018.