inlabru: software for fitting latent Gaussian models with non-linear predictors

Finn Lindgren
University of Edinburgh
   Fabian Bachl
University of Edinburgh
   Janine Illian
University of Glasgow
   Man Ho Suen
University of Edinburgh
   Håvard Rue
King Abdullah University of Science and Technology
   Andrew E. Seaton
University of Glasgow
Abstract

The integrated nested Laplace approximation (INLA) method has become a popular approach for computationally efficient approximate Bayesian inference. In particular, by leveraging sparsity in random effect precision matrices, INLA is commonly used in spatial and spatio-temporal applications. However, the speed of INLA comes at the cost of restricting the user to the family of latent Gaussian models and the likelihoods currently implemented in INLA, the main software implementation of the INLA methodology.

inlabru is a software package that extends the types of models that can be fitted using INLA by allowing the latent predictor to be non-linear in its parameters, moving beyond the additive linear predictor framework to allow more complex functional relationships. For inference it uses an approximate iterative method based on the first-order Taylor expansion of the non-linear predictor, fitting the model using INLA for each linearised model configuration.

inlabru automates much of the workflow required to fit models using R-INLA, simplifying the process for users to specify, fit and predict from models. There is additional support for fitting joint likelihood models by building each likelihood individually. inlabru also supports the direct use of spatial data structures, such as those implemented in the sf and terra packages.

In this paper we outline the statistical theory, model structure and basic syntax required for users to understand and develop their own models using inlabru. We evaluate the approximate inference method using a Bayesian method checking approach. We provide three examples modelling simulated spatial data that demonstrate the benefits of the additional flexibility provided by inlabru.

1 Introduction

The approximate Bayesian inference package INLA (Rue et al., 2009) is an implementation of the integrated nested Laplace approximations (INLA) method of inference for latent Gaussian models (LGMs). INLA is a fast and accurate approximate inference method that provides an alternative to MCMC for Bayesian inference on LGMs. INLA uses computational methods for sparse Gaussian Markov random fields (GMRFs) (Rue and Held, 2005) that greatly speed up inference for models with large numbers of latent Gaussian parameters. This efficiency has led to INLA being a go-to choice over MCMC in many contexts since, for the class of LGMs, the computational benefits of INLA far outweigh the minimal approximation error. As such, INLA is a popular choice in spatio-temporal modelling (Lindgren et al., 2022; Krainski et al., 2018; Bakka et al., 2018; Blangiardo et al., 2013) and has been applied in a diverse set of fields such as ecology (Martino et al., 2021; Illian et al., 2013), astronomy (Levis et al., 2021), public health (Moraga, 2019), conditional extremes (Simpson et al., 2023), seismology (Naylor et al., 2023), econometrics (Bivand et al., 2014) and more.

However, while the class of LGMs is large, as evidenced by the wide variety of applications, there are many models that cannot be fitted using the INLA methodology. inlabru is a software package that extends the class of models that can be fitted using INLA by allowing the predictor to be a non-linear (deterministic) function of the latent Gaussian parameters (Bachl et al., 2019). Inference on such models is achieved through an iterative fitting scheme that uses INLA to fit successive linearised model configurations.

This development opens up many new modelling possibilities for fitting models with INLA. The additional flexibility of non-linear predictors allows users to parameterise relationships between predictors and response variables in a bespoke and context-informed way. Traditionally, software for fitting LGMs requires the predictor to be linear in its parameters (e.g. mgcv (Wood, 2017), glmmTMB (Brooks et al., 2017), and lme4 (Bates et al., 2015)). However, non-linear relationships appear in many applications, such as modelling the detectability of animals in ecological field surveys (Martino et al., 2021; Yuan et al., 2017; Buckland et al., 2015; Millar and Fryer, 1999), exposure-response models in public health modelling (Nasari et al., 2016; Gasparrini, 2014; Ritz, 2010), functional response ecology (Matthiopoulos et al., 2020; Rosenbaum and Rall, 2018; Smout et al., 2010; Real, 1979) and consumer choice models in economics (Feng et al., 2022).

Non-linear predictors also appear in models such as self-exciting point processes (Serafini et al., 2023; Hawkes, 1971), population dynamics (Newman et al., 2014), kernel-smoothed effects (Bowman and Azzalini, 1997) and lagged-effects models (Shumway and Stoffer, 2017). inlabru allows parametric relationships such as these to be incorporated into LGMs and estimated efficiently using the INLA methodology. This opens up the computational efficiency of INLA to a much broader class of applications.

inlabru is a software interface for INLA that greatly simplifies the process of specifying, fitting and sampling from INLA models. The process of ‘stack building’, familiar to users of INLA, is now entirely automated based on the model definition. inlabru also supports the use of spatial data objects directly when defining, fitting, and sampling from models. Objects of sf (Pebesma, 2018) and terra (Hijmans, 2024) types are supported, as well as legacy support for sp (Pebesma and Bivand, 2005) and raster (Hijmans, 2023) types. inlabru also provides support for working with joint likelihood models by separating the process of constructing likelihoods and fitting models. By providing predict() and generate() methods for inlabru models, inlabru greatly simplifies the process of generating predictions from fitted models.

This additional functionality provided by inlabru can be used in combination with several add-on packages that extend the basic INLA latent models, using hooks that tell inlabru how to map between covariate inputs and latent models. These include MetricGraph for defining and fitting random fields on networks (Bolin et al., 2024, 2023), dirinla for Dirichlet regression (Martínez-Minaya and Lindgren, 2022), ETAS.inlabru for fitting self-exciting point process models in seismology (Naylor et al., 2023), and the rSPDE and INLAspacetime packages (Bolin and Simas, 2023; Bolin et al., 2024; Krainski et al., 2023) that implement fractional parameter spatial models and non-separable covariance models.

inlabru is therefore both an extension of, and wrapper for, the existing INLA implementation. The aims of this software paper are to introduce the iterative INLA method for fitting models with non-linear predictors, to motivate the software syntax that simplifies the INLA workflow, and to provide numerous examples to demonstrate what can be achieved using inlabru that users can use as a starting point for their own work.

1.1 A simple example

The following example demonstrates how to define a model, fit it, and generate posterior samples using inlabru. It contains all the basic building blocks of the inlabru workflow that will be present in later examples.

We use a simulated dataset with observations

$$z_{i}=\beta+f(\boldsymbol{s}_{i})+\epsilon_{i},\quad i=1,\dots,n, \tag{1}$$

where $\boldsymbol{s}_{i}$ is the location of observation $i$, $\beta$ is an intercept parameter, $f$ is a Gaussian random field with Matérn covariance and $\epsilon_{i}$ is unstructured Gaussian noise.
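To make model (1) concrete, the following base R sketch simulates data of this form. It is not the procedure used to create the toypoints data: the sample size, the parameter values and the use of an exponential covariance (a Matérn covariance with smoothness 1/2) are all assumptions chosen for illustration.

```r
# Illustrative simulation from model (1); all numerical values are assumptions.
set.seed(1)
n <- 50
beta <- 2          # intercept
sigma_f <- 1       # marginal standard deviation of the field f
range_f <- 3       # correlation length scale of the field
sigma_eps <- 0.1   # standard deviation of the unstructured noise

# Observation locations s_i, uniform on a 10 x 10 square
s <- cbind(runif(n, 0, 10), runif(n, 0, 10))

# Exponential covariance, i.e. Matern covariance with smoothness 1/2
D <- as.matrix(dist(s))
Sigma <- sigma_f^2 * exp(-D / range_f)

# Draw f(s_1), ..., f(s_n) using the lower Cholesky factor of Sigma
L <- t(chol(Sigma))
f <- as.vector(L %*% rnorm(n))

# Observations z_i = beta + f(s_i) + eps_i
z <- beta + f + rnorm(n, sd = sigma_eps)
```

The dense Cholesky factorisation used here scales cubically in n, which is one motivation for the sparse SPDE representation used in the model fitting code.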

The data is stored as an sf spatial points object (Figure 1 A) which can be used directly with inlabru. The data is included with inlabru under the name toypoints. The following code shows how to load the data, fit a model, and generate model predictions.

R> library(inlabru)
R> library(INLA)
R> library(sf)
R>
R> # Load data
R> point_data <- toypoints$points # sf point data
R> mesh <- toypoints$mesh # SPDE mesh
R> pred_locs <- toypoints$pred_locs # sf prediction point locations
R>
R> # Define model components
R>
R> # Spatially structured random effect
R> matern <- inla.spde2.pcmatern(mesh,
+    prior.range = c(2, 0.05),
+    prior.sigma = c(2, 0.05)
+  )
R>
R> cmp <- ~ Intercept(1) +
+    grf(
+      main = geometry,
+      model = matern
+    )
R>
R> # Construct likelihood
R> lik <- like(
+    formula = z ~ .,
+    data = point_data,
+    family = "gaussian"
+  )
R>
R> # Fit model
R> fit <- bru(lik,
+    components = cmp
+  )
R>
R> # Make predictions
R> predictions <- predict(fit,
+    pred_locs,
+    formula = ~ Intercept + grf
+  )

This example contains the essential elements of the inlabru workflow. The user defines model components using a formula syntax that allows meaningful names to be chosen for each component. Here the labels Intercept and grf (an abbreviation of Gaussian random field) were used to name the components of the model, but any other valid names could have been used equally well. These are labels for components, not functions, despite the similarity to the function syntax.

The main argument describes the input data for each component. The spatial information is stored in the geometry column of the sf object and so this is used when defining the random field component. The software automatically detects the type of input and the type of component, and internally constructs the prior precision and model design matrix for that component.

An intercept parameter is associated with a column of ones in the model design matrix, and the shorthand notation Intercept(1) reflects this. Alternative methods to define an intercept are Intercept(ones, model = 'linear') if ones is the name of a covariate in the data or Intercept(rep(1,n), model = 'linear') if n is an object stored in the global R environment. The input data for a component is allowed to be a general R expression so long as it returns an object that is appropriate for that type of model component.

The likelihood is constructed in a separate step to model fitting with the likelihood object created by the like() function. The sf point data object is used directly, requiring no pre-processing of data to a standard data.frame. This automates the process of ‘stack building’ familiar to INLA users.

After fitting the model, predictions are generated using a predict() method. The formula object for the prediction can be a generic R expression that references model components using the user-defined names. This greatly simplifies the process for generating predictions from INLA models. Figure 1 shows the data and the posterior mean prediction.

Figure 1: A: simulated point data; B: posterior mean prediction

Figure 1B was produced using the predictions object. The predict() method returns an object in the same data format as was used in the predict call which, in this case, is an sf points object. Support for plotting sf data objects is available in the ggplot2 package (Wickham, 2009). The code to produce Figure 1B is as simple as

R> library(ggplot2)
R> ggplot() +
+    gg(
+      data = predictions,
+      aes(fill = mean),
+      geom = "tile"
+    ) +
+    scale_fill_viridis_c() +
+    theme_classic()

where mean is one of the summary statistics returned by predict(). See ?predict.bru for a list of all summary statistics. In general, the prediction data can be in any format for which the component definitions make sense. This gives users flexibility to conceptually separate data that is used for model fitting from data that is used for prediction.

Although the code is short, the underlying model is not simple: it involves spatial data, a spatially structured random effect, and generating model predictions. Users already familiar with such models in INLA will recognise that the inlabru code required to run this is more concise and readable. This allows users to focus on the model structure rather than on wrangling data and design matrices into a format expected by INLA.

The rest of the paper is structured as follows: Section 2 reviews the standard INLA approach and describes the iterative INLA method for estimation with non-linear predictors, Section 3 covers basic package structure and syntax for specifying, fitting and predicting from models, and Section 4 presents a Bayesian method checking simulation to validate the iterative INLA inference method. Section 5 presents syntax examples for fitting spatial models. Examples include a standard LGM with a Besag-York-Mollié random effect, a non-linear predictor example, aggregating a continuously defined random field to discrete areal units, and a joint likelihood model. Section 6 draws links to other methods that extend the class of models that can be fitted using INLA, discusses existing published research that uses the inlabru non-linear predictors feature, describes existing software that depends on inlabru and, finally, outlines ideas for the future development of inlabru.

2 Iterative INLA

2.1 Standard INLA

The INLA method is an approach to compute fast approximate posterior distributions for Bayesian generalised additive models. The class of models that INLA was developed to address are known as latent Gaussian models (LGMs).

The hierarchical structure of an LGM with latent Gaussian vector $\boldsymbol{u}$, covariance parameters $\boldsymbol{\theta}$, and response variable $\boldsymbol{y}$, can be written as

$$
\begin{aligned}
\boldsymbol{\theta} &\sim p(\boldsymbol{\theta}) \\
\boldsymbol{u}|\boldsymbol{\theta} &\sim \mathcal{N}\!\left(\boldsymbol{\mu}_{u},\boldsymbol{Q}(\boldsymbol{\theta})^{-1}\right) \\
\boldsymbol{\eta}(\boldsymbol{u}) &= \boldsymbol{A}\boldsymbol{u} \\
\boldsymbol{y}|\boldsymbol{u},\boldsymbol{\theta} &\sim p(\boldsymbol{y}|\boldsymbol{\eta}(\boldsymbol{u}),\boldsymbol{\theta}).
\end{aligned}
$$

The latent Gaussian vector $\boldsymbol{u}$ has a covariance structure that depends on hyperparameters $\boldsymbol{\theta}$, which have prior distribution $p(\boldsymbol{\theta})$. Each linear predictor element, $\eta_{i}(\boldsymbol{u})$, is linked to the latent field via a linear map defined by the known, non-random matrix $\boldsymbol{A}$, i.e. the additive predictor for observation $y_{i}$ is $\eta_{i}(\boldsymbol{u})=\boldsymbol{A}_{i}\boldsymbol{u}$, where $\boldsymbol{A}_{i}$ is the $i$-th row of $\boldsymbol{A}$. This linear predictor defines the location parameter for the distribution of the data $\boldsymbol{y}$, via a (possibly non-linear) link function $g^{-1}(\cdot)$. Observations are assumed to be conditionally independent given $\boldsymbol{\eta}$ and $\boldsymbol{\theta}$, so that $p(\boldsymbol{y}|\boldsymbol{\eta}(\boldsymbol{u}),\boldsymbol{\theta})=\prod_{i}p(y_{i}|\eta_{i}(\boldsymbol{u}),\boldsymbol{\theta})$.
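The hierarchy above can be sketched generatively in base R. This is a toy illustration only: the Gaussian prior on a log-precision, the random-walk-type precision matrix, the indicator design matrix and the Poisson likelihood with log link are all assumptions made for the example, and none of the object names come from INLA or inlabru.

```r
# A toy draw from the LGM hierarchy; every modelling choice is an assumption.
set.seed(2)
m <- 20                                  # length of the latent vector u

# theta ~ p(theta): here a log-precision with a Gaussian prior
theta <- rnorm(1, mean = 0, sd = 0.5)
tau <- exp(theta)

# Sparse (tridiagonal) prior precision Q(theta)
G <- crossprod(diff(diag(m)))            # first-order random walk structure
Q <- tau * (G + diag(m))                 # adding diag(m) makes Q positive definite

# u | theta ~ N(0, Q(theta)^{-1}), sampled through the Cholesky factor Q = R'R
R <- chol(Q)
u <- as.vector(backsolve(R, rnorm(m)))

# eta = A u, where row i of A selects the latent element for observation i
n_obs <- 30
idx <- sample(m, n_obs, replace = TRUE)
A <- matrix(0, n_obs, m)
A[cbind(seq_len(n_obs), idx)] <- 1
eta <- as.vector(A %*% u)

# y | eta: conditionally independent Poisson observations with log link
y <- rpois(n_obs, lambda = exp(eta))

# Conditional independence: the joint log-likelihood is a sum over observations
loglik <- sum(dpois(y, lambda = exp(eta), log = TRUE))
```

Sampling through the Cholesky factor of the precision, rather than of the covariance, is what allows sparse matrix methods to be used when $\boldsymbol{Q}(\boldsymbol{\theta})$ is sparse; the dense `chol()` call here is only for brevity.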

In the classic formulation of INLA (Rue et al., 2009), the linear predictors are included as part of an augmented latent Gaussian vector $\left[\boldsymbol{\eta},\boldsymbol{u}\right]^{\intercal}$ so that inference on the predictors is included as part of the overall estimation of the latent field. Since $\boldsymbol{\eta}$ is a linear combination of $\boldsymbol{u}$, a small amount of noise is added to avoid a singular precision. The INLA package has recently implemented an alternative approach which avoids the need to construct the augmented latent Gaussian vector, which can be prohibitively large for certain classes of models (Van Niekerk et al., 2023; van Niekerk and Rue, 2024). We present the classic approach below; see Section 2.3 for a brief summary of the recent INLA developments.
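As a sketch of why the added noise gives a proper joint distribution, suppose the noise has a large precision, written here with the assumed symbol $\tau_\epsilon$ (this notation is introduced for illustration and is not used elsewhere in the paper). The augmented vector then has a non-singular joint precision that stays sparse whenever $\boldsymbol{Q}(\boldsymbol{\theta})$ and $\boldsymbol{A}$ are sparse:

```latex
\boldsymbol{\eta} \mid \boldsymbol{u} \sim
  \mathcal{N}\!\left(\boldsymbol{A}\boldsymbol{u},\ \tau_\epsilon^{-1}\boldsymbol{I}\right),
\qquad
\begin{bmatrix}\boldsymbol{\eta}\\ \boldsymbol{u}\end{bmatrix} \Big|\, \boldsymbol{\theta}
\sim \mathcal{N}\!\left(
  \begin{bmatrix}\boldsymbol{A}\boldsymbol{\mu}_{u}\\ \boldsymbol{\mu}_{u}\end{bmatrix},
  \begin{bmatrix}
    \tau_\epsilon\boldsymbol{I} & -\tau_\epsilon\boldsymbol{A}\\
    -\tau_\epsilon\boldsymbol{A}^{\intercal} &
      \boldsymbol{Q}(\boldsymbol{\theta}) + \tau_\epsilon\boldsymbol{A}^{\intercal}\boldsymbol{A}
  \end{bmatrix}^{-1}
\right)
```

The block form follows from collecting the quadratic terms of $\tau_\epsilon\,\|\boldsymbol{\eta}-\boldsymbol{A}\boldsymbol{u}\|^{2} + (\boldsymbol{u}-\boldsymbol{\mu}_{u})^{\intercal}\boldsymbol{Q}(\boldsymbol{\theta})(\boldsymbol{u}-\boldsymbol{\mu}_{u})$ in $(\boldsymbol{\eta},\boldsymbol{u})$.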

INLA approximates the joint posterior density $p(\boldsymbol{\theta}|\boldsymbol{y})$ and the marginal posterior distributions $(u_{j}|\boldsymbol{y})$, and allows for sampling from the joint posterior $(\boldsymbol{u}|\boldsymbol{\theta})$, where $\boldsymbol{u}$ now represents the augmented latent Gaussian vector. Rue et al. (2009) introduced the approach and Martins et al. (2013) includes additional details.

2.2 The classic INLA method

The joint posterior density of the hyperparameters is approximated as

$$\hat{p}(\boldsymbol{\theta}|\boldsymbol{y})\propto\left.\frac{p(\boldsymbol{u},\boldsymbol{\theta},\boldsymbol{y})}{p_{G}(\boldsymbol{u}|\boldsymbol{\theta},\boldsymbol{y})}\right|_{\boldsymbol{u}=\boldsymbol{u}^{*}(\boldsymbol{\theta})},$$

where $p_{G}(\boldsymbol{u}|\boldsymbol{\theta},\boldsymbol{y})$ is the Gaussian approximation to $p(\boldsymbol{u}|\boldsymbol{\theta},\boldsymbol{y})$ centred at the mode $\boldsymbol{u}^{*}(\boldsymbol{\theta})$ for a given $\boldsymbol{\theta}$. This is equivalent to the Laplace approximation of the marginal posterior (Tierney and Kadane, 1986). The marginal posterior $p(\theta_{k}|\boldsymbol{y})$ is approximated using an interpolation of $\hat{p}(\boldsymbol{\theta}|\boldsymbol{y})$ based on a grid search of the parameter space for $\boldsymbol{\theta}$. For this reason INLA is generally applied in contexts where $\boldsymbol{\theta}$ is low dimensional.
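The approximation to the hyperparameter posterior can be sketched in base R for a deliberately small toy model. The model (independent latent effects with Poisson observations and a single log-precision hyperparameter), the prior, the grid and all function names below are assumptions for illustration; this is not how the INLA package implements the computation.

```r
# Toy model (an assumption for illustration, not an example from the paper):
#   y_i | u_i ~ Poisson(exp(u_i)),  u_i | theta ~ N(0, exp(-theta)),
#   theta ~ N(0, 1), so theta is the log-precision of the latent effects.
set.seed(3)
n <- 40
y <- rpois(n, lambda = exp(rnorm(n, sd = 0.7)))

# Mode u*(theta) of p(u | theta, y), found by Newton iterations.
# The latent effects are independent here, so all updates are elementwise.
find_mode <- function(tau, y, n_iter = 50) {
  u <- log(y + 0.5)
  for (k in seq_len(n_iter)) {
    grad <- y - exp(u) - tau * u
    neg_hess <- exp(u) + tau
    u <- u + grad / neg_hess
  }
  u
}

# log of p(u, theta, y) / p_G(u | theta, y) at u = u*(theta), up to a constant
log_post_theta <- function(theta, y) {
  tau <- exp(theta)
  u <- find_mode(tau, y)
  log_joint <- dnorm(theta, 0, 1, log = TRUE) +
    sum(dnorm(u, 0, 1 / sqrt(tau), log = TRUE)) +
    sum(dpois(y, exp(u), log = TRUE))
  # The Gaussian approximation is evaluated at its own mean, so only its
  # normalising constant remains; its precision is tau + exp(u*)
  log_gauss <- sum(dnorm(0, 0, 1 / sqrt(tau + exp(u)), log = TRUE))
  log_joint - log_gauss
}

# Evaluate on a grid over theta and normalise numerically
theta_grid <- seq(-3, 5, length.out = 81)
log_p <- vapply(theta_grid, log_post_theta, numeric(1), y = y)
step <- diff(theta_grid)[1]
p_hat <- exp(log_p - max(log_p))
p_hat <- p_hat / sum(p_hat * step)
theta_mode <- theta_grid[which.max(p_hat)]
```

With a single hyperparameter a regular grid is cheap; the number of grid points grows quickly with the dimension of $\boldsymbol{\theta}$, which is the practical reason for the low-dimensionality remark above.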

The marginal posterior density $p(u_{i}|\boldsymbol{y})$ is approximated as

$$\hat{p}(u_{i}|\boldsymbol{y})=\sum_{m}\hat{p}(u_{i}|\boldsymbol{\theta}^{(m)},\boldsymbol{y})\,\hat{p}(\boldsymbol{\theta}^{(m)}|\boldsymbol{y})\,\Delta\boldsymbol{\theta}^{(m)},$$

where $\hat{p}(u_{i}|\boldsymbol{\theta},\boldsymbol{y})$ is an approximate density given below and $\Delta\boldsymbol{\theta}^{(m)}$ is an integration weight associated with each grid search location for $\boldsymbol{\theta}$, indexed by $m$.

Rue and Martino (2007) originally proposed a Gaussian approximation for $\hat{p}(u_{i}|\boldsymbol{\theta},\boldsymbol{y})$ but found that this was not sufficiently accurate. Instead, Rue et al. (2009) improved on this by using another application of the Laplace approximation:

$$\hat{p}(u_{i}|\boldsymbol{\theta},\boldsymbol{y})\propto\left.\frac{p(\boldsymbol{u},\boldsymbol{\theta},\boldsymbol{y})}{p_{G}(\boldsymbol{u}_{-i}|u_{i},\boldsymbol{\theta},\boldsymbol{y})}\right|_{\boldsymbol{u}_{-i}=\boldsymbol{u}_{-i}^{*}(u_{i},\boldsymbol{\theta})},$$

where $p_{G}(\boldsymbol{u}_{-i}|u_{i},\boldsymbol{\theta},\boldsymbol{y})$ is the Gaussian approximation to $p(\boldsymbol{u}_{-i}|u_{i},\boldsymbol{\theta},\boldsymbol{y})$ centred at the mode $\boldsymbol{u}_{-i}^{*}(u_{i},\boldsymbol{\theta})$, given $u_{i}$ and $\boldsymbol{\theta}$. Until 2022, this was the standard method used by the INLA package. Although the joint posterior $p(\boldsymbol{u}|\boldsymbol{y})$ is not available directly, Chiuchiolo et al. (2023) propose a method to generate samples from the joint posterior via a skew-Gaussian copula approach.

2.3 A new approach for INLA: the variational Bayes correction

An alternative to the classic INLA approach was proposed by Van Niekerk et al. (2023) and van Niekerk and Rue (2024) that is more computationally efficient and avoids the need to include $\boldsymbol{\eta}$ as part of the latent field. In this case, the joint posterior is represented as $p(\boldsymbol{u},\boldsymbol{\theta}|\boldsymbol{y})\propto p(\boldsymbol{\theta})\,p(\boldsymbol{u}|\boldsymbol{\theta})\prod_{i}p(y_{i}|(\boldsymbol{A}\boldsymbol{u})_{i},\boldsymbol{\theta})$ and inference for $\boldsymbol{\eta}$ is dealt with separately to inference for $\boldsymbol{u}$ and $\boldsymbol{\theta}$. In this new approach, Van Niekerk et al. (2023) return to the idea of using a Gaussian approximation for $p(u_{i}|\boldsymbol{\theta},\boldsymbol{y})$. To correct for the inaccuracies identified in Rue and Martino (2007), van Niekerk and Rue (2024) use a variational Bayes correction to the mean of the Gaussian approximation. As shown by Van Niekerk et al. (2023), this approach is more computationally efficient and it was made the default mode of inference from version 22.11.22 of INLA.

For a fixed $\boldsymbol{\theta}$, the Gaussian approximation to $p(\boldsymbol{u}|\boldsymbol{\theta},\boldsymbol{y})$ is calculated from a second-order Taylor expansion of the log-likelihood around the mode, with associated precision $\boldsymbol{Q}^{*}(\boldsymbol{\theta})$, which has a sparsity structure that depends on the sparsity of the prior precision $\boldsymbol{Q}(\boldsymbol{\theta})$ as well as the non-zero entries in $\boldsymbol{A}^{\intercal}\boldsymbol{A}$. The Gaussian approximation is then used to approximate the marginal posteriors as $p(u_{j}|\boldsymbol{\theta},\boldsymbol{y})\approx\mathsf{N}(\mu(\boldsymbol{\theta})_{j},\boldsymbol{Q}^{*}(\boldsymbol{\theta})^{-1}_{jj})$, where $\boldsymbol{\mu}(\boldsymbol{\theta})$ is the joint mode of $p(\boldsymbol{u}|\boldsymbol{\theta},\boldsymbol{y})$. The posterior density $p(u_{j}|\boldsymbol{y})$ is then approximated by numerical integration over $\boldsymbol{\theta}$, just as in the classic formulation of INLA. For inference on the predictors, the posterior $p(\boldsymbol{\eta}|\boldsymbol{\theta},\boldsymbol{y})$ is approximated as a Gaussian density with expected value $\boldsymbol{A}\boldsymbol{\mu}(\boldsymbol{\theta})$ and covariance $\boldsymbol{A}\boldsymbol{Q}^{*}(\boldsymbol{\theta})^{-1}\boldsymbol{A}^{\intercal}$.

Note that although $\boldsymbol{Q}^{*}(\boldsymbol{\theta})$ is sparse, its inverse is not. However, van Niekerk and Rue (2024) present a computationally efficient method to compute the $j$-th diagonal element of $\boldsymbol{A}\boldsymbol{Q}^{*}(\boldsymbol{\theta})^{-1}\boldsymbol{A}^{\intercal}$, which is all that is required for the approximation to $p(\eta_{j}|\boldsymbol{\theta},\boldsymbol{y})$ and avoids the need to compute the full inverse. The posterior density $p(\eta_{j}|\boldsymbol{y})$ is then approximated using numerical integration as above.
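The idea that only the diagonal of $\boldsymbol{A}\boldsymbol{Q}^{*}(\boldsymbol{\theta})^{-1}\boldsymbol{A}^{\intercal}$ is needed can be illustrated with a small base R sketch that uses a Cholesky factor and triangular solves instead of an explicit inverse. This is not the algorithm of van Niekerk and Rue (2024): it uses dense matrices, and the precision matrix and design matrix are made-up stand-ins.

```r
# Predictor variances diag(A Q^{-1} A') without forming Q^{-1}.
# Q_star and A are assumed stand-ins, not objects produced by INLA.
set.seed(4)
m <- 30
n_obs <- 10

# A banded, positive definite stand-in for the posterior precision Q*(theta)
Q_star <- crossprod(diff(diag(m))) + diag(m)

# A design matrix with two non-zero interpolation weights per row
A <- matrix(0, n_obs, m)
j <- sample(m - 1, n_obs, replace = TRUE)
w <- runif(n_obs)
A[cbind(seq_len(n_obs), j)] <- w
A[cbind(seq_len(n_obs), j + 1)] <- 1 - w

# With Q* = R'R, A Q*^{-1} A' = W'W where W solves R'W = A'
R <- chol(Q_star)
W <- backsolve(R, t(A), transpose = TRUE)
var_eta <- colSums(W^2)

# Reference computation through the full (dense) inverse, for comparison only
var_eta_ref <- diag(A %*% solve(Q_star) %*% t(A))
```

Each column of `W` requires one triangular solve, so the cost is driven by the factorisation of the precision matrix rather than by its dense inverse.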

It is important to note that, although the approximate posterior densities $p(u_{j}|\boldsymbol{\theta},\boldsymbol{y})$ and $p(\eta_{j}|\boldsymbol{\theta},\boldsymbol{y})$ are Gaussian, the full marginals can deviate from normality due to the numerical integration. This allows for, say, non-symmetric posteriors to be estimated, even when using these Gaussian approximations.
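A small numerical sketch shows how integrating Gaussian conditionals over $\boldsymbol{\theta}$ can produce a non-symmetric marginal. The grid, the weights and the way the conditional mean and standard deviation depend on $\theta$ are invented stand-ins, chosen only so that both increase with $\theta$.

```r
# Assumed toy ingredients: a grid over theta with posterior weights, and
# Gaussian conditionals whose mean and standard deviation vary with theta.
theta_grid <- seq(-2, 2, length.out = 41)
delta <- diff(theta_grid)[1]                 # integration weight per grid point
w <- dnorm(theta_grid, mean = 0, sd = 0.8)   # stand-in for p(theta | y)
w <- w / sum(w * delta)

cond_mean <- 0.5 * theta_grid                # stand-in for the conditional mean
cond_sd <- exp(0.5 * theta_grid)             # stand-in for the conditional sd

# p(u_j | y) approximated by sum_m p(u_j | theta_m, y) p(theta_m | y) delta
u_grid <- seq(-15, 15, length.out = 3001)
marginal <- vapply(
  u_grid,
  function(u) sum(dnorm(u, cond_mean, cond_sd) * w * delta),
  numeric(1)
)

# Moments of the resulting mixture by simple quadrature
du <- diff(u_grid)[1]
m1 <- sum(u_grid * marginal * du)
m2 <- sum((u_grid - m1)^2 * marginal * du)
m3 <- sum((u_grid - m1)^3 * marginal * du)
skewness <- m3 / m2^1.5
```

Because larger values of $\theta$ here give both a larger conditional mean and a larger conditional spread, the mixture has a heavier right tail and a positive skewness, even though every component is Gaussian.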

2.4 Linear predictors in LGMs

The structure of LGMs facilitates a modular view of the additive predictor $\boldsymbol{\eta}(\boldsymbol{u})=\boldsymbol{A}\boldsymbol{u}$, which can be decomposed by thinking of each fixed and random effect as individual model components. By a model component we mean anything that would usually be viewed as a separate additive component of the linear predictor, such as an intercept parameter, the linear effect of a known covariate or a ‘random effect’ component, such as the sparse Gaussian Markov Random Field (GMRF) effects (Rue and Held, 2005) that are implemented in INLA. Examples of these include the Stochastic Partial Differential Equation (SPDE) approach to a GRF with Matérn covariance (Lindgren et al., 2011), and many other options including conditional auto-regressive and random walk models.

The matrix $\boldsymbol{A}$ is the model design matrix. In presentations of mixed effect models this is often decomposed as $\boldsymbol{A}=\left[\boldsymbol{X}\;\boldsymbol{Z}\right]$, where $\boldsymbol{X}$ contains the covariate information for fixed effects and $\boldsymbol{Z}$ contains the information to map the random effects to the observations. In the Bayesian LGM setting the distinction between fixed and random effects is somewhat arbitrary. We can think of the design matrix as decomposed into separate sub-matrices for each model component, i.e.

$$
\boldsymbol{A}=\begin{bmatrix}\boldsymbol{A}^{(1)}&\dots&\boldsymbol{A}^{(d)}\end{bmatrix}
$$

for a model with $d$ components, viewing each model component as having its own ‘component design matrix’, denoted here as $\boldsymbol{A}^{(j)}$ for $j=1,\dots,d$. The parameter vector can be partitioned into $\boldsymbol{u}=[\boldsymbol{u}^{(1)},\dots,\boldsymbol{u}^{(d)}]$, where each $\boldsymbol{u}^{(j)}$ is the parameter vector for component $j$. Given this decomposition the predictor expression for the $i$-th observation can be written as $\eta(\boldsymbol{u})_{i}=\delta_{i}+\sum_{j=1}^{d}\boldsymbol{A}^{(j)}_{i}\boldsymbol{u}^{(j)}$, where $\boldsymbol{A}^{(j)}_{i}$ is the $i$-th row of $\boldsymbol{A}^{(j)}$, and $\delta_{i}$ is an optional constant offset term.

For example, the component design matrix for an intercept parameter is a column vector of ones and the component design matrix for the effect of a known covariate is a column vector containing the covariate information for each observation. The component design matrix for an iid Gaussian effect is the identity matrix (or a block diagonal matrix consisting of identity sub-matrices of the correct dimension for each group). The component design matrix for an SPDE effect contains the finite element basis functions evaluated at each observation location. The key idea is that each component of the model predictor can be written as a known matrix multiplied by a parameter vector.
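The decomposition above can be sketched in a few lines of base R. This is only an illustration under invented data: the names `x`, `group` and the `u_*` values are hypothetical, and the code does not use inlabru or its mappers. Two observations share each group effect here, so the iid component design matrix is an indicator matrix rather than the plain identity.

```r
# Toy data: 6 observations in 3 groups, with one known covariate.
# All names and values are invented for illustration.
x <- c(0.5, 1.2, -0.3, 2.0, 0.1, -1.5)
group <- c(1, 1, 2, 2, 3, 3)
n <- length(x)

# Component design matrices, one per model component.
A_intercept <- matrix(1, nrow = n, ncol = 1)   # column of ones
A_covariate <- matrix(x, nrow = n, ncol = 1)   # covariate values
A_iid <- diag(3)[group, , drop = FALSE]        # maps group effects to rows

# Full design matrix A = [A^(1) ... A^(d)] and partitioned parameter vector.
A <- cbind(A_intercept, A_covariate, A_iid)
u_intercept <- 2
u_covariate <- -0.5
u_iid <- c(0.3, -0.1, 0.4)
u <- c(u_intercept, u_covariate, u_iid)

# The predictor as a single product, and as a sum over components.
eta_full <- as.vector(A %*% u)
eta_sum <- as.vector(
  A_intercept %*% u_intercept +
    A_covariate %*% u_covariate +
    A_iid %*% u_iid
)
```

Both `eta_full` and `eta_sum` hold the same predictor values, which is the sense in which each component contributes "a known matrix multiplied by a parameter vector".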

This view is the basis for the implementation in inlabru that automatically constructs the full model design matrix, given the component definitions and data. Each type of component has an associated method for constructing the component design matrix. These are known as bru_mapper() methods as they ‘map parameters to observations’. The user only needs to provide a model component definition and the data and the software does the rest. This component definition can be conceptualised in the mind of the user as ‘the linear effect of covariate x’ or ‘an SPDE effect on latitude and longitude’, for example, with an associated syntax for writing these as an R formula object. The software then parses these definitions and applies the relevant bru_mapper() methods to the data to construct the appropriate component design matrices and full model design matrix. In addition to linear effects, inlabru also supports non-linear component effects, e.g. via marginal distribution mappers, that are discussed in Supplement A.1.

In the existing implementation of INLA, for all but the most simple models, users must construct the design matrix with the use of helper functions, in a process that is known as ‘stack building’ (Lindgren and Rue, 2015, Section 2.5). For complex models this process can be quite involved. A key feature of inlabru is that this is now entirely automated, greatly simplifying the model fitting process.

For certain model components, such as the SPDE effect, the design matrix has become known among INLA users as the ‘projector’ matrix, as it is thought of as ‘projecting the GMRF parameters to the observation locations’. In this paper we use the terms projector matrix and design matrix interchangeably.

2.5 Approximate INLA for non-linear predictors

inlabru extends the class of models that can be fitted using INLA by allowing the predictor to be a non-linear function of the parameter vector $\boldsymbol{u}$. The premise for the inlabru implementation for non-linear predictors is to build on the existing INLA implementation using an iterative model fitting scheme, fitting the model using INLA at each iteration. inlabru can therefore be viewed as both a wrapper for, and an extension of, the existing INLA implementation. This iterative approach is an approximate method of inference based on a linearisation step and the properties of the resulting approximation will depend on the nature of the non-linearity and the choice of convergence criteria.

Let $\widetilde{\boldsymbol{\eta}}(\boldsymbol{u})$ denote a non-linear predictor, i.e. a deterministic function of $\boldsymbol{u}$. Choosing some linearisation point $\boldsymbol{u}_{0}$, the 1st order Taylor approximation of $\widetilde{\boldsymbol{\eta}}(\boldsymbol{u})$ at $\boldsymbol{u}_{0}$ can be written as

$$
\begin{aligned}
\overline{\boldsymbol{\eta}}(\boldsymbol{u}) &= \widetilde{\boldsymbol{\eta}}(\boldsymbol{u}_{0})+\sum_{j=1}^{d}\boldsymbol{B}^{(j)}(\boldsymbol{u}^{(j)}-\boldsymbol{u}_{0}^{(j)})=\left[\widetilde{\boldsymbol{\eta}}(\boldsymbol{u}_{0})-\sum_{j}\boldsymbol{B}^{(j)}\boldsymbol{u}_{0}^{(j)}\right]+\sum_{j}\boldsymbol{B}^{(j)}\boldsymbol{u}^{(j)}\\
&= \boldsymbol{\delta}+\sum_{j}\boldsymbol{B}^{(j)}\boldsymbol{u}^{(j)},
\end{aligned}
$$

where $\boldsymbol{B}^{(j)}$ are the derivative matrices of the non-linear predictor, with respect to each latent component state vector $\boldsymbol{u}^{(j)}$, evaluated at the linearisation point $\boldsymbol{u}_{0}$. In contrast to the linear predictor case, the offset term $\boldsymbol{\delta}$ is not optional, as it varies with the linearisation point and is therefore typically non-zero.
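As a small numerical illustration of the linearisation, the base R sketch below uses a hypothetical predictor $\exp(\beta)\,\boldsymbol{A}\boldsymbol{u}$ with a scalar component `beta` and a vector component `u`. The predictor, the matrix `A` and all values are assumptions made for this example; the derivative matrices are written out by hand rather than obtained from inlabru.

```r
# Hypothetical non-linear predictor: eta(beta, u) = exp(beta) * (A %*% u).
set.seed(1)
A <- matrix(rnorm(12), nrow = 4, ncol = 3)
eta_tilde <- function(beta, u) as.vector(exp(beta) * (A %*% u))

# Linearisation point.
beta0 <- 0.2
u0 <- c(1, -0.5, 0.3)

# Analytic derivative matrices evaluated at the linearisation point.
B_beta <- matrix(eta_tilde(beta0, u0), ncol = 1)  # d eta / d beta
B_u <- exp(beta0) * A                             # d eta / d u

# Offset: delta = eta_tilde(u0) - sum_j B^(j) u0^(j).
delta <- eta_tilde(beta0, u0) -
  as.vector(B_beta * beta0) - as.vector(B_u %*% u0)
eta_bar <- function(beta, u) {
  delta + as.vector(B_beta * beta) + as.vector(B_u %*% u)
}

# The two predictors agree at the linearisation point, and the error
# is second order in the distance from it.
err_at_u0 <- max(abs(eta_bar(beta0, u0) - eta_tilde(beta0, u0)))
err_small <- max(abs(
  eta_bar(beta0 + 1e-3, u0 + 1e-3) - eta_tilde(beta0 + 1e-3, u0 + 1e-3)
))
err_large <- max(abs(
  eta_bar(beta0 + 1e-1, u0 + 1e-1) - eta_tilde(beta0 + 1e-1, u0 + 1e-1)
))
```

Note that `delta` is non-zero here even though the non-linear predictor has no offset of its own, in line with the remark that the offset is not optional in the linearised model.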

Generally we define the observation models via

$$
\boldsymbol{y}|\boldsymbol{u},\boldsymbol{\theta} \sim p(\boldsymbol{y}|\widetilde{\boldsymbol{\eta}}(\boldsymbol{u}),\boldsymbol{\theta}),
$$

so that the observation distribution only depends on the latent vector $\boldsymbol{u}$ via the predictor $\widetilde{\boldsymbol{\eta}}(\boldsymbol{u})$. Commonly, a link function transformation is applied to each element of the predictor vector, so that observation $y_{i}$ is linked to $g^{-1}\left[\widetilde{\eta}(\boldsymbol{u})_{i}\right]$. However, for some models, such as inhomogeneous Poisson point processes, further linear or non-linear transformations may also be applied internally, in order to set up the required likelihood construction for INLA (Yuan et al., 2017).

The non-linear observation model $p(\boldsymbol{y}|\widetilde{\boldsymbol{\eta}}(\boldsymbol{u}),\boldsymbol{\theta})$ is approximated by replacing the non-linear predictor with its linearisation, so that the linearised model is defined by

$$
\overline{p}(\boldsymbol{y}|\boldsymbol{u},\boldsymbol{\theta})=p(\boldsymbol{y}|\overline{\boldsymbol{\eta}}(\boldsymbol{u}),\boldsymbol{\theta})\approx p(\boldsymbol{y}|\widetilde{\boldsymbol{\eta}}(\boldsymbol{u}),\boldsymbol{\theta})=\widetilde{p}(\boldsymbol{y}|\boldsymbol{u},\boldsymbol{\theta}).
$$

Each such linearised predictor model $\overline{\boldsymbol{\eta}}(\boldsymbol{u})$ has design matrix

$$
\boldsymbol{B}=\begin{bmatrix}\boldsymbol{B}^{(1)}&\dots&\boldsymbol{B}^{(d)}\end{bmatrix}
$$

and offset term $\boldsymbol{\delta}=\widetilde{\boldsymbol{\eta}}(\boldsymbol{u}_{0})-\boldsymbol{B}\boldsymbol{u}_{0}$. This defines a model that can be fitted using the existing INLA methodology for LGMs with linear predictors. The next step is to choose a suitable linearisation point. This is done using a fixed point iteration method, which locates a point $\boldsymbol{u}_{0}$ such that the resulting conditional posterior mode $\widehat{\boldsymbol{u}}$ of the linearised model is the same as the linearisation point. Starting at some initial point $\boldsymbol{u}_{0}^{(0)}$, for $k=1,2,\dots$ the INLA method is applied to the linearised model to obtain

$$
\begin{aligned}
\widehat{\boldsymbol{\theta}}^{(k)} &= \operatorname*{arg\,max}_{\boldsymbol{\theta}}\overline{p}(\boldsymbol{\theta}|\boldsymbol{y};\boldsymbol{u}_{0}^{(k-1)}),\\
\widehat{\boldsymbol{u}}^{(k)} &= \operatorname*{arg\,max}_{\boldsymbol{u}}\overline{p}(\boldsymbol{u}|\boldsymbol{y},\widehat{\boldsymbol{\theta}},\boldsymbol{u}_{0}^{(k-1)}),
\end{aligned}
$$

and the next linearisation point is chosen as $\boldsymbol{u}_{0}^{(k)}=(1-\alpha)\boldsymbol{u}_{0}^{(k-1)}+\alpha\widehat{\boldsymbol{u}}^{(k)}$ for some $\alpha>0$ that minimises a norm of the difference between the linearised and non-linear predictors. The iteration continues until a stopping criterion is met, namely when

  • a maximum number of iterations has been carried out (the bru_max_iter option, default 10), or

  • the relative maximum change in the linearisation point is below a threshold relative to the estimated posterior standard deviations (bru_method$rel_tol, default 10%), and

  • the line search for choosing $\alpha$ is inactive, i.e. $\alpha\approx 1$, indicating that the non-linear and linearised predictors coincide.

For more details on the iterative method, see Supplementary Material Section B.
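The iteration can be illustrated with a toy problem in base R. This is a sketch under strong simplifying assumptions and is not the inlabru implementation: there is a single scalar latent parameter `beta` with a standard normal prior, Gaussian observations with known standard deviation, and no hyperparameters, so the INLA step reduces to a closed-form Gaussian posterior mode. The line search objective (non-linear predictor at the candidate point against the linearised predictor at the linearised mode, in an unweighted squared norm) is likewise an assumed choice, and `rel_tol` is set far below the 10% default to make the fixed point easy to check.

```r
# Toy model (an assumption for illustration, not an inlabru model):
#   beta ~ N(0, 1),  y_i ~ N(exp(beta) * x_i, sigma^2), sigma known.
set.seed(42)
x <- 1:5
sigma <- 0.5
y <- exp(0.5) * x + rnorm(length(x), sd = sigma)

eta_tilde <- function(beta) exp(beta) * x          # non-linear predictor

# Fit the linearised model at beta0. The Gaussian conditional posterior
# has a closed-form mode, which stands in for the INLA step here.
fit_linearised <- function(beta0) {
  B <- exp(beta0) * x                              # derivative at beta0
  delta <- eta_tilde(beta0) - B * beta0            # offset
  prec <- 1 + sum(B^2) / sigma^2                   # posterior precision
  mode <- sum(B * (y - delta)) / sigma^2 / prec    # posterior mode
  list(mode = mode, sd = 1 / sqrt(prec), eta_bar = delta + B * mode)
}

beta0 <- 0
max_iter <- 10
rel_tol <- 1e-6
converged <- FALSE
for (k in seq_len(max_iter)) {
  fit <- fit_linearised(beta0)
  # Stop when the change is small relative to the posterior sd.
  if (abs(fit$mode - beta0) / fit$sd < rel_tol) {
    converged <- TRUE
    break
  }
  # Line search: choose alpha so that the non-linear predictor at the new
  # point is close to the linearised predictor at the linearised mode.
  step_error <- function(alpha) {
    beta_alpha <- (1 - alpha) * beta0 + alpha * fit$mode
    sum((eta_tilde(beta_alpha) - fit$eta_bar)^2)
  }
  alpha <- optimize(step_error, interval = c(0, 2))$minimum
  beta0 <- (1 - alpha) * beta0 + alpha * fit$mode
}
beta_hat <- fit$mode

# Gradient of the non-linear log posterior, which vanishes at a fixed point.
grad <- -beta_hat +
  sum((y - eta_tilde(beta_hat)) * eta_tilde(beta_hat)) / sigma^2
```

In this toy setting the fixed point of the iteration is a stationary point of the non-linear log posterior, which is why `grad` is numerically zero once the loop has converged.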

2.6 Approximation accuracy

While the inlabru optimisation method leads to an estimate where $\|\widetilde{\boldsymbol{\eta}}(\boldsymbol{u}_{*})-\overline{\boldsymbol{\eta}}(\boldsymbol{u}_{*})\|=0$ for a specific $\boldsymbol{u}_{*}$, the overall posterior approximation accuracy depends on the degree of nonlinearity in the vicinity of $\boldsymbol{u}_{*}$. There are two main options for evaluating this nonlinearity, both of which can be computed by sampling from the approximate posterior distribution. The first option is

$$
\sum_{i}\frac{E_{\boldsymbol{u}\sim\overline{p}(\boldsymbol{u}|\boldsymbol{y})}\left[|\overline{\boldsymbol{\eta}}_{i}(\boldsymbol{u})-\widetilde{\boldsymbol{\eta}}_{i}(\boldsymbol{u})|^{2}\right]}{\mathrm{Var}_{\boldsymbol{u}\sim\overline{p}(\boldsymbol{u}|\boldsymbol{y})}(\overline{\boldsymbol{\eta}}_{i}(\boldsymbol{u}))},
$$

which is the posterior expectation of the component-wise variance-normalised squared deviation between the non-linear and linearised predictor. Note that the normalising variance includes the variability induced by the posterior uncertainty for $\boldsymbol{\theta}$, whereas the $\|\cdot\|_{V}$ norm used for the line search uses only the posterior mode.
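A Monte Carlo version of this first diagnostic can be sketched in base R. The setting is hypothetical: a scalar parameter `beta` whose approximate posterior is taken to be $\mathsf{N}(m, s^2)$, the toy predictor $\exp(\beta)x_i$ linearised at `m`, and no hyperparameters, so the sketch does not show the contribution of the uncertainty in $\boldsymbol{\theta}$. The values of `m` and `s` are invented.

```r
# Toy setting (assumed): approximate posterior beta ~ N(m, s^2),
# non-linear predictor exp(beta) * x, linearised at beta0 = m.
set.seed(7)
x <- 1:5
m <- 0.5
s <- 0.04
eta_tilde <- function(beta) exp(beta) * x
eta_bar <- function(beta) exp(m) * x * (1 + beta - m)

# Draw from the approximate posterior and evaluate both predictors.
beta_draws <- rnorm(5000, mean = m, sd = s)
tilde_draws <- sapply(beta_draws, eta_tilde)   # 5 x 5000 matrix
bar_draws <- sapply(beta_draws, eta_bar)       # 5 x 5000 matrix

# Sum over observations of E[(eta_bar_i - eta_tilde_i)^2] / Var(eta_bar_i).
sq_dev <- rowMeans((bar_draws - tilde_draws)^2)
var_bar <- apply(bar_draws, 1, var)
diagnostic <- sum(sq_dev / var_bar)
```

With a posterior this concentrated, the deviation between the two predictors is small compared with the posterior spread of the linearised predictor, so `diagnostic` is close to zero. Larger values would point to stronger nonlinearity within the bulk of the posterior.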

The second option is to approximate the Kullback-Leibler divergences between the conditional posterior distributions under the linear and nonlinear models,

\begin{align}
\mathsf{D}_{\mathsf{KL}}\left(\overline{p}\,\middle\|\,\widetilde{p}\right) &= E_{\boldsymbol{u}\sim\overline{p}(\boldsymbol{u}|\boldsymbol{y},\boldsymbol{\theta})}\left[\ln\frac{\overline{p}(\boldsymbol{u}|\boldsymbol{y},\boldsymbol{\theta})}{\widetilde{p}(\boldsymbol{u}|\boldsymbol{y},\boldsymbol{\theta})}\right], \tag{2}\\
\mathsf{D}_{\mathsf{KL}}\left(\widetilde{p}\,\middle\|\,\overline{p}\right) &= E_{\boldsymbol{u}\sim\widetilde{p}(\boldsymbol{u}|\boldsymbol{y},\boldsymbol{\theta})}\left[\ln\frac{\widetilde{p}(\boldsymbol{u}|\boldsymbol{y},\boldsymbol{\theta})}{\overline{p}(\boldsymbol{u}|\boldsymbol{y},\boldsymbol{\theta})}\right], \tag{3}
\end{align}

both evaluated at the posterior mode for $\boldsymbol{\theta}$. While it may be possible to also approximate the K-L divergence for the full posterior distributions, in practice it is easier to evaluate the conditional K-L divergences only at the posterior mode for $\boldsymbol{\theta}$. Implementing this requires access to specific aspects of the likelihood and prior distribution details, which are available from the INLA software package output, and we plan to include an implementation of these K-L divergences as a model diagnostic in a future version of inlabru. Supplementary Material Section F presents this in more detail and includes results to approximate Equations (2) and (3).

Users can also evaluate the accuracy of the method for a given model using general Bayesian method checking approaches, as we show in Section 4.

2.7 Well-posedness and initialisation

As a side note, one might be concerned about initialisation at, or convergence to, a saddle point. Although it is not implemented in inlabru, we briefly discuss the technicalities of how the initial linearisation point $\boldsymbol{u}_{0}$ is defined.

Generally speaking, any value of $\boldsymbol{u}_{0}$ works, except when the gradient evaluated at $\boldsymbol{u}_{0}$ is $\boldsymbol{0}$, because the linearisation point will then never move away if the prior mean is also $\boldsymbol{0}$. In general, this tends to be a saddle point problem. In some cases the problem can be handled by changing the predictor parameterisation or simply by changing the initialisation point using the bru_initial option. However, for true saddle point problems, it indicates that the predictor parameterisation may lead to a multimodal posterior distribution or is ill-posed in some other way. This is a more fundamental problem that cannot be fixed by changing the initialisation point.

In these examples, where $\beta$ and $\boldsymbol{u}$ are latent Gaussian components, the predictors 1, 3, and 4 would typically be safe, but predictor 2 is fundamentally non-identifiable:

$$
\begin{aligned}
\boldsymbol{\eta}_{1} &= \boldsymbol{u},\\
\boldsymbol{\eta}_{2} &= \beta\boldsymbol{u},\\
\boldsymbol{\eta}_{3} &= e^{\beta}\boldsymbol{u},\\
\boldsymbol{\eta}_{4} &= F_{\beta}^{-1}(\Phi(z_{\beta}))\boldsymbol{u},\quad z_{\beta}\sim\mathsf{N}(0,1).
\end{aligned}
$$
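To see why predictor 2 can stall at a zero initial point while predictor 3 cannot, the derivative matrices of Section 2.5 can be written out directly. The following is a worked illustration under the assumption that the linearisation point is $(\beta_0, \boldsymbol{u}_0)$ and that both components enter without further design matrices; $\boldsymbol{I}$ denotes the identity matrix.

```latex
\begin{align*}
\boldsymbol{\eta}_2 = \beta\boldsymbol{u}: \quad
  & \boldsymbol{B}^{(\beta)} = \boldsymbol{u}_0, \quad
    \boldsymbol{B}^{(\boldsymbol{u})} = \beta_0 \boldsymbol{I}, \quad
    \boldsymbol{\delta} = \beta_0\boldsymbol{u}_0
      - \boldsymbol{u}_0\beta_0 - \beta_0\boldsymbol{u}_0
      = -\beta_0\boldsymbol{u}_0, \\
  & (\beta_0, \boldsymbol{u}_0) = (0, \boldsymbol{0})
    \;\Rightarrow\; \boldsymbol{B}^{(\beta)} = \boldsymbol{0},\;
    \boldsymbol{B}^{(\boldsymbol{u})} = \boldsymbol{0},\;
    \overline{\boldsymbol{\eta}}(\beta, \boldsymbol{u}) \equiv \boldsymbol{0}, \\
\boldsymbol{\eta}_3 = e^{\beta}\boldsymbol{u}: \quad
  & \boldsymbol{B}^{(\beta)} = e^{\beta_0}\boldsymbol{u}_0, \quad
    \boldsymbol{B}^{(\boldsymbol{u})} = e^{\beta_0}\boldsymbol{I}
    \neq \boldsymbol{0} \text{ for every } \beta_0 .
\end{align*}
```

For predictor 2 at the zero point the linearised likelihood no longer depends on $(\beta, \boldsymbol{u})$, so the conditional posterior mode of the linearised model equals the prior mode. With zero prior means the next linearisation point is again $(0, \boldsymbol{0})$, which is the stalling behaviour described above.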

3 Package structure and basic syntax

The main workhorse of inlabru is the bru() function, which is used to fit models. bru() automatically detects non-linear predictors and, if required, runs the iterated INLA procedure. The bru() function requires a components argument, which is an R formula object specifying the model components. In addition to the components specification, bru() has an optional formula argument that specifies how to combine the model components, possibly in a non-linear way, to construct the latent predictor. If no formula is provided the model assumes a linear additive predictor based on the components formula. In this case bru() internally constructs the model design matrix and makes a single call to INLA::inla() to fit the model.

Below we summarise the basic workflow to specify, fit and sample from models using inlabru.

3.1 Defining, estimating and sampling from models

3.1.1 Define model components

The syntax for defining model components is directly related to the perspective on LGM predictors presented in Section 2.4. Each model component, which is a single latent Gaussian parameter, or a collection of related parameters, has an associated prior covariance structure as well as a component design matrix to relate parameters to observations.

Similar to other GAM fitting software, model components are defined via a formula-like syntax. Each component can be specified by including the following expression in a formula object:

R> component_name(
+    main = ...,
+    model = ...,
+    ...
+  )

where component_name is a user-chosen name for the component and does not refer to a specific function name.

The main argument takes an R expression that specifies the input data for the component. For example, main could be the name of a covariate for a linear fixed effect. Or it could be the name of a covariate that includes indices for a latent random effect, such as a time variable for an auto-regressive model component. For a spatial effect component it could be the name of the geometry column in an sf data object or an R expression like cbind(x, y) where x and y are the names of coordinate variables, for effects such as the SPDE where a 2-column matrix of coordinates is expected. In other words, main specifies key information that is required to build the component design matrix.

The flexibility of allowing main to be an R expression is the basis for the support for various spatial data types and allows inlabru to be readily extended to support other data structures in the future. We cover this feature in more detail below. The optional arguments group and replicate also accept general R expressions but for the corresponding features of the INLA::f() function. This allows users flexibility in specifying the model and reduces the amount of pre-processing of data required compared to INLA.

The information provided by main is then used by bru_mapper() internally to construct the relevant design matrix. Exactly what form this takes depends on the type of model component, which is specified using the model argument. Most models are identified using a character string. For example, model = "linear" defines a component that is the linear effect of the data provided in main. Similarly, model = "ar1" defines an auto-regressive order 1 model component. The full list of supported component names can be found by running INLA::inla.list.models()$latent. An exception to this is certain spatial models defined by the SPDE random effect. In this case, model can be an SPDE model object.

The user-chosen name component_name appears in model summaries and can be used in prediction formulas, or when sampling from the posterior. This makes it easy for users to track which latent parameters correspond to which components, as the model object uses this label. This is in contrast to INLA which does not have the ability for user-defined naming of each component and requires the user to use their own indexing system to track which latent parameters correspond to each component.

We now give some invented examples to clarify the above template for defining model components in specific situations.

Suppose we have a covariate named salt included in the data and we wish to include the linear effect of salt as a component of the model predictor. Additionally, suppose that the only other parameter in the predictor is an intercept component. Then the model components for this model could be defined as

R> cmps <- ~ effect_of_salt(salt,
+                           model = "linear") +
+    intercept(1, model = "linear")

The name effect_of_salt is used to label relevant parts of the fitted model object, will appear in model summaries, and can be used in the formula passed to the predict() methods for fitted models (which we cover in more detail in the full examples below). Note that inlabru interprets intercept(1,...) as equivalent to intercept(rep(1, n), ...) where n is the number of rows in the model design matrix. This is shorthand to make defining intercept-like components more straightforward.

The model types can be any of the models that are accepted by the INLA::f() function, as well as the additional inlabru specific options of "constant", "offset" (a backwards compatibility synonym for "constant"), "linear", "factor_full", "factor_contrast". Additionally the "fixed" component type allows users to specify fixed effects using standard lm()-type syntax such as ~ x1*x2 for an interaction.

The additional (optional) arguments to component definitions are any that can be accepted by INLA::f(), such as the already mentioned group and replicate arguments, but also prior specifications, linear constraints and initial values, for example.

We can see from the above that multiple model components are specified as additive terms in a formula object that is then passed to the like() function to construct the model likelihood. In general, any number of components can be defined by specifying model definitions additively in the components formula object:

R> cmp <- ~ component_1(main = ..., model = ..., ...) +
+    component_2(main = ..., model = ..., ...) +
+    ...

There are a number of conventions that simplify the process of defining common model components such as linear fixed effects and intercept-like parameters.

A component defining the linear effect of a covariate can also be defined simply by using the name of the covariate in the formula, e.g. cmp <- ~covariate_name is equivalent to cmp <- ~covariate_name_effect(main = covariate_name, model = "linear"). This simplifies the specification of common model components and is similar to other GAM fitting software in R. However, we prefer to present the more verbose definition as it more closely resembles the definition of more complicated model components.

For a final more complicated example, consider the case where we have a temporal covariate named season which takes values 1,2,3,412341,2,3,41 , 2 , 3 , 4 which correspond to the four seasons, and a discrete spatial index given by a covariate sp_index which takes values 1,2,,M12𝑀1,2,\ldots,M1 , 2 , … , italic_M with associated neighbourhood graph object sp_graph. Then to define a spatio-temporal Besag CAR model with an AR(1)AR1\text{AR}(1)AR ( 1 ) temporal dependence structure we can include the following expression in the component definitions:

R> cmp <- ~ sp_season_comp(sp_index,+                          model = "besag",+                          graph = sp_graph,+                          group = season,+                          group.model = "ar1"+  )

Note that this has the same basic structure as the definition of a linear model component above. There is a main input, and a model definition. In addition to this, we include a group variable and a group.model that define the temporal dependence model, as well as the graph argument which is expected for the Besag component type. inlabru automatically parses this expression into a component design matrix when constructing the model likelihood. Note that this greatly simplifies the usual process of defining such effects in INLA, which requires users to construct the relevant component design matrices directly and the full model design matrix using inla.stack(). All of this is now automated and the user can focus on defining the model.

The documentation for defining model components can be found by running ?component. Any information that is not included there is likely to be found in INLA::f which is the equivalent documentation in INLA.

3.1.2 Define model formula

The model formula specifies the response variable and how the model components are combined to form the predictor expression. For example, a standard additive predictor with response variable y has the form

R> fml <- y ~ component_1 + component_2 + ...

Note that each model component is referred to by the user-chosen name specified in the components definition. In this example the names are just component_1 and component_2. The predictor formula can be any R expression, including non-linear functions of components. For example, a user could define a product of two latent components by using

R> y ~ component_1 * component_2

Alternatively, the user could also define a new function in the environment and use this within the formula expression. For example, an equivalent to the above product is

R> my_prod <- function(a, b) a * b
R> fml <- y ~ my_prod(component_1, component_2)

This allows more complex predictor expressions to be stored as separate functions rather than directly specified in the formula expression. Note that despite similar appearance to other GAM/GLM software conventions, the syntax component_1 * component_2 is not defining an interaction of two covariates. Instead this formula object is parsed as a literal function of the model components. In this case the model has two components, and the predictor expression is the multiplication of them with each other. Using the component design matrix notation we defined above, this predictor can be written mathematically as $\widetilde{\boldsymbol{\eta}}(\boldsymbol{u})=\boldsymbol{A}^{(1)}\boldsymbol{u}^{(1)}\odot\boldsymbol{A}^{(2)}\boldsymbol{u}^{(2)}$, where $\odot$ denotes element-wise multiplication.
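
As a small numerical illustration of this element-wise product, the following base R sketch evaluates the predictor for two made-up components. The matrices A1 and A2 and the vectors u1 and u2 are hypothetical values chosen for the example; in practice inlabru constructs the component design matrices internally.

```r
# Hypothetical design matrices and latent states for two components
# (illustrative values only; inlabru constructs these internally).
A1 <- matrix(c(1, 0,
               0, 1,
               1, 1), nrow = 3, byrow = TRUE)
A2 <- diag(3)
u1 <- c(2, -1)
u2 <- c(0.5, 1, 3)

# Component effects at the three observations
effect_1 <- as.vector(A1 %*% u1)
effect_2 <- as.vector(A2 %*% u2)

# The predictor y ~ component_1 * component_2 multiplies the effects
# element by element; it is neither a matrix product nor an interaction term.
my_prod <- function(a, b) a * b
eta <- my_prod(effect_1, effect_2)
eta
```

Each element of eta depends only on the two component effects for the same observation, which is what distinguishes this predictor from a matrix product of the two components.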

In order to calculate the derivatives matrix required for the linearisation step, inlabru uses numerical derivatives rather than requiring the user to directly specify the derivatives of the predictor formula. The component mapper system computes the linearised component offset and design matrices discussed in Supplement A.1, followed by numerical differentiation of the combined predictor formula, and subsequent combination using the chain rule. In future this could be improved by linking inlabru with automatic differentiation libraries or allowing users to specify their own derivative functions directly. We advise users to define predictor expressions that are at least twice differentiable since this is a requirement of the Laplace approximation. However, this is a minimum requirement and is not a guarantee of good approximation properties for all twice-differentiable non-linear functions. At the moment no checks are made as to the differentiability of the predictor formula and it is up to users to make sure they define differentiable expressions.
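
To make the linearisation step concrete, the sketch below applies a first-order Taylor expansion with a central finite-difference derivative to a scalar predictor. It is only an illustration in base R under simplifying assumptions (a single latent variable and a fixed step size h); the helper names num_deriv and linearise are hypothetical, and this is not the internal inlabru implementation.

```r
# Central finite-difference derivative of a scalar function (hypothetical helper)
num_deriv <- function(f, u0, h = 1e-5) {
  (f(u0 + h) - f(u0 - h)) / (2 * h)
}

# First-order Taylor expansion of f around u0:
#   f(u) ~ f(u0) + B * (u - u0) = offset + B * u
linearise <- function(f, u0) {
  B <- num_deriv(f, u0)
  list(offset = f(u0) - B * u0, B = B)
}

# Example: a non-linear predictor eta(u) = exp(u), linearised at u0 = 0.3
eta <- function(u) exp(u)
lin <- linearise(eta, u0 = 0.3)

# The linearised predictor matches eta at u0 and is close to it nearby
u <- 0.35
c(exact = eta(u), linearised = lin$offset + lin$B * u)
```

The pair (offset, B) plays the role of the linearised offset and design matrix: once they are fixed, the predictor is linear in u and a standard latent Gaussian model fit can be applied.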

3.1.3 Construct the likelihood and fit the model

To facilitate models with multiple likelihoods with shared components, inlabru contains a likelihood object class that is constructed using the like() function. To fit a model the components definition and likelihood(s) are then passed to the fitting function bru().

R> lik <- like(
+    formula = fml,
+    data = ...,
+    family = ...,
+    ...
+  )
R> fit <- bru(lik,
+    components = cmp
+  )

The formula is automatically inspected to check if it is non-linear and, if so, bru() uses the iterated INLA inference approach. The family argument specifies the data likelihood. A full list of currently implemented likelihoods can be viewed by running INLA::inla.list.models("likelihood").

For models with a single likelihood, an alternative is to call bru() directly

R> fit <- bru(
+    components = cmp,
+    formula = fml,
+    family = ...,
+    data = ...
+  )

However, all the remaining example code in this paper and in the accompanying code repository uses the like() function to specify models.

Separating out likelihood construction as a separate task to model fitting facilitates support for multiple likelihood models which can be specified as simply as

R> lik_1 <- like(
+    formula = fml_1,
+    data = data_1,
+    family = ...
+  )
R>
R> lik_2 <- like(
+    formula = fml_2,
+    data = data_2,
+    family = ...
+  )
R>
R> fit <- bru(
+    lik_1, lik_2,
+    components = cmp,
+    formula = fml
+  )

Note that the components are only specified once directly in the call to bru(). The likelihoods each have their own predictor formula object that uses some or all of these components. In this way it is possible to specify model components that are shared across multiple likelihoods.

Recall that the input data for each component is defined using a general R expression. This functionality greatly simplifies the specification of joint models since the predictor expressions can be constructed for each likelihood but each component need only be defined once. The main argument for each component specifies the means of constructing the required input data and projection matrices when evaluated on either data_1 or data_2.

3.1.4 Sampling from the posterior

The generate() function can be used to sample from the model posterior. The following function produces 500 samples:

R> generate(
+    fit,
+    data = ...,
+    formula = ...,
+    n.samples = 500
+  )

with the parameters sampled depending on what is included in the formula argument. The formula can be any R expression that uses named model components, which can be useful for investigating functional summaries of model parameters. The predict() method can be used to calculate summary statistics of posterior Monte Carlo samples, such as the empirical mean, mode, standard deviation and quantiles.
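
The kind of Monte Carlo summary described here can be sketched in base R. The matrix samples below is a simulated stand-in, not actual generate() output; arranging the samples with one column per posterior draw and one row per prediction location is an assumption made for this illustration.

```r
set.seed(4)
# Stand-in for posterior samples of a predictor at three locations:
# one row per location, one column per sample (hypothetical values)
samples <- matrix(rnorm(3 * 500, mean = c(0, 1, 2), sd = 0.5), nrow = 3)

# Functional summary of interest, e.g. exp() of the sampled predictor
vals <- exp(samples)

# Row-wise Monte Carlo summaries, similar in spirit to what predict() reports
summ <- data.frame(
  mean = rowMeans(vals),
  sd = apply(vals, 1, sd),
  q0.025 = apply(vals, 1, quantile, probs = 0.025),
  median = apply(vals, 1, median),
  q0.975 = apply(vals, 1, quantile, probs = 0.975)
)
summ
```

Because the summaries are computed after applying the function to each sample, they describe the posterior of the transformed quantity rather than a transformation of posterior summaries.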

The data argument can be used to specify the input data for the predictions, such as specific spatial locations or covariate values, for example. To sample directly from latent parameters without specifying any new input data (or, in other words, to sample with an identity projection matrix), the keyword _latent can be appended to a latent component name. For example

R> generate(
+    fit,
+    formula = ~ component_name_latent,
+    n.samples = 500
+  )

will generate 500 samples of the latent Gaussian parameters associated with the component named component_name.

Similarly, individual components can be evaluated for arbitrary inputs by using the keyword _eval appended to a latent component name. For example,

R> generate(fit,
+    formula = ~ component_name_eval(1:5),
+    n.samples = 500
+  )

generates samples of the component named component_name evaluated with input data c(1,2,3,4,5). When using this functionality care should be taken to ensure that the input data are in the correct format for the component model type.

3.2 Support for spatial data

As mentioned above, inlabru provides support for sf (Pebesma, 2018) and terra (Hijmans, 2024) data structures. This means that sf and terra objects can be passed as data and covariates, respectively, directly to the like() and bru() functions. For example, a log-Gaussian Cox process can be fitted using an sf points object to describe the locations of the observed points and an sf polygon object to define the observation window. Spatial objects can also be used for prediction as data objects in predict() and generate() calls.

Some model components can work directly with the spatial information provided in the geometry column of the sf data object. For example, the bru_mapper() method for the SPDE random effect has methods for working with sf geometry objects. The mapper system is the basis for extending support for spatial data to more model components in the future.

The eval_spatial() methods can be used to extract information from spatial data objects. This is useful for defining components where spatial data is an input. For example, a component specified as

R> ~ my_sp_effect(
+    main = a_spatial_object,
+    model = "linear"
+  )

is equivalent to

R> ~ my_sp_effect(
+    main = eval_spatial(a_spatial_object, .data.),
+    model = "linear"
+  )

where .data. is a keyword that refers to whatever data object is passed to the like() or bru() function. The eval_spatial() function extracts the value of the spatial covariate at the locations provided in the data. See ?eval_spatial for more information on what spatial data formats are supported.

There is also legacy support for the sp and raster packages. For these legacy packages there is support for the plotting of spatial data. The gg() function is a geometry class for ggplot2 (Wickham, 2009) plot objects. It automatically detects the type of sp object (points, lines, pixels, grid or polygons) and coerces this into a correct format for plotting with standard ggplot2 geometry classes. This plotting support is not required for the newer spatial packages which have already been supported in ggplot2 with the geom_sf function, the tidyterra package (Hernangómez, 2023), and the geom_fm function in the fmesher package (Lindgren, 2024).

4 Approximate Bayesian method checking

As a proof of concept test we compare the approximate posterior distributions to the truth in a simple case where the true posterior distribution has a known form. Consider the hierarchical model of the form

\begin{align*}
\lambda &\sim \mathsf{Exp}(\gamma) \\
(y_i \mid \lambda) &\sim \mathsf{Po}(\lambda),
\end{align*}

for $i=1,\ldots,n$. Treating $\gamma$ as a known constant, the posterior density is

\begin{align*}
p(\lambda \mid \{y_i\}) &\propto p(\lambda, y_1, \dots, y_n) \\
&\propto \exp(-\gamma\lambda)\exp(-n\lambda)\lambda^{n\overline{y}} \\
&= \exp\{-(\gamma+n)\lambda\}\lambda^{n\overline{y}},
\end{align*}

where $\overline{y}=\frac{1}{n}\sum_{i=1}^{n}y_i$. This is proportional to a $\mathsf{Ga}(\alpha=1+n\overline{y},\beta=\gamma+n)$ density.
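
The conjugate result can be checked numerically. The base R sketch below simulates one hypothetical dataset (the seed, the sample size n = 100 and the prior rate gam = 0.5 are illustrative choices), normalises the unnormalised posterior from the derivation above on a grid, and compares it with the Gamma density computed by dgamma().

```r
set.seed(1)
gam <- 0.5                      # prior rate, treated as known
n <- 100                        # illustrative sample size
lambda_true <- rexp(1, rate = gam)
y <- rpois(n, lambda_true)

# Exact posterior: Ga(shape = 1 + n * ybar, rate = gam + n)
shape_post <- 1 + sum(y)
rate_post <- gam + n

# Unnormalised log posterior, exp{-(gam + n) * lambda} * lambda^(n * ybar)
log_post <- function(lambda) -(gam + n) * lambda + sum(y) * log(lambda)

# Normalise numerically on a grid and compare with the Gamma density
grid <- seq(qgamma(1e-6, shape_post, rate_post),
            qgamma(1 - 1e-6, shape_post, rate_post), length.out = 2001)
lp <- log_post(grid)
dens <- exp(lp - max(lp))
dens <- dens / sum(dens * (grid[2] - grid[1]))
max(abs(dens - dgamma(grid, shape_post, rate_post)))
```

The maximum absolute difference should be small relative to the height of the density, reflecting only the grid truncation and the simple Riemann-sum normalisation.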

This model can be reparameterised by introducing a latent Gaussian variable $u\sim\mathsf{N}(0,1)$. Using the inverse cumulative distribution function (CDF) for the exponential distribution we can rewrite the model as

\begin{align*}
\lambda(u) &= -\ln\{1-\Phi(u)\}/\gamma \\
y_i \mid u &\sim \mathsf{Po}(\lambda(u)),
\end{align*}

where $\Phi(u)$ is the CDF of a standard normal distribution. Using a log link, the non-linear predictor function is $\widetilde{\eta}(u)=\ln\lambda(u)$. For the simulation study we fix $\gamma=1/2$.
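
The transformation from $u$ to $\lambda$ can be sketched in base R as follows. The function names lambda_direct and lambda_of_u are hypothetical. The second version evaluates $\ln\{1-\Phi(u)\}$ through the log of the upper normal tail probability, which is one way to keep the calculation stable for large $u$; it is an illustration only and not the implementation used inside inlabru.

```r
gam <- 0.5

# Direct form: inverse exponential CDF applied to Phi(u)
lambda_direct <- function(u) qexp(pnorm(u), rate = gam)

# Equivalent form, -ln{1 - Phi(u)} / gam, using the log upper tail of the
# normal distribution instead of computing 1 - Phi(u) explicitly
lambda_of_u <- function(u) -pnorm(u, lower.tail = FALSE, log.p = TRUE) / gam

u <- c(-3, -1, 0, 1, 3)
cbind(u, direct = lambda_direct(u), stable = lambda_of_u(u))

# For large u, Phi(u) rounds to 1 and the direct form returns Inf,
# while the log-tail form stays finite
c(direct = lambda_direct(10), stable = lambda_of_u(10))

# Pushing N(0, 1) draws through the map gives Exp(gam) draws (mean 1 / gam)
set.seed(2)
m <- mean(lambda_of_u(rnorm(1e5)))
m
```

The two forms agree for moderate values of $u$, and the sample mean of the transformed draws should be close to $1/\gamma = 2$.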

This model can be specified in inlabru as follows:

R> # Fix value for gamma parameter
R> gam <- 0.5
R>
R> # Define an N(0,1) prior for u
R> u_prior <- list(prec = list(initial = 0, fixed = TRUE))
R> cmp <- ~ 0 + lambda(rep(1, n), model = "iid", hyper = u_prior,
+                      marginal = bru_mapper_marginal(qexp, rate = gam))
R>
R> # Define model formula
R> fml <- y ~ log(lambda)
R>
R> # Construct likelihood
R> lik <- like(
+    formula = fml,
+    family = "poisson",
+    data = ...
+  )
R>
R> # Fit the model
R> fit <- bru(cmp, lik)

Note that the objects gam ($\gamma$ in Equation (4)) and n (sample size) are stored in the global environment. The component definitions and predictor formula can reference these objects directly. Defining a parameter with a standard normal prior is done in the same way as with INLA, via the hyper option (see ?INLA::f). From inlabru version 2.10.0, the bru_mapper_marginal mapper class can be added to the component mapper pipeline directly, via the marginal argument. The supplementary code includes comments about how to apply the transformation manually as well, using the internal helper function bru_forward_transformation, which transforms standard normal variables to given distributions using numerically stable methods.

Figure 2 shows close agreement between the approximate posterior density estimated by inlabru and the exact posterior. The approximate posterior density was calculated by Gaussian kernel smoothing of 20,000 Monte Carlo samples from the posterior for $\lambda$.

Figure 2: The approximate posterior density for $\lambda$ (dashed blue curve) compared to the true posterior density (solid red curve) for a single simulation. The approximate posterior is a kernel smoothed representation of 20,000 posterior Monte Carlo samples for $\lambda$, with the shaded region showing 99% pointwise bootstrap confidence intervals for the kernel density estimator. The true data-generating $\lambda$ value is marked with a solid black vertical line.

To evaluate the ability of the approximate iterative INLA method to accurately estimate the posterior across many simulated datasets, we use an approach adapted from Talts et al. (2020). The basic idea is to repeatedly sample data from the prior predictive model, approximate the posterior for each sample and then calculate the posterior CDF value of each quantity of interest. If the inference procedure accurately estimates the posterior then these CDF values will have a uniform distribution. See Supplementary Material Section C for more details.
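
The logic of this check can be sketched in base R. To keep the example self-contained, the exact Gamma posterior derived earlier in this section stands in for the approximate posterior, so the resulting CDF values should be uniform by construction; in the actual study the posterior samples come from the fitted inlabru model. The seed and the number of posterior samples n_mc are illustrative choices.

```r
set.seed(3)
gam <- 0.5
n <- 100        # observations per simulated dataset
n_sim <- 500    # number of simulated datasets
n_mc <- 1000    # posterior samples per dataset (illustrative)

cdf_values <- replicate(n_sim, {
  # 1. Sample the parameter from the prior and data from the likelihood
  lambda_true <- rexp(1, rate = gam)
  y <- rpois(n, lambda_true)
  # 2. Sample from the posterior (here the exact conjugate posterior,
  #    standing in for samples from an approximate posterior)
  lambda_post <- rgamma(n_mc, shape = 1 + sum(y), rate = gam + n)
  # 3. Empirical posterior CDF evaluated at the true value
  mean(lambda_post <= lambda_true)
})

# 4. Kolmogorov-Smirnov distance between the CDF values and Unif(0, 1)
x <- sort(cdf_values)
i <- seq_along(x)
ks_stat <- max(pmax(i / n_sim - x, x - (i - 1) / n_sim))
ks_stat

# Counts in ten equal-width bins; roughly equal counts indicate uniformity
table(cut(cdf_values, breaks = seq(0, 1, by = 0.1), include.lowest = TRUE))
```

Replacing step 2 with samples from an approximate posterior turns this into a calibration check of that approximation: systematic departures from uniformity point to bias or to over- or under-dispersion of the approximate posterior.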

For all simulations we set $\gamma=0.5$ and sampled 100 values from the prior predictive distribution for $y$. We ran 500 simulations and for each we computed the empirical CDF (ECDF) value for $\lambda(u)$ based on 5,000 Monte Carlo samples from the approximate posterior.

Figure 3: Results from the simulation based calibration checking for uniformity. Both plots are based on the ECDF values for $\lambda(u)$. A: Scaled ECDF values with the Kolmogorov-Smirnov test statistic; B: Histogram of ECDF values.

This shows that the approximate non-linear method is close to estimating the true posterior. These figures show only a relatively small deviation from uniformity. With such an approximate method based on a first-order Taylor expansion we should generally expect some deviation. However, overall the approximation is fairly good for this model.

5 Examples

We present three examples analysing a simulated dataset of counts in areal units. Areal units are a common data structure in epidemiology and public health due to data commonly reported within administrative geographic units. For the examples we use the intermediate zones (The Scottish Government, 2021) in the city of Glasgow, Scotland, which are freely available under the Open Government Licence (The National Archives, 2024). These are areal units created for reporting data at medium sized geographic units, typically with 2,500-6,000 residents.

We simulated count data within each intermediate zone by assuming a dependence on a latent continuously indexed random field. We generated a single realisation from a Gaussian random field (GRF) with Matérn covariance which we denote $\xi(\boldsymbol{s})$. This was then aggregated up to the areal unit level so that the count data in each areal unit $i$ were simulated from a Poisson distribution with rate $\lambda_i=\int_{\Omega_i}\exp(\beta+\xi(\boldsymbol{s}))\,\mathrm{d}\boldsymbol{s}$, where $\beta$ is an intercept parameter and $\Omega_i$ is the region of space covered by areal unit $i$. In all three examples the data likelihood at the area level is Poisson with a log link function. The three examples differ in the approach to modelling the log-rate for the Poisson variables. We present 1) a Besag-York-Mollié (BYM) (Besag et al., 1991) example, 2) an example aggregating a continuously indexed random field to areal units, and 3) a joint likelihood model where the latent random field is observed at a set of point locations and aggregated to areal units.

5.1 The Besag-York-Mollié model for areal data

The Besag-York-Mollié model is an example of an intrinsic random field defined on a graph that represents the neighbourhood structure of the areal units. The BYM component consists of the sum of a spatially structured intrinsic conditional autoregressive (ICAR) field $\boldsymbol{u}$ and a spatially unstructured Gaussian field $\boldsymbol{v}$. That is, the log rate is $\ln\lambda_i=\beta+w_i$, where $w_i=u_i+v_i$.

The data object glasgow is an sf polygons object with the response variable named count and a variable named ID that is an integer identifier for each areal unit. The model components and formula are

R> cmp <- ~ 0 + beta(1) + w(ID,
+    model = "bym",
+    graph = g
+  )
R>
R> fml <- count ~ beta + w

where g is the graph defining the neighbourhood structure. Note that we have chosen the component names beta and w to directly match the mathematical notation above. To match the formula conventions of other packages, inlabru will include an intercept by default unless +0 or -1 is included in the components definition, or there is a component name that begins with the string intercept or Intercept.

To fit and predict from the model:

R> lik <- like(
+    formula = fml,
+    data = glasgow,
+    family = "poisson"
+  )
R>
R> fit <- bru(
+    components = cmp,
+    lik
+  )
R>
R> predict(
+    fit,
+    newdata = glasgow,
+    formula = ~ exp(beta + w),
+    n.samples = 500
+  )

Figure 4 shows the mean posterior rate returned by the predict() call alongside the true rate used to simulate the data.

Figure 4: A: The BYM model posterior rate per area unit; B: The observed count per area unit.

5.2 Aggregating a continuously indexed random field to areal units

This example demonstrates that the extra flexibility provided by the iterated INLA method allows for models to be fitted that are not possible to fit using standard INLA.

In this example we specify a continuously indexed latent random field using the SPDE approach. We use the term continuously indexed to distinguish from the discrete area models like the BYM model. In this case the index is a spatial location $\boldsymbol{s}$ which can be any location in a continuous spatial domain.

This model structure is useful for applications where the response variable is available at a different spatial scale to the assumed causes. For example, one might expect counts of asthma cases in an areal unit to depend on air pollution, which varies continuously in space and does not recognise areal unit boundaries.

The log-rate is approximated using a numerical integration scheme

\begin{align*}
\ln\lambda_i &= \ln\left(\int_{\Omega_i}\exp(\beta+\xi(\boldsymbol{s}))\,\mathrm{d}\boldsymbol{s}\right) \\
&\approx \ln\left[\sum_j w_{ij}\exp(\beta+\xi(\boldsymbol{s}_{ij}))\right]
\end{align*}

with weights $w_{ij}$ defined using projected integration weights to mesh nodes. In order to implement this, one could store these weights in a matrix $\boldsymbol{W}$ such that $\boldsymbol{\lambda}\approx\boldsymbol{W}\exp(\beta+\boldsymbol{\xi})$, using a predictor expression written as log(W %*% exp(Intercept + field)). However, this procedure is already encapsulated in the bru_mapper_logsumexp mapper, so that the $\boldsymbol{W}$ matrix is constructed automatically and does not need to be visible in the user code. In addition, the mapper uses an internal weight shift to avoid numerical over/underflow when summing the terms. The code to construct the integration scheme can be found in Supplementary Material Section E.
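
The block-wise weighted log-sum-exp calculation can be illustrated in base R. The helper block_logsumexp below is hypothetical and only mimics the idea of the calculation, using a shift by the block-wise maximum as one common way to avoid overflow; it is not the bru_mapper_logsumexp implementation, and the toy weights, block indices and state values are made up.

```r
# Hypothetical helper: log(sum_j w_ij * exp(state_ij)) for each block i,
# shifting by the block-wise maximum to avoid overflow or underflow.
block_logsumexp <- function(state, weights, block, n_block = max(block)) {
  vapply(seq_len(n_block), function(i) {
    idx <- which(block == i)
    shift <- max(state[idx])
    shift + log(sum(weights[idx] * exp(state[idx] - shift)))
  }, numeric(1))
}

# Toy integration scheme: 5 integration points in 2 blocks (areal units)
weights <- c(0.2, 0.3, 0.5, 1.0, 1.0)
block <- c(1, 1, 1, 2, 2)
state <- c(0.1, -0.4, 0.3, 1.2, 0.8)   # beta + xi at the integration points

log_lambda <- block_logsumexp(state, weights, block)

# Agrees with the naive matrix form log(W %*% exp(state)) for moderate values
W <- matrix(0, nrow = 2, ncol = 5)
W[cbind(block, seq_along(block))] <- weights
naive <- as.vector(log(W %*% exp(state)))
cbind(shifted = log_lambda, naive = naive)

# For large states exp() overflows in the naive form but not the shifted one
big <- state + 800
naive_big <- log(sum(weights[1:3] * exp(big[1:3])))
shifted_big <- block_logsumexp(big, weights, block)[1]
c(naive = naive_big, shifted = shifted_big)
```

The shift does not change the result mathematically, since the subtracted maximum is added back outside the logarithm; it only keeps the intermediate exponentials within floating point range.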

Given this integration scheme, the code to specify and fit the model is

R> # Define Matern/SPDE model
R> matern <- inla.spde2.pcmatern(
+    mesh,
+    prior.range = c(2, 0.1),
+    prior.sigma = c(3, 0.1)
+  )
R>
R> # Define the intermediate zone integration scheme
R> ips <- fm_int(mesh, samplers = glasgow)
R> agg <- bru_mapper_logsumexp(
+    rescale = FALSE,
+    n_block = nrow(glasgow)
+  )
R>
R> # Define model components and formula
R> cmp <- ~ 0 + beta(1) + xi(main = geometry, model = matern)
R> fml <- count ~ ibm_eval(
+    agg,
+    input = list(weights = weight, block = .block),
+    state = beta + xi
+  )
R>
R> # Construct the likelihood
R> lik <- like(
+    formula = fml,
+    family = "poisson",
+    data = ips,
+    response_data = glasgow
+  )
R>
R> # Fit the model
R> fit <- bru(
+    components = cmp,
+    lik
+  )

In this example we make use of the response_data argument of the like() function, which allows users to specify response data of different length to what would usually be expected. This feature of inlabru allows covariates and latent parameters to be on a different scale to the observed data. The non-linear predictor function provides the link between these different scales. In this case the link is the numerical integration of the latent field within each intermediate zone where ibm_eval(agg, ...) results in a vector of length $n$ that matches the response data.

The xi component is evaluated at the locations provided in ips, which define the integration scheme. Note that this is an sf object in this example, and the xi component was defined using geometry as input. If geometry is not a named variable then the software assumes it is a general R expression to be applied to the data. In this case geometry is the geometry information column of the sf data object, which is supported by the bru_mapper() method for the matern model object, that constructs the model matrix for the component. For old SpatialPointsDataFrame objects, one would instead use xi(coordinates, ...), where coordinates is the sp function that extracts a 2-column matrix of coordinates from the data object. However, this would drop all coordinate system projection information, which is retained in the sf object, and so we recommend using sf objects where possible to avoid associating data with incorrect model locations.

The mapper bru_mapper_logsumexp here defines the link from the state vectors to the effects vector. In other words, it computes the logarithms of the weighted sums of the element-wise exponentials of the state vectors beta and xi, taking into account the input vectors weight and .block. The .block input gives the block index vector from the fm_int function. By default, the rescale = FALSE argument gives the plain weighted sums. With rescale = TRUE, the sum is rescaled by the sum of the weights within each block, as

\[
\ln\left[\frac{\sum_j w_{ij}\exp\big(\beta+\xi(\boldsymbol{s}_{ij})\big)}{\sum_j w_{ij}}\right].
\]

The n_block argument specifies a predetermined number of output blocks. For more information, please see the theory and technical documentation on the bru_mapper system (Lindgren, 2022).

5.2.1 Non-linear convergence assessment

Note that because the predictor is non-linear in the parameters, this model is fitted using the iterated INLA method. This non-linearity is assumed unless the user specifies the formula as y ~ . (see Section 1.1, for example), or provides no formula at all.

For assessing the convergence of the linearisation, we can call the function

R> bru_convergence_plot(fit)

Figure 5: Figure for assessing the convergence of iterative INLA for Example 5.2. This figure is explained in detail in the main text.

The resulting Figure 5 shows the convergence of the linearisation across iterations, through four panels (from top to bottom, left to right),

  • Tracks: The convergence of the modes of the covariates and hyperparameters.

  • Mode - Lin: The difference between the mode and the linearisation.

  • |Change|/sd (Max and Mean): The absolute changes over the standard deviations (their maxima and means for vector components) for the mode and the linearisation.

  • Change & SD: The changes and standard deviations of the covariates and hyperparameters.

In this case the convergence is virtually immediate, with the second iteration already being close enough to the first iteration to trigger the end of iterations. The third and final step only carries out the posterior integration step of the INLA method.
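
The idea of iterating the linearisation until the scaled change is small can be sketched with a toy example in base R. The model below (a single latent variable with a Gaussian observation model and predictor exp(u)), the tolerance tol and the variable names are all illustrative assumptions. The sketch omits hyperparameters, the INLA fit for each linearised model and any step-size control, so it should not be read as the inlabru algorithm itself.

```r
set.seed(5)
# Toy model (illustrative): u ~ N(0, 1), y_i | u ~ N(eta(u), s^2), eta(u) = exp(u)
eta <- function(u) exp(u)
s <- 0.5
y <- rnorm(20, mean = eta(0.5), sd = s)

u0 <- 0          # initial linearisation point
max_iter <- 10   # cap on iterations, in the spirit of bru_max_iter
tol <- 0.01      # stop when |change| / sd falls below this (illustrative)
converged <- FALSE

for (k in seq_len(max_iter)) {
  # Numerical derivative and first-order expansion eta(u) ~ offset + B * u
  h <- 1e-5
  B <- (eta(u0 + h) - eta(u0 - h)) / (2 * h)
  offset <- eta(u0) - B * u0
  # Gaussian posterior of u for the linearised model
  prec <- 1 + length(y) * B^2 / s^2
  u_new <- B * sum(y - offset) / s^2 / prec
  # Change between successive linearisation points, scaled by the posterior sd
  change <- abs(u_new - u0) * sqrt(prec)
  cat(sprintf("iter %d: mode = %.4f, |change|/sd = %.4f\n", k, u_new, change))
  u0 <- u_new
  if (change < tol) {
    converged <- TRUE
    break
  }
}
```

Each pass re-linearises the predictor at the latest estimate, solves the resulting linear Gaussian problem, and compares the update with the posterior standard deviation, which is the same kind of scaled quantity shown in the |Change|/sd panel.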

5.2.2 Predictions

Predictions can be generated at multiple scales by modifying the formula used in the predict() call. For predictions at the areal unit level we again make use of the integration scheme in the predict formula:

R> # areal unit predictions using logsumexp mapper
R> fml_latent <- ~ ibm_eval(
+    agg,
+    input = list(weights = weight, block = .block),
+    state = beta + xi,
+    log = FALSE
+  )
R> predict(
+    fit,
+    newdata = ips,
+    formula = fml_latent
+  )

In addition to the component effects, the prediction method also provides direct access to the latent parameters: appending _latent to a component name, as in <componentname>_latent, accesses the latent parameters of that component, with no need to pass a newdata argument to predict(). Similarly, <componentname>_eval(<values>) can be used to directly evaluate a component effect at given values of the inputs.

For predictions at a higher-resolution, we pass an sf object defining the locations at which we wish to make predictions. For example,

R> # pixel level predictions
R> pts <- fm_pixels(mesh, mask = bnd)
R> pred <- predict(
+    fit,
+    newdata = pts,
+    formula = ~ exp(beta + xi)
+  )
R> pts$lambda_mean <- pred$mean

uses the fm_pixels() function that generates an sf object that can be used as prediction locations. The mask argument is optional and can be used to restrict the prediction locations to a specific region. The predict() function will automatically detect the components involved in the predictor expressions and generate predictions at the appropriate locations. This ability to use general R expressions in component definitions allows the software to be easily extended to new data structures.

The posterior means for both the areal unit rates and the pixel level predictions are shown in Figure 6, alongside the true values used to simulate the data.

Figure 6: Summary of the fitted model from the aggregation Example 5.2. A: the posterior expected count at the area unit level; B: the true expected count at the area unit level; C: the posterior mean of the intensity field; D: the true intensity surface.

5.3 A joint model aggregating a continuously indexed random field to areal units with the field observed at point locations

This example uses the same aggregation scheme as above but also assumes that we have additional data observing the latent random field at a set of point locations. This data structure can be common where information on relevant covariates is only available at a finite set of locations in the region of interest, such as measures of air pollution at air pollution monitoring stations, for example. In addition to the count data at the areal unit level, we have data $z_k$ for $k=1,\ldots,K$, and we assume the following model:

\begin{align*}
z_k &= \alpha_0+\xi(\boldsymbol{s}_k)+\epsilon_k \\
\ln\lambda(\boldsymbol{s}) &= \beta_0+\beta_1\xi(\boldsymbol{s}) \\
\lambda_i &\approx \sum_j w_{ij}\lambda(\boldsymbol{s}_{ij}),
\end{align*}

where $\alpha_0$ is an intercept parameter which could account for systematic measurement or calibration error involved in measuring the random field $\xi(\boldsymbol{s})$, and the $w_{ij}$ are defined as above. We do not include $\alpha_0$ in the second line, to avoid identifiability issues. The likelihood for $\boldsymbol{z}$ is Gaussian, which incorporates the iid measurement error $\epsilon_k$. In this example we view the random field as though it were a fixed effect covariate that we have only observed at a finite set of locations. This partially observed covariate, which we model using the SPDE effect, affects the continuously indexed log-rate linearly through the parameter $\beta_1$.
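The aggregation of the intensity to areal units can be illustrated with a small base R sketch. It assumes that the rate for each areal unit is the weighted sum of the intensity at that unit's integration points, and evaluates this on the log scale with a log-sum-exp computation. The numbers and names below are illustrative, and the code is not inlabru's implementation.

```r
# Minimal base R sketch of block-wise aggregation on the log scale.
# All values are illustrative; this is not inlabru's implementation.

# Log-intensity ln lambda(s_ij) at five integration points
log_lambda <- c(0.2, -0.1, 0.4, 1.0, 0.7)
# Integration weights w_ij and the areal unit (block) of each point
weight <- c(0.5, 0.3, 0.2, 0.6, 0.4)
block <- c(1, 1, 1, 2, 2)

# ln lambda_i = ln sum_j exp(ln w_ij + ln lambda(s_ij)), computed stably
# by subtracting the maximum before exponentiating
log_sum_exp <- function(x) {
  m <- max(x)
  m + log(sum(exp(x - m)))
}
log_lambda_area <- tapply(log(weight) + log_lambda, block, log_sum_exp)

# Direct evaluation of lambda_i = sum_j w_ij * lambda(s_ij) for comparison
lambda_area <- tapply(weight * exp(log_lambda), block, sum)
```

Working on the log scale keeps the aggregated quantity directly comparable with a log-link Poisson predictor, and avoids overflow when the log-intensity is large.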

The code to specify and fit the model is

R> # Define model components
R> cmp <- ~ 0 + xi(main = geometry, model = matern) +
+    alpha_0(1) +
+    beta_0(1) +
+    beta_1(1)
R>
R> # Gaussian likelihood for z observations
R> z_fml <- z ~ alpha_0 + xi
R> z_lik <- like(
+    "Gaussian",
+    formula = z_fml,
+    data = z_data
+  )
R>
R> # Poisson likelihood for count observations
R> count_fml <- count ~ ibm_eval(
+    agg,
+    input = list(weights = weight, block = .block),
+    state = beta_0 + beta_1 * xi
+  )
R> count_lik <- like(
+    "Poisson",
+    formula = count_fml,
+    data = ips,
+    response_data = glasgow
+  )
R>
R> # Fit joint likelihood model
R> fit <- bru(
+    components = cmp,
+    z_lik,
+    count_lik,
+    options = list(
+      bru_max_iter = 10,
+      bru_verbose = 2
+  #   , bru_initial = list(
+  #       beta_1 = 1
+  #     )
+    )
+  )

where z_data is an sf object with a column named z. Predictions can be generated from the model in a similar way to the previous examples. The methods will normally auto-detect which effects are involved in the predictor expressions but, if needed, the default auto-detection can be overridden using the used argument. The bru_initial option, commented out above, can optionally be included to set initial values for the latent variables, which can speed up convergence of the method; see Figure 7. Here, the initial value beta_1 = 1 would reflect our belief that the random field $\xi(\boldsymbol{s})$ has a positive effect on the intensity $\lambda(\boldsymbol{s})$. The bru_verbose option controls the amount of information about the iterative algorithm that is printed by bru(). The verbose information for this model is included in Supplementary Material Section G.

As seen in the convergence diagnostic plot in Figure 7, the iterated linearisation method converges in 6 steps, with the final 7th step only computing the posterior integration step of the INLA method.

Figure 7: Convergence diagnostic plot for the joint model Example 5.3.

The posterior means of $\lambda(\boldsymbol{s})$ and $\xi(\boldsymbol{s})$ are shown in Figure 8, alongside the truth and the latent field observations.

Figure 8: Summary of the fitted model for the aggregation joint likelihood model in Example 5.3. A: the posterior mean of the intensity field; B: the true intensity surface; C: the posterior mean of the covariate field; D: the observations of the covariate field.

6 Discussion

6.1 Other methods for extending INLA

The iterative INLA approach extends the class of models that can be fitted using INLA. Other research has pursued the same goal by embedding INLA within other inference frameworks, such as Metropolis-Hastings Markov chain Monte Carlo (MH-MCMC) (Gómez-Rubio and Rue, 2018) and importance sampling (Berild et al., 2022). These approaches allow users to fit conditional latent Gaussian models, where the model can, conditional on certain parameters, be fitted using standard INLA.

In the case of MH-MCMC, this comes at considerable extra computational cost. Typical chains can require fitting a model using INLA tens or even hundreds of thousands of times, which is prohibitively expensive for all but the simplest of models. This computational cost has hindered embedding INLA within MCMC approaches. By contrast, in our experience so far, iterative INLA generally requires somewhere between a couple and twenty iterations to converge, with simpler models often converging in 3 or 4 steps.

The class of models that can be fitted using inlabru is smaller than the class of conditional LGMs because only the latent predictor expression is allowed to be non-linear. In addition, the data likelihood must still be one of those currently implemented in INLA. In contrast, the MH-MCMC approach is, in theory at least, not restricted in this sense.

An alternative to MH-MCMC is to use importance sampling (IS) to marginalise over the conditioning parameters for the conditional LGM. Let $\theta_c$ denote the parameters conditional on which the other model parameters, $\theta_{-c}$, can be estimated using INLA. Then the unconditional marginal posterior distributions for elements of $\theta_{-c}$ can be obtained by

\[
\pi(\theta_{-c,j} \mid \boldsymbol{y}) = \int \pi(\theta_{-c,j} \mid \boldsymbol{y}, \theta_c) \, \pi(\theta_c \mid \boldsymbol{y}) \,\mathrm{d}\theta_c .
\]

Berild et al. (2022) estimate this integral using IS. The importance sampling weights can also be used to estimate the joint posterior $\pi(\theta_c \mid \boldsymbol{y})$. The performance of IS is sensitive to the choice of proposal distribution: the more the proposal differs from $\pi(\theta_c \mid \boldsymbol{y})$, the more samples are required. To reduce the risk of a poor choice, Berild et al. (2022) suggest using an adaptive IS algorithm. This approach appears more promising than MH-MCMC, with examples showing good convergence to the true posterior in under 30 adaptations of the proposal distribution. However, the examples in their paper are simpler than those presented here.
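The marginalisation above can be sketched in base R for a toy model in which the conditional fit is available in closed form. This is only an illustration of self-normalised importance sampling, not the algorithm or software of Berild et al. (2022): the model, the proposal, and all names are assumptions made for the example, and the analytic conditional posterior stands in for a conditional INLA fit.

```r
# Toy base R sketch of importance sampling over a conditioning parameter.
# Model (illustrative): y_i | x, sigma ~ N(x, sigma^2), x ~ N(0, 1),
# log(sigma) ~ N(0, 1). Given sigma, the posterior of x is Gaussian and
# stands in for a conditional INLA fit.
set.seed(2)
y <- rnorm(20, mean = 1.5, sd = 0.8)
n <- length(y)

# Conditional posterior mean of x given sigma
cond_mean <- function(sigma) (sum(y) / sigma^2) / (1 + n / sigma^2)

# Log marginal likelihood of y given sigma: y ~ N(0, sigma^2 I + 1 1^T)
log_mlik <- function(sigma) {
  s2 <- sigma^2
  quad <- sum(y^2) / s2 - sum(y)^2 / (s2 * (s2 + n))
  -0.5 * (n * log(2 * pi) + (n - 1) * log(s2) + log(s2 + n) + quad)
}

# Draw log(sigma) from a proposal g = N(0, 1.5^2)
n_draws <- 2000
log_sigma <- rnorm(n_draws, mean = 0, sd = 1.5)
sigma <- exp(log_sigma)

# Self-normalised weights: marginal likelihood * prior / proposal
log_w <- vapply(sigma, log_mlik, numeric(1)) +
  dnorm(log_sigma, 0, 1, log = TRUE) -
  dnorm(log_sigma, 0, 1.5, log = TRUE)
w <- exp(log_w - max(log_w))
w <- w / sum(w)

# Unconditional posterior mean of x as a weighted average of conditional
# means; a low effective sample size flags a poorly matched proposal
post_mean_x <- sum(w * vapply(sigma, cond_mean, numeric(1)))
ess <- 1 / sum(w^2)
```

In a realistic setting each evaluation of the conditional posterior and marginal likelihood would require a full INLA fit, which is why the number of draws and the quality of the proposal dominate the cost.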

Stringer et al. (2023) present a method inspired by INLA for efficient fitting of what they term extended LGMs. They propose an alternative approach to inference that, unlike the original INLA implementation, avoids including the predictor as part of the latent field, and use this feature to do aggregation similar to the spatial examples in this paper. Key to their approach is flexibility in specifying the model likelihood. Whereas inlabru relies on the likelihoods implemented in the core INLA C code, their software can work with a general C++ template of the likelihood, which gives users flexibility to describe non-linear relationships. Since the new variational Bayes method in INLA similarly does not include the predictor in the latent field, this user-plugin feature is the main practical benefit of the alternative approach.

6.2 Examples of inlabru in action

Several papers have been published that use the non-linear predictor feature of inlabru. Arce Guillen et al. (2023) use inlabru to analyse telemetry data from GPS tags on wild animals. They use the non-linear predictor feature to model animal movement kernels and formulate the likelihood as a Cox process (which is not implemented in INLA, but is in inlabru, see ?lgcp) to include spatially correlated random effects for the first time in this class of model.

Serafini et al. (2023) use inlabru to fit the self-exciting Hawkes point process. In these models the excitation function is non-linear and so cannot be fitted using INLA. This work has been implemented in the ETAS.inlabru package (Naylor et al., 2023) for modelling seismic activity using point process models.

Martino et al. (2021) use inlabru to combine various opportunistic data sources to model the abundance and spatial distribution of dolphins. They use the non-linear predictor to model the decreasing likelihood of observing dolphins further away from tourist hotspots or vessels through a half-normal detection function.

All of the above examples are models with spatial or temporal dependence where researchers would like to capitalise on the computational efficiency offered by the INLA method. However, due to the nature of the analyses, these models cannot be fitted using the standard INLA package implementation. The iterative INLA approach extends the capability of researchers to address their research questions.

6.3 Software that uses inlabru

In addition to ETAS.inlabru, already mentioned above, there are several pieces of software that depend on inlabru. The dirinla package (Joaquín Martínez-Minaya and Finn Lindgren, 2022) implements Dirichlet regression for analysing compositional data using INLA. The rSPDE package (Bolin and Simas, 2023; Bolin et al., 2024) is software for computing rational approximations of fractional SPDEs, which allows for inference on the fractional parameter in INLA SPDE models (Lindgren et al., 2011). The MetricGraph package (Bolin et al., 2024, 2023) is software for fitting models with random fields defined on graphs. INLAspacetime (Krainski et al., 2023) supports spatio-temporal modelling using the cgeneric interface for defining new model components and supports non-separable space-time models (Lindgren et al., 2023). The fdmr package (Jones et al., 2023) supports interactive spatio-temporal modelling and includes features for defining spatially explicit model components, choosing priors, and evaluating model outputs.

6.4 Future improvements

There are several avenues for improving inlabru. The implementation of the KL-divergence result (Equations (2) and (3)) is feasible but requires further development time. The like() function could be improved by adding modularity and extracting certain model building steps into separate functions. This should allow for easier extensions to complex user-defined observation models, such as specific aggregation models, as well as support for add-on packages. Finally, although the interoperability with INLA is considerable, new features introduced to INLA in a small number of cases require special support to be implemented in inlabru. Examples of this are observation models where special data formats are required, such as inla.mdata and inla.surv. From INLA version 24.06.27 and inlabru development version 2.10.1.9011, inlabru can be used with both inla.mdata and inla.surv objects, so that they will be fully supported when version 2.11.0 is released.

The results in the paper were generated with INLA 24.06.27, inlabru 2.10.1.9012, and fmesher 0.1.6.9003.

References

  • Arce Guillen et al. (2023) Arce Guillen, R., F. Lindgren, S. Muff, T. W. Glass, G. A. Breed, and U. E. Schlägel (2023). Accounting for unobserved spatial variation in step selection analyses of animal movement via spatial random effects. Methods in Ecology and Evolution 14(10), 2639–2653.
  • Bachl et al. (2019) Bachl, F. E., F. Lindgren, D. L. Borchers, and J. B. Illian (2019). inlabru: an R package for Bayesian spatial modelling from ecological survey data. Methods in Ecology and Evolution 10(6), 760–766.
  • Bakka et al. (2018) Bakka, H., H. Rue, G.-A. Fuglstad, A. Riebler, D. Bolin, J. Illian, E. Krainski, D. Simpson, and F. Lindgren (2018). Spatial modeling with R-INLA: A review. Wiley Interdisciplinary Reviews: Computational Statistics 10(6).
  • Bates et al. (2015) Bates, D., M. Mächler, B. Bolker, and S. Walker (2015). Fitting linear mixed-effects models using lme4. Journal of Statistical Software 67(1), 1–48.
  • Berild et al. (2022) Berild, M. O., S. Martino, V. Gómez-Rubio, and H. Rue (2022, October). Importance Sampling with the Integrated Nested Laplace Approximation. Journal of Computational and Graphical Statistics 31(4), 1225–1237.
  • Besag et al. (1991) Besag, J., J. York, and A. Mollié (1991). Bayesian image restoration, with two applications in spatial statistics. Annals of the Institute of Statistical Mathematics 43(1), 1–20.
  • Bivand et al. (2014) Bivand, R. S., V. Gómez-Rubio, and H. Rue (2014, August). Approximate Bayesian inference for spatial econometrics models. Spatial Statistics 9, 146–165.
  • Blangiardo et al. (2013) Blangiardo, M., M. Cameletti, G. Baio, and H. Rue (2013, March). Spatial and spatio-temporal models with R-INLA. Spatial and Spatio-temporal Epidemiology 4, 33–49.
  • Bolin and Simas (2023) Bolin, D. and A. B. Simas (2023). rSPDE: Rational Approximations of Fractional Stochastic Partial Differential Equations. R package version 2.3.3.
  • Bolin et al. (2023) Bolin, D., A. B. Simas, and J. Wallin (2023). MetricGraph: Random fields on metric graphs. R package version 1.3.0.9000.
  • Bolin et al. (2024) Bolin, D., A. B. Simas, and J. Wallin (2024). Gaussian Whittle-Matérn fields on metric graphs. Bernoulli 30(2), 1611–1639.
  • Bolin et al. (2024) Bolin, D., A. B. Simas, and Z. Xiong (2024, January). Covariance–Based Rational Approximations of Fractional SPDEs for Computationally Efficient Bayesian Inference. Journal of Computational and Graphical Statistics 33(1), 64–74.
  • Bowman and Azzalini (1997) Bowman, A. W. and A. Azzalini (1997). Applied smoothing techniques for data analysis: the kernel approach with S-Plus illustrations, Volume 18. OUP Oxford.
  • Brooks et al. (2017) Brooks, M. E., K. Kristensen, K. J. van Benthem, A. Magnusson, C. W. Berg, A. Nielsen, H. J. Skaug, M. Mächler, and B. M. Bolker (2017). glmmTMB Balances Speed and Flexibility Among Packages for Zero-inflated Generalized Linear Mixed Modeling. The R Journal 9(2), 378–400.
  • Buckland et al. (2015) Buckland, S. T., E. A. Rexstad, T. A. Marques, and C. S. Oedekoven (2015). Distance Sampling: Methods and Applications. Methods in Statistical Ecology. Springer International Publishing.
  • Chiuchiolo et al. (2023) Chiuchiolo, C., J. van Niekerk, and H. Rue (2023, March). Joint posterior inference for latent Gaussian models with R-INLA. Journal of Statistical Computation and Simulation 93(5), 723–752.
  • Feng et al. (2022) Feng, Q., J. G. Shanthikumar, and M. Xue (2022). Consumer Choice Models and Estimation: A Review and Extension. Production and Operations Management 31(2), 847–867.
  • Gasparrini (2014) Gasparrini, A. (2014). Modeling exposure–lag–response associations with distributed lag non-linear models. Statistics in Medicine 33(5), 881–899.
  • Gómez-Rubio and Rue (2018) Gómez-Rubio, V. and H. Rue (2018). Markov chain Monte Carlo with the integrated nested Laplace approximation. Statistics and Computing 28(5), 1033–1051.
  • Hawkes (1971) Hawkes, A. (1971, April). Spectra of some self-exciting and mutually exciting point processes. Biometrika 58(1), 83–90.
  • Hernangómez (2023) Hernangómez, D. (2023). Using the tidyverse with terra objects: the tidyterra package. Journal of Open Source Software 8(91), 5751.
  • Hijmans (2023) Hijmans, R. J. (2023). raster: Geographic Data Analysis and Modeling. R package version 3.6-26.
  • Hijmans (2024) Hijmans, R. J. (2024). terra: Spatial Data Analysis. R package version 1.7-78.
  • Illian et al. (2013) Illian, J. B., S. Martino, S. H. Sørbye, J. B. Gallego-Fernández, M. Zunzunegui, M. P. Esquivias, and J. M. J. Travis (2013). Fitting complex ecological point process models with integrated nested Laplace approximation. Methods in Ecology and Evolution 4(4), 305–315.
  • Joaquín Martínez-Minaya and Finn Lindgren (2022) Joaquín Martínez-Minaya and Finn Lindgren (2022). dirinla: Hierarchical Bayesian Dirichlet regression models using Integrated Nested Laplace Approximation. R package version 1.0.5.9000.
  • Jones et al. (2023) Jones, G., X. Yin, J. Aiken, and J. Bamber (2023). fdmr: 4D Modeller project. R package version 0.2.0.
  • Krainski et al. (2018) Krainski, E. T., V. Gómez-Rubio, H. Bakka, A. Lenzi, D. Castro-Camilo, D. Simpson, F. Lindgren, and H. Rue (2018, December). Advanced Spatial Modeling with Stochastic Partial Differential Equations Using R and INLA. Chapman and Hall/CRC.
  • Krainski et al. (2023) Krainski, E. T., F. Lindgren, and H. Rue (2023). INLAspacetime: Spatial and Spatio-Temporal Models using ’INLA’. R package version 0.1.7.
  • Levis et al. (2021) Levis, A., D. Lee, J. A. Tropp, C. F. Gammie, and K. L. Bouman (2021, October). Inference of Black Hole Fluid-Dynamics from Sparse Interferometric Measurements. In 2021 IEEE/CVF International Conference on Computer Vision (ICCV), pp.  2320–2329.
  • Lindgren (2022) Lindgren, F. (2022). Devel: Customised model components with the bru_mapper system.
  • Lindgren (2024) Lindgren, F. (2024). fmesher: Triangle Meshes and Related Geometry Tools. R package version 0.1.6.9003.
  • Lindgren et al. (2023) Lindgren, F., H. Bakka, D. Bolin, E. Krainski, and H. Rue (2023, April). A diffusion-based spatio-temporal extension of Gaussian Matérn fields.
  • Lindgren et al. (2022) Lindgren, F., D. Bolin, and H. Rue (2022, August). The SPDE approach for Gaussian and non-Gaussian fields: 10 years and still running. Spatial Statistics 50, 100599.
  • Lindgren and Rue (2015) Lindgren, F. and H. Rue (2015, February). Bayesian Spatial Modelling with R-INLA. Journal of Statistical Software 63, 1–25.
  • Lindgren et al. (2011) Lindgren, F., H. Rue, and J. Lindström (2011). An explicit link between Gaussian fields and Gaussian Markov random fields: The stochastic partial differential equation approach. Journal of the Royal Statistical Society: Series B (Statistical Methodology) 73(4), 423–498.
  • Martino et al. (2021) Martino, S., D. S. Pace, S. Moro, E. Casoli, D. Ventura, A. Frachea, M. Silvestri, A. Arcangeli, G. Giacomini, G. Ardizzone, and G. Jona Lasinio (2021). Integration of presence-only data from several sources: A case study on dolphins’ spatial distribution. Ecography 44(10), 1533–1543.
  • Martins et al. (2013) Martins, T. G., D. Simpson, F. Lindgren, and H. Rue (2013). Bayesian computing with INLA: New features. Computational Statistics & Data Analysis 67, 68–83.
  • Matthiopoulos et al. (2020) Matthiopoulos, J., J. Fieberg, and G. Aarts (2020, December). Species-Habitat Associations: Spatial Data, Predictive Models, and Ecological Insights. University of Minnesota Libraries Publishing.
  • Millar and Fryer (1999) Millar, R. B. and R. J. Fryer (1999, March). Estimating the size-selection curves of towed gears, traps, nets and hooks. Reviews in Fish Biology and Fisheries 9(1), 89–116.
  • Modrák et al. (2023) Modrák, M., A. H. Moon, S. Kim, P. Bürkner, N. Huurre, K. Faltejsková, A. Gelman, and A. Vehtari (2023, January). Simulation-Based Calibration Checking for Bayesian Computation: The Choice of Test Quantities Shapes Sensitivity. Bayesian Analysis -1(-1), 1–28. Publisher: International Society for Bayesian Analysis.
  • Moraga (2019) Moraga, P. (2019, November). Geospatial Health Data: Modeling and Visualization with R-INLA and Shiny. New York: Chapman and Hall/CRC.
  • Nasari et al. (2016) Nasari, M. M., M. Szyszkowicz, H. Chen, D. Crouse, M. C. Turner, M. Jerrett, C. A. Pope, B. Hubbell, N. Fann, A. Cohen, S. M. Gapstur, W. R. Diver, D. Stieb, M. H. Forouzanfar, S.-Y. Kim, C. Olives, D. Krewski, and R. T. Burnett (2016, December). A class of non-linear exposure-response models suitable for health impact assessment applicable to large cohort studies of ambient air pollution. Air Quality, Atmosphere & Health 9(8), 961–972.
  • Naylor et al. (2023) Naylor, M., F. Serafini, F. Lindgren, and I. G. Main (2023, March). Bayesian modeling of the temporal evolution of seismicity using the ETAS.inlabru package. Frontiers in Applied Mathematics and Statistics 9.
  • Newman et al. (2014) Newman, K. B., S. T. Buckland, B. J. T. Morgan, R. King, D. L. Borchers, D. J. Cole, P. Besbeas, O. Gimenez, and L. Thomas (2014). Modelling Population Dynamics: Model Formulation, Fitting and Assessment Using State-Space Methods. Methods in Statistical Ecology. New York, NY: Springer.
  • Pebesma (2018) Pebesma, E. (2018). Simple Features for R: Standardized Support for Spatial Vector Data. The R Journal 10(1), 439–446.
  • Pebesma and Bivand (2005) Pebesma, E. and R. S. Bivand (2005). S classes and methods for spatial data: the sp package. R news 5(2), 9–13.
  • Real (1979) Real, L. A. (1979). Ecological determinants of functional response. Ecology 60(3), 481–485.
  • Ritz (2010) Ritz, C. (2010). Toward a unified approach to dose–response modeling in ecotoxicology. Environmental Toxicology and Chemistry 29(1), 220–229.
  • Rosenbaum and Rall (2018) Rosenbaum, B. and B. C. Rall (2018). Fitting functional responses: Direct parameter estimation by simulating differential equations. Methods in Ecology and Evolution 9(10), 2076–2090.
  • Rue and Held (2005) Rue, H. and L. Held (2005). Gaussian Markov Random Fields: Theory and Applications. CRC Press.
  • Rue and Martino (2007) Rue, H. and S. Martino (2007, October). Approximate Bayesian inference for hierarchical Gaussian Markov random field models. Journal of Statistical Planning and Inference 137(10), 3177–3192.
  • Rue et al. (2009) Rue, H., S. Martino, and N. Chopin (2009). Approximate Bayesian inference for latent Gaussian models by using integrated nested Laplace approximations. Journal of the Royal Statistical Society: Series B (Statistical Methodology) 71(2), 319–392.
  • Serafini et al. (2023) Serafini, F., F. Lindgren, and M. Naylor (2023). Approximation of Bayesian Hawkes process with inlabru. Environmetrics 34(5), e2798.
  • Shumway and Stoffer (2017) Shumway, R. H. and D. S. Stoffer (2017). Time Series Analysis and Its Applications: With R Examples. Springer Texts in Statistics. Cham: Springer International Publishing.
  • Simpson et al. (2023) Simpson, E. S., T. Opitz, and J. L. Wadsworth (2023, December). High-dimensional modeling of spatial and spatio-temporal conditional extremes using INLA and Gaussian Markov random fields. Extremes 26(4), 669–713.
  • Smout et al. (2010) Smout, S., C. Asseburg, J. Matthiopoulos, C. Fernández, S. Redpath, S. Thirgood, and J. Harwood (2010, May). The functional response of a generalist predator. PLoS ONE 5(5).
  • Stringer et al. (2023) Stringer, A., P. Brown, and J. Stafford (2023, January). Fast, Scalable Approximations to Posterior Distributions in Extended Latent Gaussian Models. Journal of Computational and Graphical Statistics 32(1), 84–98.
  • Talts et al. (2020) Talts, S., M. Betancourt, D. Simpson, A. Vehtari, and A. Gelman (2020). Validating Bayesian inference algorithms with simulation-based calibration. arXiv:1804.06788 [stat].
  • The National Archives (2024) The National Archives (2024). Open Government Licence. https://www.nationalarchives.gov.uk/doc/open-government-licence/version/3/.
  • The Scottish Government (2021) The Scottish Government (2021, November). Intermediate Zone Boundaries 2011. https://www.data.gov.uk/dataset/133d4983-c57d-4ded-bc59-390c962ea280/intermediate-zone-boundaries-2011.
  • Tierney and Kadane (1986) Tierney, L. and J. B. Kadane (1986). Accurate approximations for posterior moments and marginal densities. Journal of the American Statistical Association 81(393), 82–86.
  • Van Niekerk et al. (2023) Van Niekerk, J., E. Krainski, D. Rustand, and H. Rue (2023, May). A new avenue for Bayesian inference with INLA. Computational Statistics & Data Analysis 181, 107692.
  • van Niekerk and Rue (2024) van Niekerk, J. and H. Rue (2024). Low-rank Variational Bayes correction to the Laplace method. Journal of Machine Learning Research 25(62), 1–25.
  • Wickham (2009) Wickham, H. (2009). ggplot2: Elegant Graphics for Data Analysis. Springer-Verlag New York.
  • Wood (2017) Wood, S. (2017). Generalized Additive Models: An Introduction with R (2 ed.). Chapman and Hall/CRC.
  • Yuan et al. (2017) Yuan, Y., F. E. Bachl, F. Lindgren, D. L. Borchers, J. B. Illian, S. T. Buckland, H. Rue, and T. Gerrodette (2017, December). Point process models for spatio-temporal distance sampling data from a large-scale survey of blue whales. The Annals of Applied Statistics 11(4), 2270–2297.

Supplementary Information

Since inlabru is under active development, some information in the paper and supplementary materials may become outdated. We encourage readers to check the articles on the inlabru website for up to date information. The articles can be accessed here:

Appendix A Mapper information

The bru_mapper methods define how model component design matrices are constructed based on the input information. This is a summary of the bru_mapper methods; a more detailed version can be found in the bru_mapper vignette on the inlabru website (Lindgren, 2022), as well as in the documentation ?bru_mapper and ?bru_mapper_methods.

When implementing new user-defined model components, or models defined in add-on packages, e.g. via the rgeneric and cgeneric INLA frameworks, a new mapper class can be defined, with a bru_get_mapper method that inlabru can call to automatically obtain a suitable mapper object.

A.1 Mapper system introduction

Each inlabru latent model component generates an effect, given the latent state vector, i.e. a vector of values for the component. The purpose of the mapper system is to define the link from state vectors to effect vectors. For an ordinary model component, named $c$, this link can be represented as a function of the latent variables,

\[
\widetilde{\boldsymbol{\eta}}_c = f_c(\boldsymbol{u}_c, \text{input}_c),
\]

where $\boldsymbol{u}_c$ is the latent state vector, $\widetilde{\boldsymbol{\eta}}_c$ is the resulting component effect vector, $\text{input}_c$ are fixed component parameters, and $f_c(\cdot)$ is a linear or non-linear transformation function, defined by a combination of mapper methods. Each component has a linearised version, at a given linearisation state $\boldsymbol{u}_c^0$, defined by the affine transformation

\[
\overline{\boldsymbol{\eta}}_c = \boldsymbol{b}^0_c(\text{input}_c) + \boldsymbol{A}^0_c(\text{input}_c) \boldsymbol{u}_c,
\]

where $\boldsymbol{b}^0_c(\text{input}_c)$ is a component offset, and $\boldsymbol{A}^0_c(\text{input}_c)$ is a component design matrix. This matrix depends on the component inputs (covariates, index information, etc.) from main, group, replicate, weights, and scale in the component definition, as well as on the linearisation point $\boldsymbol{u}_c^0$. In addition, a marginal transformation mapper can be applied, to convert $\mathsf{N}(0,1)$ marginal distributions into other, pre-specified distributions.
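The relationship between a non-linear component effect and its affine linearisation can be sketched in base R. The example assumes a first-order Taylor expansion, so that the design matrix is the Jacobian at the linearisation state and the offset is chosen to make the two maps agree there. The map f and all values are hypothetical and do not correspond to any particular inlabru mapper.

```r
# Minimal base R sketch of linearising a component effect at u0.
# The map f and all values are illustrative, not an inlabru mapper.
x <- c(0.5, 1.0, 2.0) # fixed input (covariate values)

# Non-linear effect: eta_k = u_1 * exp(-u_2 * x_k)
f <- function(u) u[1] * exp(-u[2] * x)

# Analytic Jacobian of f with respect to u, one row per output element
jacobian <- function(u) {
  cbind(exp(-u[2] * x), -u[1] * x * exp(-u[2] * x))
}

u0 <- c(2, 0.3)               # linearisation state
A0 <- jacobian(u0)            # design matrix at u0
b0 <- f(u0) - drop(A0 %*% u0) # offset so that the two maps agree at u0

# Linearised effect: b0 + A0 u
f_lin <- function(u) b0 + drop(A0 %*% u)

# The approximation error is small close to the linearisation state
u_near <- u0 + c(0.01, -0.01)
lin_error <- max(abs(f(u_near) - f_lin(u_near)))
```

For a linear component the Jacobian does not depend on u0 and the offset is zero, in which case the linearised and original effects coincide everywhere.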

The in-package documentation for the mapper methods is contained in four parts:

R> ?bru_mapper # Mapper constructors
R> ?bru_mapper_generics # Generic and default methods
R> ?bru_mapper_methods # Specialised mapper methods
R> ?bru_get_mapper # Mapper extraction methods

Regular users normally need at most some of the methods from ?bru_mapper and sometimes ?bru_mapper_generics. The methods in ?bru_mapper_methods provide more details and allow the user to query mappers and to use them outside the context of inlabru model definitions. The bru_mapper_define() and bru_get_mapper() methods are needed by those implementing their own mapper class.

A.2 Mappers

The main purpose of each mapper class is to allow evaluating a component effect, from given input and latent state ($\boldsymbol{u}_c$), by calling

R> ibm_eval(mapper, input, state)

for the mapper associated with the component definition.

A.3 Basic mappers

Basic mappers take covariate vectors or matrices as input and numeric vectors as state.

The basic constructors are:

  • bru_mapper_const()

  • bru_mapper_linear()

  • bru_mapper_index(n)

  • bru_mapper_factor(values, factor_mapping)

  • bru_mapper_matrix()

  • bru_mapper_harmonics(order, scaling, intercept, interval)

See the above-mentioned bru_mapper vignette for the full definition of each mapper.

A.4 Transformation mappers

Transformation mappers are mappers that would normally be combined with other mappers, as steps in a sequence of transformations, or as individual transformation mappers.

  • bru_mapper_scale()

  • bru_mapper_marginal(qfun, pfun, ...)

  • bru_mapper_aggregate(rescale)

  • bru_mapper_logsumexp(rescale)

Details on each are in the package documentation and the mapper vignette.

A.5 Compound mappers

Compound mappers define collections or chains of mappings, and can take various forms of input. The state vector is normally a numeric vector, but can in some cases be a list of vectors.

  • bru_mapper_collect(mappers, hidden)

  • bru_mapper_multi(mappers)

  • bru_mapper_pipe(mappers)

Full details are in the package documentation and mapper vignette.

A.6 Mapper methods

Mapper objects themselves have methods associated with them to retrieve relevant information. Each mapper has the information required by inlabru to construct the full model matrix for the call to INLA. The mapper methods are

R> ibm_n(mapper, inla_f, ...)
R> ibm_n_output(mapper, input, ...)
R> ibm_values(mapper, inla_f, ...)
R> ibm_jacobian(mapper, input, state, ...)
R> ibm_eval(mapper, input, state, ...)
R> ibm_names(mapper, ...)
R> ibm_inla_subset(mapper, ...)

For more information see the vignette.

A.7 Defining new mappers

Taken together, the basic constructors, transformation mappers and compound mappers offer functionality for users to define new mappers for their own model components.

A mapper object should store enough information for the ibm_* methods to work. The simplest case of a customised mapper is to just attach a new class label to the front of the S3 class() information of an existing mapper, to obtain a class that can override some of the standard ibm_* method implementations.

More commonly, the bru_mapper_define() method should be used to properly set the class information:

R> bru_mapper_define(mapper, new_class)

For users wishing to write their own mappers we strongly suggest reading the mapper vignette which includes more information and provides examples of defining new mappers.
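The class-label mechanism described above is ordinary S3 dispatch. The base-R sketch below illustrates it with a hypothetical generic `toy_eval()` and hypothetical classes `toy_linear` and `toy_squared`. None of these names exist in inlabru; in real code the ibm_* generics and bru_mapper_define() would take their place.

```r
# Hypothetical generic standing in for an ibm_* method (not inlabru code).
toy_eval <- function(mapper, input, state) UseMethod("toy_eval")

# Existing behaviour: a linear effect, input * state.
toy_eval.toy_linear <- function(mapper, input, state) {
  input * state
}

# Override for the new class: reuse the parent result, then square it.
toy_eval.toy_squared <- function(mapper, input, state) {
  NextMethod()^2
}

base_mapper <- structure(list(), class = "toy_linear")

# Attach the new class label to the front of the existing class vector.
new_mapper <- base_mapper
class(new_mapper) <- c("toy_squared", class(base_mapper))

base_value <- toy_eval(base_mapper, input = c(1, 2), state = 3)
new_value <- toy_eval(new_mapper, input = c(1, 2), state = 3)
```

Because the original class label is retained behind the new one, any method that is not overridden still falls through to the existing implementation.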

Appendix B Iterative method details

B.1 Fixed point iteration

The choice of the linearisation point is key for the accuracy of the approximation. To define a search and convergence criterion for choosing the linearisation point, we define a Bayesian estimation functional $f(\overline{p}_{\boldsymbol{v}})$ of the posterior distribution linearised at $\boldsymbol{v}$. This functional takes as input the posterior from a linearised model $\overline{p}_{\boldsymbol{v}}$ and outputs a new linearisation point.

We define the functional

$$f(\overline{p}_{\boldsymbol{v}}) = (\widehat{\boldsymbol{\theta}}_{\boldsymbol{v}}, \widehat{\boldsymbol{u}}_{\boldsymbol{v}})$$

where

$$\widehat{\boldsymbol{\theta}}_{\boldsymbol{v}} = \operatorname*{arg\,max}_{\boldsymbol{\theta}} \overline{p}_{\boldsymbol{v}}(\boldsymbol{\theta}|\boldsymbol{y}),$$

the posterior mode for $\boldsymbol{\theta}$, and

$$\widehat{\boldsymbol{u}}_{\boldsymbol{v}} = \operatorname*{arg\,max}_{\boldsymbol{u}} \overline{p}_{\boldsymbol{v}}(\boldsymbol{u}|\boldsymbol{y}, \widehat{\boldsymbol{\theta}}_{\boldsymbol{v}}),$$

the joint conditional posterior mode for $\boldsymbol{u}$. In other words, the functional generates a new linearisation point defined as the posterior mode for $\boldsymbol{\theta}$ and the conditional mode for $\boldsymbol{u}$ given $\boldsymbol{\theta}$.

An obvious choice for the optimal linearisation point is the posterior mode under the non-linear model. Let $(\widehat{\boldsymbol{\theta}}, \widehat{\boldsymbol{u}})$ denote the corresponding posterior modes for the true posterior distribution,

$$\begin{aligned}
\widehat{\boldsymbol{\theta}} &= \operatorname*{arg\,max}_{\boldsymbol{\theta}} p(\boldsymbol{\theta}|\boldsymbol{y}), \\
\widehat{\boldsymbol{u}} &= \operatorname*{arg\,max}_{\boldsymbol{u}} p(\boldsymbol{u}|\boldsymbol{y}, \widehat{\boldsymbol{\theta}}).
\end{aligned}$$

We therefore seek a fixed point of the functional $(\boldsymbol{\theta}_*, \boldsymbol{u}_*) = f(\overline{p}_{\boldsymbol{u}_*})$ that ideally would be close to $(\widehat{\boldsymbol{\theta}}, \widehat{\boldsymbol{u}})$. We can achieve this for the conditional latent mode, so that $\boldsymbol{u}_* = \operatorname*{arg\,max}_{\boldsymbol{u}} p(\boldsymbol{u}|\boldsymbol{y}, \widehat{\boldsymbol{\theta}}_{\boldsymbol{u}_*})$. We therefore seek the latent vector $\boldsymbol{u}_*$ that generates the fixed point of the functional, so that $(\boldsymbol{\theta}_*, \boldsymbol{u}_*) = f(\overline{p}_{\boldsymbol{u}_*})$.

One key to the fixed point iteration is that the observation model is linked to $\boldsymbol{u}$ only through the non-linear predictor $\widetilde{\boldsymbol{\eta}}(\boldsymbol{u})$, since this leads to a simplified line search method below.

  0. Let $\boldsymbol{u}_0$ be an initial linearisation point for the latent variables obtained from the initial INLA call. Iterate the following steps for $k = 0, 1, 2, \ldots$

  1. Compute the predictor linearisation at $\boldsymbol{u}_0$.

  2. Compute the linearised INLA posterior $\overline{p}_{\boldsymbol{u}_0}(\boldsymbol{\theta}|\boldsymbol{y})$.

  3. Let $(\boldsymbol{\theta}_1, \boldsymbol{u}_1) = (\widehat{\boldsymbol{\theta}}_{\boldsymbol{u}_0}, \widehat{\boldsymbol{u}}_{\boldsymbol{u}_0}) = f(\overline{p}_{\boldsymbol{u}_0})$ be the initial candidate for the new linearisation point.

  4. Let $\boldsymbol{v}_\alpha = (1-\alpha)\boldsymbol{u}_0 + \alpha\boldsymbol{u}_1$, and find the value $\alpha$ that minimises $\|\widetilde{\boldsymbol{\eta}}(\boldsymbol{v}_\alpha) - \overline{\boldsymbol{\eta}}(\boldsymbol{u}_1)\|$.

  5. Set the new linearisation point $\boldsymbol{u}_0$ equal to $\boldsymbol{v}_\alpha$ and repeat from step 1, unless the iteration has converged to a given tolerance.

A potential improvement of step 4 might be to also take into account the prior distribution for $\boldsymbol{u}$ as a minimisation penalty, to avoid moving further than would be indicated by a full likelihood optimisation.
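The iteration can be sketched in base R on a toy problem. Everything in this example is an assumption made for illustration: a scalar latent variable with prior $\mathsf{N}(0, 1)$, Gaussian observations with predictor $\exp(u)$, no hyperparameters, a closed-form linearised posterior mode standing in for the INLA call, and an unweighted norm minimised with optimize(). The interpolation is parameterised so that $\alpha = 0$ is the previous point $\boldsymbol{u}_0$ and $\alpha = 1$ is the candidate $\boldsymbol{u}_1$, matching the convention used in Section B.2. It is not the inlabru implementation.

```r
# Toy model (an assumption for illustration only):
#   u ~ N(0, 1),  y_i ~ N(exp(u), sigma2),  no hyperparameters.
y <- exp(1) + 0.5 * cos(1:20)
sigma2 <- 0.25

# Non-linear predictor: the same value exp(u) for every observation.
eta_tilde <- function(u) rep(exp(u), length(y))

# Posterior mode of the model linearised at u0. This is available in
# closed form here; in inlabru this role is played by an INLA run.
linearised_mode <- function(u0) {
  B <- exp(u0)                       # derivative of exp(u) at u0
  offset <- exp(u0) - B * u0         # eta_bar(u) = offset + B * u
  Q <- 1 + length(y) * B^2 / sigma2  # linearised posterior precision
  b <- B * sum(y - offset) / sigma2
  b / Q
}

u0 <- 0
for (k in 0:49) {
  # Steps 1-3: linearise at u0 and find the candidate point u1.
  u1 <- linearised_mode(u0)
  if (abs(u1 - u0) < 1e-10) break
  eta_bar_1 <- exp(u0) * (1 + u1 - u0) # linearised predictor at u1

  # Step 4: line search over alpha, here by direct 1D minimisation.
  loss <- function(alpha) {
    v <- (1 - alpha) * u0 + alpha * u1
    sum((eta_tilde(v) - eta_bar_1)^2)
  }
  alpha <- optimize(loss, interval = c(0, 2))$minimum

  # Step 5: move the linearisation point.
  u0 <- (1 - alpha) * u0 + alpha * u1
}
u_star <- u0
```

In this toy setting the fixed point `u_star` is a stationary point of the true log-posterior, because the gradients of the linearised and non-linear log-posteriors agree at the linearisation point.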

B.2 Line search details

In step 4, we would ideally want $\alpha$ to be

$$\operatorname*{arg\,max}_{\alpha} \left[\ln p(\boldsymbol{u}|\boldsymbol{y}, \boldsymbol{\theta}_1)\right]_{\boldsymbol{u}=\boldsymbol{v}_\alpha}.$$

However, since this requires access to the internal likelihood and prior density evaluation code, we instead use a simpler alternative. We consider norms of the form $\|\widetilde{\boldsymbol{\eta}}(\boldsymbol{v}_\alpha) - \overline{\boldsymbol{\eta}}(\boldsymbol{u}_1)\|$ that only depend on the non-linear and linearised predictor expressions, and other known quantities, given $\boldsymbol{u}_0$, such as the current INLA estimate of the component-wise predictor variances.

Let $\sigma_i^2 = \mathrm{Var}_{\boldsymbol{u}\sim\overline{p}(\boldsymbol{u}|\boldsymbol{y},\boldsymbol{\theta}_1)}(\overline{\boldsymbol{\eta}}_i(\boldsymbol{u}))$ be the current estimate of the posterior variance for each predictor element $i$. We then define an inner product on the space of predictor vectors as

$$\langle\boldsymbol{a}, \boldsymbol{b}\rangle_V = \sum_i \frac{a_i b_i}{\sigma_i^2}.$$

The squared norm for the difference between the predictor vectors $\widetilde{\boldsymbol{\eta}}(\boldsymbol{v}_\alpha)$ and $\overline{\boldsymbol{\eta}}(\boldsymbol{u}_1)$, with respect to this inner product, is defined as

$$\|\widetilde{\boldsymbol{\eta}}(\boldsymbol{v}_\alpha) - \overline{\boldsymbol{\eta}}(\boldsymbol{u}_1)\|^2_V = \sum_i \frac{|\widetilde{\boldsymbol{\eta}}_i(\boldsymbol{v}_\alpha) - \overline{\boldsymbol{\eta}}_i(\boldsymbol{u}_1)|^2}{\sigma_i^2}.$$

Using this norm as the target loss function for the line search avoids many potentially expensive evaluations of the true posterior conditional log-density. We evaluate $\widetilde{\boldsymbol{\eta}}_1 = \widetilde{\boldsymbol{\eta}}(\boldsymbol{u}_1)$ and make use of the linearised predictor information. Let $\widetilde{\boldsymbol{\eta}}_\alpha = \widetilde{\boldsymbol{\eta}}(\boldsymbol{v}_\alpha)$ and $\overline{\boldsymbol{\eta}}_\alpha = \overline{\boldsymbol{\eta}}(\boldsymbol{v}_\alpha) = (1-\alpha)\widetilde{\boldsymbol{\eta}}(\boldsymbol{u}_0) + \alpha\overline{\boldsymbol{\eta}}(\boldsymbol{u}_1)$. In other words, $\alpha = 0$ corresponds to the previous linear predictor, and $\alpha = 1$ is the current estimate from INLA.
An exact line search would minimise $\|\widetilde{\boldsymbol{\eta}}_\alpha - \overline{\boldsymbol{\eta}}_1\|$. Instead, we define a quadratic approximation to the non-linear predictor as a function of $\alpha$,

$$\breve{\boldsymbol{\eta}}_\alpha = \overline{\boldsymbol{\eta}}_\alpha + \alpha^2(\widetilde{\boldsymbol{\eta}}_1 - \overline{\boldsymbol{\eta}}_1)$$

and minimise the quartic polynomial in $\alpha$,

$$\|\breve{\boldsymbol{\eta}}_\alpha - \overline{\boldsymbol{\eta}}_1\|^2 = \|(\alpha-1)(\overline{\boldsymbol{\eta}}_1 - \overline{\boldsymbol{\eta}}_0) + \alpha^2(\widetilde{\boldsymbol{\eta}}_1 - \overline{\boldsymbol{\eta}}_1)\|^2.$$

Here $\overline{\boldsymbol{\eta}}_0 = \overline{\boldsymbol{\eta}}(\boldsymbol{u}_0) = \widetilde{\boldsymbol{\eta}}(\boldsymbol{u}_0)$ is the predictor at the previous linearisation point, where the linearised and non-linear predictors coincide. The approximation reproduces $\overline{\boldsymbol{\eta}}_0$ at $\alpha = 0$ and $\widetilde{\boldsymbol{\eta}}_1$ at $\alpha = 1$.

If initial expansion and contraction steps are carried out, leading to an initial guess of $\alpha = \gamma^k$, where $\gamma > 1$ is a scaling factor (see ?bru_options, bru_method$factor) and $k$ is the (signed) number of expansions and contractions, the quadratic expression is replaced by

$$\|\breve{\boldsymbol{\eta}}_\alpha - \overline{\boldsymbol{\eta}}_1\|^2 = \left\|(\alpha-1)(\overline{\boldsymbol{\eta}}_1 - \overline{\boldsymbol{\eta}}_0) + \frac{\alpha^2}{\gamma^{2k}}(\widetilde{\boldsymbol{\eta}}_{\gamma^k} - \overline{\boldsymbol{\eta}}_{\gamma^k})\right\|^2,$$

which is minimised on the interval $\alpha \in [\gamma^{k-1}, \gamma^{k+1}]$.
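The line search loss without expansion or contraction steps can be sketched in base R as follows. The predictor vectors and variances are made-up numbers, the function names `norm2_V()` and `quartic_loss()` are hypothetical, and the search interval and the use of optimize() are assumptions for this illustration rather than a description of the solver used inside inlabru.

```r
# Weighted squared norm ||a||_V^2 = sum(a_i^2 / sigma_i^2).
norm2_V <- function(a, sigma2) sum(a^2 / sigma2)

# Quartic loss in alpha, from the quadratic approximation of the predictor.
quartic_loss <- function(alpha, eta_bar_0, eta_bar_1, eta_tilde_1, sigma2) {
  a <- (alpha - 1) * (eta_bar_1 - eta_bar_0) +
    alpha^2 * (eta_tilde_1 - eta_bar_1)
  norm2_V(a, sigma2)
}

# Illustrative numbers (made up): three predictor elements.
eta_bar_0 <- c(0.0, 0.5, 1.0)   # linearised predictor at u0
eta_bar_1 <- c(1.0, 1.0, 2.0)   # linearised predictor at u1
eta_tilde_1 <- c(1.6, 1.2, 2.9) # non-linear predictor at u1
sigma2 <- c(0.5, 1.0, 2.0)      # posterior predictor variances

fit <- optimize(
  quartic_loss,
  interval = c(0, 2), # search interval chosen for this illustration
  eta_bar_0 = eta_bar_0, eta_bar_1 = eta_bar_1,
  eta_tilde_1 = eta_tilde_1, sigma2 = sigma2
)
alpha_hat <- fit$minimum
```

When the non-linear and linearised predictors agree at the candidate point, the second term vanishes and the loss is minimised at $\alpha = 1$, so the full step is taken.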

Appendix C Approximate Bayesian method checking

We take an approach adapted from Talts et al. (2020). Given a Bayesian hierarchical model of the form

$$\begin{aligned}
\theta &\sim p(\theta) \\
u &\sim p(u|\theta) \\
y &\sim p(y|u,\theta),
\end{aligned}$$

the aim is to assess the ability of an approximate Bayesian inference method to estimate the posterior of some functional $h(\theta, u)$. An approximate posterior $\widetilde{p}(u, \theta|y)$ can be checked by simulating data from the exact model and checking approximate posterior CDF values for uniformity.

Given samples

$$\begin{aligned}
\theta^{(k)} &\sim p(\theta) \\
u^{(k)} &\sim p(u|\theta^{(k)}) \\
y^{(k)} &\sim p(y|u^{(k)},\theta^{(k)}),
\end{aligned}$$

for $k = 1, \ldots, K$, the CDF value is defined as

$$w^{(k)} = F(h(\theta^{(k)}, u^{(k)})),$$

where $F(\cdot)$ is the CDF for $h(\theta, u)$ under the approximate posterior $\widetilde{p}(h(\theta, u)|y^{(k)})$. If the approximate inference method recovers the correct posterior distributions then $w^{(k)} \sim \mathsf{Unif}(0, 1)$, independently for $k = 1, \ldots, K$.

If $F(\cdot)$ is not available in closed form then it can be estimated from samples from $\widetilde{p}(u, \theta|y)$. For each $k = 1, \ldots, K$, sample $J$ times from $\widetilde{p}(u, \theta|y^{(k)})$ and calculate the empirical CDF value

$$\widetilde{w}^{(k)} = \frac{1}{J}\sum_{j=1}^{J} \mathbb{I}\left(h(\theta^{(j|k)}, u^{(j|k)}) < h(\theta^{(k)}, u^{(k)})\right) - \frac{1}{2J}.$$

The empirical CDF value $\widetilde{w}^{(k)}$ is a normalised rank statistic that, if the approximate inference approach recovers the posterior exactly, will have a discrete uniform distribution on an equal partition of $(0, 1)$. Note that this is an adjusted form of that given in Talts et al. (2020), which works directly with the rank statistic $\sum_{j=1}^{J} \mathbb{I}(h(\theta^{(j|k)}, u^{(j|k)}) < h(\theta^{(k)}, u^{(k)}))$, which is uniform on the integers $\{0, 1, \ldots, J\}$. The adjusted form transforms this to the interval $(0, 1)$, not inclusive of the endpoints. Choosing $J$ to be large allows samples to be approximately evaluated using standard approaches for checking samples from a continuous uniform distribution, such as the Kolmogorov-Smirnov test, which we use in Section 4.

As noted by Modrák et al. (2023), the calibration checking approach above is insensitive to certain types of deviations from the true posterior distribution. The solution suggested there, including assessing functionals that involve the observation log-likelihoods, may also be implemented for INLA and inlabru results.
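The basic checking procedure can be sketched in base R for a model where the exact posterior is known. The example assumes a conjugate toy model, $\theta \sim \mathsf{N}(0, 1)$ and $y_i|\theta \sim \mathsf{N}(\theta, 1)$ for $i = 1, \ldots, n$, with no latent field and $h(\theta) = \theta$. The exact posterior is used in place of an approximate inference method, so the values of `K`, `J` and `n` and the model itself are illustrative choices only.

```r
set.seed(1)
K <- 200  # number of simulated data sets
J <- 1000 # posterior samples per data set
n <- 5    # observations per data set

w <- numeric(K)
for (k in seq_len(K)) {
  # Simulate parameter and data from the exact model.
  theta_k <- rnorm(1, mean = 0, sd = 1)
  y_k <- rnorm(n, mean = theta_k, sd = 1)

  # Posterior for theta given y_k (conjugate, so exact in this toy case).
  post_mean <- sum(y_k) / (n + 1)
  post_sd <- sqrt(1 / (n + 1))
  theta_post <- rnorm(J, mean = post_mean, sd = post_sd)

  # Empirical CDF value with h(theta) = theta, using the formula above.
  w[k] <- mean(theta_post < theta_k) - 1 / (2 * J)
}

# Compare with the continuous uniform distribution on (0, 1).
# Ties among the discrete values only trigger a warning, silenced here.
ks <- suppressWarnings(ks.test(w, "punif"))
```

Replacing the exact posterior sampler with samples from an approximate method gives the check described in this appendix; a small value of `ks$p.value` would then indicate a departure from uniformity.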

Appendix D Setting priors

It is important to consider priors carefully for non-linear model components. For example, consider the default INLA prior for an "iid" Gaussian model component $x$, which has mean zero and precision equal to $5\times 10^{-5}$, equivalent to a standard deviation of roughly 141. Then $\exp(x)$ will have a variance that is larger than the largest number that can typically be represented in the double-precision floating point format that R uses. Care should therefore be taken when setting priors for non-linear predictors. As is the case in INLA, priors can be set by using the hyper argument when defining model components.
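The numbers in this example can be checked directly in base R. The sketch uses the standard log-normal variance formula, $\mathrm{Var}(\exp(x)) = (\exp(s^2) - 1)\exp(s^2)$ for $x \sim \mathsf{N}(0, s^2)$, and works on the log scale because the variance itself cannot be represented. The variable names are chosen for this illustration.

```r
prec <- 5e-5
sd_x <- 1 / sqrt(prec) # standard deviation implied by the precision

# log Var(exp(x)) = s^2 + log(exp(s^2) - 1)
#                 = 2 * s^2 + log1p(-exp(-s^2)), which is stable for large s^2.
log_var_exp_x <- 2 * sd_x^2 + log1p(-exp(-sd_x^2))

# Logarithm of the largest finite double-precision number.
log_max_double <- log(.Machine$double.xmax)

overflows <- log_var_exp_x > log_max_double
```

The log-variance is about 40000, while the logarithm of the largest finite double is about 709.8, so the variance exceeds the representable range by a very wide margin.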

Appendix E Aggregation example integration scheme

In the examples in Section 5.2 of the paper, the predictor consists of a numerical integration scheme applied to the latent GMRF parameters:

$$\begin{aligned}
\ln\lambda_i &= \ln\left(\int_{\Omega_i} \exp(\beta + \xi(\boldsymbol{s}))\,\mathrm{d}\boldsymbol{s}\right) \\
&\approx \ln\left[\sum_j a_{ij}\exp(\beta + \xi(\boldsymbol{s}_{ij}))\right].
\end{aligned}$$

inlabru supports the construction of integration schemes primarily through the fmesher::fm_int() function. This function mainly provides support for defining integration with respect to the finite element mesh for the Matérn SPDE effect. This integration scheme can be constructed in the following way:

R> ips <- fm_int(
+    domain = mesh,
+    samplers = glasgow
+  )
R>
R> agg <- bru_mapper_logsumexp(
+    n_block = nrow(glasgow),
+    rescale = FALSE
+  )
R>
R> # For explicit construction of the aggregation weight matrix:
R> W <- fm_block(
+    block = ips$.block,
+    weights = ips$weight,
+    rescale = FALSE
+  )

By default, the integration scheme uses weights at each mesh node, but higher resolution integration schemes can also be constructed. The fm_block() function can be used to construct the integration matrix by using the block and weights arguments. Setting rescale = FALSE indicates that the weights in each polygon should not be rescaled to sum to one. The bru_mapper_logsumexp() function allows a more numerically stable implementation of the aggregation calculations, using internal weight shifts to avoid potential numerical over/underflow.
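The idea behind a shifted log-sum-exp aggregation can be sketched in base R. The function `toy_logsumexp_aggregate()` is hypothetical and the integration scheme is made up; the sketch only illustrates the general technique of subtracting the blockwise maximum before exponentiating, and is not the code used by bru_mapper_logsumexp().

```r
# Hypothetical helper: log(sum_j a_ij * exp(eta_ij)) for each block i,
# computed with a shift by the blockwise maximum. Weights must be positive.
toy_logsumexp_aggregate <- function(eta, block, weights, n_block) {
  vapply(seq_len(n_block), function(i) {
    idx <- which(block == i)
    z <- eta[idx] + log(weights[idx]) # log(a_ij) + eta_ij
    shift <- max(z)                   # subtract the max before exp()
    shift + log(sum(exp(z - shift)))
  }, numeric(1))
}

# Made-up integration scheme: two regions, with three and two points.
block <- c(1, 1, 1, 2, 2)
weights <- c(0.5, 1.0, 0.5, 2.0, 1.0)
eta <- c(0.1, -0.3, 0.2, 800, 801) # the last two overflow a naive exp()

log_lambda <- toy_logsumexp_aggregate(eta, block, weights, n_block = 2)

# Naive evaluation for comparison; it overflows in the second region.
naive <- log(tapply(weights * exp(eta), block, sum))
```

For the first region both approaches agree, while for the second region the naive evaluation returns `Inf` and the shifted version remains finite.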

Appendix F Approximation Accuracy

Section 2.6 of the paper discussed two options for assessing the accuracy of the linearised model estimates. We now present the details required for the Kullback-Leibler divergence diagnostic, which compares the posterior conditional densities $\widetilde{p}(\boldsymbol{u}|\boldsymbol{y}, \boldsymbol{\theta})$ and $\overline{p}(\boldsymbol{u}|\boldsymbol{y}, \boldsymbol{\theta})$, evaluated at the posterior mode for $\boldsymbol{\theta}$. With Bayes' theorem,

$$\begin{aligned}
p(\boldsymbol{u}|\boldsymbol{y}, \boldsymbol{\theta}) &= \frac{p(\boldsymbol{u}, \boldsymbol{y}|\boldsymbol{\theta})}{p(\boldsymbol{y}|\boldsymbol{\theta})} \\
&= \frac{p(\boldsymbol{y}|\boldsymbol{u}, \boldsymbol{\theta})\,p(\boldsymbol{u}|\boldsymbol{\theta})}{p(\boldsymbol{y}|\boldsymbol{\theta})},
\end{aligned}$$

where $p(\boldsymbol{u}|\boldsymbol{\theta})$ is a Gaussian density and $p(\boldsymbol{y}|\boldsymbol{\theta})$ is a normalisation factor.

Theorem 1.

Let $\mathsf{N}(\overline{\boldsymbol{m}},\overline{\boldsymbol{Q}}^{-1})$ be the linearised posterior distribution for $\boldsymbol{\theta}=\widehat{\boldsymbol{\theta}}$. Let

\begin{equation*}
g_i^{*} = \left.\frac{\partial}{\partial\eta_i}\ln p(\boldsymbol{y}|\boldsymbol{\theta},\boldsymbol{\eta})\right|_{\boldsymbol{\eta}^{*}}
\end{equation*}

and

\begin{equation*}
\boldsymbol{H}^{*}_i = \left.\nabla_u\nabla_u^{\top}\widetilde{\eta}_i(\boldsymbol{u})\right|_{\boldsymbol{u}_{*}},
\end{equation*}

and form the sum of their products, $\boldsymbol{G}=\sum_i g_i^{*}\boldsymbol{H}_i^{*}$.

A Taylor approximation for the difference in observation log-densities between the linearised and original models leads to an approximate non-linear posterior distribution $\mathsf{N}(\widetilde{\boldsymbol{m}},\widetilde{\boldsymbol{Q}}^{-1})$, where

\begin{align*}
\widetilde{\boldsymbol{Q}} &= \overline{\boldsymbol{Q}}-\boldsymbol{G}, \\
\widetilde{\boldsymbol{Q}}\widetilde{\boldsymbol{m}} &= \overline{\boldsymbol{Q}}\,\overline{\boldsymbol{m}}-\boldsymbol{G}\boldsymbol{u}_{*}.
\end{align*}

The K-L divergences are given by

\begin{align*}
\mathsf{D}_{\mathsf{KL}}\left(\overline{p}\,\middle\|\,\widetilde{p}\right)
&= \frac{1}{2}\left[\ln\det(\overline{\boldsymbol{Q}})-\ln\det(\overline{\boldsymbol{Q}}-\boldsymbol{G})+\operatorname{tr}\left((\overline{\boldsymbol{Q}}-\boldsymbol{G})\overline{\boldsymbol{Q}}^{-1}\right)-d+(\overline{\boldsymbol{m}}-\widetilde{\boldsymbol{m}})^{\top}(\overline{\boldsymbol{Q}}-\boldsymbol{G})(\overline{\boldsymbol{m}}-\widetilde{\boldsymbol{m}})\right], \\
\mathsf{D}_{\mathsf{KL}}\left(\widetilde{p}\,\middle\|\,\overline{p}\right)
&= \frac{1}{2}\left[\ln\det(\overline{\boldsymbol{Q}}-\boldsymbol{G})-\ln\det(\overline{\boldsymbol{Q}})+\operatorname{tr}\left(\overline{\boldsymbol{Q}}(\overline{\boldsymbol{Q}}-\boldsymbol{G})^{-1}\right)-d+(\overline{\boldsymbol{m}}-\widetilde{\boldsymbol{m}})^{\top}\overline{\boldsymbol{Q}}(\overline{\boldsymbol{m}}-\widetilde{\boldsymbol{m}})\right].
\end{align*}
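As an illustration of Theorem 1, the following NumPy sketch evaluates both divergences for small dense matrices. The function name `linearisation_kl` and the example values are hypothetical and not part of inlabru; dense linear algebra is used here only for readability, whereas a practical implementation would presumably exploit the sparsity of $\overline{\boldsymbol{Q}}$.

```python
import numpy as np

def linearisation_kl(Q_bar, G, m_bar, u_star):
    """Return (KL(p_bar || p_tilde), KL(p_tilde || p_bar)) using the Theorem 1 formulas."""
    Q_bar = np.asarray(Q_bar, dtype=float)
    G = np.asarray(G, dtype=float)
    m_bar = np.asarray(m_bar, dtype=float)
    u_star = np.asarray(u_star, dtype=float)
    d = m_bar.size

    # Approximate non-linear posterior: Q_tilde = Q_bar - G,
    # Q_tilde m_tilde = Q_bar m_bar - G u_star.
    Q_tilde = Q_bar - G
    # Raises numpy.linalg.LinAlgError if Q_tilde is not positive definite.
    np.linalg.cholesky(Q_tilde)
    m_tilde = np.linalg.solve(Q_tilde, Q_bar @ m_bar - G @ u_star)

    _, logdet_bar = np.linalg.slogdet(Q_bar)
    _, logdet_tilde = np.linalg.slogdet(Q_tilde)
    diff = m_bar - m_tilde

    kl_bar_tilde = 0.5 * (
        logdet_bar - logdet_tilde
        + np.trace(Q_tilde @ np.linalg.inv(Q_bar)) - d
        + diff @ Q_tilde @ diff
    )
    kl_tilde_bar = 0.5 * (
        logdet_tilde - logdet_bar
        + np.trace(Q_bar @ np.linalg.inv(Q_tilde)) - d
        + diff @ Q_bar @ diff
    )
    return float(kl_bar_tilde), float(kl_tilde_bar)

# Hypothetical two-dimensional example values.
Q_bar = np.array([[2.0, -0.5], [-0.5, 2.0]])
G = np.array([[0.2, 0.0], [0.0, -0.1]])
m_bar = np.array([1.0, 0.5])
u_star = np.array([0.9, 0.6])

kl_fwd, kl_rev = linearisation_kl(Q_bar, G, m_bar, u_star)
print(kl_fwd, kl_rev)
```

When $\boldsymbol{G}=\boldsymbol{0}$ the two Gaussians coincide and both divergences are zero, which gives a quick sanity check for any implementation.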

F.0.1 Proof of Theorem 1

Proof.

Recall that the observation likelihood only depends on $\boldsymbol{u}$ through $\boldsymbol{\eta}$. Using a Taylor expansion with respect to $\boldsymbol{\eta}$ around $\boldsymbol{\eta}^{*}=\widetilde{\boldsymbol{\eta}}(\boldsymbol{u}_{*})$,

\begin{align*}
\ln p(\boldsymbol{y}|\boldsymbol{\eta},\boldsymbol{\theta})
&= \ln p(\boldsymbol{y}|\boldsymbol{\theta},\boldsymbol{\eta}^{*}) \\
&\qquad+\sum_i\left.\frac{\partial}{\partial\eta_i}\ln p(\boldsymbol{y}|\boldsymbol{\theta},\boldsymbol{\eta})\right|_{\boldsymbol{\eta}^{*}}\cdot(\eta_i-\eta^{*}_i) \\
&\qquad+\frac{1}{2}\sum_{i,j}\left.\frac{\partial^2}{\partial\eta_i\partial\eta_j}\ln p(\boldsymbol{y}|\boldsymbol{\theta},\boldsymbol{\eta})\right|_{\boldsymbol{\eta}^{*}}\cdot(\eta_i-\eta^{*}_i)(\eta_j-\eta^{*}_j)+\mathcal{O}(\|\boldsymbol{\eta}-\boldsymbol{\eta}^{*}\|^3).
\end{align*}

Similarly, for each component of $\widetilde{\boldsymbol{\eta}}$,

\begin{align*}
\widetilde{\eta}_i(\boldsymbol{u})
&= \eta^{*}_i+\left[\left.\nabla_u\widetilde{\eta}_i(\boldsymbol{u})\right|_{\boldsymbol{u}_{*}}\right]^{\top}(\boldsymbol{u}-\boldsymbol{u}_{*}) \\
&\quad+\frac{1}{2}(\boldsymbol{u}-\boldsymbol{u}_{*})^{\top}\left[\left.\nabla_u\nabla_u^{\top}\widetilde{\eta}_i(\boldsymbol{u})\right|_{\boldsymbol{u}_{*}}\right](\boldsymbol{u}-\boldsymbol{u}_{*})+\mathcal{O}(\|\boldsymbol{u}-\boldsymbol{u}_{*}\|^3) \\
&= \eta_i^{*}+b_i(\boldsymbol{u})+h_i(\boldsymbol{u})+\mathcal{O}(\|\boldsymbol{u}-\boldsymbol{u}_{*}\|^3) \\
&= \overline{\eta}_i(\boldsymbol{u})+h_i(\boldsymbol{u})+\mathcal{O}(\|\boldsymbol{u}-\boldsymbol{u}_{*}\|^3),
\end{align*}

where $\nabla_u\nabla_u^{\top}$ is the Hessian with respect to $\boldsymbol{u}$, $b_i$ are linear in $\boldsymbol{u}$, and $h_i$ are quadratic in $\boldsymbol{u}$. Combining the two expansions and taking the difference between the full and linearised log-likelihoods, we get

\begin{equation*}
\ln\widetilde{p}(\boldsymbol{y}|\boldsymbol{u},\boldsymbol{\theta})-\ln\overline{p}(\boldsymbol{y}|\boldsymbol{u},\boldsymbol{\theta})
= \sum_i\left.\frac{\partial}{\partial\eta_i}\ln p(\boldsymbol{y}|\boldsymbol{\theta},\boldsymbol{\eta})\right|_{\boldsymbol{\eta}^{*}}\cdot h_i(\boldsymbol{u})+\mathcal{O}(\|\boldsymbol{u}-\boldsymbol{u}_{*}\|^3).
\end{equation*}

Note that the log-likelihood Hessian difference contribution only involves third-order $\boldsymbol{u}$ terms and higher, so the expression above includes all terms up to second order.
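The cubic order of the remainder can be checked numerically in a scalar toy model. The sketch below assumes a hypothetical predictor $\widetilde{\eta}(u)=\exp(u)$ and a unit-variance Gaussian observation model; these choices, and all names in the code, are illustrative assumptions rather than anything prescribed by the paper.

```python
import math

y = 2.0        # hypothetical observation
u_star = 0.3   # hypothetical linearisation point

def eta_tilde(u):
    # Non-linear predictor (assumed for illustration).
    return math.exp(u)

def eta_bar(u):
    # First-order Taylor expansion of eta_tilde around u_star.
    return math.exp(u_star) * (1.0 + (u - u_star))

def log_lik(eta):
    # Gaussian log-density with unit variance, up to an additive constant.
    return -0.5 * (y - eta) ** 2

g_star = y - eta_tilde(u_star)  # derivative of log_lik w.r.t. eta at eta*
H_star = math.exp(u_star)       # second derivative of eta_tilde at u_star

def remainder(delta):
    """Absolute gap between the exact log-likelihood difference and its quadratic term."""
    u = u_star + delta
    exact = log_lik(eta_tilde(u)) - log_lik(eta_bar(u))
    quadratic = 0.5 * g_star * H_star * delta ** 2
    return abs(exact - quadratic)

for delta in (0.1, 0.05, 0.025):
    print(delta, remainder(delta))
```

Halving the step should shrink the remainder by a factor of roughly eight, consistent with a leading $\mathcal{O}(\|\boldsymbol{u}-\boldsymbol{u}_{*}\|^3)$ term.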

Note that the following step requires evaluation of the gradient at $\boldsymbol{\eta}^{*}$, which corresponds to the latent mode for the hyperparameter mode $\widehat{\boldsymbol{\theta}}$; this gradient is currently not evaluated for other $\boldsymbol{\theta}$ values. Let

\begin{equation*}
g_i^{*}=\left.\frac{\partial}{\partial\eta_i}\ln p(\boldsymbol{y}|\boldsymbol{\theta},\boldsymbol{\eta})\right|_{\boldsymbol{\eta}^{*}}
\qquad\text{and}\qquad
\boldsymbol{H}^{*}_i=\left.\nabla_u\nabla_u^{\top}\widetilde{\eta}_i(\boldsymbol{u})\right|_{\boldsymbol{u}_{*}},
\end{equation*}

and form the sum of their products, $\boldsymbol{G}=\sum_i g_i^{*}\boldsymbol{H}_i^{*}$. Then

\begin{align*}
\ln\widetilde{p}(\boldsymbol{y}|\boldsymbol{u},\boldsymbol{\theta})-\ln\overline{p}(\boldsymbol{y}|\boldsymbol{u},\boldsymbol{\theta})
&= \frac{1}{2}\sum_i g_i^{*}(\boldsymbol{u}-\boldsymbol{u}_{*})^{\top}\boldsymbol{H}_i^{*}(\boldsymbol{u}-\boldsymbol{u}_{*})+\mathcal{O}(\|\boldsymbol{u}-\boldsymbol{u}_{*}\|^3) \tag{4} \\
&= \frac{1}{2}(\boldsymbol{u}-\boldsymbol{u}_{*})^{\top}\boldsymbol{G}(\boldsymbol{u}-\boldsymbol{u}_{*})+\mathcal{O}(\|\boldsymbol{u}-\boldsymbol{u}_{*}\|^3). \tag{5}
\end{align*}
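To make the construction of $\boldsymbol{G}$ concrete, the sketch below assembles it from central finite-difference Hessians of each predictor component. The helper names, the two toy predictor components and the gradient values `g_star` are all hypothetical; this is one simple way to evaluate the Hessians numerically, not a description of how inlabru does it.

```python
import numpy as np

def numerical_hessian(f, u, eps=1e-4):
    """Central finite-difference Hessian of a scalar function f at the point u."""
    u = np.asarray(u, dtype=float)
    d = u.size
    steps = eps * np.eye(d)  # row j is a step of size eps along coordinate j
    H = np.zeros((d, d))
    for j in range(d):
        for k in range(d):
            ej, ek = steps[j], steps[k]
            H[j, k] = (
                f(u + ej + ek) - f(u + ej - ek)
                - f(u - ej + ek) + f(u - ej - ek)
            ) / (4.0 * eps ** 2)
    return H

def assemble_G(eta_components, g_star, u_star):
    """Sum of g_i^* times the Hessian of the i-th predictor component at u_star."""
    return sum(
        g * numerical_hessian(eta_i, u_star)
        for g, eta_i in zip(g_star, eta_components)
    )

# Hypothetical predictor with one non-linear and one linear component.
eta_components = [
    lambda u: u[0] * np.exp(u[1]),  # non-linear in u
    lambda u: u[0] + u[1],          # linear, so its Hessian is zero
]
u_star = np.array([1.0, 0.2])
g_star = np.array([0.5, -2.0])      # assumed log-likelihood gradient values

G = assemble_G(eta_components, g_star, u_star)
print(G)
```

Only non-linear predictor components contribute to $\boldsymbol{G}$; for a fully linear predictor every $\boldsymbol{H}_i^{*}$ vanishes and the linearised and original models agree.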

With $\overline{\boldsymbol{m}}_{\boldsymbol{\theta}}=\mathsf{E}_{\overline{p}}(\boldsymbol{u}|\boldsymbol{y},\boldsymbol{\theta})$ and $\overline{\boldsymbol{Q}}^{-1}_{\boldsymbol{\theta}}=\mathsf{Cov}_{\overline{p}}(\boldsymbol{u},\boldsymbol{u}|\boldsymbol{y},\boldsymbol{\theta})$, we obtain

\begin{align*}
\mathsf{E}_{\overline{p}}\left[\nabla_{\boldsymbol{u}}\left\{\ln\widetilde{p}(\boldsymbol{y}|\boldsymbol{u},\boldsymbol{\theta})-\ln\overline{p}(\boldsymbol{y}|\boldsymbol{u},\boldsymbol{\theta})\right\}\right]
&= \boldsymbol{G}(\overline{\boldsymbol{m}}_{\boldsymbol{\theta}}-\boldsymbol{u}_{*})+\mathcal{O}\left(\mathsf{E}_{\overline{p}}\left[\|\boldsymbol{u}-\boldsymbol{u}_{*}\|^2\,\middle|\,\boldsymbol{y},\boldsymbol{\theta}\right]\right), \tag{6} \\
\mathsf{E}_{\overline{p}}\left[\nabla_{\boldsymbol{u}}\nabla_{\boldsymbol{u}}^{\top}\left\{\ln\widetilde{p}(\boldsymbol{y}|\boldsymbol{u},\boldsymbol{\theta})-\ln\overline{p}(\boldsymbol{y}|\boldsymbol{u},\boldsymbol{\theta})\right\}\right]
&= \boldsymbol{G}+\mathcal{O}\left(\mathsf{E}_{\overline{p}}\left[\|\boldsymbol{u}-\boldsymbol{u}_{*}\|\,\middle|\,\boldsymbol{y},\boldsymbol{\theta}\right]\right), \tag{7} \\
\mathsf{E}_{\overline{p}}\left[\ln\widetilde{p}(\boldsymbol{y}|\boldsymbol{u},\boldsymbol{\theta})-\ln\overline{p}(\boldsymbol{y}|\boldsymbol{u},\boldsymbol{\theta})\right]
&= \frac{1}{2}\operatorname{tr}(\boldsymbol{G}\overline{\boldsymbol{Q}}_{\boldsymbol{\theta}}^{-1})+\frac{1}{2}(\overline{\boldsymbol{m}}_{\boldsymbol{\theta}}-\boldsymbol{u}_{*})^{\top}\boldsymbol{G}(\overline{\boldsymbol{m}}_{\boldsymbol{\theta}}-\boldsymbol{u}_{*}) \\
&\phantom{=}+\mathcal{O}\left(\mathsf{E}_{\overline{p}}\left[\|\boldsymbol{u}-\boldsymbol{u}_{*}\|^3\,\middle|\,\boldsymbol{y},\boldsymbol{\theta}\right]\right). \tag{8}
\end{align*}
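The leading terms of (8) follow from (5) and the standard expectation of a quadratic form. As a short sketch of that step, write $\boldsymbol{u}-\boldsymbol{u}_{*}=(\boldsymbol{u}-\overline{\boldsymbol{m}}_{\boldsymbol{\theta}})+(\overline{\boldsymbol{m}}_{\boldsymbol{\theta}}-\boldsymbol{u}_{*})$ and use that $\boldsymbol{G}$ is symmetric, being a weighted sum of Hessians:

\begin{align*}
\mathsf{E}_{\overline{p}}\left[(\boldsymbol{u}-\boldsymbol{u}_{*})^{\top}\boldsymbol{G}(\boldsymbol{u}-\boldsymbol{u}_{*})\right]
&= \mathsf{E}_{\overline{p}}\left[(\boldsymbol{u}-\overline{\boldsymbol{m}}_{\boldsymbol{\theta}})^{\top}\boldsymbol{G}(\boldsymbol{u}-\overline{\boldsymbol{m}}_{\boldsymbol{\theta}})\right]
+2(\overline{\boldsymbol{m}}_{\boldsymbol{\theta}}-\boldsymbol{u}_{*})^{\top}\boldsymbol{G}\,\mathsf{E}_{\overline{p}}\left[\boldsymbol{u}-\overline{\boldsymbol{m}}_{\boldsymbol{\theta}}\right]
+(\overline{\boldsymbol{m}}_{\boldsymbol{\theta}}-\boldsymbol{u}_{*})^{\top}\boldsymbol{G}(\overline{\boldsymbol{m}}_{\boldsymbol{\theta}}-\boldsymbol{u}_{*}) \\
&= \operatorname{tr}(\boldsymbol{G}\overline{\boldsymbol{Q}}_{\boldsymbol{\theta}}^{-1})
+(\overline{\boldsymbol{m}}_{\boldsymbol{\theta}}-\boldsymbol{u}_{*})^{\top}\boldsymbol{G}(\overline{\boldsymbol{m}}_{\boldsymbol{\theta}}-\boldsymbol{u}_{*}),
\end{align*}

since the cross term has zero mean and $\mathsf{E}_{\overline{p}}\left[\boldsymbol{x}^{\top}\boldsymbol{G}\boldsymbol{x}\right]=\operatorname{tr}(\boldsymbol{G}\,\mathsf{Cov}_{\overline{p}}(\boldsymbol{x},\boldsymbol{x}))$ for a zero-mean vector $\boldsymbol{x}$. Multiplying by $\tfrac{1}{2}$ gives the two explicit terms in (8).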

For each $\boldsymbol{\theta}$ configuration in the INLA output, we can extract both $\overline{\boldsymbol{m}}_{\boldsymbol{\theta}}$ and the sparse precision matrix $\overline{\boldsymbol{Q}}_{\boldsymbol{\theta}}$ for the Gaussian approximation. The non-zero structure of $\boldsymbol{G}$ is contained in the non-zero structure of $\overline{\boldsymbol{Q}}_{\boldsymbol{\theta}}$, which allows the use of Takahashi recursion (implemented by INLA::inla.qinv(Q)) to compute the corresponding $\overline{\boldsymbol{Q}}^{-1}_{\boldsymbol{\theta}}$ values needed to evaluate the trace $\operatorname{tr}(\boldsymbol{G}\overline{\boldsymbol{Q}}^{-1}_{\boldsymbol{\theta}})$. Thus, implementing a numerical approximation of this error analysis only requires special access to the log-likelihood derivatives $g_i^{*}$, as $\boldsymbol{H}_i^{*}$ can in principle be evaluated numerically.
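The reason a partial inverse suffices is that $\operatorname{tr}(\boldsymbol{G}\boldsymbol{\Sigma})=\sum_{i,j}G_{ij}\Sigma_{ji}$ only touches entries of $\boldsymbol{\Sigma}=\overline{\boldsymbol{Q}}^{-1}_{\boldsymbol{\theta}}$ where $\boldsymbol{G}$ is non-zero. The NumPy sketch below illustrates this with a small tridiagonal example. The function name and matrices are hypothetical, and `np.linalg.inv` followed by masking is only a dense stand-in for a sparse partial inverse; it does not call or reproduce the R-INLA routine.

```python
import numpy as np

def trace_G_Sigma(G, Sigma_partial):
    """tr(G @ Sigma) using only the entries of Sigma at the non-zero positions of G."""
    rows, cols = np.nonzero(G)
    return float(np.sum(G[rows, cols] * Sigma_partial[cols, rows]))

# Hypothetical tridiagonal precision matrix, and a G whose non-zero
# pattern is contained in the pattern of Q_bar.
d = 5
Q_bar = 2.0 * np.eye(d) - 0.5 * np.eye(d, k=1) - 0.5 * np.eye(d, k=-1)
G = 0.1 * np.eye(d) + 0.05 * np.eye(d, k=1) + 0.05 * np.eye(d, k=-1)

# Dense inverse, then keep only entries on the pattern of Q_bar to mimic
# a partial inverse that is known only on that pattern.
Sigma = np.linalg.inv(Q_bar)
Sigma_partial = np.where(Q_bar != 0.0, Sigma, 0.0)

print(trace_G_Sigma(G, Sigma_partial), np.trace(G @ Sigma))
```

Both printed values should agree up to rounding, even though the masked inverse discards every entry outside the tridiagonal band.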

For a given $\boldsymbol{\theta}$,

\begin{align*}
\mathsf{D}_{\mathsf{KL}}\left(\overline{p}\,\middle\|\,\widetilde{p}\right)
&= \mathsf{E}_{\overline{p}}\left[\ln\frac{\overline{p}(\boldsymbol{u}|\boldsymbol{y},\boldsymbol{\theta})}{\widetilde{p}(\boldsymbol{u}|\boldsymbol{y},\boldsymbol{\theta})}\right] \\
&= \mathsf{E}_{\overline{p}}\left[\ln\frac{\overline{p}(\boldsymbol{y}|\boldsymbol{u},\boldsymbol{\theta})}{\widetilde{p}(\boldsymbol{y}|\boldsymbol{u},\boldsymbol{\theta})}\right]-\ln\frac{\overline{p}(\boldsymbol{y}|\boldsymbol{\theta})}{\widetilde{p}(\boldsymbol{y}|\boldsymbol{\theta})}.
\end{align*}
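The second equality applies Bayes' theorem to each model. Spelled out, and using that both models share the same prior $p(\boldsymbol{u}|\boldsymbol{\theta})$, which cancels in the ratio:

\begin{align*}
\ln\frac{\overline{p}(\boldsymbol{u}|\boldsymbol{y},\boldsymbol{\theta})}{\widetilde{p}(\boldsymbol{u}|\boldsymbol{y},\boldsymbol{\theta})}
&= \ln\frac{\overline{p}(\boldsymbol{y}|\boldsymbol{u},\boldsymbol{\theta})\,p(\boldsymbol{u}|\boldsymbol{\theta})\,/\,\overline{p}(\boldsymbol{y}|\boldsymbol{\theta})}{\widetilde{p}(\boldsymbol{y}|\boldsymbol{u},\boldsymbol{\theta})\,p(\boldsymbol{u}|\boldsymbol{\theta})\,/\,\widetilde{p}(\boldsymbol{y}|\boldsymbol{\theta})} \\
&= \ln\frac{\overline{p}(\boldsymbol{y}|\boldsymbol{u},\boldsymbol{\theta})}{\widetilde{p}(\boldsymbol{y}|\boldsymbol{u},\boldsymbol{\theta})}-\ln\frac{\overline{p}(\boldsymbol{y}|\boldsymbol{\theta})}{\widetilde{p}(\boldsymbol{y}|\boldsymbol{\theta})}.
\end{align*}

The last term does not depend on $\boldsymbol{u}$, so it is unchanged by the expectation under $\overline{p}$.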

The first term can be approximated via the negative of (8). For the second term,

\begin{align*}
\ln\frac{\overline{p}(\boldsymbol{y}|\boldsymbol{\theta})}{\widetilde{p}(\boldsymbol{y}|\boldsymbol{\theta})}
&= \ln\frac{\overline{p}(\boldsymbol{y}|\boldsymbol{\theta})\,\overline{p}(\boldsymbol{u}_{*}|\boldsymbol{\theta},\boldsymbol{y})}{\overline{p}(\boldsymbol{u}_{*}|\boldsymbol{\theta},\boldsymbol{y})}-\ln\frac{\widetilde{p}(\boldsymbol{y}|\boldsymbol{\theta})\,\widetilde{p}(\boldsymbol{u}_{*}|\boldsymbol{\theta},\boldsymbol{y})}{\widetilde{p}(\boldsymbol{u}_{*}|\boldsymbol{\theta},\boldsymbol{y})} \\
&= \ln\frac{\overline{p}(\boldsymbol{y},\boldsymbol{u}_{*}|\boldsymbol{\theta})}{\overline{p}(\boldsymbol{u}_{*}|\boldsymbol{\theta},\boldsymbol{y})}-\ln\frac{\widetilde{p}(\boldsymbol{y},\boldsymbol{u}_{*}|\boldsymbol{\theta})}{\widetilde{p}(\boldsymbol{u}_{*}|\boldsymbol{\theta},\boldsymbol{y})} \\
&= \ln\frac{\overline{p}(\boldsymbol{y}|\boldsymbol{\theta},\boldsymbol{u}_{*})\,p(\boldsymbol{u}_{*}|\boldsymbol{\theta})}{\overline{p}(\boldsymbol{u}_{*}|\boldsymbol{\theta},\boldsymbol{y})}-\ln\frac{\widetilde{p}(\boldsymbol{y}|\boldsymbol{\theta},\boldsymbol{u}_{*})\,p(\boldsymbol{u}_{*}|\boldsymbol{\theta})}{\widetilde{p}(\boldsymbol{u}_{*}|\boldsymbol{\theta},\boldsymbol{y})} \\
&= \ln\frac{\overline{p}(\boldsymbol{y}|\boldsymbol{\theta},\boldsymbol{u}_{*})}{\overline{p}(\boldsymbol{u}_{*}|\boldsymbol{\theta},\boldsymbol{y})}-\ln\frac{\widetilde{p}(\boldsymbol{y}|\boldsymbol{\theta},\boldsymbol{u}_{*})}{\widetilde{p}(\boldsymbol{u}_{*}|\boldsymbol{\theta},\boldsymbol{y})} \\
&= \ln\frac{\overline{p}(\boldsymbol{y}|\boldsymbol{\theta},\boldsymbol{u}_{*})}{\widetilde{p}(\boldsymbol{y}|\boldsymbol{\theta},\boldsymbol{u}_{*})}-\ln\frac{\overline{p}(\boldsymbol{u}_{*}|\boldsymbol{\theta},\boldsymbol{y})}{\widetilde{p}(\boldsymbol{u}_{*}|\boldsymbol{\theta},\boldsymbol{y})} \\
&= -\ln\frac{\overline{p}(\boldsymbol{u}_{*}|\boldsymbol{\theta},\boldsymbol{y})}{\widetilde{p}(\boldsymbol{u}_{*}|\boldsymbol{\theta},\boldsymbol{y})},
\end{align*}

where the last step uses that both observation densities are evaluated at the linearisation point $\boldsymbol{u}_{*}$, where the linearised and non-linear predictors are identical, so the two observation densities are equal and their log-ratio vanishes. Now, by approximating the linearised and non-linearised posterior distributions with the Gaussian distributions $\mathsf{N}(\overline{\boldsymbol{m}}_{\boldsymbol{\theta}},\overline{\boldsymbol{Q}}^{-1}_{\boldsymbol{\theta}})$ and $\mathsf{N}(\widetilde{\boldsymbol{m}}_{\boldsymbol{\theta}},\widetilde{\boldsymbol{Q}}^{-1}_{\boldsymbol{\theta}})$, respectively, the observation log-density discrepancy (5) gives an approximate equation, for some constant $C$,

$$\begin{aligned}
\ln\widetilde{p}_{G}(\boldsymbol{u}|\boldsymbol{\theta},\boldsymbol{y}) - \ln\overline{p}_{G}(\boldsymbol{u}|\boldsymbol{\theta},\boldsymbol{y}) &= \ln\widetilde{p}(\boldsymbol{y}|\boldsymbol{\theta},\boldsymbol{u}) + \ln p(\boldsymbol{u}|\boldsymbol{\theta}) - \ln\overline{p}(\boldsymbol{y}|\boldsymbol{\theta},\boldsymbol{u}) - \ln p(\boldsymbol{u}|\boldsymbol{\theta}) + C \\
&\approx \frac{1}{2}(\boldsymbol{u}-\boldsymbol{u}_{*})^{\top}\boldsymbol{G}(\boldsymbol{u}-\boldsymbol{u}_{*}) + C.
\end{aligned}$$

In order for the quadratic expressions in $\boldsymbol{u}$ to match between the left- and right-hand sides, we need $-\widetilde{\boldsymbol{Q}}_{\boldsymbol{\theta}}+\overline{\boldsymbol{Q}}_{\boldsymbol{\theta}}=\boldsymbol{G}$ and $\overline{\boldsymbol{Q}}_{\boldsymbol{\theta}}(\widetilde{\boldsymbol{m}}_{\boldsymbol{\theta}}-\overline{\boldsymbol{m}}_{\boldsymbol{\theta}})=\boldsymbol{G}(\widetilde{\boldsymbol{m}}_{\boldsymbol{\theta}}-\boldsymbol{u}_{*})$, or equivalently, $\widetilde{\boldsymbol{Q}}_{\boldsymbol{\theta}}=\overline{\boldsymbol{Q}}_{\boldsymbol{\theta}}-\boldsymbol{G}$ and $\widetilde{\boldsymbol{Q}}_{\boldsymbol{\theta}}\widetilde{\boldsymbol{m}}_{\boldsymbol{\theta}}=\overline{\boldsymbol{Q}}_{\boldsymbol{\theta}}\overline{\boldsymbol{m}}_{\boldsymbol{\theta}}-\boldsymbol{G}\boldsymbol{u}_{*}$. The K-L divergence for a given $\boldsymbol{\theta}$ becomes

$$\mathsf{D}_{\mathsf{KL}}\left(\overline{p}\,\middle\|\,\widetilde{p}\right) \approx \frac{1}{2}\left[\ln\det(\overline{\boldsymbol{Q}}_{\boldsymbol{\theta}}) - \ln\det(\overline{\boldsymbol{Q}}_{\boldsymbol{\theta}}-\boldsymbol{G}) - \operatorname{tr}\left(\boldsymbol{G}\overline{\boldsymbol{Q}}^{-1}_{\boldsymbol{\theta}}\right) + (\overline{\boldsymbol{m}}_{\boldsymbol{\theta}}-\boldsymbol{u}_{*})^{\top}\boldsymbol{G}(\overline{\boldsymbol{Q}}_{\boldsymbol{\theta}}-\boldsymbol{G})^{-1}\boldsymbol{G}(\overline{\boldsymbol{m}}_{\boldsymbol{\theta}}-\boldsymbol{u}_{*})\right],$$

where $\overline{\boldsymbol{Q}}_{\boldsymbol{\theta}}$ and $\boldsymbol{G}$ are known matrices, $\boldsymbol{u}_{*}$ is the linearisation location and $\overline{\boldsymbol{m}}_{\boldsymbol{\theta}}=\mathsf{E}_{\overline{p}}(\boldsymbol{u}|\boldsymbol{y},\boldsymbol{\theta})$.
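As a sketch of where this expression comes from, one can start from the standard K-L divergence between two Gaussian distributions written in terms of precision matrices, and substitute the matching conditions $\widetilde{\boldsymbol{Q}}_{\boldsymbol{\theta}}=\overline{\boldsymbol{Q}}_{\boldsymbol{\theta}}-\boldsymbol{G}$ and $\widetilde{\boldsymbol{Q}}_{\boldsymbol{\theta}}\widetilde{\boldsymbol{m}}_{\boldsymbol{\theta}}=\overline{\boldsymbol{Q}}_{\boldsymbol{\theta}}\overline{\boldsymbol{m}}_{\boldsymbol{\theta}}-\boldsymbol{G}\boldsymbol{u}_{*}$. The symbol $d$, denoting the dimension of $\boldsymbol{u}$, is introduced here only for this illustration.

```latex
\begin{align*}
\mathsf{D}_{\mathsf{KL}}\left(\mathsf{N}(\overline{\boldsymbol{m}}_{\boldsymbol{\theta}},\overline{\boldsymbol{Q}}_{\boldsymbol{\theta}}^{-1})\,\middle\|\,\mathsf{N}(\widetilde{\boldsymbol{m}}_{\boldsymbol{\theta}},\widetilde{\boldsymbol{Q}}_{\boldsymbol{\theta}}^{-1})\right)
  &= \frac{1}{2}\left[\ln\det(\overline{\boldsymbol{Q}}_{\boldsymbol{\theta}}) - \ln\det(\widetilde{\boldsymbol{Q}}_{\boldsymbol{\theta}})
     + \operatorname{tr}\left(\widetilde{\boldsymbol{Q}}_{\boldsymbol{\theta}}\overline{\boldsymbol{Q}}_{\boldsymbol{\theta}}^{-1}\right) - d
     + (\widetilde{\boldsymbol{m}}_{\boldsymbol{\theta}}-\overline{\boldsymbol{m}}_{\boldsymbol{\theta}})^{\top}\widetilde{\boldsymbol{Q}}_{\boldsymbol{\theta}}(\widetilde{\boldsymbol{m}}_{\boldsymbol{\theta}}-\overline{\boldsymbol{m}}_{\boldsymbol{\theta}})\right], \\
\operatorname{tr}\left(\widetilde{\boldsymbol{Q}}_{\boldsymbol{\theta}}\overline{\boldsymbol{Q}}_{\boldsymbol{\theta}}^{-1}\right) - d
  &= \operatorname{tr}\left((\overline{\boldsymbol{Q}}_{\boldsymbol{\theta}}-\boldsymbol{G})\overline{\boldsymbol{Q}}_{\boldsymbol{\theta}}^{-1}\right) - d
   = -\operatorname{tr}\left(\boldsymbol{G}\overline{\boldsymbol{Q}}_{\boldsymbol{\theta}}^{-1}\right), \\
\widetilde{\boldsymbol{Q}}_{\boldsymbol{\theta}}(\widetilde{\boldsymbol{m}}_{\boldsymbol{\theta}}-\overline{\boldsymbol{m}}_{\boldsymbol{\theta}})
  &= \overline{\boldsymbol{Q}}_{\boldsymbol{\theta}}\overline{\boldsymbol{m}}_{\boldsymbol{\theta}} - \boldsymbol{G}\boldsymbol{u}_{*}
     - (\overline{\boldsymbol{Q}}_{\boldsymbol{\theta}}-\boldsymbol{G})\overline{\boldsymbol{m}}_{\boldsymbol{\theta}}
   = \boldsymbol{G}(\overline{\boldsymbol{m}}_{\boldsymbol{\theta}}-\boldsymbol{u}_{*}), \\
(\widetilde{\boldsymbol{m}}_{\boldsymbol{\theta}}-\overline{\boldsymbol{m}}_{\boldsymbol{\theta}})^{\top}\widetilde{\boldsymbol{Q}}_{\boldsymbol{\theta}}(\widetilde{\boldsymbol{m}}_{\boldsymbol{\theta}}-\overline{\boldsymbol{m}}_{\boldsymbol{\theta}})
  &= (\overline{\boldsymbol{m}}_{\boldsymbol{\theta}}-\boldsymbol{u}_{*})^{\top}\boldsymbol{G}(\overline{\boldsymbol{Q}}_{\boldsymbol{\theta}}-\boldsymbol{G})^{-1}\boldsymbol{G}(\overline{\boldsymbol{m}}_{\boldsymbol{\theta}}-\boldsymbol{u}_{*}).
\end{align*}
```

The last identity assumes that $\overline{\boldsymbol{Q}}_{\boldsymbol{\theta}}-\boldsymbol{G}$ is symmetric and invertible, which is needed for $\widetilde{\boldsymbol{Q}}_{\boldsymbol{\theta}}$ to be a valid precision matrix.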

Using the relationships between $\widetilde{\boldsymbol{Q}}_{\boldsymbol{\theta}}$, $\widetilde{\boldsymbol{m}}_{\boldsymbol{\theta}}$, $\overline{\boldsymbol{Q}}_{\boldsymbol{\theta}}$, $\overline{\boldsymbol{m}}_{\boldsymbol{\theta}}$, and $\boldsymbol{u}_{*}$, we can rewrite this as

$$\mathsf{D}_{\mathsf{KL}}\left(\overline{p}\,\middle\|\,\widetilde{p}\right) \approx \frac{1}{2}\left[\ln\det(\overline{\boldsymbol{Q}}_{\boldsymbol{\theta}}) - \ln\det(\overline{\boldsymbol{Q}}_{\boldsymbol{\theta}}-\boldsymbol{G}) - \operatorname{tr}\left(\boldsymbol{G}\overline{\boldsymbol{Q}}^{-1}_{\boldsymbol{\theta}}\right) + (\widetilde{\boldsymbol{m}}_{\boldsymbol{\theta}}-\overline{\boldsymbol{m}}_{\boldsymbol{\theta}})^{\top}(\overline{\boldsymbol{Q}}_{\boldsymbol{\theta}}-\boldsymbol{G})(\widetilde{\boldsymbol{m}}_{\boldsymbol{\theta}}-\overline{\boldsymbol{m}}_{\boldsymbol{\theta}})\right].$$
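The two forms of the K-L divergence above can be compared numerically. The following base R sketch is an illustration only: the dimension, the matrices `Q_bar` and `G`, and the vectors `m_bar` and `u_star` are made-up inputs rather than quantities from any fitted model, and `logdet()` is a small helper defined here. It evaluates both forms and the generic Gaussian K-L divergence.

```r
# Made-up inputs for illustration; none of these come from a fitted model.
set.seed(1)
d <- 3
A <- matrix(rnorm(d * d), d, d)
Q_bar <- crossprod(A) + 5 * diag(d)   # SPD precision of the linearised posterior
B <- matrix(rnorm(d * d), d, d)
G <- 0.1 * (B + t(B))                 # symmetric and small, so Q_bar - G stays SPD
m_bar <- rnorm(d)                     # linearised posterior mean
u_star <- rnorm(d)                    # linearisation point

# Matching conditions: Q_tilde = Q_bar - G, Q_tilde m_tilde = Q_bar m_bar - G u_star
Q_tilde <- Q_bar - G
stopifnot(min(eigen(Q_tilde, symmetric = TRUE)$values) > 0)
m_tilde <- solve(Q_tilde, Q_bar %*% m_bar - G %*% u_star)

# Helper (defined here): log-determinant of a positive definite matrix
logdet <- function(M) as.numeric(determinant(M, logarithm = TRUE)$modulus)

# Terms shared by both forms
common <- logdet(Q_bar) - logdet(Q_tilde) - sum(diag(G %*% solve(Q_bar)))

# Form written in terms of m_bar - u_star
v <- G %*% (m_bar - u_star)
kl_first <- 0.5 * (common + as.numeric(t(v) %*% solve(Q_tilde, v)))

# Form written in terms of m_tilde - m_bar
dm <- m_tilde - m_bar
kl_second <- 0.5 * (common + as.numeric(t(dm) %*% Q_tilde %*% dm))

# Generic K-L divergence between N(m_bar, Q_bar^-1) and N(m_tilde, Q_tilde^-1)
kl_generic <- 0.5 * (logdet(Q_bar) - logdet(Q_tilde) +
  sum(diag(Q_tilde %*% solve(Q_bar))) - d +
  as.numeric(t(dm) %*% Q_tilde %*% dm))

print(c(first = kl_first, second = kl_second, generic = kl_generic))
```

The three printed values should agree up to floating-point error. Dense `solve()` calls are used here for brevity; with the sparse precision matrices typical of INLA models, sparse factorisations would be the natural choice.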

Appendix G Iterative inlabru verbose output for Example 5.3

The amount of information printed by the bru() function can be set by the bru_verbose option, which has 4 levels of increasing verbosity. This information can also be accessed from the output object at a later time, as shown below, including higher verbosity levels.

R> fit_3 <- bru(
+       components = cmp,
+       z_lik,
+       count_lik,
+       options = list(
+         bru_verbose = 1
+       )
+     )
R> print(bru_log(fit_3, verbosity = 2), timestamp = FALSE, verbosity = TRUE)

iinla: Iteration 1 [max:10] (level 1)
iinla: Iteration 2 [max:10] (level 1)
iinla: Step rescaling: 99.5% (norm0 = 94.74, norm1 = 5.154, norm01 = 95.16) (level 2)
iinla: Max deviation from previous: 1500% of SD, and line search is active
       [stop if: <10% and line search inactive] (level 1)
iinla: Iteration 3 [max:10] (level 1)
iinla: Step rescaling: 97.6% (norm0 = 4.934, norm1 = 0.1598, norm01 = 4.941) (level 2)
iinla: Max deviation from previous: 108% of SD, and line search is active
       [stop if: <10% and line search inactive] (level 1)
iinla: Iteration 4 [max:10] (level 1)
iinla: Step rescaling: 101% (norm0 = 2.613, norm1 = 0.03321, norm01 = 2.614) (level 2)
iinla: Max deviation from previous: 81.7% of SD, and line search is active
       [stop if: <10% and line search inactive] (level 1)
iinla: Iteration 5 [max:10] (level 1)
iinla: Step rescaling: 99.8% (norm0 = 0.2526, norm1 = 0.0007458, norm01 = 0.2526) (level 2)
iinla: Max deviation from previous: 5.74% of SD, and line search is active
       [stop if: <10% and line search inactive] (level 1)
iinla: Iteration 6 [max:10] (level 1)
iinla: Max deviation from previous: 4.28% of SD, and line search is inactive
       [stop if: <10% and line search inactive] (level 1)
iinla: Convergence criterion met.
       Running final INLA integration step with known theta mode. (level 1)
iinla: Iteration 7 [max:10] (level 1)
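The stopping rule printed in the log (maximum deviation below 10% of the posterior standard deviation, with the line search inactive) can be read off the messages directly. The following base R sketch is a hypothetical text-parsing helper, not an inlabru function: the `log_lines` vector is copied by hand from the deviation messages above, and the message format is assumed to be as printed there, which may differ between package versions.

```r
# Deviation messages copied from the log above (iterations 2 to 6).
log_lines <- c(
  "Max deviation from previous: 1500% of SD, and line search is active",
  "Max deviation from previous: 108% of SD, and line search is active",
  "Max deviation from previous: 81.7% of SD, and line search is active",
  "Max deviation from previous: 5.74% of SD, and line search is active",
  "Max deviation from previous: 4.28% of SD, and line search is inactive"
)

# Extract the deviation, in percent of the posterior standard deviation
dev_pct <- as.numeric(sub(".*previous: ([0-9.]+)% of SD.*", "\\1", log_lines))

# The phrase "is active" does not occur in "is inactive", so a fixed match is safe
line_search_active <- grepl("line search is active", log_lines, fixed = TRUE)

# Stopping rule as printed in the log: <10% and line search inactive
stop_now <- dev_pct < 10 & !line_search_active

summary_tab <- data.frame(
  iteration = 2:6,
  dev_pct = dev_pct,
  line_search_active = line_search_active,
  stop_now = stop_now
)
print(summary_tab)
```

In this run the deviation already falls below 10% at iteration 5, but the line search is still active there, so the criterion is first met at iteration 6, after which the final INLA integration step is run.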