Scalable Sampling of Truncated Multivariate Normals Using Sequential Nearest-Neighbor Approximation

Abstract: We propose a linear-complexity method for sampling from truncated multivariate normal (TMVN) distributions with high fidelity by applying nearest-neighbor approximations to a product-of-conditionals decomposition of the TMVN density. To make the sequential sampling based on the decomposition feasible, we introduce a novel method that avoids the intractable high-dimensional TMVN distribution by sampling sequentially from $m$-dimensional TMVN distributions, where $m$ is a tuning parameter controlling the fidelity. This allows us to overcome the existing methods' crucial problem of rapidly decreasing acceptance rates for increasing dimension. Throughout our experiments with up to tens of thousands of dimensions, we can produce high-fidelity samples with $m$ in the dozens, achieving superior scalability compared to existing state-of-the-art methods. We study a tetrachloroethylene concentration dataset that has $3{,}971$ observed responses and $20{,}730$ undetected responses, together modeled as a partially censored Gaussian process, where our method enables posterior inference for the censored responses through sampling a $20{,}730$-dimensional TMVN distribution.

Submitted 25 June, 2024; originally announced June 2024.

arXiv:2311.09426 [pdf, other]

Linear-Cost Vecchia Approximation of Multivariate Normal Probabilities

Authors: Jian Cao, Matthias Katzfuss

Abstract: Multivariate normal (MVN) probabilities arise in myriad applications, but they are analytically intractable and need to be evaluated via Monte-Carlo-based numerical integration. For the state-of-the-art minimax exponential tilting (MET) method, we show that the complexity of each of its components can be greatly reduced through an integrand parameterization that utilizes the sparse inverse Cholesky factor produced by the Vecchia approximation, whose approximation error is often negligible relative to the Monte-Carlo error. Based on this idea, we derive algorithms that can estimate MVN probabilities and sample from truncated MVN distributions in linear time (and that are easily parallelizable) at the same convergence or acceptance rate as MET, whose complexity is cubic in the dimension of the MVN probability. We showcase the advantages of our methods relative to existing approaches using several simulated examples. We also analyze a groundwater-contamination dataset with over twenty thousand censored measurements to demonstrate the scalability of our method for partially censored Gaussian-process models.

Submitted 15 November, 2023; originally announced November 2023.

arXiv:2311.06220 [pdf, other]

doi 10.1007/s13253-023-00580-z

Bayesian nonparametric generative modeling of large multivariate non-Gaussian spatial fields

Authors: Paul F. V. Wiemann, Matthias Katzfuss

Abstract: Multivariate spatial fields are of interest in many applications, including climate model emulation. Not only can the marginal spatial fields be subject to nonstationarity, but the dependence structure among the marginal fields and between the fields might also differ substantially. Extending a recently proposed Bayesian approach to describe the distribution of a nonstationary univariate spatial field using a triangular transport map, we cast the inference problem for a multivariate spatial field for a small number of replicates into a series of independent Gaussian process (GP) regression tasks with Gaussian errors. Due to the potential nonlinearity in the conditional means, the joint distribution modeled can be non-Gaussian. The resulting nonparametric Bayesian methodology scales well to high-dimensional spatial fields. It is especially useful when only a few training samples are available, because it employs regularization priors and quantifies uncertainty. Inference is conducted in an empirical Bayes setting by a highly scalable stochastic gradient approach. The implementation benefits from mini-batching and could be accelerated with parallel computing. We illustrate the extended transport-map model by studying hydrological variables from non-Gaussian climate-model output.

Submitted 10 November, 2023; originally announced November 2023.

arXiv:2307.11648 [pdf, other]

Sparse Cholesky factorization by greedy conditional selection

Authors: Stephen Huan, Joseph Guinness, Matthias Katzfuss, Houman Owhadi, Florian Schäfer

Abstract: Dense kernel matrices resulting from pairwise evaluations of a kernel function arise naturally in machine learning and statistics. Previous work in constructing sparse approximate inverse Cholesky factors of such matrices by minimizing Kullback-Leibler divergence recovers the Vecchia approximation for Gaussian processes. These methods rely only on the geometry of the evaluation points to construct the sparsity pattern. In this work, we instead construct the sparsity pattern by leveraging a greedy selection algorithm that maximizes mutual information with target points, conditional on all points previously selected. For selecting $k$ points out of $N$, the naive time complexity is $\mathcal{O}(N k^4)$, but by maintaining a partial Cholesky factor we reduce this to $\mathcal{O}(N k^2)$. Furthermore, for multiple ($m$) targets we achieve a time complexity of $\mathcal{O}(N k^2 + N m^2 + m^3)$, which is maintained in the setting of aggregated Cholesky factorization where a selected point need not condition every target. We apply the selection algorithm to image classification and recovery of sparse Cholesky factors. By minimizing Kullback-Leibler divergence, we apply the algorithm to Cholesky factorization, Gaussian process regression, and preconditioning with the conjugate gradient, improving over $k$-nearest neighbors selection.

Submitted 21 July, 2023; originally announced July 2023.

MSC Class: 65F08; 65F55; 62-08

arXiv:2305.17063 [pdf, other]

Vecchia Gaussian Process Ensembles on Internal Representations of Deep Neural Networks

Authors: Felix Jimenez, Matthias Katzfuss

Abstract: For regression tasks, standard Gaussian processes (GPs) provide natural uncertainty quantification, while deep neural networks (DNNs) excel at representation learning. We propose to synergistically combine these two approaches in a hybrid method consisting of an ensemble of GPs built on the output of hidden layers of a DNN. GP scalability is achieved via Vecchia approximations that exploit nearest-neighbor conditional independence. The resulting deep Vecchia ensemble not only imbues the DNN with uncertainty quantification but can also provide more accurate and robust predictions. We demonstrate the utility of our model on several datasets and carry out experiments to understand the inner workings of the proposed method.

Submitted 26 May, 2023; originally announced May 2023.

Comments: 16 pages, 7 figures

arXiv:2301.13303 [pdf, other]

Variational sparse inverse Cholesky approximation for latent Gaussian processes via double Kullback-Leibler minimization

Authors: Jian Cao, Myeongjong Kang, Felix Jimenez, Huiyan Sang, Florian Schäfer, Matthias Katzfuss

Abstract: To achieve scalable and accurate inference for latent Gaussian processes, we propose a variational approximation based on a family of Gaussian distributions whose covariance matrices have sparse inverse Cholesky (SIC) factors. We combine this variational approximation of the posterior with a similar and efficient SIC-restricted Kullback-Leibler-optimal approximation of the prior. We then focus on a particular SIC ordering and nearest-neighbor-based sparsity pattern resulting in highly accurate prior and posterior approximations. For this setting, our variational approximation can be computed via stochastic gradient descent in polylogarithmic time per iteration. We provide numerical comparisons showing that the proposed double-Kullback-Leibler-optimal Gaussian-process approximation (DKLGP) can sometimes be vastly more accurate for stationary kernels than alternative approaches such as inducing-point and mean-field approximations at similar computational complexity.

Submitted 26 May, 2023; v1 submitted 30 January, 2023; originally announced January 2023.

Comments: Accepted at the 2023 International Conference on Machine Learning (ICML). 18 pages with references and appendices, 14 figures

arXiv:2208.07431 [pdf, other]

Locally anisotropic covariance functions on the sphere

Authors: Jian Cao, Jingjie Zhang, Zhuoer Sun, Matthias Katzfuss

Abstract: Rapid developments in satellite remote-sensing technology have enabled the collection of geospatial data on a global scale, hence increasing the need for covariance functions that can capture spatial dependence on spherical domains. We propose a general method of constructing nonstationary, locally anisotropic covariance functions on the sphere based on covariance functions in R^3. We also provide theorems that specify the conditions under which the resulting correlation function is isotropic or axially symmetric. For large datasets on the sphere commonly seen in modern applications, the Vecchia approximation is used to achieve higher scalability on statistical inference. The importance of flexible covariance structures is demonstrated numerically using simulated data and a precipitation dataset.

Submitted 15 August, 2022; originally announced August 2022.

arXiv:2207.09384 [pdf, other]

Scalable Spatio-Temporal Smoothing via Hierarchical Sparse Cholesky Decomposition

Authors: Marcin Jurek, Matthias Katzfuss

Abstract: We propose an approximation to the forward-filter-backward-sampler (FFBS) algorithm for large-scale spatio-temporal smoothing. FFBS is commonly used in Bayesian statistics when working with linear Gaussian state-space models, but it requires inverting covariance matrices which have the size of the latent state vector. The computational burden associated with this operation effectively prohibits its applications in high-dimensional settings. We propose a scalable spatio-temporal FFBS approach based on the hierarchical Vecchia approximation of Gaussian processes, which has been previously successfully used in spatial statistics. On simulated and real data, our approach outperformed a low-rank FFBS approximation.

Submitted 19 July, 2022; originally announced July 2022.

arXiv:2203.01459 [pdf, other]

Scalable Bayesian Optimization Using Vecchia Approximations of Gaussian Processes

Authors: Felix Jimenez, Matthias Katzfuss

Abstract: Bayesian optimization is a technique for optimizing black-box target functions. At the core of Bayesian optimization is a surrogate model that predicts the output of the target function at previously unseen inputs to facilitate the selection of promising input values. Gaussian processes (GPs) are commonly used as surrogate models but are known to scale poorly with the number of observations. We adapt the Vecchia approximation, a popular GP approximation from spatial statistics, to enable scalable high-dimensional Bayesian optimization. We develop several improvements and extensions, including training warped GPs using mini-batch gradient descent, approximate neighbor search, and selecting multiple input values in parallel. We focus on the use of our warped Vecchia GP in trust-region Bayesian optimization via Thompson sampling. On several test functions and on two reinforcement-learning problems, our methods compared favorably to the state of the art.

Submitted 2 March, 2022; originally announced March 2022.

arXiv:2202.12981 [pdf, other]

Scalable Gaussian-process regression and variable selection using Vecchia approximations

Authors: Jian Cao, Joseph Guinness, Marc G. Genton, Matthias Katzfuss

Abstract: Gaussian process (GP) regression is a flexible, nonparametric approach to regression that naturally quantifies uncertainty. In many applications, the number of responses and covariates are both large, and a goal is to select covariates that are related to the response. For this setting, we propose a novel, scalable algorithm, coined VGPR, which optimizes a penalized GP log-likelihood based on the Vecchia GP approximation, an ordered conditional approximation from spatial statistics that implies a sparse Cholesky factor of the precision matrix. We traverse the regularization path from strong to weak penalization, sequentially adding candidate covariates based on the gradient of the log-likelihood and deselecting irrelevant covariates via a new quadratic constrained coordinate descent algorithm. We propose Vecchia-based mini-batch subsampling, which provides unbiased gradient estimators. The resulting procedure is scalable to millions of responses and thousands of covariates. Theoretical analysis and numerical studies demonstrate the improved scalability and accuracy relative to existing methods.

Submitted 10 October, 2022; v1 submitted 25 February, 2022; originally announced February 2022.

Comments: 30 pages, 9 figures

arXiv:2112.14591 [pdf, other]

doi 10.1007/s11222-023-10231-5

Correlation-based sparse inverse Cholesky factorization for fast Gaussian-process inference

Authors: Myeongjong Kang, Matthias Katzfuss

Abstract: Gaussian processes are widely used as priors for unknown functions in statistics and machine learning. To achieve computationally feasible inference for large datasets, a popular approach is the Vecchia approximation, which is an ordered conditional approximation of the data vector that implies a sparse Cholesky factor of the precision matrix. The ordering and sparsity pattern are typically determined based on Euclidean distance of the inputs or locations corresponding to the data points. Here, we propose instead to use a correlation-based distance metric, which implicitly applies the Vecchia approximation in a suitable transformed input space. The correlation-based algorithm can be carried out in quasilinear time in the size of the dataset, and so it can be applied even for iterative inference on unknown parameters in the correlation structure. The correlation-based approach has two advantages for complex settings: It can result in more accurate approximations, and it offers a simple, automatic strategy that can be applied to any covariance, even when Euclidean distance is not applicable. We demonstrate these advantages in several settings, including anisotropic, nonstationary, multivariate, and spatio-temporal processes. We also illustrate our method on multivariate spatio-temporal temperature fields produced by a regional climate model.

Submitted 7 April, 2023; v1 submitted 29 December, 2021; originally announced December 2021.

Comments: 25 pages, 11 figures

Journal ref: Statistics and Computing, 33(3), 56 (2023)
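
Note: for orientation, the ordered conditional (Vecchia) approximation referred to in the abstract above is commonly written as follows. The symbols $y_i$, $c(i)$, $n$, and $m$ are notation assumed for this note and do not appear in the listing itself.

$$
p(y) \;=\; \prod_{i=1}^{n} p\bigl(y_i \mid y_{1:i-1}\bigr) \;\approx\; \prod_{i=1}^{n} p\bigl(y_i \mid y_{c(i)}\bigr), \qquad c(i) \subseteq \{1, \dots, i-1\}, \quad |c(i)| \le m .
$$

The first equality is the exact chain rule under a chosen ordering; the approximation replaces each full conditioning set by a subset $c(i)$ of at most $m$ previously ordered variables. Because each $y_i$ then depends on at most $m$ others, the implied Cholesky factor of the precision matrix has at most $m+1$ nonzero entries associated with each $y_i$. How the ordering and the sets $c(i)$ are chosen (for example by Euclidean or by correlation-based distance) is the design question the abstract addresses.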

arXiv:2111.13428 [pdf, other]

Nonstationary Spatial Modeling of Massive Global Satellite Data

Authors: Huang Huang, Lewis R. Blake, Matthias Katzfuss, Dorit M. Hammerling

Abstract: Earth-observing satellite instruments obtain a massive number of observations every day. For example, tens of millions of sea surface temperature (SST) observations on a global scale are collected daily by the Moderate Resolution Imaging Spectroradiometer (MODIS) instrument. Despite their size, such datasets are incomplete and noisy, necessitating spatial statistical inference to obtain complete, high-resolution fields with quantified uncertainties. Such inference is challenging due to the high computational cost, the nonstationary behavior of environmental processes on a global scale, and land barriers affecting the dependence of SST. In this work, we develop a multi-resolution approximation (M-RA) of a Gaussian process (GP) whose nonstationary, global covariance function is obtained using local fits. The M-RA requires domain partitioning, which can be set up application-specifically. In the SST case, we partition the domain purposefully to account for and weaken dependence across land barriers. Our M-RA implementation is tailored to distributed-memory computation in high-performance-computing environments. We analyze a MODIS SST dataset consisting of more than 43 million observations, to our knowledge the largest dataset ever analyzed using a probabilistic GP model. We show that our nonstationary model based on local fits provides substantially improved predictive performance relative to a stationary approach.

Submitted 26 November, 2021; originally announced November 2021.

arXiv:2110.07062 [pdf, other]

Ordered conditional approximation of Potts models

Authors: Anirban Chakraborty, Matthias Katzfuss, Joseph Guinness

Abstract: Potts models, which can be used to analyze dependent observations on a lattice, have seen widespread application in a variety of areas, including statistical mechanics, neuroscience, and quantum computing. To address the intractability of Potts likelihoods for large spatial fields, we propose fast ordered conditional approximations that enable rapid inference for observed and hidden Potts models. Our methods can be used to directly obtain samples from the approximate joint distribution of an entire Potts field. The computational complexity of our approximation methods is linear in the number of spatial locations; in addition, some of the necessary computations are naturally parallel. We illustrate the advantages of our approach using simulated data and a satellite image.

Submitted 13 October, 2021; originally announced October 2021.

arXiv:2108.04211 [pdf, other]

Scalable Bayesian transport maps for high-dimensional non-Gaussian spatial fields

Authors: Matthias Katzfuss, Florian Schäfer

Abstract: A multivariate distribution can be described by a triangular transport map from the target distribution to a simple reference distribution. We propose Bayesian nonparametric inference on the transport map by modeling its components using Gaussian processes. This enables regularization and uncertainty quantification of the map estimation, while still resulting in a closed-form and invertible posterior map. We then focus on inferring the distribution of a nonstationary spatial field from a small number of replicates. We develop specific transport-map priors that are highly flexible and are motivated by the behavior of a large class of stochastic processes. Our approach is scalable to high-dimensional distributions due to data-dependent sparsity and parallel computations. We also discuss extensions, including Dirichlet process mixtures for flexible marginals. We present numerical results to demonstrate the accuracy, scalability, and usefulness of our methods, including statistical emulation of non-Gaussian climate-model output.

Submitted 16 January, 2023; v1 submitted 9 August, 2021; originally announced August 2021.

Comments: code available at https://github.com/katzfuss-group/BaTraMaSpa

arXiv:2012.05967 [pdf, other]

Bayesian nonstationary and nonparametric covariance estimation for large spatial data

Authors: Brian Kidd, Matthias Katzfuss

Abstract: In spatial statistics, it is often assumed that the spatial field of interest is stationary and its covariance has a simple parametric form, but these assumptions are not appropriate in many applications. Given replicate observations of a Gaussian spatial field, we propose nonstationary and nonparametric Bayesian inference on the spatial dependence. Instead of estimating the quadratic (in the number of spatial locations) entries of the covariance matrix, the idea is to infer a near-linear number of nonzero entries in a sparse Cholesky factor of the precision matrix. Our prior assumptions are motivated by recent results on the exponential decay of the entries of this Cholesky factor for Matern-type covariances under a specific ordering scheme. Our methods are highly scalable and parallelizable. We conduct numerical comparisons and apply our methodology to climate-model output, enabling statistical emulation of an expensive physical model.

Submitted 10 December, 2020; originally announced December 2020.

arXiv:2006.16901 [pdf, other]

Hierarchical sparse Cholesky decomposition with applications to high-dimensional spatio-temporal filtering

Authors: Marcin Jurek, Matthias Katzfuss

Abstract: Spatial statistics often involves Cholesky decomposition of covariance matrices. To ensure scalability to high dimensions, several recent approximations have assumed a sparse Cholesky factor of the precision matrix. We propose a hierarchical Vecchia approximation, whose conditional-independence assumptions imply sparsity in the Cholesky factors of both the precision and the covariance matrix. This remarkable property is crucial for applications to high-dimensional spatio-temporal filtering. We present a fast and simple algorithm to compute our hierarchical Vecchia approximation, and we provide extensions to non-linear data assimilation with non-Gaussian data based on the Laplace approximation. In several numerical comparisons, including a filtering analysis of satellite data, our methods strongly outperformed alternative approaches.

Submitted 23 September, 2021; v1 submitted 30 June, 2020; originally announced June 2020.

arXiv:2005.09210 [pdf, other]

Scalable penalized spatiotemporal land-use regression for ground-level nitrogen dioxide

Authors: Kyle P Messier, Matthias Katzfuss

Abstract: Nitrogen dioxide (NO$_2$) is a primary constituent of traffic-related air pollution and has well established harmful environmental and human-health impacts. Knowledge of the spatiotemporal distribution of NO$_2$ is critical for exposure and risk assessment. A common approach for assessing air pollution exposure is linear regression involving spatially referenced covariates, known as land-use regression (LUR). We develop a scalable approach for simultaneous variable selection and estimation of LUR models with spatiotemporally correlated errors, by combining a general-Vecchia Gaussian process approximation with a penalty on the LUR coefficients. In comparisons to existing methods using simulated data, our approach resulted in higher model-selection specificity and sensitivity and in better prediction in terms of calibration and sharpness, for a wide range of relevant settings. In our spatiotemporal analysis of daily, US-wide, ground-level NO$_2$ data, our approach was more accurate, and produced a sparser and more interpretable model. Our daily predictions elucidate spatiotemporal patterns of NO$_2$ concentrations across the United States, including significant variations between cities and intra-urban variation. Thus, our predictions will be useful for epidemiological and risk-assessment studies seeking daily, national-scale predictions, and they can be used in acute-outcome health-risk assessments.

Submitted 16 November, 2020; v1 submitted 19 May, 2020; originally announced May 2020.

arXiv:2005.00386 [pdf, other]

Scaled Vecchia approximation for fast computer-model emulation

Authors: Matthias Katzfuss, Joseph Guinness, Earl Lawrence

Abstract: Many scientific phenomena are studied using computer experiments consisting of multiple runs of a computer model while varying the input settings. Gaussian processes (GPs) are a popular tool for the analysis of computer experiments, enabling interpolation between input settings, but direct GP inference is computationally infeasible for large datasets. We adapt and extend a powerful class of GP methods from spatial statistics to enable the scalable analysis and emulation of large computer experiments. Specifically, we apply Vecchia's ordered conditional approximation in a transformed input space, with each input scaled according to how strongly it relates to the computer-model response. The scaling is learned from the data, by estimating parameters in the GP covariance function using Fisher scoring. Our methods are highly scalable, enabling estimation, joint prediction and simulation in near-linear time in the number of model runs. In several numerical examples, our approach substantially outperformed existing methods.

Submitted 20 July, 2021; v1 submitted 1 May, 2020; originally announced May 2020.

Comments: R code available at https://github.com/katzfuss-group/scaledVecchia

arXiv:2004.14455 [pdf, other]

Sparse Cholesky factorization by Kullback-Leibler minimization

Authors: Florian Schäfer, Matthias Katzfuss, Houman Owhadi

Abstract: We propose to compute a sparse approximate inverse Cholesky factor $L$ of a dense covariance matrix $Θ$ by minimizing the Kullback-Leibler divergence between the Gaussian distributions $\mathcal{N}(0, Θ)$ and $\mathcal{N}(0, L^{-\top} L^{-1})$, subject to a sparsity constraint. Surprisingly, this problem has a closed-form solution that can be computed efficiently, recovering the popular Vecchia approximation in spatial statistics. Based on recent results on the approximate sparsity of inverse Cholesky factors of $Θ$ obtained from pairwise evaluation of Green's functions of elliptic boundary-value problems at points $\{x_{i}\}_{1 \leq i \leq N} \subset \mathbb{R}^{d}$, we propose an elimination ordering and sparsity pattern that allows us to compute $ε$-approximate inverse Cholesky factors of such $Θ$ in computational complexity $\mathcal{O}(N \log(N/ε)^d)$ in space and $\mathcal{O}(N \log(N/ε)^{2d})$ in time. To the best of our knowledge, this is the best asymptotic complexity for this class of problems. Furthermore, our method is embarrassingly parallel, automatically exploits low-dimensional structure in the data, and can perform Gaussian-process regression in linear (in $N$) space complexity. Motivated by the optimality properties of our methods, we propose methods for applying it to the joint covariance of training and prediction points in Gaussian-process regression, greatly improving stability and computational cost. Finally, we show how to apply our method to the important setting of Gaussian processes with additive noise, sacrificing neither accuracy nor computational complexity.

Submitted 22 October, 2021; v1 submitted 29 April, 2020; originally announced April 2020.

Comments: The code used to run the numerical experiments can be found under https://github.com/f-t-s/cholesky_by_KL_minimization. Appeared in SIAM Journal on Scientific Computing
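
Illustrative sketch (not taken from the paper or its repository): the abstract above mentions a closed-form solution for the KL-optimal sparse factor. The column-wise formula used below, $L_{s_i,i} = Θ_{s_i,s_i}^{-1} e_1 / \sqrt{e_1^\top Θ_{s_i,s_i}^{-1} e_1}$, is our reading of that result and should be checked against the paper. The names `kl_sparse_inverse_cholesky`, `theta`, and `pattern` are hypothetical, and the dense solves are for clarity rather than for the complexity quoted in the abstract.

```python
import numpy as np

def kl_sparse_inverse_cholesky(theta, pattern):
    """Column-wise sketch of a sparse factor L with L @ L.T ~ inv(theta).

    theta   : (n, n) symmetric positive-definite covariance matrix.
    pattern : pattern[i] lists the row indices allowed to be nonzero in
              column i; it must start with i and contain only indices > i
              afterwards (assumed convention, lower-triangular L).
    """
    n = theta.shape[0]
    L = np.zeros((n, n))
    for i in range(n):
        s = np.asarray(pattern[i])
        sub = theta[np.ix_(s, s)]          # covariance restricted to the pattern
        e1 = np.zeros(len(s))
        e1[0] = 1.0
        col = np.linalg.solve(sub, e1)     # sub^{-1} e_1
        L[s, i] = col / np.sqrt(col[0])    # col[0] equals e_1^T sub^{-1} e_1 > 0
    return L
```

With the full lower-triangular pattern the sketch reproduces the exact inverse, so `L @ L.T` matches `np.linalg.inv(theta)` up to rounding; smaller patterns give a sparse approximation, and each column can be computed independently of the others.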

arXiv:1906.07828 [pdf, other]

doi 10.1016/j.csda.2020.107081

Vecchia-Laplace approximations of generalized Gaussian processes for big non-Gaussian spatial data

Authors: Daniel Zilber, Matthias Katzfuss

Abstract: Generalized Gaussian processes (GGPs) are highly flexible models that combine latent GPs with potentially non-Gaussian likelihoods from the exponential family. GGPs can be used in a variety of settings, including GP classification, nonparametric count regression, modeling non-Gaussian spatial data, and analyzing point patterns. However, inference for GGPs can be analytically intractable, and large datasets pose computational challenges due to the inversion of the GP covariance matrix. We propose a Vecchia-Laplace approximation for GGPs, which combines a Laplace approximation to the non-Gaussian likelihood with a computationally efficient Vecchia approximation to the GP, resulting in a simple, general, scalable, and accurate methodology. We provide numerical studies and comparisons on simulated and real spatial data. Our methods are implemented in a freely available R package.

Submitted 4 June, 2020; v1 submitted 18 June, 2019; originally announced June 2019.

Comments: 26 pages, 10 figures, code available at https://github.com/katzfuss-group/GPvecchia-Laplace

Journal ref: Computational Statistics & Data Analysis (2021), 153, 107081

arXiv:1810.04200 [pdf, other]

Multi-resolution filters for massive spatio-temporal data

Authors: Marcin Jurek, Matthias Katzfuss

Abstract: Spatio-temporal data sets are rapidly growing in size. For example, environmental variables are measured with ever-higher resolution by increasing numbers of automated sensors mounted on satellites and aircraft. Using such data, which are typically noisy and incomplete, the goal is to obtain complete maps of the spatio-temporal process, together with proper uncertainty quantification. We focus here on real-time filtering inference in linear Gaussian state-space models. At each time point, the state is a spatial field evaluated on a very large spatial grid, making exact inference using the Kalman filter computationally infeasible. Instead, we propose a multi-resolution filter (MRF), a highly scalable and fully probabilistic filtering method that resolves spatial features at all scales. We prove that the MRF matrices exhibit a particular block-sparse multi-resolution structure that is preserved under filtering operations through time. We also discuss inference on time-varying parameters using an approximate Rao-Blackwellized particle filter, in which the integrated likelihood is computed using the MRF. We compare the MRF to existing approaches in a simulation study and a real satellite-data application.

Submitted 13 November, 2019; v1 submitted 9 October, 2018; originally announced October 2018.

arXiv:1810.03576 [pdf, other]

doi 10.1080/01621459.2019.1665526

Fine-scale spatiotemporal air pollution analysis using mobile monitors on Google Street View vehicles

Authors: Yawen Guan, Margaret Johnson, Matthias Katzfuss, Elizabeth Mannshardt, Kyle P Messier, Brian J Reich, Joon ** Song

Abstract: People are increasingly concerned with understanding their personal environment, including possible exposure to harmful air pollutants. In order to make informed decisions on their day-to-day activities, they are interested in real-time information on a localized scale. Publicly available, fine-scale, high-quality air pollution measurements acquired using mobile monitors represent a paradigm shift in measurement technologies. A methodological framework utilizing these increasingly fine-scale measurements to provide real-time air pollution maps and short-term air quality forecasts on a fine-resolution spatial scale could prove to be instrumental in increasing public awareness and understanding. The Google Street View study provides a unique source of data with spatial and temporal complexities, with the potential to provide information about commuter exposure and hot spots within city streets with high traffic. We develop a computationally efficient spatiotemporal model for these data and use the model to make short-term forecasts and high-resolution maps of current air pollution levels. We also show via an experiment that mobile networks can provide more nuanced information than an equally-sized fixed-location network. This modeling framework has important real-world implications in understanding citizens' personal environments, as data production and real-time availability continue to be driven by the ongoing development and improvement of mobile measurement technologies.

Submitted 5 September, 2019; v1 submitted 8 October, 2018; originally announced October 2018.

Comments: This manuscript has been approved for public access. Please put this version online. Previously, this version was removed by arXiv administrators because the author did not have the right to agree to our license at the time of submission

Journal ref: Journal of the American Statistical Association (2020), 115(531), 1111-1124

arXiv:1805.03309 [pdf, other]

doi 10.1007/s13253-020-00401-7

Vecchia approximations of Gaussian-process predictions

Authors: Matthias Katzfuss, Joseph Guinness, Wenlong Gong, Daniel Zilber

Abstract: Gaussian processes (GPs) are highly flexible function estimators used for geospatial analysis, nonparametric regression, and machine learning, but they are computationally infeasible for large datasets. Vecchia approximations of GPs have been used to enable fast evaluation of the likelihood for parameter inference. Here, we study Vecchia approximations of spatial predictions at observed and unobserved locations, including obtaining joint predictive distributions at large sets of locations. We consider a general Vecchia framework for GP predictions, which contains some novel and some existing special cases. We study the accuracy and computational properties of these approaches theoretically and numerically, proving that our new methods exhibit linear computational complexity in the total number of spatial locations. We show that certain choices within the framework can have a strong effect on uncertainty quantification and computational cost, which leads to specific recommendations on which methods are most suitable for various settings. We also apply our methods to a satellite dataset of chlorophyll fluorescence, showing that the new methods are faster or more accurate than existing methods, and reduce unrealistic artifacts in prediction maps.

Submitted 14 May, 2020; v1 submitted 8 May, 2018; originally announced May 2018.

Journal ref: Journal of Agricultural, Biological, and Environmental Statistics, 25(3), 383-414 (2020)

arXiv:1710.08976 [pdf, other]

doi 10.1007/s13253-020-00401-7

A class of multi-resolution approximations for large spatial datasets

Authors: Matthias Katzfuss, Wenlong Gong

Abstract: Gaussian processes are popular and flexible models for spatial, temporal, and functional data, but they are computationally infeasible for large datasets. We discuss Gaussian-process approximations that use basis functions at multiple resolutions to achieve fast inference and that can (approximately) represent any spatial covariance structure. We consider two special cases of this multi-resolution-approximation framework, a taper version and a domain-partitioning (block) version. We describe theoretical properties and inference procedures, and study the computational complexity of the methods. Numerical comparisons and an application to satellite data are also provided.

Submitted 20 July, 2018; v1 submitted 24 October, 2017; originally announced October 2017.

Journal ref: Statistica Sinica, 30(4), 2203-2226 (2020)

arXiv:1710.05013 [pdf, other]

A Case Study Competition Among Methods for Analyzing Large Spatial Data

Authors: Matthew J. Heaton, Abhirup Datta, Andrew Finley, Reinhard Furrer, Rajarshi Guhaniyogi, Florian Gerber, Robert B. Gramacy, Dorit Hammerling, Matthias Katzfuss, Finn Lindgren, Douglas W. Nychka, Furong Sun, Andrew Zammit-Mangion

Abstract: The Gaussian process is an indispensable tool for spatial data analysts. The onset of the "big data" era, however, has led to the traditional Gaussian process being computationally infeasible for modern spatial data. As such, various alternatives to the full Gaussian process that are more amenable to handling big spatial data have been proposed. These modern methods often exploit low rank structures and/or multi-core and multi-threaded computing environments to facilitate computation. This study provides, first, an introductory overview of several methods for analyzing large spatial data. Second, this study describes the results of a predictive competition among the described methods as implemented by different groups with strong expertise in the methodology. Specifically, each research group was provided with two training datasets (one simulated and one observed) along with a set of prediction locations. Each group then wrote their own implementation of their method to produce predictions at the given locations, each of which was subsequently run on a common computing environment. The methods were then compared in terms of various predictive diagnostics. Supplementary materials regarding implementation details of the methods and code are available for this article online.

Submitted 25 April, 2018; v1 submitted 13 October, 2017; originally announced October 2017.

arXiv:1708.06302 [pdf, other]

doi 10.1214/19-STS755

A general framework for Vecchia approximations of Gaussian processes

Authors: Matthias Katzfuss, Joseph Guinness

Abstract: Gaussian processes (GPs) are commonly used as models for functions, time series, and spatial fields, but they are computationally infeasible for large datasets. Focusing on the typical setting of modeling data as a GP plus an additive noise term, we propose a generalization of the Vecchia (1988) approach as a framework for GP approximations. We show that our general Vecchia approach contains many popular existing GP approximations as special cases, allowing for comparisons among the different methods within a unified framework. Representing the models by directed acyclic graphs, we determine the sparsity of the matrices necessary for inference, which leads to new insights regarding the computational properties. Based on these results, we propose a novel sparse general Vecchia approximation, which ensures computational feasibility for large spatial datasets but can lead to considerable improvements in approximation accuracy over Vecchia's original approach. We provide several theoretical results and conduct numerical comparisons. We conclude with guidelines for the use of Vecchia approximations in spatial statistics.

Submitted 17 August, 2019; v1 submitted 21 August, 2017; originally announced August 2017.

Journal ref: Statistical Science, 36(1), 124-141 (2021)
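
Illustrative sketch (not from the paper): the basic idea behind Vecchia's original approach, as it is usually described, is to replace the joint Gaussian density by a product of univariate conditionals, each conditioning on at most $m$ previously ordered points. The code below is a minimal noise-free, zero-mean version in one dimension; the exponential covariance, the nearest-neighbor conditioning rule, and the names `exp_cov`, `vecchia_loglik`, and `exact_loglik` are assumptions made for illustration and do not reproduce the general framework proposed in the paper.

```python
import numpy as np

def exp_cov(x, y, length_scale=0.3):
    """Exponential covariance between 1-D location arrays (assumed kernel)."""
    return np.exp(-np.abs(x[:, None] - y[None, :]) / length_scale)

def vecchia_loglik(y, locs, m, cov=exp_cov):
    """Sum of log p(y_i | y_c(i)), where c(i) holds up to m earlier points."""
    total = 0.0
    for i in range(len(y)):
        prev = np.arange(i)
        if len(prev) > m:
            # keep the m previously ordered points closest to location i
            dist = np.abs(locs[prev] - locs[i])
            prev = prev[np.argsort(dist)[:m]]
        var = cov(locs[[i]], locs[[i]])[0, 0]
        mean = 0.0
        if len(prev) > 0:
            k_cc = cov(locs[prev], locs[prev])
            k_ic = cov(locs[[i]], locs[prev])[0]
            w = np.linalg.solve(k_cc, k_ic)   # kriging weights
            mean = w @ y[prev]
            var = var - w @ k_ic              # conditional variance
        total += -0.5 * (np.log(2 * np.pi * var) + (y[i] - mean) ** 2 / var)
    return total

def exact_loglik(y, locs, cov=exp_cov):
    """Dense zero-mean Gaussian log-likelihood, for comparison only."""
    K = cov(locs, locs)
    _, logdet = np.linalg.slogdet(K)
    return -0.5 * (len(y) * np.log(2 * np.pi) + logdet + y @ np.linalg.solve(K, y))
```

Setting `m = len(y) - 1` conditions every point on all earlier ones, so the sum equals the exact log-likelihood; smaller `m` trades accuracy for cost, since each term then needs only an `m`-by-`m` solve.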

arXiv:1704.06988 [pdf, other]

doi 10.1080/01621459.2019.1592753

Ensemble Kalman methods for high-dimensional hierarchical dynamic space-time models

Authors: Matthias Katzfuss, Jonathan R. Stroud, Christopher K. Wikle

Abstract: We propose a new class of filtering and smoothing methods for inference in high-dimensional, nonlinear, non-Gaussian, spatio-temporal state-space models. The main idea is to combine the ensemble Kalman filter and smoother, developed in the geophysics literature, with state-space algorithms from the statistics literature. Our algorithms address a variety of estimation scenarios, including on-line and off-line state and parameter estimation. We take a Bayesian perspective, for which the goal is to generate samples from the joint posterior distribution of states and parameters. The key benefit of our approach is the use of ensemble Kalman methods for dimension reduction, which allows inference for high-dimensional state vectors. We compare our methods to existing ones, including ensemble Kalman filters, particle filters, and particle MCMC. Using a real data example of cloud motion and data simulated under a number of nonlinear and non-Gaussian scenarios, we show that our approaches outperform these existing methods.

Submitted 8 August, 2018; v1 submitted 23 April, 2017; originally announced April 2017.

Journal ref: Journal of the American Statistical Association, Theory & Methods (2019+)

arXiv:1611.03835 [pdf, other]

A Bayesian adaptive ensemble Kalman filter for sequential state and parameter estimation

Authors: Jonathan R. Stroud, Matthias Katzfuss, Christopher K. Wikle

Abstract: This paper proposes new methodology for sequential state and parameter estimation within the ensemble Kalman filter. The method is fully Bayesian and propagates the joint posterior density of states and parameters over time. In order to implement the method we consider two representations of the marginal posterior distribution of the parameters: a grid-based approach and a Gaussian approximation. Contrary to existing algorithms, the new method explicitly accounts for parameter uncertainty and provides a formal way to combine information about the parameters from data at different time periods. The method is illustrated and compared to existing approaches using simulated and real data.

Submitted 11 November, 2016; originally announced November 2016.

Comments: 19 pages

arXiv:1507.04789 [pdf, other]

doi 10.1080/01621459.2015.1123632

A multi-resolution approximation for massive spatial datasets

Authors: Matthias Katzfuss

Abstract: Automated sensing instruments on satellites and aircraft have enabled the collection of massive amounts of high-resolution observations of spatial fields over large spatial regions. If these datasets can be efficiently exploited, they can provide new insights on a wide variety of issues. However, traditional spatial-statistical techniques such as kriging are not computationally feasible for big datasets. We propose a multi-resolution approximation (M-RA) of Gaussian processes observed at irregular locations in space. The M-RA process is specified as a linear combination of basis functions at multiple levels of spatial resolution, which can capture spatial structure from very fine to very large scales. The basis functions are automatically chosen to approximate a given covariance function, which can be nonstationary. All computations involving the M-RA, including parameter inference and prediction, are highly scalable for massive datasets. Crucially, the inference algorithms can also be parallelized to take full advantage of large distributed-memory computing environments. In comparisons using simulated data and a large satellite dataset, the M-RA outperforms a related state-of-the-art method.

Submitted 7 December, 2015; v1 submitted 16 July, 2015; originally announced July 2015.

Comments: 23 pages; to be published in Journal of the American Statistical Association

arXiv:1506.01917 [pdf, other]

Interpretation of point forecasts with unknown directive

Authors: Patrick Schmidt, Matthias Katzfuß, Tilmann Gneiting

Abstract: Point forecasts can be interpreted as functionals (i.e., point summaries) of predictive distributions. We consider the situation where forecasters' directives are hidden and develop methodology for the identification of the unknown functional based on time series data of point forecasts and associated realizations. Focusing on the natural cases of state-dependent quantiles and expectiles, we provide a generalized method of moments estimator for the functional, along with tests of optimality relative to information sets that are specified by instrumental variables. Using simulation, we demonstrate that our optimality test is better calibrated and more powerful than existing solutions. In empirical examples, Greenbook gross domestic product (GDP) forecasts of the US Federal Reserve and model output for precipitation from the European Centre for Medium-Range Weather Forecasts (ECMWF) are indicative of overstatement in anticipation of extreme events.

Submitted 18 February, 2019; v1 submitted 5 June, 2015; originally announced June 2015.

arXiv:1410.4827 [pdf, other]

BADER: Bayesian analysis of differential expression in RNA sequencing data

Authors: Matthias Katzfuss, Andreas Neudecker, Simon Anders, Julien Gagneur

Abstract: Identifying differentially expressed genes from RNA sequencing data remains a challenging task because of the considerable uncertainties in parameter estimation and the small sample sizes in typical applications. Here we introduce Bayesian Analysis of Differential Expression in RNA-sequencing data (BADER). Due to our choice of data and prior distributions, full posterior inference for BADER can be carried out efficiently. The method appropriately takes uncertainty in gene variance into account, leading to higher power than existing methods in detecting differentially expressed genes. Moreover, we show that the posterior samples can be naturally integrated into downstream gene set enrichment analyses, with excellent performance in detecting enriched sets. An open-source R package (BADER) that provides a user-friendly interface to a C++ back-end is available on Bioconductor.

Submitted 7 November, 2014; v1 submitted 17 October, 2014; originally announced October 2014.

Comments: 14 pages, 3 figures, 1 table

arXiv:1402.1472 [pdf, other]

doi 10.1007/s11222-016-9627-4

Parallel inference for massive distributed spatial data using low-rank models

Authors: Matthias Katzfuss, Dorit Hammerling

Abstract: Due to rapid data growth, statistical analysis of massive datasets often has to be carried out in a distributed fashion, either because several datasets stored in separate physical locations are all relevant to a given problem, or simply to achieve faster (parallel) computation through a divide-and-conquer scheme. In both cases, the challenge is to obtain valid inference that does not require processing all data at a single central computing node. We show that for a very widely used class of spatial low-rank models, which can be written as a linear combination of spatial basis functions plus a fine-scale-variation component, parallel spatial inference and prediction for massive distributed data can be carried out exactly, meaning that the results are the same as for a traditional, non-distributed analysis. The communication cost of our distributed algorithms does not depend on the number of data points. After extending our results to the spatio-temporal case, we illustrate our methodology by carrying out distributed spatio-temporal particle filtering inference on total precipitable water measured by three different satellite sensor systems.

Submitted 5 February, 2016; v1 submitted 6 February, 2014; originally announced February 2014.

Comments: 20 pages; published in Statistics and Computing

arXiv:1204.2098 [pdf, other]

doi 10.1002/env.2200

Bayesian Nonstationary Spatial Modeling for Very Large Datasets

Authors: Matthias Katzfuss

Abstract: With the proliferation of modern high-resolution measuring instruments mounted on satellites, planes, ground-based vehicles and monitoring stations, a need has arisen for statistical methods suitable for the analysis of large spatial datasets observed on large spatial domains. Statistical analyses of such datasets provide two main challenges: First, traditional spatial-statistical techniques are often unable to handle large numbers of observations in a computationally feasible way. Second, for large and heterogeneous spatial domains, it is often not appropriate to assume that a process of interest is stationary over the entire domain. We address the first challenge by using a model combining a low-rank component, which allows for flexible modeling of medium-to-long-range dependence via a set of spatial basis functions, with a tapered remainder component, which allows for modeling of local dependence using a compactly supported covariance function. Addressing the second challenge, we propose two extensions to this model that result in increased flexibility: First, the model is parameterized based on a nonstationary Matern covariance, where the parameters vary smoothly across space. Second, in our fully Bayesian model, all components and parameters are considered random, including the number, locations, and shapes of the basis functions used in the low-rank component. Using simulated data and a real-world dataset of high-resolution soil measurements, we show that both extensions can result in substantial improvements over the current state-of-the-art.

Submitted 21 December, 2012; v1 submitted 10 April, 2012; originally announced April 2012.

Comments: 16 pages, 2 color figures

Journal ref: Environmetrics 24 (2013) 189-200

Showing 1–33 of 33 results for author: Katzfuss, M