Showing 1–33 of 33 results for author: Katzfuss, M

  1. arXiv:2406.17307

    stat.CO

    Scalable Sampling of Truncated Multivariate Normals Using Sequential Nearest-Neighbor Approximation

    Authors: Jian Cao, Matthias Katzfuss

    Abstract: We propose a linear-complexity method for sampling from truncated multivariate normal (TMVN) distributions with high fidelity by applying nearest-neighbor approximations to a product-of-conditionals decomposition of the TMVN density. To make the sequential sampling based on the decomposition feasible, we introduce a novel method that avoids the intractable high-dimensional TMVN distribution by sam…

    Submitted 25 June, 2024; originally announced June 2024.

  2. arXiv:2311.09426

    stat.CO

    Linear-Cost Vecchia Approximation of Multivariate Normal Probabilities

    Authors: Jian Cao, Matthias Katzfuss

    Abstract: Multivariate normal (MVN) probabilities arise in myriad applications, but they are analytically intractable and need to be evaluated via Monte-Carlo-based numerical integration. For the state-of-the-art minimax exponential tilting (MET) method, we show that the complexity of each of its components can be greatly reduced through an integrand parameterization that utilizes the sparse inverse Cholesk…

    Submitted 15 November, 2023; originally announced November 2023.

  3. Bayesian nonparametric generative modeling of large multivariate non-Gaussian spatial fields

    Authors: Paul F. V. Wiemann, Matthias Katzfuss

    Abstract: Multivariate spatial fields are of interest in many applications, including climate model emulation. Not only can the marginal spatial fields be subject to nonstationarity, but the dependence structure among the marginal fields and between the fields might also differ substantially. Extending a recently proposed Bayesian approach to describe the distribution of a nonstationary univariate spatial f…

    Submitted 10 November, 2023; originally announced November 2023.

  4. arXiv:2307.11648

    stat.CO math.NA

    Sparse Cholesky factorization by greedy conditional selection

    Authors: Stephen Huan, Joseph Guinness, Matthias Katzfuss, Houman Owhadi, Florian Schäfer

    Abstract: Dense kernel matrices resulting from pairwise evaluations of a kernel function arise naturally in machine learning and statistics. Previous work in constructing sparse approximate inverse Cholesky factors of such matrices by minimizing Kullback-Leibler divergence recovers the Vecchia approximation for Gaussian processes. These methods rely only on the geometry of the evaluation points to construct…

    Submitted 21 July, 2023; originally announced July 2023.

    MSC Class: 65F08; 65F55; 62-08

  5. arXiv:2305.17063

    stat.ML cs.LG

    Vecchia Gaussian Process Ensembles on Internal Representations of Deep Neural Networks

    Authors: Felix Jimenez, Matthias Katzfuss

    Abstract: For regression tasks, standard Gaussian processes (GPs) provide natural uncertainty quantification, while deep neural networks (DNNs) excel at representation learning. We propose to synergistically combine these two approaches in a hybrid method consisting of an ensemble of GPs built on the output of hidden layers of a DNN. GP scalability is achieved via Vecchia approximations that exploit nearest…

    Submitted 26 May, 2023; originally announced May 2023.

    Comments: 16 pages, 7 figures

  6. arXiv:2301.13303

    stat.ML cs.LG stat.CO

    Variational sparse inverse Cholesky approximation for latent Gaussian processes via double Kullback-Leibler minimization

    Authors: Jian Cao, Myeongjong Kang, Felix Jimenez, Huiyan Sang, Florian Schäfer, Matthias Katzfuss

    Abstract: To achieve scalable and accurate inference for latent Gaussian processes, we propose a variational approximation based on a family of Gaussian distributions whose covariance matrices have sparse inverse Cholesky (SIC) factors. We combine this variational approximation of the posterior with a similar and efficient SIC-restricted Kullback-Leibler-optimal approximation of the prior. We then focus on…

    Submitted 26 May, 2023; v1 submitted 30 January, 2023; originally announced January 2023.

    Comments: Accepted at the 2023 International Conference on Machine Learning (ICML). 18 pages with references and appendices, 14 figures

  7. arXiv:2208.07431

    stat.ME

    Locally anisotropic covariance functions on the sphere

    Authors: Jian Cao, Jingjie Zhang, Zhuoer Sun, Matthias Katzfuss

    Abstract: Rapid developments in satellite remote-sensing technology have enabled the collection of geospatial data on a global scale, hence increasing the need for covariance functions that can capture spatial dependence on spherical domains. We propose a general method of constructing nonstationary, locally anisotropic covariance functions on the sphere based on covariance functions in R^3. We also provide…

    Submitted 15 August, 2022; originally announced August 2022.

  8. arXiv:2207.09384

    stat.ME stat.CO

    Scalable Spatio-Temporal Smoothing via Hierarchical Sparse Cholesky Decomposition

    Authors: Marcin Jurek, Matthias Katzfuss

    Abstract: We propose an approximation to the forward-filter-backward-sampler (FFBS) algorithm for large-scale spatio-temporal smoothing. FFBS is commonly used in Bayesian statistics when working with linear Gaussian state-space models, but it requires inverting covariance matrices which have the size of the latent state vector. The computational burden associated with this operation effectively prohibits it…

    Submitted 19 July, 2022; originally announced July 2022.

  9. arXiv:2203.01459

    cs.LG stat.ML

    Scalable Bayesian Optimization Using Vecchia Approximations of Gaussian Processes

    Authors: Felix Jimenez, Matthias Katzfuss

    Abstract: Bayesian optimization is a technique for optimizing black-box target functions. At the core of Bayesian optimization is a surrogate model that predicts the output of the target function at previously unseen inputs to facilitate the selection of promising input values. Gaussian processes (GPs) are commonly used as surrogate models but are known to scale poorly with the number of observations. We ad…

    Submitted 2 March, 2022; originally announced March 2022.

  10. arXiv:2202.12981

    stat.ME stat.ML

    Scalable Gaussian-process regression and variable selection using Vecchia approximations

    Authors: Jian Cao, Joseph Guinness, Marc G. Genton, Matthias Katzfuss

    Abstract: Gaussian process (GP) regression is a flexible, nonparametric approach to regression that naturally quantifies uncertainty. In many applications, the number of responses and covariates are both large, and a goal is to select covariates that are related to the response. For this setting, we propose a novel, scalable algorithm, coined VGPR, which optimizes a penalized GP log-likelihood based on the…

    Submitted 10 October, 2022; v1 submitted 25 February, 2022; originally announced February 2022.

    Comments: 30 pages, 9 figures

  11. Correlation-based sparse inverse Cholesky factorization for fast Gaussian-process inference

    Authors: Myeongjong Kang, Matthias Katzfuss

    Abstract: Gaussian processes are widely used as priors for unknown functions in statistics and machine learning. To achieve computationally feasible inference for large datasets, a popular approach is the Vecchia approximation, which is an ordered conditional approximation of the data vector that implies a sparse Cholesky factor of the precision matrix. The ordering and sparsity pattern are typically determ…

    Submitted 7 April, 2023; v1 submitted 29 December, 2021; originally announced December 2021.

    Comments: 25 pages, 11 figures

    Journal ref: Statistics and Computing, 33(3), 56 (2023)

  12. arXiv:2111.13428

    stat.AP

    Nonstationary Spatial Modeling of Massive Global Satellite Data

    Authors: Huang Huang, Lewis R. Blake, Matthias Katzfuss, Dorit M. Hammerling

    Abstract: Earth-observing satellite instruments obtain a massive number of observations every day. For example, tens of millions of sea surface temperature (SST) observations on a global scale are collected daily by the Moderate Resolution Imaging Spectroradiometer (MODIS) instrument. Despite their size, such datasets are incomplete and noisy, necessitating spatial statistical inference to obtain complete,…

    Submitted 26 November, 2021; originally announced November 2021.

  13. arXiv:2110.07062

    stat.CO

    Ordered conditional approximation of Potts models

    Authors: Anirban Chakraborty, Matthias Katzfuss, Joseph Guinness

    Abstract: Potts models, which can be used to analyze dependent observations on a lattice, have seen widespread application in a variety of areas, including statistical mechanics, neuroscience, and quantum computing. To address the intractability of Potts likelihoods for large spatial fields, we propose fast ordered conditional approximations that enable rapid inference for observed and hidden Potts models.…

    Submitted 13 October, 2021; originally announced October 2021.

  14. arXiv:2108.04211

    stat.ME stat.AP stat.CO

    Scalable Bayesian transport maps for high-dimensional non-Gaussian spatial fields

    Authors: Matthias Katzfuss, Florian Schäfer

    Abstract: A multivariate distribution can be described by a triangular transport map from the target distribution to a simple reference distribution. We propose Bayesian nonparametric inference on the transport map by modeling its components using Gaussian processes. This enables regularization and uncertainty quantification of the map estimation, while still resulting in a closed-form and invertible poster…

    Submitted 16 January, 2023; v1 submitted 9 August, 2021; originally announced August 2021.

    Comments: code available at https://github.com/katzfuss-group/BaTraMaSpa

  15. arXiv:2012.05967

    stat.ME stat.AP stat.CO

    Bayesian nonstationary and nonparametric covariance estimation for large spatial data

    Authors: Brian Kidd, Matthias Katzfuss

    Abstract: In spatial statistics, it is often assumed that the spatial field of interest is stationary and its covariance has a simple parametric form, but these assumptions are not appropriate in many applications. Given replicate observations of a Gaussian spatial field, we propose nonstationary and nonparametric Bayesian inference on the spatial dependence. Instead of estimating the quadratic (in the numb…

    Submitted 10 December, 2020; originally announced December 2020.

  16. arXiv:2006.16901

    stat.CO stat.ME

    Hierarchical sparse Cholesky decomposition with applications to high-dimensional spatio-temporal filtering

    Authors: Marcin Jurek, Matthias Katzfuss

    Abstract: Spatial statistics often involves Cholesky decomposition of covariance matrices. To ensure scalability to high dimensions, several recent approximations have assumed a sparse Cholesky factor of the precision matrix. We propose a hierarchical Vecchia approximation, whose conditional-independence assumptions imply sparsity in the Cholesky factors of both the precision and the covariance matrix. This…

    Submitted 23 September, 2021; v1 submitted 30 June, 2020; originally announced June 2020.

  17. arXiv:2005.09210

    stat.AP

    Scalable penalized spatiotemporal land-use regression for ground-level nitrogen dioxide

    Authors: Kyle P Messier, Matthias Katzfuss

    Abstract: Nitrogen dioxide (NO$_2$) is a primary constituent of traffic-related air pollution and has well established harmful environmental and human-health impacts. Knowledge of the spatiotemporal distribution of NO$_2$ is critical for exposure and risk assessment. A common approach for assessing air pollution exposure is linear regression involving spatially referenced covariates, known as land-use regre…

    Submitted 16 November, 2020; v1 submitted 19 May, 2020; originally announced May 2020.

  18. arXiv:2005.00386

    stat.ME stat.CO stat.ML

    Scaled Vecchia approximation for fast computer-model emulation

    Authors: Matthias Katzfuss, Joseph Guinness, Earl Lawrence

    Abstract: Many scientific phenomena are studied using computer experiments consisting of multiple runs of a computer model while varying the input settings. Gaussian processes (GPs) are a popular tool for the analysis of computer experiments, enabling interpolation between input settings, but direct GP inference is computationally infeasible for large datasets. We adapt and extend a powerful class of GP met…

    Submitted 20 July, 2021; v1 submitted 1 May, 2020; originally announced May 2020.

    Comments: R code available at https://github.com/katzfuss-group/scaledVecchia

  19. arXiv:2004.14455

    math.NA math.OC math.ST stat.CO

    Sparse Cholesky factorization by Kullback-Leibler minimization

    Authors: Florian Schäfer, Matthias Katzfuss, Houman Owhadi

    Abstract: We propose to compute a sparse approximate inverse Cholesky factor $L$ of a dense covariance matrix $Θ$ by minimizing the Kullback-Leibler divergence between the Gaussian distributions $\mathcal{N}(0, Θ)$ and $\mathcal{N}(0, L^{-\top} L^{-1})$, subject to a sparsity constraint. Surprisingly, this problem has a closed-form solution that can be computed efficiently, recovering the popular Vecchia ap…

    Submitted 22 October, 2021; v1 submitted 29 April, 2020; originally announced April 2020.

    Comments: The code used to run the numerical experiments can be found under https://github.com/f-t-s/cholesky_by_KL_minimization. Appeared in SIAM Journal on Scientific Computing

  20. Vecchia-Laplace approximations of generalized Gaussian processes for big non-Gaussian spatial data

    Authors: Daniel Zilber, Matthias Katzfuss

    Abstract: Generalized Gaussian processes (GGPs) are highly flexible models that combine latent GPs with potentially non-Gaussian likelihoods from the exponential family. GGPs can be used in a variety of settings, including GP classification, nonparametric count regression, modeling non-Gaussian spatial data, and analyzing point patterns. However, inference for GGPs can be analytically intractable, and large…

    Submitted 4 June, 2020; v1 submitted 18 June, 2019; originally announced June 2019.

    Comments: 26 pages, 10 figures, code available at https://github.com/katzfuss-group/GPvecchia-Laplace

    Journal ref: Computational Statistics & Data Analysis (2021), 153, 107081

  21. arXiv:1810.04200

    stat.ME stat.CO

    Multi-resolution filters for massive spatio-temporal data

    Authors: Marcin Jurek, Matthias Katzfuss

    Abstract: Spatio-temporal data sets are rapidly growing in size. For example, environmental variables are measured with ever-higher resolution by increasing numbers of automated sensors mounted on satellites and aircraft. Using such data, which are typically noisy and incomplete, the goal is to obtain complete maps of the spatio-temporal process, together with proper uncertainty quantification. We focus her…

    Submitted 13 November, 2019; v1 submitted 9 October, 2018; originally announced October 2018.

  22. Fine-scale spatiotemporal air pollution analysis using mobile monitors on Google Street View vehicles

    Authors: Yawen Guan, Margaret Johnson, Matthias Katzfuss, Elizabeth Mannshardt, Kyle P Messier, Brian J Reich, Joon Jin Song

    Abstract: People are increasingly concerned with understanding their personal environment, including possible exposure to harmful air pollutants. In order to make informed decisions on their day-to-day activities, they are interested in real-time information on a localized scale. Publicly available, fine-scale, high-quality air pollution measurements acquired using mobile monitors represent a paradigm shift…

    Submitted 5 September, 2019; v1 submitted 8 October, 2018; originally announced October 2018.

    Comments: This manuscript has been approved for public access. Please put this version online. Previously, this version was removed by arXiv administrators because the author did not have the right to agree to our license at the time of submission

    Journal ref: Journal of the American Statistical Association (2020), 115(531), 1111-1124

  23. Vecchia approximations of Gaussian-process predictions

    Authors: Matthias Katzfuss, Joseph Guinness, Wenlong Gong, Daniel Zilber

    Abstract: Gaussian processes (GPs) are highly flexible function estimators used for geospatial analysis, nonparametric regression, and machine learning, but they are computationally infeasible for large datasets. Vecchia approximations of GPs have been used to enable fast evaluation of the likelihood for parameter inference. Here, we study Vecchia approximations of spatial predictions at observed and unobse…

    Submitted 14 May, 2020; v1 submitted 8 May, 2018; originally announced May 2018.

    Journal ref: Journal of Agricultural, Biological, and Environmental Statistics, 25(3), 383-414 (2020)

  24. A class of multi-resolution approximations for large spatial datasets

    Authors: Matthias Katzfuss, Wenlong Gong

    Abstract: Gaussian processes are popular and flexible models for spatial, temporal, and functional data, but they are computationally infeasible for large datasets. We discuss Gaussian-process approximations that use basis functions at multiple resolutions to achieve fast inference and that can (approximately) represent any spatial covariance structure. We consider two special cases of this multi-resolution…

    Submitted 20 July, 2018; v1 submitted 24 October, 2017; originally announced October 2017.

    Journal ref: Statistica Sinica, 30(4), 2203-2226 (2020)

  25. arXiv:1710.05013

    stat.ME

    A Case Study Competition Among Methods for Analyzing Large Spatial Data

    Authors: Matthew J. Heaton, Abhirup Datta, Andrew Finley, Reinhard Furrer, Rajarshi Guhaniyogi, Florian Gerber, Robert B. Gramacy, Dorit Hammerling, Matthias Katzfuss, Finn Lindgren, Douglas W. Nychka, Furong Sun, Andrew Zammit-Mangion

    Abstract: The Gaussian process is an indispensable tool for spatial data analysts. The onset of the "big data" era, however, has led to the traditional Gaussian process being computationally infeasible for modern spatial data. As such, various alternatives to the full Gaussian process that are more amenable to handling big spatial data have been proposed. These modern methods often exploit low rank structu…

    Submitted 25 April, 2018; v1 submitted 13 October, 2017; originally announced October 2017.

  26. arXiv:1708.06302

    stat.ME stat.CO

    A general framework for Vecchia approximations of Gaussian processes

    Authors: Matthias Katzfuss, Joseph Guinness

    Abstract: Gaussian processes (GPs) are commonly used as models for functions, time series, and spatial fields, but they are computationally infeasible for large datasets. Focusing on the typical setting of modeling data as a GP plus an additive noise term, we propose a generalization of the Vecchia (1988) approach as a framework for GP approximations. We show that our general Vecchia approach contains many…

    Submitted 17 August, 2019; v1 submitted 21 August, 2017; originally announced August 2017.

    Journal ref: Statistical Science, 36(1), 124-141 (2021)

  27. Ensemble Kalman methods for high-dimensional hierarchical dynamic space-time models

    Authors: Matthias Katzfuss, Jonathan R. Stroud, Christopher K. Wikle

    Abstract: We propose a new class of filtering and smoothing methods for inference in high-dimensional, nonlinear, non-Gaussian, spatio-temporal state-space models. The main idea is to combine the ensemble Kalman filter and smoother, developed in the geophysics literature, with state-space algorithms from the statistics literature. Our algorithms address a variety of estimation scenarios, including on-line a…

    Submitted 8 August, 2018; v1 submitted 23 April, 2017; originally announced April 2017.

    Journal ref: Journal of the American Statistical Association, Theory & Methods (2019+)

  28. arXiv:1611.03835

    stat.ME stat.CO

    A Bayesian adaptive ensemble Kalman filter for sequential state and parameter estimation

    Authors: Jonathan R. Stroud, Matthias Katzfuss, Christopher K. Wikle

    Abstract: This paper proposes new methodology for sequential state and parameter estimation within the ensemble Kalman filter. The method is fully Bayesian and propagates the joint posterior density of states and parameters over time. In order to implement the method we consider two representations of the marginal posterior distribution of the parameters: a grid-based approach and a Gaussian approximation.…

    Submitted 11 November, 2016; originally announced November 2016.

    Comments: 19 pages

  29. A multi-resolution approximation for massive spatial datasets

    Authors: Matthias Katzfuss

    Abstract: Automated sensing instruments on satellites and aircraft have enabled the collection of massive amounts of high-resolution observations of spatial fields over large spatial regions. If these datasets can be efficiently exploited, they can provide new insights on a wide variety of issues. However, traditional spatial-statistical techniques such as kriging are not computationally feasible for big da…

    Submitted 7 December, 2015; v1 submitted 16 July, 2015; originally announced July 2015.

    Comments: 23 pages; to be published in Journal of the American Statistical Association

  30. arXiv:1506.01917

    stat.ME

    Interpretation of point forecasts with unknown directive

    Authors: Patrick Schmidt, Matthias Katzfuß, Tilmann Gneiting

    Abstract: Point forecasts can be interpreted as functionals (i.e., point summaries) of predictive distributions. We consider the situation where forecasters' directives are hidden and develop methodology for the identification of the unknown functional based on time series data of point forecasts and associated realizations. Focusing on the natural cases of state-dependent quantiles and expectiles, we provi…

    Submitted 18 February, 2019; v1 submitted 5 June, 2015; originally announced June 2015.

  31. arXiv:1410.4827

    stat.AP stat.ME

    BADER: Bayesian analysis of differential expression in RNA sequencing data

    Authors: Matthias Katzfuss, Andreas Neudecker, Simon Anders, Julien Gagneur

    Abstract: Identifying differentially expressed genes from RNA sequencing data remains a challenging task because of the considerable uncertainties in parameter estimation and the small sample sizes in typical applications. Here we introduce Bayesian Analysis of Differential Expression in RNA-sequencing data (BADER). Due to our choice of data and prior distributions, full posterior inference for BADER can be…

    Submitted 7 November, 2014; v1 submitted 17 October, 2014; originally announced October 2014.

    Comments: 14 pages, 3 figures, 1 table

  32. Parallel inference for massive distributed spatial data using low-rank models

    Authors: Matthias Katzfuss, Dorit Hammerling

    Abstract: Due to rapid data growth, statistical analysis of massive datasets often has to be carried out in a distributed fashion, either because several datasets stored in separate physical locations are all relevant to a given problem, or simply to achieve faster (parallel) computation through a divide-and-conquer scheme. In both cases, the challenge is to obtain valid inference that does not require proc…

    Submitted 5 February, 2016; v1 submitted 6 February, 2014; originally announced February 2014.

    Comments: 20 pages; published in Statistics and Computing

  33. arXiv:1204.2098

    stat.ME stat.AP stat.CO

    Bayesian Nonstationary Spatial Modeling for Very Large Datasets

    Authors: Matthias Katzfuss

    Abstract: With the proliferation of modern high-resolution measuring instruments mounted on satellites, planes, ground-based vehicles and monitoring stations, a need has arisen for statistical methods suitable for the analysis of large spatial datasets observed on large spatial domains. Statistical analyses of such datasets provide two main challenges: First, traditional spatial-statistical techniques are o…

    Submitted 21 December, 2012; v1 submitted 10 April, 2012; originally announced April 2012.

    Comments: 16 pages, 2 color figures

    Journal ref: Environmetrics 24 (2013) 189-200