Search | arXiv e-print repository

Optimal prediction of positive-valued spatial processes: asymmetric power-divergence loss

Authors: Alan R. Pearse, Noel Cressie, David Gunawan

Abstract: This article studies the use of asymmetric loss functions for the optimal prediction of positive-valued spatial processes. We focus on the family of power-divergence loss functions due to its many convenient properties, such as its continuity, convexity, relationship to well known divergence measures, and the ability to control the asymmetry and behaviour of the loss function via a power parameter. The properties of power-divergence loss functions, optimal power-divergence (OPD) spatial predictors, and related measures of uncertainty quantification are examined. In addition, we examine the notion of asymmetry in loss functions defined for positive-valued spatial processes and define an asymmetry measure that is applied to the power-divergence loss function and other common loss functions. The paper concludes with a spatial statistical analysis of zinc measurements in the soil of a floodplain of the Meuse River, Netherlands, using OPD spatial prediction.

Submitted 8 January, 2024; v1 submitted 5 January, 2024; originally announced January 2024.

Comments: 42 pages including appendix, 10 figures, submitted to Spatial Statistics
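
Illustrative sketch (not taken from the listing): the abstract above does not state the loss formula, so the form used here, L(delta, y) = [delta((delta/y)^lam - 1) + lam(y - delta)] / (lam(lam + 1)), is an assumption, as are the function names. Under that assumed form, minimising the average loss over draws of the positive-valued process gives delta = (mean(y^-lam))^(-1/lam), which the code uses as a stand-in for an OPD predictor.

```python
def power_divergence_loss(delta, y, lam):
    """Assumed power-divergence loss for positive delta and y (lam not 0 or -1)."""
    if lam in (0, -1):
        raise ValueError("lam = 0 and lam = -1 are limiting cases not handled here")
    return (delta * ((delta / y) ** lam - 1.0) + lam * (y - delta)) / (lam * (lam + 1.0))


def mean_loss(delta, samples, lam):
    """Average loss of predicting delta over a list of positive samples."""
    return sum(power_divergence_loss(delta, s, lam) for s in samples) / len(samples)


def opd_predictor(samples, lam):
    """Minimiser of mean_loss under the assumed loss: (mean(y^-lam))^(-1/lam)."""
    mean_power = sum(s ** (-lam) for s in samples) / len(samples)
    return mean_power ** (-1.0 / lam)


if __name__ == "__main__":
    draws = [1.0, 2.0, 4.0]  # hypothetical draws of a positive-valued quantity
    print(opd_predictor(draws, 1.0))           # harmonic mean of the draws when lam = 1
    print(power_divergence_loss(2.0, 1.0, 1.0),
          power_divergence_loss(1.0, 2.0, 1.0))  # the two errors are penalised differently
```

With lam = 1 the assumed loss reduces to (delta - y)^2 / (2y), so the same absolute error costs more when the true value is small; this is one way to see the asymmetry the abstract refers to.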

arXiv:2311.09491 [pdf, other]

Spatial Bayesian Neural Networks

Authors: Andrew Zammit-Mangion, Michael D. Kaminski, Ba-Hien Tran, Maurizio Filippone, Noel Cressie

Abstract: interpretable, and well understood models that are routinely employed even though, as is revealed through prior and posterior predictive checks, these can poorly characterise the spatial heterogeneity in the underlying process of interest. Here, we propose a new, flexible class of spatial-process models, which we refer to as spatial Bayesian neural networks (SBNNs). An SBNN leverages the representational capacity of a Bayesian neural network; it is tailored to a spatial setting by incorporating a spatial "embedding layer" into the network and, possibly, spatially-varying network parameters. An SBNN is calibrated by matching its finite-dimensional distribution at locations on a fine gridding of space to that of a target process of interest. That process could be easy to simulate from or we may have many realisations from it. We propose several variants of SBNNs, most of which are able to match the finite-dimensional distribution of the target process at the selected grid better than conventional BNNs of similar complexity. We also show that an SBNN can be used to represent a variety of spatial processes often used in practice, such as Gaussian processes, lognormal processes, and max-stable processes. We briefly discuss the tools that could be used to make inference with SBNNs, and we conclude with a discussion of their advantages and limitations.

Submitted 4 April, 2024; v1 submitted 15 November, 2023; originally announced November 2023.

Comments: 35 pages, 21 figures

arXiv:2210.10479 [pdf, other]

Inferring changes to the global carbon cycle with WOMBAT v2.0, a hierarchical flux-inversion framework

Authors: Michael Bertolacci, Andrew Zammit-Mangion, Andrew Schuh, Beata Bukosa, Jenny Fisher, Yi Cao, Aleya Kaushik, Noel Cressie

Abstract: The natural cycles of the surface-to-atmosphere fluxes of carbon dioxide (CO$_2$) and other important greenhouse gases are changing in response to human influences. These changes need to be quantified to understand climate change and its impacts, but this is difficult to do because natural fluxes occur over large spatial and temporal scales. To infer trends in fluxes and identify phase shifts and amplitude changes in flux seasonal cycles, we construct a flux-inversion system that uses a novel spatially varying time-series decomposition of the fluxes, while also accommodating physical constraints on the fluxes. We incorporate these features into the Wollongong Methodology for Bayesian Assimilation of Trace-gases (WOMBAT, Zammit-Mangion et al., Geosci. Model Dev., 15, 2022), a hierarchical flux-inversion framework that yields posterior distributions for all unknowns in the underlying model. We apply the new method, which we call WOMBAT v2.0, to a mix of satellite observations of CO$_2$ mole fraction from the Orbiting Carbon Observatory-2 (OCO-2) satellite and direct measurements of CO$_2$ mole fraction from a variety of sources. We estimate the changes to CO$_2$ fluxes that occurred from January 2015 to December 2020, and compare our posterior estimates to those from an alternative method based on a bottom-up understanding of the physical processes involved. We find substantial trends in the fluxes, including that tropical ecosystems trended from being a net source to a net sink of CO$_2$ over the study period. We also find that the amplitude of the global seasonal cycle of ecosystem CO$_2$ fluxes increased over the study period by 0.11 PgC/month (an increase of 8%), and that the seasonal cycle of ecosystem CO$_2$ fluxes in the northern temperate and northern boreal regions shifted earlier in the year by 0.4-0.7 and 0.4-0.9 days, respectively (2.5th to 97.5th posterior percentiles).

Submitted 19 October, 2022; originally announced October 2022.

arXiv:2209.13157 [pdf, ps, other]

Decisions, decisions, decisions in an uncertain environment

Authors: Noel Cressie

Abstract: Decision-makers abhor uncertainty, and it is certainly true that the less there is of it the better. However, recognizing that uncertainty is part of the equation, particularly for deciding on environmental policy, is a prerequisite for making wise decisions. Even making no decision is a decision that has consequences, and using the presence of uncertainty as the reason for failing to act is a poor excuse. Statistical science is the science of uncertainty, and it should play a critical role in the decision-making process. This opinion piece focuses on the summit of the knowledge pyramid that starts from data and rises in steps from data to information, from information to knowledge, and finally from knowledge to decisions. Enormous advances have been made in the last 100 years ascending the pyramid, with deviations that have followed different routes. There has generally been a healthy supply of uncertainty quantification along the way but, in a rush to the top, where the decisions are made, uncertainty is often left behind. In my opinion, statistical science needs to be much more pro-active in evolving classical decision theory into a relevant and practical area of decision applications. This article follows several threads, building on the decision-theoretic foundations of loss functions and Bayesian uncertainty.

Submitted 27 September, 2022; originally announced September 2022.

Comments: 15 pages

arXiv:2202.03660 [pdf, other]

Basis-Function Models in Spatial Statistics

Authors: Noel Cressie, Matthew Sainsbury-Dale, Andrew Zammit-Mangion

Abstract: Spatial statistics is concerned with the analysis of data that have spatial locations associated with them, and those locations are used to model statistical dependence between the data. The spatial data are treated as a single realisation from a probability model that encodes the dependence through both fixed effects and random effects, where randomness is manifest in the underlying spatial process and in the noisy, incomplete, measurement process. The focus of this review article is on the use of basis functions to provide an extremely flexible and computationally efficient way to model spatial processes that are possibly highly non-stationary. Several examples of basis-function models are provided to illustrate how they are used in Gaussian, non-Gaussian, multivariate, and spatio-temporal settings, with applications in geophysics. Our aim is to emphasise the versatility of these spatial statistical models and to demonstrate that they are now centre-stage in a number of application domains. The review concludes with a discussion and illustration of software currently available to fit spatial-basis-function models and implement spatial-statistical prediction.

Submitted 8 February, 2022; originally announced February 2022.

Comments: 30 pages, 6 figures

arXiv:2110.02507 [pdf, other]

Modelling Big, Heterogeneous, Non-Gaussian Spatial and Spatio-Temporal Data using FRK

Authors: Matthew Sainsbury-Dale, Andrew Zammit-Mangion, Noel Cressie

Abstract: Non-Gaussian spatial and spatio-temporal data are becoming increasingly prevalent, and their analysis is needed in a variety of disciplines. FRK is an R package for spatial/spatio-temporal modelling and prediction with very large data sets that, to date, has only supported linear process models and Gaussian data models. In this paper, we describe a major upgrade to FRK that allows for non-Gaussian data to be analysed in a generalised linear mixed model framework. These vastly more general spatial and spatio-temporal models are fitted using the Laplace approximation via the software TMB. The existing functionality of FRK is retained with this advance into non-Gaussian models; in particular, it allows for automatic basis-function construction, it can handle both point-referenced and areal data simultaneously, and it can predict process values at any spatial support from these data. This new version of FRK also allows for the use of a large number of basis functions when modelling the spatial process, and is thus often able to achieve more accurate predictions than previous versions of the package in a Gaussian setting. We demonstrate innovative features in this new version of FRK, highlight its ease of use, and compare it to alternative packages using both simulated and real data sets.

Submitted 19 November, 2022; v1 submitted 6 October, 2021; originally announced October 2021.

Comments: 38 pages, 15 figures

MSC Class: 62-04

arXiv:2107.04208 [pdf, other]

doi 10.1029/2022GL098277

From Many to One: Consensus Inference in a MIP

Authors: Noel Cressie, Michael Bertolacci, Andrew Zammit-Mangion

Abstract: A Model Intercomparison Project (MIP) consists of teams who each estimate the same underlying quantity (e.g., temperature projections to the year 2070), and the spread of the estimates indicates their uncertainty. It recognizes that a community of scientists will not agree completely but that there is value in looking for a consensus and information in the range of disagreement. A simple average of the teams' outputs gives a consensus estimate, but it does not recognize that some outputs are more variable than others. Statistical analysis of variance (ANOVA) models offer a way to obtain a weighted consensus estimate of outputs with a variance that is the smallest possible and hence the tightest possible 'one-sigma' and 'two-sigma' intervals. Modulo dependence between MIP outputs, the ANOVA approach weights a team's output inversely proportional to its variation. When external verification data are available for evaluating the fidelity of each MIP output, ANOVA weights can also provide a prior distribution for Bayesian Model Averaging to yield a consensus estimate. We use a MIP of carbon dioxide flux inversions to illustrate the ANOVA-based weighting and subsequent consensus inferences.

Submitted 9 July, 2021; originally announced July 2021.
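
Illustrative sketch (not taken from the listing): the weighting idea in the abstract above can be shown with textbook inverse-variance weighting. This is not the paper's ANOVA model; it assumes the team outputs are independent, unbiased estimates of one quantity with known variances, and the function name and numbers are hypothetical.

```python
def inverse_variance_consensus(estimates, variances):
    """Weight each estimate by 1/variance; return (consensus, consensus variance).

    Assumes independent, unbiased estimates with known positive variances.
    """
    weights = [1.0 / v for v in variances]
    total = sum(weights)
    consensus = sum(w * e for w, e in zip(weights, estimates)) / total
    return consensus, 1.0 / total


if __name__ == "__main__":
    team_outputs = [1.0, 3.0]    # hypothetical outputs from two teams
    team_variances = [1.0, 3.0]  # the second team's output is more variable
    print(inverse_variance_consensus(team_outputs, team_variances))
```

For these two hypothetical teams the weighted consensus has variance 0.75, whereas the simple average would have variance (1 + 3) / 4 = 1, which is the sense in which the weighted intervals are tighter.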

arXiv:2105.07216 [pdf, other]

Spatial Statistics

Authors: Noel Cressie, Matthew T. Moores

Abstract: Spatial statistics is an area of study devoted to the statistical analysis of data that have a spatial label associated with them. Geographers often refer to the "location information" associated with the "attribute information," whose study defines a research area called "spatial analysis." Many of the ways to manipulate spatial data are driven by algorithms with no uncertainty quantification associated with them. When a spatial analysis is statistical, that is, it incorporates uncertainty quantification, it falls in the research area called spatial statistics. The primary feature of spatial statistical models is that nearby attribute values are more statistically dependent than distant attribute values; this is a paraphrasing of what is sometimes called the First Law of Geography (Tobler, 1970).

Submitted 15 May, 2021; originally announced May 2021.

MSC Class: 62H11

arXiv:2102.04004 [pdf, other]

WOMBAT: A fully Bayesian global flux-inversion framework

Authors: Andrew Zammit-Mangion, Michael Bertolacci, Jenny Fisher, Ann Stavert, Matthew L. Rigby, Yi Cao, Noel Cressie

Abstract: WOMBAT (the WOllongong Methodology for Bayesian Assimilation of Trace-gases) is a fully Bayesian hierarchical statistical framework for flux inversion of trace gases from flask, in situ, and remotely sensed data. WOMBAT extends the conventional Bayesian-synthesis framework through the consideration of a correlated error term, the capacity for online bias correction, and the provision of uncertainty quantification on all unknowns that appear in the Bayesian statistical model. We show, in an observing system simulation experiment (OSSE), that these extensions are crucial when the data are indeed biased and have errors that are correlated. Using the GEOS-Chem atmospheric transport model, we show that WOMBAT is able to obtain posterior means and uncertainties on non-fossil-fuel CO$_2$ fluxes from Orbiting Carbon Observatory-2 (OCO-2) data that are comparable to those from the Model Intercomparison Project (MIP) reported in Crowell et al. (2019, Atmos. Chem. Phys., vol. 19). We also find that our predictions of out-of-sample retrievals from the Total Column Carbon Observing Network are, for the most part, more accurate than those made by the MIP participants. Subsequent versions of the OCO-2 datasets will be ingested into WOMBAT as they become available.

Submitted 8 February, 2021; originally announced February 2021.

Comments: 46 pages, 13 figures

arXiv:2102.01892 [pdf, ps, other]

A few statistical principles for data science

Authors: Noel Cressie

Abstract: In any other circumstance, it might make sense to define the extent of the terrain (Data Science) first, and then locate and describe the landmarks (Principles). But this data revolution we are experiencing defies a cadastral survey. Areas are continually being annexed into Data Science. For example, biometrics was traditionally statistics for agriculture in all its forms but now, in Data Science, it means the study of characteristics that can be used to identify an individual. Examples of non-intrusive measurements include height, weight, fingerprints, retina scan, voice, photograph/video (facial landmarks and facial expressions), and gait. A multivariate analysis of such data would be a complex project for a statistician, but a software engineer might appear to have no trouble with it at all. In any applied-statistics project, the statistician worries about uncertainty and quantifies it by modelling data as realisations generated from a probability space. Another approach to uncertainty quantification is to find similar data sets, and then use the variability of results between these data sets to capture the uncertainty. Both approaches allow 'error bars' to be put on estimates obtained from the original data set, although the interpretations are different. A third approach, that concentrates on giving a single answer and gives up on uncertainty quantification, could be considered as Data Engineering, although it has staked a claim in the Data Science terrain. This article presents a few (actually nine) statistical principles for data scientists that have helped me, and continue to help me, when I work on complex interdisciplinary projects.

Submitted 3 February, 2021; originally announced February 2021.

Comments: 19 pages; written for a special issue (festschrift) of the Australian and New Zealand Journal of Statistics

arXiv:2004.08724 [pdf, other]

doi 10.5705/ss.202020.0156

Modeling Nonstationary and Asymmetric Multivariate Spatial Covariances via Deformations

Authors: Quan Vu, Andrew Zammit-Mangion, Noel Cressie

Abstract: Multivariate spatial-statistical models are often used when modeling environmental and socio-demographic processes. The most commonly used models for multivariate spatial covariances assume both stationarity and symmetry for the cross-covariances, but these assumptions are rarely tenable in practice. In this article we introduce a new and highly flexible class of nonstationary and asymmetric multivariate spatial covariance models that are constructed by modeling the simpler and more familiar stationary and symmetric multivariate covariances on a warped domain. Inspired by recent developments in the univariate case, we propose modeling the warping function as a composition of a number of simple injective warping functions in a deep-learning framework. Importantly, covariance-model validity is guaranteed by construction. We establish the types of warpings that allow for cross-covariance symmetry and asymmetry, and we use likelihood-based methods for inference that are computationally efficient. The utility of this new class of models is shown through two data illustrations: a simulation study on nonstationary data and an application on ocean temperatures at two different depths.

Submitted 26 January, 2021; v1 submitted 18 April, 2020; originally announced April 2020.

arXiv:2003.06843 [pdf, other]

Bayesian Inference of Spatio-Temporal Changes of Arctic Sea Ice

Authors: Bohai Zhang, Noel Cressie

Abstract: Arctic sea ice extent has drawn increasing interest and alarm from geoscientists, owing to its rapid decline. In this article, we propose a Bayesian spatio-temporal hierarchical statistical model for binary Arctic sea ice data over two decades, where a latent dynamic spatio-temporal Gaussian process is used to model the data-dependence through a logit link function. Our ultimate goal is to perform inference on the dynamic spatial behavior of Arctic sea ice over a period of two decades. Physically motivated covariates are assessed using autologistic diagnostics. Our Bayesian spatio-temporal model shows how parameter uncertainty in such a complex hierarchical model can influence spatio-temporal prediction. The posterior distributions of new summary statistics are proposed to detect the changing patterns of Arctic sea ice over two decades since 1997.

Submitted 15 March, 2020; originally announced March 2020.

MSC Class: 62F15; 60G15; 62J12; 62P12

arXiv:1905.06268 [pdf, other]

False Discovery Rates to Detect Signals from Incomplete Spatially Aggregated Data

Authors: Hsin-Cheng Huang, Noel Cressie, Andrew Zammit-Mangion, Guowen Huang

Abstract: There are a number of ways to test for the absence/presence of a spatial signal in a completely observed fine-resolution image. One of these is a powerful nonparametric procedure called Enhanced False Discovery Rate (EFDR). A drawback of EFDR is that it requires the data to be defined on regular pixels in a rectangular spatial domain. Here, we develop an EFDR procedure for possibly incomplete data defined on irregular small areas. Motivated by statistical learning, we use conditional simulation (CS) to condition on the available data and simulate the full rectangular image at its finest resolution many times (M, say). EFDR is then applied to each of these simulations resulting in M estimates of the signal and M statistically dependent p-values. Averaging over these estimates yields a single, combined estimate of a possible signal, but inference is needed to determine whether there really is a signal present. We test the original null hypothesis of no signal by combining the M p-values into a single p-value using copulas and a composite likelihood. If the null hypothesis of no signal is rejected, we use the combined estimate. We call this new procedure EFDR-CS and, to demonstrate its effectiveness, we show results from a simulation study; an experiment where we introduce aggregation and incompleteness into temperature-change data in the Asia-Pacific; and an application to total-column carbon dioxide from satellite remote sensing data over a region of the Middle East, Afghanistan, and the western part of Pakistan.

Submitted 17 October, 2020; v1 submitted 15 May, 2019; originally announced May 2019.

Comments: 45 pages, 23 figures, 2 tables

MSC Class: 62M30

arXiv:1808.05928 [pdf, other]

doi 10.1029/2018GL080082

A hierarchical statistical framework for emergent constraints: application to snow-albedo feedback

Authors: Kevin Bowman, Noel Cressie, Xin Qu, Alex Hall

Abstract: Emergent constraints use relationships between future and current climate states to constrain projections of climate response. Here, we introduce a statistical, hierarchical emergent constraint (HEC) framework in order to link future and current climate with observations. Under Gaussian assumptions, the mean and variance of the future state are shown analytically to be a function of the signal-to-noise ratio (SNR) between data-model error and current-climate uncertainty, and the correlation between future and current climate states. We apply the HEC to the climate-change, snow-albedo feedback, which is related to the seasonal cycle in the Northern Hemisphere. We obtain a snow-albedo-feedback prediction interval of $(-1.25, -0.58)$ % K$^{-1}$. The critical dependence on SNR and correlation shows that neglecting these terms can lead to bias and under-estimated uncertainty in constrained projections. The flexibility of using HEC under general assumptions throughout the Earth System is discussed.

Submitted 17 August, 2018; originally announced August 2018.

Comments: 19 pages, 5 Figures

arXiv:1711.07629 [pdf, other]

doi 10.3390/rs10010155

On statistical approaches to generate Level 3 products from satellite remote sensing retrievals

Authors: Andrew Zammit-Mangion, Noel Cressie, Clint Shumack

Abstract: Satellite remote sensing of trace gases such as carbon dioxide (CO$_2$) has increased our ability to observe and understand Earth's climate. However, these remote sensing data, specifically Level 2 retrievals, tend to be irregular in space and time, and hence, spatio-temporal prediction is required to infer values at any location and time point. Such inferences are not only required to answer important questions about our climate, but they are also needed for validating the satellite instrument, since Level 2 retrievals are generally not co-located with ground-based remote sensing instruments. Here, we discuss statistical approaches to construct Level 3 products from Level 2 retrievals, placing particular emphasis on the strengths and potential pitfalls when using statistical prediction in this context. Following this discussion, we use a spatio-temporal statistical modelling framework known as fixed rank kriging (FRK) to obtain global predictions and prediction standard errors of column-averaged carbon dioxide based on Version 7r and Version 8r retrievals from the Orbiting Carbon Observatory-2 (OCO-2) satellite. The FRK predictions allow us to validate statistically the Level 2 retrievals globally even though the data are at locations and at time points that do not coincide with validation data. Importantly, the validation takes into account the prediction uncertainty, which is dependent both on the temporally-varying density of observations around the ground-based measurement sites and on the spatio-temporal high-frequency components of the trace gas field that are not explicitly modelled. Here, for validation of remotely-sensed CO$_2$ data, we use observations from the Total Carbon Column Observing Network. We demonstrate that the resulting FRK product based on Version 8r compares better with TCCON data than that based on Version 7r.

Submitted 6 February, 2018; v1 submitted 20 November, 2017; originally announced November 2017.

Comments: 28 pages, 10 figures, 4 tables

Journal ref: Zammit-Mangion, A.; Cressie, N.; Shumack, C. On Statistical Approaches to Generate Level 3 Products from Satellite Remote Sensing Retrievals. Remote Sens. 2018, 10, 155

arXiv:1705.08105 [pdf, other]

FRK: An R Package for Spatial and Spatio-Temporal Prediction with Large Datasets

Authors: Andrew Zammit-Mangion, Noel Cressie

Abstract: FRK is an R software package for spatial/spatio-temporal modelling and prediction with large datasets. It facilitates optimal spatial prediction (kriging) on the most commonly used manifolds (in Euclidean space and on the surface of the sphere), for both spatial and spatio-temporal fields. It differs from many of the packages for spatial modelling and prediction by avoiding stationary and isotropic covariance and variogram models, instead constructing a spatial random effects (SRE) model on a fine-resolution discretised spatial domain. The discrete element is known as a basic areal unit (BAU), whose introduction in the software leads to several practical advantages. The software can be used to (i) integrate multiple observations with different supports with relative ease; (ii) obtain exact predictions at millions of prediction locations (without conditional simulation); and (iii) distinguish between measurement error and fine-scale variation at the resolution of the BAU, thereby allowing for reliable uncertainty quantification. The temporal component is included by adding another dimension. A key component of the SRE model is the specification of spatial or spatio-temporal basis functions; in the package, they can be generated automatically or by the user. The package also offers automatic BAU construction, an expectation-maximisation (EM) algorithm for parameter estimation, and functionality for prediction over any user-specified polygons or BAUs. Use of the package is illustrated on several spatial and spatio-temporal datasets, and its predictions and the model it implements are extensively compared to others commonly used for spatial prediction and modelling.

Submitted 7 June, 2018; v1 submitted 23 May, 2017; originally announced May 2017.

Comments: 44 pages, 22 figures

arXiv:1606.04564 [pdf, other]

Non-Gaussian bivariate modelling with application to atmospheric trace-gas inversion

Authors: Andrew Zammit-Mangion, Noel Cressie, Anita L. Ganesan

Abstract: Atmospheric trace-gas inversion is the procedure by which the sources and sinks of a trace gas are identified from observations of its mole fraction at isolated locations in space and time. This is inherently a spatio-temporal bivariate inversion problem, since the mole-fraction field evolves in space and time and the flux is also spatio-temporally distributed. Further, the bivariate model is likely to be non-Gaussian since the flux field is rarely Gaussian. Here, we use conditioning to construct a non-Gaussian bivariate model, and we describe some of its properties through auto- and cross-cumulant functions. A bivariate non-Gaussian, specifically trans-Gaussian, model is then achieved through the use of Box--Cox transformations, and we facilitate Bayesian inference by approximating the likelihood in a hierarchical framework. Trace-gas inversion, especially at high spatial resolution, is frequently highly sensitive to prior specification. Therefore, unlike conventional approaches, we assimilate trace-gas inventory information with the observational data at the parameter layer, thus shifting prior sensitivity from the inventory itself to its spatial characteristics (e.g., its spatial length scale). We demonstrate the approach in controlled-experiment studies of methane inversion, using fluxes extracted from inventories of the UK and Ireland and of Northern Australia.

Submitted 14 June, 2016; originally announced June 2016.

Comments: 45 pages, 7 figures

arXiv:1509.04819 [pdf, ps, other]

doi 10.1214/15-BA944B

Comment on Article by Ferreira and Gamerman

Authors: Noel Cressie, Raymond L. Chambers

Abstract: A utility-function approach to optimal spatial sampling design is a powerful way to quantify what "optimality" means. The emphasis then should be to capture all possible contributions to utility, including scientific impact and the cost of sampling. The resulting sampling plan should contain a component of designed randomness that would allow for a non-parametric design-based analysis if model-based assumptions were in doubt. [arXiv:1509.03410]

Submitted 16 September, 2015; originally announced September 2015.

Comments: Published at http://dx.doi.org/10.1214/15-BA944B in the Bayesian Analysis (http://projecteuclid.org/euclid.ba) by the International Society of Bayesian Analysis (http://bayesian.org/)

Report number: VTeX-BA-BA944B

Journal ref: Bayesian Analysis 2015, Vol. 10, No. 3, 741-748

arXiv:1509.00915 [pdf, other]

doi 10.1016/j.chemolab.2015.09.006

Spatio-temporal bivariate statistical models for atmospheric trace-gas inversion

Authors: Andrew Zammit-Mangion, Noel Cressie, Anita L. Ganesan, Simon O'Doherty, Alistair J. Manning

Abstract: Atmospheric trace-gas inversion refers to any technique used to predict spatial and temporal fluxes using mole-fraction measurements and atmospheric simulations obtained from computer models. Studies to date are most often of a data-assimilation flavour, which implicitly consider univariate statistical models with the flux as the variate of interest. This univariate approach typically assumes that the flux field is either a spatially correlated Gaussian process or a spatially uncorrelated non-Gaussian process with prior expectation fixed using flux inventories (e.g., NAEI or EDGAR in Europe). Here, we extend this approach in three ways. First, we develop a bivariate model for the mole-fraction field and the flux field. The bivariate approach allows optimal prediction of both the flux field and the mole-fraction field, and it leads to significant computational savings over the univariate approach. Second, we employ a lognormal spatial process for the flux field that captures both the lognormal characteristics of the flux field (when appropriate) and its spatial dependence. Third, we propose a new, geostatistical approach to incorporate the flux inventories in our updates, such that the posterior spatial distribution of the flux field is predominantly data-driven. The approach is illustrated on a case study of methane (CH$_4$) emissions in the United Kingdom and Ireland.

Submitted 19 October, 2015; v1 submitted 2 September, 2015; originally announced September 2015.

Comments: 39 pages, 8 figures

Journal ref: Chemometrics and Intelligent Laboratory Systems, Vol. 149, 15.12.2015, p. 227-241

arXiv:1507.08401 [pdf, ps, other]

doi 10.1214/15-STS517

Capturing Multivariate Spatial Dependence: Model, Estimate and then Predict

Authors: Noel Cressie, Sandy Burden, Walter Davis, Pavel N. Krivitsky, Payam Mokhtarian, Thomas Suesse, Andrew Zammit-Mangion

Abstract: Physical processes rarely occur in isolation; rather, they influence and interact with one another. Thus, there is great benefit in modeling potential dependence between both spatial locations and different processes. It is the interaction between these two dependencies that is the focus of Genton and Kleiber's paper under discussion. We see the problem of ensuring that any multivariate spatial covariance matrix is nonnegative definite as important, but we also see it as a means to an end. That "end" is solving the scientific problem of predicting a multivariate field. [arXiv:1507.08017].

Submitted 30 July, 2015; originally announced July 2015.

Comments: Published at http://dx.doi.org/10.1214/15-STS517 in the Statistical Science (http://www.imstat.org/sts/) by the Institute of Mathematical Statistics (http://www.imstat.org)

Report number: IMS-STS-STS517

Journal ref: Statistical Science 2015, Vol. 30, No. 2, 170-175

arXiv:1504.01865 [pdf, other]

Multivariate Spatial Covariance Models: A Conditional Approach

Authors: Noel Cressie, Andrew Zammit-Mangion

Abstract: Multivariate geostatistics is based on modelling all covariances between all possible combinations of two or more variables at any sets of locations in a continuously indexed domain. Multivariate spatial covariance models need to be built with care, since any covariance matrix that is derived from such a model must be nonnegative-definite. In this article, we develop a conditional approach for spatial-model construction whose validity conditions are easy to check. We start with bivariate spatial covariance models and go on to demonstrate the approach's connection to multivariate models defined by networks of spatial variables. In some circumstances, such as modelling respiratory illness conditional on air pollution, the direction of conditional dependence is clear. When it is not, the two directional models can be compared. More generally, the graph structure of the network reduces the number of possible models to compare. Model selection then amounts to finding possible causative links in the network. We demonstrate our conditional approach on surface temperature and pressure data, where the role of the two variables is seen to be asymmetric.

Submitted 7 October, 2016; v1 submitted 8 April, 2015; originally announced April 2015.

Comments: 22 pages, 3 figures

arXiv:1410.7748 [pdf, ps, other]

A Comparison of Spatial Predictors when Datasets Could be Very Large

Authors: Jonathan R. Bradley, Noel Cressie, Tao Shi

Abstract: In this article, we review and compare a number of methods of spatial prediction. To demonstrate the breadth of available choices, we consider both traditional and more-recently-introduced spatial predictors. Specifically, in our exposition we review: traditional stationary kriging, smoothing splines, negative-exponential distance-weighting, Fixed Rank Kriging, modified predictive processes, a stochastic partial differential equation approach, and lattice kriging. This comparison is meant to provide a service to practitioners wishing to decide between spatial predictors. Hence, we provide technical material for the unfamiliar, which includes the definition and motivation for each (deterministic and stochastic) spatial predictor. We use a benchmark dataset of $\mathrm{CO}_{2}$ data from NASA's AIRS instrument to address computational efficiencies that include CPU time and memory usage. Furthermore, the predictive performance of each spatial predictor is assessed empirically using a hold-out subset of the AIRS data.

Submitted 28 October, 2014; originally announced October 2014.

arXiv:1303.6668 [pdf, ps, other]

Spatial Fay-Herriot Models for Small Area Estimation with Functional Covariates

Authors: Aaron T. Porter, Scott H. Holan, Christopher K. Wikle, Noel Cressie

Abstract: The Fay-Herriot (FH) model is widely used in small area estimation and uses auxiliary information to reduce estimation variance at undersampled locations. We extend the type of covariate information used in the FH model to include functional covariates, such as social-media search loads or remote-sensing images (e.g., in crop-yield surveys). The inclusion of these functional covariates is facilitated through a two-stage dimension-reduction approach that includes a Karhunen-Loève expansion followed by stochastic search variable selection. Additionally, the importance of modeling spatial autocorrelation has recently been recognized in the FH model; our model utilizes the intrinsic conditional autoregressive class of spatial models in addition to functional covariates. We demonstrate the effectiveness of our approach through simulation and analysis of data from the American Community Survey. We use Google Trends searches over time as functional covariates to analyze relative changes in rates of percent household Spanish-speaking in the eastern half of the United States.

Submitted 9 May, 2014; v1 submitted 26 March, 2013; originally announced March 2013.

Comments: 26 pages, 5 figures

arXiv:1211.1717 [pdf, other]

doi 10.1890/12-0312.1

Bayesian Learning and Predictability in a Stochastic Nonlinear Dynamical Model

Authors: John Parslow, Noel Cressie, Edward P. Campbell, Emlyn Jones, Lawrence Murray

Abstract: Bayesian inference methods are applied within a Bayesian hierarchical modelling framework to the problems of joint state and parameter estimation, and of state forecasting. We explore and demonstrate the ideas in the context of a simple nonlinear marine biogeochemical model. A novel approach is proposed to the formulation of the stochastic process model, in which ecophysiological properties of plankton communities are represented by autoregressive stochastic processes. This approach captures the effects of changes in plankton communities over time, and it allows the incorporation of literature metadata on individual species into prior distributions for process model parameters. The approach is applied to a case study at Ocean Station Papa, using Particle Markov chain Monte Carlo computational techniques. The results suggest that, by drawing on objective prior information, it is possible to extract useful information about model state and a subset of parameters, and even to make useful long-term forecasts, based on sparse and noisy observations.

Submitted 7 November, 2012; originally announced November 2012.

arXiv:1104.2703 [pdf, ps, other]

doi 10.1214/10-AOAS369

A spatial analysis of multivariate output from regional climate models

Authors: Stephan R. Sain, Reinhard Furrer, Noel Cressie

Abstract: Climate models have become an important tool in the study of climate and climate change, and ensemble experiments consisting of multiple climate-model runs are used in studying and quantifying the uncertainty in climate-model output. However, there are often only a limited number of model runs available for a particular experiment, and one of the statistical challenges is to characterize the distribution of the model output. To that end, we have developed a multivariate hierarchical approach, at the heart of which is a new representation of a multivariate Markov random field. This approach allows for flexible modeling of the multivariate spatial dependencies, including the cross-dependencies between variables. We demonstrate this statistical model on an ensemble arising from a regional-climate-model experiment over the western United States, and we focus on the projected change in seasonal temperature and precipitation over the next 50 years.

Submitted 14 April, 2011; originally announced April 2011.

Comments: Published at http://dx.doi.org/10.1214/10-AOAS369 in the Annals of Applied Statistics (http://www.imstat.org/aoas/) by the Institute of Mathematical Statistics (http://www.imstat.org)

Report number: IMS-AOAS-AOAS369

Journal ref: Annals of Applied Statistics 2011, Vol. 5, No. 1, 150-175
