Nonstationary Spatial Modeling of Massive Global Satellite Data
Authors:
Huang Huang,
Lewis R. Blake,
Matthias Katzfuss,
Dorit M. Hammerling
Abstract:
Earth-observing satellite instruments obtain a massive number of observations every day. For example, tens of millions of sea surface temperature (SST) observations on a global scale are collected daily by the Moderate Resolution Imaging Spectroradiometer (MODIS) instrument. Despite their size, such datasets are incomplete and noisy, necessitating spatial statistical inference to obtain complete, high-resolution fields with quantified uncertainties. Such inference is challenging due to the high computational cost, the nonstationary behavior of environmental processes on a global scale, and land barriers affecting the dependence of SST. In this work, we develop a multi-resolution approximation (M-RA) of a Gaussian process (GP) whose nonstationary, global covariance function is obtained using local fits. The M-RA requires domain partitioning, which can be tailored to the application. In the SST case, we partition the domain purposefully to account for and weaken dependence across land barriers. Our M-RA implementation is tailored to distributed-memory computation in high-performance-computing environments. We analyze a MODIS SST dataset consisting of more than 43 million observations, which is, to our knowledge, the largest dataset ever analyzed using a probabilistic GP model. We show that our nonstationary model based on local fits provides substantially improved predictive performance relative to a stationary approach.
Submitted 26 November, 2021;
originally announced November 2021.
Pushing the Limit: A Hybrid Parallel Implementation of the Multi-resolution Approximation for Massive Data
Authors:
Huang Huang,
Lewis R. Blake,
Dorit M. Hammerling
Abstract:
The multi-resolution approximation (MRA) of Gaussian processes was recently proposed to conduct likelihood-based inference for massive spatial data sets. An advantage of the methodology is that it can be parallelized. We implemented serial and parallel versions of the MRA in C++. The parallel implementation uses hybrid parallelism: the Message Passing Interface (MPI) handles distributed-memory communication between nodes, and OpenMP handles shared-memory computing within nodes. We compare the performance of the serial C++ and MATLAB implementations on a small data set on a personal laptop. We further study the C++ parallel program under different configurations, applying it to data sets ranging from about 100,000 to 47 million observations. We show the practicality of this implementation by demonstrating that it delivers quick inference for massive real-world data sets. The serial and parallel C++ code can be found at https://github.com/hhuang90.
Submitted 5 May, 2019; v1 submitted 30 April, 2019;
originally announced May 2019.