-
Neural Methods for Amortised Inference
Authors:
Andrew Zammit-Mangion,
Matthew Sainsbury-Dale,
Raphaël Huser
Abstract:
Simulation-based methods for statistical inference have evolved dramatically over the past 50 years, keeping pace with technological advancements. The field is undergoing a new revolution as it embraces the representational capacity of neural networks, optimisation libraries and graphics processing units for learning complex mappings between data and inferential targets. The resulting tools are amortised, in the sense that they allow rapid inference through fast feedforward operations. In this article we review recent progress in the context of point estimation, approximate Bayesian inference, summary-statistic construction, and likelihood approximation. We also cover software, and include a simple illustration to showcase the wide array of tools available for amortised inference and the benefits they offer over Markov chain Monte Carlo methods. The article concludes with an overview of relevant topics and an outlook on future research directions.
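To make "amortised" concrete, here is a minimal Python/NumPy sketch under stated assumptions: a toy model of 50 Normal(θ, 1) draws with unknown mean θ (our choice, not an example from the paper), and a linear least-squares map from two hand-picked summary statistics to θ standing in for the neural network. Simulation and fitting are paid for once; afterwards, each new dataset is estimated with a single cheap evaluation.

```python
import numpy as np

rng = np.random.default_rng(0)

def simulate(theta, n=50):
    """Toy model (assumption): n iid Normal(theta, 1) draws per parameter value."""
    theta = np.asarray(theta, dtype=float)
    return rng.normal(theta[:, None], 1.0, size=(theta.shape[0], n))

def summaries(z):
    """Hand-picked summaries plus an intercept; a neural network would learn these."""
    return np.column_stack([np.ones(len(z)), z.mean(axis=1), np.median(z, axis=1)])

# Training phase (paid once): sample parameters from a uniform prior, simulate data,
# and fit weights minimising squared error between estimates and true parameters.
theta_train = rng.uniform(-5.0, 5.0, size=5000)
weights, *_ = np.linalg.lstsq(summaries(simulate(theta_train)), theta_train, rcond=None)

def estimate(z):
    """Amortised inference: one matrix product per dataset, no MCMC or optimisation."""
    return summaries(z) @ weights

theta_true = np.array([1.5, -2.0])
theta_hat = estimate(simulate(theta_true))
```

Replacing the linear map by a neural network and the hand-picked summaries by learned ones gives the general recipe; the squared-error training criterion targets the posterior mean under the chosen prior.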
Submitted 26 June, 2024; v1 submitted 18 April, 2024;
originally announced April 2024.
-
Statistics of Extremes for Neuroscience
Authors:
Paolo V. Redondo,
Matheus B. Guerrero,
Raphaël Huser,
Hernando Ombao
Abstract:
This chapter illustrates how tools from univariate and multivariate statistics of extremes can complement classical methods used to study brain signals and enhance the understanding of brain activity and connectivity during specific cognitive tasks or abnormal episodes, such as an epileptic seizure.
Submitted 14 April, 2024;
originally announced April 2024.
-
Statistics of extremes for natural hazards: landslides and earthquakes
Authors:
Rishikesh Yadav,
Luigi Lombardo,
Raphaël Huser
Abstract:
In this chapter, we illustrate the use of split bulk-tail models and subasymptotic models motivated by extreme-value theory in the context of hazard assessment for earthquake-induced landslides. A spatial joint areal model is presented for modeling both landslide counts and landslide sizes, paying particular attention to extreme landslides, which are the most devastating.
Submitted 14 April, 2024;
originally announced April 2024.
-
Extreme quantile regression with deep learning
Authors:
Jordan Richards,
Raphaël Huser
Abstract:
Estimation of extreme conditional quantiles is often required for risk assessment of natural hazards in climate and geo-environmental sciences and for quantitative risk management in statistical finance, econometrics, and actuarial sciences. Interest often lies in extrapolating to quantile levels that exceed any past observations. Therefore, it is crucial to use a statistical framework that is well-adapted and especially designed for this purpose, and here extreme-value theory plays a key role. This chapter reviews how extreme quantile regression may be performed using theoretically-justified models, and how modern deep learning approaches can be harnessed in this context to enhance the model's performance in complex high-dimensional settings. The power of deep learning combined with the rigor of theoretically-justified extreme-value methods opens the door to efficient extreme quantile regression, in cases where both the number of covariates and the quantile level of interest can be simultaneously "extreme".
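Extrapolation beyond the observed data typically rests on the peaks-over-threshold tail approximation: if P(Y > u) = ζ and exceedances Y - u above a high threshold u follow a generalised Pareto distribution GPD(σ, ξ), then for p ≥ 1 - ζ the p-quantile is q_p = u + (σ/ξ)[((1 - p)/ζ)^(-ξ) - 1], with limit u - σ log((1 - p)/ζ) as ξ → 0. A minimal Python sketch follows; in a regression setting u, ζ, σ and ξ would be functions of the covariates (for example, outputs of a neural network), which describes the general setup rather than the chapter's specific models.

```python
import math

def gpd_extreme_quantile(p, u, zeta, sigma, xi):
    """Quantile q_p of Y when P(Y > u) = zeta and (Y - u | Y > u) ~ GPD(sigma, xi).

    Only valid in the tail, i.e. for p in [1 - zeta, 1).
    """
    if not (1 - zeta <= p < 1):
        raise ValueError("p must lie in [1 - zeta, 1)")
    ratio = (1 - p) / zeta  # conditional exceedance probability beyond u
    if abs(xi) < 1e-12:
        return u - sigma * math.log(ratio)  # exponential-tail limit as xi -> 0
    return u + sigma / xi * (ratio ** (-xi) - 1)

# Threshold 10 exceeded 5% of the time; extrapolate to the 1-in-10,000 level.
q_heavy = gpd_extreme_quantile(0.9999, u=10.0, zeta=0.05, sigma=2.0, xi=0.1)
q_light = gpd_extreme_quantile(0.9999, u=10.0, zeta=0.05, sigma=2.0, xi=0.0)
```

The shape parameter ξ drives how far the extrapolated quantile moves: a heavier tail (ξ > 0) gives a larger return level than the exponential-tail case at the same probability.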
Submitted 14 April, 2024;
originally announced April 2024.
-
Modeling of spatial extremes in environmental data science: Time to move away from max-stable processes
Authors:
Raphaël Huser,
Thomas Opitz,
Jennifer Wadsworth
Abstract:
Environmental data science for spatial extremes has traditionally relied heavily on max-stable processes. Even though the popularity of these models has perhaps peaked with statisticians, they are still perceived and considered as the "state-of-the-art" in many applied fields. However, while the asymptotic theory supporting the use of max-stable processes is mathematically rigorous and comprehensive, we think that it has also been overused, if not misused, in environmental applications, to the detriment of more purposeful and meticulously validated models. In this paper, we review the main limitations of max-stable process models, and strongly argue against their systematic use in environmental studies. Alternative solutions based on more flexible frameworks using the exceedances of variables above appropriately chosen high thresholds are discussed, and an outlook on future research is given, highlighting recommendations moving forward and the opportunities offered by hybridizing machine learning with extreme-value statistics.
Submitted 30 January, 2024;
originally announced January 2024.
-
At the junction between deep learning and statistics of extremes: formalizing the landslide hazard definition
Authors:
Ashok Dahal,
Raphaël Huser,
Luigi Lombardo
Abstract:
The most widely adopted definition of landslide hazard combines spatial information about landslide location (susceptibility), threat (intensity), and frequency (return period). Only the first two elements are usually considered and estimated when working over vast areas. Even then, separate models constitute the standard, with frequency being rarely investigated. Frequency and intensity are intertwined and depend on each other because larger events occur less frequently and vice versa. However, due to the lack of multi-temporal inventories and joint statistical models, modelling such properties via a unified hazard model has always been challenging and has yet to be attempted. Here, we develop a unified model to estimate landslide hazard at the slope unit level to address such gaps. We employed deep learning, combined with a model motivated by extreme-value theory, to analyse an inventory of 30 years of observed rainfall-triggered landslides in Nepal and assess landslide hazard for multiple return periods. We also use our model to further explore landslide hazard for the same return periods under different climate change scenarios up to the end of the century. Our results show that the proposed model performs excellently and can be used to model landslide hazard in a unified manner. Geomorphologically, we find that under both climate change scenarios (SSP245 and SSP585), landslide hazard is likely to increase up to two times on average in the lower Himalayan region, while remaining the same in the middle Himalayan region and decreasing slightly in the upper Himalayan region.
Submitted 25 January, 2024;
originally announced January 2024.
-
Max-convolution processes with random shape indicator kernels
Authors:
Pavel Krupskii,
Raphaël Huser
Abstract:
In this paper, we introduce a new class of models for spatial data obtained from max-convolution processes based on indicator kernels with random shape. We show that this class of models has appealing dependence properties, including tail dependence at short distances and independence at long distances. We further consider max-convolutions between such processes and processes with tail independence, in order to separately control the bulk and tail dependence behaviors, and to increase the flexibility of the model at longer distances, in particular, to capture intermediate tail dependence. We show how parameters can be estimated using a weighted pairwise likelihood approach, and we conduct an extensive simulation study to show that the proposed inference approach is feasible in high dimensions and yields accurate parameter estimates in most cases. We apply the proposed methodology to analyse daily temperature maxima measured at 100 monitoring stations in the state of Oklahoma, US. Our results indicate that our proposed model provides a good fit to the data, and that it captures both the bulk and the tail dependence structures accurately.
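Tail dependence of the kind described here is commonly summarised by χ(u) = P(F_Y(Y) > u | F_X(X) > u), whose limit as u → 1 is positive under asymptotic dependence and zero under asymptotic independence. For a spatial process, X and Y are values at two sites, so χ can be examined as a function of their distance. A minimal NumPy sketch of the usual rank-based empirical estimator (a generic diagnostic, not the paper's weighted pairwise likelihood method):

```python
import numpy as np

def chi_u(x, y, u):
    """Empirical estimate of chi(u) = P(F_Y(Y) > u | F_X(X) > u)."""
    n = len(x)
    # Rank-based probability integral transform to approximately uniform margins.
    ux = (np.argsort(np.argsort(x)) + 1) / (n + 1)
    uy = (np.argsort(np.argsort(y)) + 1) / (n + 1)
    joint = np.mean((ux > u) & (uy > u))
    return joint / np.mean(ux > u)

rng = np.random.default_rng(2)
x = rng.normal(size=20_000)
y_indep = rng.normal(size=20_000)   # independent site: chi(u) is about 1 - u
chi_same = chi_u(x, x, 0.95)        # identical sites: perfect tail dependence
chi_indep = chi_u(x, y_indep, 0.95)
```

Plotting the estimate against u for pairs of sites at increasing distances is a quick check of whether a fitted model's tail dependence decays in the way the data suggest.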
Submitted 16 October, 2023;
originally announced October 2023.
-
A Neural Network-Based Approach to Normality Testing for Dependent Data
Authors:
Minwoo Kim,
Marc G Genton,
Raphael Huser,
Stefano Castruccio
Abstract:
There is a wide availability of methods for testing normality under the assumption of independent and identically distributed data. When data are dependent in space and/or time, however, assessing and testing the marginal behavior is considerably more challenging, as the marginal behavior is impacted by the degree of dependence. We propose a new approach to assess normality for dependent data by non-linearly incorporating existing statistics from normality tests as well as sample moments such as skewness and kurtosis through a neural network. We calibrate (deep) neural networks using simulated normal and non-normal data with a wide range of dependence structures, and we determine the probability of rejecting the null hypothesis. We compare several approaches for normality tests and demonstrate the superiority of our method in terms of statistical power through an extensive simulation study. A real-world application to global temperature data further demonstrates how the degree of spatio-temporal aggregation affects the marginal normality in the data.
Submitted 16 October, 2023;
originally announced October 2023.
-
Neural Bayes Estimators for Irregular Spatial Data using Graph Neural Networks
Authors:
Matthew Sainsbury-Dale,
Andrew Zammit-Mangion,
Jordan Richards,
Raphaël Huser
Abstract:
Neural Bayes estimators are neural networks that approximate Bayes estimators in a fast and likelihood-free manner. Although they are appealing to use with spatial models, where estimation is often a computational bottleneck, neural Bayes estimators in spatial applications have, to date, been restricted to data collected over a regular grid. These estimators are also currently dependent on a prescribed set of spatial locations, which means that the neural network needs to be re-trained for new data sets; this renders them impractical in many applications and impedes their widespread adoption. In this work, we employ graph neural networks to tackle the important problem of parameter point estimation from data collected over arbitrary spatial locations. In addition to extending neural Bayes estimation to irregular spatial data, our architecture leads to substantial computational benefits, since the estimator can be used with any configuration or number of locations and independent replicates, thus amortising the cost of training for a given spatial model. We also facilitate fast uncertainty quantification by training an accompanying neural Bayes estimator that approximates a set of marginal posterior quantiles. We illustrate our methodology on Gaussian and max-stable processes. Finally, we showcase our methodology on a data set of global sea-surface temperature, where we estimate the parameters of a Gaussian process model in 2161 spatial regions, each containing thousands of irregularly-spaced data points, in just a few minutes with a single graphics processing unit.
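The accompanying estimator that approximates marginal posterior quantiles can be understood through a standard decision-theoretic fact: the Bayes estimator under the check (pinball) loss at level τ is the posterior τ-quantile, so a network trained to minimise that loss over simulated parameter-data pairs learns to output quantiles. A minimal NumPy sketch of the loss, with a sanity check that the constant minimising it over a sample lands near the empirical τ-quantile; in training, `theta` would be simulated parameters and `estimate` the network output (names are ours).

```python
import numpy as np

def pinball_loss(theta, estimate, tau):
    """Average check loss; its minimiser is the tau-quantile of theta."""
    diff = theta - estimate
    return np.mean(np.maximum(tau * diff, (tau - 1) * diff))

# Sanity check: search a grid of constant estimates for the minimiser.
rng = np.random.default_rng(1)
samples = rng.normal(size=10_000)
grid = np.linspace(-3.0, 3.0, 601)
losses = [pinball_loss(samples, g, 0.9) for g in grid]
best = grid[np.argmin(losses)]
```

Training one output per level (for example τ = 0.025 and τ = 0.975) gives a fast, approximate marginal credible interval from a single forward pass.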
Submitted 13 June, 2024; v1 submitted 4 October, 2023;
originally announced October 2023.
-
Deep graphical regression for jointly moderate and extreme Australian wildfires
Authors:
Daniela Cisneros,
Jordan Richards,
Ashok Dahal,
Luigi Lombardo,
Raphaël Huser
Abstract:
Recent wildfires in Australia have led to considerable economic loss and property destruction, and there is increasing concern that climate change may exacerbate their intensity, duration, and frequency. Hazard quantification for extreme wildfires is an important component of wildfire management, as it facilitates efficient resource distribution, adverse effect mitigation, and recovery efforts. However, although extreme wildfires are typically the most impactful, both small and moderate fires can still be devastating to local communities and ecosystems. Therefore, it is imperative to develop robust statistical methods to reliably model the full distribution of wildfire spread. We do so for a novel dataset of Australian wildfires from 1999 to 2019, and analyse monthly spread over areas approximately corresponding to Statistical Areas Level 1 and 2 (SA1/SA2) regions. Given the complex nature of wildfire ignition and spread, we exploit recent advances in statistical deep learning and extreme value theory to construct a parametric regression model using graph convolutional neural networks and the extended generalized Pareto distribution, which allows us to model wildfire spread observed on an irregular spatial domain. We highlight the efficacy of our newly proposed model and perform a wildfire hazard assessment for Australia and population-dense communities, namely Tasmania, Sydney, Melbourne, and Perth.
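The extended generalized Pareto distribution (eGPD) is what lets one model cover small, moderate and extreme fires. One common parametrisation (our assumption; the paper may use a different one) raises the GPD distribution function to a power, F(x) = H(x)^κ with H(x) = 1 - (1 + ξx/σ)^(-1/ξ) for x > 0, so κ shapes the lower tail while ξ retains GPD-type behaviour in the upper tail. A NumPy sketch of the CDF and its closed-form inverse, assuming ξ > 0; in the regression model the parameters would vary by region and month.

```python
import numpy as np

def egpd_cdf(x, sigma, xi, kappa):
    """eGPD CDF F(x) = H(x)**kappa, with H the GPD(sigma, xi) CDF (xi > 0 assumed)."""
    h = 1.0 - (1.0 + xi * np.asarray(x, dtype=float) / sigma) ** (-1.0 / xi)
    return h ** kappa

def egpd_quantile(p, sigma, xi, kappa):
    """Inverse of egpd_cdf: undo the power kappa, then invert the GPD CDF."""
    v = np.asarray(p, dtype=float) ** (1.0 / kappa)
    return sigma / xi * ((1.0 - v) ** (-xi) - 1.0)

probs = np.array([0.1, 0.5, 0.99])
x = egpd_quantile(probs, sigma=1.5, xi=0.2, kappa=0.8)
```

Setting κ = 1 recovers the ordinary GPD, which is a convenient check when fitting.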
Submitted 11 January, 2024; v1 submitted 28 August, 2023;
originally announced August 2023.
-
Spatial wildfire risk modeling using mixtures of tree-based multivariate Pareto distributions
Authors:
Daniela Cisneros,
Arnab Hazra,
Raphaël Huser
Abstract:
Wildfires pose a severe threat to the ecosystem and economy, and risk assessment is typically based on fire danger indices such as the McArthur Forest Fire Danger Index (FFDI) used in Australia. Studying the joint tail dependence structure of high-resolution spatial FFDI data is thus crucial for estimating current and future extreme wildfire risk. However, existing likelihood-based inference approaches are computationally prohibitive in high dimensions due to the need to censor observations in the bulk of the distribution. To address this, we construct models for spatial FFDI extremes by leveraging the sparse conditional independence structure of Hüsler–Reiss-type generalized Pareto processes defined on trees. These models allow for a simplified likelihood function that is computationally efficient. Our framework involves a mixture of tree-based multivariate Pareto distributions with randomly generated tree structures, resulting in a flexible model that can capture nonstationary spatial dependence structures. We fit the model to summer FFDI data from different spatial clusters in Mainland Australia and 14 decadal windows between 1999–2022 to study local spatiotemporal variability with respect to the magnitude and extent of extreme wildfires. Our results demonstrate that our proposed method fits the margins and spatial tail dependence structure adequately, and is helpful for providing extreme wildfire risk measures.
Submitted 7 August, 2023;
originally announced August 2023.
-
Extremal Dependence of Moving Average Processes Driven by Exponential-Tailed Lévy Noise
Authors:
Zhongwei Zhang,
David Bolin,
Sebastian Engelke,
Raphaël Huser
Abstract:
Moving average processes driven by exponential-tailed Lévy noise are important extensions of their Gaussian counterparts in order to capture deviations from Gaussianity, more flexible dependence structures, and sample paths with jumps. Popular examples include non-Gaussian Ornstein–Uhlenbeck processes and type G Matérn stochastic partial differential equation random fields. This paper is concerned with the open problem of determining their extremal dependence structure. We leverage the fact that such processes admit approximations on grids or triangulations that are used in practice for efficient simulations and inference. These approximations can be expressed as special cases of a class of linear transformations of independent, exponential-tailed random variables, that bridge asymptotic dependence and independence in a novel, tractable way. This result is of independent interest since models that can capture both extremal dependence regimes are scarce and the construction of such flexible models is an active area of research. This new fundamental result allows us to show that the integral approximation of general moving average processes with exponential-tailed Lévy noise is asymptotically independent when the mesh is fine enough. Under mild assumptions on the kernel function we also derive the limiting residual tail dependence function. For the popular exponential-tailed Ornstein–Uhlenbeck process we prove that it is asymptotically independent, but with a different residual tail dependence function than its Gaussian counterpart. Our results are illustrated through simulation studies.
Submitted 28 July, 2023;
originally announced July 2023.
-
Fast spatial simulation of extreme high-resolution radar precipitation data using INLA
Authors:
Silius M. Vandeskog,
Raphaël Huser,
Oddbjørn Bruland,
Sara Martino
Abstract:
Aiming to deliver improved precipitation simulations for hydrological impact assessment studies, we develop a methodology for modelling and simulating high-dimensional spatial precipitation extremes, focusing on both their marginal distributions and tail dependence structures. Tail dependence is a crucial property for assessing the consequences of an extreme precipitation event, yet most stochastic weather generators do not attempt to capture this property. We model extreme precipitation using a latent Gaussian version of the spatial conditional extremes model. This requires data with Laplace marginal distributions, but precipitation distributions contain point masses at zero that complicate necessary standardisation procedures. We therefore employ two separate models, one for describing extremes of nonzero precipitation and one for describing the probability of precipitation occurrence. Extreme precipitation is simulated by combining simulations from the two models. Nonzero precipitation marginals are modelled using latent Gaussian models with gamma and generalised Pareto likelihoods, and four different precipitation occurrence models are investigated. Fast inference is achieved using integrated nested Laplace approximations (INLA). We model and simulate spatial precipitation extremes in Central Norway, using high-density radar data. Inference on a 6000-dimensional data set is achieved within hours, and the simulations capture the main trends of the observed precipitation well.
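The spatial conditional extremes model requires data on Laplace margins. Given probabilities u = F(x) from a fitted marginal model for nonzero precipitation (for instance the gamma/generalised Pareto models mentioned above), the standard Laplace quantile function maps them to the required scale. A NumPy sketch, assuming u already lies strictly in (0, 1); a point mass at zero would break this step, which is why zeros are handled by a separate occurrence model.

```python
import numpy as np

def to_laplace(u):
    """Map probabilities u in (0, 1) to standard Laplace quantiles."""
    u = np.asarray(u, dtype=float)
    # Inverse of F(y) = exp(y) / 2 for y < 0 and 1 - exp(-y) / 2 for y >= 0.
    return np.where(u < 0.5, np.log(2 * u), -np.log(2 * (1 - u)))

u = np.array([0.2, 0.5, 0.7, 0.999])
y = to_laplace(u)
```

The Laplace scale has exponential tails in both directions, which is what the conditional extremes formulation relies on.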
Submitted 23 February, 2024; v1 submitted 21 July, 2023;
originally announced July 2023.
-
Flexible and efficient spatial extremes emulation via variational autoencoders
Authors:
Likun Zhang,
Xiaoyu Ma,
Christopher K. Wikle,
Raphaël Huser
Abstract:
Many real-world processes have complex tail dependence structures that cannot be characterized using classical Gaussian processes. More flexible spatial extremes models exhibit appealing extremal dependence properties but are often exceedingly prohibitive to fit and simulate from in high dimensions. In this paper, we aim to push the boundaries on computation and modeling of high-dimensional spatial extremes by integrating a new spatial extremes model, which has flexible and non-stationary dependence properties, into the encoding-decoding structure of a variational autoencoder, called the XVAE. The XVAE can emulate spatial observations and produce outputs that have the same statistical properties as the inputs, especially in the tail. Our approach also provides a novel way of making fast inference with complex extreme-value processes. Through extensive simulation studies, we show that our XVAE is substantially more time-efficient than traditional Bayesian inference while outperforming many spatial extremes models with a stationary dependence structure. Lastly, we analyze a high-resolution satellite-derived dataset of sea surface temperature in the Red Sea, which includes 30 years of daily measurements at 16703 grid cells. We demonstrate how to use XVAE to identify regions susceptible to marine heatwaves under climate change and examine the spatial and temporal variability of the extremal dependence structure.
Submitted 9 May, 2024; v1 submitted 16 July, 2023;
originally announced July 2023.
-
Neural Bayes estimators for censored inference with peaks-over-threshold models
Authors:
Jordan Richards,
Matthew Sainsbury-Dale,
Andrew Zammit-Mangion,
Raphaël Huser
Abstract:
Making inference with spatial extremal dependence models can be computationally burdensome since they involve intractable and/or censored likelihoods. Building on recent advances in likelihood-free inference with neural Bayes estimators, that is, neural networks that approximate Bayes estimators, we develop highly efficient estimators for censored peaks-over-threshold models that use data augmentation techniques to encode censoring information in the neural network input. Our new method provides a paradigm shift that challenges traditional censored likelihood-based inference methods for spatial extremal dependence models. Our simulation studies highlight significant gains in both computational and statistical efficiency, relative to competing likelihood-based approaches, when applying our novel estimators to make inference with popular extremal dependence models, such as max-stable, r-Pareto, and random scale mixture process models. We also illustrate that it is possible to train a single neural Bayes estimator for a general censoring level, precluding the need to retrain the network when the censoring level is changed. We illustrate the efficacy of our estimators by making fast inference on hundreds of thousands of high-dimensional spatial extremal dependence models to assess extreme concentrations of particulate matter 2.5 microns or less in diameter (PM2.5) over the whole of Saudi Arabia.
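The abstract states that censoring information is encoded in the network input through data augmentation, without giving the details here. One simple encoding, shown as a hypothetical illustration rather than the paper's construction, replaces censored values (at or below the threshold) with a fixed fill value and adds a binary indicator channel, so the network can distinguish "observed" from "only known to be below the threshold". `encode_censored` and `fill_value` are names introduced for this sketch.

```python
import numpy as np

def encode_censored(z, threshold, fill_value=0.0):
    """Stack (censored data, censoring indicator) as two input channels.

    Hypothetical encoding for illustration; values at or below `threshold`
    are treated as censored, and their exact magnitude is hidden.
    """
    z = np.asarray(z, dtype=float)
    censored = z <= threshold
    values = np.where(censored, fill_value, z)
    return np.stack([values, censored.astype(float)], axis=0)

encoded = encode_censored([0.2, 1.5, 3.0], threshold=1.0)
```

Because the indicator channel carries the censoring pattern explicitly, the same network architecture can in principle be applied whatever the threshold, which is in the spirit of training a single estimator across censoring levels.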
Submitted 18 June, 2024; v1 submitted 27 June, 2023;
originally announced June 2023.
-
Measuring Information Transfer Between Nodes in a Brain Network through Spectral Transfer Entropy
Authors:
Paolo Victor Redondo,
Raphael Huser,
Hernando Ombao
Abstract:
Brain connectivity reflects how different regions of the brain interact during performance of a cognitive task. In studying brain signals such as electroencephalograms (EEG), this may be explored via an information-theoretic causal measure, called transfer entropy (TE), which does not impose any distributional assumption on the variables and covers any form of relationship (beyond linear) between them. To improve the utility of TE in brain signal analysis, we propose a novel methodology to capture cross-channel information transfer in the frequency domain. Specifically, we introduce a new causal measure, the spectral transfer entropy (STE), to quantify the magnitude and direction of information flow from a certain frequency-band oscillation of a channel to an oscillation of another channel. In contrast with previous works on TE in the frequency domain, we differentiate our work by considering an extreme value perspective that employs the maximum magnitude of filtered series within time blocks. The main advantages of our proposed approach are that it is robust to the inherent problems of linear filtering and that it allows adjustments for multiple comparisons to control the family-wise error rate (FWER). Another novel contribution is a simple yet efficient estimation method based on the combination of vine copulas and extreme value theory that enables estimates to capture zero (boundary point) without the need for bias adjustments. With the vine copula representation, a null copula model, which exhibits zero STE, is defined, making significance testing for STE straightforward through a standard resampling approach. Lastly, we illustrate the advantage of our proposed measure through some numerical experiments and provide interesting and novel findings on the analysis of EEG recordings linked to a visual task.
Submitted 25 May, 2023; v1 submitted 11 March, 2023;
originally announced March 2023.
-
Patterns in Spatio-Temporal Extremes
Authors:
Marco Oesting,
Raphaël Huser
Abstract:
In environmental science applications, extreme events frequently exhibit a complex spatio-temporal structure, which is difficult to describe flexibly and estimate in a computationally efficient way using state-of-the-art parametric extreme-value models. In this paper, we propose a computationally-cheap non-parametric approach to investigate the probability distribution of temporal clusters of spatial extremes, and study within-cluster patterns with respect to various characteristics. These include risk functionals describing the overall event magnitude, spatial risk measures such as the size of the affected area, and measures representing the location of the extreme event. Under the framework of functional regular variation, we verify the existence of the corresponding limit distributions as the considered events become increasingly extreme. Furthermore, we develop non-parametric estimators for the limiting expressions of interest and show their asymptotic normality under appropriate mixing conditions. Uncertainty is assessed using a multiplier block bootstrap. The finite-sample behavior of our estimators and the bootstrap scheme is demonstrated in a spatio-temporal simulated example. Our methodology is then applied to study the spatio-temporal dependence structure of high-dimensional sea surface temperature data for the southern Red Sea. Our analysis reveals new insights into the temporal persistence and the complex hydrodynamic patterns of extreme sea temperature events in this region.
Submitted 21 December, 2022;
originally announced December 2022.
-
Club Exco: clustering brain extreme communities from multi-channel EEG data
Authors:
Matheus B. Guerrero,
Hernando Ombao,
Raphaël Huser
Abstract:
Current methods for clustering nodes over time in a brain network rely on cross-dependence measures, which are computed from the entire range of values of the electroencephalogram (EEG) signals, from low to high amplitudes. We here develop the Club Exco method for clustering brain communities that exhibit synchronized extreme behaviors. To cluster multi-channel EEG data, Club Exco uses a spherical $k$-means procedure applied to the "pseudo-angles" derived from extreme absolute amplitudes of EEG signals. With this approach, a cluster center is considered an "extremal prototype," revealing a community of EEG nodes sharing the same extreme behavior, a feature that traditional methods fail to identify. Hence, Club Exco serves as an exploratory tool to classify EEG channels into mutually asymptotically dependent or asymptotically independent groups. It provides insights into how the brain network organizes itself during an extreme event (e.g., an epileptic seizure) in contrast to a baseline state. We apply the Club Exco method to investigate temporal differences in EEG brain connectivity networks of a patient diagnosed with epilepsy, a chronic neurological disorder affecting more than 50 million people globally. Our extreme-value method reveals substantial differences in alpha (8-12 Hz) oscillations across the brain network compared to coherence-based methods.
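The minimal sketch below shows the two ingredients named above on synthetic data: pseudo-angles, obtained by keeping the observations with the largest Euclidean norm of absolute amplitudes and projecting them onto the unit sphere, followed by spherical $k$-means. It is not the authors' implementation; the keep fraction, the absence of any marginal standardisation, the random initialisation and the function names are assumptions.

```python
import math
import random


def pseudo_angles(data, keep_frac=0.1):
    """Project the rows with the largest norm of absolute values onto the unit sphere."""
    absdata = [[abs(v) for v in row] for row in data]
    norms = [math.sqrt(sum(v * v for v in row)) for row in absdata]
    n_keep = max(1, int(keep_frac * len(data)))
    top = sorted(range(len(data)), key=lambda i: norms[i], reverse=True)[:n_keep]
    return [[v / norms[i] for v in absdata[i]] for i in top]


def spherical_kmeans(points, k, n_iter=50, seed=0):
    """k-means with cosine similarity; centres are kept at unit length."""
    rng = random.Random(seed)
    centres = [list(p) for p in rng.sample(points, k)]
    labels = [0] * len(points)
    for _ in range(n_iter):
        # Assignment: highest dot product (points and centres have unit norm).
        labels = [max(range(k), key=lambda j: sum(a * b for a, b in zip(p, centres[j])))
                  for p in points]
        # Update: renormalised sum of the members of each cluster.
        for j in range(k):
            members = [p for p, lab in zip(points, labels) if lab == j]
            if not members:
                continue  # keep the previous centre if a cluster empties
            s = [sum(col) for col in zip(*members)]
            norm = math.sqrt(sum(v * v for v in s))
            centres[j] = [v / norm for v in s]
    return centres, labels


# Synthetic 3-channel data with two planted kinds of extreme event.
rng = random.Random(0)
data = []
for i in range(1000):
    row = [rng.gauss(0.0, 1.0) for _ in range(3)]
    if i % 20 == 0:  # joint extremes in channels 0 and 1
        row[0] += 10.0
        row[1] += 10.0
    elif i % 20 == 10:  # extremes in channel 2 alone
        row[2] += 10.0
    data.append(row)
angles = pseudo_angles(data, keep_frac=0.1)
centres, labels = spherical_kmeans(angles, k=2)
```

Each resulting centre plays the role of an "extremal prototype": here one centre has large weights on channels 0 and 1 and the other concentrates on channel 2, mirroring the two planted communities.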
Submitted 8 December, 2022;
originally announced December 2022.
-
Spatial modeling and future projection of extreme precipitation extents
Authors:
Peng Zhong,
Manuela Brunner,
Thomas Opitz,
Raphaël Huser
Abstract:
Extreme precipitation events with large spatial extents may have more severe impacts than localized events, as they can lead to widespread flooding. It is debated how climate change may affect the spatial extent of precipitation extremes, and investigations of this question often rely directly on simulations from climate models. Here, we use a different strategy to investigate how future changes in spatial extents of precipitation extremes differ across climate zones and seasons in two river basins (Danube and Mississippi). We rely on observed precipitation extremes while exploiting a physics-based mean temperature covariate, which enables us to project future precipitation extents. We incorporate the covariate into newly developed time-varying $r$-Pareto processes using a suitably chosen spatial aggregation functional $r$. This model captures temporal non-stationarity in the spatial dependence structure of precipitation extremes by linking it to the temperature covariate, which we derive from observations for model calibration and from debiased climate simulations (CMIP6) for projections. For both river basins, our results show a negative correlation between the spatial extent and the temperature covariate for most of the rain season and an increasing trend in the margins, indicating a decrease in spatial precipitation extent in a warming climate during rain seasons as precipitation intensity increases locally.
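To make the role of the aggregation functional concrete, this toy sketch treats as "extreme events" the fields whose spatial mean (one possible choice of $r$) exceeds an empirical quantile, and summarises each selected event by the fraction of sites above a site-wise threshold as a crude measure of spatial extent. The choice of $r$, the thresholds, the extent definition and the helper names are assumptions for illustration; fitting a time-varying $r$-Pareto process is not shown.

```python
def spatial_mean(field):
    """One possible aggregation functional r: the average over sites."""
    return sum(field) / len(field)


def select_r_exceedances(fields, r, quantile=0.9):
    """Keep the fields whose r-value exceeds the empirical quantile of all r-values."""
    scores = sorted(r(f) for f in fields)
    u = scores[int(quantile * (len(scores) - 1))]
    return [f for f in fields if r(f) > u], u


def spatial_extent(field, site_thresholds):
    """Fraction of sites at which the field exceeds its site-specific threshold."""
    hits = sum(1 for v, t in zip(field, site_thresholds) if v > t)
    return hits / len(field)


# Usage on five tiny three-site fields.
fields = [[0, 0, 0], [1, 1, 1], [5, 0, 0], [3, 3, 3], [9, 9, 0]]
events, u = select_r_exceedances(fields, spatial_mean, quantile=0.5)
extents = [spatial_extent(f, [2, 2, 2]) for f in events]
```

Changing $r$ changes which fields count as extreme: with a spatial maximum instead of a mean, the localized field `[5, 0, 0]` would be selected, which is why the choice of functional matters when the quantity of interest is spatial extent.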
Submitted 6 December, 2022;
originally announced December 2022.
-
Insights into the drivers and spatio-temporal trends of extreme Mediterranean wildfires with statistical deep-learning
Authors:
Jordan Richards,
Raphaël Huser,
Emanuele Bevacqua,
Jakob Zscheischler
Abstract:
Extreme wildfires are a significant cause of human death and biodiversity destruction within countries that encompass the Mediterranean Basin. Recent worrying trends in wildfire activity (i.e., occurrence and spread) suggest that wildfires are likely to be highly impacted by climate change. In order to facilitate appropriate risk mitigation, we must identify the main drivers of extreme wildfires and assess their spatio-temporal trends, with a view to understanding the impacts of global warming on fire activity. We analyse the monthly burnt area due to wildfires over a region encompassing most of Europe and the Mediterranean Basin from 2001 to 2020, and identify high fire activity during this period in Algeria, Italy and Portugal. We build an extreme quantile regression model with a high-dimensional predictor set describing meteorological conditions, land cover usage, and orography. To model the complex relationships between the predictor variables and wildfires, we use a hybrid statistical deep-learning framework that can disentangle the effects of vapour-pressure deficit (VPD), air temperature, and drought on wildfire activity. Our results highlight that whilst VPD, air temperature, and drought significantly affect wildfire occurrence, only VPD affects wildfire spread. To gain insights into the effect of climate trends on wildfires in the near future, we focus on August 2001 and perturb temperature according to its observed trends (median over Europe: +0.04 K per year). We find that, on average over Europe, these trends lead to a relative increase of 17.1% and 1.6% in the expected frequency and severity, respectively, of wildfires in August 2001, with spatially non-uniform changes in both aspects.
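Quantile regression targets a conditional quantile rather than a mean. One standard device for this, shown below as background rather than as the authors' method (the abstract does not say which loss their hybrid model uses), is the check or pinball loss, whose expected value is minimised by the $\tau$-quantile. The helper names and toy data are hypothetical.

```python
def pinball_loss(y, q, tau):
    """Check loss: tau * (y - q) if y >= q, else (tau - 1) * (y - q)."""
    u = y - q
    return u * (tau - (1.0 if u < 0 else 0.0))


def best_constant_quantile(ys, tau, candidates):
    """Constant predictor minimising the average pinball loss over the sample."""
    return min(candidates, key=lambda q: sum(pinball_loss(y, q, tau) for y in ys) / len(ys))


ys = list(range(1, 101))  # toy responses 1, 2, ..., 100
q90 = best_constant_quantile(ys, 0.9, candidates=ys)
```

With $\tau = 0.9$ the minimiser sits at the empirical 90% quantile (any value between 90 and 91 is optimal here). In a regression model the constant is replaced by a function of the predictors, for example a neural network.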
Submitted 5 June, 2023; v1 submitted 4 December, 2022;
originally announced December 2022.
-
Partial Tail-Correlation Coefficient Applied to Extremal-Network Learning
Authors:
Yan Gong,
Peng Zhong,
Thomas Opitz,
Raphaël Huser
Abstract:
We propose a novel extremal dependence measure called the partial tail-correlation coefficient (PTCC), in analogy to the partial correlation coefficient in classical multivariate analysis. The construction of our new coefficient is based on the framework of multivariate regular variation and transformed-linear algebra operations. We show how this coefficient allows us to identify pairs of variables that have partially uncorrelated tails given the other variables in a random vector. Unlike other recently introduced conditional independence frameworks for extremes, our approach requires minimal modeling assumptions and can thus be used in exploratory analyses to learn the structure of extremal graphical models. As in traditional Gaussian graphical models, where edges correspond to the non-zero entries of the precision matrix, we can exploit classical inference methods for high-dimensional data, such as the graphical LASSO with Laplacian spectral constraints, to efficiently learn the extremal network structure via the PTCC. We apply our new method to study extreme risk networks in two different datasets (extreme river discharges and historical global currency exchange data) and show that we can extract extremal structures with meaningful domain-specific interpretations.
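In the Gaussian analogy drawn above, a partial correlation is read off the precision matrix $Q$ as $-Q_{ij}/\sqrt{Q_{ii}Q_{jj}}$. The sketch below applies that formula to the inverse of a tail pairwise dependence matrix (TPDM), on the assumption that this mirrors how the PTCC relates to the TPDM. The TPDM entries are made up (chosen so that its inverse is tridiagonal), and neither TPDM estimation nor the graphical LASSO step is shown.

```python
import math


def invert(m):
    """Gauss-Jordan inverse of a small square matrix with partial pivoting."""
    n = len(m)
    a = [list(map(float, row)) + [1.0 if i == j else 0.0 for j in range(n)]
         for i, row in enumerate(m)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(a[r][col]))
        a[col], a[pivot] = a[pivot], a[col]
        p = a[col][col]
        a[col] = [v / p for v in a[col]]
        for r in range(n):
            if r != col:
                f = a[r][col]
                a[r] = [vr - f * vc for vr, vc in zip(a[r], a[col])]
    return [row[n:] for row in a]


def partial_tail_correlations(tpdm):
    """Off-diagonal entries -Q_ij / sqrt(Q_ii * Q_jj), with Q the inverse TPDM."""
    q = invert(tpdm)
    n = len(q)
    return [[1.0 if i == j else -q[i][j] / math.sqrt(q[i][i] * q[j][j])
             for j in range(n)] for i in range(n)]


# Hypothetical TPDM whose inverse is tridiagonal, so variables 0 and 2 are
# "partially tail-uncorrelated" given variable 1.
tpdm = [[0.75, 0.50, 0.25],
        [0.50, 1.00, 0.50],
        [0.25, 0.50, 0.75]]
ptcc = partial_tail_correlations(tpdm)
```

Although variables 0 and 2 have a non-zero tail dependence entry (0.25), their coefficient is zero once variable 1 is accounted for, so an extremal graph would have no edge between them. With estimated, noisy matrices such exact zeros do not occur, which is where a sparsity-inducing method such as the graphical LASSO comes in.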
Submitted 22 November, 2022; v1 submitted 13 October, 2022;
originally announced October 2022.
-
Flexible Modeling of Nonstationary Extremal Dependence using Spatially-Fused LASSO and Ridge Penalties
Authors:
Xuanjie Shao,
Arnab Hazra,
Jordan Richards,
Raphaël Huser
Abstract:
Statistical modeling of a nonstationary spatial extremal dependence structure is challenging. Max-stable processes are common choices for modeling spatially-indexed block maxima, where stationarity is usually assumed to make inference feasible. However, this assumption is often unrealistic for data observed over a large or complex domain. We propose a computationally efficient method for estimating extremal dependence using a globally nonstationary, but locally stationary, max-stable process by exploiting nonstationary kernel convolutions. We divide the spatial domain into a fine grid of subregions, assign each of them its own dependence parameters, and use LASSO ($L_1$) or ridge ($L_2$) penalties to obtain spatially smooth parameter estimates. We then develop a novel data-driven algorithm to merge homogeneous neighboring subregions. The algorithm facilitates model parsimony and interpretability. To make our model suitable for high-dimensional data, we exploit a pairwise likelihood to draw inferences and discuss computational and statistical efficiency. An extensive simulation study demonstrates the superior performance of our proposed model and the subregion-merging algorithm over approaches that either do not model nonstationarity or do not update the domain partition. We apply our proposed method to model monthly maximum temperatures at over 1400 sites in Nepal and the surrounding Himalayan and sub-Himalayan regions; we again observe significant improvements in model fit compared to a stationary process and a nonstationary process without subregion-merging. Furthermore, we demonstrate that the estimated merged partition is interpretable from a geographic perspective and leads to better model diagnostics by adequately reducing the number of subregion-specific parameters.
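The penalisation step can be pictured with the hedged sketch below: each subregion on a regular grid carries one dependence parameter, and the penalty sums absolute ($L_1$) or squared ($L_2$) differences over horizontally and vertically adjacent subregions. The rook-adjacency neighbourhood, the single scalar parameter per cell, the helper names and the placeholder negative pairwise log-likelihood value are assumptions; the max-stable pairwise likelihood and the merging algorithm are not shown.

```python
def fused_penalty(theta, kind="lasso"):
    """Sum of |diff| (lasso) or diff**2 (ridge) over adjacent grid cells."""
    rows, cols = len(theta), len(theta[0])
    total = 0.0
    for i in range(rows):
        for j in range(cols):
            # Look only down and right so each neighbouring pair is counted once.
            for ni, nj in ((i + 1, j), (i, j + 1)):
                if ni < rows and nj < cols:
                    d = theta[i][j] - theta[ni][nj]
                    total += abs(d) if kind == "lasso" else d * d
    return total


def penalised_objective(neg_pairwise_loglik, theta, lam, kind="lasso"):
    """Objective to minimise: model misfit plus lam times the spatial fusion penalty."""
    return neg_pairwise_loglik + lam * fused_penalty(theta, kind)


theta = [[1.0, 1.0],
         [1.0, 3.0]]  # hypothetical range parameters on a 2 x 2 grid of subregions
obj = penalised_objective(neg_pairwise_loglik=100.0, theta=theta, lam=0.5)
```

As is usual for lasso- versus ridge-type penalties, the $L_1$ version tends to set some neighbouring differences exactly to zero, giving piecewise-constant parameter surfaces, whereas the $L_2$ version shrinks differences smoothly without zeroing them.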
Submitted 30 April, 2024; v1 submitted 11 October, 2022;
originally announced October 2022.
-
An Efficient Workflow for Modelling High-Dimensional Spatial Extremes
Authors:
Silius M. Vandeskog,
Sara Martino,
Raphaël Huser
Abstract:
A successful model for high-dimensional spatial extremes should, in principle, be able to describe both weakening extremal dependence at increasing levels and changes in the type of extremal dependence class as a function of the distance between locations. Furthermore, the model should allow for computationally tractable inference using methods that efficiently extract information from data and are robust to model misspecification. In this paper, we demonstrate how to fulfil all these requirements by developing a comprehensive methodological workflow for efficient Bayesian modelling of high-dimensional spatial extremes using the spatial conditional extremes model while performing fast inference with R-INLA. We then propose a post hoc adjustment method that results in more robust inference by properly accounting for possible model misspecification. The developed methodology is applied to model extreme hourly precipitation from high-resolution radar data in Norway. Inference is computationally efficient, and the resulting model fit successfully captures the main trends in the extremal dependence structure of the data. Robustifying the model fit by adjusting for possible misspecification further improves model performance.
Submitted 13 December, 2022; v1 submitted 3 October, 2022;
originally announced October 2022.
-
Likelihood-Free Parameter Estimation with Neural Bayes Estimators
Authors:
Matthew Sainsbury-Dale,
Andrew Zammit-Mangion,
Raphaël Huser
Abstract:
Neural point estimators are neural networks that map data to parameter point estimates. They are fast, likelihood free and, due to their amortised nature, amenable to fast bootstrap-based uncertainty quantification. In this paper, we aim to raise statisticians' awareness of this relatively new inferential tool, and to facilitate its adoption by providing user-friendly open-source software. We also give attention to the ubiquitous problem of making inference from replicated data, which we address in the neural setting using permutation-invariant neural networks. Through extensive simulation studies, we show that these neural point estimators can quickly and optimally (in a Bayes sense) estimate parameters in weakly-identified and highly-parameterised models. We demonstrate their applicability through an analysis of extreme sea-surface temperature in the Red Sea where, after training, we obtain parameter estimates and bootstrap-based confidence intervals from hundreds of spatial fields in a fraction of a second.
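Permutation invariance for replicated data is commonly obtained with a DeepSets-style architecture, $\psi\big(\tfrac{1}{m}\sum_{i=1}^{m} \phi(Z_i)\big)$, where $\phi$ acts on each replicate and $\psi$ on the pooled summary. The pure-Python sketch below uses fixed random weights (it is untrained), a single ReLU layer for $\phi$, a linear $\psi$ and mean pooling; these are illustrative assumptions, not the architecture of the authors' software. Training would tune the weights to minimise an empirical Bayes risk over simulated parameter-data pairs.

```python
import random


class DeepSetEstimator:
    """Untrained permutation-invariant map psi(mean_i phi(z_i)) with random weights."""

    def __init__(self, in_dim, hidden, out_dim, seed=0):
        rng = random.Random(seed)
        self.w_phi = [[rng.gauss(0.0, 0.5) for _ in range(in_dim)] for _ in range(hidden)]
        self.b_phi = [rng.gauss(0.0, 0.1) for _ in range(hidden)]
        self.w_psi = [[rng.gauss(0.0, 0.5) for _ in range(hidden)] for _ in range(out_dim)]
        self.b_psi = [rng.gauss(0.0, 0.1) for _ in range(out_dim)]

    def phi(self, z):
        # One ReLU layer applied to a single replicate.
        return [max(0.0, sum(w * x for w, x in zip(row, z)) + b)
                for row, b in zip(self.w_phi, self.b_phi)]

    def __call__(self, replicates):
        feats = [self.phi(z) for z in replicates]
        # Mean pooling is the only step that mixes replicates, and it ignores their order.
        pooled = [sum(col) / len(feats) for col in zip(*feats)]
        # Linear outer map psi from the pooled summary to parameter estimates.
        return [sum(w * t for w, t in zip(row, pooled)) + b
                for row, b in zip(self.w_psi, self.b_psi)]


rng = random.Random(42)
replicates = [[rng.gauss(0.0, 1.0) for _ in range(4)] for _ in range(10)]  # 10 replicates, 4-dim each
estimator = DeepSetEstimator(in_dim=4, hidden=8, out_dim=2)
est = estimator(replicates)
est_shuffled = estimator(list(reversed(replicates)))
```

Reordering the replicates leaves the output unchanged up to floating-point rounding, and the same network accepts any number of replicates, because pooling reduces them to a fixed-length summary.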
Submitted 4 October, 2023; v1 submitted 27 August, 2022;
originally announced August 2022.
-
Regression modelling of spatiotemporal extreme U.S. wildfires via partially-interpretable neural networks
Authors:
Jordan Richards,
Raphaël Huser
Abstract:
Risk management in many environmental settings requires an understanding of the mechanisms that drive extreme events. Useful metrics for quantifying such risk are extreme quantiles of response variables conditioned on predictor variables that describe, e.g., climate, biosphere and environmental states. Typically, these quantiles lie outside the range of observable data and so, for estimation, require specification of parametric extreme value models within a regression framework. Classical approaches in this context utilise linear or additive relationships between predictor and response variables and suffer in either their predictive capabilities or computational efficiency; moreover, their simplicity is unlikely to capture the truly complex structures that lead to the creation of extreme wildfires. In this paper, we propose a new methodological framework for performing extreme quantile regression using artificial neural networks, which are able to capture complex non-linear relationships and scale well to high-dimensional data. The "black box" nature of neural networks means that they lack the desirable trait of interpretability often favoured by practitioners; thus, we unify linear and additive regression methodology with deep learning to create partially-interpretable neural networks that can be used for statistical inference but retain high prediction accuracy. To complement this methodology, we further propose a novel point process model for extreme values which overcomes the finite lower-endpoint problem associated with the generalised extreme value class of distributions. The efficacy of our unified framework is illustrated on U.S. wildfire data with a high-dimensional predictor set, where we demonstrate vast improvements in predictive performance over linear and spline-based regression techniques.
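The partially-interpretable idea can be sketched as an additive predictor that splits the covariates into a set entering linearly, whose coefficients keep their usual interpretation, and a set passed through a neural network. This is a hedged illustration: the split, the tiny untrained tanh network, the class name and the coefficient values are hypothetical, and the additive (spline) component and the link from this predictor to an extreme-value distribution parameter are omitted.

```python
import math
import random


class PartiallyInterpretablePredictor:
    """eta(x) = beta0 + sum_k beta_k * x_lin[k] + NN(x_nn), with an untrained tanh network."""

    def __init__(self, n_lin, n_nn, hidden=5, seed=0):
        rng = random.Random(seed)
        self.beta0 = 0.0
        self.beta = [0.0] * n_lin  # interpretable coefficients (would be estimated)
        self.w1 = [[rng.gauss(0.0, 0.5) for _ in range(n_nn)] for _ in range(hidden)]
        self.b1 = [rng.gauss(0.0, 0.1) for _ in range(hidden)]
        self.w2 = [rng.gauss(0.0, 0.5) for _ in range(hidden)]

    def linear_part(self, x_lin):
        return self.beta0 + sum(b * v for b, v in zip(self.beta, x_lin))

    def nn_part(self, x_nn):
        hidden = [math.tanh(sum(w * v for w, v in zip(row, x_nn)) + b)
                  for row, b in zip(self.w1, self.b1)]
        return sum(w * h for w, h in zip(self.w2, hidden))

    def __call__(self, x_lin, x_nn):
        # Additive split: linear effects are read off directly; the network captures the rest.
        return self.linear_part(x_lin) + self.nn_part(x_nn)


model = PartiallyInterpretablePredictor(n_lin=2, n_nn=3)
model.beta0, model.beta = 0.1, [0.8, -0.3]  # hypothetical fitted linear coefficients
x_nn = [0.2, -1.0, 0.5]
effect = model([1.0, 2.0], x_nn) - model([0.0, 2.0], x_nn)
```

Because the network never sees the linearly modelled covariates, raising one of them by one unit changes the predictor by exactly its coefficient (0.8 here), whatever the network has learned.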
Submitted 7 March, 2024; v1 submitted 16 August, 2022;
originally announced August 2022.
-
Functional-Coefficient Models for Multivariate Time Series in Designed Experiments: with Applications to Brain Signals
Authors:
Paolo Victor Redondo,
Raphaël Huser,
Hernando Ombao
Abstract:
To study the neurophysiological basis of attention deficit hyperactivity disorder (ADHD), clinicians use electroencephalography (EEG), which records neuronal electrical activity on the cortex. The most commonly used metric in ADHD is the theta-to-beta spectral power ratio (TBR), which is based on a single-channel analysis. However, initial findings for this measure have not been replicated in other studies. Thus, instead of focusing on single-channel spectral power, a novel model for investigating interactions (dependence) between channels in the entire network is proposed. Although dependence measures such as coherence and partial directed coherence (PDC) are well explored in studying brain connectivity, these measures only capture linear dependence. Moreover, in designed clinical experiments, these dependence measures are observed to vary across subjects even within a homogeneous group. To address these limitations, we propose the mixed-effects functional-coefficient autoregressive (MX-FAR) model, which captures between-subject variation by incorporating subject-specific random effects. The advantages of the MX-FAR model are the following: (1.) it captures potential non-linear dependence between channels; (2.) it is nonparametric and hence flexible and robust to model mis-specification; (3.) it can capture differences between groups when they exist; (4.) it accounts for variation across subjects; (5.) the framework easily incorporates well-known inference methods from mixed-effects models; (6.) it can be generalized to accommodate various covariates and factors. Finally, we apply the proposed MX-FAR model to analyze multichannel EEG signals and report novel findings on altered brain functional networks in ADHD.
Submitted 8 August, 2022; v1 submitted 30 July, 2022;
originally announced August 2022.
-
Flexible Modeling of Multivariate Spatial Extremes
Authors:
Yan Gong,
Raphaël Huser
Abstract:
We develop a novel multi-factor copula model for multivariate spatial extremes, which is designed to capture the different combinations of marginal and cross-extremal dependence structures within and across different spatial random fields. Our proposed model can capture all possible distinct combinations of extremal dependence structures within each individual spatial process while allowing flexible cross-process extremal dependence structures for both upper and lower tails. We show how to perform Bayesian inference for the proposed model using a Markov chain Monte Carlo algorithm based on carefully designed block proposals with an adaptive step size. In our real data application, we apply our model to study the upper and lower extremal dependence structures of the daily maximum air temperature (TMAX) and daily minimum air temperature (TMIN) from the state of Alabama in the southeastern United States. The fitted multivariate spatial model is found to provide a good fit in the lower and upper joint tails, both in terms of the spatial dependence structure within each individual process, as well as in terms of the cross-process dependence structure. Our results suggest that the TMAX and TMIN processes are quite strongly spatially dependent over the state of Alabama, and moderately cross-dependent. From a practical perspective, this implies that it may be worthwhile to model them jointly when interest lies in computing spatial risk measures that involve both quantities.
Submitted 22 June, 2022;
originally announced June 2022.
-
Joint modeling of landslide counts and sizes using spatial marked point processes with sub-asymptotic mark distributions
Authors:
Rishikesh Yadav,
Raphaël Huser,
Thomas Opitz,
Luigi Lombardo
Abstract:
To accurately quantify landslide hazard in a region of Turkey, we develop new marked point process models within a Bayesian hierarchical framework for the joint prediction of landslide counts and sizes. To account for the dominant role of the few largest landslides in aggregated sizes, we leverage mark distributions with strong justification from extreme-value theory, thus bridging the two broad areas of statistics of extremes and marked point patterns. At the data level, we assume a Poisson distribution for landslide counts, while we compare different "sub-asymptotic" distributions for landslide sizes to flexibly model their upper and lower tails. At the latent level, Poisson intensities and the median of the size distribution vary spatially in terms of fixed and random effects, with shared spatial components capturing cross-correlation between landslide counts and sizes. We robustly model spatial dependence using intrinsic conditional autoregressive priors. Our novel models are fitted efficiently using a customized adaptive Markov chain Monte Carlo algorithm. We show that, for our dataset, sub-asymptotic mark distributions provide improved predictions of large landslide sizes compared to more traditional choices. To showcase the benefits of joint occurrence-size models and illustrate their usefulness for risk assessment, we map landslide hazard along major roads.
Submitted 19 May, 2022;
originally announced May 2022.
-
Vecchia Likelihood Approximation for Accurate and Fast Inference in Intractable Spatial Extremes Models
Authors:
Raphaël Huser,
Michael L. Stein,
Peng Zhong
Abstract:
Max-stable processes are the most popular models for high-impact spatial extreme events, as they arise as the only possible limits of spatially-indexed block maxima. However, likelihood inference for such models suffers severely from the curse of dimensionality, since the likelihood function involves a combinatorially exploding number of terms. In this paper, we propose using the Vecchia approximation, which conveniently decomposes the full joint density into a linear number of low-dimensional conditional density terms based on well-chosen conditioning sets designed to improve and accelerate inference in high dimensions. Theoretical asymptotic relative efficiencies in the Gaussian setting and simulation experiments in the max-stable setting show significant efficiency gains and computational savings using the Vecchia likelihood approximation method compared to traditional composite likelihoods. Our application to extreme sea surface temperature data at more than a thousand sites across the entire Red Sea further demonstrates the superiority of the Vecchia likelihood approximation for fitting complex models with intractable likelihoods, delivering significantly better results than traditional composite likelihoods, and accurately capturing the extremal dependence structure at lower computational cost.
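The decomposition behind the Vecchia approximation is $\log f(y_1,\dots,y_n) \approx \sum_{i=1}^{n} \log f(y_i \mid y_{c(i)})$, where $c(i)$ is a small set of previously ordered locations. The sketch below illustrates it only in the Gaussian setting that the abstract uses for its efficiency comparisons, not for max-stable processes: a zero-mean process on a line with an exponential covariance (range 0.5, chosen arbitrarily), nearest-previous-neighbour conditioning sets, hypothetical observations, and Cholesky-based solves. Because that process is Markov in one dimension, conditioning on a single previous neighbour already reproduces the exact log-likelihood, which gives a convenient check.

```python
import math


def cholesky(a):
    """Lower-triangular L with a = L L^T (a must be symmetric positive definite)."""
    n = len(a)
    L = [[0.0] * n for _ in range(n)]
    for i in range(n):
        for j in range(i + 1):
            s = a[i][j] - sum(L[i][k] * L[j][k] for k in range(j))
            L[i][j] = math.sqrt(s) if i == j else s / L[j][j]
    return L


def forward_solve(L, b):
    """Solve L z = b for lower-triangular L."""
    z = []
    for i in range(len(b)):
        z.append((b[i] - sum(L[i][k] * z[k] for k in range(i))) / L[i][i])
    return z


def solve_spd(a, b):
    """Solve a x = b via Cholesky: forward, then backward substitution with L^T."""
    L = cholesky(a)
    z = forward_solve(L, b)
    n = len(b)
    x = [0.0] * n
    for i in reversed(range(n)):
        x[i] = (z[i] - sum(L[k][i] * x[k] for k in range(i + 1, n))) / L[i][i]
    return x


def exact_logpdf(y, locs, cov_fn):
    """Exact zero-mean Gaussian log density."""
    n = len(y)
    cov = [[cov_fn(abs(s - t)) for t in locs] for s in locs]
    L = cholesky(cov)
    z = forward_solve(L, y)
    return (-0.5 * sum(v * v for v in z) - sum(math.log(L[i][i]) for i in range(n))
            - 0.5 * n * math.log(2 * math.pi))


def vecchia_logpdf(y, locs, cov_fn, m):
    """Sum of univariate conditional log densities given up to m nearest previous locations."""
    total = 0.0
    for i in range(len(y)):
        prev = sorted(range(i), key=lambda j: abs(locs[j] - locs[i]))[:m]
        mean, var = 0.0, cov_fn(0.0)
        if prev:
            s_cc = [[cov_fn(abs(locs[a] - locs[b])) for b in prev] for a in prev]
            s_ic = [cov_fn(abs(locs[i] - locs[j])) for j in prev]
            w = solve_spd(s_cc, s_ic)  # kriging weights
            mean = sum(wk * y[j] for wk, j in zip(w, prev))
            var -= sum(wk * s for wk, s in zip(w, s_ic))
        total += -0.5 * (math.log(2 * math.pi * var) + (y[i] - mean) ** 2 / var)
    return total


def exp_cov(h, range_=0.5):
    return math.exp(-h / range_)


locs = [0.0, 0.3, 0.7, 1.2, 1.5, 2.4]  # ordered 1-D locations
y = [0.2, -0.1, 0.5, 1.0, 0.3, -0.4]  # hypothetical observations
exact = exact_logpdf(y, locs, exp_cov)
approx_m1 = vecchia_logpdf(y, locs, exp_cov, m=1)
```

With $m$ fixed, each term involves at most $m+1$ variables, so the number of terms grows only linearly in $n$. In two dimensions, or for non-Markov dependence, the approximation is no longer exact, and the ordering and choice of conditioning sets affect its accuracy.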
Submitted 10 March, 2022;
originally announced March 2022.
-
Joint Modeling and Prediction of Massive Spatio-Temporal Wildfire Count and Burnt Area Data with the INLA-SPDE Approach
Authors:
Zhongwei Zhang,
Elias Krainski,
Peng Zhong,
Håvard Rue,
Raphaël Huser
Abstract:
This paper describes the methodology used by the team RedSea in the data competition organized for the EVA 2021 conference. We develop a novel two-part model to jointly describe the wildfire count data and burnt area data provided by the competition organizers, using the accompanying covariates. Our proposed methodology relies on the integrated nested Laplace approximation combined with the stochastic partial differential equation (INLA-SPDE) approach. In the first part, a binary non-stationary spatio-temporal model is used to describe the underlying process that determines whether or not there is a wildfire at a specific time and location. In the second part, we consider a non-stationary model that is based on log-Gaussian Cox processes for positive wildfire count data, and a non-stationary log-Gaussian model for positive burnt area data. Dependence between the positive count data and positive burnt area data is captured by a shared spatio-temporal random effect. Our two-part modeling approach performs well in terms of the prediction score criterion chosen by the data competition organizers. Moreover, our model results show that surface pressure is the most influential driver for the occurrence of a wildfire, whilst surface net solar radiation and surface pressure are the key drivers for large numbers of wildfires, and temperature and evaporation are the key drivers of large burnt areas.
Submitted 14 February, 2022;
originally announced February 2022.
-
A combined statistical and machine learning approach for spatial prediction of extreme wildfire frequencies and sizes
Authors:
Daniela Cisneros,
Yan Gong,
Rishikesh Yadav,
Arnab Hazra,
Raphael Huser
Abstract:
Motivated by the Extreme Value Analysis 2021 (EVA 2021) data challenge, we propose a method based on statistics and machine learning for the spatial prediction of extreme wildfire frequencies and sizes. This method is tailored to handle large datasets, including missing observations. Our approach relies on a four-stage high-dimensional bivariate sparse spatial model for zero-inflated data, which is developed using stochastic partial differential equations (SPDE). In Stage 1, the observations are classified as zero or nonzero and are modeled using a two-layered hierarchical Bayesian sparse spatial model to estimate the probabilities of these two categories. In Stage 2, before modeling the positive observations using spatially-varying coefficients, smoothed parameter surfaces are obtained from empirical estimates using fixed rank kriging. This approximate Bayesian inference method was employed to avoid the high computational burden of large spatial data modeling using spatially-varying coefficients. In Stage 3, the standardized log-transformed positive observations from the second stage are further modeled using a sparse bivariate spatial Gaussian process. The Gaussian distribution assumption for wildfire counts developed in the third stage is computationally effective but erroneous. Thus, in Stage 4, the predicted values are rectified using Random Forests. Posterior inference for Stages 1 and 3 is drawn using Markov chain Monte Carlo (MCMC) sampling. A cross-validation scheme is then created for the artificially generated gaps, and the EVA 2021 prediction scores of the proposed model are compared to those obtained using certain natural competitors.
Submitted 29 December, 2021;
originally announced December 2021.
-
Efficient Modeling of Spatial Extremes over Large Geographical Domains
Authors:
Arnab Hazra,
Raphaël Huser,
David Bolin
Abstract:
Various natural phenomena exhibit spatial extremal dependence at short spatial distances. However, existing models proposed in the spatial extremes literature often assume that extremal dependence persists across the entire domain. This is a strong limitation when modeling extremes over large geographical domains, and yet it has been mostly overlooked in the literature. We here develop a more realistic Bayesian framework based on a novel Gaussian scale mixture model, with the Gaussian process component defined by a stochastic partial differential equation yielding a sparse precision matrix, and the random scale component modeled as a low-rank Pareto-tailed or Weibull-tailed spatial process determined by compactly-supported basis functions. We show that our proposed model is approximately tail-stationary and that it can capture a wide range of extremal dependence structures. Its inherently sparse structure allows fast Bayesian computations in high spatial dimensions based on a customized Markov chain Monte Carlo algorithm prioritizing calibration in the tail. We fit our model to analyze heavy monsoon rainfall data in Bangladesh. Our study shows that our model outperforms natural competitors and that it fits precipitation extremes well. We finally use the fitted model to draw inference on long-term return levels for marginal precipitation and spatial aggregates.
Submitted 30 April, 2024; v1 submitted 19 December, 2021;
originally announced December 2021.
-
A flexible Bayesian hierarchical modeling framework for spatially dependent peaks-over-threshold data
Authors:
Rishikesh Yadav,
Raphaël Huser,
Thomas Opitz
Abstract:
In this work, we develop a constructive modeling framework for extreme threshold exceedances in repeated observations of spatial fields, based on general product mixtures of random fields possessing light- or heavy-tailed margins and various spatial dependence characteristics, which are suitably designed to provide high flexibility in the tail and at sub-asymptotic levels. Our proposed model is akin to a recently proposed Gamma-Gamma model using a ratio of processes with Gamma marginal distributions, but it possesses a higher degree of flexibility in its joint tail structure, capturing strong dependence more easily. We focus on constructions with the following three product factors, whose different roles ensure their statistical identifiability: a heavy-tailed spatially-dependent field, a lighter-tailed spatially-constant field, and another lighter-tailed spatially-independent field. Thanks to the model's hierarchical formulation, inference may be conveniently performed based on Markov chain Monte Carlo methods. We leverage the Metropolis-adjusted Langevin algorithm (MALA) with random block proposals for latent variables, as well as the stochastic gradient Langevin dynamics (SGLD) algorithm for hyperparameters, to fit our proposed model very efficiently in relatively high spatio-temporal dimensions, while simultaneously censoring non-threshold exceedances and performing spatial prediction at multiple sites. The censoring mechanism is applied to the spatially independent component, such that only univariate cumulative distribution functions have to be evaluated. We explore the theoretical properties of the novel model, and illustrate the proposed methodology by simulation and application to daily precipitation data from North-Eastern Spain measured at about 100 stations over the period 2011-2020.
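The claim that censoring only requires univariate distribution functions can be seen in a small sketch. Conditional on the latent product m(s) = X1(s)X2 of the spatially dependent and spatially constant factors, each observation is Y(s) = m(s)X3(s) with X3(s) independent across sites, so the censored log-likelihood splits into per-site terms. Here X3 is assumed to be Exp(1) purely for illustration; the paper's choice of lighter-tailed distribution may differ, and the function names are hypothetical.

```python
import math

def censored_loglik(y, latent_scale, u):
    """Conditional log-likelihood of one site given the latent product m = X1(s) * X2.

    Hypothetical sketch: the spatially independent factor X3(s) is taken to be Exp(1),
    so Y(s) = m * X3(s) and only its univariate CDF and density are needed.
    """
    m = latent_scale
    if y <= u:
        # censored below threshold u: log P(m * X3 <= u) = log F3(u / m)
        return math.log1p(-math.exp(-u / m))
    # threshold exceedance: log density of m * X3 at y, i.e. log f3(y / m) - log m
    return -y / m - math.log(m)

def total_loglik(obs, latent, u):
    """Sum of per-site terms; conditional independence given the latent field makes this valid."""
    return sum(censored_loglik(y, m, u) for y, m in zip(obs, latent))

print(total_loglik([0.4, 3.0], [2.0, 2.0], u=1.0))
```

No multivariate Gaussian or multifold integral appears, which is what makes MCMC over the latent variables tractable.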
Submitted 17 December, 2021;
originally announced December 2021.
-
Latent Gaussian Models for High-Dimensional Spatial Extremes
Authors:
Arnab Hazra,
Raphaël Huser,
Árni V. Jóhannesson
Abstract:
In this chapter, we show how to efficiently model high-dimensional extreme peaks-over-threshold events over space in complex non-stationary settings, using extended latent Gaussian models (LGMs), and how to exploit the fitted model in practice for the computation of long-term return levels. The extended LGM framework assumes that the data follow a specific parametric distribution, whose unknown parameters are transformed using a multivariate link function and are then further modeled at the latent level in terms of fixed and random effects that have a joint Gaussian distribution. In the extremal context, we here assume that the data-level distribution is described by a Poisson point process likelihood, which is motivated by asymptotic extreme-value theory and conveniently exploits information from all threshold exceedances. This contrasts with the more common, data-wasteful approach based on block maxima, which are typically modeled with the generalized extreme-value (GEV) distribution. When conditional independence can be assumed at the data level and latent random effects have a sparse probabilistic structure, fast approximate Bayesian inference becomes possible in very high dimensions, and we here present the recently proposed inference approach called "Max-and-Smooth", which provides exceptional speed-up compared to alternative methods. The proposed methodology is illustrated by application to satellite-derived precipitation data over Saudi Arabia, obtained from the Tropical Rainfall Measuring Mission, with 2738 grid cells and about 20 million spatio-temporal observations in total. Our fitted model captures the spatial variability of extreme precipitation satisfactorily, and our results show that the most intense precipitation events are expected near the south-western part of Saudi Arabia, along the Red Sea coastline.
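For readers unfamiliar with the Poisson point process likelihood, a minimal sketch of the standard single-site form is given below (the textbook expression in Coles' parametrization, with parameters on the annual-maximum GEV scale). It is not the chapter's spatial implementation: covariates, links and latent effects are omitted, and only the case ξ ≠ 0 is handled. It shows how every exceedance contributes a density term while the sub-threshold data enter only through the expected number of exceedances.

```python
import math

def pp_loglik(x, u, mu, sigma, xi, n_years):
    """Poisson point-process log-likelihood for the exceedances of threshold u (xi != 0).

    (mu, sigma, xi) are on the scale of the annual-maximum GEV distribution and n_years
    is the length of the record in years; observations below u enter only through the
    expected number of exceedances.
    """
    if sigma <= 0.0:
        return -math.inf
    zu = 1.0 + xi * (u - mu) / sigma
    if zu <= 0.0:
        return -math.inf
    ll = -n_years * zu ** (-1.0 / xi)          # minus the expected number of exceedances
    for obs in x:
        if obs <= u:
            continue
        z = 1.0 + xi * (obs - mu) / sigma
        if z <= 0.0:
            return -math.inf                    # exceedance outside the GEV support
        ll += -math.log(sigma) - (1.0 + 1.0 / xi) * math.log(z)
    return ll

print(pp_loglik([0.5, 3.0], u=1.0, mu=0.0, sigma=1.0, xi=0.5, n_years=2))
```

Because the parameters share the GEV interpretation, return levels follow directly from the fitted (mu, sigma, xi), which is one reason this likelihood suits the return-level computations described above.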
Submitted 6 October, 2021;
originally announced October 2021.
-
Practical strategies for GEV-based regression models for extremes
Authors:
Daniela Castro-Camilo,
Raphaël Huser,
Håvard Rue
Abstract:
The generalised extreme value (GEV) distribution is a three-parameter family that describes the asymptotic behaviour of properly renormalised maxima of a sequence of independent and identically distributed random variables. If the shape parameter $ξ$ is zero, the GEV distribution has unbounded support, whereas if $ξ$ is positive, the limiting distribution is heavy-tailed with an infinite upper endpoint but a finite lower endpoint. In practical applications, we assume that the GEV family is a reasonable approximation for the distribution of maxima over blocks, and we fit it accordingly. This implies that GEV properties, such as a finite lower endpoint in the case $ξ>0$, are inherited by the finite-sample maxima, which might not have bounded support. This is particularly problematic when predicting extreme observations based on multiple and interacting covariates. To tackle this usually overlooked issue, we propose a blended GEV (bGEV) distribution, which smoothly combines the left tail of a Gumbel distribution (GEV with $ξ=0$) with the right tail of a Fréchet distribution (GEV with $ξ>0$) and, therefore, has unbounded support. Using a Bayesian framework, we reparametrise the GEV distribution to offer a more natural interpretation of the (possibly covariate-dependent) model parameters. Independent priors over the new location and spread parameters induce a joint prior distribution for the original location and scale parameters. We introduce the concept of property-preserving penalised complexity (P$^3$C) priors and apply it to the shape parameter to preserve first and second moments. We illustrate our methods with an application to NO$_2$ pollution levels in California, which reveals the robustness of the bGEV distribution, as well as the suitability of the new parametrisation and the P$^3$C prior framework.
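The support issue that motivates the bGEV can be checked numerically. The sketch below evaluates the GEV distribution function $G(x)=\exp\{-[1+ξ(x-μ)/σ]^{-1/ξ}\}$ and its lower endpoint $μ-σ/ξ$ for $ξ>0$; function names are illustrative, and the blending step of the bGEV itself is not reproduced here.

```python
import math

def gev_cdf(x, mu, sigma, xi):
    """GEV distribution function G(x) = exp(-[1 + xi (x - mu)/sigma]^(-1/xi))."""
    if xi == 0.0:  # Gumbel limit: support is the whole real line
        return math.exp(-math.exp(-(x - mu) / sigma))
    t = 1.0 + xi * (x - mu) / sigma
    if t <= 0.0:
        # outside the support: below the lower endpoint if xi > 0, above the upper one if xi < 0
        return 0.0 if xi > 0 else 1.0
    return math.exp(-t ** (-1.0 / xi))

def lower_endpoint(mu, sigma, xi):
    """Finite lower endpoint mu - sigma/xi when xi > 0; unbounded below otherwise."""
    return mu - sigma / xi if xi > 0 else -math.inf

print(lower_endpoint(0.0, 1.0, 0.5))      # -2.0
print(gev_cdf(-2.5, 0.0, 1.0, 0.5))       # 0.0: below the Frechet-type lower endpoint
print(gev_cdf(-2.5, 0.0, 1.0, 0.0) > 0)   # True: the Gumbel case has unbounded support
```

With covariate-dependent μ, σ and ξ, a single low observation lying below μ - σ/ξ makes the GEV likelihood zero, which is exactly the fragility that replacing the left tail with a Gumbel tail avoids.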
Submitted 7 May, 2022; v1 submitted 24 June, 2021;
originally announced June 2021.
-
Modeling spatial extremes using normal mean-variance mixtures
Authors:
Zhongwei Zhang,
Raphaël Huser,
Thomas Opitz,
Jennifer L. Wadsworth
Abstract:
Classical models for multivariate or spatial extremes are mainly based upon the asymptotically justified max-stable or generalized Pareto processes. These models are suitable when asymptotic dependence is present, i.e., the joint tail decays at the same rate as the marginal tail. However, recent environmental data applications suggest that asymptotic independence is equally important and, unfortunately, existing spatial models in this setting that are both flexible and can be fitted efficiently are scarce. Here, we propose a new spatial copula model based on the generalized hyperbolic distribution, which is a specific normal mean-variance mixture and is very popular in financial modeling. The tail properties of this distribution have been studied in the literature, but with contradictory results. It turns out that the proofs from the literature contain mistakes. We here give a corrected theoretical description of its tail dependence structure and then exploit the model to analyze a simulated dataset from the inverted Brown-Resnick process, hindcast significant wave height data in the North Sea, and wind gust data in the state of Oklahoma, USA. We demonstrate that our proposed model is flexible enough to capture the dependence structure not only in the tail but also in the bulk.
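A normal mean-variance mixture has the stochastic representation X = μ + γW + √W Z, where W ≥ 0 is a mixing variable shared across components and Z is Gaussian. The sketch below assumes Gamma mixing, which gives the variance-gamma law (a special case of the generalized hyperbolic family, not the general GIG mixing used in the paper). It shows that a shared W induces dependence even when the Gaussian components are uncorrelated: with γ = 1 and W ~ Gamma(shape 2, scale 0.5), the theoretical correlation is Var(W)/(Var(W) + E[W]) = 0.5/1.5 = 1/3.

```python
import math
import random

def sample_nmvm_pair(n, gamma, rho, shape, scale, seed=0):
    """Draw n pairs X_j = gamma*W + sqrt(W)*Z_j sharing one mixing variable W ~ Gamma(shape, scale).

    Gamma mixing yields a (bivariate) variance-gamma law, a special case of the generalized
    hyperbolic family; (Z1, Z2) is standard bivariate normal with correlation rho.
    """
    rng = random.Random(seed)
    pairs = []
    for _ in range(n):
        w = rng.gammavariate(shape, scale)          # shape/scale parametrization
        z1 = rng.gauss(0.0, 1.0)
        z2 = rho * z1 + math.sqrt(1.0 - rho ** 2) * rng.gauss(0.0, 1.0)
        s = math.sqrt(w)
        pairs.append((gamma * w + s * z1, gamma * w + s * z2))
    return pairs

def pearson(pairs):
    """Sample Pearson correlation of a list of (x, y) pairs."""
    n = len(pairs)
    mx = sum(a for a, _ in pairs) / n
    my = sum(b for _, b in pairs) / n
    sxy = sum((a - mx) * (b - my) for a, b in pairs)
    sxx = sum((a - mx) ** 2 for a, _ in pairs)
    syy = sum((b - my) ** 2 for _, b in pairs)
    return sxy / math.sqrt(sxx * syy)

pairs = sample_nmvm_pair(20000, gamma=1.0, rho=0.0, shape=2.0, scale=0.5, seed=7)
r = pearson(pairs)   # close to the theoretical value 1/3
```

In the spatial copula setting, Z would be a Gaussian process rather than a pair, and the tail behaviour of W is what governs the joint tail; the corrected tail results in the paper concern the GIG case, not this Gamma special case.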
Submitted 11 May, 2021;
originally announced May 2021.
-
Exact Simulation of Max-Infinitely Divisible Processes
Authors:
Peng Zhong,
Raphaël Huser,
Thomas Opitz
Abstract:
Max-infinitely divisible (max-id) processes play a central role in extreme-value theory and include the subclass of all max-stable processes. They allow for a constructive representation based on the pointwise maximum of random functions drawn from a Poisson point process defined on a suitable function space. Simulating from a max-id process is often difficult due to its complex stochastic structure, while calculating its joint density in high dimensions is often numerically infeasible. Therefore, exact and efficient simulation techniques for max-id processes are useful tools for studying the characteristics of the process and for drawing statistical inferences. Inspired by simulation algorithms for max-stable processes, we present theory and algorithms that generalize these simulation approaches to certain flexible (existing or new) classes of max-id processes. Efficient simulation for a large class of models can be achieved by implementing an adaptive rejection sampling scheme to sidestep a numerical integration step in the algorithm. The results of a simulation study highlight that our simulation algorithm works as expected and is highly accurate and efficient, such that it clearly outperforms customary approximate sampling schemes. As a by-product, we also develop new max-id models, which can be represented as pointwise maxima of general location-scale mixtures and possess flexible tail dependence structures capturing a wide range of asymptotic dependence scenarios.
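As background on the max-stable algorithms mentioned above, the sketch below implements the classical stopping-rule idea of Schlather (2002) for a finite-dimensional max-stable vector; it is not the max-id algorithm of the paper. Points Γ_1 < Γ_2 < … of a unit-rate Poisson process are generated in increasing order, and because the spectral vectors are bounded (here, by assumption, i.i.d. entries 2U(0,1), with mean 1 and bound 2), simulation can stop exactly once no later term can raise any component.

```python
import math
import random

def simulate_max_stable(d, rng):
    """Exact draw of a d-variate max-stable vector with unit Frechet margins.

    Z_j = max_i W_ij / Gamma_i, where Gamma_1 < Gamma_2 < ... are the points of a unit-rate
    Poisson process on (0, inf) and the spectral vectors W_i have i.i.d. entries 2*U(0, 1),
    so E[W_ij] = 1 and W_ij <= 2. Every later term is at most 2 / Gamma_i, so the loop can
    stop as soon as 2 / Gamma_i < min_j Z_j without changing the result.
    """
    bound = 2.0
    z = [0.0] * d
    gamma = 0.0
    while True:
        gamma += rng.expovariate(1.0)        # next Poisson arrival time
        if bound / gamma < min(z):
            return z
        for j in range(d):
            z[j] = max(z[j], bound * rng.random() / gamma)

rng = random.Random(11)
draws = [simulate_max_stable(3, rng) for _ in range(4000)]
# Unit Frechet margins: P(Z_1 <= 1) = exp(-1), roughly 0.368
frac = sum(z[0] <= 1.0 for z in draws) / len(draws)
```

The paper's contribution is to extend this kind of exact scheme to max-id processes, where the Poisson intensity is no longer of the simple max-stable form and an adaptive rejection sampler replaces a numerical integration step.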
Submitted 28 February, 2022; v1 submitted 28 February, 2021;
originally announced March 2021.
-
Modeling spatial tail dependence with Cauchy convolution processes
Authors:
Pavel Krupskii,
Raphaël Huser
Abstract:
We study the class of dependence models for spatial data obtained from Cauchy convolution processes based on different types of kernel functions. We show that the resulting spatial processes have appealing tail dependence properties, such as tail dependence at short distances and independence at long distances with suitable kernel functions. We derive the extreme-value limits of these processes, study their smoothness properties, and detail some interesting special cases. To get higher flexibility at sub-asymptotic levels and separately control the bulk and the tail dependence properties, we further propose spatial models constructed by mixing a Cauchy convolution process with a Gaussian process. We demonstrate that this framework indeed provides a rich class of models for the joint modeling of the bulk and the tail behaviors. Our proposed inference approach relies on matching model-based and empirical summary statistics, and an extensive simulation study shows that it yields accurate estimates. We demonstrate our new methodology by application to a temperature dataset measured at 97 monitoring stations in the state of Oklahoma, US. Our results indicate that our proposed model provides a very good fit to the data, and that it captures both the bulk and the tail dependence structures accurately.
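The following is a minimal discretised sketch of a Cauchy convolution process, X(s) = Σ_j K(s − u_j) C_j with C_j i.i.d. standard Cauchy at grid points u_j. The triangular kernel and the grid are hypothetical choices, not the kernels studied in the paper. It illustrates two facts: by the stability of the Cauchy law, X(s) is Cauchy with scale Σ_j |K(s − u_j)|; and with a compactly supported kernel, sites further apart than twice the kernel half-width share no C_j and are therefore independent, while nearby sites share heavy-tailed terms and are tail dependent.

```python
import math
import random

def kernel(d, h):
    """Hypothetical triangular kernel with compact support of half-width h."""
    return max(0.0, 1.0 - abs(d) / h)

def cauchy_convolution(sites, grid, h, rng):
    """Discretised X(s) = sum_j K(s - u_j) C_j with C_j i.i.d. standard Cauchy at grid points u_j."""
    c = [math.tan(math.pi * (rng.random() - 0.5)) for _ in grid]   # inverse-CDF Cauchy draws
    return [sum(kernel(s - u, h) * cj for u, cj in zip(grid, c)) for s in sites]

def marginal_scale(s, grid, h):
    """A sum of independent terms a_j * C_j is Cauchy with scale sum_j |a_j|."""
    return sum(abs(kernel(s - u, h)) for u in grid)

grid = [float(k) for k in range(11)]
rng = random.Random(5)
# Sites 5.0 and 5.5 share Cauchy terms; sites 5.0 and 9.0 share none (independent)
draws = [cauchy_convolution([5.0, 5.5, 9.0], grid, 2.0, rng) for _ in range(4000)]
# X(5) is Cauchy with scale 2, so P(|X(5)| <= 2) = 0.5
frac = sum(abs(x[0]) <= marginal_scale(5.0, grid, 2.0) for x in draws) / len(draws)
```

Mixing such a process with a Gaussian process, as proposed in the abstract, would add a light-tailed component that dominates the bulk while the Cauchy part drives the tail dependence.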
Submitted 8 June, 2022; v1 submitted 14 February, 2021;
originally announced February 2021.
-
Conex-Connect: Learning Patterns in Extremal Brain Connectivity From Multi-Channel EEG Data
Authors:
Matheus B. Guerrero,
Raphaël Huser,
Hernando Ombao
Abstract:
Epilepsy is a chronic neurological disorder affecting more than 50 million people globally. An epileptic seizure acts like a temporary shock to the neuronal system, disrupting normal electrical activity in the brain. Epilepsy is frequently diagnosed with electroencephalograms (EEGs). Current methods study the time-varying spectra and coherence but do not directly model changes in extreme behavior. Thus, we propose a new approach to characterize brain connectivity based on the joint tail behavior of the EEGs. Our proposed method, the conditional extremal dependence for brain connectivity (Conex-Connect), is a pioneering approach that links extreme values of higher oscillations at a reference channel with the other brain network channels. Using the Conex-Connect method, we discover changes in the extremal dependence driven by the activity at the foci of the epileptic seizure. Our model-based approach reveals that, pre-seizure, the dependence is notably stable for all channels when conditioning on extreme values of the focal seizure area. Post-seizure, by contrast, the dependence between channels is weaker, and dependence patterns are more "chaotic". Moreover, in terms of spectral decomposition, we find that high values of the high-frequency Gamma-band are the most relevant features to explain the conditional extremal dependence of brain connectivity.
Submitted 3 January, 2021;
originally announced January 2021.
-
Tractable Bayes of Skew-Elliptical Link Models for Correlated Binary Data
Authors:
Zhongwei Zhang,
Reinaldo B. Arellano-Valle,
Marc G. Genton,
Raphaël Huser
Abstract:
Correlated binary response data with covariates are ubiquitous in longitudinal or spatial studies. Among existing statistical models, the best-known one for this type of data is the multivariate probit model, which uses a Gaussian link to model dependence at the latent level. However, a symmetric link may not be appropriate if the data are highly imbalanced. Here, we propose a multivariate skew-elliptical link model for correlated binary responses, which includes the multivariate probit model as a special case. Furthermore, we perform Bayesian inference for this new model and prove that the regression coefficients have a closed-form unified skew-elliptical posterior. The new methodology is illustrated by application to COVID-19 pandemic data from three different counties of the state of California, USA. By jointly modeling extreme spikes in weekly new cases, our results show that the spatial dependence cannot be neglected. Furthermore, the results also show that the skewed latent structure of our proposed model improves the flexibility of the multivariate probit model and provides a better fit to our highly imbalanced dataset.
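The sketch below shows how a skewed latent structure changes a probit-type link, using a multivariate skew-normal latent error as a simple member of the skew-elliptical family (the paper's general unified skew-elliptical construction and its posterior are not reproduced). Each latent error is ε_j = δ|U_0| + √(1 − δ²) U_j, whose margin is standard skew-normal with α = δ/√(1 − δ²); δ = 0 recovers the multivariate probit. At linear predictor 0 the success probability is 1/2 + arctan(α)/π rather than 1/2. The function names and the exchangeable Gaussian correlation are assumptions for illustration.

```python
import math
import random

def simulate_binary(eta, delta, rho, n_rep, seed=0):
    """Correlated binary vectors Y_j = 1{eta_j + eps_j > 0} with a skew-normal latent error.

    eps_j = delta*|U0| + sqrt(1 - delta^2) * U_j, where U0 ~ N(0, 1) and (U_1, ..., U_d) is
    exchangeable Gaussian with correlation rho >= 0; delta = 0 gives the multivariate probit.
    """
    rng = random.Random(seed)
    out = []
    for _ in range(n_rep):
        u0 = abs(rng.gauss(0.0, 1.0))           # shared half-normal skewing variable
        common = rng.gauss(0.0, 1.0)            # shared Gaussian factor -> correlation rho
        y = []
        for e in eta:
            u = math.sqrt(rho) * common + math.sqrt(1.0 - rho) * rng.gauss(0.0, 1.0)
            eps = delta * u0 + math.sqrt(1.0 - delta ** 2) * u
            y.append(1 if e + eps > 0 else 0)
        out.append(y)
    return out

def prob_one_at_zero(delta):
    """P(eps > 0) for a standard skew-normal with alpha = delta / sqrt(1 - delta^2)."""
    alpha = delta / math.sqrt(1.0 - delta ** 2)
    return 0.5 + math.atan(alpha) / math.pi

ys = simulate_binary([0.0, 0.0, 0.0], delta=0.9, rho=0.5, n_rep=10000, seed=1)
frac = sum(y[0] for y in ys) / len(ys)   # close to prob_one_at_zero(0.9), about 0.856
```

The asymmetry of the link is what lets rare events (highly imbalanced outcomes) be fitted without forcing extreme values of the linear predictor.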
Submitted 6 January, 2021;
originally announced January 2021.
-
Advances in Statistical Modeling of Spatial Extremes
Authors:
Raphaël Huser,
Jennifer L. Wadsworth
Abstract:
The classical modeling of spatial extremes relies on asymptotic models (i.e., max-stable processes or $r$-Pareto processes) for block maxima or peaks over high thresholds, respectively. However, at finite levels, empirical evidence often suggests that such asymptotic models are too rigidly constrained, and that they do not adequately capture the frequent situation where more severe events tend to be spatially more localized. In other words, these asymptotic models have a strong tail dependence that persists at increasingly high levels, while data usually suggest that it should weaken instead. Another well-known limitation of classical spatial extremes models is that they are either computationally prohibitive to fit in high dimensions, or they need to be fitted using less efficient techniques. In this review paper, we describe recent progress in the modeling and inference for spatial extremes, focusing on new models that have more flexible tail structures that can bridge asymptotic dependence classes, and that are more easily amenable to likelihood-based inference for large datasets. In particular, we discuss various types of random scale constructions, as well as the conditional spatial extremes model, which have recently been getting increasing attention within the statistics of extremes community. We illustrate some of these new spatial models on two different environmental applications.
Submitted 13 September, 2020; v1 submitted 1 July, 2020;
originally announced July 2020.
-
High-resolution Bayesian mapping of landslide hazard with unobserved trigger event
Authors:
Thomas Opitz,
Haakon Bakka,
Raphaël Huser,
Luigi Lombardo
Abstract:
Statistical models for landslide hazard enable mapping of risk factors and landslide occurrence intensity by using geomorphological covariates available at high spatial resolution. However, the spatial distribution of the triggering event (e.g., precipitation or earthquakes) is often not directly observed. In this paper, we develop Bayesian spatial hierarchical models for point patterns of landslide occurrences using different types of log-Gaussian Cox processes. Starting from a competitive baseline model that captures the unobserved precipitation trigger through a spatial random effect at slope unit resolution, we explore novel complex model structures that take clusters of events arising at small spatial scales into account, as well as nonlinear or spatially-varying covariate effects. For a 2009 event of around 4000 precipitation-triggered landslides in Sicily, Italy, we show how to fit our proposed models efficiently using the integrated nested Laplace approximation (INLA), and rigorously compare the performance of our models both from a statistical and applied perspective. In this context, we argue that model comparison should not be based on a single criterion, and that different models of various complexity may provide insights into complementary aspects of the same applied problem. In our application, our models are found to have mostly the same spatial predictive performance, implying that the key to successful prediction is the inclusion of a slope-unit resolved random effect capturing the precipitation trigger. Interestingly, a parsimonious formulation of space-varying slope effects reflects a physical interpretation of the precipitation trigger: in subareas with weak trigger, the slope steepness is shown to be mostly irrelevant.
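To illustrate the data-generating structure of a log-Gaussian Cox process aggregated to slope units, the sketch below simulates landslide counts N_k ~ Poisson(a_k exp(β0 + β1 x_k + b_k)), where x_k is a slope covariate, a_k the unit area and b_k a Gaussian random effect standing in for the unobserved precipitation trigger. For simplicity b_k is i.i.d. here, whereas the paper uses a spatially structured effect fitted with INLA; all names and values are hypothetical.

```python
import math
import random

def poisson_draw(lam, rng):
    """Knuth's multiplication method; adequate for the small intensities used here."""
    limit, k, p = math.exp(-lam), 0, 1.0
    while True:
        p *= rng.random()
        if p <= limit:
            return k
        k += 1

def simulate_slope_unit_counts(slope, area, beta0, beta1, sd_re, rng):
    """Landslide counts per slope unit under a log-Gaussian Cox process sketch.

    log intensity = beta0 + beta1 * slope_k + b_k, with b_k ~ N(0, sd_re^2) an i.i.d.
    stand-in for the spatially structured trigger effect; counts are Poisson(area_k * intensity).
    """
    counts = []
    for s_k, a_k in zip(slope, area):
        b_k = rng.gauss(0.0, sd_re)
        counts.append(poisson_draw(a_k * math.exp(beta0 + beta1 * s_k + b_k), rng))
    return counts

rng = random.Random(2)
n = 20000
counts = simulate_slope_unit_counts([0.0] * n, [1.0] * n, 0.0, 1.5, 0.5, rng)
# With slope = 0 and area = 1, E[count] = exp(beta0 + sd_re^2 / 2) = exp(0.125)
mean_count = sum(counts) / n
```

The lognormal random effect inflates both the mean and the variance of the counts relative to a plain Poisson model, which is how the unobserved trigger shows up as overdispersion at slope-unit resolution.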
Submitted 14 June, 2020;
originally announced June 2020.
-
Modeling Non-Stationary Temperature Maxima Based on Extremal Dependence Changing with Event Magnitude
Authors:
Peng Zhong,
Raphaël Huser,
Thomas Opitz
Abstract:
The modeling of spatio-temporal trends in temperature extremes can help better understand the structure and frequency of heatwaves in a changing climate. Here, we study annual temperature maxima over Southern Europe using a century-spanning dataset observed at 44 monitoring stations. Extending the spectral representation of max-stable processes, our modeling framework relies on a novel construction of max-infinitely divisible processes, which include covariates to capture spatio-temporal non-stationarities. Our new model keeps a popular max-stable process on the boundary of the parameter space, while flexibly capturing weakening extremal dependence at increasing quantile levels and asymptotic independence. This is achieved by linking the overall magnitude of a spatial event to its spatial correlation range, in such a way that more extreme events become less spatially dependent, thus more localized. Our model reveals salient features of the spatio-temporal variability of European temperature extremes, and it clearly outperforms natural alternative models. Results show that the spatial extent of heatwaves is smaller for more severe events at higher altitudes, and that recent heatwaves are moderately wider. Our probabilistic assessment of the 2019 annual maxima confirms the severity of the 2019 heatwaves both spatially and at individual sites, especially when compared to climatic conditions prevailing in 1950-1975.
Submitted 6 September, 2020; v1 submitted 2 June, 2020;
originally announced June 2020.
-
Estimating high-resolution Red Sea surface temperature hotspots, using a low-rank semiparametric spatial model
Authors:
Arnab Hazra,
Raphaël Huser
Abstract:
In this work, we estimate extreme sea surface temperature (SST) hotspots, i.e., high threshold exceedance regions, for the Red Sea, a vital region of high biodiversity. We analyze high-resolution satellite-derived SST data comprising daily measurements at 16703 grid cells across the Red Sea over the period 1985-2015. We propose a semiparametric Bayesian spatial mixed-effects linear model with a flexible mean structure to capture spatially-varying trend and seasonality, while the residual spatial variability is modeled through a Dirichlet process mixture (DPM) of low-rank spatial Student-$t$ processes (LTPs). By specifying cluster-specific parameters for each LTP mixture component, the bulk of the SST residuals influence tail inference and hotspot estimation only moderately. Our proposed model has a nonstationary mean, covariance and tail dependence, and posterior inference can be drawn efficiently through Gibbs sampling. In our application, we show that the proposed method outperforms some natural parametric and semiparametric alternatives. Moreover, we show how hotspots can be identified and we estimate extreme SST hotspots for the whole Red Sea, projected until the year 2100, based on the Representative Concentration Pathways 4.5 and 8.5. The estimated 95% credible region for joint high threshold exceedances includes large areas covering major endangered coral reefs in the southern Red Sea.
Submitted 18 October, 2020; v1 submitted 11 December, 2019;
originally announced December 2019.
-
Spatial hierarchical modeling of threshold exceedances using rate mixtures
Authors:
Rishikesh Yadav,
Raphaël Huser,
Thomas Opitz
Abstract:
We develop new flexible univariate models for light-tailed and heavy-tailed data, which extend a hierarchical representation of the generalized Pareto (GP) limit for threshold exceedances. These models can accommodate departure from asymptotic threshold stability in finite samples while keeping the asymptotic GP distribution as a special (or boundary) case and can capture the tails and the bulk jointly without losing much flexibility. Spatial dependence is modeled through a latent process, while the data are assumed to be conditionally independent. Focusing on a gamma-gamma model construction, we design penalized complexity priors for crucial model parameters, shrinking our proposed spatial Bayesian hierarchical model toward a simpler reference whose marginal distributions are GP with moderately heavy tails. Our model can be fitted in fairly high dimensions using Markov chain Monte Carlo by exploiting the Metropolis-adjusted Langevin algorithm (MALA), which guarantees fast convergence of Markov chains with efficient block proposals for the latent variables. We also develop an adaptive scheme to calibrate the MALA tuning parameters. Moreover, our model avoids the expensive numerical evaluations of multifold integrals in censored likelihood expressions. We demonstrate our new methodology by simulation and application to a dataset of extreme rainfall events that occurred in Germany. Our fitted gamma-gamma model provides a satisfactory performance and can be successfully used to predict rainfall extremes at unobserved locations.
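The hierarchical representation of the GP limit that these models extend can be checked by simulation. If X | Λ ~ Exp(rate Λ) and Λ ~ Gamma(shape α, rate β), then P(X > x) = (1 + x/β)^(−α), i.e. a GP distribution with ξ = 1/α and σ = β/α. The sketch below uses the gamma-gamma form X | Λ ~ Gamma(shape k, rate Λ), where k = 1 recovers the exponential case; function names and parameter values are illustrative, and the spatial latent process is omitted.

```python
import math
import random

def sample_gamma_gamma(n, k, alpha, beta, seed=0):
    """X | L ~ Gamma(shape k, rate L), L ~ Gamma(shape alpha, rate beta).

    k = 1 gives X | L ~ Exp(L), whose marginal is generalized Pareto with xi = 1/alpha
    and sigma = beta/alpha.
    """
    rng = random.Random(seed)
    xs = []
    for _ in range(n):
        lam = rng.gammavariate(alpha, 1.0 / beta)   # random.gammavariate takes a scale, not a rate
        xs.append(rng.gammavariate(k, 1.0 / lam))
    return xs

def gp_survival(x, xi, sigma):
    """Generalized Pareto survival function (1 + xi x / sigma)^(-1/xi) for xi > 0."""
    return (1.0 + xi * x / sigma) ** (-1.0 / xi)

xs = sample_gamma_gamma(20000, k=1.0, alpha=4.0, beta=2.0, seed=3)
emp = sum(x > 1.0 for x in xs) / len(xs)   # close to gp_survival(1.0, 0.25, 0.5) = 1.5**-4
```

Letting k differ from 1 perturbs the bulk away from exact threshold stability while leaving the GP case on the boundary of the parameter space, which is the property the abstract describes.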
Submitted 10 September, 2020; v1 submitted 10 December, 2019;
originally announced December 2019.
-
Space-Time Landslide Predictive Modelling
Authors:
Luigi Lombardo,
Thomas Opitz,
Francesca Ardizzone,
Fausto Guzzetti,
Raphaël Huser
Abstract:
Landslides are nearly ubiquitous phenomena and pose severe threats to people, properties, and the environment. Investigators have long attempted to estimate landslide hazard to determine where, when, and how destructive landslides are expected to be in an area. This information is useful to design landslide mitigation strategies, and to reduce landslide risk and societal and economic losses. In the geomorphology literature, most attempts at predicting the occurrence of populations of landslides rely on the observation that landslides are the result of multiple interacting, conditioning and triggering factors. Here, we propose a novel Bayesian modelling framework for the prediction of space-time landslide occurrences of the slide type caused by weather triggers. We consider log-Gaussian Cox processes, assuming that individual landslides stem from a point process described by an unknown intensity function. We tested our prediction framework in the Collazzone area, Umbria, Central Italy, for which a detailed multi-temporal landslide inventory spanning 1941-2014 is available together with lithological and bedding data. We tested five models of increasing complexity. Our most complex model includes fixed effects and latent spatio-temporal effects, thus largely fulfilling the common definition of landslide hazard in the literature. We quantified the spatio-temporal predictive skill of our model and found that it performed better than simpler alternatives. We then developed a novel classification strategy and prepared an intensity-susceptibility landslide map, providing more information than traditional susceptibility zonations for land planning and management. We expect our novel approach to lead to better projections of future landslides, and to improve our collective understanding of the evolution of landscapes dominated by mass-wasting processes under geophysical and weather triggers.
Submitted 3 December, 2019;
originally announced December 2019.
-
Editorial: EVA 2019 data competition on spatio-temporal prediction of Red Sea surface temperature extremes
Authors:
Raphaël Huser
Abstract:
Large, non-stationary spatio-temporal data are ubiquitous in modern statistical applications, and the modeling of spatio-temporal extremes is crucial for assessing risks in environmental sciences among others. While the modeling of extremes is challenging in itself, the prediction of rare events at unobserved spatial locations and time points is even more difficult. In this editorial, we describe the data competition that was organized for the 11th international conference on Extreme-Value Analysis (EVA 2019), for which several teams modeled and predicted Red Sea surface temperature extremes over space and time. After introducing the dataset and the goal of the competition, we disclose the final ranking of the teams, and we finally discuss some interesting outcomes and future challenges.
Submitted 2 December, 2019;
originally announced December 2019.
-
Max-and-Smooth: a two-step approach for approximate Bayesian inference in latent Gaussian models
Authors:
Birgir Hrafnkelsson,
Stefan Siegert,
Raphaël Huser,
Haakon Bakka,
Árni V. Jóhannesson
Abstract:
With modern high-dimensional data, complex statistical models are necessary, requiring computationally feasible inference schemes. We introduce Max-and-Smooth, an approximate Bayesian inference scheme for a flexible class of latent Gaussian models (LGMs) where one or more of the likelihood parameters are modeled by latent additive Gaussian processes. Max-and-Smooth consists of two steps. In the first step (Max), the likelihood function is approximated by a Gaussian density with mean and covariance equal to either (a) the maximum likelihood estimate and the inverse observed information, respectively, or (b) the mean and covariance of the normalized likelihood function. In the second step (Smooth), the latent parameters and hyperparameters are inferred and smoothed with the approximated likelihood function. The proposed method ensures that the uncertainty from the first step is correctly propagated to the second step. Since the approximated likelihood function is Gaussian, the approximate posterior density of the latent parameters of the LGM (conditional on the hyperparameters) is also Gaussian, thus facilitating efficient posterior inference in high dimensions. Furthermore, the approximate marginal posterior distribution of the hyperparameters is tractable, and as a result, the hyperparameters can be sampled independently of the latent parameters. In the case of a large number of independent data replicates, sparse precision matrices, and high-dimensional latent vectors, the speedup is substantial in comparison to an MCMC scheme that infers the posterior density from the exact likelihood function. The proposed inference scheme is demonstrated on one spatially referenced real dataset and on simulated data mimicking spatial, temporal, and spatio-temporal inference problems. Our results show that Max-and-Smooth is accurate and fast.
Submitted 14 February, 2020; v1 submitted 27 July, 2019;
originally announced July 2019.
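The two steps can be illustrated in a deliberately simple setting, assuming exponential data at each of m sites with a site-specific log-rate and a first-order random-walk Gaussian prior on the log-rates. Max computes each site's maximum likelihood estimate and inverse observed information (option (a) above); Smooth then treats those estimates as Gaussian pseudo-observations of the latent vector, which makes the conditional posterior Gaussian in closed form. This is a minimal sketch only: the exponential model, the random-walk prior, and the fixed hyperparameter value are hypothetical, and the hyperparameter inference described above is omitted.

```python
import numpy as np

def max_step(samples):
    """Max step: per-site MLE of the exponential log-rate and its
    inverse observed information (hypothetical data model)."""
    n = np.array([len(s) for s in samples], dtype=float)
    sums = np.array([np.sum(s) for s in samples])
    eta_hat = np.log(n / sums)   # MLE of eta = log(rate)
    var_hat = 1.0 / n            # observed information for eta equals n at the MLE
    return eta_hat, var_hat

def rw1_precision(m, tau, ridge=1e-4):
    """First-order random-walk prior precision; the small ridge keeps it proper."""
    D = np.diff(np.eye(m), axis=0)   # (m-1) x m first-difference matrix
    return tau * D.T @ D + ridge * np.eye(m)

def smooth_step(eta_hat, var_hat, Q):
    """Smooth step with hyperparameters held fixed: Gaussian posterior of eta
    given eta_hat ~ N(eta, diag(var_hat)) and prior eta ~ N(0, Q^{-1})."""
    W = np.diag(1.0 / var_hat)
    P = Q + W                        # posterior precision
    mean = np.linalg.solve(P, W @ eta_hat)
    cov = np.linalg.inv(P)
    return mean, cov

rng = np.random.default_rng(1)
m = 50
true_eta = np.sin(np.linspace(0.0, 3.0, m))
samples = [rng.exponential(scale=np.exp(-e), size=20) for e in true_eta]
eta_hat, var_hat = max_step(samples)
post_mean, post_cov = smooth_step(eta_hat, var_hat, rw1_precision(m, tau=50.0))
```

Because the prior precision is positive definite, each posterior variance is strictly smaller than the corresponding Max-step variance, which is one way to see how the Smooth step borrows strength across sites.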
-
Approximate Bayesian inference for analysis of spatio-temporal flood frequency data
Authors:
Árni V. Johannesson,
Stefan Siegert,
Raphaël Huser,
Haakon Bakka,
Birgir Hrafnkelsson
Abstract:
Extreme floods cause casualties and widespread damage to property and vital civil infrastructure. Here, we propose a Bayesian approach for predicting extreme floods using the generalized extreme-value (GEV) distribution within gauged and ungauged catchments. A major methodological challenge is to find a suitable parametrization for the GEV distribution when covariates or latent spatial effects are involved. Other challenges involve balancing model complexity and parsimony using an appropriate model selection procedure, and making inference using a reliable and computationally efficient approach. Our approach relies on a latent Gaussian modeling framework with a novel multivariate link function designed to separate the interpretation of the parameters at the latent level and to avoid unreasonable estimates of the shape and time trend parameters. Structured additive regression models are proposed for the four parameters at the latent level. For computational efficiency with large datasets and richly parametrized models, we exploit an accurate and fast approximate Bayesian inference approach. We applied our proposed methodology to annual peak river flow data from 554 catchments across the United Kingdom (UK). Our model performed well in terms of flood predictions for both gauged and ungauged catchments. The results show that the spatial model components for the transformed location and scale parameters, and the time trend, are all important. Posterior estimates of the time trend parameters correspond to an average increase of about $1.5\%$ per decade and reveal a spatial structure across the UK. To estimate return levels for spatial aggregates, we further develop a novel copula-based post-processing approach for posterior predictive samples, in order to mitigate the effect of the conditional independence assumption at the data level, and we show that our approach provides accurate results.
Submitted 6 April, 2021; v1 submitted 10 July, 2019;
originally announced July 2019.
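Return levels are the usual summary of a fitted GEV distribution. The sketch below evaluates the standard GEV quantile formula for one set of location, scale, and shape values, together with the GEV distribution function as a consistency check. It does not implement the authors' multivariate link function, time trend, or copula-based post-processing, and the parameter values used in the check are hypothetical.

```python
import math

def gev_return_level(p, mu, sigma, xi):
    """Level exceeded with probability p per block (e.g. p = 0.01 gives the
    100-year level for annual maxima) under a GEV(mu, sigma, xi) distribution."""
    y = -math.log(1.0 - p)
    if abs(xi) < 1e-12:
        return mu - sigma * math.log(y)          # Gumbel limit as xi -> 0
    return mu + sigma / xi * (y ** (-xi) - 1.0)

def gev_cdf(z, mu, sigma, xi):
    """GEV distribution function, valid where 1 + xi * (z - mu) / sigma > 0."""
    t = (z - mu) / sigma
    if abs(xi) < 1e-12:
        return math.exp(-math.exp(-t))
    return math.exp(-(1.0 + xi * t) ** (-1.0 / xi))

z100 = gev_return_level(0.01, mu=100.0, sigma=30.0, xi=0.1)
```

In a Bayesian analysis, such a function would typically be evaluated for each posterior draw of the GEV parameters, yielding a posterior distribution for the return level rather than a single value.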
-
Asymmetric tail dependence modeling, with application to cryptocurrency market data
Authors:
Yan Gong,
Raphaël Huser
Abstract:
Since the inception of Bitcoin in 2008, cryptocurrencies have played an increasing role in the world of e-commerce, but the recent turbulence in the cryptocurrency market in 2018 has raised some concerns about their stability and associated risks. For investors, it is crucial to uncover the dependence relationships between cryptocurrencies for a more resilient portfolio diversification. Moreover, the stochastic behavior in both tails is important, as long positions are sensitive to a decrease in prices (lower tail), while short positions are sensitive to an increase in prices (upper tail). In order to assess both risk types, we develop in this paper a flexible copula model which is able to distinctively capture asymptotic dependence or independence in its lower and upper tails simultaneously. Our proposed model is parsimonious and smoothly bridges (in each tail) both extremal dependence classes in the interior of the parameter space. Inference is performed using a full or censored likelihood approach, and we investigate by simulation the estimators' efficiency under three different censoring schemes which reduce the impact of non-extreme observations. We also develop a local likelihood approach to capture the temporal dynamics of extremal dependence among two leading cryptocurrencies. Here, we apply our model to historical closing prices of five leading cryptocurrencies, which together account for most of the cryptocurrency market capitalization. The results show that our proposed copula model outperforms alternative copula models and that the lower tail dependence level between most pairs of leading cryptocurrencies, and in particular between Bitcoin and Ethereum, has become stronger over time, smoothly transitioning from an asymptotic independence regime to an asymptotic dependence regime in recent years, whilst the upper tail has been relatively more stable overall at a weaker dependence level.
Submitted 13 April, 2021; v1 submitted 13 May, 2019;
originally announced May 2019.
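A simple exploratory check for the lower/upper tail asymmetry described above is to compare empirical tail dependence measures in each tail. The sketch below uses the standard empirical estimators chi_L(u) = P(V < u | U < u) and chi_U(u) = P(V > u | U > u) on rank-transformed (pseudo-uniform) margins. It is a generic diagnostic, not the authors' copula model or censored likelihood, and the function names are illustrative.

```python
import numpy as np

def pseudo_uniform(x):
    """Rank-transform a sample to the (0, 1) scale (empirical copula margins)."""
    ranks = np.argsort(np.argsort(x)) + 1
    return ranks / (len(x) + 1.0)

def chi_upper(x, y, u):
    """Empirical upper-tail dependence measure chi_U(u) = P(V > u | U > u)."""
    U, V = pseudo_uniform(x), pseudo_uniform(y)
    exceed = U > u
    return np.mean(V[exceed] > u)

def chi_lower(x, y, u):
    """Empirical lower-tail dependence measure chi_L(u) = P(V < u | U < u)."""
    U, V = pseudo_uniform(x), pseudo_uniform(y)
    below = U < u
    return np.mean(V[below] < u)
```

Asymptotic independence in a tail corresponds to chi(u) tending to zero as u approaches the boundary of that tail, so plotting chi_L(u) as u decreases to 0 against chi_U(u) as u increases to 1 gives an informal view of whether the two tails behave differently.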