-
A Three-groups Non-local Model for Combining Heterogeneous Data Sources to Identify Genes Associated with Parkinson's Disease
Authors:
Troy P. Wixson,
Benjamin A. Shaby,
Daisy L. Philtron,
International Parkinson Disease Genomics Consortium,
Leandro A. Lima,
Stacia K. Wyman,
Julia A. Kaye,
Steven Finkbeiner
Abstract:
We seek to identify genes involved in Parkinson's Disease (PD) by combining information across different experiment types. Each experiment, taken individually, may contain too little information to distinguish some important genes from incidental ones. However, when experiments are combined using the proposed statistical framework, additional power emerges. The fundamental building block of the family of statistical models that we propose is a hierarchical three-group mixture of distributions. Each gene is modeled probabilistically as belonging to either a null group that is unassociated with PD, a deleterious group, or a beneficial group. This three-group formalism has two key features. By apportioning prior probability of group assignments with a Dirichlet distribution, the resultant posterior group probabilities automatically account for the multiplicity inherent in analyzing many genes simultaneously. By building models for experimental outcomes conditionally on the group labels, any number of data modalities may be combined in a single coherent probability model, allowing information sharing across experiment types. These two features result in parsimonious inference with few false positives, while simultaneously enhancing power to detect signals. Simulations show that our three-groups approach performs at least as well as commonly-used tools for GWAS and RNA-seq, and in some cases it performs better. We apply our proposed approach to publicly-available GWAS and RNA-seq datasets, discovering novel genes that are potential therapeutic targets.
Submitted 7 June, 2024;
originally announced June 2024.
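The three-group idea in the abstract above can be illustrated with a minimal sketch. It is not the authors' model: the group weights are held fixed rather than given a Dirichlet prior, a single z-score stands in for the combined data modalities, and shifted normal components stand in for the non-local alternatives. The names `three_group_posterior`, `weights`, and `shift`, and the choice of which sign counts as deleterious, are all hypothetical.

```python
import math


def normal_pdf(x, mean=0.0, sd=1.0):
    """Density of a normal distribution at x."""
    z = (x - mean) / sd
    return math.exp(-0.5 * z * z) / (sd * math.sqrt(2.0 * math.pi))


def three_group_posterior(z, weights=(0.90, 0.05, 0.05), shift=3.0):
    """Posterior probabilities (null, deleterious, beneficial) for one gene.

    Assumptions for illustration only: fixed mixture weights, and
    alternatives centered at +shift (deleterious) and -shift (beneficial).
    """
    w_null, w_del, w_ben = weights
    joint = (
        w_null * normal_pdf(z, 0.0),      # null group: no association
        w_del * normal_pdf(z, shift),     # deleterious group
        w_ben * normal_pdf(z, -shift),    # beneficial group
    )
    total = sum(joint)
    return tuple(j / total for j in joint)


if __name__ == "__main__":
    for z in (0.0, 2.0, 4.0, -4.0):
        print(z, [round(p, 4) for p in three_group_posterior(z)])
```

Because most prior mass sits on the null group, a moderate z-score still yields a sizeable null probability; this is the sense in which a mixture formulation tempers discoveries when many genes are screened at once.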
-
Semiparametric Estimation of the Shape of the Limiting Bivariate Point Cloud
Authors:
Reetam Majumder,
Benjamin A. Shaby,
Brian J. Reich,
Daniel Cooley
Abstract:
We propose a model to flexibly estimate joint tail properties by exploiting the convergence of an appropriately scaled point cloud onto a compact limit set. Characteristics of the shape of the limit set correspond to key tail dependence properties. We directly model the shape of the limit set using Bezier splines, which allow flexible and parsimonious specification of shapes in two dimensions. We then fit the Bezier splines to data in pseudo-polar coordinates using Markov chain Monte Carlo sampling, utilizing a limiting approximation to the conditional likelihood of the radii given angles. By imposing appropriate constraints on the parameters of the Bezier splines, we guarantee that each posterior sample is a valid limit set boundary, allowing direct posterior analysis of any quantity derived from the shape of the curve. Furthermore, we obtain interpretable inference on the asymptotic dependence class by using mixture priors with point masses on the corner of the unit box. Finally, we apply our model to bivariate datasets of extremes of variables related to fire risk and air pollution.
Submitted 3 June, 2024; v1 submitted 22 June, 2023;
originally announced June 2023.
-
Modeling First Arrival of Migratory Birds using a Hierarchical Max-infinitely Divisible Process
Authors:
Dhanushi A. Wijeyakulasuriya,
Ephraim M. Hanks,
Benjamin A. Shaby
Abstract:
Humans have recorded the arrival dates of migratory birds for millennia, searching for trends and patterns. As the first arrival among individuals in a species is the realized tail of the probability distribution of arrivals, the appropriate statistical framework with which to analyze such events is extreme value theory. Here, for the first time, we apply formal extreme value techniques to the dynamics of bird migrations. We study the annual first arrivals of Magnolia Warblers using modern tools from the statistical field of extreme value analysis. Using observations from the eBird database, we model the spatial distribution of Magnolia Warbler arrivals as a max-infinitely divisible process, which allows us to spatially interpolate observed annual arrivals in a probabilistically-coherent way, and to project arrival dynamics into the future by conditioning on climatic variables.
Submitted 9 June, 2023;
originally announced June 2023.
-
Modeling Extremal Streamflow using Deep Learning Approximations and a Flexible Spatial Process
Authors:
Reetam Majumder,
Brian J. Reich,
Benjamin A. Shaby
Abstract:
Quantifying changes in the probability and magnitude of extreme flooding events is key to mitigating their impacts. While hydrodynamic data are inherently spatially dependent, traditional spatial models such as Gaussian processes are poorly suited for modeling extreme events. Spatial extreme value models with more realistic tail dependence characteristics are under active development. They are theoretically justified, but give intractable likelihoods, making computation challenging for small datasets and prohibitive for continental-scale studies. We propose a process mixture model (PMM) which specifies spatial dependence in extreme values as a convex combination of a Gaussian process and a max-stable process, yielding desirable tail dependence properties but intractable likelihoods. To address this, we employ a unique computational strategy where a feed-forward neural network is embedded in a density regression model to approximate the conditional distribution at one spatial location given a set of neighbors. We then use this univariate density function to approximate the joint likelihood for all locations by way of a Vecchia approximation. The PMM is used to analyze changes in annual maximum streamflow within the US over the last 50 years, and is able to detect areas which show increases in extreme streamflow over time.
Submitted 27 September, 2023; v1 submitted 5 August, 2022;
originally announced August 2022.
-
Asymptotic posterior normality of the generalized extreme value distribution
Authors:
Likun Zhang,
Benjamin A. Shaby
Abstract:
The univariate generalized extreme value (GEV) distribution is the most commonly used tool for analyzing the properties of rare events. The ever greater utilization of Bayesian methods for extreme value analysis warrants detailed theoretical investigation, which has thus far been underdeveloped. Even the most basic asymptotic results are difficult to obtain because the GEV fails to satisfy standard regularity conditions. Here, we prove that the posterior distribution of the GEV parameter vector, given $n$ independent and identically distributed samples, converges in distribution to a trivariate normal distribution. The proof necessitates analyzing integrals of the GEV likelihood function over the entire parameter space, which requires considerable care because the support of the GEV density depends on the parameters in complicated ways.
Submitted 30 June, 2023; v1 submitted 9 March, 2021;
originally announced March 2021.
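The remark above that the support of the GEV density depends on the parameters can be made concrete with a short sketch of the standard GEV distribution function. The function names `gev_support` and `gev_cdf` and the default parameter values are illustrative choices, not taken from the paper.

```python
import math


def gev_support(mu=0.0, sigma=1.0, xi=0.0):
    """Return (lower, upper) endpoints of the GEV support."""
    if xi > 0.0:
        return (mu - sigma / xi, math.inf)   # bounded below
    if xi < 0.0:
        return (-math.inf, mu - sigma / xi)  # bounded above
    return (-math.inf, math.inf)             # Gumbel case


def gev_cdf(x, mu=0.0, sigma=1.0, xi=0.0):
    """GEV distribution function with location mu, scale sigma, shape xi."""
    if sigma <= 0.0:
        raise ValueError("sigma must be positive")
    z = (x - mu) / sigma
    if xi == 0.0:
        return math.exp(-math.exp(-z))       # Gumbel limit
    t = 1.0 + xi * z
    if t <= 0.0:
        # x lies outside the support: below it when xi > 0, above it when xi < 0
        return 0.0 if xi > 0.0 else 1.0
    return math.exp(-(t ** (-1.0 / xi)))


if __name__ == "__main__":
    print(gev_support(xi=0.5), gev_cdf(-3.0, xi=0.5))
    print(gev_support(xi=-0.5), gev_cdf(3.0, xi=-0.5))
```

The endpoint `mu - sigma / xi` moves with all three parameters, so the set of parameter values compatible with a given sample changes with the data.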
-
A Vecchia Approximation for High-Dimensional Gaussian Cumulative Distribution Functions Arising from Spatial Data
Authors:
Mauricio Nascimento,
Benjamin A. Shaby
Abstract:
We introduce an approach to quickly and accurately approximate the cumulative distribution function of multivariate Gaussian distributions arising from spatial Gaussian processes. This approximation is trivially parallelizable and simple to implement using standard software. We demonstrate its accuracy and computational efficiency in a series of simulation experiments and apply it to analyzing the joint tail of a large precipitation dataset using a recently-proposed scale mixture model for spatial extremes. This dataset is many times larger than what was previously considered possible to fit using preferred inferential techniques.
Submitted 29 July, 2020;
originally announced July 2020.
-
Projecting Flood-Inducing Precipitation with a Bayesian Analogue Model
Authors:
Gregory P. Bopp,
Benjamin A. Shaby,
Chris E. Forest,
Alfonso Mejía
Abstract:
The hazard of pluvial flooding is largely influenced by the spatial and temporal dependence characteristics of precipitation. When extreme precipitation possesses strong spatial dependence, the risk of flooding is amplified due to catchment factors that cause runoff accumulation such as topography. Temporal dependence can also increase flood risk as storm water drainage systems operating at capacity can be overwhelmed by heavy precipitation occurring over multiple days. While transformed Gaussian processes are common choices for modeling precipitation, their weak tail dependence may lead to underestimation of flood risk. Extreme value models such as the generalized Pareto processes for threshold exceedances and max-stable models are attractive alternatives, but are difficult to fit when the number of observation sites is large, and are of little use for modeling the bulk of the distribution, which may also be of interest to water management planners. While the atmospheric dynamics governing precipitation are complex and difficult to fully incorporate into a parsimonious statistical model, non-mechanistic analogue methods that approximate those dynamics have proven to be promising approaches to capturing the temporal dependence of precipitation. In this paper, we present a Bayesian analogue method that leverages large, synoptic-scale atmospheric patterns to make precipitation forecasts. Changing spatial dependence across varying intensities is modeled as a mixture of spatial Student-t processes that can accommodate both strong and weak tail dependence. The proposed model demonstrates improved performance at capturing the distribution of extreme precipitation over Community Atmosphere Model (CAM) 5.2 forecasts.
Submitted 13 November, 2019;
originally announced November 2019.
-
Hierarchical Transformed Scale Mixtures for Flexible Modeling of Spatial Extremes on Datasets with Many Locations
Authors:
Likun Zhang,
Benjamin A. Shaby,
Jennifer L. Wadsworth
Abstract:
Flexible spatial models that allow transitions between tail dependence classes have recently appeared in the literature. However, inference for these models is computationally prohibitive, even in moderate dimensions, due to the necessity of repeatedly evaluating the multivariate Gaussian distribution function. In this work, we attempt to achieve truly high-dimensional inference for extremes of spatial processes, while retaining the desirable flexibility in the tail dependence structure, by modifying an established class of models based on scale mixtures of Gaussian processes. We show that the desired extremal dependence properties from the original models are preserved under the modification, and demonstrate that the corresponding Bayesian hierarchical model does not involve the expensive computation of the multivariate Gaussian distribution function. We fit our model to exceedances of a high threshold, and perform coverage analyses and cross-model checks to validate its ability to capture different types of tail characteristics. We use a standard adaptive Metropolis algorithm for model fitting, and further accelerate the computation via parallelization and Rcpp. Lastly, we apply the model to a dataset of a fire threat index on the Great Plains region of the US, which is vulnerable to massively destructive wildfires. We find that the joint tail of the fire threat index exhibits a decaying dependence structure that cannot be captured by limiting extreme value models.
Submitted 9 December, 2019; v1 submitted 22 July, 2019;
originally announced July 2019.
-
A semiparametric spatiotemporal Bayesian model for the bulk and extremes of the Fosberg Fire Weather Index
Authors:
Arnab Hazra,
Brian J. Reich,
Benjamin A. Shaby,
Ana-Maria Staicu
Abstract:
Large wildfires pose a major environmental concern, and precise maps of fire risk can improve disaster relief planning. The Fosberg Fire Weather Index (FFWI) is often used to measure wildfire risk; FFWI exhibits non-Gaussian marginal distributions as well as strong spatiotemporal extremal dependence, and thus modeling FFWI using geostatistical models like Gaussian processes is questionable. Extreme value theory (EVT)-driven models like max-stable processes are theoretically appealing but are computationally demanding and applicable only for threshold exceedances or block maxima. Disaster management policies often consider moderate-to-extreme quantiles of climate parameters, and hence joint modeling of the bulk and the tail of the data is required. In this paper, we consider a Dirichlet process mixture of spatial skew-t processes that can flexibly model the bulk as well as the tail. The proposed model has nonstationary mean and covariance structure, and also nonzero spatiotemporal extremal dependence. A simulation study demonstrates that the proposed model has better spatial prediction performance compared to some competing models. We develop spatial maps of FFWI medians and extremes, and discuss the wildfire risk throughout the Santa Ana region of California.
Submitted 16 November, 2020; v1 submitted 31 December, 2018;
originally announced December 2018.
-
A Hierarchical Max-Infinitely Divisible Spatial Model for Extreme Precipitation
Authors:
Gregory P. Bopp,
Benjamin A. Shaby,
Raphaël Huser
Abstract:
Understanding the spatial extent of extreme precipitation is necessary for determining flood risk and adequately designing infrastructure (e.g., stormwater pipes) to withstand such hazards. While environmental phenomena typically exhibit weakening spatial dependence at increasingly extreme levels, limiting max-stable process models for block maxima have a rigid dependence structure that does not capture this type of behavior. We propose a flexible Bayesian model from a broader family of (conditionally) max-infinitely divisible processes that allows for weakening spatial dependence at increasingly extreme levels, and due to a hierarchical representation of the likelihood in terms of random effects, our inference approach scales to large datasets. Therefore, our model not only has a flexible dependence structure, but it also allows for fast, fully Bayesian inference, prediction and conditional simulation in high dimensions. The proposed model is constructed using flexible random basis functions that are estimated from the data, allowing for straightforward inspection of the predominant spatial patterns of extremes. In addition, the described process possesses (conditional) max-stability as a special case, making inference on the tail dependence class possible. We apply our model to extreme precipitation in North-Eastern America, and show that the proposed model adequately captures the extremal behavior of the data. Interestingly, we find that the principal modes of spatial variation estimated from our model resemble observed patterns in extreme precipitation events occurring along the coast (e.g., with localized tropical cyclones and convective storms) and mountain range borders. Our model, which can easily be adapted to other types of environmental datasets, is therefore useful to identify extreme weather patterns and regions at risk.
Submitted 23 March, 2020; v1 submitted 15 May, 2018;
originally announced May 2018.
-
A Markov-switching model for heat waves
Authors:
Benjamin A. Shaby,
Brian J. Reich,
Daniel Cooley,
Cari G. Kaufman
Abstract:
Heat waves merit careful study because they inflict severe economic and societal damage. We use an intuitive, informal working definition of a heat wave (a persistent event in the tail of the temperature distribution) to motivate an interpretable latent state extreme value model. A latent variable with dependence in time indicates membership in the heat wave state. The strength of the temporal dependence of the latent variable controls the frequency and persistence of heat waves. Within each heat wave, temperatures are modeled using extreme value distributions, with extremal dependence across time accomplished through an extreme value Markov model. One important virtue of interpretability is that model parameters directly translate into quantities of interest for risk management, so that questions like whether heat waves are becoming longer, more severe or more frequent are easily answered by querying an appropriate fitted model. We demonstrate the latent state model on two recent, calamitous examples: the European heat wave of 2003 and the Russian heat wave of 2010.
Submitted 23 June, 2016; v1 submitted 15 May, 2014;
originally announced May 2014.
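The role of the latent state in the abstract above can be sketched with a plain two-state Markov chain. This covers only the switching layer: the extreme value model for temperatures within a heat wave is omitted, and the names `p_enter` and `p_stay` are hypothetical parameters, not the paper's notation.

```python
import random


def simulate_states(n, p_enter, p_stay, seed=0):
    """Simulate n days of a latent indicator (1 = heat wave, 0 = normal).

    p_enter: probability of entering a heat wave from the normal state.
    p_stay:  probability of remaining in a heat wave on the next day.
    The chain is assumed to start in the normal state.
    """
    rng = random.Random(seed)
    state = 0
    states = []
    for _ in range(n):
        p_hot = p_stay if state == 1 else p_enter
        state = 1 if rng.random() < p_hot else 0
        states.append(state)
    return states


def expected_duration(p_stay):
    """Mean heat wave length in days (geometric run length)."""
    return 1.0 / (1.0 - p_stay)


def stationary_prob(p_enter, p_stay):
    """Long-run fraction of days spent in the heat wave state."""
    return p_enter / (p_enter + 1.0 - p_stay)


if __name__ == "__main__":
    states = simulate_states(20000, p_enter=0.05, p_stay=0.8)
    print(sum(states) / len(states), stationary_prob(0.05, 0.8))
    print(expected_duration(0.8))
```

In this simplified chain, `p_stay` governs persistence and `p_enter` governs frequency, which mirrors how the abstract ties the temporal dependence of the latent variable to questions about whether heat waves are becoming longer or more frequent.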
-
A hierarchical max-stable spatial model for extreme precipitation
Authors:
Brian J. Reich,
Benjamin A. Shaby
Abstract:
Extreme environmental phenomena such as major precipitation events manifestly exhibit spatial dependence. Max-stable processes are a class of asymptotically-justified models that are capable of representing spatial dependence among extreme values. While these models satisfy modeling requirements, they are limited in their utility because their corresponding joint likelihoods are unknown for more than a trivial number of spatial locations, preventing, in particular, Bayesian analyses. In this paper, we propose a new random effects model to account for spatial dependence. We show that our specification of the random effect distribution leads to a max-stable process that has the popular Gaussian extreme value process (GEVP) as a limiting case. The proposed model is used to analyze the yearly maximum precipitation from a regional climate model.
Submitted 8 January, 2013;
originally announced January 2013.
-
The open-faced sandwich adjustment for MCMC using estimating functions
Authors:
Benjamin A. Shaby
Abstract:
The situation frequently arises where working with the likelihood function is problematic. This can happen for several reasons: perhaps the likelihood is prohibitively computationally expensive, perhaps it lacks some robustness property, or perhaps it is simply not known for the model under consideration. In these cases, it is often possible to specify alternative functions of the parameters and the data that can be maximized to obtain asymptotically normal estimates. However, these scenarios present obvious problems if one is interested in applying Bayesian techniques. Here we describe the open-faced sandwich adjustment, a way to incorporate a wide class of non-likelihood objective functions within Bayesian-like models to obtain asymptotically valid parameter estimates and inference via MCMC. Two simulation examples show that the method provides accurate frequentist uncertainty estimates. The open-faced sandwich adjustment is applied to a Poisson spatio-temporal model to analyze an ornithology dataset from the citizen science initiative eBird.
Submitted 16 April, 2012;
originally announced April 2012.