Search | arXiv e-print repository

arXiv:2406.05262 [pdf, other]

A Three-groups Non-local Model for Combining Heterogeneous Data Sources to Identify Genes Associated with Parkinson's Disease

Authors: Troy P. Wixson, Benjamin A. Shaby, Daisy L. Philtron, International Parkinson Disease Genomics Consortium, Leandro A. Lima, Stacia K. Wyman, Julia A. Kaye, Steven Finkbeiner

Abstract: We seek to identify genes involved in Parkinson's Disease (PD) by combining information across different experiment types. Each experiment, taken individually, may contain too little information to distinguish some important genes from incidental ones. However, when experiments are combined using the proposed statistical framework, additional power emerges. The fundamental building block of the family of statistical models that we propose is a hierarchical three-group mixture of distributions. Each gene is modeled probabilistically as belonging to either a null group that is unassociated with PD, a deleterious group, or a beneficial group. This three-group formalism has two key features. By apportioning prior probability of group assignments with a Dirichlet distribution, the resultant posterior group probabilities automatically account for the multiplicity inherent in analyzing many genes simultaneously. By building models for experimental outcomes conditionally on the group labels, any number of data modalities may be combined in a single coherent probability model, allowing information sharing across experiment types. These two features result in parsimonious inference with few false positives, while simultaneously enhancing power to detect signals. Simulations show that our three-groups approach performs at least as well as commonly-used tools for GWAS and RNA-seq, and in some cases it performs better. We apply our proposed approach to publicly-available GWAS and RNA-seq datasets, discovering novel genes that are potential therapeutic targets.

Submitted 7 June, 2024; originally announced June 2024.

Comments: 26 pages, 6 figures, 4 tables

arXiv:2306.13257 [pdf, other]

Semiparametric Estimation of the Shape of the Limiting Bivariate Point Cloud

Authors: Reetam Majumder, Benjamin A. Shaby, Brian J. Reich, Daniel Cooley

Abstract: We propose a model to flexibly estimate joint tail properties by exploiting the convergence of an appropriately scaled point cloud onto a compact limit set. Characteristics of the shape of the limit set correspond to key tail dependence properties. We directly model the shape of the limit set using Bezier splines, which allow flexible and parsimonious specification of shapes in two dimensions. We then fit the Bezier splines to data in pseudo-polar coordinates using Markov chain Monte Carlo sampling, utilizing a limiting approximation to the conditional likelihood of the radii given angles. By imposing appropriate constraints on the parameters of the Bezier splines, we guarantee that each posterior sample is a valid limit set boundary, allowing direct posterior analysis of any quantity derived from the shape of the curve. Furthermore, we obtain interpretable inference on the asymptotic dependence class by using mixture priors with point masses on the corner of the unit box. Finally, we apply our model to bivariate datasets of extremes of variables related to fire risk and air pollution.

Submitted 3 June, 2024; v1 submitted 22 June, 2023; originally announced June 2023.

arXiv:2306.06295 [pdf, other]

Modeling First Arrival of Migratory Birds using a Hierarchical Max-infinitely Divisible Process

Authors: Dhanushi A. Wijeyakulasuriya, Ephraim M. Hanks, Benjamin A. Shaby

Abstract: Humans have recorded the arrival dates of migratory birds for millennia, searching for trends and patterns. As the first arrival among individuals in a species is the realized tail of the probability distribution of arrivals, the appropriate statistical framework with which to analyze such events is extreme value theory. Here, for the first time, we apply formal extreme value techniques to the dynamics of bird migrations. We study the annual first arrivals of Magnolia Warblers using modern tools from the statistical field of extreme value analysis. Using observations from the eBird database, we model the spatial distribution of Magnolia Warbler arrivals as a max-infinitely divisible process, which allows us to spatially interpolate observed annual arrivals in a probabilistically-coherent way, and to project arrival dynamics into the future by conditioning on climatic variables.

Submitted 9 June, 2023; originally announced June 2023.

arXiv:2208.03344 [pdf, other]

doi 10.1214/23-AOAS1847

Modeling Extremal Streamflow using Deep Learning Approximations and a Flexible Spatial Process

Authors: Reetam Majumder, Brian J. Reich, Benjamin A. Shaby

Abstract: Quantifying changes in the probability and magnitude of extreme flooding events is key to mitigating their impacts. While hydrodynamic data are inherently spatially dependent, traditional spatial models such as Gaussian processes are poorly suited for modeling extreme events. Spatial extreme value models with more realistic tail dependence characteristics are under active development. They are theoretically justified, but give intractable likelihoods, making computation challenging for small datasets and prohibitive for continental-scale studies. We propose a process mixture model (PMM) which specifies spatial dependence in extreme values as a convex combination of a Gaussian process and a max-stable process, yielding desirable tail dependence properties but intractable likelihoods. To address this, we employ a unique computational strategy where a feed-forward neural network is embedded in a density regression model to approximate the conditional distribution at one spatial location given a set of neighbors. We then use this univariate density function to approximate the joint likelihood for all locations by way of a Vecchia approximation. The PMM is used to analyze changes in annual maximum streamflow within the US over the last 50 years, and is able to detect areas which show increases in extreme streamflow over time.

Submitted 27 September, 2023; v1 submitted 5 August, 2022; originally announced August 2022.

arXiv:2103.05747 [pdf, other]

Asymptotic posterior normality of the generalized extreme value distribution

Authors: Likun Zhang, Benjamin A. Shaby

Abstract: The univariate generalized extreme value (GEV) distribution is the most commonly used tool for analyzing the properties of rare events. The ever greater utilization of Bayesian methods for extreme value analysis warrants detailed theoretical investigation, which has thus far been underdeveloped. Even the most basic asymptotic results are difficult to obtain because the GEV fails to satisfy standard regularity conditions. Here, we prove that the posterior distribution of the GEV parameter vector, given $n$ independent and identically distributed samples, converges in distribution to a trivariate normal distribution. The proof necessitates analyzing integrals of the GEV likelihood function over the entire parameter space, which requires considerable care because the support of the GEV density depends on the parameters in complicated ways.

Submitted 30 June, 2023; v1 submitted 9 March, 2021; originally announced March 2021.

arXiv:2007.15195 [pdf, other]

A Vecchia Approximation for High-Dimensional Gaussian Cumulative Distribution Functions Arising from Spatial Data

Authors: Mauricio Nascimento, Benjamin A. Shaby

Abstract: We introduce an approach to quickly and accurately approximate the cumulative distribution function of multivariate Gaussian distributions arising from spatial Gaussian processes. This approximation is trivially parallelizable and simple to implement using standard software. We demonstrate its accuracy and computational efficiency in a series of simulation experiments and apply it to analyzing the joint tail of a large precipitation dataset using a recently-proposed scale mixture model for spatial extremes. This dataset is many times larger than what was previously considered possible to fit using preferred inferential techniques.

Submitted 29 July, 2020; originally announced July 2020.

Comments: 19 pages, 8 figures, 1 table

arXiv:1911.05881 [pdf, other]

doi 10.1007/s13253-020-00391-6

Projecting Flood-Inducing Precipitation with a Bayesian Analogue Model

Authors: Gregory P. Bopp, Benjamin A. Shaby, Chris E. Forest, Alfonso Mejía

Abstract: The hazard of pluvial flooding is largely influenced by the spatial and temporal dependence characteristics of precipitation. When extreme precipitation possesses strong spatial dependence, the risk of flooding is amplified due to catchment factors that cause runoff accumulation such as topography. Temporal dependence can also increase flood risk as storm water drainage systems operating at capacity can be overwhelmed by heavy precipitation occurring over multiple days. While transformed Gaussian processes are common choices for modeling precipitation, their weak tail dependence may lead to underestimation of flood risk. Extreme value models such as the generalized Pareto processes for threshold exceedances and max-stable models are attractive alternatives, but are difficult to fit when the number of observation sites is large, and are of little use for modeling the bulk of the distribution, which may also be of interest to water management planners. While the atmospheric dynamics governing precipitation are complex and difficult to fully incorporate into a parsimonious statistical model, non-mechanistic analogue methods that approximate those dynamics have proven to be promising approaches to capturing the temporal dependence of precipitation. In this paper, we present a Bayesian analogue method that leverages large, synoptic-scale atmospheric patterns to make precipitation forecasts. Changing spatial dependence across varying intensities is modeled as a mixture of spatial Student-t processes that can accommodate both strong and weak tail dependence. The proposed model demonstrates improved performance at capturing the distribution of extreme precipitation over Community Atmosphere Model (CAM) 5.2 forecasts.

Submitted 13 November, 2019; originally announced November 2019.

arXiv:1907.09617 [pdf, other]

doi 10.1080/01621459.2020.1858838

Hierarchical Transformed Scale Mixtures for Flexible Modeling of Spatial Extremes on Datasets with Many Locations

Authors: Likun Zhang, Benjamin A. Shaby, Jennifer L. Wadsworth

Abstract: Flexible spatial models that allow transitions between tail dependence classes have recently appeared in the literature. However, inference for these models is computationally prohibitive, even in moderate dimensions, due to the necessity of repeatedly evaluating the multivariate Gaussian distribution function. In this work, we attempt to achieve truly high-dimensional inference for extremes of spatial processes, while retaining the desirable flexibility in the tail dependence structure, by modifying an established class of models based on scale mixtures of Gaussian processes. We show that the desired extremal dependence properties from the original models are preserved under the modification, and demonstrate that the corresponding Bayesian hierarchical model does not involve the expensive computation of the multivariate Gaussian distribution function. We fit our model to exceedances of a high threshold, and perform coverage analyses and cross-model checks to validate its ability to capture different types of tail characteristics. We use a standard adaptive Metropolis algorithm for model fitting, and further accelerate the computation via parallelization and Rcpp. Lastly, we apply the model to a dataset of a fire threat index on the Great Plains region of the US, which is vulnerable to massively destructive wildfires. We find that the joint tail of the fire threat index exhibits a decaying dependence structure that cannot be captured by limiting extreme value models.

Submitted 9 December, 2019; v1 submitted 22 July, 2019; originally announced July 2019.

arXiv:1812.11699 [pdf, other]

A semiparametric spatiotemporal Bayesian model for the bulk and extremes of the Fosberg Fire Weather Index

Authors: Arnab Hazra, Brian J. Reich, Benjamin A. Shaby, Ana-Maria Staicu

Abstract: Large wildfires pose a major environmental concern, and precise maps of fire risk can improve disaster relief planning. Fosberg Fire Weather Index (FFWI) is often used to measure wildfire risk; FFWI exhibits non-Gaussian marginal distributions as well as strong spatiotemporal extremal dependence and thus, modeling FFWI using geostatistical models like Gaussian processes is questionable. Extreme value theory (EVT)-driven models like max-stable processes are theoretically appealing but are computationally demanding and applicable only for threshold exceedances or block maxima. Disaster management policies often consider moderate-to-extreme quantiles of climate parameters and hence, joint modeling of the bulk and the tail of the data is required. In this paper, we consider a Dirichlet process mixture of spatial skew-t processes that can flexibly model the bulk as well as the tail. The proposed model has nonstationary mean and covariance structure, and also nonzero spatiotemporal extremal dependence. A simulation study demonstrates that the proposed model has better spatial prediction performance compared to some competing models. We develop spatial maps of FFWI medians and extremes, and discuss the wildfire risk throughout the Santa Ana region of California.

Submitted 16 November, 2020; v1 submitted 31 December, 2018; originally announced December 2018.

Comments: 69 pages, 16 Figures

arXiv:1805.06084 [pdf, other]

A Hierarchical Max-Infinitely Divisible Spatial Model for Extreme Precipitation

Authors: Gregory P. Bopp, Benjamin A. Shaby, Raphaël Huser

Abstract: Understanding the spatial extent of extreme precipitation is necessary for determining flood risk and adequately designing infrastructure (e.g., stormwater pipes) to withstand such hazards. While environmental phenomena typically exhibit weakening spatial dependence at increasingly extreme levels, limiting max-stable process models for block maxima have a rigid dependence structure that does not capture this type of behavior. We propose a flexible Bayesian model from a broader family of (conditionally) max-infinitely divisible processes that allows for weakening spatial dependence at increasingly extreme levels, and due to a hierarchical representation of the likelihood in terms of random effects, our inference approach scales to large datasets. Therefore, our model not only has a flexible dependence structure, but it also allows for fast, fully Bayesian inference, prediction and conditional simulation in high dimensions. The proposed model is constructed using flexible random basis functions that are estimated from the data, allowing for straightforward inspection of the predominant spatial patterns of extremes. In addition, the described process possesses (conditional) max-stability as a special case, making inference on the tail dependence class possible. We apply our model to extreme precipitation in North-Eastern America, and show that the proposed model adequately captures the extremal behavior of the data. Interestingly, we find that the principal modes of spatial variation estimated from our model resemble observed patterns in extreme precipitation events occurring along the coast (e.g., with localized tropical cyclones and convective storms) and mountain range borders. Our model, which can easily be adapted to other types of environmental datasets, is therefore useful to identify extreme weather patterns and regions at risk.

Submitted 23 March, 2020; v1 submitted 15 May, 2018; originally announced May 2018.

arXiv:1405.3904 [pdf, ps, other]

doi 10.1214/15-AOAS873

A Markov-switching model for heat waves

Authors: Benjamin A. Shaby, Brian J. Reich, Daniel Cooley, Cari G. Kaufman

Abstract: Heat waves merit careful study because they inflict severe economic and societal damage. We use an intuitive, informal working definition of a heat wave (a persistent event in the tail of the temperature distribution) to motivate an interpretable latent state extreme value model. A latent variable with dependence in time indicates membership in the heat wave state. The strength of the temporal dependence of the latent variable controls the frequency and persistence of heat waves. Within each heat wave, temperatures are modeled using extreme value distributions, with extremal dependence across time accomplished through an extreme value Markov model. One important virtue of interpretability is that model parameters directly translate into quantities of interest for risk management, so that questions like whether heat waves are becoming longer, more severe or more frequent are easily answered by querying an appropriate fitted model. We demonstrate the latent state model on two recent, calamitous examples: the European heat wave of 2003 and the Russian heat wave of 2010.

Submitted 23 June, 2016; v1 submitted 15 May, 2014; originally announced May 2014.

Comments: Published at http://dx.doi.org/10.1214/15-AOAS873 in the Annals of Applied Statistics (http://www.imstat.org/aoas/) by the Institute of Mathematical Statistics (http://www.imstat.org)

Report number: IMS-AOAS-AOAS873

Journal ref: Annals of Applied Statistics 2016, Vol. 10, No. 1, 74-93

arXiv:1301.1530 [pdf, ps, other]

doi 10.1214/12-AOAS591

A hierarchical max-stable spatial model for extreme precipitation

Authors: Brian J. Reich, Benjamin A. Shaby

Abstract: Extreme environmental phenomena such as major precipitation events manifestly exhibit spatial dependence. Max-stable processes are a class of asymptotically-justified models that are capable of representing spatial dependence among extreme values. While these models satisfy modeling requirements, they are limited in their utility because their corresponding joint likelihoods are unknown for more than a trivial number of spatial locations, preventing, in particular, Bayesian analyses. In this paper, we propose a new random effects model to account for spatial dependence. We show that our specification of the random effect distribution leads to a max-stable process that has the popular Gaussian extreme value process (GEVP) as a limiting case. The proposed model is used to analyze the yearly maximum precipitation from a regional climate model.

Submitted 8 January, 2013; originally announced January 2013.

Comments: Published at http://dx.doi.org/10.1214/12-AOAS591 in the Annals of Applied Statistics (http://www.imstat.org/aoas/) by the Institute of Mathematical Statistics (http://www.imstat.org)

Report number: IMS-AOAS-AOAS591

Journal ref: Annals of Applied Statistics 2012, Vol. 6, No. 4, 1430-1451

arXiv:1204.3687 [pdf, other]

doi 10.1080/10618600.2013.842174

The open-faced sandwich adjustment for MCMC using estimating functions

Authors: Benjamin A Shaby

Abstract: The situation frequently arises where working with the likelihood function is problematic. This can happen for several reasons: perhaps the likelihood is prohibitively computationally expensive, perhaps it lacks some robustness property, or perhaps it is simply not known for the model under consideration. In these cases, it is often possible to specify alternative functions of the parameters and the data that can be maximized to obtain asymptotically normal estimates. However, these scenarios present obvious problems if one is interested in applying Bayesian techniques. Here we describe the open-faced sandwich adjustment, a way to incorporate a wide class of non-likelihood objective functions within Bayesian-like models to obtain asymptotically valid parameter estimates and inference via MCMC. Two simulation examples show that the method provides accurate frequentist uncertainty estimates. The open-faced sandwich adjustment is applied to a Poisson spatio-temporal model to analyze an ornithology dataset from the citizen science initiative eBird.

Submitted 16 April, 2012; originally announced April 2012.

Showing 1–13 of 13 results for author: Shaby, B A