-
Modeling First Arrival of Migratory Birds using a Hierarchical Max-infinitely Divisible Process
Authors:
Dhanushi A. Wijeyakulasuriya,
Ephraim M. Hanks,
Benjamin A. Shaby
Abstract:
Humans have recorded the arrival dates of migratory birds for millennia, searching for trends and patterns. As the first arrival among individuals in a species is the realized tail of the probability distribution of arrivals, the appropriate statistical framework with which to analyze such events is extreme value theory. Here, for the first time, we apply formal extreme value techniques to the dynamics of bird migrations. We study the annual first arrivals of Magnolia Warblers using modern tools from the statistical field of extreme value analysis. Using observations from the eBird database, we model the spatial distribution of Magnolia Warbler arrivals as a max-infinitely divisible process, which allows us to spatially interpolate observed annual arrivals in a probabilistically coherent way, and to project arrival dynamics into the future by conditioning on climatic variables.
Submitted 9 June, 2023;
originally announced June 2023.
-
A New Framework for Inference on Markov Population Models
Authors:
Adam Walder,
Ephraim M. Hanks
Abstract:
In this work we construct a joint Gaussian likelihood for approximate inference on Markov population models. We demonstrate that Markov population models can be approximated by a system of linear stochastic differential equations with time-varying coefficients. We show that the system of stochastic differential equations converges to a set of ordinary differential equations. We derive our proposed joint Gaussian deterministic limiting approximation (JGDLA) model from the limiting system of ordinary differential equations. The result is a method for inference on Markov population models that relies solely on the solution to a system of deterministic equations. We show that our method requires no stochastic infill and exhibits improved predictive power in comparison to the Euler-Maruyama scheme on simulated susceptible-infected-recovered data sets. We use the JGDLA to fit a stochastic susceptible-exposed-infected-recovered system to the Diamond Princess COVID-19 cruise ship data set.
Submitted 2 January, 2021;
originally announced January 2021.
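The abstract above contrasts a diffusion (SDE) approximation, discretized with the Euler-Maruyama scheme, with its deterministic ODE limit. The sketch below illustrates only that contrast for a density-scaled susceptible-infected-recovered model; it is not the JGDLA itself. The function names, the chemical-Langevin form of the noise, and the clamping of states to [0, 1] are assumptions made for illustration.

```python
import math
import random


def sir_drift(s, i, beta, gamma):
    """Drift of the density-scaled SIR model (s and i are population fractions)."""
    infection = beta * s * i
    recovery = gamma * i
    return -infection, infection - recovery


def euler_ode(s0, i0, beta, gamma, dt, n_steps):
    """Forward-Euler solution of the deterministic SIR ODE."""
    s, i = s0, i0
    path = [(s, i)]
    for _ in range(n_steps):
        ds, di = sir_drift(s, i, beta, gamma)
        s, i = s + ds * dt, i + di * dt
        path.append((s, i))
    return path


def euler_maruyama(s0, i0, beta, gamma, pop_size, dt, n_steps, rng):
    """Euler-Maruyama path of an assumed diffusion approximation.

    Each reaction channel (infection, recovery) gets its own Gaussian
    increment, scaled by sqrt(rate / pop_size), so the noise shrinks as
    the population size grows.
    """
    s, i = s0, i0
    path = [(s, i)]
    for _ in range(n_steps):
        infection = beta * s * i
        recovery = gamma * i
        dw_inf = rng.gauss(0.0, math.sqrt(dt))
        dw_rec = rng.gauss(0.0, math.sqrt(dt))
        noise_inf = math.sqrt(infection / pop_size) * dw_inf
        noise_rec = math.sqrt(recovery / pop_size) * dw_rec
        s = s - infection * dt - noise_inf
        i = i + (infection - recovery) * dt + noise_inf - noise_rec
        # Clamp to valid fractions so the square roots stay real (assumption).
        s = min(max(s, 0.0), 1.0)
        i = min(max(i, 0.0), 1.0)
        path.append((s, i))
    return path
```

Calling `euler_maruyama` with a very large `pop_size` should give a path that nearly coincides with `euler_ode` for the same step size, which is the convergence the abstract refers to; smaller populations give visibly noisier paths.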
-
A Mechanistic Model of Annual Sulfate Concentrations in the United States
Authors:
Nathan B. Wikle,
Ephraim M. Hanks,
Lucas R. F. Henneman,
Corwin M. Zigler
Abstract:
We develop a mechanistic model to analyze the impact of sulfur dioxide emissions from coal-fired power plants on average sulfate concentrations in the central United States. A multivariate Ornstein-Uhlenbeck (OU) process is used to approximate the dynamics of the underlying space-time chemical transport process, and its distributional properties are leveraged to specify novel probability models for spatial data (i.e., spatially-referenced data with no temporal replication) that are viewed as either a snapshot or a time-averaged observation of the OU process. Air pollution transport dynamics determine the mean and covariance structure of our atmospheric sulfate model, allowing us to infer which process dynamics are driving observed air pollution concentrations. We use these inferred dynamics to assess the regulatory impact of flue-gas desulfurization (FGD) technologies on human exposure to sulfate aerosols.
Submitted 8 October, 2020;
originally announced October 2020.
-
Privacy for Spatial Point Process Data
Authors:
Adam Walder,
Ephraim M. Hanks,
Aleksandra Slavković
Abstract:
In this work we develop methods for privatizing spatial location data, such as spatial locations of individual disease cases. We propose two novel Bayesian methods for generating synthetic location data based on log-Gaussian Cox processes (LGCPs). We show that conditional predictive ordinate (CPO) estimates can easily be obtained for point process data. We construct a novel risk metric that utilizes CPO estimates to evaluate individual disclosure risks. We adapt the propensity mean square error (pMSE) data utility metric for LGCPs. We demonstrate that our synthesis methods offer an improved risk vs. utility balance in comparison to radial synthesis with a case study of Dr. John Snow's cholera outbreak data.
Submitted 28 April, 2020; v1 submitted 28 March, 2020;
originally announced March 2020.
-
Bayesian Analysis of Spatial Generalized Linear Mixed Models with Laplace Random Fields
Authors:
Adam Walder,
Ephraim M. Hanks
Abstract:
Gaussian random field (GRF) models are widely used in spatial statistics to capture spatially correlated error. We investigate the results of replacing Gaussian processes with Laplace moving averages (LMAs) in spatial generalized linear mixed models (SGLMMs). We demonstrate that LMAs offer improved predictive power when the data exhibits localized spikes in the response. SGLMMs with LMAs are shown to maintain analogous parameter inference and similar computing to Gaussian SGLMMs. We propose a novel discrete space LMA model for irregular lattices and construct conjugate samplers for LMAs with georeferenced and areal support. We provide a Bayesian analysis of SGLMMs with LMAs and GRFs over multiple data support and response types.
Submitted 25 July, 2019;
originally announced July 2019.
-
Identifying and characterizing extrapolation in multivariate response data
Authors:
Meridith L Bartley,
Ephraim M Hanks,
Erin M Schliep,
Patricia A Soranno,
Tyler Wagner
Abstract:
Extrapolation is defined as making predictions beyond the range of the data used to estimate a statistical model. In ecological studies, it is not always obvious when and where extrapolation occurs because of the multivariate nature of the data. Previous work on identifying extrapolation has focused on univariate response data, but these methods are not directly applicable to multivariate response data, which are more and more common in ecological investigations. In this paper, we extend previous work that identified extrapolation by applying the predictive variance from the univariate setting to the multivariate case. We illustrate our approach through an analysis of jointly modeled lake nutrients and indicators of algal biomass and water clarity in over 7000 inland lakes from across the Northeast and Mid-west US. In addition, we illustrate novel exploratory approaches for identifying regions of covariate space where extrapolation is more likely to occur using classification and regression trees.
Submitted 12 November, 2019; v1 submitted 17 June, 2019;
originally announced June 2019.
-
On the Relationship between Conditional (CAR) and Simultaneous (SAR) Autoregressive Models
Authors:
Jay M. Ver Hoef,
Ephraim M. Hanks,
Mevin B. Hooten
Abstract:
We clarify relationships between conditional (CAR) and simultaneous (SAR) autoregressive models. We review the literature on this topic and find that it is mostly incomplete. Our main result is that a SAR model can be written as a unique CAR model, and while a CAR model can be written as a SAR model, it is not unique. In fact, we show how any multivariate Gaussian distribution on a finite set of points with a positive-definite covariance matrix can be written as either a CAR or a SAR model. We illustrate how to obtain any number of SAR covariance matrices from a single CAR covariance matrix by using Givens rotation matrices on a simulated example. We also discuss sparseness in the original CAR construction, and for the resulting SAR weights matrix. For a real example, we use crime data in 49 neighborhoods from Columbus, Ohio, and show that a geostatistical model optimizes the likelihood much better than typical first-order CAR models. We then use the implied weights from the geostatistical model to estimate CAR model parameters that provide the best overall optimization.
Submitted 19 October, 2017;
originally announced October 2017.
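The CAR/SAR abstract above states that a SAR model maps to a unique CAR model, while the reverse map is not unique. A minimal numerical sketch of that idea follows, working with precision matrices so that no matrix inversion is needed. It assumes the standard parameterizations (SAR precision (I - B)^T Λ^{-1} (I - B), CAR precision M^{-1} (I - C) with zero-diagonal C); the helper names are hypothetical, and the rotation step is one way such non-uniqueness can arise, not necessarily the authors' exact construction.

```python
import math


def matmul(a, b):
    """Plain-Python matrix product."""
    return [[sum(a[i][k] * b[k][j] for k in range(len(b)))
             for j in range(len(b[0]))] for i in range(len(a))]


def sar_precision(b, lam):
    """Precision (I - B)^T diag(lam)^-1 (I - B) of a SAR model."""
    n = len(b)
    imb = [[(1.0 if i == j else 0.0) - b[i][j] for j in range(n)]
           for i in range(n)]
    scaled = [[imb[i][j] / lam[i] for j in range(n)] for i in range(n)]
    imb_t = [list(col) for col in zip(*imb)]
    return matmul(imb_t, scaled)


def car_from_precision(q):
    """The CAR pair (C, m) with diag(m)^-1 (I - C) = Q and zero-diagonal C."""
    n = len(q)
    m = [1.0 / q[i][i] for i in range(n)]
    c = [[0.0 if i == j else -m[i] * q[i][j] for j in range(n)]
         for i in range(n)]
    return c, m


def car_precision(c, m):
    """Precision diag(m)^-1 (I - C) of a CAR model."""
    n = len(c)
    return [[((1.0 if i == j else 0.0) - c[i][j]) / m[i] for j in range(n)]
            for i in range(n)]


def givens(n, p, r, theta):
    """n x n Givens rotation acting on coordinates p and r."""
    g = [[1.0 if i == j else 0.0 for j in range(n)] for i in range(n)]
    g[p][p] = g[r][r] = math.cos(theta)
    g[p][r] = -math.sin(theta)
    g[r][p] = math.sin(theta)
    return g


def rotated_sar(b, lam, g):
    """A different SAR pair (B2, lam2) with the same precision matrix.

    L = (I - B)^T diag(lam)^(-1/2) satisfies L L^T = Q, and so does L G for
    any orthogonal G. Rescaling the columns of L G so that its diagonal is 1
    yields (I - B2)^T. Assumes the rotated diagonal has no zero entries.
    """
    n = len(b)
    l = [[((1.0 if i == j else 0.0) - b[j][i]) / math.sqrt(lam[j])
          for j in range(n)] for i in range(n)]
    l2 = matmul(l, g)
    d = [1.0 / l2[j][j] for j in range(n)]
    b2 = [[(1.0 if i == j else 0.0) - l2[j][i] * d[i] for j in range(n)]
          for i in range(n)]
    lam2 = [dj * dj for dj in d]
    return b2, lam2
```

For example, starting from `B = 0.2 * A` on a three-node path graph with unit variances, `rotated_sar(B, lam, givens(3, 0, 1, 0.3))` returns a different weights matrix whose `sar_precision` agrees with the original, while `car_from_precision` returns the same CAR pair for both.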
-
Hierarchical animal movement models for population-level inference
Authors:
Mevin B. Hooten,
Frances E. Buderman,
Brian M. Brost,
Ephraim M. Hanks,
Jacob S. Ivan
Abstract:
New methods for modeling animal movement based on telemetry data are developed regularly. With advances in telemetry capabilities, animal movement models are becoming increasingly sophisticated. Despite a need for population-level inference, animal movement models are still predominantly developed for individual-level inference. Most efforts to upscale the inference to the population-level are either post hoc or complicated enough that only the developer can implement the model. Hierarchical Bayesian models provide an ideal platform for the development of population-level animal movement models but can be challenging to fit due to computational limitations or extensive tuning required. We propose a two-stage procedure for fitting hierarchical animal movement models to telemetry data. The two-stage approach is statistically rigorous and allows one to fit individual-level movement models separately, then resample them using a secondary MCMC algorithm. The primary advantages of the two-stage approach are that the first stage is easily parallelizable and the second stage is completely unsupervised, allowing for a completely automated fitting procedure in many cases. We demonstrate the two-stage procedure with two applications of animal movement models. The first application involves a spatial point process approach to modeling telemetry data and the second involves a more complicated continuous-time discrete-space animal movement model. We fit these models to simulated data and real telemetry data arising from a population of monitored Canada lynx in Colorado, USA.
Submitted 30 June, 2016;
originally announced June 2016.
-
Flexible discrete space models of animal movement
Authors:
Ephraim M. Hanks,
David A. Hughes
Abstract:
Movement drives the spread of infectious disease, gene flow, and other critical ecological processes. To study these processes we need models for movement that capture complex behavior that changes over time and space in response to biotic and abiotic factors. Penalized likelihood approaches, such as penalized semiparametric spline expansions and LASSO regression, allow inference on complex models without overfitting. Continuous-time Markov chains (CTMCs) have been recently introduced as a flexible discrete-space model for animal movement. Modeling with CTMCs involves discretizing an animal's path to the resolution of a raster grid. The resulting stochastic process model can easily incorporate environmental and other covariates, represented as raster layers, that affect directional bias and overall movement rate. We introduce a weighted likelihood approach that allows for modeling movement using CTMCs, with path uncertainty due to missing data modeled by imputing continuous-time paths between telemetry locations. The framework we introduce allows for inference on CTMC movement models using existing software for fitting Poisson regression models, including penalized versions of Poisson regression. The result is a flexible, powerful, and accessible framework for modeling a wide range of animal movement behavior.
Submitted 25 June, 2016;
originally announced June 2016.
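The abstract above notes that CTMC movement models can be fit with software for Poisson regression. A hedged sketch of why this works for a single stay in one grid cell: the CTMC log-likelihood of waiting `tau` time units and then moving to one neighbor differs from a Poisson log-likelihood (one 0/1 count per neighbor, mean `tau * rate`) only by `log(tau)`, which does not depend on the rate parameters. The log-linear rate form and all function names here are assumptions for illustration, not the paper's implementation.

```python
import math


def neighbor_rates(covariates, beta):
    """Assumed log-linear transition rates, one per neighboring cell."""
    return [math.exp(sum(b * x for b, x in zip(beta, row)))
            for row in covariates]


def ctmc_loglik(rates, tau, moved_to):
    """Log-likelihood of staying tau time units, then moving to one neighbor."""
    return math.log(rates[moved_to]) - tau * sum(rates)


def poisson_loglik(rates, tau, moved_to):
    """Poisson log-likelihood with a 0/1 count per neighbor and mean tau * rate."""
    total = 0.0
    for k, rate in enumerate(rates):
        z = 1 if k == moved_to else 0
        mean = tau * rate
        total += z * math.log(mean) - mean - math.lgamma(z + 1)
    return total
```

Because the two functions differ by a constant in `beta`, maximizing the Poisson form over `beta` (with `log(tau)` entering as an offset) gives the same estimates as maximizing the CTMC form.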
-
A Spatially-Varying Stochastic Differential Equation Model for Animal Movement
Authors:
James C. Russell,
Ephraim M. Hanks,
Murali Haran,
David P. Hughes
Abstract:
Animal movement exhibits complex behavior which can be influenced by unobserved environmental conditions. We propose a model which allows for a spatially-varying movement rate and spatially-varying drift through a semiparametric potential surface and a separate motility surface. These surfaces are embedded in a stochastic differential equation framework which allows for complex animal movement patterns in space. The resulting model is used to analyze the spatially-varying behavior of ants to provide insight into the spatial structure of ant movement in the nest.
Submitted 26 February, 2017; v1 submitted 24 March, 2016;
originally announced March 2016.
-
A Constructive Spatio-Temporal Approach to Modeling Spatial Covariance
Authors:
Ephraim M. Hanks
Abstract:
I present an approach for modeling areal spatial covariance by considering the stationary distribution of a spatio-temporal Markov random walk. In the areal data case, this stationary distribution corresponds to an intrinsic simultaneous autoregressive (SAR) model for spatial correlation, and provides a principled approach to specifying areal spatial models when a spatio-temporal generating process can be assumed. I apply the approach to a study of spatial genetic variation of trout in a stream network in Connecticut, USA, and a study of crime rates in neighborhoods of Columbus, OH, USA.
Submitted 3 July, 2015; v1 submitted 11 June, 2015;
originally announced June 2015.
-
Dynamic Models of Animal Movement with Spatial Point Process Interactions
Authors:
James C. Russell,
Ephraim M. Hanks,
Murali Haran
Abstract:
When analyzing animal movement, it is important to account for interactions between individuals. However, statistical models for incorporating interaction behavior in movement models are limited. We propose an approach that models dependent movement by augmenting a dynamic marginal movement model with a spatial point process interaction function within a weighted distribution framework. The approach is flexible, as marginal movement behavior and interaction behavior can be modeled independently. Inference for model parameters is complicated by intractable normalizing constants. We develop a double Metropolis-Hastings algorithm to perform Bayesian inference. We illustrate our approach through the analysis of movement tracks of guppies (Poecilia reticulata).
Submitted 31 July, 2015; v1 submitted 30 March, 2015;
originally announced March 2015.
-
Continuous-time discrete-space models for animal movement
Authors:
Ephraim M. Hanks,
Mevin B. Hooten,
Mat W. Alldredge
Abstract:
The processes influencing animal movement and resource selection are complex and varied. Past efforts to model behavioral changes over time used Bayesian statistical models with variable parameter space, such as reversible-jump Markov chain Monte Carlo approaches, which are computationally demanding and inaccessible to many practitioners. We present a continuous-time discrete-space (CTDS) model of animal movement that can be fit using standard generalized linear modeling (GLM) methods. This CTDS approach allows for the joint modeling of location-based as well as directional drivers of movement. Changing behavior over time is modeled using a varying-coefficient framework which maintains the computational simplicity of a GLM approach, and variable selection is accomplished using a group lasso penalty. We apply our approach to a study of two mountain lions (Puma concolor) in Colorado, USA.
Submitted 28 May, 2015; v1 submitted 8 November, 2012;
originally announced November 2012.