-
Mixture of Directed Graphical Models for Discrete Spatial Random Fields
Authors:
J. Brandon Carter,
Catherine A. Calder
Abstract:
Current approaches for modeling discrete-valued outcomes associated with spatially-dependent areal units incur computational and theoretical challenges, especially in the Bayesian setting when full posterior inference is desired. As an alternative, we propose a novel statistical modeling framework for this data setting, namely a mixture of directed graphical models (MDGMs). The components of the mixture, directed graphical models, can be represented by directed acyclic graphs (DAGs) and are computationally quick to evaluate. The DAGs representing the mixture components are selected to correspond to an undirected graphical representation of an assumed spatial contiguity/dependence structure of the areal units, which underlies the specification of traditional modeling approaches for discrete spatial processes such as Markov random fields (MRFs). We introduce the concept of compatibility to show how an undirected graph can be used as a template for the structural dependencies between areal units to create sets of DAGs which, as a collection, preserve the structural dependencies represented in the template undirected graph. We then introduce three classes of compatible DAGs and corresponding algorithms for fitting MDGMs based on these classes. In addition, we compare MDGMs to MRFs and a popular Bayesian MRF model approximation used in high-dimensional settings in a series of simulations and an analysis of ecometrics data collected as part of the Adolescent Health and Development in Context Study.
Submitted 27 June, 2024; v1 submitted 21 June, 2024;
originally announced June 2024.
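The MDGM idea can be sketched as follows. We build DAGs whose undirected skeleton matches a template contiguity graph, then average their likelihoods. This is a hedged illustration only; it does not reproduce the authors' three classes of compatible DAGs or their fitting algorithms. Orienting each edge according to a random node ordering is just one simple way to obtain acyclic graphs with the template's skeleton, and no claim is made that these DAGs satisfy the paper's definition of compatibility. The logistic conditionals (`alpha`, `beta`), the equal mixture weights and all function names are hypothetical.

```python
import math
import random

def orient_by_ordering(edges, order):
    """Orient each undirected edge {u, v} from the node earlier in `order` to the later one."""
    rank = {node: i for i, node in enumerate(order)}
    return [(u, v) if rank[u] < rank[v] else (v, u) for u, v in edges]

def parents_of(dag_edges, nodes):
    """Map each node to the list of its parents in the DAG."""
    parents = {n: [] for n in nodes}
    for u, v in dag_edges:
        parents[v].append(u)
    return parents

def is_acyclic(dag_edges, nodes):
    """Kahn's algorithm: the graph is a DAG iff every node can be removed in topological order."""
    indeg = {n: 0 for n in nodes}
    children = {n: [] for n in nodes}
    for u, v in dag_edges:
        indeg[v] += 1
        children[u].append(v)
    stack = [n for n in nodes if indeg[n] == 0]
    removed = 0
    while stack:
        n = stack.pop()
        removed += 1
        for c in children[n]:
            indeg[c] -= 1
            if indeg[c] == 0:
                stack.append(c)
    return removed == len(nodes)

def dag_log_prob(x, parents, alpha, beta):
    """Log-probability of a binary field under one DAG (hypothetical logistic conditionals)."""
    total = 0.0
    for node, pa in parents.items():
        eta = alpha + beta * sum(x[p] for p in pa)
        p1 = 1.0 / (1.0 + math.exp(-eta))
        total += math.log(p1 if x[node] == 1 else 1.0 - p1)
    return total

def mixture_log_prob(x, dags, nodes, alpha, beta):
    """Equal-weight mixture over the DAG components, combined via log-sum-exp."""
    logs = [dag_log_prob(x, parents_of(d, nodes), alpha, beta) for d in dags]
    top = max(logs)
    return top + math.log(sum(math.exp(v - top) for v in logs) / len(logs))

# Template undirected graph: rook contiguity on a 2 x 2 lattice of areal units.
nodes = ["a", "b", "c", "d"]
edges = [("a", "b"), ("a", "c"), ("b", "d"), ("c", "d")]

rng = random.Random(1)
dags = []
for _ in range(5):
    order = nodes[:]
    rng.shuffle(order)
    dags.append(orient_by_ordering(edges, order))

x = {"a": 1, "b": 1, "c": 0, "d": 1}
print(mixture_log_prob(x, dags, nodes, alpha=-0.5, beta=1.0))
```

Each component is a product of one conditional per areal unit. Evaluating it therefore needs no intractable normalizing constant, unlike an MRF likelihood, which is the computational appeal the abstract points to.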
-
Statistical Inference for Complete and Incomplete Mobility Trajectories under the Flight-Pause Model
Authors:
Marcin Jurek,
Catherine A. Calder,
Corwin Zigler
Abstract:
We formulate a statistical flight-pause model for human mobility, represented by a collection of random objects, called motions, appropriate for mobile phone tracking (MPT) data. We develop the statistical machinery for parameter inference and trajectory imputation under various forms of missing data. We show that common assumptions about the missing data mechanism for MPT are not valid for the mechanism governing the random motions underlying the flight-pause model, representing an understudied missing data phenomenon. We demonstrate the consequences of missing data and our proposed adjustments in both simulations and real data, outlining implications for MPT data collection and design.
Submitted 30 June, 2023; v1 submitted 14 October, 2022;
originally announced October 2022.
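To make the notion of "motions" concrete, here is a minimal simulation sketch. It assumes a trajectory is an alternating sequence of pauses (staying at one location) and straight-line flights at constant speed. The exponential durations, uniform flight directions and the on/off recording schedule are hypothetical choices for illustration, not the paper's model or the missing-data mechanism it studies. The `Motion` class and helper names are likewise hypothetical.

```python
import math
import random
from dataclasses import dataclass

@dataclass
class Motion:
    kind: str        # "pause" or "flight"
    start: tuple     # (x, y) at the beginning of the motion
    end: tuple       # (x, y) at the end; equal to start for a pause
    duration: float  # length of the motion in time units

def simulate_motions(n, rng, pause_rate=1.0, flight_rate=2.0, speed=1.0):
    """Alternate pauses and straight-line flights (hypothetical exponential durations)."""
    pos, motions = (0.0, 0.0), []
    for i in range(n):
        if i % 2 == 0:
            motions.append(Motion("pause", pos, pos, rng.expovariate(pause_rate)))
        else:
            d = rng.expovariate(flight_rate)
            theta = rng.uniform(0.0, 2.0 * math.pi)
            end = (pos[0] + speed * d * math.cos(theta), pos[1] + speed * d * math.sin(theta))
            motions.append(Motion("flight", pos, end, d))
            pos = end
    return motions

def position_at(motions, t):
    """Find the motion active at time t and interpolate linearly within it."""
    elapsed = 0.0
    for m in motions:
        if t <= elapsed + m.duration:
            frac = 0.0 if m.duration == 0 else (t - elapsed) / m.duration
            return (m.start[0] + frac * (m.end[0] - m.start[0]),
                    m.start[1] + frac * (m.end[1] - m.start[1]))
        elapsed += m.duration
    return motions[-1].end  # after the last motion, remain at the final location

rng = random.Random(0)
motions = simulate_motions(10, rng)

# Hypothetical recording schedule: the phone logs a location every 0.1 time units, but only
# during the first half of each unit interval, so some motions are seen partially or not at all.
times = [k * 0.1 for k in range(100)]
observed = [(t, position_at(motions, t)) for t in times if (t % 1.0) < 0.5]
print(len(times), len(observed))
```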
-
Land-Use Filtering for Nonstationary Prediction of Collective Efficacy in an Urban Environment
Authors:
J. Brandon Carter,
Christopher R. Browning,
Bethany Boettner,
Nicolo Pinchak,
Catherine Calder
Abstract:
Collective efficacy -- the capacity of communities to exert social control toward the realization of their shared goals -- is a foundational concept in the urban sociology and neighborhood effects literature. Traditionally, empirical studies of collective efficacy use large sample surveys to estimate collective efficacy of different neighborhoods within an urban setting. Such studies have demonstrated an association between collective efficacy and local variation in community violence, educational achievement, and health. Unlike traditional collective efficacy measurement strategies, the Adolescent Health and Development in Context (AHDC) Study implemented a new approach, obtaining spatially-referenced, place-based ratings of collective efficacy from a representative sample of individuals residing in Columbus, OH. In this paper, we introduce a novel nonstationary spatial model for interpolation of the AHDC collective efficacy ratings across the study area which leverages administrative data on land use. Our constructive model specification strategy involves dimension expansion of a latent spatial process and the use of a filter defined by the land-use partition of the study region to connect the latent multivariate spatial process to the observed ordinal ratings of collective efficacy. Careful consideration is given to the issues of parameter identifiability, computational efficiency of an MCMC algorithm for model fitting, and fine-scale spatial prediction of collective efficacy.
Submitted 29 March, 2023; v1 submitted 9 June, 2022;
originally announced June 2022.
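The land-use filter can be pictured with a small forward simulation. A multivariate latent process has one component per land-use class. At each location, the filter keeps only the component matching that location's class, and the kept value is cut into an ordinal rating. This is a sketch under simplifying assumptions, not the authors' specification: the components are independent Gaussian processes on a 1-D transect, and the partition, covariance range and cutpoints are all hypothetical.

```python
import numpy as np

rng = np.random.default_rng(42)
n, K = 50, 3                                   # locations on a transect, land-use classes
s = np.linspace(0.0, 10.0, n)
land_use = np.minimum((s // (10.0 / K)).astype(int), K - 1)   # hypothetical partition into K blocks

# Latent multivariate process: K independent Gaussian processes with exponential covariance
# (a simplification; cross-component dependence is omitted here).
dist = np.abs(s[:, None] - s[None, :])
cov = np.exp(-dist / 2.0)
chol = np.linalg.cholesky(cov + 1e-9 * np.eye(n))
W = chol @ rng.standard_normal((n, K))         # column k is the k-th latent component

# Land-use filter: at each location keep only the component for its land-use class.
z = W[np.arange(n), land_use]

# Ordinal link: hypothetical cutpoints map the filtered latent value to ratings 1..4.
cutpoints = np.array([-1.0, 0.0, 1.0])
ratings = np.searchsorted(cutpoints, z) + 1
print(ratings)
```

Prediction at an unrated location would also use the component selected by that location's land-use class. This is how administrative land-use data could induce nonstationarity in the interpolated surface.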
-
Restricted Spatial Regression Methods: Implications for Inference
Authors:
Kori Khan,
Catherine A. Calder
Abstract:
The issue of spatial confounding between the spatial random effect and the fixed effects in regression analyses has been identified as a concern in the statistical literature. Multiple authors have offered perspectives and potential solutions. In this paper, for the areal spatial data setting, we show that many of the methods designed to alleviate spatial confounding can be viewed as special cases of a general class of models. We refer to this class as Restricted Spatial Regression (RSR) models, extending terminology currently in use. We offer a mathematically based exploration of the impact that RSR methods have on inference for regression coefficients in the linear model. We then use simulations to explore whether these results hold in the generalized linear model setting for count data. We show that the use of these methods has counterintuitive consequences that defy general expectations in the literature. In particular, our results and the accompanying simulations suggest that RSR methods will typically perform worse than non-spatial methods. These results have important implications for dimension reduction strategies in spatial regression modeling. Specifically, we demonstrate that the problems with RSR models cannot be fixed with a selection of "better" spatial basis vectors or dimension reduction techniques.
Submitted 18 August, 2020; v1 submitted 22 May, 2019;
originally announced May 2019.
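A numerical sketch helps explain why RSR-type restrictions change inference on the regression coefficients. Restricted models place the spatial effect in the orthogonal complement of the column space of X, using P⊥ = I − X(XᵀX)⁻¹Xᵀ. Conditional on any such restricted effect, the least-squares estimate of β is then identical to the non-spatial OLS estimate. The toy data and the random vector `w` (a stand-in for a spatial effect) below are made up. The sketch covers point estimates only, not the uncertainty results the paper studies.

```python
import numpy as np

rng = np.random.default_rng(0)
n = 30
X = np.column_stack([np.ones(n), rng.normal(size=n)])   # intercept + one covariate
w = rng.normal(size=n)                                   # stand-in for a spatial random effect
y = X @ np.array([1.0, 2.0]) + w + rng.normal(scale=0.1, size=n)

# Projection onto the orthogonal complement of col(X).
P_perp = np.eye(n) - X @ np.linalg.solve(X.T @ X, X.T)

beta_ols = np.linalg.solve(X.T @ X, X.T @ y)   # non-spatial fit

def beta_given_effect(delta):
    """Least-squares estimate of beta after subtracting a given random effect delta."""
    return np.linalg.solve(X.T @ X, X.T @ (y - delta))

beta_rsr = beta_given_effect(P_perp @ w)       # restricted effect: orthogonal to X
beta_unrestricted = beta_given_effect(w)       # unrestricted effect: generally correlated with X
print(beta_ols, beta_rsr, beta_unrestricted)
```

Because Xᵀ P⊥ w = 0 for every w, the restricted effect can never adjust the coefficient estimates. Choosing different basis vectors for the restricted effect therefore leaves the β estimates unchanged.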
-
Beyond Activity Space: Detecting Communities in Ecological Networks
Authors:
Wenna Xi,
Catherine A. Calder,
Christopher R. Browning
Abstract:
Emerging research suggests that the extent to which activity spaces -- the collection of an individual's routine activity locations -- overlap provides important information about the functioning of a city and its neighborhoods. To study patterns of overlapping activity spaces, we draw on the notion of an ecological network, a type of two-mode network with the two modes being individuals and the geographic locations where individuals perform routine activities. We describe a method for detecting "ecological communities" within these networks based on shared activity locations among individuals. Specifically, we identify latent activity pattern profiles, which, for each community, summarize its members' probability distribution of going to each location, and community assignment vectors, which, for each individual, summarize his/her probability distribution of belonging to each community. Using data from the Adolescent Health and Development in Context (AHDC) Study, we employ latent Dirichlet allocation (LDA) to identify activity pattern profiles and communities. We then explore differences across neighborhoods in the strength and within-neighborhood consistency of community assignment. We hypothesize that these aspects of the neighborhood structure of ecological community membership capture meaningful dimensions of neighborhood functioning likely to co-vary with economic and racial composition. We discuss the implications of a focus on ecological communities for the conduct of "neighborhood effects" research more broadly.
Submitted 18 March, 2019;
originally announced March 2019.
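The LDA mapping described above treats each individual as a "document" and each activity location as a "word". Under that mapping, the per-individual topic proportions play the role of community assignment vectors, and the per-topic word distributions play the role of activity pattern profiles. The collapsed Gibbs sampler below is a minimal from-scratch sketch. The priors (`alpha`, `eta`), the number of communities, the iteration count and the toy visit data are all hypothetical; the AHDC analysis may have used different software and settings.

```python
import numpy as np

def lda_gibbs(docs, n_locations, n_communities, alpha=0.5, eta=0.1, n_iter=200, seed=0):
    """Collapsed Gibbs sampler for LDA; docs[i] lists the location indices visited by individual i.

    Returns theta (individual x community probabilities) and phi (community x location profiles).
    """
    rng = np.random.default_rng(seed)
    ndk = np.zeros((len(docs), n_communities))       # individual-community counts
    nkw = np.zeros((n_communities, n_locations))     # community-location counts
    nk = np.zeros(n_communities)                     # visits assigned to each community
    z = []                                           # community label of every visit
    for d, doc in enumerate(docs):
        zd = rng.integers(n_communities, size=len(doc))
        for w, k in zip(doc, zd):
            ndk[d, k] += 1
            nkw[k, w] += 1
            nk[k] += 1
        z.append(zd)
    for _ in range(n_iter):
        for d, doc in enumerate(docs):
            for i, w in enumerate(doc):
                k = z[d][i]
                # Remove this visit's current assignment from the counts.
                ndk[d, k] -= 1
                nkw[k, w] -= 1
                nk[k] -= 1
                # Full conditional of this visit's community given all other assignments.
                p = (ndk[d] + alpha) * (nkw[:, w] + eta) / (nk + n_locations * eta)
                k = rng.choice(n_communities, p=p / p.sum())
                z[d][i] = k
                ndk[d, k] += 1
                nkw[k, w] += 1
                nk[k] += 1
    theta = (ndk + alpha) / (ndk + alpha).sum(axis=1, keepdims=True)
    phi = (nkw + eta) / (nkw + eta).sum(axis=1, keepdims=True)
    return theta, phi

# Toy data: individuals 0-3 frequent locations 0-2; individuals 4-7 frequent locations 3-5.
docs = [[0, 1, 2, 0, 1] for _ in range(4)] + [[3, 4, 5, 3, 4] for _ in range(4)]
theta, phi = lda_gibbs(docs, n_locations=6, n_communities=2)
print(np.round(theta, 2))
print(np.round(phi, 2))
```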
-
The Geometry of Continuous Latent Space Models for Network Data
Authors:
Anna L. Smith,
Dena M. Asta,
Catherine A. Calder
Abstract:
We review the class of continuous latent space (statistical) models for network data, paying particular attention to the role of the geometry of the latent space. In these models, the presence/absence of network dyadic ties are assumed to be conditionally independent given the dyads' unobserved positions in a latent space. In this way, these models provide a probabilistic framework for embedding network nodes in a continuous space equipped with a geometry that facilitates the description of dependence between random dyadic ties. Specifically, these models naturally capture homophilous tendencies and triadic clustering, among other common properties of observed networks. In addition to reviewing the literature on continuous latent space models from a geometric perspective, we use intuition and simulation to highlight the important role the geometry of the latent space plays in shaping the properties of networks arising from these models. Finally, we discuss results from spectral graph theory that allow us to explore the role of the geometry of the latent space, independent of network size. We conclude with conjectures about how these results might be used to infer the appropriate latent space geometry from observed networks.
Submitted 25 March, 2019; v1 submitted 22 December, 2017;
originally announced December 2017.
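One familiar member of this class is the distance model of Hoff, Raftery and Handcock, in which logit P(tie i~j) = α − d(z_i, z_j) and ties are independent given the latent positions. The sketch below samples networks from the same latent points under two geometries: Euclidean space and the unit sphere with great-circle distance. Positions, α and network size are hypothetical, and this is only one illustrative model rather than the full range the review covers.

```python
import numpy as np

def euclidean_dist(Z):
    diff = Z[:, None, :] - Z[None, :, :]
    return np.sqrt((diff ** 2).sum(axis=-1))

def spherical_dist(Z):
    """Great-circle distance after projecting the rows of Z onto the unit sphere."""
    U = Z / np.linalg.norm(Z, axis=1, keepdims=True)
    return np.arccos(np.clip(U @ U.T, -1.0, 1.0))

def sample_network(D, alpha, rng):
    """Distance model: logit P(tie i~j) = alpha - D[i, j]; dyads independent given positions."""
    P = 1.0 / (1.0 + np.exp(-(alpha - D)))
    upper = np.triu(rng.random(D.shape) < P, k=1)   # sample each dyad once
    A = (upper | upper.T).astype(int)               # undirected, no self-loops
    return A, P

rng = np.random.default_rng(3)
Z = rng.normal(size=(40, 3))
A_euc, P_euc = sample_network(euclidean_dist(Z), alpha=1.0, rng=rng)
A_sph, P_sph = sample_network(spherical_dist(Z), alpha=1.0, rng=rng)
print(A_euc.sum() // 2, A_sph.sum() // 2)   # number of ties under each geometry
```

Geometry alone changes what networks the model can produce. Great-circle distances never exceed π, so on the sphere every tie probability is at least logistic(α − π), which places a floor on how sparse the network can be. Euclidean distances are unbounded and impose no such floor.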
-
Nonstationary Spatial Prediction of Soil Organic Carbon: Implications for Stock Assessment Decision Making
Authors:
Mark D. Risser,
Catherine A. Calder,
Veronica J. Berrocal,
Candace Berrett
Abstract:
The Rapid Carbon Assessment (RaCA) project was conducted by the US Department of Agriculture's Natural Resources Conservation Service between 2010 and 2012 to provide contemporaneous measurements of soil organic carbon (SOC) across the US. Despite the broad extent of the RaCA data collection effort, direct observations of SOC are not available at the high spatial resolution needed for studying carbon storage in soil and its implications for important problems in climate science and agriculture. As a result, there is a need for predicting SOC at spatial locations not included as part of the RaCA project. In this paper, we compare spatial prediction of SOC using a subset of the RaCA data for a variety of statistical methods. We investigate the performance of methods for which off-the-shelf software is available (both stationary and nonstationary) as well as a novel nonstationary approach based on partitioning relevant spatially-varying covariate processes. Our new method addresses open questions regarding (1) how to partition the spatial domain for segmentation-based nonstationary methods, (2) incorporating partially observed covariates into a spatial model, and (3) accounting for uncertainty in the partitioning. In applying the various statistical methods, we find that there are minimal differences in out-of-sample criteria for this particular data set; however, there are major differences in maps of uncertainty in SOC predictions. We argue that the spatially-varying measures of prediction uncertainty produced by our new approach are valuable to decision makers, as they can be used to better benchmark mechanistic models, identify target areas for soil restoration projects, and inform carbon sequestration projects.
Submitted 10 June, 2018; v1 submitted 19 August, 2016;
originally announced August 2016.
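The abstract's observation that methods can match on out-of-sample criteria yet differ sharply in uncertainty maps can be illustrated with a simple segmentation-based model. The sketch partitions the domain, uses a separate stationary covariance in each region, and sets covariance across regions to zero. It is a minimal sketch under stated assumptions, not the authors' method, which additionally learns the partition from covariates, handles partially observed covariates and accounts for partition uncertainty. The 1-D domain, threshold, covariance parameters and SOC-like values are all made up.

```python
import numpy as np

def region_of(s):
    """Hypothetical partition: region 1 where s >= 5, region 0 otherwise."""
    return (np.asarray(s) >= 5.0).astype(int)

# Per-region (sill, range) of an exponential covariance; hypothetical values.
params = {0: (1.0, 2.0), 1: (0.5, 0.5)}

def partitioned_cov(a, b):
    """Stationary exponential covariance within each region, zero covariance across regions."""
    C = np.zeros((len(a), len(b)))
    ra, rb = region_of(a), region_of(b)
    for k, (sill, rho) in params.items():
        ia, ib = np.where(ra == k)[0], np.where(rb == k)[0]
        d = np.abs(a[ia][:, None] - b[ib][None, :])
        C[np.ix_(ia, ib)] = sill * np.exp(-d / rho)
    return C

obs_s = np.array([0.5, 1.5, 2.5, 6.0, 7.0, 8.0])
obs_y = np.array([1.2, 1.0, 0.7, 3.1, 2.6, 2.9])   # made-up values for illustration only

def simple_krige(new_s, mean):
    """Simple kriging (known constant mean) under the partitioned covariance."""
    K = partitioned_cov(obs_s, obs_s)
    k = partitioned_cov(obs_s, new_s)
    weights = np.linalg.solve(K, k)                 # one column of weights per new location
    pred = mean + weights.T @ (obs_y - mean)
    var = np.diag(partitioned_cov(new_s, new_s)) - np.sum(k * weights, axis=0)
    return pred, var

new_s = np.array([4.0, 9.5])
pred, var = simple_krige(new_s, mean=obs_y.mean())
print(pred, var)
```

The prediction variance in each region is bounded by that region's own sill. A single stationary model would instead spread one common variance level across the whole domain.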
-
Empirical Reference Distributions for Networks of Different Size
Authors:
Anna Smith,
Catherine A. Calder,
Christopher R. Browning
Abstract:
Network analysis has become an increasingly prevalent research tool across a vast range of scientific fields. Here, we focus on the particular issue of comparing network statistics, i.e. graph-level measures of network structural features, across multiple networks that differ in size. Although "normalized" versions of some network statistics exist, we demonstrate via simulation why direct comparison of raw and normalized statistics is often inappropriate. We examine a recent suggestion to normalize network statistics relative to Erdos-Renyi random graphs and demonstrate via simulation how this is an improvement over direct comparison, but still sometimes problematic. We propose a new adjustment method based on a reference distribution constructed as a mixture model of random graphs which reflect the dependence structure exhibited in the observed networks. We show that using simple Bernoulli models as mixture components in this reference distribution can provide adjusted network statistics that are relatively comparable across different network sizes but still describe interesting features of networks, and that this can be accomplished at relatively low computational expense. Finally, we apply this methodology to a collection of co-location networks derived from the Los Angeles Family and Neighborhood Survey activity location data.
Submitted 4 March, 2016; v1 submitted 10 September, 2015;
originally announced September 2015.
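The Erdős-Rényi normalization discussed in the abstract, which the authors find to be an improvement but still sometimes problematic, can be sketched as follows. We compare an observed statistic (here, transitivity) with its distribution over G(n, p) graphs matched on size and density. The paper's proposed alternative, a mixture reference distribution with Bernoulli components, is not reproduced here. The toy networks (disjoint 5-cliques) and the number of simulations are hypothetical.

```python
import random

def transitivity(adj):
    """Global clustering coefficient: closed connected triples / all connected triples."""
    n = len(adj)
    closed = triples = 0
    for v in range(n):
        nbrs = [u for u in range(n) if adj[v][u]]
        k = len(nbrs)
        triples += k * (k - 1) // 2
        for i in range(k):
            for j in range(i + 1, k):
                if adj[nbrs[i]][nbrs[j]]:
                    closed += 1
    return closed / triples if triples else 0.0

def density(adj):
    n = len(adj)
    edges = sum(adj[i][j] for i in range(n) for j in range(i + 1, n))
    return edges / (n * (n - 1) / 2)

def erdos_renyi(n, p, rng):
    adj = [[0] * n for _ in range(n)]
    for i in range(n):
        for j in range(i + 1, n):
            if rng.random() < p:
                adj[i][j] = adj[j][i] = 1
    return adj

def er_normalized(adj, stat, n_sims=200, seed=0):
    """Observed statistic, its mean under size- and density-matched G(n, p), and a z-score."""
    rng = random.Random(seed)
    n, p = len(adj), density(adj)
    sims = [stat(erdos_renyi(n, p, rng)) for _ in range(n_sims)]
    mean = sum(sims) / n_sims
    sd = (sum((s - mean) ** 2 for s in sims) / (n_sims - 1)) ** 0.5
    obs = stat(adj)
    return obs, mean, ((obs - mean) / sd if sd > 0 else float("nan"))

def cliques(n_groups, size):
    """Disjoint cliques: a toy network with strong local clustering."""
    n = n_groups * size
    return [[int(i != j and i // size == j // size) for j in range(n)] for i in range(n)]

for groups in (4, 8):
    obs, ref_mean, z = er_normalized(cliques(groups, 5), transitivity)
    print(groups * 5, round(obs, 3), round(ref_mean, 3), round(z, 1))
```

Both toy networks have the same local structure and the same raw transitivity (1.0). Their density, and therefore their Erdős-Rényi reference, changes with size, so the normalized values differ. This is one way a size-dependent reference can blur comparisons.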
-
Local likelihood estimation for covariance functions with spatially-varying parameters: the convoSPAT package for R
Authors:
Mark D. Risser,
Catherine A. Calder
Abstract:
In spite of the interest in and appeal of convolution-based approaches for nonstationary spatial modeling, off-the-shelf software for model fitting does not yet exist. Convolution-based models are highly flexible yet notoriously difficult to fit, even with relatively small data sets. The general lack of pre-packaged options for model fitting makes it difficult to compare new methodology in nonstationary modeling with other existing methods, and as a result most new models are simply compared to stationary models. Using a convolution-based approach, we present a new nonstationary covariance function for spatial Gaussian process models that allows for efficient computing in two ways: first, by representing the spatially-varying parameters via a discrete mixture or "mixture component" model, and second, by estimating the mixture component parameters through a local likelihood approach. In order to make computation for a convolution-based nonstationary spatial model readily available, this paper also presents and describes the convoSPAT package for R. The nonstationary model is fit to both a synthetic data set and a real data application involving annual precipitation to demonstrate the capabilities of the package.
Submitted 3 February, 2017; v1 submitted 30 July, 2015;
originally announced July 2015.
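The covariance below is the well-known convolution-based nonstationary form of Paciorek and Schervish. For kernel matrices Σ(s) and Σ(s'), it is C(s, s') = σ² |Σ(s)|^{1/4} |Σ(s')|^{1/4} |(Σ(s)+Σ(s'))/2|^{-1/2} exp(−√Q), where Q = (s − s')ᵀ((Σ(s)+Σ(s'))/2)⁻¹(s − s'). As an assumption about what a "mixture component" representation might look like, Σ(s) here is a weighted average of a few component matrices, with Gaussian weights based on distance to hypothetical component centers. The weighting scheme, `lam_w`, the exponential correlation and the constant variance are all assumptions. This is a Python sketch, not the convoSPAT R package, and the local likelihood estimation step is not shown.

```python
import numpy as np

def kernel_matrix(s, centers, comp_mats, lam_w):
    """Sigma(s) = sum_k w_k(s) Sigma_k with w_k(s) proportional to exp(-||s - b_k||^2 / (2 lam_w))."""
    w = np.exp(-np.sum((centers - s) ** 2, axis=1) / (2.0 * lam_w))
    w /= w.sum()
    return np.tensordot(w, comp_mats, axes=1)

def nonstationary_cov(locs, centers, comp_mats, lam_w, sigma2=1.0):
    """Convolution-based nonstationary covariance with an exponential correlation function."""
    n = len(locs)
    kern = [kernel_matrix(s, centers, comp_mats, lam_w) for s in locs]
    C = np.empty((n, n))
    for i in range(n):
        for j in range(n):
            avg = 0.5 * (kern[i] + kern[j])
            h = locs[i] - locs[j]
            Q = h @ np.linalg.solve(avg, h)          # squared Mahalanobis distance
            scale = (np.linalg.det(kern[i]) ** 0.25 * np.linalg.det(kern[j]) ** 0.25
                     / np.sqrt(np.linalg.det(avg)))
            C[i, j] = sigma2 * scale * np.exp(-np.sqrt(Q))
    return C

rng = np.random.default_rng(11)
locs = rng.uniform(0.0, 1.0, size=(25, 2))
centers = np.array([[0.2, 0.2], [0.8, 0.8]])              # hypothetical component centers
comp_mats = np.array([[[0.05, 0.0], [0.0, 0.05]],          # small isotropic kernel near (0.2, 0.2)
                      [[0.30, 0.10], [0.10, 0.15]]])       # larger anisotropic kernel near (0.8, 0.8)
C = nonstationary_cov(locs, centers, comp_mats, lam_w=0.1)
print(np.round(C[:3, :3], 3))
```

Only a handful of component matrices are estimated instead of one kernel per location. This is the kind of parameter reduction the abstract credits for efficient computing.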
-
Regression-based covariance functions for nonstationary spatial modeling
Authors:
Mark D. Risser,
Catherine A. Calder
Abstract:
In many environmental applications involving spatially-referenced data, limitations on the number and locations of observations motivate the need for practical and efficient models for spatial interpolation, or kriging. A key component of models for continuously-indexed spatial data is the covariance function, which is traditionally assumed to belong to a parametric class of stationary models. However, stationarity is rarely a realistic assumption. Alternative methods that more appropriately model the nonstationarity present in environmental processes often involve high-dimensional parameter spaces, which lead to difficulties in model fitting and interpretability. To overcome this issue, we build on the growing literature on covariate-driven nonstationary spatial modeling. Using process convolution techniques, we propose a Bayesian model for continuously-indexed spatial data based on a flexible parametric covariance regression structure for a convolution-kernel covariance matrix. The resulting model is a parsimonious representation of the kernel process, and we explore properties of the implied model, including a description of the resulting nonstationary covariance function and the interpretational benefits in the kernel parameters. Furthermore, we demonstrate that our model provides a practical compromise between stationary and highly parameterized nonstationary spatial covariance functions that do not perform well in practice. We illustrate our approach through an analysis of annual precipitation data.
Submitted 4 February, 2015; v1 submitted 6 October, 2014;
originally announced October 2014.
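A covariance regression structure for the kernel matrix can be sketched with the Hoff-Niu form Σ(x) = Ψ + (Γx)(Γx)ᵀ. This form stays positive definite for every covariate vector x whenever Ψ is positive definite. We assume this form for illustration; the paper's exact parameterization may differ. The covariate ("elevation") and the values of Ψ and Γ are hypothetical.

```python
import numpy as np

def kernel_from_covariates(x, Psi, Gamma):
    """Covariance-regression kernel matrix: Sigma(x) = Psi + (Gamma x)(Gamma x)^T."""
    g = Gamma @ x
    return Psi + np.outer(g, g)

Psi = np.array([[0.20, 0.05], [0.05, 0.10]])   # baseline kernel matrix (hypothetical)
# Columns: intercept, elevation (hypothetical covariate). The intercept column is zero,
# so Sigma equals Psi where elevation is 0.
Gamma = np.array([[0.0, 0.4], [0.0, -0.2]])

for elevation in (0.0, 1.0, 2.0):
    Sigma = kernel_from_covariates(np.array([1.0, elevation]), Psi, Gamma)
    vals, vecs = np.linalg.eigh(Sigma)
    # Eigenvalues set the kernel's spread; the leading eigenvector sets its main orientation.
    print(elevation, np.round(vals, 3), np.round(vecs[:, -1], 3))
```

The whole field of kernel matrices is governed by the entries of Ψ and Γ, which number only a few regardless of how many locations there are. This is the parsimony the abstract describes, and the covariate's effect on the kernel's size and orientation can be read directly from Γ.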
-
Bilinear Mixed-Effects Models for Affiliation Networks
Authors:
Yanan Jia,
Catherine A. Calder,
Christopher R. Browning
Abstract:
An affiliation network is a particular type of two-mode social network that consists of a set of 'actors' and a set of 'events', where ties indicate an actor's participation in an event. Although networks describe a variety of consequential social structures, statistical methods for studying affiliation networks are less well developed than methods for studying one-mode, or actor-actor, networks. One way to analyze affiliation networks is to consider one-mode network matrices that are derived from an affiliation network, but this approach may lead to the loss of important structural features of the data. The most comprehensive approach is to study both actors and events simultaneously. In this paper, we extend the bilinear mixed-effects model, a type of latent space model developed for one-mode networks, to the affiliation network setting by considering the dependence patterns in the interactions between actors and events, and we describe a Markov chain Monte Carlo algorithm for Bayesian inference. We use our model to explore patterns in extracurricular activity membership of students in a racially-diverse high school in a Midwestern metropolitan area. Using techniques from spatial point pattern analysis, we show how our model can provide insight into patterns of racial segregation in the voluntary extracurricular activity participation profiles of adolescents.
Submitted 10 June, 2015; v1 submitted 23 June, 2014;
originally announced June 2014.
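A forward simulation makes the structure of a bilinear mixed-effects model for an actor-by-event matrix concrete. The log-odds of participation combine a baseline, an actor effect, an event effect and an inner product of actor and event latent vectors. This follows the general form of Hoff's bilinear mixed-effects model. The paper's exact affiliation-network specification, priors and MCMC algorithm are not reproduced, and all sizes and variances below are hypothetical.

```python
import numpy as np

rng = np.random.default_rng(7)
n_actors, n_events, dim = 60, 12, 2

beta = -1.5                                  # baseline log-odds of participation
a = rng.normal(0.0, 0.5, n_actors)           # actor random effects (overall activity level)
b = rng.normal(0.0, 0.5, n_events)           # event random effects (overall popularity)
U = rng.normal(0.0, 1.0, (n_actors, dim))    # actor latent vectors
V = rng.normal(0.0, 1.0, (n_events, dim))    # event latent vectors

# logit P(actor i participates in event j) = beta + a_i + b_j + u_i . v_j
eta = beta + a[:, None] + b[None, :] + U @ V.T
prob = 1.0 / (1.0 + np.exp(-eta))
Y = (rng.random((n_actors, n_events)) < prob).astype(int)   # actor-by-event affiliation matrix

# One-mode projection: counts of shared events per actor pair; which events are shared is lost.
co_membership = Y @ Y.T
print(Y.sum(), co_membership.max())
```

The last step shows the information loss the abstract warns about. The actor-actor matrix records how many events two students share but not which ones, whereas the two-mode model keeps both actors and events.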
-
Bayesian Spatial Binary Classification
Authors:
Candace Berrett,
Catherine A. Calder
Abstract:
In analyses of spatially-referenced data, researchers often have one of two goals: to quantify relationships between a response variable and covariates while accounting for residual spatial dependence or to predict the value of a response variable at unobserved locations. In this second case, when the response variable is categorical, prediction can be viewed as a classification problem. Many classification methods either ignore response-variable/covariate relationships and rely only on spatially proximate observations for classification, or they ignore spatial dependence and use only the covariates for classification. The Bayesian spatial generalized linear (mixed) model offers a tool to accommodate both spatial and covariate sources of information in classification problems. In this paper, we formally define spatial classification rules based on these models. We also take a close look at two of these models that have been proposed in the literature, namely the probit versions of the spatial generalized linear model (SGLM) and the Bayesian spatial generalized linear mixed model (SGLMM). We describe the implications of the seemingly slight differences between these models for spatial classification and explore the issue of robustness to model misspecification through a simulation study. We also provide an overview of alternatives to the SGLM/SGLMM-based classifiers and illustrate the various methods using satellite-derived land cover data from Southeast Asia.
Submitted 8 January, 2016; v1 submitted 13 June, 2014;
originally announced June 2014.
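A probit-based spatial classification rule that uses both covariate and spatial information can be sketched as follows. It assumes a latent model z(s) = x(s)ᵀβ + w(s) + ε with ε ~ N(0, 1), y = 1 when z > 0, and an exponential covariance for w. Given β and the latent effects w at observed sites, w(s₀) is kriged at a new site. The rule then returns P(y₀ = 1) = Φ((x₀ᵀβ + m₀)/√(1 + v₀)) and assigns the label 1 when that probability exceeds 0.5. Whether this matches the paper's SGLM or SGLMM specification is not claimed. In a full Bayesian analysis, the probability would be averaged over posterior draws rather than computed from the single hypothetical values used here.

```python
import math
import numpy as np

def std_normal_cdf(t):
    return 0.5 * (1.0 + math.erf(t / math.sqrt(2.0)))

def classify(s0, x0, sites, w, beta, sill=1.0, rho=1.0):
    """Return P(y0 = 1) and the label under 0-1 loss (threshold 0.5) at new site s0."""
    d = np.linalg.norm(sites[:, None, :] - sites[None, :, :], axis=-1)
    K = sill * np.exp(-d / rho)                                   # covariance among observed sites
    k0 = sill * np.exp(-np.linalg.norm(sites - s0, axis=1) / rho) # covariance with the new site
    weights = np.linalg.solve(K, k0)
    m0 = float(weights @ w)                       # kriging mean of w(s0)
    v0 = max(sill - float(weights @ k0), 0.0)     # kriging variance of w(s0)
    p1 = std_normal_cdf((float(x0 @ beta) + m0) / math.sqrt(1.0 + v0))
    return p1, int(p1 > 0.5)

sites = np.array([[i, j] for i in range(3) for j in range(3)], dtype=float)
w = np.array([0.8, 0.5, -0.2, 0.6, 0.1, -0.5, 0.2, -0.3, -0.9])   # hypothetical latent values
beta = np.array([0.1, 0.7])                                        # intercept, covariate (hypothetical)
p1, label = classify(np.array([0.5, 0.5]), np.array([1.0, 0.2]), sites, w, beta)
print(round(p1, 3), label)
```

Under 0-1 loss the label depends only on the sign of x₀ᵀβ + m₀, because Φ is increasing and the denominator is positive. The predictive variance v₀ changes the reported probability but not the label.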