-
Model-Based Inference and Experimental Design for Interference Using Partial Network Data
Authors:
Steven Wilkins Reeves,
Shane Lubold,
Arun G. Chandrasekhar,
Tyler H. McCormick
Abstract:
The stable unit treatment value assumption states that the outcome of an individual is not affected by the treatment statuses of others; however, in many real-world applications, treatments can affect many people beyond those immediately treated. Interference can generically be thought of as mediated through some network structure. In many empirically relevant situations, however, complete network data (required to adjust for these spillover effects) are too costly or logistically infeasible to collect. Partially or indirectly observed network data (e.g., subsamples, aggregated relational data (ARD), egocentric sampling, or respondent-driven sampling) reduce the logistical and financial burden of collecting network data, but the statistical properties of treatment effect adjustments from these design strategies are only beginning to be explored. In this paper, we present a framework for the estimation and inference of treatment effect adjustments using partial network data through the lens of structural causal models. We also illustrate procedures to assign treatments using only partial network data, with the goal of either minimizing estimator variance or optimally seeding. We derive single-network asymptotic results applicable to a variety of choices for an underlying graph model. We validate our approach using simulated experiments on observed graphs, with applications to information diffusion in India and Malawi.
Submitted 17 June, 2024;
originally announced June 2024.
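A minimal sketch of one way partial network data can enter a spillover adjustment: if a graph model fitted to partial data (for example, ARD) yields estimated link probabilities P_ij, the unobserved number of treated neighbors of unit i can be replaced by its expectation, the sum over j ≠ i of P_ij T_j. The function name `expected_exposure` and the matrix `p` are hypothetical, and this is an illustrative simplification, not the paper's structural-causal-model estimator.

```python
from typing import List, Sequence


def expected_exposure(p: Sequence[Sequence[float]], treat: Sequence[int]) -> List[float]:
    """Expected number of treated neighbors: E[sum_j A_ij T_j] = sum_{j != i} P_ij T_j.

    p[i][j] is a hypothetical estimated link probability from a graph model fitted
    to partial network data; treat[j] is 1 if unit j is treated and 0 otherwise.
    """
    n = len(treat)
    return [sum(p[i][j] * treat[j] for j in range(n) if j != i) for i in range(n)]


if __name__ == "__main__":
    # Three units; only unit 0 is treated.
    p = [[0.0, 0.5, 0.2], [0.5, 0.0, 0.1], [0.2, 0.1, 0.0]]
    print(expected_exposure(p, [1, 0, 0]))  # [0.0, 0.5, 0.2]
```

The resulting exposures could, for instance, serve as a regressor in an outcome model. This sketch ignores the estimation error in `p`, which a full inference procedure would need to address.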
-
Robustly estimating heterogeneity in factorial data using Rashomon Partitions
Authors:
Aparajithan Venkateswaran,
Anirudh Sankar,
Arun G. Chandrasekhar,
Tyler H. McCormick
Abstract:
Many statistical analyses, in both observational data and randomized controlled trials, ask: how does the outcome of interest vary with combinations of observable covariates? How do various drug combinations affect health outcomes, or how does technology adoption depend on incentives and demographics? Our goal is to partition this factorial space into "pools" of covariate combinations where the outcome differs across the pools (but not within a pool). Existing approaches (i) search for a single "optimal" partition under assumptions about the association between covariates or (ii) sample from the entire set of possible partitions. Both of these approaches ignore the reality that, especially with correlation structure in covariates, many ways to partition the covariate space may be statistically indistinguishable, despite very different implications for policy or science. We develop an alternative perspective, called Rashomon Partition Sets (RPSs). Each item in the RPS partitions the space of covariates using a tree-like geometry. RPSs incorporate all partitions that have posterior values near the maximum a posteriori partition, even if they offer substantively different explanations, and do so using a prior that makes no assumptions about associations between covariates. This prior is the $\ell_0$ prior, which we show is minimax optimal. Given the RPS, we calculate the posterior of any measurable function of the feature effects vector on outcomes, conditional on being in the RPS. We also characterize approximation error relative to the entire posterior and provide bounds on the size of the RPS. Simulations demonstrate that this framework allows for robust conclusions relative to conventional regularization techniques. We apply our method to three empirical settings: price effects on charitable giving, chromosomal structure (telomere length), and the introduction of microfinance.
Submitted 25 June, 2024; v1 submitted 2 April, 2024;
originally announced April 2024.
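The filtering step behind a Rashomon Partition Set can be pictured as keeping every partition whose posterior is close to that of the MAP partition, then averaging quantities of interest over the retained set. The sketch below assumes log-posterior scores have already been computed and uses an assumed threshold rule (keep partitions with posterior at least P(MAP) / (1 + epsilon)); the paper's exact threshold, its $\ell_0$ prior, and its tree-like search are not reproduced. The names `rashomon_set` and `rps_expectation` are hypothetical.

```python
import math
from typing import Callable, Dict, FrozenSet

# Hypothetical representation: a partition is a frozenset of pools, and each pool
# is a frozenset of covariate-profile labels.
Partition = FrozenSet[FrozenSet[str]]


def rashomon_set(log_post: Dict[Partition, float], epsilon: float) -> Dict[Partition, float]:
    """Keep partitions whose posterior is at least P(MAP) / (1 + epsilon).

    The threshold rule is an illustrative assumption. Returned values are the
    posteriors renormalized within the retained set.
    """
    if not log_post:
        return {}
    best = max(log_post.values())        # log posterior of the MAP partition
    cutoff = best - math.log1p(epsilon)  # log of P(MAP) / (1 + epsilon)
    kept = {p: lp for p, lp in log_post.items() if lp >= cutoff}
    # Log-sum-exp normalization over the retained partitions only.
    log_z = best + math.log(sum(math.exp(lp - best) for lp in kept.values()))
    return {p: math.exp(lp - log_z) for p, lp in kept.items()}


def rps_expectation(weights: Dict[Partition, float], f: Callable[[Partition], float]) -> float:
    """Posterior mean of f over partitions, conditional on being in the set."""
    return sum(w * f(p) for p, w in weights.items())


if __name__ == "__main__":
    a = frozenset({frozenset({"A", "B"}), frozenset({"C"})})
    b = frozenset({frozenset({"A"}), frozenset({"B", "C"})})
    c = frozenset({frozenset({"A", "B", "C"})})
    log_post = {a: math.log(0.5), b: math.log(0.45), c: math.log(0.05)}
    rps = rashomon_set(log_post, epsilon=0.2)
    print(len(rps))  # 2: the single-pool partition c falls below the cutoff
    print(rps_expectation(rps, len))  # expected number of pools within the set
```

Partitions `a` and `b` pool the three profiles differently yet carry nearly equal posterior mass, so both are retained, which is the situation the RPS is designed to expose.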
-
Non-robustness of diffusion estimates on networks with measurement error
Authors:
Arun G. Chandrasekhar,
Paul Goldsmith-Pinkham,
Tyler H. McCormick,
Samuel Thau,
Jerry Wei
Abstract:
Network diffusion models are used to study phenomena such as disease transmission, information spread, and technology adoption. However, small amounts of mismeasurement are extremely likely in the networks constructed to operationalize these models. We show that estimates of diffusions are highly non-robust to this measurement error. First, we show that even when measurement error is vanishingly small, such that the share of missed links is close to zero, forecasts about the extent of diffusion will greatly underestimate the truth. Second, a small mismeasurement in the identity of the initial seed generates a large shift in the locations of the expected diffusion path. We show that both of these results still hold when the vanishing measurement error is only local in nature. Such non-robustness in forecasting exists even under conditions where the basic reproductive number is consistently estimable. Possible solutions, such as estimating the measurement error or implementing widespread detection efforts, still face difficulties because the number of missed links is so small. Finally, we conduct Monte Carlo simulations on simulated networks and on real networks from three settings: travel data from the COVID-19 pandemic in the western US, a mobile phone marketing campaign in rural India, and an insurance experiment in China.
Submitted 11 June, 2024; v1 submitted 8 March, 2024;
originally announced March 2024.
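To make the first point concrete in a toy setting, the sketch below simulates an assumed independent-cascade-style process (each newly informed node gets one chance to inform each uninformed neighbor, with probability `q`, for at most `T` rounds) and compares forecasts on a true path graph with those on a copy missing a single link. The process, the function names, and the path graph are illustrative assumptions; the paper's results concern vanishing measurement error in large networks, which a 21-node example cannot reproduce.

```python
import random
from typing import Dict, Set

Graph = Dict[int, Set[int]]


def expected_reach(g: Graph, seed: int, q: float, T: int, sims: int = 2000, rng_seed: int = 0) -> float:
    """Monte Carlo estimate of the number of nodes informed within T rounds."""
    rng = random.Random(rng_seed)
    total = 0
    for _ in range(sims):
        informed = {seed}
        frontier = [seed]  # nodes informed in the previous round
        for _ in range(T):
            nxt = []
            for u in frontier:
                for v in g[u]:
                    # Each newly informed node tries each uninformed neighbor once.
                    if v not in informed and rng.random() < q:
                        informed.add(v)
                        nxt.append(v)
            frontier = nxt
            if not frontier:
                break
        total += len(informed)
    return total / sims


def drop_link(g: Graph, u: int, v: int) -> Graph:
    """Return a copy of g with the undirected link (u, v) missing."""
    h = {k: set(nbrs) for k, nbrs in g.items()}
    h[u].discard(v)
    h[v].discard(u)
    return h


if __name__ == "__main__":
    path = {i: set() for i in range(21)}
    for i in range(20):
        path[i].add(i + 1)
        path[i + 1].add(i)
    print(expected_reach(path, 10, q=1.0, T=20))                     # 21.0
    print(expected_reach(drop_link(path, 10, 11), 10, q=1.0, T=20))  # 11.0
```

With q = 1 and a seed in the middle of the path, one missed link out of 20 roughly halves the forecast reach, a small-scale analogue of the underestimation described in the abstract.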
-
Identifying the latent space geometry of network models through analysis of curvature
Authors:
Shane Lubold,
Arun G. Chandrasekhar,
Tyler H. McCormick
Abstract:
A common approach to modeling networks assigns each node to a position on a low-dimensional manifold where distance is inversely proportional to connection likelihood. More positive manifold curvature encourages more and tighter communities; negative curvature induces repulsion. We consistently estimate manifold type, dimension, and curvature from simply connected, complete Riemannian manifolds of constant curvature. We represent the graph as a noisy distance matrix based on the ties between cliques, then develop hypothesis tests to determine whether the observed distances could plausibly be embedded isometrically in each of the candidate geometries. We apply our approach to datasets from economics and neuroscience.
Submitted 30 December, 2022; v1 submitted 18 December, 2020;
originally announced December 2020.
-
Consistently estimating network statistics using Aggregated Relational Data
Authors:
Emily Breza,
Arun G. Chandrasekhar,
Shane Lubold,
Tyler H. McCormick,
Mengjie Pan
Abstract:
Collecting complete network data is expensive, time-consuming, and often infeasible. Aggregated Relational Data (ARD), which capture information about a social network by asking a respondent questions of the form "How many people with trait X do you know?", provide a low-cost option when collecting complete network data is not possible. Rather than asking about connections between each pair of individuals directly, ARD records how many of the respondent's contacts have a given trait. Despite widespread use and a growing literature on ARD methodology, there is still no systematic understanding of when and why ARD should accurately recover features of the unobserved network. This paper provides such a characterization by deriving conditions under which statistics about the unobserved network (or functions of these statistics, like regression coefficients) can be consistently estimated using ARD. We do this by first providing consistent estimates of network model parameters for three commonly used probabilistic models: the beta-model with node-specific unobserved effects, the stochastic block model with unobserved community structure, and latent geometric space models with unobserved latent locations. A key observation behind these results is that cross-group link probabilities for a collection of (possibly unobserved) groups identify the model parameters, meaning ARD is sufficient for parameter estimation. With these estimated parameters, it is possible to simulate graphs from the fitted distribution and analyze the distribution of network statistics. We can then characterize conditions under which the simulated networks based on ARD allow for consistent estimation of statistics of the unobserved network, such as eigenvector centrality, or of response functions by or of the unobserved network, such as regression coefficients.
Submitted 21 October, 2022; v1 submitted 26 August, 2019;
originally announced August 2019.
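The sketch below shows what an ARD survey records and one simple way cross-group link probabilities could be read off it. It assumes a stochastic block model whose groups coincide with the surveyed traits and whose respondents' own groups are known; the estimator (total reported contacts from group a to group b, divided by the number of possible pairs) is an illustrative method-of-moments choice, not necessarily the paper's estimator. All names are hypothetical.

```python
from collections import defaultdict
from typing import Dict, List, Set, Tuple


def ard_responses(adj: Dict[int, Set[int]], trait: Dict[int, str],
                  respondents: List[int]) -> Dict[int, Dict[str, int]]:
    """Answer "How many people with trait X do you know?" for each respondent."""
    out = {}
    for i in respondents:
        counts: Dict[str, int] = defaultdict(int)
        for j in adj[i]:
            counts[trait[j]] += 1
        out[i] = dict(counts)
    return out


def cross_group_probs(ard: Dict[int, Dict[str, int]], trait: Dict[int, str],
                      group_sizes: Dict[str, int]) -> Dict[Tuple[str, str], float]:
    """Moment estimate of block link probabilities p[(a, b)] from ARD (illustrative)."""
    sums: Dict[Tuple[str, str], float] = defaultdict(float)
    n_resp: Dict[str, int] = defaultdict(int)
    for i, counts in ard.items():
        a = trait[i]  # assumption: the respondent's own group is observed
        n_resp[a] += 1
        for b in group_sizes:
            sums[(a, b)] += counts.get(b, 0)
    est = {}
    for (a, b), s in sums.items():
        possible = group_sizes[b] - (1 if a == b else 0)  # respondents cannot know themselves
        est[(a, b)] = s / (n_resp[a] * possible)
    return est


if __name__ == "__main__":
    trait = {i: ("x" if i < 4 else "y") for i in range(8)}
    edges = [(0, 1), (0, 2), (0, 3), (1, 2), (1, 3), (2, 3), (0, 4), (1, 5)]
    adj = {i: set() for i in range(8)}
    for u, v in edges:
        adj[u].add(v)
        adj[v].add(u)
    ard = ard_responses(adj, trait, list(range(8)))
    print(ard[0])  # {'x': 3, 'y': 1}
    print(cross_group_probs(ard, trait, {"x": 4, "y": 4}))
```

Here every node is surveyed, so the estimates equal the observed block densities (1.0 within group x, 0.125 between x and y). With a random subset of respondents, the same ratio is unbiased for the block probabilities under the assumed model.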
-
When Celebrities Speak: A Nationwide Twitter Experiment Promoting Vaccination in Indonesia
Authors:
Vivi Alatas,
Arun G. Chandrasekhar,
Markus Mobius,
Benjamin A. Olken,
Cindy Paladines
Abstract:
Celebrity endorsements are often sought to influence public opinion. We ask whether celebrity endorsement per se has an effect beyond the fact that their statements are seen by many, and whether, on net, their statements actually lead people to change their beliefs. To do so, we conducted a nationwide Twitter experiment in Indonesia with 46 high-profile celebrities and organizations, with a total of 7.8 million followers, who agreed to let us randomly tweet or retweet content promoting immunization from their accounts. Our design exploits the structure of what information is passed along a retweet chain on Twitter to separate reach effects from endorsement effects. Endorsements matter: tweets that users can identify as being originated by a celebrity are far more likely to be liked or retweeted by users than similar tweets seen by the same users but without the celebrities' imprimatur. By contrast, explicitly citing sources in the tweets actually reduces diffusion. By randomizing which celebrities tweeted when, we find suggestive evidence that overall exposure to the campaign may influence beliefs about vaccination and knowledge of immunization-seeking behavior by one's network. Taken together, the findings suggest an important role for celebrity endorsement.
Submitted 14 February, 2019;
originally announced February 2019.
-
A Network Formation Model Based on Subgraphs
Authors:
Arun G. Chandrasekhar,
Matthew O. Jackson
Abstract:
We develop a new class of random graph models for the statistical estimation of network formation -- subgraph generated models (SUGMs). Various subgraphs -- e.g., links, triangles, cliques, stars -- are generated and their union results in a network. We show that SUGMs are identified and establish the consistency and asymptotic distribution of parameter estimates in empirically relevant cases. We show that a simple four-parameter SUGM matches basic patterns in empirical networks more closely than four standard models (with many more dimensions): (i) stochastic block models; (ii) models with node-level unobserved heterogeneity; (iii) latent space models; (iv) exponential random graphs. We illustrate the framework's value via several applications using networks from rural India. We study whether network structure helps enforce risk-sharing and whether cross-caste interactions are more likely to be private. We also develop a new central limit theorem for correlated random variables, which is required to prove our results and is of independent interest.
Submitted 9 November, 2023; v1 submitted 23 November, 2016;
originally announced November 2016.
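A minimal sketch of the generative idea, assuming only two subgraph types: each pair of nodes is linked independently with probability `p_link`, each triple independently forms a triangle with probability `p_tri`, and the observed network is the union. The function name and parameterization are illustrative; the paper's SUGMs also cover other subgraphs such as cliques and stars, and estimation is not shown here.

```python
import itertools
import random
from typing import Set, Tuple

Edge = Tuple[int, int]


def sample_sugm(n: int, p_link: float, p_tri: float, seed: int = 0) -> Set[Edge]:
    """Sample a network as the union of independently generated links and triangles."""
    rng = random.Random(seed)
    edges: Set[Edge] = set()
    # Subgraph type 1: single links, one independent draw per pair (i < j).
    for i, j in itertools.combinations(range(n), 2):
        if rng.random() < p_link:
            edges.add((i, j))
    # Subgraph type 2: triangles, one independent draw per triple (i < j < k).
    for i, j, k in itertools.combinations(range(n), 3):
        if rng.random() < p_tri:
            edges.update({(i, j), (i, k), (j, k)})
    return edges


if __name__ == "__main__":
    g = sample_sugm(30, p_link=0.02, p_tri=0.005, seed=1)
    print(len(g), "links in the union")
```

Because the union hides which subgraph generated each link, overlapping triangles and links can look alike in the observed network; in the special case `p_link = 0`, every observed link lies inside some triangle, which is easy to check with the sketch.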
-
Using Gossips to Spread Information: Theory and Evidence from a Randomized Controlled Trial
Authors:
Abhijit Banerjee,
Arun G. Chandrasekhar,
Esther Duflo,
Matthew O. Jackson
Abstract:
Is it possible to identify individuals who are highly central in a community without gathering any network information, simply by asking a few people? If we use people's nominees as seeds for a diffusion process, will it be successful? We explore these questions theoretically, via surveys, and via field experiments. We show via a model of information flow how members of a community can, just by tracking gossip about others, identify highly central individuals in their network. Asking residents of rural Indian villages to name good seeds for diffusion, we find that they accurately nominate those who are central according to a measure tailored for diffusion, not just those with many friends or in powerful positions. Finally, we run a randomized field experiment in 213 other villages that tests how effective it is to use such nominations as seeds for a diffusion process. Relative to random seeds or those with high social status, hitting at least one seed nominated by villagers leads to more than a 65% increase in the spread of information.
Submitted 8 May, 2017; v1 submitted 9 June, 2014;
originally announced June 2014.
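The abstract does not name the diffusion-tailored centrality measure. This sketch assumes it has the diffusion-centrality form DC = sum_{t=1}^{T} (qA)^t 1, where A is the 0/1 adjacency matrix, q a per-period passing probability, and T the number of periods; treat the formula, the function name, and the parameter values as assumptions to check against the paper.

```python
from typing import List


def diffusion_centrality(adj: List[List[int]], q: float, T: int) -> List[float]:
    """Compute sum_{t=1}^{T} (q A)^t 1 for a 0/1 adjacency matrix A (list of lists)."""
    n = len(adj)
    vec = [1.0] * n    # (qA)^0 1
    total = [0.0] * n  # running sum over t = 1..T
    for _ in range(T):
        # One more step: vec <- (qA) vec, so vec now holds (qA)^t 1.
        vec = [q * sum(adj[i][j] * vec[j] for j in range(n)) for i in range(n)]
        total = [a + b for a, b in zip(total, vec)]
    return total


if __name__ == "__main__":
    # Star network: node 0 is the center, nodes 1-3 are leaves.
    star = [[0, 1, 1, 1], [1, 0, 0, 0], [1, 0, 0, 0], [1, 0, 0, 0]]
    print(diffusion_centrality(star, q=0.5, T=2))  # [2.25, 1.25, 1.25, 1.25]
```

On the four-node star with q = 0.5 and T = 2, the center scores 2.25 and each leaf 1.25, so the center ranks first, as one would expect of a good seed.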
-
Tractable and Consistent Random Graph Models
Authors:
Arun G. Chandrasekhar,
Matthew O. Jackson
Abstract:
We define a general class of network formation models, Statistical Exponential Random Graph Models (SERGMs), which nest standard exponential random graph models (ERGMs) as a special case. We provide the first general results on when the parameters of these models (including ERGMs), estimated from the observation of a single network, are consistent (i.e., become accurate as the number of nodes grows). Next, addressing the problem that standard techniques for estimating ERGMs have been shown to have exponentially slow mixing times for many specifications, we show that by reformulating network formation as a distribution over the space of sufficient statistics instead of the space of networks, the size of the space of estimation can be greatly reduced, making estimation practical and easy. We also develop a related, but distinct, class of models that we call subgraph generation models (SUGMs) that are useful for modeling sparse networks and whose parameters are directly and easily estimable, with estimates that are consistent and asymptotically normally distributed. Finally, we show how choice-based (strategic) network formation models can be written as SERGMs and SUGMs, and apply our models and techniques to network data from rural Indian villages.
Submitted 25 June, 2014; v1 submitted 27 October, 2012;
originally announced October 2012.
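The gain from working over sufficient statistics can be seen in the simplest case, an edges-only ERGM with P(g) ∝ exp(θ m(g)), where m(g) is the number of links. Since C(N, m) graphs share each edge count m, the model induces P(m) ∝ C(N, m) exp(θ m) over just N + 1 values instead of 2^N graphs. This special case and the function below are illustrative; the general SERGM class in the paper is broader.

```python
import math
from typing import List


def edge_count_distribution(n: int, theta: float) -> List[float]:
    """Distribution of the edge count m under an edges-only ERGM on n nodes.

    Works on the statistic m (N + 1 values) rather than on all 2**N graphs:
    P(m) is proportional to C(N, m) * exp(theta * m).
    """
    N = n * (n - 1) // 2  # number of possible undirected links
    log_w = [math.log(math.comb(N, m)) + theta * m for m in range(N + 1)]
    top = max(log_w)  # subtract the maximum before exponentiating, for stability
    w = [math.exp(x - top) for x in log_w]
    z = sum(w)
    return [x / z for x in w]


if __name__ == "__main__":
    dist = edge_count_distribution(4, theta=0.3)  # 7 values instead of 2**6 = 64 graphs
    print(len(dist))  # 7
```

For this special case the result coincides with a Binomial(N, e^θ / (1 + e^θ)) distribution, which gives an easy correctness check.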