-
Going beyond accuracy: estimating homophily in social networks using predictions
Authors:
George Berry,
Antonio Sirianni,
Ingmar Weber,
Jisun An,
Michael Macy
Abstract:
In online social networks, it is common to use predictions of node categories to estimate measures of homophily and other relational properties. However, online social network data often lacks basic demographic information about the nodes. Researchers must rely on predicted node attributes to estimate measures of homophily, but little is known about the validity of these measures. We show that estimating homophily in a network can be viewed as a dyadic prediction problem, and that homophily estimates are unbiased when dyad-level residuals sum to zero in the network. Node-level prediction models, such as the use of names to classify ethnicity or gender, do not generally have this property and can introduce large biases into homophily estimates. Bias occurs due to error autocorrelation along dyads. Importantly, node-level classification performance is not a reliable indicator of estimation accuracy for homophily. We compare estimation strategies that make predictions at the node and dyad levels, evaluating performance in different settings. We propose a novel "ego-alter" modeling approach that outperforms standard node and dyad classification strategies. While this paper focuses on homophily, results generalize to other relational measures which aggregate predictions along the dyads in a network. We conclude with suggestions for research designs to study homophily in online networks. Code for this paper is available at https://github.com/georgeberry/autocorr.
Submitted 29 January, 2020;
originally announced January 2020.
-
Towards Coq-verified Esterel Semantics and Compiling
Authors:
Gérard Berry,
Lionel Rieg
Abstract:
This paper focuses on formally specifying and verifying the chain of formal semantics of the Esterel synchronous programming language using the Coq proof assistant. In particular, in addition to the standard logical (LBS) semantics, constructive semantics (CBS) and constructive state semantics (CSS), we introduce a novel microstep semantics that gets rid of the Must/Can potential function pair of the constructive semantics and can be viewed as an abstract version of Esterel's circuit semantics used by compilers to generate software code and hardware designs. The paper also provides formal proofs in Coq of the equivalence between the CBS and CSS semantics and of the refinement of the CSS by the microstep semantics.
Submitted 23 September, 2022; v1 submitted 27 September, 2019;
originally announced September 2019.
-
Role action embeddings: scalable representation of network positions
Authors:
George Berry
Abstract:
We consider the question of embedding nodes with similar local neighborhoods together in embedding space, commonly referred to as "role embeddings." We propose RAE, an unsupervised framework that learns role embeddings. It combines a within-node loss function and a graph neural network (GNN) architecture to place nodes with similar local neighborhoods close in embedding space. We also propose a faster way of generating negative examples called neighbor shuffling, which quickly creates negative examples directly within batches. These techniques can be easily combined with existing GNN methods to create unsupervised role embeddings at scale. We then explore role action embeddings, which summarize the non-structural features in a node's neighborhood, leading to better performance on node classification tasks. We find that the model architecture proposed here provides strong performance on both graph and node classification tasks, in some cases competitive with semi-supervised methods.
Submitted 2 December, 2018; v1 submitted 19 November, 2018;
originally announced November 2018.
-
Estimating group properties in online social networks with a classifier
Authors:
George Berry,
Antonio Sirianni,
Nathan High,
Agrippa Kellum,
Ingmar Weber,
Michael Macy
Abstract:
We consider the problem of obtaining unbiased estimates of group properties in social networks when using a classifier for node labels. Inference for this problem is complicated by two factors: the network is not known and must be crawled, and even high-performance classifiers provide biased estimates of group proportions. We propose and evaluate AdjustedWalk for addressing this problem. This is a three-step procedure that entails: 1) walking the graph starting from an arbitrary node; 2) learning a classifier on the nodes in the walk; and 3) applying a post-hoc adjustment to classification labels. The walk step provides the information necessary to make inferences over the nodes and edges, while the adjustment step corrects for classifier bias in estimating group proportions. This process provides de-biased estimates at the cost of additional variance. We evaluate AdjustedWalk on four tasks: the proportion of nodes belonging to a minority group, the proportion of the minority group among high-degree nodes, the proportion of within-group edges, and Coleman's homophily index. Simulated and empirical graphs show that this procedure performs well compared to optimal baselines in a variety of circumstances, while indicating that variance increases can be large for low-recall classifiers.
Submitted 24 July, 2018;
originally announced July 2018.
-
Discussion quality diffuses in the digital public square
Authors:
George Berry,
Sean J. Taylor
Abstract:
Studies of online social influence have demonstrated that friends have important effects on many types of behavior in a wide variety of settings. However, we know much less about how influence works among relative strangers in digital public squares, despite important conversations happening in such spaces. We present the results of a study on large public Facebook pages where we randomly used two different methods--most recent and social feedback--to order comments on posts. We find that the social feedback condition results in higher quality viewed comments and response comments. After measuring the average quality of comments written by users before the study, we find that social feedback has a positive effect on response quality for both low and high quality commenters. We draw on a theoretical framework of social norms to explain this empirical result. In order to examine the influence mechanism further, we measure the similarity between comments viewed and written during the study, finding that similarity increases for the highest quality contributors under the social feedback condition. This suggests that, in addition to norms, some individuals may respond with increased relevance to high-quality comments.
Submitted 21 February, 2017;
originally announced February 2017.
-
The Opacity Problem in Social Contagion
Authors:
George Berry,
Christopher J. Cameron,
Patrick Park,
Michael W. Macy
Abstract:
Fads, product adoption, mobs, rumors, memes, and emergent norms are diverse social contagions that have been modeled as network cascades. Empirical study of these cascades is vulnerable to what we describe as the "opacity problem": the inability to observe the critical level of peer influence required to trigger an individual's behavioral change. Even with maximal information, network cascades reveal intervals that bound critical levels of peer exposure, rather than critical values themselves. Existing practice uses interval maxima, which systematically over-estimates the social influence required for behavioral change. Simulations reveal that the over-estimation is likely common and large in magnitude. This is confirmed by an empirical study of hashtag cascades among 3.2 million Twitter users: one in five hashtag adoptions suffers critical value uncertainty due to the opacity problem. Different assumptions about these intervals lead to qualitatively different conclusions about the role of peer reinforcement in diffusion. We introduce a solution that combines identifying tightly bounded intervals with predicting uncertain critical values using node-level information.
Submitted 19 November, 2018; v1 submitted 8 February, 2017;
originally announced February 2017.
-
Hop and HipHop: Multitier Web Orchestration
Authors:
Gérard Berry,
Manuel Serrano
Abstract:
Rich applications merge classical computing, client-server concurrency, web-based interfaces, and the complex time- and event-based reactive programming found in embedded systems. To handle them, we extend the Hop web programming platform by HipHop, a domain-specific language dedicated to event-based process orchestration. Borrowing the synchronous reactive model of Esterel, HipHop is based on synchronous concurrency and preemption primitives that are known to be key components for the modular design of complex reactive behaviors. HipHop departs from Esterel by its ability to handle the dynamicity of Web applications, thanks to the reflexivity of Hop. Using a music player example, we show how to modularly build a non-trivial Hop application using HipHop orchestration code.
Submitted 30 November, 2013;
originally announced December 2013.