Search | arXiv e-print repository

doi 10.1038/s41598-022-21720-4

The language of opinion change on social media under the lens of communicative action

Authors: Corrado Monti, Luca Maria Aiello, Gianmarco De Francisci Morales, Francesco Bonchi

Abstract: Which messages are more effective at inducing a change of opinion in the listener? We approach this question within the frame of Habermas' theory of communicative action, which posits that the illocutionary intent of the message (its pragmatic meaning) is the key. Thanks to recent advances in natural language processing, we are able to operationalize this theory by extracting the latent social dimensions of a message, namely archetypes of social intent of language, that come from social exchange theory. We identify key ingredients to opinion change by looking at more than 46k posts and more than 3.5M comments on Reddit's r/ChangeMyView, a debate forum where people try to change each other's opinion and explicitly mark opinion-changing comments with a special flag called "delta". Comments that express no intent are about 77% less likely to change the mind of the recipient, compared to comments that convey at least one social dimension. Among the various social dimensions, the ones that are most likely to produce an opinion change are knowledge, similarity, and trust, which resonates with Habermas' theory of communicative action. We also find other new important dimensions, such as appeals to power or empathetic expressions of support. Finally, in line with theories of constructive conflict, yet contrary to the popular characterization of conflict as the bane of modern social media, our findings show that voicing conflict in the context of a structured public debate can promote integration, especially when it is used to counter another conflictive stance. By leveraging recent advances in natural language processing, our work provides an empirical framework for Habermas' theory, finds concrete examples of its effects in the wild, and suggests its possible extension with a more faceted understanding of intent interpreted as social dimensions of language.

Submitted 31 October, 2022; originally announced October 2022.

Comments: Main paper: 13 pages, 1 figure, 3 tables. Supplementary material: 9 pages, 6 figures, 8 tables

ACM Class: H.4.0; K.4.0

Journal ref: Nature Scientific Reports 12, 17920 (2022)

arXiv:2208.04620 [pdf, other]

Cascade-based Echo Chamber Detection

Authors: Marco Minici, Federico Cinus, Corrado Monti, Francesco Bonchi, Giuseppe Manco

Abstract: Despite echo chambers in social media have been under considerable scrutiny, general models for their detection and analysis are missing. In this work, we aim to fill this gap by proposing a probabilistic generative model that explains social media footprints -- i.e., social network structure and propagations of information -- through a set of latent communities, characterized by a degree of echo-chamber behavior and by an opinion polarity. Specifically, echo chambers are modeled as communities that are permeable to pieces of information with similar ideological polarity, and impermeable to information of opposed leaning: this allows discriminating echo chambers from communities that lack a clear ideological alignment. To learn the model parameters we propose a scalable, stochastic adaptation of the Generalized Expectation Maximization algorithm, that optimizes the joint likelihood of observing social connections and information propagation. Experiments on synthetic data show that our algorithm is able to correctly reconstruct ground-truth latent communities with their degree of echo-chamber behavior and opinion polarity. Experiments on real-world data about polarized social and political debates, such as the Brexit referendum or the COVID-19 vaccine campaign, confirm the effectiveness of our proposal in detecting echo chambers. Finally, we show how our model can improve accuracy in auxiliary predictive tasks, such as stance detection and prediction of future propagations.

Submitted 9 August, 2022; originally announced August 2022.

Comments: Accepted for publication at ACM CIKM 2022

arXiv:2205.05052 [pdf, other]

On learning agent-based models from data

Authors: Corrado Monti, Marco Pangallo, Gianmarco De Francisci Morales, Francesco Bonchi

Abstract: Agent-Based Models (ABMs) are used in several fields to study the evolution of complex systems from micro-level assumptions. However, ABMs typically can not estimate agent-specific (or "micro") variables: this is a major limitation which prevents ABMs from harnessing micro-level data availability and which greatly limits their predictive power. In this paper, we propose a protocol to learn the latent micro-variables of an ABM from data. The first step of our protocol is to reduce an ABM to a probabilistic model, characterized by a computationally tractable likelihood. This reduction follows two general design principles: balance of stochasticity and data availability, and replacement of unobservable discrete choices with differentiable approximations. Then, our protocol proceeds by maximizing the likelihood of the latent variables via a gradient-based expectation maximization algorithm. We demonstrate our protocol by applying it to an ABM of the housing market, in which agents with different incomes bid higher prices to live in high-income neighborhoods. We demonstrate that the obtained model allows accurate estimates of the latent variables, while preserving the general behavior of the ABM. We also show that our estimates can be used for out-of-sample forecasting. Our protocol can be seen as an alternative to black-box data assimilation methods, that forces the modeler to lay bare the assumptions of the model, to think about the inferential process, and to spot potential identification problems.

Submitted 23 November, 2022; v1 submitted 10 May, 2022; originally announced May 2022.

arXiv:2112.00626 [pdf, other]

The Effect of People Recommenders on Echo Chambers and Polarization

Authors: Federico Cinus, Marco Minici, Corrado Monti, Francesco Bonchi

Abstract: The effects of social media on critical issues, such as polarization and misinformation, are under scrutiny due to the disruptive consequences that these phenomena can have on our societies. Among the algorithms routinely used by social media platforms, people-recommender systems are of special interest, as they directly contribute to the evolution of the social network structure, affecting the information and the opinions users are exposed to. In this paper, we propose a framework to assess the effect of people recommenders on the evolution of opinions. Our proposal is based on Monte Carlo simulations combining link recommendation and opinion-dynamics models. In order to control initial conditions, we define a random network model to generate graphs with opinions, with tunable amounts of modularity and homophily. We join these elements into a methodology to study the effects of the recommender system on echo chambers and polarization. We also show how to use our framework to measure, by means of simulations, the impact of different intervention strategies. Our thorough experimentation shows that people recommenders can in fact lead to a significant increase in echo chambers. However, this happens only if there is considerable initial homophily in the network. Also, we find that if the network already contains echo chambers, the effect of the recommendation algorithm is negligible. Such findings are robust to two very different opinion dynamics models, a bounded confidence model and an epistemological model.

Submitted 1 December, 2021; originally announced December 2021.

Comments: To appear in: Proceedings of the International AAAI Conference on Web and Social Media, vol. 16 (ICWSM '22)

ACM Class: I.6; J.4

arXiv:2003.09377 [pdf, other]

doi 10.1038/s41598-020-69464-3

Relevance of temporal cores for epidemic spread in temporal networks

Authors: Martino Ciaperoni, Edoardo Galimberti, Francesco Bonchi, Ciro Cattuto, Francesco Gullo, Alain Barrat

Abstract: Temporal networks are widely used to represent a vast diversity of systems, including in particular social interactions, and the spreading processes unfolding on top of them. The identification of structures playing important roles in such processes remains largely an open question, despite recent progresses in the case of static networks. Here, we consider as candidate structures the recently introduced concept of span-cores: the span-cores decompose a temporal network into subgraphs of controlled duration and increasing connectivity, generalizing the core-decomposition of static graphs. To assess the relevance of such structures, we explore the effectiveness of strategies aimed either at containing or maximizing the impact of a spread, based respectively on removing span-cores of high cohesiveness or duration to decrease the epidemic risk, or on seeding the process from such structures. The effectiveness of such strategies is assessed in a variety of empirical data sets and compared to baselines that use only static information on the centrality of nodes and static concepts of coreness, as well as to a baseline based on a temporal centrality measure. Our results show that the most stable and cohesive temporal cores play indeed an important role in epidemic processes on temporal networks, and that their nodes are likely to represent influential spreaders.

Submitted 9 July, 2020; v1 submitted 20 March, 2020; originally announced March 2020.

Journal ref: Sci Rep 10, 12529 (2020)

arXiv:1912.00238 [pdf, other]

Visualizing structural balance in signed networks

Authors: Edoardo Galimberti, Chiara Madeddu, Francesco Bonchi, Giancarlo Ruffo

Abstract: Network visualization has established as a key complement to network analysis since the large variety of existing network layouts are able to graphically highlight different properties of networks. However, signed networks, i.e., networks whose edges are labeled as friendly (positive) or antagonistic (negative), are target of few of such layouts and none, to our knowledge, is able to show structural balance, i.e., the tendency of cycles towards including an even number of negative edges, which is a well-known theory for studying friction and polarization. In this work we present Structural-balance-viz: a novel visualization method showing whether a connected signed network is balanced or not and, in the latter case, how close the network is to be balanced. Structural-balance-viz exploits spectral computations of the signed Laplacian matrix to place network's nodes in a Cartesian coordinate system resembling a balance (a scale). Moreover, it uses edge coloring and bundling to distinguish positive and negative interactions. The proposed visualization method has characteristics desirable in a variety of network analysis tasks: Structural-balance-viz is able to provide indications of balance/polarization of the whole network and of each node, to identify two factions of nodes on the basis of their polarization, and to show their cumulative characteristics. Moreover, the layout is reproducible and easy to compare. Structural-balance-viz is validated over synthetic-generated networks and applied to a real-world dataset about political debates confirming that it is able to provide meaningful interpretations.

Submitted 30 November, 2019; originally announced December 2019.
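
The notion of structural balance in the entry above can be made concrete with a minimal Python sketch. This is not Structural-balance-viz and it does not use the signed Laplacian; it swaps in a plain two-faction coloring check, which by Harary's classical characterization succeeds exactly when no cycle contains an odd number of negative edges. The function name `two_factions` and the `(u, v, sign)` edge format are assumptions made for illustration.

```python
from collections import deque


def two_factions(edges):
    """Try to split nodes into two factions consistent with the edge signs.

    edges: iterable of (u, v, sign), sign +1 (friendly) or -1 (antagonistic).
    Returns a dict node -> 0 or 1 if the network is balanced, otherwise None.
    (Hypothetical helper; a combinatorial check, not the paper's spectral method.)
    """
    neighbors = {}
    for u, v, sign in edges:
        neighbors.setdefault(u, []).append((v, sign))
        neighbors.setdefault(v, []).append((u, sign))

    faction = {}
    for root in neighbors:  # visit every connected component
        if root in faction:
            continue
        faction[root] = 0
        queue = deque([root])
        while queue:
            node = queue.popleft()
            for other, sign in neighbors[node]:
                # Friendly edges keep the faction, antagonistic edges flip it.
                expected = faction[node] if sign > 0 else 1 - faction[node]
                if other not in faction:
                    faction[other] = expected
                    queue.append(other)
                elif faction[other] != expected:
                    return None  # some cycle has an odd number of negative edges
    return faction
```

A triangle with two antagonistic edges splits cleanly into two factions, while a triangle with three antagonistic edges returns `None`.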

arXiv:1910.03645 [pdf, other]

doi 10.1145/3418226

Span-core Decomposition for Temporal Networks: Algorithms and Applications

Authors: Edoardo Galimberti, Martino Ciaperoni, Alain Barrat, Francesco Bonchi, Ciro Cattuto, Francesco Gullo

Abstract: When analyzing temporal networks, a fundamental task is the identification of dense structures (i.e., groups of vertices that exhibit a large number of links), together with their temporal span (i.e., the period of time for which the high density holds). In this paper we tackle this task by introducing a notion of temporal core decomposition where each core is associated with two quantities, its coreness, which quantifies how densely it is connected, and its span, which is a temporal interval: we call such cores span-cores. For a temporal network defined on a discrete temporal domain $T$, the total number of time intervals included in $T$ is quadratic in $|T|$, so that the total number of span-cores is potentially quadratic in $|T|$ as well. Our first main contribution is an algorithm that, by exploiting containment properties among span-cores, computes all the span-cores efficiently. Then, we focus on the problem of finding only the maximal span-cores, i.e., span-cores that are not dominated by any other span-core by both their coreness property and their span. We devise a very efficient algorithm that exploits theoretical findings on the maximality condition to directly extract the maximal ones without computing all span-cores. Finally, as a third contribution, we introduce the problem of temporal community search, where a set of query vertices is given as input, and the goal is to find a set of densely-connected subgraphs containing the query vertices and covering the whole underlying temporal domain $T$. We derive a connection between this problem and the problem of finding (maximal) span-cores. Based on this connection, we show how temporal community search can be solved in polynomial-time via dynamic programming, and how the maximal span-cores can be profitably exploited to significantly speed-up the basic algorithm.

Submitted 31 July, 2020; v1 submitted 6 October, 2019; originally announced October 2019.

Comments: ACM Transactions on Knowledge Discovery from Data (TKDD), 2020. arXiv admin note: substantial text overlap with arXiv:1808.09376

Journal ref: ACM Transactions on Knowledge Discovery from Data 15 (1):2 (2020)

arXiv:1808.09376 [pdf, other]

doi 10.1145/3269206.3271767

Mining (maximal) span-cores from temporal networks

Authors: Edoardo Galimberti, Alain Barrat, Francesco Bonchi, Ciro Cattuto, Francesco Gullo

Abstract: When analyzing temporal networks, a fundamental task is the identification of dense structures (i.e., groups of vertices that exhibit a large number of links), together with their temporal span (i.e., the period of time for which the high density holds). We tackle this task by introducing a notion of temporal core decomposition where each core is associated with its span: we call such cores span-cores. As the total number of time intervals is quadratic in the size of the temporal domain $T$ under analysis, the total number of span-cores is quadratic in $|T|$ as well. Our first contribution is an algorithm that, by exploiting containment properties among span-cores, computes all the span-cores efficiently. Then, we focus on the problem of finding only the maximal span-cores, i.e., span-cores that are not dominated by any other span-core by both the coreness property and the span. We devise a very efficient algorithm that exploits theoretical findings on the maximality condition to directly compute the maximal ones without computing all span-cores. Experimentation on several real-world temporal networks confirms the efficiency and scalability of our methods. Applications on temporal networks, gathered by a proximity-sensing infrastructure recording face-to-face interactions in schools, highlight the relevance of the notion of (maximal) span-core in analyzing social dynamics and detecting/correcting anomalies in the data.

Submitted 28 August, 2018; originally announced August 2018.

Journal ref: CIKM 2018, October 22-26, 2018, Torino, Italy
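
As a rough illustration of why the number of span-cores in the entry above grows quadratically with $|T|$, the following Python sketch enumerates them naively, one interval at a time. It assumes, beyond what the abstract states, that the core of an interval is the ordinary k-core of the graph formed by the edges present in every snapshot of that interval. It is a brute-force baseline, not the efficient algorithm the abstract describes, and the names `k_core` and `span_cores` are hypothetical.

```python
def k_core(edges, k):
    """Vertex set of the k-core of an undirected graph, by repeated peeling."""
    neighbors = {}
    for u, v in edges:
        neighbors.setdefault(u, set()).add(v)
        neighbors.setdefault(v, set()).add(u)
    changed = True
    while changed:
        changed = False
        low = [n for n, nb in neighbors.items() if len(nb) < k]
        for node in low:
            # Remove the node and detach it from its surviving neighbors.
            for other in neighbors.pop(node):
                if other in neighbors:
                    neighbors[other].discard(node)
            changed = True
    return set(neighbors)


def span_cores(snapshots, k):
    """Map each interval (start, end) to its non-empty k-core.

    snapshots: list of sets of edges, one set per timestamp; each edge is a
    tuple (u, v) written with the same endpoint order in every snapshot.
    Assumption: an edge belongs to an interval only if it appears in all of
    the interval's snapshots.
    """
    result = {}
    for start in range(len(snapshots)):
        common = set(snapshots[start])
        for end in range(start, len(snapshots)):
            common &= snapshots[end]  # edges present throughout [start, end]
            core = k_core(common, k)
            if not core:
                break  # longer intervals only have fewer edges
            result[(start, end)] = core
    return result
```

The early `break` is a small instance of the containment idea: extending an interval can only remove edges, so once a core is empty it stays empty for every longer interval with the same start.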

arXiv:1808.02129 [pdf, other]

Probabilistic Causal Analysis of Social Influence

Authors: Francesco Bonchi, Francesco Gullo, Bud Mishra, Daniele Ramazzotti

Abstract: Mastering the dynamics of social influence requires separating, in a database of information propagation traces, the genuine causal processes from temporal correlation, i.e., homophily and other spurious causes. However, most studies to characterize social influence, and, in general, most data-science analyses focus on correlations, statistical independence, or conditional independence. Only recently, there has been a resurgence of interest in "causal data science", e.g., grounded on causality theories. In this paper we adopt a principled causal approach to the analysis of social influence from information-propagation data, rooted in the theory of probabilistic causation. Our approach consists of two phases. In the first one, in order to avoid the pitfalls of misinterpreting causation when the data spans a mixture of several subtypes ("Simpson's paradox"), we partition the set of propagation traces into groups, in such a way that each group is as less contradictory as possible in terms of the hierarchical structure of information propagation. To achieve this goal, we borrow the notion of "agony" and define the Agony-bounded Partitioning problem, which we prove being hard, and for which we develop two efficient algorithms with approximation guarantees. In the second phase, for each group from the first phase, we apply a constrained MLE approach to ultimately learn a minimal causal topology. Experiments on synthetic data show that our method is able to retrieve the genuine causal arcs w.r.t. a ground-truth generative model. Experiments on real data show that, by focusing only on the extracted causal structures instead of the whole social graph, the effectiveness of predicting influence spread is significantly improved.

Submitted 29 August, 2018; v1 submitted 6 August, 2018; originally announced August 2018.

Journal ref: CIKM 18, October 22-26, 2018, Torino, Italy

arXiv:1508.05044 [pdf, other]

Cultures in Community Question Answering

Authors: Imrul Kayes, Nicolas Kourtellis, Daniele Quercia, Adriana Iamnitchi, Francesco Bonchi

Abstract: CQA services are collaborative platforms where users ask and answer questions. We investigate the influence of national culture on people's online questioning and answering behavior. For this, we analyzed a sample of 200 thousand users in Yahoo Answers from 67 countries. We measure empirically a set of cultural metrics defined in Geert Hofstede's cultural dimensions and Robert Levine's Pace of Life and show that behavioral cultural differences exist in community question answering platforms. We find that national cultures differ in Yahoo Answers along a number of dimensions such as temporal predictability of activities, contribution-related behavioral patterns, privacy concerns, and power inequality.

Submitted 20 August, 2015; originally announced August 2015.

Comments: Published in the proceedings of the 26th ACM Conference on Hypertext and Social Media (HT'15)

arXiv:1507.04314 [pdf, other]

doi 10.1145/2736277.2741674

The Social World of Content Abusers in Community Question Answering

Authors: Imrul Kayes, Nicolas Kourtellis, Daniele Quercia, Adriana Iamnitchi, Francesco Bonchi

Abstract: Community-based question answering platforms can be rich sources of information on a variety of specialized topics, from finance to cooking. The usefulness of such platforms depends heavily on user contributions (questions and answers), but also on respecting the community rules. As a crowd-sourced service, such platforms rely on their users for monitoring and flagging content that violates community rules. Common wisdom is to eliminate the users who receive many flags. Our analysis of a year of traces from a mature Q&A site shows that the number of flags does not tell the full story: on one hand, users with many flags may still contribute positively to the community. On the other hand, users who never get flagged are found to violate community rules and get their accounts suspended. This analysis, however, also shows that abusive users are betrayed by their network properties: we find strong evidence of homophilous behavior and use this finding to detect abusive users who go under the community radar. Based on our empirical observations, we build a classifier that is able to detect abusive users with an accuracy as high as 83%.

Submitted 15 July, 2015; originally announced July 2015.

Comments: Published in the proceedings of the 24th International World Wide Web Conference (WWW 2015)

ACM Class: K.4.2

arXiv:1302.6276 [pdf, other]

The Role of Information Diffusion in the Evolution of Social Networks

Authors: Lilian Weng, Jacob Ratkiewicz, Nicola Perra, Bruno Gonçalves, Carlos Castillo, Francesco Bonchi, Rossano Schifanella, Filippo Menczer, Alessandro Flammini

Abstract: Every day millions of users are connected through online social networks, generating a rich trove of data that allows us to study the mechanisms behind human interactions. Triadic closure has been treated as the major mechanism for creating social links: if Alice follows Bob and Bob follows Charlie, Alice will follow Charlie. Here we present an analysis of longitudinal micro-blogging data, revealing a more nuanced view of the strategies employed by users when expanding their social circles. While the network structure affects the spread of information among users, the network is in turn shaped by this communication activity. This suggests a link creation mechanism whereby Alice is more likely to follow Charlie after seeing many messages by Charlie. We characterize users with a set of parameters associated with different link creation strategies, estimated by a Maximum-Likelihood approach. Triadic closure does have a strong effect on link formation, but shortcuts based on traffic are another key factor in interpreting network evolution. However, individual strategies for following other users are highly heterogeneous. Link creation behaviors can be summarized by classifying users in different categories with distinct structural and behavioral characteristics. Users who are popular, active, and influential tend to create traffic-based shortcuts, making the information diffusion process more efficient in the network.

Submitted 20 June, 2013; v1 submitted 25 February, 2013; originally announced February 2013.

Comments: 9 pages, 10 figures, 2 tables

ACM Class: H.1; J.4; H.1.2

Journal ref: Proc. 19th ACM SIGKDD Conference on Knowledge Discovery and Data Mining (KDD 2013)

arXiv:1104.3791 [pdf, ps, other]

doi 10.1080/15427951.2012.625256

Fast matrix computations for pair-wise and column-wise commute times and Katz scores

Authors: Francesco Bonchi, Pooya Esfandiar, David F. Gleich, Chen Greif, Laks V. S. Lakshmanan

Abstract: We first explore methods for approximating the commute time and Katz score between a pair of nodes. These methods are based on the approach of matrices, moments, and quadrature developed in the numerical linear algebra community. They rely on the Lanczos process and provide upper and lower bounds on an estimate of the pair-wise scores. We also explore methods to approximate the commute times and Katz scores from a node to all other nodes in the graph. Here, our approach for the commute times is based on a variation of the conjugate gradient algorithm, and it provides an estimate of all the diagonals of the inverse of a matrix. Our technique for the Katz scores is based on exploiting an empirical localization property of the Katz matrix. We adopt algorithms used for personalized PageRank computing to these Katz scores and theoretically show that this approach is convergent. We evaluate these methods on 17 real world graphs ranging in size from 1000 to 1,000,000 nodes. Our results show that our pair-wise commute time method and column-wise Katz algorithm both have attractive theoretical properties and empirical performance.

Submitted 19 April, 2011; originally announced April 2011.

Comments: 35 pages, journal version of http://dx.doi.org/10.1007/978-3-642-18009-5_13 which has been submitted for publication. Please see http://www.cs.purdue.edu/homes/dgleich/publications/2011/codes/fast-katz/ for supplemental codes
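
For readers unfamiliar with the column-wise Katz scores mentioned in the entry above, here is a minimal Python sketch of the standard definition, in which the score between nodes i and j is the sum over l ≥ 1 of alpha^l times the number of walks of length l between them. It evaluates that series by plain truncation, which is a textbook baseline and not the Lanczos-based or localized methods the abstract describes. The function name `katz_column`, the dense list-of-lists adjacency format, and the default of 200 terms are illustrative assumptions; the series converges only when `alpha` is below the reciprocal of the adjacency matrix's largest eigenvalue.

```python
def katz_column(adjacency, source, alpha, terms=200):
    """Approximate Katz scores from `source` to every node.

    adjacency: square list of lists (0/1 entries for an unweighted graph).
    Computes sum_{l=1..terms} alpha^l * A^l * e_source by repeated
    matrix-vector products (hypothetical helper, truncated power series).
    """
    n = len(adjacency)
    term = [0.0] * n
    term[source] = 1.0  # e_source, the indicator vector of the source node
    scores = [0.0] * n
    for _ in range(terms):
        # Next term of the series: alpha * A * (previous term).
        term = [
            alpha * sum(adjacency[i][j] * term[j] for j in range(n))
            for i in range(n)
        ]
        scores = [s + t for s, t in zip(scores, term)]
    return scores
```

On a graph with a single edge and `alpha = 0.5`, the series can be summed by hand: the score between the two endpoints is 0.5 / (1 - 0.25) = 2/3, and the score of a node with itself is 0.25 / (1 - 0.25) = 1/3.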

Showing 1–13 of 13 results for author: Bonchi, F