-
Variational Inference of Parameters in Opinion Dynamics Models
Authors:
Jacopo Lenti,
Fabrizio Silvestri,
Gianmarco De Francisci Morales
Abstract:
Despite the frequent use of agent-based models (ABMs) for studying social phenomena, parameter estimation remains a challenge, often relying on costly simulation-based heuristics. This work uses variational inference to estimate the parameters of an opinion dynamics ABM, by transforming the estimation problem into an optimization task that can be solved directly.
Our proposal relies on probabilistic generative ABMs (PGABMs): we start by synthesizing a probabilistic generative model from the ABM rules. Then, we transform the inference process into an optimization problem suitable for automatic differentiation. In particular, we use the Gumbel-Softmax reparameterization for categorical agent attributes and stochastic variational inference for parameter estimation. Furthermore, we explore the trade-offs of using variational distributions with different complexity: normal distributions and normalizing flows.
We validate our method on a bounded confidence model with agent roles (leaders and followers). Our approach estimates both macroscopic parameters (bounded confidence intervals and backfire thresholds) and microscopic ones (200 categorical, agent-level roles) more accurately than simulation-based and MCMC methods. Consequently, our technique enables experts to tune and validate their ABMs against real-world observations, thus providing insights into human behavior in social systems via data-driven analysis.
Submitted 8 March, 2024;
originally announced March 2024.
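The Gumbel-Softmax reparameterization mentioned in the abstract above can be illustrated with a minimal sketch. It is not the authors' implementation: the function name `gumbel_softmax`, the two-role logits, and the temperature value are assumptions made for illustration, and the code uses only the Python standard library, so it shows the sampling step rather than the automatic differentiation around it.

```python
import math
import random


def gumbel_softmax(logits, temperature, rng):
    """Return a relaxed one-hot sample from the categorical defined by `logits`."""
    noisy = []
    for logit in logits:
        # Gumbel(0, 1) noise via inverse transform: g = -log(-log(u)), u ~ Uniform(0, 1).
        u = min(max(rng.random(), 1e-12), 1.0 - 1e-12)  # clamp so both logs stay finite
        noisy.append((logit - math.log(-math.log(u))) / temperature)
    # Numerically stable softmax over the perturbed, temperature-scaled logits.
    top = max(noisy)
    weights = [math.exp(v - top) for v in noisy]
    total = sum(weights)
    return [w / total for w in weights]


rng = random.Random(0)
# Hypothetical role probabilities for one agent: P(leader) = 0.2, P(follower) = 0.8.
role_logits = [math.log(0.2), math.log(0.8)]
soft_role = gumbel_softmax(role_logits, temperature=0.5, rng=rng)
```

Lower temperatures push `soft_role` toward a one-hot vector, while higher temperatures yield smoother vectors. In an automatic differentiation framework the same computation is differentiable with respect to the logits, which is what makes discrete agent attributes amenable to gradient-based inference.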
-
A Higher-Order Lens for Social Systems
Authors:
Giulia Preti,
Adriano Fazzone,
Giovanni Petri,
Gianmarco De Francisci Morales
Abstract:
Despite the widespread adoption of higher-order mathematical structures such as hypergraphs, methodological tools for their analysis lag behind those for traditional graphs. This work addresses a critical gap in this context by proposing two micro-canonical random null models for directed hypergraphs: the Directed Hypergraph Configuration Model (DHCM) and the Directed Hypergraph JOINT Model (DHJM). These models preserve essential structural properties of directed hypergraphs such as node in- and out-degree sequences and hyperedge head and tail size sequences, or their joint tensor. We also describe two efficient MCMC algorithms, NuDHy-Degs and NuDHy-JOINT, to sample random hypergraphs from these ensembles.
To showcase the interdisciplinary applicability of the proposed null models, we present three distinct use cases in sociology, epidemiology, and economics. First, we reveal the oscillatory behavior of increased homophily in opposition parties in the US Congress over a 40-year span, emphasizing the role of higher-order structures in quantifying political group homophily. Second, we investigate non-linear contagion in contact hyper-networks, demonstrating that disparities between simulations and theoretical predictions can be explained by considering higher-order joint degree distributions. Last, we examine the economic complexity of countries in the global trade network, showing that local network properties preserved by NuDHy explain the main structural economic complexity indexes.
This work pioneers the development of null models for directed hypergraphs, addressing the intricate challenges posed by their complex entity relations, and providing a versatile suite of tools for researchers across various domains.
Submitted 7 March, 2024; v1 submitted 28 February, 2024;
originally announced February 2024.
-
What we can learn from TikTok through its Research API
Authors:
Francesco Corso,
Francesco Pierri,
Gianmarco De Francisci Morales
Abstract:
TikTok is a social media platform that has gained immense popularity over the last few years, particularly among younger demographics, due to the viral trends and challenges shared worldwide. The recent release of a free Research API opens the door to collecting data on posted videos, associated comments, and user activities. Our study focuses on evaluating the reliability of the results returned by the Research API, by collecting and analyzing a random sample of TikTok videos posted in a span of 6 years. Our preliminary results are instrumental for future research that aims to study the platform, highlighting caveats on the geographical distribution of videos and on the global prevalence of viral and conspiratorial hashtags.
Submitted 4 April, 2024; v1 submitted 21 February, 2024;
originally announced February 2024.
-
Navigating Multidimensional Ideologies with Reddit's Political Compass: Economic Conflict and Social Affinity
Authors:
Ernesto Colacrai,
Federico Cinus,
Gianmarco De Francisci Morales,
Michele Starnini
Abstract:
The prevalent perspective in quantitative research on opinion dynamics flattens the landscape of online political discourse into a traditional left–right dichotomy. While this approach helps simplify the analysis and modeling effort, it also neglects the intrinsic multidimensional richness of ideologies. In this study, we analyze social interactions on Reddit through the lens of a multidimensional ideological framework: the political compass. We examine over 8 million comments posted on the subreddits /r/PoliticalCompass and /r/PoliticalCompassMemes during 2020–2022. By leveraging users' self-declarations, we disentangle their ideological dimensions into economic (left–right) and social (libertarian–authoritarian) axes. In addition, we characterize users by their demographic attributes (age, gender, and affluence).
We find significant homophily for interactions along the social axis of the political compass and demographic attributes. Compared to a null model, interactions among individuals of similar ideology surpass expectations by 6%. In contrast, we uncover a significant heterophily along the economic axis: left/right interactions exceed expectations by 10%. Furthermore, heterophilic interactions are characterized by a higher language toxicity than homophilic interactions, which hints at a conflictual discourse between opposing ideologies. Our results help reconcile apparent contradictions in recent literature, which found a superposition of homophilic and heterophilic interactions in online political discussions. By disentangling such interactions into the economic and social axes, we pave the way for a deeper understanding of opinion dynamics on social media.
Submitted 24 January, 2024;
originally announced January 2024.
-
Extracting the Multiscale Causal Backbone of Brain Dynamics
Authors:
Gabriele D'Acunto,
Francesco Bonchi,
Gianmarco De Francisci Morales,
Giovanni Petri
Abstract:
The bulk of the research effort on brain connectivity revolves around statistical associations among brain regions, which do not directly relate to the causal mechanisms governing brain dynamics. Here we propose the multiscale causal backbone (MCB) of brain dynamics, shared by a set of individuals across multiple temporal scales, and devise a principled methodology to extract it.
Our approach leverages recent advances in multiscale causal structure learning and optimizes the trade-off between the model fit and its complexity. Empirical assessment on synthetic data shows the superiority of our methodology over a baseline based on canonical functional connectivity networks. When applied to resting-state fMRI data, we find sparse MCBs for both the left and right brain hemispheres. Thanks to its multiscale nature, our approach shows that at low-frequency bands, causal dynamics are driven by brain regions associated with high-level cognitive functions; at higher frequencies instead, nodes related to sensory processing play a crucial role. Finally, our analysis of individual multiscale causal structures confirms the existence of a causal fingerprint of brain connectivity, thus supporting the existing extensive research in brain connectivity fingerprinting from a causal perspective.
Submitted 19 March, 2024; v1 submitted 31 October, 2023;
originally announced November 2023.
-
Measuring Behavior Change with Observational Studies: a Review
Authors:
Arianna Pera,
Gianmarco De Francisci Morales,
Luca Maria Aiello
Abstract:
Exploring behavioral change in the digital age is imperative for societal progress in the context of 21st-century challenges. We analyzed 148 articles (2000-2023) and built a map that categorizes behaviors and change detection methodologies, platforms of reference, and theoretical frameworks that characterize online behavior change. Our findings uncover a focus on sentiment shifts, an emphasis on API-restricted platforms, and limited theory integration. We call for methodologies able to capture a wider range of behavioral types, diverse data sources, and stronger theory-practice alignment in the study of online behavioral change.
Submitted 2 November, 2023; v1 submitted 30 October, 2023;
originally announced October 2023.
-
Generating collective counterfactual explanations in score-based classification via mathematical optimization
Authors:
Emilio Carrizosa,
Jasone Ramírez-Ayerbe,
Dolores Romero Morales
Abstract:
Due to the increasing use of Machine Learning models in high-stakes decision-making settings, it has become increasingly important to have tools to understand how models arrive at decisions. Assuming a trained Supervised Classification model, explanations can be obtained via counterfactual analysis: a counterfactual explanation of an instance indicates how this instance should be minimally modified so that the perturbed instance is classified in the desired class by the Machine Learning classification model. Most of the Counterfactual Analysis literature focuses on the single-instance single-counterfactual setting, in which the analysis is done for one single instance to provide one single explanation. Taking a stakeholder's perspective, in this paper we introduce the so-called collective counterfactual explanations. By means of novel Mathematical Optimization models, we provide a counterfactual explanation for each instance in a group of interest, so that the total cost of the perturbations is minimized under some linking constraints. Making the process of constructing counterfactuals collective instead of individual enables us to detect the features that are critical across the entire dataset for having the individuals classified in the desired class. Our methodology allows for some instances to be treated individually, performing the collective counterfactual analysis for a fraction of records of the group of interest. This way, outliers are identified and handled appropriately. Under some assumptions on the classifier and the space in which counterfactuals are sought, finding collective counterfactuals is reduced to solving a convex quadratic linearly constrained mixed-integer optimization problem, which, for datasets of moderate size, can be solved to optimality using existing solvers. The performance of our approach is illustrated on real-world datasets, demonstrating its usefulness.
Submitted 19 October, 2023;
originally announced October 2023.
-
Systematic discrepancies in the delivery of political ads on Facebook and Instagram
Authors:
Dominik Bär,
Francesco Pierri,
Gianmarco De Francisci Morales,
Stefan Feuerriegel
Abstract:
Political advertising on social media has become a central element in election campaigns. However, granular information about political advertising on social media was previously unavailable, thus raising concerns regarding fairness, accountability, and transparency in the electoral process. In this paper, we analyze targeted political advertising on social media via a unique, large-scale dataset of over 80,000 political ads from Meta during the 2021 German federal election, with more than 1.1 billion impressions. For each political ad, our dataset records granular information about targeting strategies, spending, and actual impressions. We then study (i) the prevalence of targeted ads across the political spectrum; (ii) the discrepancies between targeted and actual audiences due to algorithmic ad delivery; and (iii) which targeting strategies on social media attain a wide reach at low cost. We find that targeted ads are prevalent across the entire political spectrum. Moreover, there are considerable discrepancies between targeted and actual audiences, and systematic differences in the reach of political ads (in impressions per EUR) among parties, where the algorithm favors ads from populists over others.
Submitted 24 June, 2024; v1 submitted 15 October, 2023;
originally announced October 2023.
-
Likelihood-Based Methods Improve Parameter Estimation in Opinion Dynamics Models
Authors:
Jacopo Lenti,
Corrado Monti,
Gianmarco De Francisci Morales
Abstract:
We show that a maximum likelihood approach for parameter estimation in agent-based models (ABMs) of opinion dynamics outperforms the typical simulation-based approach. Simulation-based approaches simulate the model repeatedly in search of a set of parameters that generates data similar enough to the observed data. In contrast, likelihood-based approaches derive a likelihood function that connects the unknown parameters to the observed data in a statistically principled way. We compare these two approaches on the well-known bounded-confidence model of opinion dynamics. We do so in three realistic scenarios of increasing complexity depending on data availability: (i) fully observed opinions and interactions, (ii) partially observed interactions, (iii) observed interactions with noisy proxies of the opinions. We highlight how identifying observed and latent variables is fundamental for connecting the model to the data. To realize the likelihood-based approach, we first cast the model into a probabilistic generative guise that supports a proper data likelihood. Then, we describe the three scenarios via probabilistic graphical models and show the nuances that go into translating the model. Finally, we implement the resulting probabilistic models in an automatic differentiation framework (PyTorch). This step enables easy and efficient maximum likelihood estimation via gradient descent. Our experimental results show that the maximum likelihood estimates are up to 4x more accurate and require up to 200x less computational time.
Submitted 5 October, 2023; v1 submitted 4 October, 2023;
originally announced October 2023.
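The likelihood-based idea in the abstract above can be sketched on a toy version of scenario (i), where opinion distances and interaction outcomes are fully observed. Everything specific here is an assumption for illustration: the smoothed interaction rule P(interact) = sigmoid(steepness * (epsilon - distance)), the known `steepness`, the uniform opinions, and the hand-written gradient, which stands in for the automatic differentiation framework the authors use.

```python
import math
import random


def sigmoid(z):
    # Numerically stable logistic function.
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    ez = math.exp(z)
    return ez / (1.0 + ez)


def simulate(epsilon, steepness, n_pairs, rng):
    """Draw (opinion distance, interaction outcome) pairs from the smoothed rule."""
    data = []
    for _ in range(n_pairs):
        distance = abs(rng.uniform(-1, 1) - rng.uniform(-1, 1))
        p_interact = sigmoid(steepness * (epsilon - distance))
        data.append((distance, 1 if rng.random() < p_interact else 0))
    return data


def fit_epsilon(data, steepness, lr=0.05, n_steps=300, init=1.0):
    """Maximize the Bernoulli log-likelihood in epsilon by gradient ascent."""
    eps = init
    for _ in range(n_steps):
        # Gradient of sum_i [s_i log p_i + (1 - s_i) log(1 - p_i)] with respect
        # to epsilon is steepness * sum_i (s_i - p_i); we step on its mean.
        grad = steepness * sum(s - sigmoid(steepness * (eps - d)) for d, s in data)
        eps += lr * grad / len(data)
    return eps


rng = random.Random(42)
observed = simulate(epsilon=0.4, steepness=10.0, n_pairs=2000, rng=rng)
estimate = fit_epsilon(observed, steepness=10.0)
```

Because the log-likelihood is concave in `epsilon` under this rule, plain gradient ascent is enough; no repeated simulation of the model is needed during fitting, which is the contrast with simulation-based calibration that the abstract draws.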
-
Narratives of War: Ukrainian Memetic Warfare on Twitter
Authors:
Yelena Mejova,
Arthur Capozzi,
Corrado Monti,
Gianmarco De Francisci Morales
Abstract:
The 2022 Russian invasion of Ukraine has seen an intensification in the use of social media by governmental actors in cyber warfare. Wartime communication via memes has been a successful strategy used not only by independent accounts such as @uamemesforces, but also, for the first time in a full-scale interstate war, by official Ukrainian government accounts such as @Ukraine and @DefenceU. We study this prominent example of memetic warfare through the lens of its narratives, and find them to be a key component of success: tweets with a 'victim' narrative garner twice as many retweets. However, in countries providing more assistance to Ukraine, malevolent narratives focusing on the enemy resonate more than those about heroism or victims. Our findings present a nuanced examination of Ukraine's influence operations and of the worldwide response to them, thus contributing new insights into the evolution of socio-technical systems in times of war.
Submitted 23 January, 2024; v1 submitted 15 September, 2023;
originally announced September 2023.
-
An impossibility result for Markov Chain Monte Carlo sampling from micro-canonical bipartite graph ensembles
Authors:
Giulia Preti,
Gianmarco De Francisci Morales,
Matteo Riondato
Abstract:
Markov Chain Monte Carlo (MCMC) algorithms are commonly used to sample from graph ensembles. Two graphs are neighbors in the state space if one can be obtained from the other with only a few modifications, e.g., edge rewirings. For many common ensembles, e.g., those preserving the degree sequences of bipartite graphs, rewiring operations involving two edges are sufficient to create a fully connected state space, and they can be performed efficiently. We show that, for ensembles of bipartite graphs with fixed degree sequences and number of butterflies ($K_{2,2}$ bi-cliques), there is no universal constant c such that a rewiring of at most c edges at every step is sufficient for any such ensemble to be fully connected. Our proof relies on an explicit construction of a family of pairs of graphs with the same degree sequences and number of butterflies, with each pair indexed by a natural number c, and such that any sequence of rewiring operations transforming one graph into the other must include at least one rewiring operation involving at least c edges. Whether rewiring this many edges is sufficient to guarantee the full connectivity of the state space of any such ensemble remains an open question. Our result implies the impossibility of developing efficient, graph-agnostic MCMC algorithms for these ensembles, as the necessity to rewire an impractically large number of edges may hinder taking a step on the state space.
Submitted 19 April, 2024; v1 submitted 21 August, 2023;
originally announced August 2023.
-
Hyper-distance Oracles in Hypergraphs
Authors:
Giulia Preti,
Gianmarco De Francisci Morales,
Francesco Bonchi
Abstract:
We study point-to-point distance estimation in hypergraphs, where the query is parameterized by a positive integer s, which defines the required level of overlap for two hyperedges to be considered adjacent. To answer s-distance queries, we first explore an oracle based on the line graph of the given hypergraph and discuss its limitations: the main one is that the line graph is typically orders of magnitude larger than the original hypergraph. We then introduce HypED, a landmark-based oracle with a predefined size, built directly on the hypergraph, thus avoiding constructing the line graph. Our framework approximately answers vertex-to-vertex, vertex-to-hyperedge, and hyperedge-to-hyperedge s-distance queries for any value of s. A key observation at the basis of our framework is that, as s increases, the hypergraph becomes more fragmented. We show how this can be exploited to improve the placement of landmarks, by identifying the s-connected components of the hypergraph. For this task, we devise an efficient algorithm based on the union-find technique and a dynamic inverted index. We experimentally evaluate HypED on several real-world hypergraphs and show its versatility in answering s-distance queries for different values of s. Our framework answers such queries in fractions of a millisecond, while allowing fine-grained control of the trade-off between index size and approximation error at creation time. Finally, we show the usefulness of the s-distance oracle in two applications, namely, hypergraph-based recommendation and the approximation of the s-closeness centrality of vertices and hyperedges in the context of protein-to-protein interactions.
Submitted 19 March, 2024; v1 submitted 5 June, 2023;
originally announced June 2023.
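The notion of s-connected components from the abstract above can be sketched with a basic union-find structure. This is a simplified illustration, not the HypED algorithm: it uses a static inverted index and counts overlaps for all co-occurring hyperedge pairs, whereas the abstract describes a dynamic inverted index. The names `UnionFind`, `s_connected_components`, and the toy hypergraph are assumptions.

```python
from collections import defaultdict
from itertools import combinations


class UnionFind:
    def __init__(self, items):
        self.parent = {x: x for x in items}

    def find(self, x):
        # Path halving keeps the trees shallow.
        while self.parent[x] != x:
            self.parent[x] = self.parent[self.parent[x]]
            x = self.parent[x]
        return x

    def union(self, a, b):
        ra, rb = self.find(a), self.find(b)
        if ra != rb:
            self.parent[rb] = ra


def s_connected_components(hyperedges, s):
    """Group hyperedge ids linked by chains of overlaps of size at least s."""
    # Inverted index: vertex -> ids of the hyperedges containing it.
    index = defaultdict(list)
    for edge_id, vertices in hyperedges.items():
        for v in vertices:
            index[v].append(edge_id)
    # Count shared vertices for every pair of hyperedges that co-occur on a vertex.
    overlap = defaultdict(int)
    for edge_ids in index.values():
        for a, b in combinations(sorted(edge_ids), 2):
            overlap[(a, b)] += 1
    # Merge only the pairs that are s-adjacent.
    uf = UnionFind(hyperedges)
    for (a, b), shared in overlap.items():
        if shared >= s:
            uf.union(a, b)
    groups = defaultdict(set)
    for edge_id in hyperedges:
        groups[uf.find(edge_id)].add(edge_id)
    return sorted(sorted(g) for g in groups.values())


# Toy hypergraph: hyperedge id -> set of vertices (hypothetical data).
toy = {
    "e1": {1, 2, 3},
    "e2": {2, 3, 4},
    "e3": {4, 5},
    "e4": {6, 7},
}
components_s1 = s_connected_components(toy, s=1)
components_s2 = s_connected_components(toy, s=2)
```

On this toy input, raising s from 1 to 2 splits the component containing `e1`, `e2`, and `e3`, because `e2` and `e3` share only one vertex. This mirrors the observation that the hypergraph becomes more fragmented as s increases.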
-
Authority without Care: Moral Values behind the Mask Mandate Response
Authors:
Yelena Mejova,
Kyriaki Kalimeri,
Gianmarco De Francisci Morales
Abstract:
Face masks are one of the cheapest and most effective non-pharmaceutical interventions available against airborne diseases such as COVID-19. Unfortunately, they have been met with resistance by a substantial fraction of the populace, especially in the U.S. In this study, we uncover the latent moral values that underpin the response to the mask mandate, and paint them against the country's political backdrop. We monitor the discussion about masks on Twitter, which involves almost 600k users in a time span of 7 months. By using a combination of graph mining, natural language processing, topic modeling, content analysis, and time series analysis, we characterize the responses to the mask mandate of both those in favor of it and those against it. We base our analysis on the theoretical frameworks of Moral Foundations Theory and Hofstede's cultural dimensions. Our results show that, while the anti-mask stance is associated with a conservative political leaning, the moral values expressed by its adherents diverge from the ones typically used by conservatives. In particular, the expected emphasis on the values of authority and purity is accompanied by an atypical dearth of in-group loyalty. We find that after the mandate, both pro- and anti-mask sides decrease their emphasis on care about others, and increase their attention on authority and fairness, further politicizing the issue. In addition, the mask mandate reverses the expression of Individualism-Collectivism between the two sides, with an increase of individualism in the anti-mask narrative, and a decrease in the pro-mask one. We argue that monitoring the dynamics of moral positioning is crucial for designing effective public health campaigns that are sensitive to the underlying values of the target audience.
Submitted 30 March, 2023; v1 submitted 16 March, 2023;
originally announced March 2023.
-
Evidence of Demographic rather than Ideological Segregation in News Discussion on Reddit
Authors:
Corrado Monti,
Jacopo D'Ignazi,
Michele Starnini,
Gianmarco De Francisci Morales
Abstract:
We evaluate homophily and heterophily among ideological and demographic groups in a typical opinion formation context: online discussions of current news. We analyze user interactions across five years in the r/news community on Reddit, one of the most visited websites in the United States. Then, we estimate demographic and ideological attributes of these users. Thanks to a comparison with a carefully crafted network null model, we establish which pairs of attributes foster interactions and which ones inhibit them.
Individuals prefer to engage with the opposite ideological side, which contradicts the echo chamber narrative. Instead, demographic groups are homophilic, as individuals tend to interact within their own group, even in an online setting where such attributes are not directly observable. In particular, we observe age and income segregation consistently across years: users tend to avoid interactions when belonging to different groups. These results persist after controlling for the degree of interest by each demographic group in different news topics. Our findings align with the theory that affective polarization (the difficulty in socializing across political boundaries) is more connected with an increasingly divided society than with ideological echo chambers on social media.
We publicly release our anonymized data set and all the code to reproduce our results: https://github.com/corradomonti/demographic-homophily
Submitted 5 July, 2023; v1 submitted 15 February, 2023;
originally announced February 2023.
-
The Thin Ideology of Populist Advertising on Facebook during the 2019 EU Elections
Authors:
Arthur Capozzi,
Gianmarco De Francisci Morales,
Yelena Mejova,
Corrado Monti,
André Panisson
Abstract:
Social media has been an important tool in the expansion of the populist message, and it is thought to have contributed to the electoral success of populist parties in the past decade. This study compares how populist parties advertised on Facebook during the 2019 European Parliamentary election. In particular, we examine commonalities and differences in which audiences they reach and on which issues they focus. By using data from the Meta (previously Facebook) Ad Library, we analyze 45k ad campaigns by 39 parties, both populist and mainstream, in Germany, the United Kingdom, Italy, Spain, and Poland. While populist parties represent just over 20% of the total expenditure on political ads, they account for 40% of the total impressions (most of which come from Eurosceptic and far-right parties), thus hinting at a competitive advantage for populist parties on Facebook. We further find that ads posted by populist parties are more likely to reach male audiences, and sometimes much older ones. In terms of issues, populist politicians focus on monetary policy, state bureaucracy and reforms, and security, while the focus on EU and Brexit is on par with non-populist, mainstream parties. However, issue preferences are largely country-specific, thus supporting the view in political science that populism is a "thin ideology" that does not have a universal, coherent policy agenda. This study illustrates the usefulness of publicly available advertising data for monitoring the populist outreach to, and engagement with, millions of potential voters, while outlining the limitations of currently available data.
Submitted 8 February, 2023;
originally announced February 2023.
-
Supervised Feature Compression based on Counterfactual Analysis
Authors:
Veronica Piccialli,
Dolores Romero Morales,
Cecilia Salvatore
Abstract:
Counterfactual Explanations are becoming a de facto standard in post-hoc interpretable machine learning. For a given classifier and an instance classified in an undesired class, its counterfactual explanation corresponds to small perturbations of that instance that allow changing the classification outcome. This work aims to leverage Counterfactual Explanations to detect the important decision boundaries of a pre-trained black-box model. This information is used to build a supervised discretization of the features in the dataset with a tunable granularity. Using the discretized dataset, an optimal Decision Tree can be trained that resembles the black-box model, but that is interpretable and compact. Numerical results on real-world datasets show the effectiveness of the approach in terms of accuracy and sparsity.
△ Less
Submitted 24 November, 2023; v1 submitted 17 November, 2022;
originally announced November 2022.
-
The language of opinion change on social media under the lens of communicative action
Authors:
Corrado Monti,
Luca Maria Aiello,
Gianmarco De Francisci Morales,
Francesco Bonchi
Abstract:
Which messages are more effective at inducing a change of opinion in the listener? We approach this question within the frame of Habermas' theory of communicative action, which posits that the illocutionary intent of the message (its pragmatic meaning) is the key. Thanks to recent advances in natural language processing, we are able to operationalize this theory by extracting the latent social dimensions of a message, namely archetypes of social intent of language, that come from social exchange theory. We identify key ingredients to opinion change by looking at more than 46k posts and more than 3.5M comments on Reddit's r/ChangeMyView, a debate forum where people try to change each other's opinion and explicitly mark opinion-changing comments with a special flag called "delta". Comments that express no intent are about 77% less likely to change the mind of the recipient, compared to comments that convey at least one social dimension. Among the various social dimensions, the ones that are most likely to produce an opinion change are knowledge, similarity, and trust, which resonates with Habermas' theory of communicative action. We also find other new important dimensions, such as appeals to power or empathetic expressions of support. Finally, in line with theories of constructive conflict, yet contrary to the popular characterization of conflict as the bane of modern social media, our findings show that voicing conflict in the context of a structured public debate can promote integration, especially when it is used to counter another conflictive stance. By leveraging recent advances in natural language processing, our work provides an empirical framework for Habermas' theory, finds concrete examples of its effects in the wild, and suggests its possible extension with a more faceted understanding of intent interpreted as social dimensions of language.
Submitted 31 October, 2022;
originally announced October 2022.
-
Learning Multiscale Non-stationary Causal Structures
Authors:
Gabriele D'Acunto,
Gianmarco De Francisci Morales,
Paolo Bajardi,
Francesco Bonchi
Abstract:
This paper addresses a gap in the current state of the art by providing a solution for modeling causal relationships that evolve over time and occur at different time scales. Specifically, we introduce the multiscale non-stationary directed acyclic graph (MN-DAG), a framework for modeling multivariate time series data. Our contribution is twofold. Firstly, we expose a probabilistic generative model by leveraging results from spectral and causality theories. Our model allows sampling an MN-DAG according to user-specified priors on the time-dependence and multiscale properties of the causal graph. Secondly, we devise a Bayesian method named Multiscale Non-stationary Causal Structure Learner (MN-CASTLE) that uses stochastic variational inference to estimate MN-DAGs. The method also exploits information from the local partial correlation between time series over different time resolutions. The data generated from an MN-DAG reproduces well-known features of time series in different domains, such as volatility clustering and serial correlation. Additionally, we show the superior performance of MN-CASTLE on synthetic data with different multiscale and non-stationary properties compared to baseline models. Finally, we apply MN-CASTLE to identify the drivers of the natural gas prices in the US market. Causal relationships have strengthened during the COVID-19 outbreak and the Russian invasion of Ukraine, a fact that baseline methods fail to capture. MN-CASTLE identifies the causal impact of critical economic drivers on natural gas prices, such as seasonal factors, economic uncertainty, oil prices, and gas storage deviations.
Submitted 17 November, 2023; v1 submitted 31 August, 2022;
originally announced August 2022.
-
On the Relation Between Opinion Change and Information Consumption on Reddit
Authors:
Flavio Petruzzellis,
Corrado Monti,
Gianmarco De Francisci Morales,
Francesco Bonchi
Abstract:
While much attention has been devoted to the causes of opinion change, little is known about its consequences. Our study sheds light on the relationship between a user's opinion change episode and subsequent behavioral change on an online social media platform, Reddit. In particular, we look at r/ChangeMyView, an online community dedicated to debating one's own opinions. Interestingly, this forum adopts a well-codified schema for explicitly self-reporting opinion change. Starting from this ground truth, we analyze changes in future online information consumption behavior that arise after a self-reported opinion change on sociopolitical topics, operationalized in this work as participation in sociopolitical subreddits. Such a participation profile is important as it represents one's information diet, and is a reliable proxy for, e.g., political affiliation or health choices.
We find that people who report an opinion change are significantly more likely to change their future participation in a specific subset of online communities. We characterize which communities are more likely to be abandoned after opinion change, and find a significant association (r=0.46) between propaganda-like language used in a community and the increase in chances of leaving it. We find comparable results (r=0.39) for the opposite direction, i.e., joining a community. This finding suggests how propagandistic communities act as a first gateway to internalize a shift in one's sociopolitical opinion. Finally, we show that the textual content of the discussion associated with opinion change is indicative of which communities are going to be subject to a participation change. In fact, a predictive model based only on the opinion change post is able to pinpoint these communities with an AP@5 of 0.20, similar to what can be reached by using all the past history of participation in communities.
Submitted 25 July, 2022;
originally announced July 2022.
-
On learning agent-based models from data
Authors:
Corrado Monti,
Marco Pangallo,
Gianmarco De Francisci Morales,
Francesco Bonchi
Abstract:
Agent-Based Models (ABMs) are used in several fields to study the evolution of complex systems from micro-level assumptions. However, ABMs typically cannot estimate agent-specific (or "micro") variables: this is a major limitation which prevents ABMs from harnessing micro-level data availability and which greatly limits their predictive power. In this paper, we propose a protocol to learn the latent micro-variables of an ABM from data. The first step of our protocol is to reduce an ABM to a probabilistic model, characterized by a computationally tractable likelihood. This reduction follows two general design principles: balance of stochasticity and data availability, and replacement of unobservable discrete choices with differentiable approximations. Then, our protocol proceeds by maximizing the likelihood of the latent variables via a gradient-based expectation maximization algorithm. We demonstrate our protocol by applying it to an ABM of the housing market, in which agents with different incomes bid higher prices to live in high-income neighborhoods. We demonstrate that the obtained model allows accurate estimates of the latent variables, while preserving the general behavior of the ABM. We also show that our estimates can be used for out-of-sample forecasting. Our protocol can be seen as an alternative to black-box data assimilation methods, one that forces the modeler to lay bare the assumptions of the model, to think about the inferential process, and to spot potential identification problems.
Submitted 23 November, 2022; v1 submitted 10 May, 2022;
originally announced May 2022.
-
Modeling Political Activism around Gun Debate via Social Media
Authors:
Yelena Mejova,
Jisun An,
Gianmarco De Francisci Morales,
Haewoon Kwak
Abstract:
The United States has some of the highest rates of gun violence among developed countries. Yet, there is disagreement about the extent to which firearms should be regulated. In this study, we employ social media signals to examine the predictors of offline political activism, at both the population and the individual level. We show that it is possible to classify the stance of users on the gun issue, with especially high accuracy when network information is available. Alongside socioeconomic variables, network information such as the relative size of the two sides of the debate is also predictive of state-level gun policy. At the individual level, we build a statistical model using network, content, and psycho-linguistic features that predicts real-life political action, and explore the most predictive linguistic features. Thus, we argue that, alongside demographics and socioeconomic indicators, social media provides useful signals in the holistic modeling of political engagement around the gun debate.
Submitted 30 April, 2022;
originally announced May 2022.
-
FreSCo: Mining Frequent Patterns in Simplicial Complexes
Authors:
Giulia Preti,
Gianmarco De Francisci Morales,
Francesco Bonchi
Abstract:
Simplicial complexes are a generalization of graphs that model higher-order relations. In this paper, we introduce simplicial patterns -- that we call simplets -- and generalize the task of frequent pattern mining from the realm of graphs to that of simplicial complexes. Our task is particularly challenging due to the enormous search space and the need for higher-order isomorphism. We show that finding the occurrences of simplets in a complex can be reduced to a bipartite graph isomorphism problem, in linear time and at most quadratic space. We then propose an anti-monotonic frequency measure that allows us to start the exploration from small simplets and stop expanding a simplet as soon as its frequency falls below the minimum frequency threshold. Equipped with these ideas and a clever data structure, we develop a memory-conscious algorithm that, by carefully exploiting the relationships among the simplices in the complex and among the simplets, achieves efficiency and scalability for our complex mining task. Our algorithm, FreSCo, comes in two flavors: it can compute the exact frequency of the simplets or, more quickly, it can determine whether a simplet is frequent, without having to compute the exact frequency. Experimental results prove the ability of FreSCo to mine frequent simplets in complexes of various size and dimension, and the significance of the simplets with respect to the traditional graph patterns.
Submitted 26 January, 2022; v1 submitted 20 January, 2022;
originally announced January 2022.
-
The Evolving Causal Structure of Equity Risk Factors
Authors:
Gabriele D'Acunto,
Paolo Bajardi,
Francesco Bonchi,
Gianmarco De Francisci Morales
Abstract:
In recent years, multi-factor strategies have gained increasing popularity in the financial industry, as they allow investors to have a better understanding of the risk drivers underlying their portfolios. Moreover, such strategies promise to promote diversification and thus limit losses in times of financial turmoil. However, recent studies have reported a significant level of redundancy between these factors, which might enhance risk contagion among multi-factor portfolios during financial crises. Therefore, it is of fundamental importance to better understand the relationships among factors.
Empowered by recent advances in causal structure learning methods, this paper presents a study of the causal structure of financial risk factors and its evolution over time. In particular, the data we analyze covers 11 risk factors concerning the US equity market, spanning a period of 29 years at daily frequency.
Our results show a statistically significant sparsifying trend of the underlying causal structure. However, this trend breaks down during periods of financial stress, in which we can observe a densification of the causal network driven by a growth of the out-degree of the market factor node. Finally, we present a comparison with the analysis of factors cross-correlations, which further confirms the importance of causal analysis for gaining deeper insights in the dynamics of the factor system, particularly during economic downturns.
Our findings are especially significant from a risk-management perspective. They link the evolution of the causal structure of equity risk factors with market volatility and a worsening macroeconomic environment, and show that, in times of financial crisis, exposure to different factors boils down to exposure to the market risk factor.
Submitted 9 November, 2021;
originally announced November 2021.
-
Optimal randomized classification trees
Authors:
Rafael Blanquero,
Emilio Carrizosa,
Cristina Molero-Río,
Dolores Romero Morales
Abstract:
Classification and Regression Trees (CARTs) are off-the-shelf techniques in modern Statistics and Machine Learning. CARTs are traditionally built by means of a greedy procedure, sequentially deciding the splitting predictor variable(s) and the associated threshold. This greedy approach trains trees very fast, but, by its nature, their classification accuracy may not be competitive against other state-of-the-art procedures. Moreover, controlling critical issues, such as the misclassification rates in each of the classes, is difficult. To address these shortcomings, optimal decision trees have been recently proposed in the literature, which use discrete decision variables to model the path each observation will follow in the tree. Instead, we propose a new approach based on continuous optimization. Our classifier can be seen as a randomized tree, since at each node of the decision tree a random decision is made. The computational experience reported demonstrates the good performance of our procedure.
Submitted 19 October, 2021;
originally announced October 2021.
-
On Clustering Categories of Categorical Predictors in Generalized Linear Models
Authors:
Emilio Carrizosa,
Marcela Galvis Restrepo,
Dolores Romero Morales
Abstract:
We propose a method to reduce the complexity of Generalized Linear Models in the presence of categorical predictors. The traditional one-hot encoding, where each category is represented by a dummy variable, can be wasteful, difficult to interpret, and prone to overfitting, especially when dealing with high-cardinality categorical predictors. This paper addresses these challenges by finding a reduced representation of the categorical predictors by clustering their categories. This is done through a numerical method which aims to preserve (or even, improve) accuracy, while reducing the number of coefficients to be estimated for the categorical predictors. Thanks to its design, we are able to derive a proximity measure between categories of a categorical predictor that can be easily visualized. We illustrate the performance of our approach in real-world classification and count-data datasets where we see that clustering the categorical predictors reduces complexity substantially without harming accuracy.
Submitted 19 October, 2021;
originally announced October 2021.
-
From Reddit to Wall Street: The role of committed minorities in financial collective action
Authors:
Lorenzo Lucchini,
Luca Maria Aiello,
Laura Alessandretti,
Gianmarco De Francisci Morales,
Michele Starnini,
Andrea Baronchelli
Abstract:
In January 2021, retail investors coordinated on Reddit to target short selling activity by hedge funds on GameStop shares, causing a surge in the share price and triggering significant losses for the funds involved. Such an effective collective action was unprecedented in finance, and its dynamics remain unclear. Here, we analyse Reddit and financial data and rationalise the events based on recent findings describing how a small fraction of committed individuals may trigger behavioural cascades. First, we operationalise the concept of individual commitment in financial discussions. Second, we show that the increase of commitment within Reddit predated the initial surge in price. Third, we reveal that initial committed users occupied a central position in the network of Reddit conversations. Finally, we show that the social identity of the broader Reddit community grew as the collective action unfolded. These findings shed light on financial collective action, as several observers anticipate it will grow in importance.
Submitted 13 September, 2021; v1 submitted 15 July, 2021;
originally announced July 2021.
-
Clandestino or Rifugiato? Anti-immigration Facebook Ad Targeting in Italy
Authors:
Arthur Capozzi,
Gianmarco De Francisci Morales,
Yelena Mejova,
Corrado Monti,
André Panisson,
Daniela Paolotti
Abstract:
Monitoring advertising around controversial issues is an important step in ensuring accountability and transparency of political processes. To that end, we use the Facebook Ads Library to collect 2312 migration-related advertising campaigns in Italy over one year. Our pro- and anti-immigration classifier (F1=0.85) reveals a partisan divide among the major Italian political parties, with anti-immigration ads accounting for nearly 15M impressions. Although they make up 47.6% of all migration-related ads, anti-immigration ones receive 65.2% of impressions. We estimate that about two thirds of all captured campaigns use some kind of demographic targeting by location, gender, or age. We find sharp divides by age and gender: for instance, anti-immigration ads from major parties are 17% more likely to be seen by a male user than by a female one. Unlike pro-migration parties, we find that anti-immigration ones reach a similar demographic to their own voters. However, their audience changes with topic: an ad from anti-immigration parties is 24% more likely to be seen by a male user when the ad speaks about migration than when it does not. Furthermore, the viewership of such campaigns tends to follow the volume of mainstream news around immigration, supporting the theory that political advertisers try to "ride the wave" of current news. We conclude with policy implications for political communication: since the Facebook Ads Library does not make it possible to distinguish between advertisers' intentions and algorithmic targeting, we argue that more details should be shared by platforms regarding the targeting configuration of socio-political campaigns.
Submitted 16 March, 2021;
originally announced March 2021.
-
STruD: Truss Decomposition of Simplicial Complexes
Authors:
Giulia Preti,
Gianmarco De Francisci Morales,
Francesco Bonchi
Abstract:
A simplicial complex is a generalization of a graph: a collection of n-ary relationships (instead of binary as the edges of a graph), named simplices. In this paper, we develop a new tool to study the structure of simplicial complexes: we generalize the graph notion of truss decomposition to complexes, and show that this more powerful representation gives rise to different properties compared to the graph-based one. This power, however, comes with important computational challenges derived from the combinatorial explosion caused by the downward closure property of complexes. Drawing upon ideas from itemset mining and similarity search, we design a memory-aware algorithm, dubbed STruD, which is able to efficiently compute the truss decomposition of a simplicial complex. STruD adapts its behavior to the amount of available memory by storing intermediate data in a compact way. We then devise a variant that computes directly the n simplices of maximum trussness. By applying STruD to several datasets, we prove its scalability, and provide an analysis of their structure. Finally, we show that the truss decomposition can be seen as a filtration, and as such it can be used to study the persistent homology of a dataset, a method for computing topological features at different spatial resolutions, prominent in Topological Data Analysis.
Submitted 15 February, 2021;
originally announced February 2021.
-
No Echo in the Chambers of Political Interactions on Reddit
Authors:
Gianmarco De Francisci Morales,
Corrado Monti,
Michele Starnini
Abstract:
Echo chambers in online social networks, whereby users' beliefs are reinforced by interactions with like-minded peers and insulation from others' points of view, have been decried as a cause of political polarization. Here, we investigate their role in the debate around the 2016 US elections on Reddit, a fundamental platform for the success of Donald Trump. We identify Trump vs Clinton supporters and reconstruct their political interaction network. We observe a preference for cross-cutting political interactions between the two communities rather than within-group interactions, thus contradicting the echo chamber narrative. Furthermore, these interactions are asymmetrical: Clinton supporters are particularly eager to answer comments by Trump supporters. Besides asymmetric heterophily, users show assortative behavior for activity, and disassortative, asymmetric behavior for popularity. Our findings are tested against a null model of random interactions, by using two different approaches: a network rewiring which preserves the activity of nodes, and a logit regression which takes into account possible confounding factors. Finally, we explore possible socio-demographic implications. Users show a tendency for geographical homophily and a small positive correlation between cross-interactions and voter abstention. Our findings shed light on public opinion formation on social media, calling for a better understanding of the social dynamics at play in this context.
Submitted 10 February, 2021;
originally announced February 2021.
-
Playing to distraction: towards a robust training of CNN classifiers through visual explanation techniques
Authors:
David Morales,
Estefania Talavera,
Beatriz Remeseiro
Abstract:
The field of deep learning is evolving in different directions, yet more efficient training strategies are still needed. In this work, we present a novel and robust training scheme that integrates visual explanation techniques in the learning process. Unlike the attention mechanisms that focus on the relevant parts of images, we aim to improve the robustness of the model by making it pay attention to other regions as well. Broadly speaking, the idea is to distract the classifier in the learning process to force it to focus not only on relevant regions but also on those that, a priori, are not so informative for the discrimination of the class. We tested the proposed approach by embedding it into the learning process of a convolutional neural network for the analysis and classification of two well-known datasets, namely Stanford cars and FGVC-Aircraft. Furthermore, we evaluated our model on a real-case scenario for the classification of egocentric images, allowing us to obtain relevant information about people's lifestyles. In particular, we work on the challenging EgoFoodPlaces dataset, achieving state-of-the-art results with a lower level of complexity. The obtained results indicate the suitability of our proposed training scheme for image classification, improving the robustness of the final model.
Submitted 29 July, 2021; v1 submitted 28 December, 2020;
originally announced December 2020.
-
Facebook Ads: Politics of Migration in Italy
Authors:
Arthur Capozzi,
Gianmarco De Francisci Morales,
Yelena Mejova,
Corrado Monti,
Andre Panisson,
Daniela Paolotti
Abstract:
Targeted online advertising is at the forefront of political communication, allowing hyper-local advertising campaigns around elections and issues. In this study, we employ a new resource for political ad monitoring -- Facebook Ads Library -- to examine advertising concerning the issue of immigration in Italy. A crucial topic in Italian politics, it has recently been a focus of several populist movements, some of which have adopted social media as a powerful tool for voter engagement. Indeed, we find evidence of targeting by the parties both in terms of geography and demographics (age and gender). For instance, Five Star Movement reaches a younger audience when advertising about immigration, while other parties' ads have a more male audience when advertising on this issue. We also notice a marked rise in advertising volume around elections, as well as a shift to a more general audience. Thus, we illustrate political advertising targeting that likely has an impact on public opinion on a topic involving potentially vulnerable populations, and urge the research community to include online advertising in the monitoring of public discourse.
Submitted 9 October, 2020;
originally announced October 2020.
-
Deep learning for gravitational-wave data analysis: A resampling white-box approach
Authors:
Manuel D. Morales,
Javier M. Antelis,
Claudia Moreno,
Alexander I. Nesterov
Abstract:
In this work, we apply Convolutional Neural Networks (CNNs) to detect gravitational wave (GW) signals of compact binary coalescences, using single-interferometer data from LIGO detectors. As novel contribution, we adopted a resampling white-box approach to advance towards a statistical understanding of uncertainties intrinsic to CNNs in GW data analysis. Resampling is performed by repeated $k$-fol…
▽ More
In this work, we apply Convolutional Neural Networks (CNNs) to detect gravitational wave (GW) signals of compact binary coalescences, using single-interferometer data from LIGO detectors. As novel contribution, we adopted a resampling white-box approach to advance towards a statistical understanding of uncertainties intrinsic to CNNs in GW data analysis. Resampling is performed by repeated $k$-fold cross-validation experiments, and for a white-box approach, behavior of CNNs is mathematically described in detail. Through a Morlet wavelet transform, strain time series are converted to time-frequency images, which in turn are reduced before generating input datasets. Moreover, to reproduce more realistic experimental conditions, we worked only with data of non-Gaussian noise and hardware injections, removing freedom to set signal-to-noise ratio (SNR) values in GW templates by hand. After hyperparameter adjustments, we found that resampling smooths stochasticity of mini-batch stochastic gradient descend by reducing mean accuracy perturbations in a factor of $3.6$. CNNs were quite precise to detect noise but not sensitive enough to recall GW signals, meaning that CNNs are better for noise reduction than generation of GW triggers. However, applying a post-analysis, we found that for GW signals of SNR $\geq 21.80$ with H1 data and SNR $\geq 26.80$ with L1 data, CNNs could remain as tentative alternatives for detecting GW signals. Besides, with receiving operating characteristic curves we found that CNNs show much better performances than those of Naive Bayes and Support Vector Machines models and, with a significance level of $5\%$, we estimated that predictions of CNNs are significant different from those of a random classifier. Finally, we elucidated that performance of CNNs is highly class dependent because of the distribution of probabilistic scores outputted by the softmax layer.
Submitted 8 September, 2020;
originally announced September 2020.
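The repeated $k$-fold cross-validation mentioned in the abstract above can be sketched with the standard library alone. This is only an illustration of the general resampling scheme, not the authors' pipeline; the function name `repeated_kfold` and the toy sizes are assumptions.

```python
import random

def repeated_kfold(n_samples, k, repeats, seed=0):
    """Yield (train, test) index lists for `repeats` rounds of k-fold CV.

    Hypothetical helper: each round reshuffles the indices, so the folds
    differ between repetitions.
    """
    rng = random.Random(seed)
    for _ in range(repeats):
        idx = list(range(n_samples))
        rng.shuffle(idx)
        # Fold i takes every k-th shuffled index starting at position i.
        folds = [idx[i::k] for i in range(k)]
        for i in range(k):
            test = folds[i]
            train = [j for f in folds[:i] + folds[i + 1:] for j in f]
            yield train, test

# Toy usage: 10 samples, 5 folds, 3 repetitions.
splits = list(repeated_kfold(10, 5, 3))
```

Averaging a metric over all splits is what smooths out run-to-run variation; the per-split spread gives a rough handle on its uncertainty.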
-
Learning Opinion Dynamics From Social Traces
Authors:
Corrado Monti,
Gianmarco De Francisci Morales,
Francesco Bonchi
Abstract:
Opinion dynamics - the research field dealing with how people's opinions form and evolve in a social context - traditionally uses agent-based models to validate the implications of sociological theories. These models encode the causal mechanism that drives the opinion formation process, and have the advantage of being easy to interpret. However, as they do not exploit the availability of data, their predictive power is limited. Moreover, parameter calibration and model selection are manual and difficult tasks.
In this work we propose an inference mechanism for fitting a generative, agent-like model of opinion dynamics to real-world social traces. Given a set of observables (e.g., actions and interactions between agents), our model can recover the most-likely latent opinion trajectories that are compatible with the assumptions about the process dynamics. This type of model retains the benefits of agent-based ones (i.e., causal interpretation), while adding the ability to perform model selection and hypothesis testing on real data.
We showcase our proposal by translating a classical agent-based model of opinion dynamics into its generative counterpart. We then design an inference algorithm based on online expectation maximization to learn the latent parameters of the model. Such an algorithm can recover the latent opinion trajectories from traces generated by the classical agent-based model. In addition, it can identify the most likely set of macro parameters used to generate a data trace, thus allowing testing of sociological hypotheses. Finally, we apply our model to real-world data from Reddit to explore the long-standing question about the impact of the backfire effect. Our results suggest a low prominence of the effect in Reddit's political conversation.
Submitted 2 June, 2020;
originally announced June 2020.
-
Roots of Trumpism: Homophily and Social Feedback in Donald Trump Support on Reddit
Authors:
Joan Massachs,
Corrado Monti,
Gianmarco De Francisci Morales,
Francesco Bonchi
Abstract:
We study the emergence of support for Donald Trump in Reddit's political discussion. With almost 800k subscribers, "r/The_Donald" is one of the largest communities on Reddit, and one of the main hubs for Trump supporters. It was created in 2015, shortly after Donald Trump began his presidential campaign. By using only data from 2012, we predict the likelihood of being a supporter of Donald Trump in 2016, the year of the last US presidential elections. To characterize the behavior of Trump supporters, we draw from three different sociological hypotheses: homophily, social influence, and social feedback. We operationalize each hypothesis as a set of features for each user, and train classifiers to predict their participation in r/The_Donald.
We find that homophily-based and social feedback-based features are the most predictive signals. Conversely, we do not observe a strong impact of social influence mechanisms. We also perform an introspection of the best-performing model to build a "persona" of the typical supporter of Donald Trump on Reddit. We find evidence that the most prominent traits include a predominance of masculine interests, a conservative and libertarian political leaning, and links with politically incorrect and conspiratorial content.
Submitted 4 May, 2020;
originally announced May 2020.
-
Echo Chambers on Social Media: A comparative analysis
Authors:
Matteo Cinelli,
Gianmarco De Francisci Morales,
Alessandro Galeazzi,
Walter Quattrociocchi,
Michele Starnini
Abstract:
Recent studies have shown that online users tend to select information adhering to their system of beliefs, ignore information that does not, and join groups - i.e., echo chambers - around a shared narrative. Although a quantitative methodology for their identification is still missing, the phenomenon of echo chambers is widely debated at both the scientific and political levels. To shed light on this issue, we introduce an operational definition of echo chambers and perform a massive comparative analysis on more than 1B pieces of content produced by 1M users on four social media platforms: Facebook, Twitter, Reddit, and Gab. We infer the leaning of users about controversial topics - ranging from vaccines to abortion - and reconstruct their interaction networks by analyzing different features, such as shared link domains, followed pages, follower relationships, and commented posts. Our method quantifies the existence of echo chambers along two main dimensions: homophily in the interaction networks and bias in the information diffusion toward like-minded peers. We find peculiar differences across social media. Indeed, while Facebook and Twitter present clear-cut echo chambers in all the observed datasets, Reddit and Gab do not. Finally, we test the role of the social media platform on news consumption by comparing Reddit and Facebook. Again, we find support for the hypothesis that platforms implementing news feed algorithms like Facebook may elicit the emergence of echo chambers.
Submitted 20 April, 2020;
originally announced April 2020.
-
Falling into the Echo Chamber: the Italian Vaccination Debate on Twitter
Authors:
Alessandro Cossard,
Gianmarco De Francisci Morales,
Kyriaki Kalimeri,
Yelena Mejova,
Daniela Paolotti,
Michele Starnini
Abstract:
The reappearance of measles in the US and Europe, a disease considered eliminated in the early 2000s, has been accompanied by a growing debate on the merits of vaccination on social media. In this study we examine the extent to which the vaccination debate on Twitter is conducive to potential outreach to the vaccination hesitant. We focus on Italy, one of the countries most affected by the latest measles outbreaks. We discover that the vaccination skeptics, as well as the advocates, reside in their own distinct "echo chambers". The structure of these communities differs as well, with skeptics arranged in a tightly connected cluster, and advocates organizing themselves around a few authoritative hubs. At the center of these echo chambers we find the ardent supporters, for whom we build highly accurate network- and content-based classifiers (attaining 95% cross-validated accuracy). Insights of this study provide several avenues for potential future interventions, including network-guided targeting, accounting for the political context, and monitoring of alternative sources of information.
Submitted 26 March, 2020;
originally announced March 2020.
-
Aion: Better Late than Never in Event-Time Streams
Authors:
Sérgio Esteves,
Gianmarco De Francisci Morales,
Rodrigo Rodrigues,
Marco Serafini,
Luís Veiga
Abstract:
Processing data streams in near real-time is an increasingly important task. In the case of event-timestamped data, the stream processing system must promptly handle late events that arrive after the corresponding window has been processed. To enable this late processing, the window state must be maintained for a long period of time. However, current systems maintain this state in memory, which either imposes a maximum period of tolerated lateness, or causes the system to degrade performance or even crash when the system memory runs out.
In this paper, we propose AION, a comprehensive solution for handling late events in an efficient manner, implemented on top of Flink. In designing AION, we go beyond a naive solution that transfers state between memory and persistent storage on demand. In particular, we introduce a proactive caching scheme, where we leverage the semantics of stream processing to anticipate the need for bringing data to memory. Furthermore, we propose a predictive cleanup scheme to permanently discard window state based on the likelihood of receiving more late events, to prevent storage consumption from growing without bounds.
Our evaluation shows that AION is capable of maintaining sustainable levels of memory utilization while still preserving high throughput, low latency, and low staleness.
Submitted 22 April, 2020; v1 submitted 7 March, 2020;
originally announced March 2020.
-
Sparsity in Optimal Randomized Classification Trees
Authors:
Rafael Blanquero,
Emilio Carrizosa,
Cristina Molero-Río,
Dolores Romero Morales
Abstract:
Decision trees are popular Classification and Regression tools and, when small-sized, easy to interpret. Traditionally, a greedy approach has been used to build the trees, yielding a very fast training process; however, controlling sparsity (a proxy for interpretability) is challenging. In recent studies, optimal decision trees, where all decisions are optimized simultaneously, have shown a better learning performance, especially when oblique cuts are implemented. In this paper, we propose a continuous optimization approach to build sparse optimal classification trees, based on oblique cuts, with the aim of using fewer predictor variables in the cuts as well as along the whole tree. Both types of sparsity, namely local and global, are modeled by means of regularizations with polyhedral norms. The computational experience reported supports the usefulness of our methodology. In all our data sets, local and global sparsity can be improved without harming classification accuracy. Unlike greedy approaches, our ability to easily trade in some of our classification accuracy for a gain in global sparsity is shown.
Submitted 21 February, 2020;
originally announced February 2020.
-
Predicting the Role of Political Trolls in Social Media
Authors:
Atanas Atanasov,
Gianmarco De Francisci Morales,
Preslav Nakov
Abstract:
We investigate the political roles of "Internet trolls" in social media. Political trolls, such as the ones linked to the Russian Internet Research Agency (IRA), have recently gained enormous attention for their ability to sway public opinion and even influence elections. Analysis of the online traces of trolls has shown different behavioral patterns, which target different slices of the population. However, this analysis is manual and labor-intensive, thus making it impractical as a first-response tool for newly-discovered troll farms. In this paper, we show how to automate this analysis by using machine learning in a realistic setting. In particular, we show how to classify trolls according to their political role (left, news feed, right) by using features extracted from social media, i.e., Twitter, in two scenarios: (i) in a traditional supervised learning scenario, where labels for trolls are available, and (ii) in a distant supervision scenario, where labels for trolls are not available, and we rely on more-commonly-available labels for news outlets mentioned by the trolls. Technically, we leverage the community structure and the text of the messages in the online social network of trolls represented as a graph, from which we extract several types of learned representations, i.e., embeddings, for the trolls. Experiments on the "IRA Russian Troll" dataset show that our methodology improves over the state-of-the-art in the first scenario, while providing a compelling case for the second scenario, which has not been explored in the literature thus far.
Submitted 4 October, 2019;
originally announced October 2019.
-
Link Prediction via Higher-Order Motif Features
Authors:
Ghadeer Abuoda,
Gianmarco De Francisci Morales,
Ashraf Aboulnaga
Abstract:
Link prediction requires predicting which new links are likely to appear in a graph. Being able to predict unseen links with good accuracy has important applications in several domains such as social media, security, transportation, and recommendation systems. A common approach is to use features based on the common neighbors of an unconnected pair of nodes to predict whether the pair will form a link in the future. In this paper, we present an approach for link prediction that relies on higher-order analysis of the graph topology, well beyond common neighbors. We treat the link prediction problem as a supervised classification problem, and we propose a set of features that depend on the patterns or motifs that a pair of nodes occurs in. By using motifs of sizes 3, 4, and 5, our approach captures a high level of detail about the graph topology within the neighborhood of the pair of nodes, which leads to a higher classification accuracy. In addition to proposing the use of motif-based features, we also propose two optimizations related to constructing the classification dataset from the graph. First, to ensure that positive and negative examples are treated equally when extracting features, we propose adding the negative examples to the graph as an alternative to the common approach of removing the positive ones. Second, we show that it is important to control for the shortest-path distance when sampling pairs of nodes to form negative examples, since the difficulty of prediction varies with the shortest-path distance. We experimentally demonstrate that using off-the-shelf classifiers with a well constructed classification dataset results in up to 10 percentage points increase in accuracy over prior topology-based and feature learning methods.
Submitted 4 June, 2020; v1 submitted 8 February, 2019;
originally announced February 2019.
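For context, the common-neighbors baseline that the abstract above contrasts with motif features can be sketched in a few lines. The toy edge list and the helper names `build_adjacency` and `common_neighbors` are assumptions for illustration; they are not taken from the paper.

```python
from collections import defaultdict

def build_adjacency(edges):
    """Build an undirected adjacency map: node -> set of neighbors."""
    adj = defaultdict(set)
    for u, v in edges:
        adj[u].add(v)
        adj[v].add(u)
    return adj

def common_neighbors(adj, u, v):
    """Number of nodes adjacent to both u and v (a classic link-prediction feature)."""
    return len(adj[u] & adj[v])

# Toy graph: a and d are not connected but share neighbors b and c.
edges = [("a", "b"), ("a", "c"), ("b", "d"), ("c", "d"), ("d", "e")]
adj = build_adjacency(edges)
score = common_neighbors(adj, "a", "d")
```

A higher count suggests a more likely future link. Motif-based features generalize this idea by counting larger patterns (3, 4, or 5 nodes) that contain the candidate pair.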
-
Mining Frequent Patterns in Evolving Graphs
Authors:
Cigdem Aslay,
Muhammad Anis Uddin Nasir,
Gianmarco De Francisci Morales,
Aristides Gionis
Abstract:
Given a labeled graph, the frequent-subgraph mining (FSM) problem asks to find all the $k$-vertex subgraphs that appear with frequency greater than a given threshold. FSM has numerous applications ranging from biology to network science, as it provides a compact summary of the characteristics of the graph. However, the task is challenging, even more so for evolving graphs due to the streaming nature of the input and the exponential time complexity of the problem.
In this paper, we initiate the study of the approximate FSM problem in both incremental and fully-dynamic streaming settings, where arbitrary edges can be added or removed from the graph. For each streaming setting, we propose algorithms that can extract a high-quality approximation of the frequent $k$-vertex subgraphs for a given threshold, at any given time instance, with high probability. In contrast to the existing state-of-the-art solutions that require iterating over the entire set of subgraphs for any update, our algorithms operate by maintaining a uniform sample of $k$-vertex subgraphs with optimized neighborhood-exploration procedures local to the updates. We provide theoretical analysis of the proposed algorithms and empirically demonstrate that the proposed algorithms generate high-quality results compared to baselines.
Submitted 10 September, 2018; v1 submitted 2 September, 2018;
originally announced September 2018.
-
Large-Scale Learning from Data Streams with Apache SAMOA
Authors:
Nicolas Kourtellis,
Gianmarco De Francisci Morales,
Albert Bifet
Abstract:
Apache SAMOA (Scalable Advanced Massive Online Analysis) is an open-source platform for mining big data streams. Big data is defined as datasets whose size is beyond the ability of typical software tools to capture, store, manage, and analyze, due to the time and memory complexity. Apache SAMOA provides a collection of distributed streaming algorithms for the most common data mining and machine learning tasks such as classification, clustering, and regression, as well as programming abstractions to develop new algorithms. It features a pluggable architecture that allows it to run on several distributed stream processing engines such as Apache Flink, Apache Storm, and Apache Samza. Apache SAMOA is written in Java and is available at https://samoa.incubator.apache.org under the Apache Software License version 2.0.
Submitted 26 May, 2018;
originally announced May 2018.
-
Road Network Fusion for Incremental Map Updates
Authors:
Rade Stanojevic,
Sofiane Abbar,
Saravanan Thirumuruganathan,
Gianmarco De Francisci Morales,
Sanjay Chawla,
Fethi Filali,
Ahid Aleimat
Abstract:
In recent years a number of novel, automatic map-inference techniques have been proposed, which derive road networks from a cohort of GPS traces collected by a fleet of vehicles. In spite of considerable attention, these maps are imperfect in many ways: they create an abundance of spurious connections, have poor coverage, and are visually confusing. Hence, commercial and crowd-sourced mapping services heavily use human annotation to minimize the mapping errors. Consequently, their response to changes in the road network is inevitably slow. In this paper we describe MapFuse, a system which fuses a human-annotated map (e.g., OpenStreetMap) with any automatically inferred map, thus effectively enabling quick map updates. In addition to new road creation, we study in depth road closures, which have not been examined in the past. By leveraging solid, human-annotated maps with minor corrections, we derive maps which minimize the trajectory matching errors due to both road network change and imperfect map inference of fully-automatic approaches.
Submitted 7 February, 2018;
originally announced February 2018.
-
Political Discourse on Social Media: Echo Chambers, Gatekeepers, and the Price of Bipartisanship
Authors:
Kiran Garimella,
Gianmarco De Francisci Morales,
Aristides Gionis,
Michael Mathioudakis
Abstract:
Echo chambers, i.e., situations where one is exposed only to opinions that agree with their own, are an increasing concern for the political discourse in many democratic countries. This paper studies the phenomenon of political echo chambers on social media. We identify the two components in the phenomenon: the opinion that is shared ('echo'), and the place that allows its exposure ('chamber', i.e., the social network), and examine closely how these two components interact. We define a production and consumption measure for social-media users, which captures the political leaning of the content shared and received by them. By comparing the two, we find that Twitter users are, to a large degree, exposed to political opinions that agree with their own. We also find that users who try to bridge the echo chambers, by sharing content with diverse leaning, have to pay a 'price of bipartisanship' in terms of their network centrality and content appreciation. In addition, we study the role of 'gatekeepers', users who consume content with diverse leaning but produce partisan content (with a single-sided leaning), in the formation of echo chambers. Finally, we apply these findings to the task of predicting partisans and gatekeepers from social and content features. While partisan users turn out relatively easy to identify, gatekeepers prove to be more challenging.
Submitted 19 February, 2018; v1 submitted 5 January, 2018;
originally announced January 2018.
-
Factors in Recommending Contrarian Content on Social Media
Authors:
Kiran Garimella,
Gianmarco De Francisci Morales,
Aristides Gionis,
Michael Mathioudakis
Abstract:
Polarization is a troubling phenomenon that can lead to societal divisions and hurt the democratic process. It is therefore important to develop methods to reduce it.
We propose an algorithmic solution to the problem of reducing polarization. The core idea is to expose users to content that challenges their point of view, with the hope of broadening their perspective, and thus reducing their polarity. Our method takes into account several aspects of the problem, such as the estimated polarity of the user, the probability of accepting the recommendation, the polarity of the content, and the popularity of the content being recommended.
We evaluate our recommendations via a large-scale user study on Twitter users who were actively involved in the discussion of the US election results. Results show that, in most cases, the factors taken into account in the recommendation affect the users as expected, and thus capture the essential features of the problem.
Submitted 16 May, 2017;
originally announced May 2017.
-
The Effect of Collective Attention on Controversial Debates on Social Media
Authors:
Kiran Garimella,
Gianmarco De Francisci Morales,
Aristides Gionis,
Michael Mathioudakis
Abstract:
We study the evolution of long-lived controversial debates as manifested on Twitter from 2011 to 2016. Specifically, we explore how the structure of interactions and content of discussion varies with the level of collective attention, as evidenced by the number of users discussing a topic. Spikes in the volume of users typically correspond to external events that increase the public attention on the topic; for instance, discussions about 'gun control' often erupt after a mass shooting.
This work is the first to study the dynamic evolution of polarized online debates at such scale. By employing a wide array of network and content analysis measures, we find consistent evidence that increased collective attention is associated with increased network polarization and network concentration within each side of the debate; and overall more uniform lexicon usage across all users.
Submitted 16 May, 2017;
originally announced May 2017.
-
Exposing Twitter Users to Contrarian News
Authors:
Kiran Garimella,
Gianmarco De Francisci Morales,
Aristides Gionis,
Michael Mathioudakis
Abstract:
Polarized topics often spark discussion and debate on social media. Recent studies have shown that polarized debates have a specific clustered structure in the endorsement network, which indicates that users direct their endorsements mostly to ideas they already agree with. Understanding these polarized discussions and exposing social media users to content that broadens their views is of paramount importance.
The contribution of this demonstration is two-fold. (i) A tool to visualize retweet networks about controversial issues on Twitter. By using our visualization, users can understand how polarized discussions are shaped on Twitter, and explore the positions of the various actors. (ii) A solution to reduce polarization of such discussions. We do so by exposing users to information which presents a contrarian point of view. Users can visually inspect our recommendations and understand why and how these would play out in terms of the retweet network.
Our demo (https://users.ics.aalto.fi/kiran/reducingControversy/ homepage) provides one of the first steps in developing automated tools that help users explore, and possibly escape, their echo chambers. The ideas in the demo can also help content providers design tools to broaden their reach to people with different political and ideological backgrounds.
Submitted 31 March, 2017;
originally announced March 2017.
-
The Ebb and Flow of Controversial Debates on Social Media
Authors:
Kiran Garimella,
Gianmarco De Francisci Morales,
Aristides Gionis,
Michael Mathioudakis
Abstract:
We explore how the polarization around controversial topics evolves on Twitter - over a long period of time (2011 to 2016), and also as a response to major external events that lead to increased related activity. We find that increased activity is typically associated with increased polarization; however, we find no consistent long-term trend in polarization over time among the topics we study.
Submitted 17 March, 2017;
originally announced March 2017.
-
Reducing Controversy by Connecting Opposing Views
Authors:
Kiran Garimella,
Gianmarco De Francisci Morales,
Aristides Gionis,
Michael Mathioudakis
Abstract:
Society is often polarized by controversial issues that split the population into groups of opposing views. When such issues emerge on social media, we often observe the creation of 'echo chambers', i.e., situations where like-minded people reinforce each other's opinion, but do not get exposed to the views of the opposing side. In this paper we study algorithmic techniques for bridging these chambers, and thus reducing controversy. Specifically, we represent the discussion on a controversial issue with an endorsement graph, and cast our problem as an edge-recommendation problem on this graph. The goal of the recommendation is to reduce the controversy score of the graph, which is measured by a recently-developed metric based on random walks. At the same time, we take into account the acceptance probability of the recommended edge, which represents how likely the edge is to materialize in the endorsement graph. We propose a simple model based on a recently-developed user-level controversy score, that is competitive with state-of-the-art link-prediction algorithms. We thus aim at finding the edges that produce the largest reduction in the controversy score, in expectation. To solve this problem, we propose an efficient algorithm, which considers only a fraction of all the combinations of possible edges. Experimental results show that our algorithm is more efficient than a simple greedy heuristic, while producing comparable score reduction. Finally, a comparison with other state-of-the-art edge-addition algorithms shows that this problem is fundamentally different from what has been studied in the literature.
Submitted 24 May, 2018; v1 submitted 1 November, 2016;
originally announced November 2016.
-
Fully Dynamic Algorithm for Top-$k$ Densest Subgraphs
Authors:
Muhammad Anis Uddin Nasir,
Aristides Gionis,
Gianmarco De Francisci Morales,
Sarunas Girdzijauskas
Abstract:
Given a large graph, the densest-subgraph problem asks to find a subgraph with maximum average degree. When considering the top-$k$ version of this problem, a naïve solution is to iteratively find the densest subgraph and remove it in each iteration. However, such a solution is impractical due to high processing cost. The problem is further complicated when dealing with dynamic graphs, since adding or removing an edge requires re-running the algorithm. In this paper, we study the top-$k$ densest-subgraph problem in the sliding-window model and propose an efficient fully-dynamic algorithm. The input of our algorithm consists of an edge stream, and the goal is to find the node-disjoint subgraphs that maximize the sum of their densities. In contrast to existing state-of-the-art solutions that require iterating over the entire graph upon any update, our algorithm profits from the observation that updates only affect a limited region of the graph. Therefore, the top-$k$ densest subgraphs are maintained by only applying local updates. We provide a theoretical analysis of the proposed algorithm and show empirically that the algorithm often generates denser subgraphs than state-of-the-art competitors. Experiments show an improvement in efficiency of up to five orders of magnitude compared to state-of-the-art solutions.
Submitted 29 August, 2017; v1 submitted 19 October, 2016;
originally announced October 2016.