Search | arXiv e-print repository

A Sticker is Worth a Thousand Words: Characterizing the Use of Stickers in WhatsApp Political Groups in Brazil

Authors: Philipe Melo, João M. M. Couto, Daniel Kansaon, Vitor Mafra, Júlio C. S. Reis, Fabrício Benevenuto

Abstract: With the increasing use of smartphones, instant messaging platforms have become important communication tools. According to WhatsApp, more than 100 billion messages are sent each day on the app. Communication on these platforms allows individuals to express themselves through media other than plain text, including audio, videos, images, and stickers. Stickers, in particular, are a new multimedia format that emerged with messaging apps and has promoted new forms of interaction among users. Especially in the Brazilian context, stickers have transcended their role as a mere form of humor to become a key element of political strategy. In this regard, we investigate how stickers are being used, unveiling the unique characteristics that this media brings to WhatsApp chats and its political use. To achieve that, we collected a large sample of messages from public WhatsApp political discussion groups in Brazil and analyzed the sticker messages shared in this context.

Submitted 12 June, 2024; originally announced June 2024.

arXiv:2401.12720 [pdf, other]

A Comprehensive View of the Biases of Toxicity and Sentiment Analysis Methods Towards Utterances with African American English Expressions

Authors: Guilherme H. Resende, Luiz F. Nery, Fabrício Benevenuto, Savvas Zannettou, Flavio Figueiredo

Abstract: Language is a dynamic aspect of our culture that changes when expressed in different technologies/communities. Online social networks have enabled the diffusion and evolution of different dialects, including African American English (AAE). However, this increased usage is not without barriers. One particular barrier is how sentiment (Vader, TextBlob, and Flair) and toxicity (Google's Perspective and the open-source Detoxify) methods present biases towards utterances with AAE expressions. To illustrate this bias, consider Google's Perspective. Here, an utterance such as "All n*ggers deserve to die respectfully. The police murder us." receives a higher toxicity score than "African-Americans deserve to die respectfully. The police murder us." This score difference likely arises because the tool cannot understand the re-appropriation of the term "n*gger". One explanation for this bias is that AI models are trained on limited datasets, in which such a term is more likely to appear in a toxic utterance. While this may be plausible, the tool will make mistakes regardless. Here, we study bias on two Web-based (YouTube and Twitter) datasets and two spoken English datasets. Our analysis shows that most models present biases towards AAE in most settings.
We isolate the impact of AAE expression usage via linguistic control features from the Linguistic Inquiry and Word Count (LIWC) software, grammatical control features extracted via Part-of-Speech (PoS) tagging from Natural Language Processing (NLP) models, and the semantics of utterances by comparing sentence embeddings from recent language models. We present consistent results on how heavy usage of AAE expressions may cause the speaker to be considered substantially more toxic, even when speaking about nearly the same subject. Our study complements similar analyses focusing on small datasets and/or one method only.

Submitted 23 January, 2024; originally announced January 2024.

Comments: Under peer review

arXiv:2308.14782 [pdf, other]

Helping Fact-Checkers Identify Fake News Stories Shared through Images on WhatsApp

Authors: Julio C. S. Reis, Philipe Melo, Fabiano Belém, Fabricio Murai, Jussara M. Almeida, Fabricio Benevenuto

Abstract: WhatsApp has introduced a novel avenue for smartphone users to engage with and disseminate news stories. The convenience of forming interest-based groups and seamlessly sharing content has rendered WhatsApp susceptible to exploitation by misinformation campaigns. While fact-checking remains a potent tool for identifying fabricated news, its efficacy falters in the face of the unprecedented deluge of information generated on the Internet today. In this work, we explore automatic ranking-based strategies to propose a "fakeness score" model as a means to help fact-checking agencies identify fake news stories shared through images on WhatsApp. Based on the results, we design a tool and integrate it into a real system that was used extensively for monitoring content during the 2018 Brazilian general election. Our experimental evaluation shows that this tool can reduce the effort required to identify 80% of the fake news in the data by up to 40%, compared to the mechanisms currently used by fact-checking agencies to select the news stories to be checked.

Submitted 28 August, 2023; originally announced August 2023.

Comments: This is a preprint version of a manuscript accepted at the Brazilian Symposium on Multimedia and the Web (WebMedia). Please consider citing it instead of this one.

arXiv:2304.05274 [pdf, other]

YouNICon: YouTube's CommuNIty of Conspiracy Videos

Authors: Shaoyi Liaw, Fan Huang, Fabricio Benevenuto, Haewoon Kwak, Jisun An

Abstract: Conspiracy theories are widely propagated on social media. Among various social media services, YouTube is one of the most influential sources of news and entertainment. This paper seeks to develop a dataset, YOUNICON, to enable researchers to perform conspiracy theory detection as well as classification of videos with conspiracy theories into different topics. YOUNICON is a dataset with a large collection of videos from suspicious channels that were identified as containing conspiracy theories in a previous study (Ledwich and Zaitsev 2020). Overall, YOUNICON will enable researchers to study trends in conspiracy theories and understand how individuals can interact with the community or channels producing conspiracy theories. Our data is available at: https://doi.org/10.5281/zenodo.7466262.

Submitted 11 April, 2023; originally announced April 2023.

arXiv:2301.11850 [pdf, other]

Predicting Sentence-Level Factuality of News and Bias of Media Outlets

Authors: Francielle Vargas, Kokil Jaidka, Thiago A. S. Pardo, Fabrício Benevenuto

Abstract: Automated news credibility assessment and fact-checking at scale require accurately predicting news factuality and media bias. This paper introduces a large sentence-level dataset, titled "FactNews", composed of 6,191 sentences expertly annotated according to the factuality and media bias definitions proposed by AllSides. We use FactNews to assess the overall reliability of news sources by formulating two text classification problems: predicting the sentence-level factuality of news reporting and the bias of media outlets. Our experiments demonstrate that biased sentences contain more words than factual sentences and also show a predominance of emotions. Hence, the fine-grained analysis of the subjectivity and impartiality of news articles provided promising results for predicting the reliability of media outlets. Finally, due to the severity of fake news and political polarization in Brazil, and the lack of research for Portuguese, both the dataset and the baseline were proposed for Brazilian Portuguese.

Submitted 28 June, 2023; v1 submitted 27 January, 2023; originally announced January 2023.

arXiv:2202.04737 [pdf, other]

Telegram Monitor: Monitoring Brazilian Political Groups and Channels on Telegram

Authors: Manoel Júnior, Philipe Melo, Daniel Kansaon, Vitor Mafra, Kaio Sá, Fabrício Benevenuto

Abstract: Instant messaging platforms such as Telegram have become some of the main means of communication used by people all over the world. Most of them are home to many groups and channels that connect thousands of people focused on political topics. However, they have suffered from misinformation campaigns with a direct impact on electoral processes around the world. While some platforms, such as WhatsApp, adopted restrictive policies and measures to attenuate the issues arising from the abuse of their systems, others have emerged as alternatives with little or no content moderation or action against misinformation. Telegram is one of those systems, and it has been attracting more users and gaining popularity. In this work, we present the "Telegram Monitor", a web-based system that monitors the political debate in this environment and enables the analysis of the most shared content across multiple channels and public groups. Our system aims to allow journalists, researchers, and fact-checking agencies to identify trending conspiracy theories and misinformation campaigns, or simply to monitor the political debate in this space during the 2022 Brazilian elections. We hope our system can assist in combating the spread of misinformation through Telegram in Brazil.

Submitted 9 February, 2022; originally announced February 2022.

Comments: 4 pages, TheWebConf 2022

arXiv:2109.09322 [pdf, other]

Can online attention signals help fact-checkers fact-check?

Authors: Manoel Horta Ribeiro, Savvas Zannettou, Oana Goga, Fabrício Benevenuto, Robert West

Abstract: Recent research suggests that not all fact-checking efforts are equal: when and what is fact-checked plays a pivotal role in effectively correcting misconceptions. In that context, signals capturing how much attention specific topics receive on the Internet can be used to study (and possibly support) fact-checking efforts. This paper proposes a framework to study fact-checking with online attention signals. The framework consists of: 1) extracting claims from fact-checking efforts; 2) linking such claims with knowledge graph entities; and 3) estimating the online attention these entities receive. We use this framework to conduct a preliminary study of a dataset of 879 COVID-19-related fact-checks done in 2020 by 81 international organizations. Our findings suggest that there is often a disconnect between online attention and fact-checking efforts. For example, in around 40% of countries that fact-checked ten or more claims, at least half of the ten most popular claims were not fact-checked. Our analysis also shows that claims are first fact-checked after receiving, on average, 35% of the total online attention they would eventually receive in 2020. Yet, there is considerable variation among claims: some were fact-checked before receiving a surge of misinformation-induced online attention; others were fact-checked much later. Overall, our work suggests that incorporating online attention signals may help organizations assess their fact-checking efforts and choose what and when to fact-check claims or stories.
Also, in the context of international collaboration, where claims are fact-checked multiple times across different countries, online attention could help organizations keep track of which claims are "migrating" between countries.

Submitted 7 May, 2022; v1 submitted 20 September, 2021; originally announced September 2021.

Comments: This paper has been accepted at the MEDIATE workshop (ICWSM 2022), please cite accordingly

arXiv:2105.13020 [pdf, other]

On the Globalization of the QAnon Conspiracy Theory Through Telegram

Authors: Mohamad Hoseini, Philipe Melo, Fabricio Benevenuto, Anja Feldmann, Savvas Zannettou

Abstract: QAnon is a far-right conspiracy theory that became popular and mainstream over the past few years. Worryingly, the QAnon conspiracy theory has real-world implications, with supporters of the theory participating in violent acts such as the attack on the US Capitol in 2021. At the same time, the QAnon theory started evolving into a global phenomenon by attracting followers across the globe and, in particular, in Europe. Therefore, it is imperative to understand how the QAnon theory became a worldwide phenomenon and how this dissemination has been happening in the online space. This paper performs a large-scale data analysis of QAnon on Telegram by collecting 4.5M messages posted in 161 QAnon groups/channels. Using Google's Perspective API, we analyze the toxicity of QAnon content across languages and over time. Also, using a BERT-based topic modeling approach, we analyze the QAnon discourse across multiple languages. Among other things, we find that German is prevalent in QAnon groups/channels on Telegram, even overshadowing English after 2020. We also find that content posted in German and Portuguese tends to be more toxic than content in English. Our topic modeling indicates that QAnon supporters discuss various topics of interest within far-right movements, including world politics, conspiracy theories, COVID-19, and the anti-vaccination movement. Taken all together, we perform the first multilingual study on QAnon through Telegram and paint a nuanced overview of the globalization of the QAnon theory.

Submitted 27 May, 2021; originally announced May 2021.

arXiv:2104.12265 [pdf, other]

Contextual-Lexicon Approach for Abusive Language Detection

Authors: Francielle Vargas, Fabiana Rodrigues de Góes, Isabelle Carvalho, Fabrício Benevenuto, Thiago Alexandre Salgueiro Pardo

Abstract: Since lexicon-based approaches are scientifically more elegant, explain the components of the solution, and are easier to generalize to other applications, this paper provides a new approach for offensive language and hate speech detection on social media. Our approach embodies a lexicon of implicit and explicit offensive and swearing expressions annotated with contextual information. Due to the severity of abusive social media comments in Brazil and the lack of research in Portuguese, Brazilian Portuguese is the language used to validate the models. Nevertheless, our method may be applied to any other language. The conducted experiments show the effectiveness of the proposed approach, which outperforms the current baseline methods for the Portuguese language.

Submitted 20 December, 2022; v1 submitted 25 April, 2021; originally announced April 2021.

Comments: Please cite: https://aclanthology.org/2021.ranlp-1.161/

arXiv:2103.14972 [pdf, other]

HateBR: A Large Expert Annotated Corpus of Brazilian Instagram Comments for Offensive Language and Hate Speech Detection

Authors: Francielle Alves Vargas, Isabelle Carvalho, Fabiana Rodrigues de Góes, Fabrício Benevenuto, Thiago Alexandre Salgueiro Pardo

Abstract: Due to the severity of offensive and hateful social media comments in Brazil, and the lack of research in Portuguese, this paper provides the first large-scale expert-annotated corpus of Brazilian Instagram comments for hate speech and offensive language detection. The HateBR corpus was collected from the comment sections of Brazilian politicians' accounts on Instagram and manually annotated by specialists, reaching high inter-annotator agreement. The corpus consists of 7,000 documents annotated according to three different layers: a binary classification (offensive versus non-offensive comments), offensiveness-level classification (highly, moderately, and slightly offensive), and nine hate speech groups (xenophobia, racism, homophobia, sexism, religious intolerance, partyism, apology for the dictatorship, antisemitism, and fatphobia). We also implemented baseline experiments for offensive language and hate speech detection and compared them with a literature baseline. Results show that the baseline experiments on our corpus outperform the current state of the art for the Portuguese language.

Submitted 27 December, 2022; v1 submitted 27 March, 2021; originally announced March 2021.

Comments: Published at LREC 2022 Proceedings

Journal ref: https://aclanthology.org/2022.lrec-1.777/

arXiv:2101.00963 [pdf, other]

doi 10.1109/ASONAM49781.2020.9381327

Characterizing (Un)moderated Textual Data in Social Systems

Authors: Lucas Henrique Costa de Lima, Julio Reis, Philipe Melo, Fabricio Murai, Fabricio Benevenuto

Abstract: Despite the valuable social interactions that online media promote, these systems provide space for speech that is potentially detrimental to different groups of people. The content moderation imposed by many social media platforms has motivated the emergence of a new social system for free speech named Gab, which lacks content moderation. This article characterizes and compares moderated textual data from Twitter with a set of unmoderated data from Gab. In particular, we analyze distinguishing characteristics of moderated and unmoderated content in terms of linguistic features and evaluate hate speech and its different forms in both environments. Our work shows that unmoderated content presents different psycholinguistic features, more negative sentiment, and higher toxicity. Our findings support the view that unmoderated environments may have proportionally more online hate speech. We hope our analysis and findings contribute to the debate about hate speech and benefit systems aiming to deploy hate speech detection approaches.

Submitted 4 January, 2021; originally announced January 2021.

Comments: Accepted to IEEE/ACM International Conference on Advances in Social Networks Analysis and Mining (ASONAM, 2020)

arXiv:2006.02471 [pdf, other]

Can WhatsApp Benefit from Debunked Fact-Checked Stories to Reduce Misinformation?

Authors: Julio C. S. Reis, Philipe de Freitas Melo, Kiran Garimella, Fabrício Benevenuto

Abstract: WhatsApp was alleged to be widely used to spread misinformation and propaganda during elections in Brazil and India. Due to the private, encrypted nature of messages on WhatsApp, it is hard to track the dissemination of misinformation at scale. In this work, using public WhatsApp data, we observe that misinformation has been widely shared in public WhatsApp groups even after it had already been fact-checked by popular fact-checking agencies. This represents a significant portion of the misinformation spread in both Brazil and India in the groups analyzed. We posit that the spread of such misinformation could be prevented if WhatsApp had a means to flag already fact-checked content. To this end, we propose an architecture that could be implemented by WhatsApp to counter such misinformation. Our proposal respects the current end-to-end encryption architecture on WhatsApp, thus protecting users' privacy while providing an approach to detect misinformation that benefits from fact-checking efforts.

Submitted 5 August, 2020; v1 submitted 3 June, 2020; originally announced June 2020.

Comments: This is a preprint version of a manuscript accepted at The Harvard Kennedy School (HKS) Misinformation Review. Please consider citing it instead of this one.

arXiv:2005.08065 [pdf, other]

How Biased is the Population of Facebook Users? Comparing the Demographics of Facebook Users with Census Data to Generate Correction Factors

Authors: Filipe N. Ribeiro, Fabrício Benevenuto, Emilio Zagheni

Abstract: Censuses around the world are key sources of data to guide government investments and public policies. However, these data are very expensive to obtain and are collected relatively infrequently. Over the last decade, there has been growing interest in using data from social media to complement traditional data sources. However, social media users are not representative of the general population. Thus, analyses based on social media data require statistical adjustments, such as post-stratification, to remove the bias and make solid statistical claims. These adjustments are possible only when we have information about the frequency of demographic groups using social media. These data, when compared with official statistics, enable researchers to produce appropriate statistical correction factors. In this paper, we leverage the Facebook advertising platform to compile the equivalent of an aggregate-level census of Facebook users. Our compilation includes the population distribution for seven demographic attributes, such as gender and age, at different geographic levels for the US. By comparing the Facebook counts with official reports provided by the US Census and Gallup, we found very high correlations, especially for political leaning and race. We also identified instances where official statistics may be underestimating population counts, as in the case of immigration.
We use the information collected to calculate bias correction factors for all computed attributes in order to evaluate the extent to which different demographic groups are more or less represented on Facebook. We provide the first comprehensive analysis of biases in Facebook users across several dimensions. This information can be used to generate bias-adjusted population estimates and demographic counts in a timely way and at fine geographic granularity between releases of official statistics.

Submitted 16 May, 2020; originally announced May 2020.

Comments: This is a preprint of a full paper accepted at Websci '20 (12th ACM Web Science Conference 2020). Please cite that version instead
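The correction factors described in the abstract above can be illustrated with a minimal sketch. It assumes (this is not stated in the abstract) that a factor is the ratio between a group's share in the official statistics and its share among Facebook users; the function name `correction_factors` and all counts are hypothetical, and the paper's exact definition may differ.

```python
from typing import Dict


def correction_factors(official: Dict[str, float],
                       facebook: Dict[str, float]) -> Dict[str, float]:
    """Ratio of each group's share in official statistics to its share on Facebook.

    Hypothetical helper: assumes every group in `official` also appears in
    `facebook` with a non-zero count. A factor above 1 means the group is
    under-represented on Facebook; a factor below 1 means over-represented.
    """
    official_total = sum(official.values())
    facebook_total = sum(facebook.values())
    factors: Dict[str, float] = {}
    for group, count in official.items():
        official_share = count / official_total
        facebook_share = facebook[group] / facebook_total
        factors[group] = official_share / facebook_share
    return factors


if __name__ == "__main__":
    # Hypothetical age-group counts, for illustration only (not data from the paper).
    official_counts = {"18-34": 300, "35-64": 500, "65+": 200}
    facebook_counts = {"18-34": 400, "35-64": 500, "65+": 100}
    for group, factor in correction_factors(official_counts, facebook_counts).items():
        print(f"{group}: {factor:.2f}")
```

With these hypothetical counts the sketch prints 18-34: 0.75, 35-64: 1.00, and 65+: 2.00, i.e., the oldest group would be under-represented on Facebook by a factor of two. Weighting Facebook-based counts by such factors is one simple form of the post-stratification adjustment the abstract mentions.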

arXiv:2005.06591 [pdf, other]

Neutrality May Matter: Sentiment Analysis in Reviews of Airbnb, Booking, and Couchsurfing in Brazil and USA

Authors: Gustavo Santos, Vinicius F. S. Mota, Fabricio Benevenuto, Thiago H. Silva

Abstract: Information and communications technologies have enabled the rise of the phenomenon known as the sharing economy, which comprises activities between people, coordinated by online platforms, to obtain, provide, or share access to goods and services. In sharing economy hosting services, it is common for the host and guest to have personal contact, and this may affect users' decisions to write negative reviews, as negative reviews can damage the services offered. To evaluate this issue, we collected reviews from two sharing economy platforms, Airbnb and Couchsurfing, and from one platform that works mostly with hotels (traditional economy), Booking.com, for some cities in Brazil and the USA. Through sentiment analysis, we found that reviews in the sharing economy tend to be considerably more positive than those in the traditional economy. As an experiment with volunteers performed in this study suggests, this can represent a problem in those systems. In addition, we discuss how the results obtained could be exploited to help improve users' decision making.

Submitted 13 May, 2020; originally announced May 2020.

arXiv:2005.02443 [pdf, other]

A Dataset of Fact-Checked Images Shared on WhatsApp During the Brazilian and Indian Elections

Authors: Julio C. S. Reis, Philipe de Freitas Melo, Kiran Garimella, Jussara M. Almeida, Dean Eckles, Fabrício Benevenuto

Abstract: Recently, messaging applications such as WhatsApp have reportedly been abused by misinformation campaigns, especially in Brazil and India. A notable form of abuse on WhatsApp relies on manipulated images and memes containing all kinds of fake stories. In this work, we performed an extensive data collection from a large set of publicly accessible WhatsApp groups and fact-checking agency websites. This paper releases a novel dataset to the research community containing fact-checked fake images shared through WhatsApp in two distinct scenarios known for the spread of fake news on the platform: the 2018 Brazilian elections and the 2019 Indian elections.

Submitted 5 May, 2020; originally announced May 2020.

Comments: 7 pages. This is a preprint version of a paper accepted at ICWSM'20. Please consider citing the conference version instead of this one.

arXiv:2001.10581 [pdf, other]

Facebook Ads Monitor: An Independent Auditing System for Political Ads on Facebook

Authors: Márcio Silva, Lucas Santos de Oliveira, Athanasios Andreou, Pedro Olmo Vaz de Melo, Oana Goga, Fabrício Benevenuto

Abstract: The 2016 United States presidential election was marked by the abuse of targeted advertising on Facebook. Concerned about the risk of the same kind of abuse happening in the 2018 Brazilian elections, we designed and deployed an independent auditing system to monitor political ads on Facebook in Brazil. To do that, we first adapted a browser plugin to gather ads from the timelines of volunteers using Facebook. We managed to convince more than 2000 volunteers to help our project and install our tool. Then, we used a Convolutional Neural Network (CNN) with word embeddings to detect political Facebook ads. To evaluate our approach, we manually labeled a collection of 10k ads as political or non-political, and we provide an in-depth evaluation of the proposed approach for identifying political ads by comparing it with classic supervised machine learning methods. Finally, we deployed a real system that shows the ads identified as related to politics. We noticed that not all political ads we detected were present in the Facebook Ad Library for political ads. Our results emphasize the importance of enforcement mechanisms for declaring political ads and the need for independent auditing platforms.

Submitted 31 January, 2020; v1 submitted 28 January, 2020; originally announced January 2020.

arXiv:1909.08740 [pdf, other]

Can WhatsApp Counter Misinformation by Limiting Message Forwarding?

Authors: Philipe de Freitas Melo, Carolina Coimbra Vieira, Kiran Garimella, Pedro O. S. Vaz de Melo, Fabrício Benevenuto

Abstract: WhatsApp is the most popular messaging app in the world. The closed nature of the app, in addition to the ease of transferring multimedia and sharing information with large-scale groups, makes WhatsApp unique among platforms: anonymous encrypted messages can go viral, reaching multiple users in a short period of time. The personal feeling and immediacy of messages delivered directly to the user's phone on WhatsApp were extensively abused to spread unfounded rumors and create misinformation campaigns during recent elections in Brazil and India. WhatsApp has been deploying measures to mitigate this problem, such as reducing the limit for forwarding a message to at most five users at once. Despite the welcome effort to counter the problem, there is so far no evidence of the real effectiveness of such restrictions. In this work, we propose a methodology to evaluate the effectiveness of such measures on the spread of misinformation circulating on WhatsApp. We use an epidemiological model and real data gathered from WhatsApp in Brazil, India and Indonesia to assess the impact of limiting virality features in this kind of network. Our results suggest that the current efforts deployed by WhatsApp can significantly delay information spread, but they are ineffective in blocking the propagation of misinformation campaigns through public groups when the content is highly viral.

Submitted 23 September, 2019; v1 submitted 18 September, 2019; originally announced September 2019.

Comments: 12 pages
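The abstract's central claim, that a forwarding limit delays spread but does not stop highly viral content, can be illustrated with a deliberately crude toy. The sketch below is NOT the paper's epidemiological model: it is a hypothetical mean-field recursion in which every newly informed user forwards with probability `p_forward` to `forward_limit` groups of size `group_size`, and only the not-yet-informed share of the population can be reached. All names and parameter values are assumptions made for illustration.

```python
def mean_field_reach(population, group_size, forward_limit, p_forward, steps):
    """Expected cumulative number of informed users after each step.

    Hypothetical mean-field toy (not the paper's model): every user informed
    in the previous step forwards the message with probability p_forward to
    `forward_limit` groups, and each group exposes its other members.
    """
    informed = 1.0  # a single seed user
    newly = 1.0
    history = [informed]
    for _ in range(steps):
        # Expected number of group forwards made by last step's new users.
        forwards = newly * p_forward * forward_limit
        # Only the not-yet-informed share of the population can be reached.
        susceptible_frac = 1.0 - informed / population
        exposed = forwards * (group_size - 1) * susceptible_frac
        newly = max(0.0, min(exposed, population - informed))
        informed = min(float(population), informed + newly)
        history.append(informed)
    return history


if __name__ == "__main__":
    for limit in (20, 5):
        reach = mean_field_reach(10_000, 50, limit, 0.1, 4)
        print(limit, [round(x) for x in reach])
```

With these made-up parameters the lower limit slows early growth considerably, yet both runs still saturate the population within a few steps; only when the forwarding probability is small does the message die out on its own.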

arXiv:1808.09218 [pdf, other]

doi 10.1145/3287560.3287580

On Microtargeting Socially Divisive Ads: A Case Study of Russia-Linked Ad Campaigns on Facebook

Authors: Filipe N. Ribeiro, Koustuv Saha, Mahmoudreza Babaei, Lucas Henrique, Johnnatan Messias, Fabricio Benevenuto, Oana Goga, Krishna P. Gummadi, Elissa M. Redmiles

Abstract: Targeted advertising is meant to improve the efficiency of matching advertisers to their customers. However, targeted advertising can also be abused by malicious advertisers to efficiently reach people susceptible to false stories, stoke grievances, and incite social conflict. Since targeted ads are not seen by non-targeted and non-vulnerable people, malicious ads are likely to go unreported and their effects undetected. This work examines a specific case of malicious advertising, exploring the extent to which political ads from the Russian Internet Research Agency (IRA) run prior to the 2016 U.S. elections exploited Facebook's targeted advertising infrastructure to efficiently target ads on divisive or polarizing topics (e.g., immigration, race-based policing) at vulnerable sub-populations. In particular, we do the following: (a) We conduct U.S. census-representative surveys to characterize how users with different political ideologies report, approve, and perceive truth in the content of the IRA ads. Our surveys show that many ads are "divisive": they elicit very different reactions from people belonging to different socially salient groups. (b) We characterize how these divisive ads are targeted to sub-populations that feel particularly aggrieved by the status quo. Our findings support existing calls for greater transparency of content and targeting of political ads. (c) We particularly focus on how the Facebook ad API facilitates such targeting. We show how the enormous amount of personal data Facebook aggregates about users and makes available to advertisers enables such malicious targeting.

Submitted 21 November, 2018; v1 submitted 28 August, 2018; originally announced August 2018.

Comments: This is a preprint of a full paper accepted at ACM FAT*'19 (ACM Conference on Fairness, Accountability, and Transparency). Please cite that version instead

arXiv:1807.03688 [pdf, other]

Inside the Right-Leaning Echo Chambers: Characterizing Gab, an Unmoderated Social System

Authors: Lucas Lima, Julio C. S. Reis, Philipe Melo, Fabricio Murai, Leandro Araújo, Pantelis Vikatos, Fabrício Benevenuto

Abstract: The moderation of content in many social media systems, such as Twitter and Facebook, motivated the emergence of a new social network system that promotes free speech, named Gab. Soon after, Gab was removed from the Google Play Store for violating the company's hate speech policy, and it was rejected by Apple for similar reasons. In this paper we characterize Gab, aiming to understand who joined it and what kind of content they share in this system. Our findings show that Gab is a very politically oriented system that hosts users banned from other social networks, some of them due to possible cases of hate speech and association with extremism. We provide the first measurement of news dissemination inside a right-leaning echo chamber, investigating a social media platform where readers are rarely exposed to content that cuts across ideological lines, but rather are fed content that reinforces their current political or social views.

Submitted 10 July, 2018; originally announced July 2018.

Comments: This is a preprint of a paper that will appear in ASONAM'18

arXiv:1712.09601 [pdf, other]

Building the Brazilian Academic Genealogy Tree

Authors: Wellington Dores, Elias Soares, Fabrício Benevenuto, Alberto H. F. Laender

Abstract: Throughout history, many researchers have made remarkable contributions to science, not only by advancing knowledge but also by mentoring new scientists. Currently, identifying and studying the formation of researchers over the years is a challenging task, as current repositories of theses and dissertations are cataloged in a decentralized way across many local digital libraries. Following our previous work, in which we created and analyzed a large collection of genealogy trees extracted from NDLTD, in this paper we focus on building such trees for the Brazilian research community. For this, we use data from the Lattes Platform, an internationally renowned initiative from CNPq, the Brazilian National Council for Scientific and Technological Development, for managing information about individual researchers and research groups in Brazil.

Submitted 27 December, 2017; originally announced December 2017.

arXiv:1711.07915 [pdf, ps, other]

10Sent: A Stable Sentiment Analysis Method Based on the Combination of Off-The-Shelf Approaches

Authors: Philipe F. Melo, Daniel H. Dalip, Manoel M. Junior, Marcos A. Gonçalves, Fabrício Benevenuto

Abstract: Sentiment analysis has become a very important tool for analyzing social media data. Several methods have been developed in this research field, many of them working very differently from each other, covering distinct aspects of the problem and using disparate strategies. Despite the large number of existing techniques, no single one fits well in all cases or for all data sources. Supervised approaches may be able to adapt to specific situations, but they require manually labeled training data, which is very cumbersome and expensive to acquire, especially for a new application. In this context, we propose to combine several very popular and effective state-of-the-practice sentiment analysis methods by means of an unsupervised bootstrapped strategy for polarity classification. One of our main goals is to reduce the large variability (lack of stability) of the unsupervised methods across different domains (datasets). Our solution was thoroughly tested on thirteen different datasets in several domains, such as opinions, comments, and social media. The experimental results demonstrate that our combined method (aka 10SENT) improves the effectiveness of the classification task, but, more importantly, it solves a key problem in the field: it is consistently among the best methods in many data types, meaning that it can produce the best (or close to best) results in almost all considered contexts, without any additional costs (e.g., manual labeling). Our self-learning approach is also largely independent of the base methods, which means that it is highly extensible to incorporate any new method that may be envisioned in the future. Finally, we also investigate a transfer learning approach for sentiment analysis as a means to gather additional (unsupervised) information for the proposed approach, and we show the potential of this technique to improve our results.

Submitted 21 November, 2017; originally announced November 2017.
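The abstract describes combining off-the-shelf methods without manual labels. One ingredient such an unsupervised combination might use is an agreement step: keep only texts on which enough base methods agree, and treat those as pseudo-labels that could seed a self-trained classifier. The sketch below shows only that agreement step; it is not 10SENT's actual pipeline, and the two lexicon scorers and the emoticon scorer are hypothetical toys standing in for real methods.

```python
from collections import Counter
from typing import Callable, List, Optional, Sequence, Set, Tuple

# A base method maps text to a polarity: +1 positive, -1 negative, 0 undecided.
Method = Callable[[str], int]


def _sign(x: int) -> int:
    return (x > 0) - (x < 0)


def make_lexicon_method(positive: Set[str], negative: Set[str]) -> Method:
    """Hypothetical toy lexicon scorer standing in for an off-the-shelf method."""
    def score(text: str) -> int:
        tokens = text.lower().split()
        pos = sum(t in positive for t in tokens)
        neg = sum(t in negative for t in tokens)
        return _sign(pos - neg)
    return score


def emoticon_method(text: str) -> int:
    """Hypothetical emoticon-based scorer."""
    return _sign(text.count(":)") - text.count(":("))


def combine(methods: Sequence[Method], text: str, min_agreement: int) -> Optional[int]:
    """Return the polarity at least `min_agreement` methods agree on, else None."""
    votes = [v for v in (m(text) for m in methods) if v != 0]
    if not votes:
        return None
    label, count = Counter(votes).most_common(1)[0]
    return label if count >= min_agreement else None


def pseudo_label(methods: Sequence[Method], texts: Sequence[str],
                 min_agreement: int) -> List[Tuple[str, int]]:
    """Keep only texts on which the base methods agree (candidate training seeds)."""
    labeled = []
    for text in texts:
        label = combine(methods, text, min_agreement)
        if label is not None:
            labeled.append((text, label))
    return labeled


methods = [
    make_lexicon_method({"good", "great", "love"}, {"bad", "awful", "hate"}),
    make_lexicon_method({"good", "excellent", "happy"}, {"bad", "terrible", "sad"}),
    emoticon_method,
]
print(pseudo_label(methods, ["good movie :)", "bad day", "the weather"], min_agreement=2))
```

Requiring agreement from a majority of methods trades coverage for label quality; texts on which the methods conflict or abstain are simply left unlabeled rather than guessed.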

arXiv:1706.08619 [pdf, other]

doi 10.1145/3106426.3106472

White, Man, and Highly Followed: Gender and Race Inequalities in Twitter

Authors: Johnnatan Messias, Pantelis Vikatos, Fabricio Benevenuto

Abstract: Social media is considered a democratic space in which people connect and interact with each other regardless of their gender, race, or any other demographic factor. Despite numerous efforts that explore demographic factors in social media, it is still unclear whether social media perpetuates old inequalities from the offline world. In this paper, we attempt to identify the gender and race of Twitter users located in the U.S. using advanced image processing algorithms from Face++. Then, we investigate how different demographic groups (i.e., male/female, Asian/Black/White) connect with each other. We quantify the extent to which each group follows and interacts with the others, and the extent to which these connections and interactions are reflected in inequalities on Twitter. Our analysis shows that users identified as White and male tend to attain higher positions on Twitter, in terms of the number of followers and the number of times they appear in users' lists. We hope our effort can stimulate the development of new theories of demographic information in the online space.

Submitted 26 June, 2017; originally announced June 2017.

Comments: In Proceedings of the IEEE/WIC/ACM International Conference on Web Intelligence (WI'17). Leipzig, Germany. August 2017

arXiv:1705.04045 [pdf, other]

Using Facebook Ads Audiences for Global Lifestyle Disease Surveillance: Promises and Limitations

Authors: Matheus Araujo, Yelena Mejova, Ingmar Weber, Fabricio Benevenuto

Abstract: Every day, millions of users reveal their interests on Facebook, which are then monetized via targeted advertisement marketing campaigns. In this paper, we explore the use of demographically rich Facebook Ads audience estimates for tracking non-communicable diseases around the world. Across 47 countries, we compute the audiences of marker interests, and evaluate their potential in tracking health conditions associated with tobacco use, obesity, and diabetes, compared to the performance of placebo interests. Despite its huge potential, we find that, for modeling prevalence of health conditions across countries, differences in these interest audiences are only weakly indicative of the corresponding prevalence rates. Within the countries, however, our approach provides interesting insights on trends of health awareness across demographic groups. Finally, we provide a temporal error analysis to expose the potential pitfalls of using Facebook's Marketing API as a black box.

Submitted 11 May, 2017; originally announced May 2017.

Comments: Please cite the article published at WebSci'17 instead of this arxiv version

arXiv:1705.03972 [pdf, other]

doi 10.1145/3078714.3078734

Demographics of News Sharing in the U.S. Twittersphere

Authors: Julio C. S. Reis, Haewoon Kwak, Jisun An, Johnnatan Messias, Fabricio Benevenuto

Abstract: The widespread adoption and dissemination of online news through social media systems have been revolutionizing many segments of our society and ultimately our daily lives. In these systems, users can play a central role as they share content with their friends. Despite that, little is known about news spreaders in social media. In this paper, we provide a first-of-its-kind in-depth characterization of news spreaders in social media. In particular, we investigate their demographics, what kind of content they share, and the audience they reach. Among our main findings, we show that males and white users tend to be more active in terms of sharing news, biasing the news audience to the interests of these demographic groups. Our results also quantify differences in interests of news sharing across demographics, which has implications for personalized news digests.

Submitted 10 May, 2017; originally announced May 2017.

arXiv:1705.03926 [pdf, other]

doi 10.1145/3078714.3078742

Linguistic Diversities of Demographic Groups in Twitter

Authors: Pantelis Vikatos, Johnnatan Messias, Manoel Miranda, Fabricio Benevenuto

Abstract: The massive popularity of online social media provides a unique opportunity for researchers to study the linguistic characteristics and patterns of users' interactions. In this paper, we provide an in-depth characterization of language usage across demographic groups on Twitter. In particular, we extract the gender and race of Twitter users located in the U.S. using advanced image processing algorithms from Face++. Then, we investigate how demographic groups (i.e., male/female, Asian/Black/White) differ in terms of linguistic styles and interests. We extract linguistic features from 6 categories (affective attributes, cognitive attributes, lexical density and awareness, temporal references, social and personal concerns, and interpersonal focus) in order to identify similarities and differences in particular sets of writing attributes. In addition, we extract the absolute ranking difference of top phrases between demographic groups. As a further dimension of diversity, we also use the topics of interest that we retrieve for each user. Our analysis unveils clear differences in the writing styles (and topics of interest) of different demographic groups, with variation seen along both gender and race lines. We hope our effort can stimulate the development of new studies related to demographic information in the online space.

Submitted 10 May, 2017; originally announced May 2017.

Comments: Proceedings of the 28th ACM Conference on Hypertext and Social Media 2017 (HT '17)

arXiv:1704.00139 [pdf, other]

Who Makes Trends? Understanding Demographic Biases in Crowdsourced Recommendations

Authors: Abhijnan Chakraborty, Johnnatan Messias, Fabricio Benevenuto, Saptarshi Ghosh, Niloy Ganguly, Krishna P. Gummadi

Abstract: Users of social media sites like Facebook and Twitter rely on crowdsourced content recommendation systems (e.g., Trending Topics) to retrieve important and useful information. Content selected for recommendation indirectly gives the initial users who promoted it (by liking or posting) an opportunity to propagate their messages to a wider audience. Hence, it is important to understand the demographics of the people who make content worthy of recommendation, and to explore whether they are representative of the media site's overall population. In this work, using extensive data collected from Twitter, we make the first attempt to quantify and explore the demographic biases in crowdsourced recommendations. Our analysis, focusing on the selection of trending topics, finds that a large fraction of trends are promoted by crowds whose demographics are significantly different from the overall Twitter population. More worryingly, we find that certain demographic groups are systematically under-represented among the promoters of trending topics. To make the demographic biases in Twitter trends more transparent, we developed and deployed a Web-based service, 'Who-Makes-Trends', at twitter-app.mpi-sws.org/who-makes-trends.

Submitted 1 April, 2017; originally announced April 2017.

Comments: 11th AAAI International Conference on Web and Social Media (ICWSM 2017)

arXiv:1703.08365 [pdf, other]

The Emergence of Crowdsourcing among Pokémon Go Players

Authors: Priscila Martins, Manoel Miranda, Fabrício Benevenuto, Jussara Almeida

Abstract: Since its launch, Pokémon Go has been described as the largest gaming phenomenon of the smartphone age. As the game requires users to walk in the real world to see and capture Pokémon, a new wave of crowdsourcing apps has emerged that allows users to collaborate with each other, sharing where and when Pokémon were found. In this paper we characterize one such initiative, called PokeCrew. Our analyses uncover a set of aspects of user behavior and system usage in this emerging crowdsourcing task, helping unveil some problems and benefits. We hope our effort can inspire the design of new crowdsourcing systems.

Submitted 24 March, 2017; originally announced March 2017.

arXiv:1607.00421 [pdf, other]

From Migration Corridors to Clusters: The Value of Google+ Data for Migration Studies

Authors: Johnnatan Messias, Fabricio Benevenuto, Ingmar Weber, Emilio Zagheni

Abstract: Recently, there have been considerable efforts to use online data to investigate international migration. These efforts show that Web data are valuable for estimating migration rates and are relatively easy to obtain. However, existing studies have only investigated flows of people along migration corridors, i.e. between pairs of countries. In our work, we use data about "places lived" from millions of Google+ users in order to study migration "clusters", i.e. groups of countries in which individuals have lived. For the first time, we consider information about more than two countries people have lived in. We argue that these data are very valuable because this type of information is not available in traditional demographic sources which record country-to-country migration flows independent of each other. We show that migration clusters of country triads cannot be identified using information about bilateral flows alone. To demonstrate the additional insights that can be gained by using data about migration clusters, we first develop a model that tries to predict the prevalence of a given triad using only data about its constituent pairs. We then inspect the groups of three countries which are more or less prominent, compared to what we would expect based on bilateral flows alone. Next, we identify a set of features such as a shared language or colonial ties that explain which triple of country pairs are more or less likely to be clustered when looking at country triples. Then we select and contrast a few cases of clusters that provide some qualitative information about what our data set shows. The type of data that we use is potentially available for a number of social media services. We hope that this first study about migration clusters will stimulate the use of Web data for the development of new theories of international migration that could not be tested appropriately before.

Submitted 1 July, 2016; originally announced July 2016.

Comments: 2016 IEEE/ACM International Conference on Advances in Social Networks Analysis and Mining (ASONAM)

arXiv:1604.02612 [pdf, other]

Fusing Audio, Textual and Visual Features for Sentiment Analysis of News Videos

Authors: Moisés H. R. Pereira, Flávio L. C. Pádua, Adriano C. M. Pereira, Fabrício Benevenuto, Daniel H. Dalip

Abstract: This paper presents a novel approach to perform sentiment analysis of news videos, based on the fusion of audio, textual and visual clues extracted from their contents. The proposed approach aims at contributing to the semio-discursive study of the construction of the ethos (identity) of this media universe, which has become a central part of the modern-day lives of millions of people. To achieve this goal, we apply state-of-the-art computational methods for (1) automatic emotion recognition from facial expressions, (2) extraction of modulations in the participants' speech and (3) sentiment analysis of the closed captions associated with the videos of interest. More specifically, we compute features such as visual intensities of recognized emotions, field sizes of participants, voicing probability, sound loudness, speech fundamental frequencies and the sentiment scores (polarities) of text sentences in the closed captions. Experimental results with a dataset containing 520 annotated news videos from three Brazilian and one American popular TV newscasts show that our approach achieves an accuracy of up to 84% in the sentiment (tension level) classification task, thus demonstrating its high potential to be used by media analysts in several applications, especially in the journalistic domain.

Submitted 9 April, 2016; originally announced April 2016.

Comments: 5 pages, 1 figure, International AAAI Conference on Web and Social Media

arXiv:1603.07709 [pdf, ps, other]

Analyzing the Targets of Hate in Online Social Media

Authors: Leandro Silva, Mainack Mondal, Denzil Correa, Fabricio Benevenuto, Ingmar Weber

Abstract: Social media systems offer Internet users a congenial platform to freely express their thoughts and opinions. Although this property represents incredible and unique communication opportunities, it also brings important challenges. Online hate speech is an archetypal example of such challenges. Despite its magnitude and scale, there is a significant gap in understanding the nature of hate speech on social media. In this paper, we provide a first-of-its-kind systematic large-scale measurement study of the main targets of hate speech in online social media. To do that, we gather traces from two social media systems: Whisper and Twitter. We then develop and validate a methodology to identify hate speech on both of these systems. Our results identify online hate speech forms and offer a broader understanding of the phenomenon, providing directions for prevention and detection approaches.

Submitted 24 March, 2016; originally announced March 2016.

Comments: Short paper, 4 pages, 4 tables

arXiv:1512.01818 [pdf, other]

SentiBench - a benchmark comparison of state-of-the-practice sentiment analysis methods

Authors: Filipe Nunes Ribeiro, Matheus Araújo, Pollyanna Gonçalves, Fabrício Benevenuto, Marcos André Gonçalves

Abstract: In the last few years, thousands of scientific papers have investigated sentiment analysis, several startups that measure opinions on real data have emerged, and a number of innovative products related to this theme have been developed. There are multiple methods for measuring sentiment, including lexicon-based and supervised machine learning methods. Despite the vast interest in the theme and the wide popularity of some methods, it is unclear which one is better for identifying the polarity (i.e., positive or negative) of a message. Accordingly, there is a strong need to conduct a thorough apples-to-apples comparison of sentiment analysis methods, as they are used in practice, across multiple datasets originating from different data sources. Such a comparison is key for understanding the potential limitations, advantages, and disadvantages of popular methods. This article aims at filling this gap by presenting a benchmark comparison of twenty-four popular sentiment analysis methods (which we call the state-of-the-practice methods). Our evaluation is based on a benchmark of eighteen labeled datasets, covering messages posted on social networks, movie and product reviews, as well as opinions and comments in news articles. Our results highlight how considerably the prediction performance of these methods varies across datasets. Aiming at boosting the development of this research area, we release the methods' code and the datasets used in this article, deploying them in a benchmark system, which provides an open API for accessing and comparing sentence-level sentiment analysis methods.

Submitted 14 July, 2016; v1 submitted 6 December, 2015; originally announced December 2015.

arXiv:1512.00770 [pdf, other]

Bayesian Social Influence in the Online Realm

Authors: Przemyslaw A. Grabowicz, Francisco Romero-Ferrero, Theo Lins, Fabrício Benevenuto, Krishna P. Gummadi, Gonzalo G. de Polavieja

Abstract: Our opinions, which things we like or dislike, depend on the opinions of those around us. Nowadays, we are influenced by the opinions of online strangers, expressed in comments and ratings on online platforms. Here, we perform novel "academic A/B testing" experiments with over 2,500 participants to measure the extent of that influence. In our experiments, the participants watch and evaluate videos on mirror proxies of YouTube and Vimeo. We control the comments and ratings that are shown underneath each of these videos. Our study shows that from 5% up to 40% of subjects adopt the majority opinion of strangers expressed in the comments. Using Bayes' theorem, we derive a flexible and interpretable family of models of social influence, in which each individual forms posterior opinions stochastically following a logit model. The variants of our mixture model that maximize Akaike information criterion represent two sub-populations, i.e., non-influenceable and influenceable individuals. The prior opinions of the non-influenceable individuals are strongly correlated with the external opinions and have low standard error, whereas the prior opinions of influenceable individuals have high standard error and become correlated with the external opinions due to social influence. Our findings suggest that opinions are random variables updated via Bayes' rule whose standard deviation is correlated with opinion influenceability. Based on these findings, we discuss how to hinder opinion manipulation and misinformation diffusion in the online realm.

Submitted 26 February, 2020; v1 submitted 2 December, 2015; originally announced December 2015.

Comments: 15 pages, 22 figures

ACM Class: H.1.2; I.2.11; J.4
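
The abstract's claim that opinions behave like random variables updated via Bayes' rule, with prior uncertainty tied to influenceability, can be illustrated with the textbook Gaussian (conjugate) update below. This is a minimal sketch that assumes both the prior opinion and the external opinion are normally distributed; it is not the authors' logit mixture model, and the function name `posterior_opinion` and the example numbers are hypothetical.

```python
import math


def posterior_opinion(prior_mean: float, prior_sd: float,
                      external_mean: float, external_sd: float) -> tuple[float, float]:
    """Gaussian conjugate update of a prior opinion given an external opinion.

    Illustrative assumption only: both opinions are normally distributed.
    Returns the posterior mean and standard deviation.
    """
    prior_precision = 1.0 / prior_sd ** 2        # confidence in one's own opinion
    external_precision = 1.0 / external_sd ** 2  # weight given to the crowd's opinion
    post_precision = prior_precision + external_precision
    post_mean = (prior_precision * prior_mean
                 + external_precision * external_mean) / post_precision
    return post_mean, math.sqrt(1.0 / post_precision)


# Hypothetical example: same prior mean (2) and same external rating (8),
# but different prior uncertainty.
confident, _ = posterior_opinion(2.0, 0.5, 8.0, 1.0)
uncertain, _ = posterior_opinion(2.0, 2.0, 8.0, 1.0)
print(f"confident rater moves to {confident:.1f}, uncertain rater moves to {uncertain:.1f}")
# prints "confident rater moves to 3.2, uncertain rater moves to 6.8"
```

The larger the prior standard deviation, the further the posterior moves toward the external opinion, which mirrors the influenceable sub-population described in the abstract.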

arXiv:1510.04767 [pdf, other]

The H-index Paradox: Your Coauthors Have a Higher H-index than You Do

Authors: Fabrício Benevenuto, Alberto H. F. Laender, Bruno L. Alves

Abstract: One interesting phenomenon that emerges from the typical structure of social networks is the friendship paradox. It states that your friends have on average more friends than you do. Recent efforts have explored variations of it, with numerous implications for the dynamics of social networks. However, the friendship paradox and its variations consider only the topological structure of the networks and neglect many other characteristics that are correlated with node degree. In this article, we take the case of scientific collaborations to investigate whether a similar paradox also arises in terms of a researcher's scientific productivity as measured by her H-index. The H-index is a widely used metric in academia to capture both the quality and the quantity of a researcher's scientific output. A researcher may well use her coauthors' H-indexes as a way to infer whether her own H-index is adequate in her research area. Nevertheless, in this article, we show that the average H-index of a researcher's coauthors is usually higher than her own H-index. We present empirical evidence of this paradox and discuss some of its potential consequences.

Submitted 19 October, 2015; v1 submitted 15 October, 2015; originally announced October 2015.
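
The paradox can be checked on any collaboration graph by comparing each researcher's H-index with the mean H-index of their coauthors. The sketch below uses the standard H-index definition; the function names, citation counts, and coauthorship edges are hypothetical toy data, not the paper's dataset.

```python
from statistics import mean


def h_index(citations: list[int]) -> int:
    """Largest h such that h papers have at least h citations each."""
    h = 0
    for rank, cites in enumerate(sorted(citations, reverse=True), start=1):
        if cites < rank:
            break
        h = rank
    return h


def coauthor_comparison(citations: dict[str, list[int]],
                        coauthorships: list[tuple[str, str]]) -> dict[str, tuple[int, float]]:
    """Map each researcher to (own H-index, mean H-index of their coauthors)."""
    h = {author: h_index(papers) for author, papers in citations.items()}
    coauthors: dict[str, set[str]] = {author: set() for author in citations}
    for a, b in coauthorships:  # undirected collaboration edges
        coauthors[a].add(b)
        coauthors[b].add(a)
    return {author: (h[author], mean(h[c] for c in peers))
            for author, peers in coauthors.items() if peers}


# Hypothetical toy data: per-paper citation counts and a hub-centred collaboration graph.
citations = {
    "A": [10, 8, 5, 4, 3],
    "B": [3, 2, 1],
    "C": [25, 20, 15, 12, 9, 7, 6],
    "D": [4, 1],
}
edges = [("C", "A"), ("C", "B"), ("C", "D"), ("A", "B")]
for author, (own, peers) in sorted(coauthor_comparison(citations, edges).items()):
    print(f"{author}: own H-index {own}, coauthors' mean {peers:.2f}")
# A: own H-index 4, coauthors' mean 4.00
# B: own H-index 2, coauthors' mean 5.00
# C: own H-index 6, coauthors' mean 2.33
# D: own H-index 1, coauthors' mean 6.00
```

Three of the four toy researchers sit at or below their coauthors' mean; only the well-connected hub C is above it, which is the same mechanism that drives the friendship paradox.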

arXiv:1503.07921 [pdf, other]

Breaking the News: First Impressions Matter on Online News

Authors: Julio Reis, Fabrício Benevenuto, Pedro O. S. Vaz de Melo, Raquel Prates, Haewoon Kwak, Jisun An

Abstract: A growing number of people are changing the way they consume news, replacing traditional physical newspapers and magazines with their online versions and/or weblogs. The interactivity and immediacy of online news are changing the way news is produced and presented by media corporations. News websites have to create effective strategies to catch people's attention and attract their clicks. In this paper, we investigate possible strategies used by online news corporations in the design of their news headlines. We analyze the content of 69,907 headlines produced by four major global media corporations during a minimum of eight consecutive months in 2014. To discover strategies that could be used to attract clicks, we extracted features related to sentiment polarity from the text of the news headlines. We discovered that the sentiment of a headline is strongly related to the popularity of the news and also to the dynamics of the comments posted on that particular news item.

Submitted 16 April, 2015; v1 submitted 26 March, 2015; originally announced March 2015.

Comments: The paper appears in ICWSM 2015

arXiv:1406.0032 [pdf, other]

doi 10.1145/2512938.2512951

Comparing and Combining Sentiment Analysis Methods

Authors: Pollyanna Gonçalves, Matheus Araújo, Fabrício Benevenuto, Meeyoung Cha

Abstract: Many messages express opinions about events, products, and services, political views, or even their author's emotional state and mood. Sentiment analysis has been used in several applications, including analysis of the repercussions of events in social networks, analysis of opinions about products and services, and simply to better understand aspects of social communication in Online Social Networks (OSNs). There are multiple methods for measuring sentiments, including lexical-based approaches and supervised machine learning methods. Despite the wide use and popularity of some methods, it is unclear which method is better for identifying the polarity (i.e., positive or negative) of a message, as the current literature does not provide a comparison among existing methods. Such a comparison is crucial for understanding the potential limitations, advantages, and disadvantages of popular methods in analyzing the content of OSN messages. Our study aims at filling this gap by presenting comparisons of eight popular sentiment analysis methods in terms of coverage (i.e., the fraction of messages whose sentiment is identified) and agreement (i.e., the fraction of identified sentiments that are in tune with ground truth). We develop a new method that combines existing approaches, providing the best coverage results and competitive agreement. We also present a free Web service called iFeel, which provides an open API for accessing and comparing results across different sentiment methods for a given text.

Submitted 30 May, 2014; originally announced June 2014.

Comments: Proceedings of the first ACM conference on Online social networks (2013) 27-38
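
The two metrics defined in the abstract are simple ratios. The sketch below computes them for any method's output, plus a hypothetical majority-vote combiner to show how combining methods can raise coverage; the combined method in the paper may work differently, and all labels and method outputs here are invented for illustration.

```python
from collections import Counter
from typing import Optional

Label = Optional[str]  # "pos", "neg", or None when a method gives no answer


def coverage_and_agreement(predictions: list[Label], truth: list[str]) -> tuple[float, float]:
    """Coverage: share of messages labeled. Agreement: share of labeled messages matching truth."""
    if len(predictions) != len(truth):
        raise ValueError("predictions and truth must have the same length")
    labeled = [(p, t) for p, t in zip(predictions, truth) if p is not None]
    coverage = len(labeled) / len(truth) if truth else 0.0
    agreement = sum(p == t for p, t in labeled) / len(labeled) if labeled else 0.0
    return coverage, agreement


def majority_vote(*methods: list[Label]) -> list[Label]:
    """Hypothetical combiner: most frequent non-missing label per message; ties give None."""
    combined: list[Label] = []
    for labels in zip(*methods):
        votes = Counter(label for label in labels if label is not None).most_common()
        tie = len(votes) > 1 and votes[0][1] == votes[1][1]
        combined.append(None if not votes or tie else votes[0][0])
    return combined


truth = ["pos", "neg", "pos", "neg", "pos"]
method_a = ["pos", None, "neg", "neg", None]  # hypothetical outputs
method_b = [None, "neg", "pos", None, "pos"]
for name, preds in [("A", method_a), ("B", method_b),
                    ("vote", majority_vote(method_a, method_b))]:
    cov, agr = coverage_and_agreement(preds, truth)
    print(f"{name}: coverage={cov:.2f}, agreement={agr:.2f}")
# A: coverage=0.60, agreement=0.67
# B: coverage=0.60, agreement=1.00
# vote: coverage=0.80, agreement=1.00
```

In this toy case, neither method alone labels more than 60% of the messages, but the combination covers 80% while disagreements between methods are left unlabeled rather than guessed.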

arXiv:1405.4927 [pdf, other]

Reverse Engineering Socialbot Infiltration Strategies in Twitter

Authors: Carlos A. Freitas, Fabrício Benevenuto, Saptarshi Ghosh, Adriano Veloso

Abstract: Data extracted from social networks like Twitter are increasingly being used to build applications and services that mine and summarize public reactions to events, such as traffic monitoring platforms, identification of epidemic outbreaks, and public perception of people and brands. However, such services are vulnerable to attacks from socialbots (automated accounts that mimic real users) seeking to tamper with statistics by posting automatically generated messages and interacting with legitimate users. If created on a large scale, socialbots could potentially be used to bias or even invalidate many existing services, by infiltrating the social networks and acquiring the trust of other users over time. This study aims at understanding the infiltration strategies of socialbots on the Twitter microblogging platform. To this end, we create 120 socialbot accounts with different characteristics and strategies (e.g., gender specified in the profile, how active they are, the method used to generate their tweets, and the group of users they interact with), and investigate the extent to which these bots are able to infiltrate the Twitter social network. Our results show that even socialbots employing simple automated mechanisms are able to successfully infiltrate the network. Additionally, using a 2^k factorial design, we quantify the infiltration effectiveness of different bot strategies. Our analysis unveils findings that are key for the design of detection and countermeasure approaches.

Submitted 19 May, 2014; originally announced May 2014.
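
A 2^k factorial design, as mentioned in the abstract, estimates how much each two-level factor (and each interaction between factors) contributes to an outcome. Below is a minimal sketch of the standard sign-table computation; the factor names, the -1/+1 encoding, and the response values are hypothetical, and the paper's exact analysis may differ.

```python
from itertools import combinations, product
from math import prod


def factorial_effects(factors: list[str], response: dict[tuple[int, ...], float]):
    """Effects of a full 2^k factorial design using the sign-table method.

    `response` maps each run, encoded as a tuple of -1/+1 levels (one per factor),
    to the measured outcome. Returns (effects, fraction of variation explained).
    """
    k = len(factors)
    runs = list(product((-1, 1), repeat=k))
    n = len(runs)  # 2**k runs
    effects: dict[str, float] = {}
    for order in range(k + 1):
        for subset in combinations(range(k), order):
            name = "*".join(factors[i] for i in subset) or "mean"
            # Multiply the sign columns of the factors in `subset` (empty product = 1).
            effects[name] = sum(prod(run[i] for i in subset) * response[run]
                                for run in runs) / n
    total_variation = n * sum(q ** 2 for name, q in effects.items() if name != "mean")
    explained = {name: n * q ** 2 / total_variation
                 for name, q in effects.items() if name != "mean" and total_variation}
    return effects, explained


# Hypothetical outcome (e.g., followers gained) for two two-level bot factors.
factors = ["gender", "activity"]
response = {(-1, -1): 10, (1, -1): 14, (-1, 1): 30, (1, 1): 46}
effects, explained = factorial_effects(factors, response)
print(effects)  # {'mean': 25.0, 'gender': 5.0, 'activity': 13.0, 'gender*activity': 3.0}
print({f: round(v, 3) for f, v in explained.items()})
# {'gender': 0.123, 'activity': 0.833, 'gender*activity': 0.044}
```

The fraction of variation explained is what lets such a design rank strategies: in this invented example, the activity level accounts for most of the variation in the outcome.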

arXiv:1402.2351 [pdf, other]

TrendLearner: Early Prediction of Popularity Trends of User Generated Content

Authors: Flavio Figueiredo, Jussara M. Almeida, Marcos André Gonçalves, Fabrício Benevenuto

Abstract: Here we focus on the problem of predicting the popularity trend of user-generated content (UGC) as early as possible. Taking YouTube videos as a case study, we propose a novel two-step learning approach that: (1) extracts popularity trends from previously uploaded objects, and (2) predicts trends for new content. Unlike previous work, our solution explicitly addresses the inherent tradeoff between prediction accuracy and remaining interest in the content after prediction, solving it on a per-object basis. Our experimental results show great improvements of our solution over alternatives, and its applicability to improve the accuracy of state-of-the-art popularity prediction methods.

Submitted 14 February, 2016; v1 submitted 10 February, 2014; originally announced February 2014.

Comments: To appear in the Elsevier Information Sciences journal

arXiv:1402.1777 [pdf, other]

On the Dynamics of Social Media Popularity: A YouTube Case Study

Authors: Flavio Figueiredo, Jussara M. Almeida, Marcos André Gonçalves, Fabrício Benevenuto

Abstract: Understanding the factors that impact the popularity dynamics of social media can drive the design of effective information services, besides providing valuable insights to content generators and online advertisers. Taking YouTube as a case study, we analyze how video popularity evolves after upload, extracting popularity trends that characterize groups of videos. We also analyze the referrers that lead users to videos, correlating them, features of the video, and early popularity measures with the popularity trend and total observed popularity the video will experience. Our findings provide fundamental knowledge about popularity dynamics and its implications for services such as advertising and search.

Submitted 17 October, 2014; v1 submitted 7 February, 2014; originally announced February 2014.

Comments: Extended version of a paper published in ACM WSDM 2011. Pre-print of the paper accepted for publication in the ACM Transactions on Internet Technology

arXiv:1308.1857 [pdf, other]

PANAS-t: A Psychometric Scale for Measuring Sentiments on Twitter

Authors: Pollyanna Gonçalves, Fabrício Benevenuto, Meeyoung Cha

Abstract: Online social networks have become a major communication platform, where people share their thoughts and opinions about any topic in real time. The short text updates people post in these networks contain emotions and moods, which, when measured collectively, can unveil the public mood at the population level and have exciting implications for businesses, governments, and societies. Therefore, there is an urgent need for developing solid methods for accurately measuring moods from large-scale social media data. In this paper, we propose PANAS-t, which measures sentiments from short text updates in Twitter based on a well-established psychometric scale, PANAS (Positive and Negative Affect Schedule). We test the efficacy of PANAS-t over 10 real notable events drawn from 1.8 billion tweets and demonstrate that it can efficiently capture the expected sentiments of a wide variety of issues spanning tragedies, technology releases, political debates, and healthcare.

Submitted 8 August, 2013; originally announced August 2013.

Comments: 10 pages, 3 figures
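
PANAS-t builds on the PANAS idea of scoring affect from mood adjectives. A minimal lexicon-counting sketch of that idea follows; the mood categories and word lists are hypothetical stand-ins (not the PANAS-t lexicon), and comparing an event against a baseline period is an assumed normalization, not a description of the paper's exact procedure.

```python
import re
from collections import Counter
from typing import Optional

# Hypothetical, heavily abbreviated mood lexicon for illustration only;
# it is NOT the word list used by PANAS-t.
LEXICON = {
    "joviality": {"happy", "cheerful", "delighted"},
    "fear": {"afraid", "scared", "nervous"},
    "sadness": {"sad", "lonely", "downhearted"},
}


def mood_fractions(tweets: list[str]) -> dict[str, float]:
    """Fraction of tweets containing at least one term from each mood category."""
    counts: Counter = Counter()
    for text in tweets:
        words = set(re.findall(r"[a-z']+", text.lower()))
        for mood, terms in LEXICON.items():
            if words & terms:
                counts[mood] += 1
    total = len(tweets)
    return {mood: counts[mood] / total if total else 0.0 for mood in LEXICON}


def relative_change(event: dict[str, float],
                    baseline: dict[str, float]) -> dict[str, Optional[float]]:
    """Assumed normalization: change of each mood relative to a baseline; None if baseline is 0."""
    return {mood: (event[mood] - baseline[mood]) / baseline[mood] if baseline[mood] else None
            for mood in event}


# Hypothetical tweets from a quiet baseline period and from an event period.
baseline = mood_fractions(["happy day", "so sad", "a bit nervous", "feeling cheerful"])
event = mood_fractions(["scared of the storm", "so afraid", "happy it's over", "nervous tonight"])
print(relative_change(event, baseline))
# {'joviality': -0.5, 'fear': 2.0, 'sadness': -1.0}
```

Reporting changes relative to a baseline, rather than raw fractions, keeps everyday word usage from dominating the score; fear tripling in the toy event shows up as a +200% change.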

arXiv:0804.4865 [pdf, ps, other]

Characterizing Video Responses in Social Networks

Authors: Fabricio Benevenuto, Fernando Duarte, Tiago Rodrigues, Virgilio Almeida, Jussara Almeida, Keith Ross

Abstract: Video sharing sites, such as YouTube, use video responses to enhance the social interactions among their users. The video response feature allows users to interact and converse through video, by creating a video sequence that begins with an opening video and is followed by video responses from other users. Our characterization covers over 3.4 million videos and 400,000 video responses collected from YouTube during a 7-day period. We first analyze the characteristics of the video responses, such as popularity, duration, and geography. We then examine the social networks that emerge from the video response interactions.

Submitted 30 April, 2008; originally announced April 2008.

ACM Class: J.4; H.3.5
