Search | arXiv e-print repository

A Sticker is Worth a Thousand Words: Characterizing the Use of Stickers in WhatsApp Political Groups in Brazil

Authors: Philipe Melo, João M. M. Couto, Daniel Kansaon, Vitor Mafra, Júlio C. S. Reis, Fabrício Benevenuto

Abstract: With the increasing use of smartphones, instant messaging platforms have become important communication tools. According to WhatsApp, more than 100 billion messages are sent on the app each day. These platforms allow individuals to express themselves through media other than plain text, including audio, videos, images, and stickers. Stickers in particular are a new multimedia format that emerged with messaging apps and promote new forms of interaction among users. In the Brazilian context especially, they have transcended their role as a mere form of humor to become a key element of political strategy. In this regard, we investigate how stickers are being used, unveiling the unique characteristics that this media format brings to WhatsApp chats and its political use. To achieve that, we collected a large sample of messages from public WhatsApp political discussion groups in Brazil and analyzed the sticker messages shared in this context.

Submitted 12 June, 2024; originally announced June 2024.

arXiv:2405.08465

How to Surprisingly Consider Recommendations? A Knowledge-Graph-based Approach Relying on Complex Network Metrics

Authors: Oliver Baumann, Durgesh Nandini, Anderson Rossanez, Mirco Schoenfeld, Julio Cesar dos Reis

Abstract: Traditional recommendation approaches, including content-based and collaborative filtering, usually focus on similarity between items or users. Existing approaches lack ways of introducing unexpectedness into recommendations, prioritizing globally popular items over exposing users to unforeseen items. This investigation aims to design and evaluate a novel layer on top of recommender systems that incorporates relational information and suggests items with a user-defined degree of surprise. We propose a Knowledge Graph (KG) based recommender system that encodes user interactions on item catalogs. Our study explores whether network-level metrics on KGs can influence the degree of surprise in recommendations. We hypothesize that surprisingness correlates with certain network metrics, treating user profiles as subgraphs within a larger catalog KG. The resulting solution reranks recommendations based on their impact on structural graph metrics. Our research contributes to optimizing recommendations to reflect these metrics. We experimentally evaluate our approach on two datasets: LastFM listening histories and synthetic Netflix viewing profiles. We find that reranking items based on complex network metrics leads to more unexpected and surprising recommendation lists.

Submitted 14 May, 2024; originally announced May 2024.

ACM Class: H.5.0; H.5.1; H.3.4; H.4.0; I.2.4
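The reranking step described in this abstract can be illustrated with a minimal sketch. It assumes a toy catalog KG stored as item-to-attribute edges and uses the density of the user's profile subgraph as the structural metric; the specific metrics used by the authors are not listed here, so `density`, `rerank`, and the weight `alpha` (standing in for the user-defined degree of surprise) are hypothetical.

```python
from typing import Dict, Iterable, List, Set, Tuple

Edge = Tuple[str, str]

def profile_edges(items: Iterable[str], kg: Dict[str, Set[str]]) -> Set[Edge]:
    """Collect the item-attribute edges that the given items induce in the catalog KG."""
    return {(item, attr) for item in items for attr in kg.get(item, set())}

def density(edges: Set[Edge]) -> float:
    """Density of an undirected simple graph: 2|E| / (|V| (|V| - 1))."""
    nodes = {node for edge in edges for node in edge}
    n = len(nodes)
    if n < 2:
        return 0.0
    return 2 * len(edges) / (n * (n - 1))

def rerank(profile: List[str], candidates: Dict[str, float],
           kg: Dict[str, Set[str]], alpha: float = 0.5) -> List[str]:
    """Blend each candidate's base score with how much it dilutes the profile subgraph.

    alpha = 0 keeps the base ranking; alpha = 1 ranks purely by structural surprise.
    """
    base = profile_edges(profile, kg)
    base_density = density(base)
    scored = []
    for item, score in candidates.items():
        delta = density(base | profile_edges([item], kg)) - base_density
        # A larger density drop means the item connects less to what the user already knows.
        scored.append((item, (1 - alpha) * score - alpha * delta))
    scored.sort(key=lambda pair: pair[1], reverse=True)
    return [item for item, _ in scored]

kg = {"a": {"rock"}, "b": {"rock"}, "c": {"rock"}, "d": {"jazz"}}
print(rerank(["a", "b"], {"c": 0.9, "d": 0.1}, kg, alpha=0.0))  # ['c', 'd']
print(rerank(["a", "b"], {"c": 0.9, "d": 0.1}, kg, alpha=1.0))  # ['d', 'c']
```

With `alpha=0` the similarity-driven base scores win; with `alpha=1` the jazz item, which shares no attributes with the rock-only profile, moves to the top.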

arXiv:2403.05756

Model-Free Local Recalibration of Neural Networks

Authors: R. Torres, D. J. Nott, S. A. Sisson, T. Rodrigues, J. G. Reis, G. S. Rodrigues

Abstract: Artificial neural networks (ANNs) are highly flexible predictive models. However, reliably quantifying the uncertainty of their predictions remains a challenge. There has been much recent work on "recalibration" of predictive distributions for ANNs, so that forecast probabilities for events of interest are consistent with certain frequency evaluations of them. Uncalibrated probabilistic forecasts are of limited use for many important decision-making tasks. To address this issue, we propose a localized recalibration of ANN predictive distributions using the dimension-reduced representation of the input provided by the ANN hidden layers. Our novel method draws inspiration from recalibration techniques used in the literature on approximate Bayesian computation and likelihood-free inference methods. Most existing calibration methods for ANNs can be thought of as calibrating either on the input layer, which is difficult when the input is high-dimensional, or on the output layer, which may not be sufficiently flexible. Through a simulation study, we demonstrate that our method performs well compared to alternative approaches, and we explore the benefits that can be achieved by localizing the calibration based on different layers of the network. Finally, we apply our proposed method to a diamond price prediction problem, demonstrating the potential of our approach to improve prediction and uncertainty quantification in real-world applications.

Submitted 8 March, 2024; originally announced March 2024.

Comments: 25 pages, 5 figures

MSC Class: 62G07 (Primary); 68T07; 68T37 (Secondary); 68Q10

ACM Class: G.3; I.5.1; I.6.4
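The following sketch shows one generic way to recalibrate locally in a hidden-layer representation. It assumes Gaussian predictive distributions and a k-nearest-neighbour neighbourhood of calibration points; it is loosely inspired by the ABC-style localization the abstract describes but is not the authors' exact procedure, and the function name and `k` are hypothetical.

```python
import math
from statistics import NormalDist
from typing import List, Sequence, Tuple

# Each calibration point: (hidden-layer features z, predictive mean, predictive sd, observed y)
CalibPoint = Tuple[Sequence[float], float, float, float]

def local_recalibrated_quantile(z_test: Sequence[float], mu_test: float, sigma_test: float,
                                calib: List[CalibPoint], tau: float, k: int = 50) -> float:
    """Return a recalibrated tau-quantile for a test point.

    1. Find the k calibration points closest to z_test in hidden-layer space.
    2. Compute their PIT values F_i(y_i) under each point's Gaussian predictive distribution.
    3. Use the empirical tau-quantile of those local PITs as the corrected quantile level.
    """
    neighbours = sorted(calib, key=lambda c: math.dist(c[0], z_test))[:k]
    pits = sorted(NormalDist(mu, sd).cdf(y) for _, mu, sd, y in neighbours)
    idx = min(len(pits) - 1, max(0, math.ceil(tau * len(pits)) - 1))  # nearest-rank quantile
    level = min(max(pits[idx], 1e-9), 1 - 1e-9)  # inv_cdf needs 0 < p < 1
    return NormalDist(mu_test, sigma_test).inv_cdf(level)

# Near z = 0 the model over-predicts (y falls one sd below the mean); far away it is calibrated.
near = [((0.0,), 0.0, 1.0, -1.0)] * 3
far = [((10.0,), 0.0, 1.0, 0.0)] * 3
print(local_recalibrated_quantile((0.1,), 0.0, 1.0, near + far, tau=0.5, k=3))  # approximately -1.0
```

Because only the neighbours of the test point are used, the corrected median shifts to about -1.0 even though a global recalibration over all six points would be pulled toward the well-calibrated far cluster.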

arXiv:2308.14782

Helping Fact-Checkers Identify Fake News Stories Shared through Images on WhatsApp

Authors: Julio C. S. Reis, Philipe Melo, Fabiano Belém, Fabricio Murai, Jussara M. Almeida, Fabricio Benevenuto

Abstract: WhatsApp has introduced a novel avenue for smartphone users to engage with and disseminate news stories. The convenience of forming interest-based groups and seamlessly sharing content has made WhatsApp susceptible to exploitation by misinformation campaigns. While fact-checking remains a potent tool for identifying fabricated news, its efficacy falters in the face of the unprecedented deluge of information generated on the Internet today. In this work, we explore automatic ranking-based strategies to propose a "fakeness score" model as a means to help fact-checking agencies identify fake news stories shared through images on WhatsApp. Based on the results, we design a tool and integrate it into a real system that was used extensively for monitoring content during the 2018 Brazilian general election. Our experimental evaluation shows that this tool can reduce the effort required to identify 80% of the fake news in the data by up to 40%, compared to the mechanisms currently used by fact-checking agencies to select news stories to be checked.

Submitted 28 August, 2023; originally announced August 2023.

Comments: This is a preprint version of a manuscript accepted at the Brazilian Symposium on Multimedia and the Web (WebMedia). Please consider citing the published version instead of this one.
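The effort figure in this abstract can be made concrete with a small sketch: rank images by a score, walk down the list, and measure what fraction of it a fact-checker must inspect before finding 80% of the fake images. The scores and labels are toy values, and `effort_to_recall` is a hypothetical helper, not part of the authors' tool.

```python
import math
from typing import Sequence

def effort_to_recall(scores: Sequence[float], is_fake: Sequence[bool], target: float = 0.8) -> float:
    """Fraction of items checked, in descending score order, until `target` of the fakes are found."""
    ranked = sorted(zip(scores, is_fake), key=lambda pair: pair[0], reverse=True)
    needed = math.ceil(target * sum(is_fake))
    if needed == 0:
        return 0.0
    found = 0
    for checked, (_, fake) in enumerate(ranked, start=1):
        found += fake
        if found >= needed:
            return checked / len(ranked)
    return 1.0

labels = [True, True, True] + [False] * 7
model_scores = [0.9, 0.8, 0.7, 0.6, 0.5, 0.4, 0.3, 0.2, 0.1, 0.0]
baseline_scores = list(reversed(model_scores))  # a baseline that happens to rank the fakes last
model_effort = effort_to_recall(model_scores, labels)
baseline_effort = effort_to_recall(baseline_scores, labels)
print(model_effort, baseline_effort)  # 0.3 1.0
print(f"effort reduction: {1 - model_effort / baseline_effort:.0%}")  # effort reduction: 70%
```

The relative reduction `1 - model_effort / baseline_effort` is one natural way to express a statement like "reduce the effort by up to 40%"; the exact baseline used in the paper is not reproduced here.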

arXiv:2308.01849

Curricular Transfer Learning for Sentence Encoded Tasks

Authors: Jader Martins Camboim de Sá, Matheus Ferraroni Sanches, Rafael Roque de Souza, Júlio Cesar dos Reis, Leandro Aparecido Villas

Abstract: Fine-tuning language models on a downstream task is the standard approach for many state-of-the-art methodologies in the field of NLP. However, when the distribution drifts between the source task and the target task (e.g., in conversational environments), these gains tend to diminish. This article proposes a sequence of pre-training steps (a curriculum) guided by "data hacking" and grammar analysis that allows further gradual adaptation between pre-training distributions. In our experiments, our method achieves a considerable improvement over other known pre-training approaches on the MultiWOZ task.

Submitted 3 August, 2023; originally announced August 2023.

arXiv:2212.10913

Ensemble learning techniques for intrusion detection system in the context of cybersecurity

Authors: Andricson Abeline Moreira, Carlos A. C. Tojeiro, Carlos J. Reis, Gustavo Henrique Massaro, Igor Andrade Brito e Kelton A. P. da Costa

Abstract: Recently, there has been growing interest in improving the resources available in Intrusion Detection System (IDS) techniques. Several cybersecurity studies show that environment intrusions and information kidnapping are increasingly frequent and complex. Businesses whose operations depend on computing resources cannot tolerate vulnerable information. Cybersecurity has become indispensable in corporate technology, and Security teams deal daily with preventing the risk of intrusions into the environment. Thus, the main objective of the study was to investigate the Ensemble Learning technique using the Stacking method, supported by the Support Vector Machine (SVM) and k-Nearest Neighbour (kNN) algorithms, with the aim of optimizing DDoS attack detection results. To this end, the Intrusion Detection System concept was applied using the Orange Data Mining and Machine Learning tool to obtain better results.

Submitted 21 December, 2022; originally announced December 2022.

Comments: In Portuguese. CIACA - Conferencia Ibero-Americana Computação Aplicada 2022 Proceedings

arXiv:2209.11172

EEG-Based Epileptic Seizure Prediction Using Temporal Multi-Channel Transformers

Authors: Ricardo V. Godoy, Tharik J. S. Reis, Paulo H. Polegato, Gustavo J. G. Lahr, Ricardo L. Saute, Frederico N. Nakano, Helio R. Machado, Americo C. Sakamoto, Marcelo Becker, Glauco A. P. Caurin

Abstract: Epilepsy is one of the most common neurological diseases, characterized by transient and unprovoked events called epileptic seizures. Electroencephalography (EEG) is an auxiliary method used for both the diagnosis and the monitoring of epilepsy. Given the unexpected nature of epileptic seizures, predicting them would improve patient care, quality of life, and the treatment of epilepsy. Predicting an epileptic seizure implies identifying two distinct EEG states in a patient with epilepsy: the preictal and the interictal. In this paper, we developed two deep learning models called Temporal Multi-Channel Transformer (TMC-T) and Vision Transformer (TMC-ViT), adaptations of Transformer-based architectures for multi-channel temporal signals. Moreover, we assessed the impact of choosing different preictal durations, since this length is not a consensus among experts, and evaluated how each model benefits from the sample size. Our models are compared with fully connected, convolutional, and recurrent networks. The algorithms were trained in a patient-specific manner and evaluated on raw EEG signals from the CHB-MIT database. Experimental results and statistical validation demonstrated that our TMC-ViT model surpassed the CNN architecture, the state of the art in seizure prediction.

Submitted 17 September, 2022; originally announced September 2022.

Comments: 15 pages, 10 figures

MSC Class: 92C55 (Primary)

ACM Class: I.5.4

arXiv:2101.00963

doi 10.1109/ASONAM49781.2020.9381327

Characterizing (Un)moderated Textual Data in Social Systems

Authors: Lucas Henrique Costa de Lima, Julio Reis, Philipe Melo, Fabricio Murai, Fabricio Benevenuto

Abstract: Despite the valuable social interactions that online media promote, these systems provide space for speech that could be detrimental to different groups of people. The content moderation imposed by many social media platforms has motivated the emergence of a new social system for free speech named Gab, which lacks content moderation. This article characterizes and compares moderated textual data from Twitter with a set of unmoderated data from Gab. In particular, we analyze distinguishing characteristics of moderated and unmoderated content in terms of linguistic features, and evaluate hate speech and its different forms in both environments. Our work shows that unmoderated content presents different psycholinguistic features, more negative sentiment, and higher toxicity. Our findings suggest that unmoderated environments may have proportionally more online hate speech. We hope our analysis and findings contribute to the debate about hate speech and benefit systems aiming to deploy hate speech detection approaches.

Submitted 4 January, 2021; originally announced January 2021.

Comments: Accepted to IEEE/ACM International Conference on Advances in Social Networks Analysis and Mining (ASONAM, 2020)

arXiv:2012.12590

doi 10.1007/s10664-021-10110-5

Crowdsmelling: The use of collective knowledge in code smells detection

Authors: José Pereira dos Reis, Fernando Brito e Abreu, Glauco de Figueiredo Carneiro

Abstract: Code smells are seen as a major source of technical debt and, as such, should be detected and removed. However, researchers argue that the subjectiveness of the code smell detection process is a major hindrance to mitigating the problem of smell-infected code. We proposed the crowdsmelling approach, based on supervised machine learning techniques, where the wisdom of the crowd (of software developers) is used to collectively calibrate code smell detection algorithms, thereby lessening the subjectivity issue. This paper presents the results of a validation experiment for the crowdsmelling approach. Over three consecutive years of a Software Engineering course, a total "crowd" of around a hundred teams, with an average of three members each, classified the presence of 3 code smells (Long Method, God Class, and Feature Envy) in Java source code. These classifications were the basis of the oracles used for training six machine learning algorithms. Over one hundred models were generated and evaluated to determine which machine learning algorithms performed best in detecting each of the aforementioned code smells. Good performance was obtained for God Class detection (ROC=0.896 for Naive Bayes) and Long Method detection (ROC=0.870 for AdaBoostM1), but much lower for Feature Envy (ROC=0.570 for Random Forest). The results suggest that crowdsmelling is a feasible approach for the detection of code smells, but further validation experiments are required to cover more code smells and to increase external validity.

Submitted 23 December, 2020; originally announced December 2020.

MSC Class: D.2.7
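A minimal sketch of two steps the crowdsmelling abstract mentions: turning team classifications into an oracle label (here by simple majority vote, an assumption about how the oracle is built) and scoring a detector with the area under the ROC curve, computed with the pairwise Mann-Whitney formulation. The helper names and the detector scores are hypothetical.

```python
from typing import List, Sequence

def majority_oracle(votes_per_snippet: Sequence[Sequence[bool]]) -> List[bool]:
    """Label each code snippet as smelly if more than half of the teams said so."""
    return [sum(votes) * 2 > len(votes) for votes in votes_per_snippet]

def roc_auc(scores: Sequence[float], labels: Sequence[bool]) -> float:
    """Probability that a random positive is scored above a random negative (ties count half)."""
    pos = [s for s, y in zip(scores, labels) if y]
    neg = [s for s, y in zip(scores, labels) if not y]
    if not pos or not neg:
        raise ValueError("ROC AUC needs at least one positive and one negative example")
    wins = sum(1.0 if p > n else 0.5 if p == n else 0.0 for p in pos for n in neg)
    return wins / (len(pos) * len(neg))

votes = [[True, True, False], [False, False, True], [True, True, True], [False, False, False]]
oracle = majority_oracle(votes)          # [True, False, True, False]
detector_scores = [0.35, 0.4, 0.8, 0.1]  # hypothetical detector outputs for the four snippets
print(roc_auc(detector_scores, oracle))  # 0.75
```

The pairwise form is quadratic in the number of examples, which is fine for course-sized datasets; a sort-based rank computation gives the same value more efficiently.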

arXiv:2012.08842

doi 10.1007/s11831-021-09566-x

Code smells detection and visualization: A systematic literature review

Authors: José Pereira dos Reis, Fernando Brito e Abreu, Glauco de Figueiredo Carneiro, Craig Anslow

Abstract: Context: Code smells (CS) tend to compromise software quality and demand more effort from developers to maintain and evolve the application throughout its life-cycle. They have long been catalogued with corresponding mitigating solutions called refactoring operations. Objective: This SLR has a twofold goal: the first is to identify the main code smell detection techniques and tools discussed in the literature, and the second is to analyze to what extent visual techniques have been applied to support the former. Method: Over 83 primary studies indexed in major scientific repositories were identified by our search string in this SLR. Then, following existing best practices for secondary studies, we applied inclusion/exclusion criteria to select the most relevant works, extract their features, and classify them. Results: We found that the most commonly used approaches to code smell detection are search-based (30.1%) and metric-based (24.1%). Most of the studies (83.1%) use open-source software, with the Java language occupying the first position (77.1%). In terms of code smells, God Class (51.8%), Feature Envy (33.7%), and Long Method (26.5%) are the most covered ones. Machine learning techniques are used in 35% of the studies. Around 80% of the studies only detect code smells, without providing visualization techniques. Visualization-based approaches use several methods, such as city metaphors and 3D visualization techniques.
Conclusions: We confirm that the detection of CS is a non-trivial task, and there is still a lot of work to be done in terms of: reducing the subjectivity associated with the definition and detection of CS; increasing the diversity of detected CS and of supported programming languages; and constructing and sharing oracles and datasets to facilitate the replication of validation experiments for CS detection and visualization techniques.

Submitted 16 December, 2020; originally announced December 2020.

Comments: submitted to ARCO

ACM Class: D.2.7

arXiv:2010.15692

doi 10.1016/j.csi.2021.103587

Unveiling process insights from refactoring practices

Authors: João Caldeira, Fernando Brito e Abreu, Jorge Cardoso, José Reis

Abstract: Context: Software comprehension and maintenance activities, such as refactoring, are said to be negatively impacted by software complexity. The methods used to measure software product and process complexity have been thoroughly debated in the literature. However, understanding the possible links between these two dimensions, particularly the benefits of using the process perspective, still has a long way to go. Objective: To improve the understanding of the link between developers' activities and software complexity within a refactoring task, namely by evaluating whether process metrics gathered from the IDE, using process mining methods and tools, are suitable to accurately classify different refactoring practices and the resulting software complexity. Method: We mined source code metrics from a software product after a quality improvement task was given in parallel to 117 software developers, organized in 71 teams. Simultaneously, we collected events from their 320 IDE work sessions and used process mining to model their processes and extract the corresponding metrics. Results: Most teams using a refactoring plugin (JDeodorant) reduced software complexity more effectively and with simpler processes than those that refactored using only native Eclipse features. We found moderate correlations (43%) between software cyclomatic complexity and process cyclomatic complexity.
The best models found for predicting the refactoring method and the cyclomatic complexity level had accuracies of 92.95% and 94.36%, respectively. Conclusions: Our approach is agnostic to programming languages, geographic location, and development practices. Initial findings are encouraging and lead us to suggest that practitioners may use our method in other development tasks, such as defect analysis and unit or integration tests.

Submitted 29 October, 2020; originally announced October 2020.
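To illustrate the comparison in the Results, here is a sketch that computes McCabe's cyclomatic complexity M = E - N + 2P for a process-model graph (the same formula used for control-flow graphs) and the Pearson correlation between per-team code and process complexities. The graph encoding and the sample process are hypothetical; the authors' process-mining pipeline is not reproduced here.

```python
import math
from typing import Dict, List, Sequence, Set, Tuple

def cyclomatic_complexity(edges: List[Tuple[str, str]]) -> int:
    """McCabe's M = E - N + 2P, with P counted on the undirected version of the graph."""
    adjacency: Dict[str, Set[str]] = {}
    for a, b in edges:
        adjacency.setdefault(a, set()).add(b)
        adjacency.setdefault(b, set()).add(a)
    seen: Set[str] = set()
    components = 0
    for start in adjacency:
        if start in seen:
            continue
        components += 1
        stack = [start]
        while stack:  # iterative DFS marks one connected component
            node = stack.pop()
            if node not in seen:
                seen.add(node)
                stack.extend(adjacency[node] - seen)
    return len(edges) - len(adjacency) + 2 * components

def pearson(xs: Sequence[float], ys: Sequence[float]) -> float:
    """Pearson correlation coefficient of two equally long samples."""
    mx, my = sum(xs) / len(xs), sum(ys) / len(ys)
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sxx = sum((x - mx) ** 2 for x in xs)
    syy = sum((y - my) ** 2 for y in ys)
    return sxy / math.sqrt(sxx * syy)

# A process model with one XOR split/join: start -> (edit | refactor) -> test
process = [("start", "edit"), ("start", "refactor"), ("edit", "test"), ("refactor", "test")]
print(cyclomatic_complexity(process))  # 2
```

Given one code-complexity value and one process-complexity value per team, `pearson(code_cc, process_cc)` yields the kind of correlation coefficient reported above.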

arXiv:2007.10213

doi 10.1007/s11831-022-09864-y

Software Development Analytics in Practice: A Systematic Literature Review

Authors: Joao Caldeira, Fernando Brito e Abreu, Jorge Cardoso, Rachel Simões, Toacy Oliveira, José Reis

Abstract: Context: Software Development Analytics is a research area concerned with providing insights to improve product deliveries and processes. Many types of studies, data sources, and mining methods have been used for that purpose. Objective: This systematic literature review aims to provide an aggregate view of the relevant studies on Software Development Analytics in the past decade, with an emphasis on its application in practical settings. Method: Definition and execution of a search string upon several digital libraries, followed by quality assessment criteria to identify the most relevant papers. From those, we extracted a set of characteristics (study type, data source, study perspective, development life-cycle activities covered, stakeholders, mining methods, and analytics scope) and classified their impact against a taxonomy. Results: Source code repositories, experimental case studies, and developers are the most common data sources, study types, and stakeholders, respectively. Product and project managers are also often present, but less than expected. Mining methods are evolving rapidly, which is reflected in the long list identified. Descriptive statistics are the most common method, followed by correlation analysis. Since software development is an important process in every organization, it was unexpected to find that process mining was present in only one study. Most contributions to the software development life cycle were in the quality dimension. Time management and cost control were only lightly discussed.
The analysis of security aspects suggests it is a topic of increasing concern for practitioners. Risk management contributions are scarce. Conclusions: There is a wide margin for improvement for software development analytics in practice, for instance, mining and analyzing the activities performed by software developers in their actual workbench, the IDE.

Submitted 29 March, 2022; v1 submitted 20 July, 2020; originally announced July 2020.

Journal ref: Archives of Computational Methods in Engineering (ARCO), pp. 2041-2080, vol. 30, Springer, January 2023

arXiv:2006.02471

Can WhatsApp Benefit from Debunked Fact-Checked Stories to Reduce Misinformation?

Authors: Julio C. S. Reis, Philipe de Freitas Melo, Kiran Garimella, Fabrício Benevenuto

Abstract: WhatsApp has allegedly been widely used to spread misinformation and propaganda during elections in Brazil and India. Due to the private, encrypted nature of messages on WhatsApp, it is hard to track the dissemination of misinformation at scale. In this work, using public WhatsApp data, we observe that misinformation was widely shared in WhatsApp public groups even after it had already been fact-checked by popular fact-checking agencies. This represents a significant portion of the misinformation spread in both Brazil and India in the groups analyzed. We posit that such misinformation could be prevented if WhatsApp had a means to flag already fact-checked content. To this end, we propose an architecture that could be implemented by WhatsApp to counter such misinformation. Our proposal respects the current end-to-end encryption architecture on WhatsApp, thus protecting users' privacy while providing an approach to detect misinformation that benefits from fact-checking efforts.

Submitted 5 August, 2020; v1 submitted 3 June, 2020; originally announced June 2020.

Comments: This is a preprint version of a manuscript accepted at The Harvard Kennedy School (HKS) Misinformation Review. Please consider citing the published version instead of this one.
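One way to flag already fact-checked images without breaking end-to-end encryption is to do the matching on the user's device. The sketch below assumes that design and is a hypothetical illustration, not the authors' specification: the client computes a perceptual hash (here a simple average hash over an 8x8 grayscale thumbnail, an assumed choice) and compares it by Hamming distance against a locally stored set of hashes of debunked images. `average_hash`, `is_fact_checked`, and `max_distance` are illustrative names.

```python
from typing import Iterable, List

def average_hash(pixels: List[List[int]]) -> int:
    """64-bit average hash: one bit per pixel, set when the pixel is brighter than the mean."""
    flat = [p for row in pixels for p in row]
    mean = sum(flat) / len(flat)
    bits = 0
    for p in flat:
        bits = (bits << 1) | (1 if p > mean else 0)
    return bits

def hamming(a: int, b: int) -> int:
    """Number of differing bits between two hashes."""
    return bin(a ^ b).count("1")

def is_fact_checked(pixels: List[List[int]], known_hashes: Iterable[int], max_distance: int = 5) -> bool:
    """Flag an image whose hash is within max_distance bits of any known fact-checked image."""
    h = average_hash(pixels)
    return any(hamming(h, known) <= max_distance for known in known_hashes)

# Toy 8x8 thumbnail (a brightness gradient) standing in for a debunked image.
debunked = [[r * 8 + c for c in range(8)] for r in range(8)]
known = {average_hash(debunked)}
brighter_copy = [[p + 5 for p in row] for row in debunked]  # re-encoded or slightly edited copy
inverted = [[63 - p for p in row] for row in debunked]      # visually different image
print(is_fact_checked(brighter_copy, known), is_fact_checked(inverted, known))  # True False
```

A perceptual hash, unlike a cryptographic one, tolerates small edits such as brightness changes or recompression, which is what lets near-duplicate copies of a debunked image still match.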

arXiv:2005.14650

WhylSon: Proving your Michelson Smart Contracts in Why3

Authors: Luís Pedro Arrojado da Horta, João Santos Reis, Mário Pereira, Simão Melo de Sousa

Abstract: This paper introduces WhylSon, a deductive verification tool for smart contracts written in Michelson, the low-level language of the Tezos blockchain. WhylSon accepts a formally specified Michelson contract and automatically translates it into an equivalent program written in WhyML, the programming and specification language of the Why3 framework. Smart contract instructions are mapped into a corresponding WhyML shallow embedding of their axiomatic semantics, which we also developed in the context of this work. One major advantage of this approach is that it allows out-of-the-box integration with the Why3 framework, namely its VCGen and its backend support for several automated theorem provers. We also discuss the use of WhylSon to automatically prove the correctness of diverse annotated smart contracts.

Submitted 29 May, 2020; originally announced May 2020.

arXiv:2005.11839

Tezla, an Intermediate Representation for Static Analysis of Michelson Smart Contracts

Authors: João Santos Reis, Paul Crocker, Simão Melo de Sousa

Abstract: This paper introduces Tezla, an intermediate representation of Michelson smart contracts that eases the design of static smart contract analysers. This intermediate representation uses a store and preserves the semantics, flow, and resource usage of the original smart contract. This enables properties like gas consumption to be statically verified. We provide an automated decompiler of Michelson smart contracts to Tezla. In order to support our claim about the adequacy of Tezla, we develop a static analyser that takes advantage of the Tezla representation of Michelson smart contracts to prove simple but non-trivial properties.

Submitted 24 May, 2020; originally announced May 2020.
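Michelson is a stack-based language, while the Tezla representation uses a store. The toy translator below sketches how stack positions can be turned into named store variables for a handful of instructions (`PUSH`, `DUP`, `SWAP`, `DROP`, `ADD`); the output syntax, variable naming, and the function `to_store_ir` are hypothetical and do not reproduce Tezla's actual grammar or decompiler.

```python
from itertools import count
from typing import List, Tuple

def to_store_ir(instrs: List[Tuple], params: List[str]) -> Tuple[List[str], List[str]]:
    """Translate a straight-line Michelson-like program into assignments over named variables.

    The simulated stack holds variable names; the end of the list is the top of the stack.
    Returns the emitted assignments and the final symbolic stack.
    """
    stack = list(reversed(params))  # params[0] is the top of the initial stack
    code: List[str] = []
    ids = count(1)
    for op, *args in instrs:
        if op == "PUSH":
            var = f"v{next(ids)}"
            code.append(f"{var} = {args[0]}")
            stack.append(var)
        elif op == "DUP":
            stack.append(stack[-1])  # values are immutable, so sharing the name is enough
        elif op == "SWAP":
            stack[-1], stack[-2] = stack[-2], stack[-1]
        elif op == "DROP":
            stack.pop()
        elif op == "ADD":
            top, second = stack.pop(), stack.pop()
            var = f"v{next(ids)}"
            code.append(f"{var} = ADD {top} {second}")
            stack.append(var)
        else:
            raise ValueError(f"unsupported instruction: {op}")
    return code, stack

code, final_stack = to_store_ir([("PUSH", 1), ("DUP",), ("ADD",), ("SWAP",), ("DROP",)], ["param"])
print(code)         # ['v1 = 1', 'v2 = ADD v1 v1']
print(final_stack)  # ['v2']
```

Once every intermediate value has a name, pure stack-shuffling instructions (`DUP`, `SWAP`, `DROP`) emit no code at all, which is one reason a store-based form is easier for a static analyser to reason about.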

arXiv:2005.02443

A Dataset of Fact-Checked Images Shared on WhatsApp During the Brazilian and Indian Elections

Authors: Julio C. S. Reis, Philipe de Freitas Melo, Kiran Garimella, Jussara M. Almeida, Dean Eckles, Fabrício Benevenuto

Abstract: Recently, messaging applications such as WhatsApp have reportedly been abused by misinformation campaigns, especially in Brazil and India. A notable form of abuse in WhatsApp relies on manipulated images and memes containing all kinds of fake stories. In this work, we performed an extensive data collection from a large set of publicly accessible WhatsApp groups and fact-checking agency websites. This paper releases to the research community a novel dataset containing fact-checked fake images shared through WhatsApp in two distinct scenarios known for the spread of fake news on the platform: the 2018 Brazilian elections and the 2019 Indian elections.

Submitted 5 May, 2020; originally announced May 2020.

Comments: 7 pages. This is a preprint version of a paper accepted at ICWSM'20. Please consider citing the conference version instead of this one.

arXiv:1901.08969

A Zero-Shot Learning application in Deep Drawing process using Hyper-Process Model

Authors: João Reis, Gil Gonçalves

Abstract: One of the consequences of the shift from mass production to the mass customization paradigm in today's industrialized world is the need to increase the flexibility and responsiveness of manufacturing companies. High-mix/low-volume production forces constant accommodation of unknown product variants, which ultimately leads to long periods of machine calibration. The difficulty with machine calibration is that experience is required, together with a set of experiments, to meet the final product quality. Unfortunately, the number of possible combinations of machine parameters is so high that it is difficult to build empirical knowledge. For this reason, trial-and-error approaches are normally taken, making one-of-a-kind products unviable. Therefore, a Zero-Shot Learning (ZSL) based approach called hyper-process model (HPM), which learns the relation among multiple tasks, is used as a way to shorten the calibration phase. Assuming each product variant is a task to solve, first a shape analysis is performed on the data to learn common modes of deformation between tasks, and second a mapping between these modes and task descriptions is established. The present work has two main contributions: 1) the formulation of an industrial problem in a ZSL setting where new process models can be generated for process optimization, and 2) the definition of a regression problem in the domain of ZSL.
For that purpose, a simulated 2-D deep drawing process was used, based on data collected from the Abaqus simulator, where a significant number of process models were collected to test the effectiveness of the approach. The results show that it is possible to learn new tasks without any available data (labeled or unlabeled) by leveraging information about already existing tasks, which speeds up the calibration phase and allows new products to be integrated into manufacturing systems more quickly.

Submitted 24 January, 2019; originally announced January 2019.

Comments: 25 pages, 8 figures, 2 tables; submitted to ACM Transactions on Intelligent Systems and Technology. arXiv admin note: text overlap with arXiv:1810.10330

ACM Class: I.2.6; I.2.1

arXiv:1810.10330 [pdf, other]

Hyper-Process Model: A Zero-Shot Learning algorithm for Regression Problems based on Shape Analysis

Authors: Joao Reis, Gil Gonçalves

Abstract: Zero-shot learning (ZSL) can be defined as correctly solving a task for which no training data is available, based on previously acquired knowledge from different but related tasks. So far, this area has mostly drawn the attention of the computer vision community, where a new unseen image needs to be correctly classified, assuming the target class was not used in the training procedure. Apart from image classification, only a couple of generic methods have been proposed that are applicable to both classification and regression. These learn the relation among model coefficients so that new ones can be predicted according to provided conditions. To our knowledge, no methods exist that are applicable only to regression and take advantage of such a setting. Therefore, the present work proposes a novel algorithm for regression problems that uses data drawn from trained models instead of model coefficients. In this case, a shape analysis is performed on the data to create a statistical shape model and generate new shapes to train new models. The proposed algorithm is tested in a theoretical setting using the beta distribution, where the main problem is to estimate a function that predicts curves based on already learned, different but related ones.

Submitted 16 October, 2018; originally announced October 2018.

Comments: 36 pages, 4 figures, 2 tables, submitted to JMLR

MSC Class: 68T99

ACM Class: I.2.6; I.5.1

arXiv:1807.03688 [pdf, other]

Inside the Right-Leaning Echo Chambers: Characterizing Gab, an Unmoderated Social System

Authors: Lucas Lima, Julio C. S. Reis, Philipe Melo, Fabricio Murai, Leandro Araújo, Pantelis Vikatos, Fabrício Benevenuto

Abstract: The moderation of content in many social media systems, such as Twitter and Facebook, motivated the emergence of a new social network system that promotes free speech, named Gab. Soon afterwards, Gab was removed from the Google Play Store for violating the company's hate speech policy, and it was rejected by Apple for similar reasons. In this paper we characterize Gab, aiming to understand who the users who joined it are and what kind of content they share in this system. Our findings show that Gab is a very politically oriented system that hosts users banned from other social networks, some of them due to possible cases of hate speech and association with extremism. We provide the first measurement of news dissemination inside a right-leaning echo chamber, investigating a social media platform where readers are rarely exposed to content that cuts across ideological lines and are instead fed content that reinforces their current political or social views.

Submitted 10 July, 2018; originally announced July 2018.

Comments: This is a preprint of a paper that will appear at ASONAM'18

arXiv:1803.05957 [pdf, ps, other]

doi 10.1109/JLT.2018.2869245

Interplay of Probabilistic Shaping and the Blind Phase Search Algorithm

Authors: Darli A. A. Mello, Fabio A. Barbosa, Jacklyn D. Reis

Abstract: Probabilistic shaping (PS) is a promising technique to approach the Shannon limit using typical constellation geometries. However, the impact of PS on the chain of signal processing algorithms of a coherent receiver still needs further investigation. In this work we study the interplay of PS and phase recovery using the blind phase search (BPS) algorithm, which is widely used in optical communications systems. We first investigate a supervised phase search (SPS) algorithm as a theoretical upper bound on the BPS performance, assuming perfect decisions. It is shown that PS influences the SPS algorithm, but its impact can be alleviated by moderate noise rejection window sizes. On the other hand, BPS is affected by PS even for long windows because of correlated erroneous decisions in the phase recovery scheme. The simulation results also show that the capacity-maximizing shaping is close to the BPS worst-case situation for square-QAM constellations, causing potential implementation penalties.

Submitted 12 September, 2018; v1 submitted 15 March, 2018; originally announced March 2018.

Comments: Accepted for publication in the next available issue of the IEEE/OSA Journal of Lightwave Technology (https://ieeexplore.ieee.org/document/8457202/)

arXiv:1705.03972 [pdf, other]

doi 10.1145/3078714.3078734

Demographics of News Sharing in the U.S. Twittersphere

Authors: Julio C. S. Reis, Haewoon Kwak, Jisun An, Johnnatan Messias, Fabricio Benevenuto

Abstract: The widespread adoption and dissemination of online news through social media systems have been revolutionizing many segments of our society and ultimately our daily lives. In these systems, users can play a central role as they share content with their friends. Despite that, little is known about news spreaders in social media. In this paper, we provide a first-of-its-kind in-depth characterization of news spreaders in social media. In particular, we investigate their demographics, what kind of content they share, and the audience they reach. Among our main findings, we show that male and white users tend to be more active in sharing news, biasing the news audience toward the interests of these demographic groups. Our results also quantify differences in news-sharing interests across demographics, which has implications for personalized news digests.

Submitted 10 May, 2017; originally announced May 2017.

arXiv:1503.07921 [pdf, other]

Breaking the News: First Impressions Matter on Online News

Authors: Julio Reis, Fabrício Benevenuto, Pedro O. S. Vaz de Melo, Raquel Prates, Haewoon Kwak, Jisun An

Abstract: A growing number of people are changing the way they consume news, replacing traditional physical newspapers and magazines with their online versions and/or weblogs. The interactivity and immediacy of online news are changing the way news is produced and presented by media corporations. News websites have to create effective strategies to catch people's attention and attract their clicks. In this paper we investigate possible strategies used by online news corporations in the design of their news headlines. We analyze the content of 69,907 headlines produced by four major global media corporations during a minimum of eight consecutive months in 2014. To discover strategies that could be used to attract clicks, we extracted features related to sentiment polarity from the text of the headlines. We discovered that the sentiment of a headline is strongly related to the popularity of the news and also to the dynamics of the comments posted on that particular news item.

Submitted 16 April, 2015; v1 submitted 26 March, 2015; originally announced March 2015.

Comments: The paper appears in ICWSM 2015

Showing 1–22 of 22 results for author: Reis, J