Showing 1–18 of 18 results for author: Demartini, G

  1. arXiv:2406.00046  [pdf, other]

    cs.CL cs.LG

    Hate Speech Detection with Generalizable Target-aware Fairness

    Authors: Tong Chen, Danny Wang, Xurong Liang, Marten Risius, Gianluca Demartini, Hongzhi Yin

    Abstract: To counter the side effect brought by the proliferation of social media platforms, hate speech detection (HSD) plays a vital role in halting the dissemination of toxic online posts at an early stage. However, given the ubiquitous topical communities on social media, a trained HSD classifier easily becomes biased towards specific targeted groups (e.g., female and black people), where a high rate of…

    Submitted 11 June, 2024; v1 submitted 28 May, 2024; originally announced June 2024.

    Comments: To appear in KDD 2024

  2. arXiv:2401.08993  [pdf, other]

    cs.CY cs.IR

    Estimating Gender Completeness in Wikipedia

    Authors: Hrishikesh Patel, Tianwa Chen, Ivano Bongiovanni, Gianluca Demartini

    Abstract: Gender imbalance in Wikipedia content is a known challenge which the editor community is actively addressing. The aim of this paper is to provide the Wikipedia community with instruments to estimate the magnitude of the problem for different entity types (also known as classes) in Wikipedia. To this end, we apply class completeness estimation methods based on the gender attribute. Our results show…

    Submitted 17 January, 2024; originally announced January 2024.

  3. arXiv:2401.02986  [pdf, other]

    cs.CL cs.AI

    Identification of Regulatory Requirements Relevant to Business Processes: A Comparative Study on Generative AI, Embedding-based Ranking, Crowd and Expert-driven Methods

    Authors: Catherine Sai, Shazia Sadiq, Lei Han, Gianluca Demartini, Stefanie Rinderle-Ma

    Abstract: Organizations face the challenge of ensuring compliance with an increasing amount of requirements from various regulatory documents. Which requirements are relevant depends on aspects such as the geographic location of the organization, its domain, size, and business processes. Considering these contextual factors, as a first step, relevant documents (e.g., laws, regulations, directives, policies)…

    Submitted 2 January, 2024; originally announced January 2024.

  4. arXiv:2305.09686  [pdf, other]

    cs.LG cs.AI cs.IR

    Data Bias Management

    Authors: Gianluca Demartini, Kevin Roitero, Stefano Mizzaro

    Abstract: Due to the widespread use of data-powered systems in our everyday lives, concepts like bias and fairness gained significant attention among researchers and practitioners, in both industry and academia. Such issues typically emerge from the data, which comes with varying levels of quality, used to train supervised machine learning systems. With the commercialization and deployment of such systems t…

    Submitted 15 May, 2023; originally announced May 2023.

    Comments: Accepted in May 2023 for publication in CACM

  5. arXiv:2305.01595  [pdf, other]

    cs.CV cs.CY cs.LG

    On the Impact of Data Quality on Image Classification Fairness

    Authors: Aki Barry, Lei Han, Gianluca Demartini

    Abstract: With the proliferation of algorithmic decision-making, increased scrutiny has been placed on these systems. This paper explores the relationship between the quality of the training data and the overall fairness of the models trained with such data in the context of supervised classification. We measure key fairness metrics across a range of algorithms over multiple image classification datasets th…

    Submitted 2 May, 2023; originally announced May 2023.

  6. Perspectives on Large Language Models for Relevance Judgment

    Authors: Guglielmo Faggioli, Laura Dietz, Charles Clarke, Gianluca Demartini, Matthias Hagen, Claudia Hauff, Noriko Kando, Evangelos Kanoulas, Martin Potthast, Benno Stein, Henning Wachsmuth

    Abstract: When asked, large language models (LLMs) like ChatGPT claim that they can assist with relevance judgments but it is not clear whether automated judgments can reliably be used in evaluations of retrieval systems. In this perspectives paper, we discuss possible ways for LLMs to support relevance judgments along with concerns and issues that arise. We devise a human--machine collaboration spectrum th…

    Submitted 18 November, 2023; v1 submitted 13 April, 2023; originally announced April 2023.

    ACM Class: H.3.3

  7. arXiv:2208.09214  [pdf, other]

    cs.IR cs.AI cs.DB

    Crowdsourced Fact-Checking at Twitter: How Does the Crowd Compare With Experts?

    Authors: Mohammed Saeed, Nicolas Traub, Maelle Nicolas, Gianluca Demartini, Paolo Papotti

    Abstract: Fact-checking is one of the effective solutions in fighting online misinformation. However, traditional fact-checking is a process requiring scarce expert human resources, and thus does not scale well on social media because of the continuous flow of new content to be checked. Methods based on crowdsourcing have been proposed to tackle this challenge, as they can scale with a smaller cost, but, wh…

    Submitted 19 August, 2022; originally announced August 2022.

    Journal ref: Proceedings of the 31st ACM International Conference on Information and Knowledge Management (CIKM 2022)

  8. arXiv:2110.13504  [pdf, ps, other]

    cs.IR

    Managing Bias in Human-Annotated Data: Moving Beyond Bias Removal

    Authors: Gianluca Demartini, Kevin Roitero, Stefano Mizzaro

    Abstract: Due to the widespread use of data-powered systems in our everyday lives, the notions of bias and fairness gained significant attention among researchers and practitioners, in both industry and academia. Such issues typically emerge from the data, which comes with varying levels of quality, used to train systems. With the commercialization and employment of such systems that are sometimes delegated…

    Submitted 26 October, 2021; originally announced October 2021.

    Comments: Accepted at CSCW 2021 Workshop Investigating and Mitigating Biases in Crowdsourced Data, October 23, 2021, Virtual

  9. The Many Dimensions of Truthfulness: Crowdsourcing Misinformation Assessments on a Multidimensional Scale

    Authors: Michael Soprano, Kevin Roitero, David La Barbera, Davide Ceolin, Damiano Spina, Stefano Mizzaro, Gianluca Demartini

    Abstract: Recent work has demonstrated the viability of using crowdsourcing as a tool for evaluating the truthfulness of public statements. Under certain conditions such as: (1) having a balanced set of workers with different backgrounds and cognitive abilities; (2) using an adequate set of mechanisms to control the quality of the collected data; and (3) using a coarse grained assessment scale, the crowd ca…

    Submitted 23 August, 2021; v1 submitted 2 August, 2021; originally announced August 2021.

    Comments: 33 pages; Paper accepted at Information Processing & Management on July 28, 2021; IP&M Special Issue on Dis/Misinformation Mining from Social Media

    MSC Class: 68P20; ACM Class: H.3

    Journal ref: Information Processing & Management, Volume 58, Issue 6, November 2021, 102710

  10. arXiv:2107.13519  [pdf, other]

    cs.HC

    On the state of reporting in crowdsourcing experiments and a checklist to aid current practices

    Authors: Jorge Ramírez, Burcu Sayin, Marcos Baez, Fabio Casati, Luca Cernuzzi, Boualem Benatallah, Gianluca Demartini

    Abstract: Crowdsourcing is being increasingly adopted as a platform to run studies with human subjects. Running a crowdsourcing experiment involves several choices and strategies to successfully port an experimental design into an otherwise uncontrolled research environment, e.g., sampling crowd workers, mapping experimental conditions to micro-tasks, or ensure quality contributions. While several guideline…

    Submitted 9 September, 2021; v1 submitted 28 July, 2021; originally announced July 2021.

    Comments: Accepted to CSCW 2021

  11. Can the Crowd Judge Truthfulness? A Longitudinal Study on Recent Misinformation about COVID-19

    Authors: Kevin Roitero, Michael Soprano, Beatrice Portelli, Massimiliano De Luise, Damiano Spina, Vincenzo Della Mea, Giuseppe Serra, Stefano Mizzaro, Gianluca Demartini

    Abstract: Recently, the misinformation problem has been addressed with a crowdsourcing-based approach: to assess the truthfulness of a statement, instead of relying on a few experts, a crowd of non-expert is exploited. We study whether crowdsourcing is an effective and reliable method to assess truthfulness during a pandemic, targeting statements related to COVID-19, thus addressing (mis)information that is…

    Submitted 19 September, 2021; v1 submitted 25 July, 2021; originally announced July 2021.

    Comments: 31 pages; Preprint of an article accepted in Personal and Ubiquitous Computing (Special Issue on Intelligent Systems for Tackling Online Harms, 2021). arXiv admin note: substantial text overlap with arXiv:2008.05701

    MSC Class: 68P20; ACM Class: H.3

  12. The COVID-19 Infodemic: Can the Crowd Judge Recent Misinformation Objectively?

    Authors: Kevin Roitero, Michael Soprano, Beatrice Portelli, Damiano Spina, Vincenzo Della Mea, Giuseppe Serra, Stefano Mizzaro, Gianluca Demartini

    Abstract: Misinformation is an ever increasing problem that is difficult to solve for the research community and has a negative impact on the society at large. Very recently, the problem has been addressed with a crowdsourcing-based approach to scale up labeling efforts: to assess the truthfulness of a statement, instead of relying on a few experts, a crowd of (non-expert) judges is exploited. We follow the…

    Submitted 13 August, 2020; originally announced August 2020.

    Comments: 10 pages; Preprint of the full paper accepted at CIKM 2020

    MSC Class: 68P20; ACM Class: H.3

  13. arXiv:2007.11659   

    cs.IR cs.DB

    Proceedings of the KG-BIAS Workshop 2020 at AKBC 2020

    Authors: Edgar Meij, Tara Safavi, Chenyan Xiong, Gianluca Demartini, Miriam Redi, Fatma Özcan

    Abstract: The KG-BIAS 2020 workshop touches on biases and how they surface in knowledge graphs (KGs), biases in the source data that is used to create KGs, methods for measuring or remediating bias in KGs, but also identifying other biases such as how and which languages are represented in automatically constructed KGs or how personal KGs might incur inherent biases. The goal of this workshop is to uncover…

    Submitted 18 June, 2020; originally announced July 2020.

  14. Can The Crowd Identify Misinformation Objectively? The Effects of Judgment Scale and Assessor's Background

    Authors: Kevin Roitero, Michael Soprano, Shaoyang Fan, Damiano Spina, Stefano Mizzaro, Gianluca Demartini

    Abstract: Truthfulness judgments are a fundamental step in the process of fighting misinformation, as they are crucial to train and evaluate classifiers that automatically distinguish true and false statements. Usually such judgments are made by experts, like journalists for political statements or medical doctors for medical statements. In this paper, we follow a different approach and rely on (non-expert)…

    Submitted 24 June, 2020; v1 submitted 14 May, 2020; originally announced May 2020.

    Comments: Preprint of the full paper accepted at SIGIR 2020

    MSC Class: 68P20; ACM Class: H.3

  15. Non-Parametric Class Completeness Estimators for Collaborative Knowledge Graphs -- The Case of Wikidata

    Authors: Michael Luggen, Djellel Difallah, Cristina Sarasua, Gianluca Demartini, Philippe Cudré-Mauroux

    Abstract: Collaborative Knowledge Graph platforms allow humans and automated scripts to collaborate in creating, updating and interlinking entities and facts. To ensure both the completeness of the data as well as a uniform coverage of the different topics, it is crucial to identify underrepresented classes in the Knowledge Graph. In this paper, we tackle this problem by developing statistical techniques fo…

    Submitted 3 September, 2019; originally announced September 2019.

  16. arXiv:1710.09788  [pdf, other]

    cs.AI

    FashionBrain Project: A Vision for Understanding Europe's Fashion Data Universe

    Authors: Alessandro Checco, Gianluca Demartini, Alexander Loeser, Ines Arous, Mourad Khayati, Matthias Dantone, Richard Koopmanschap, Svetlin Stalinov, Martin Kersten, Ying Zhang

    Abstract: A core business in the fashion industry is the understanding and prediction of customer needs and trends. Search engines and social networks are at the same time a fundamental bridge and a costly middleman between the customer's purchase intention and the retailer. To better exploit Europe's distinctive characteristics e.g., multiple languages, fashion and cultural differences, it is pivotal to re…

    Submitted 26 October, 2017; originally announced October 2017.

  17. arXiv:1609.02171  [pdf, other]

    cs.IR cs.HC

    The Effect of Class Imbalance and Order on Crowdsourced Relevance Judgments

    Authors: Rehab K. Qarout, Alessandro Checco, Gianluca Demartini

    Abstract: In this paper we study the effect on crowd worker efficiency and effectiveness of the dominance of one class in the data they process. We aim at understanding if there is any positive or negative bias in workers seeing many negative examples in the identification of positive labels. To test our hypothesis, we design an experiment where crowd workers are asked to judge the relevance of documents pr…

    Submitted 4 September, 2016; originally announced September 2016.

  18. arXiv:1609.00683  [pdf, other]

    cs.IR cs.HC

    Pairwise, Magnitude, or Stars: What's the Best Way for Crowds to Rate?

    Authors: Alessandro Checco, Gianluca Demartini

    Abstract: We compare three popular techniques of rating content: the ubiquitous five star rating, the less used pairwise comparison, and the recently introduced (in crowdsourcing) magnitude estimation approach. Each system has specific advantages and disadvantages, in terms of required user effort, achievable user preference prediction accuracy and number of ratings required. We design an experiment where…

    Submitted 2 September, 2016; originally announced September 2016.