-
Automatic Labels are as Effective as Manual Labels in Biomedical Images Classification with Deep Learning
Authors:
Niccolò Marini,
Stefano Marchesin,
Lluis Borras Ferris,
Simon Püttmann,
Marek Wodzinski,
Riccardo Fratti,
Damian Podareanu,
Alessandro Caputo,
Svetla Boytcheva,
Simona Vatrano,
Filippo Fraggetta,
Iris Nagtegaal,
Gianmaria Silvello,
Manfredo Atzori,
Henning Müller
Abstract:
The increasing availability of biomedical data is helping to design more robust deep learning (DL) algorithms to analyze biomedical samples. Currently, one of the main limitations in training DL algorithms to perform a specific task is the need for medical experts to label data. Automatic methods to label data exist; however, automatic labels can be noisy, and it is not completely clear when automatic labels can be adopted to train DL models. This paper aims to investigate under which circumstances automatic labels can be adopted to train a DL model on the classification of Whole Slide Images (WSI). The analysis involves multiple architectures, such as Convolutional Neural Networks (CNN) and Vision Transformer (ViT), and over 10000 WSIs, collected from three use cases: celiac disease, lung cancer and colon cancer, including, respectively, binary, multiclass and multilabel data. The results identify 10% as the percentage of noisy labels that still leads to competitive models for the classification of WSIs. Therefore, an algorithm generating automatic labels needs to fit this criterion to be adopted. The application of the Semantic Knowledge Extractor Tool (SKET) algorithm to generate automatic labels leads to performance comparable to the one obtained with manual labels, since it generates a percentage of noisy labels between 2% and 5%. Automatic labels are as effective as manual ones, reaching solid performance comparable to that obtained by training models with manual labels.
Submitted 20 June, 2024;
originally announced June 2024.
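To make the 10% noise figure in the abstract above concrete, the following minimal sketch flips a fixed fraction of binary labels and measures the resulting disagreement with the clean labels. It is an illustration only, not the paper's experimental protocol: the function names, the uniform random flipping, and the toy label list are all assumptions introduced here.

```python
import random


def inject_label_noise(labels, noise_rate, seed=0):
    """Return a copy of binary (0/1) labels with a fixed fraction flipped.

    Hypothetical helper: flips round(len(labels) * noise_rate) labels
    chosen uniformly at random, which is an assumed noise model.
    """
    rng = random.Random(seed)
    n_flip = round(len(labels) * noise_rate)
    flip_idx = set(rng.sample(range(len(labels)), n_flip))
    return [1 - y if i in flip_idx else y for i, y in enumerate(labels)]


def disagreement_rate(a, b):
    """Fraction of positions where two label lists differ."""
    return sum(x != y for x, y in zip(a, b)) / len(a)


# Toy data: 200 alternating binary labels standing in for manual labels.
clean = [i % 2 for i in range(200)]
noisy = inject_label_noise(clean, noise_rate=0.10)
print(disagreement_rate(clean, noisy))
```

A model trained on `noisy` and evaluated against `clean` would give one data point for the kind of noise-tolerance comparison the abstract describes.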
-
Learning to Rank from Relevance Judgments Distributions
Authors:
Alberto Purpura,
Gianmaria Silvello,
Gian Antonio Susto
Abstract:
Learning to Rank (LETOR) algorithms are usually trained on annotated corpora where a single relevance label is assigned to each available document-topic pair. Within the Cranfield framework, relevance labels result from merging either multiple expertly curated or crowdsourced human assessments. In this paper, we explore how to train LETOR models with relevance judgments distributions (either real or synthetically generated) assigned to document-topic pairs instead of single-valued relevance labels. We propose five new probabilistic loss functions to deal with the higher expressive power provided by relevance judgments distributions and show how they can be applied both to neural and GBM architectures. Moreover, we show how training a LETOR model on a sampled version of the relevance judgments from certain probability distributions can improve its performance when relying either on traditional or probabilistic loss functions. Finally, we validate our hypothesis on real-world crowdsourced relevance judgments distributions. Overall, we observe that relying on relevance judgments distributions to train different LETOR models can boost their performance and even outperform strong baselines such as LambdaMART on several test collections.
Submitted 13 February, 2022;
originally announced February 2022.
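As an illustration of training against a relevance judgments distribution rather than a single label, the sketch below computes a KL-divergence loss between a distribution of assessor judgments over relevance grades and a model's predicted distribution. This is a generic probabilistic loss chosen for illustration; it is not claimed to be one of the five loss functions proposed in the paper, and the grade scale, logits, and function names are assumptions.

```python
import math


def softmax(logits):
    """Convert raw scores into a probability distribution."""
    m = max(logits)  # subtract the max for numerical stability
    exps = [math.exp(z - m) for z in logits]
    total = sum(exps)
    return [e / total for e in exps]


def kl_divergence(target, predicted, eps=1e-12):
    """KL(target || predicted); terms with zero target mass contribute 0."""
    return sum(
        t * math.log(t / max(p, eps))
        for t, p in zip(target, predicted)
        if t > 0
    )


# Assumed example: share of assessors choosing grades 0, 1, 2
# for one document-topic pair.
judgments = [0.1, 0.6, 0.3]
# Assumed model output: one logit per relevance grade.
logits = [0.0, 1.0, 0.5]

loss = kl_divergence(judgments, softmax(logits))
print(loss)
```

The loss is zero only when the predicted distribution matches the judgments distribution, so disagreement among assessors is preserved as a training signal instead of being collapsed into one label.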
-
Algorithmic Fairness Datasets: the Story so Far
Authors:
Alessandro Fabris,
Stefano Messina,
Gianmaria Silvello,
Gian Antonio Susto
Abstract:
Data-driven algorithms are studied in diverse domains to support critical decisions, directly impacting people's well-being. As a result, a growing community of researchers has been investigating the equity of existing algorithms and proposing novel ones, advancing the understanding of risks and opportunities of automated decision-making for historically disadvantaged populations. Progress in fair Machine Learning hinges on data, which can be appropriately used only if adequately documented. Unfortunately, the algorithmic fairness community suffers from a collective data documentation debt caused by a lack of information on specific resources (opacity) and scatteredness of available information (sparsity). In this work, we target data documentation debt by surveying over two hundred datasets employed in algorithmic fairness research, and producing standardized and searchable documentation for each of them. Moreover, we rigorously identify the three most popular fairness datasets, namely Adult, COMPAS and German Credit, for which we compile in-depth documentation.
This unifying documentation effort supports multiple contributions. Firstly, we summarize the merits and limitations of Adult, COMPAS and German Credit, adding to and unifying recent scholarship, calling into question their suitability as general-purpose fairness benchmarks. Secondly, we document and summarize hundreds of available alternatives, annotating their domain and supported fairness tasks, along with additional properties of interest for fairness researchers. Finally, we analyze these datasets from the perspective of five important data curation topics: anonymization, consent, inclusivity, sensitive attributes, and transparency. We discuss different approaches and levels of attention to these topics, making them tangible, and distill them into a set of best practices for the curation of novel resources.
Submitted 26 September, 2022; v1 submitted 3 February, 2022;
originally announced February 2022.
-
Incentives for Item Duplication under Fair Ranking Policies
Authors:
Giorgio Maria Di Nunzio,
Alessandro Fabris,
Gianmaria Silvello,
Gian Antonio Susto
Abstract:
Ranking is a fundamental operation in information access systems, to filter information and direct user attention towards items deemed most relevant to them. Due to position bias, items of similar relevance may receive significantly different exposure, raising fairness concerns for item providers and motivating recent research into fair ranking. While the area has progressed dramatically over recent years, no study to date has investigated the potential problem posed by duplicated items. Duplicates and near-duplicates are common in several domains, including marketplaces and document collections available to search engines. In this work, we study the behaviour of different fair ranking policies in the presence of duplicates, quantifying the extra-exposure gained by redundant items. We find that fairness-aware ranking policies may conflict with diversity, due to their potential to incentivize duplication more than policies solely focused on relevance. This fact poses a problem for system owners who, as a result of this incentive, may have to deal with increased redundancy, which is at odds with user satisfaction. Finally, we argue that this aspect represents a blind spot in the normative reasoning underlying common fair ranking metrics, as rewarding providers who duplicate their items with increased exposure seems unfair for the remaining providers.
Submitted 29 October, 2021;
originally announced October 2021.
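The notion of extra exposure gained by a duplicated item can be sketched with a simple position-bias model. The example below assumes a logarithmic discount of 1/log2(rank + 1) per position and sums it per provider; the discount, the provider names, and the fixed rankings are illustrative assumptions, and the sketch does not implement any of the fair ranking policies studied in the paper.

```python
import math
from collections import defaultdict


def provider_exposure(ranking):
    """Sum position-discounted exposure per provider.

    `ranking` lists the provider of the item at each rank (rank 1 first).
    The discount 1 / log2(rank + 1) is an assumed position-bias model.
    """
    exposure = defaultdict(float)
    for rank, provider in enumerate(ranking, start=1):
        exposure[provider] += 1.0 / math.log2(rank + 1)
    return dict(exposure)


# Hypothetical rankings: provider B adds a duplicate of its item,
# which lands at the bottom of the list.
original = ["A", "B", "C"]
with_duplicate = ["A", "B", "C", "B"]

before = provider_exposure(original)
after = provider_exposure(with_duplicate)
print(after["B"] - before["B"])  # extra exposure gained by duplicating
```

Under this model a duplicate always adds some exposure for its provider; the abstract's concern is that fairness-aware policies may amplify this gain relative to policies focused solely on relevance.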
-
Algorithmic Audit of Italian Car Insurance: Evidence of Unfairness in Access and Pricing
Authors:
Alessandro Fabris,
Alan Mishler,
Stefano Gottardi,
Mattia Carletti,
Matteo Daicampi,
Gian Antonio Susto,
Gianmaria Silvello
Abstract:
We conduct an audit of pricing algorithms employed by companies in the Italian car insurance industry, primarily by gathering quotes through a popular comparison website. While acknowledging the complexity of the industry, we find evidence of several problematic practices. We show that birthplace and gender have a direct and sizeable impact on the prices quoted to drivers, despite national and international regulations against their use. Birthplace, in particular, is used quite frequently to the disadvantage of foreign-born drivers and drivers born in certain Italian cities. In extreme cases, a driver born in Laos may be charged 1,000 euros more than a driver born in Milan, all else being equal. For a subset of our sample, we collect quotes directly on a company website, where the direct influence of gender and birthplace is confirmed. Finally, we find that drivers with riskier profiles tend to see fewer quotes in the aggregator result pages, substantiating concerns of differential treatment raised in the past by Italian insurance regulators.
Submitted 21 May, 2021;
originally announced May 2021.
-
Neural Feature Selection for Learning to Rank
Authors:
Alberto Purpura,
Karolina Buchner,
Gianmaria Silvello,
Gian Antonio Susto
Abstract:
LEarning TO Rank (LETOR) is a research area in the field of Information Retrieval (IR) where machine learning models are employed to rank a set of items. In the past few years, neural LETOR approaches have become a competitive alternative to traditional ones like LambdaMART. However, the performance of neural architectures grew proportionally to their complexity and size. This can be an obstacle to their adoption in large-scale search systems, where model size impacts latency and update time. For this reason, we propose an architecture-agnostic approach based on a neural LETOR model to reduce the size of its input by up to 60% without affecting the system performance. This approach also reduces the complexity of a LETOR model and, therefore, its training and inference time by up to 50%.
Submitted 22 February, 2021;
originally announced February 2021.
-
Gender Stereotype Reinforcement: Measuring the Gender Bias Conveyed by Ranking Algorithms
Authors:
Alessandro Fabris,
Alberto Purpura,
Gianmaria Silvello,
Gian Antonio Susto
Abstract:
Search Engines (SE) have been shown to perpetuate well-known gender stereotypes identified in psychology literature and to influence users accordingly. Similar biases were found encoded in Word Embeddings (WEs) learned from large online corpora. In this context, we propose the Gender Stereotype Reinforcement (GSR) measure, which quantifies the tendency of a SE to support gender stereotypes, leveraging gender-related information encoded in WEs. Through the critical lens of construct validity, we validate the proposed measure on synthetic and real collections. Subsequently, we use GSR to compare widely-used Information Retrieval ranking algorithms, including lexical, semantic, and neural models. We check if and how ranking algorithms based on WEs inherit the biases of the underlying embeddings. We also consider the most common debiasing approaches for WEs proposed in the literature and test their impact in terms of GSR and common performance measures. To the best of our knowledge, GSR is the first specifically tailored measure for IR, capable of quantifying representational harms.
Submitted 2 September, 2020;
originally announced September 2020.
-
A Relation Extraction Approach for Clinical Decision Support
Authors:
Maristella Agosti,
Giorgio Maria Di Nunzio,
Stefano Marchesin,
Gianmaria Silvello
Abstract:
In this paper, we investigate how semantic relations between concepts extracted from medical documents can be employed to improve the retrieval of medical literature. Semantic relations explicitly represent relatedness between concepts and carry high informative power that can be leveraged to improve the effectiveness of retrieval functionalities of clinical decision support systems. We present preliminary results and show how relations are able to provide a sizable increase of the precision for several topics, albeit having no impact on others. We then discuss some future directions to minimize the impact of negative results while maximizing the impact of good results.
Submitted 3 May, 2019;
originally announced May 2019.
-
A Progressive Visual Analytics Tool for Incremental Experimental Evaluation
Authors:
Fabio Giachelle,
Gianmaria Silvello
Abstract:
This paper presents a visual tool, AVIATOR, that integrates the progressive visual analytics paradigm into the IR evaluation process. This tool serves to speed up and facilitate the performance assessment of retrieval models, enabling result analysis through visual facilities. AVIATOR goes one step beyond the common "compute wait visualize" analytics paradigm, introducing a continuous evaluation mechanism that minimizes human and computational resource consumption.
Submitted 18 April, 2019;
originally announced April 2019.
-
An InfoVis Tool for Interactive Component-Based Evaluation
Authors:
Giacomo Rocco,
Gianmaria Silvello
Abstract:
In this paper, we present an InfoVis tool based on Sankey diagrams for the exploration of large combinatorial combinations of IR components - the Grid of Points (GoP). The goal of this tool is to ease the comprehension of the behavior of single IR components within fully functioning off-the-shelf IR systems without resorting to complex statistical tools.
Submitted 31 January, 2019;
originally announced January 2019.
-
Theory and Practice of Data Citation
Authors:
Gianmaria Silvello
Abstract:
Citations are the cornerstone of knowledge propagation and the primary means of assessing the quality of research, as well as directing investments in science. Science is increasingly becoming "data-intensive", where large volumes of data are collected and analyzed to discover complex patterns through simulations and experiments, and most scientific reference works have been replaced by online curated datasets. Yet, given a dataset, there is no quantitative, consistent and established way of knowing how it has been used over time, who contributed to its curation, what results have been yielded or what value it has.
The development of a theory and practice of data citation is fundamental for considering data as first-class research objects with the same relevance and centrality of traditional scientific products. Many works in recent years have discussed data citation from different viewpoints: illustrating why data citation is needed, defining the principles and outlining recommendations for data citation systems, and providing computational methods for addressing specific issues of data citation.
The current panorama is many-faceted and an overall view that brings together diverse aspects of this topic is still missing. Therefore, this paper aims to describe the lay of the land for data citation, both from the theoretical (the why and what) and the practical (the how) angle.
Submitted 24 June, 2017;
originally announced June 2017.