Individual Fairness Guarantees for Neural Networks
Authors:
Elias Benussi,
Andrea Patane,
Matthew Wicker,
Luca Laurenti,
Marta Kwiatkowska
Abstract:
We consider the problem of certifying the individual fairness (IF) of feed-forward neural networks (NNs). In particular, we work with the $ε$-$δ$-IF formulation, which, given a NN and a similarity metric learnt from data, requires that the output difference between any pair of $ε$-similar individuals is bounded by a maximum decision tolerance $δ\geq 0$. Working with a range of metrics, including t…
▽ More
We consider the problem of certifying the individual fairness (IF) of feed-forward neural networks (NNs). In particular, we work with the $ε$-$δ$-IF formulation, which, given a NN and a similarity metric learnt from data, requires that the output difference between any pair of $ε$-similar individuals is bounded by a maximum decision tolerance $δ\geq 0$. Working with a range of metrics, including the Mahalanobis distance, we propose a method to overapproximate the resulting optimisation problem using piecewise-linear functions to lower and upper bound the NN's non-linearities globally over the input space. We encode this computation as the solution of a Mixed-Integer Linear Programming problem and demonstrate that it can be used to compute IF guarantees on four datasets widely used for fairness benchmarking. We show how this formulation can be used to encourage models' fairness at training time by modifying the NN loss, and empirically confirm our approach yields NNs that are orders of magnitude fairer than state-of-the-art methods.
△ Less
Submitted 11 May, 2022;
originally announced May 2022.
Bias Out-of-the-Box: An Empirical Analysis of Intersectional Occupational Biases in Popular Generative Language Models
Authors:
Hannah Kirk,
Yennie Jun,
Haider Iqbal,
Elias Benussi,
Filippo Volpin,
Frederic A. Dreyer,
Aleksandar Shtedritski,
Yuki M. Asano
Abstract:
The capabilities of natural language models trained on large-scale data have increased immensely over the past few years. Open source libraries such as HuggingFace have made these models easily available and accessible. While prior research has identified biases in large language models, this paper considers biases contained in the most popular versions of these models when applied `out-of-the-box…
▽ More
The capabilities of natural language models trained on large-scale data have increased immensely over the past few years. Open source libraries such as HuggingFace have made these models easily available and accessible. While prior research has identified biases in large language models, this paper considers biases contained in the most popular versions of these models when applied `out-of-the-box' for downstream tasks. We focus on generative language models as they are well-suited for extracting biases inherited from training data. Specifically, we conduct an in-depth analysis of GPT-2, which is the most downloaded text generation model on HuggingFace, with over half a million downloads per month. We assess biases related to occupational associations for different protected categories by intersecting gender with religion, sexuality, ethnicity, political affiliation, and continental name origin. Using a template-based data collection pipeline, we collect 396K sentence completions made by GPT-2 and find: (i) The machine-predicted jobs are less diverse and more stereotypical for women than for men, especially for intersections; (ii) Intersectional interactions are highly relevant for occupational associations, which we quantify by fitting 262 logistic models; (iii) For most occupations, GPT-2 reflects the skewed gender and ethnicity distribution found in US Labor Bureau data, and even pulls the societally-skewed distribution towards gender parity in cases where its predictions deviate from real labor market observations. This raises the normative question of what language models should learn - whether they should reflect or correct for existing inequalities.
△ Less
Submitted 27 October, 2021; v1 submitted 8 February, 2021;
originally announced February 2021.