Search | arXiv e-print repository

doi 10.1145/3531146.3533105

De-biasing "bias" measurement

Authors: Kristian Lum, Yunfeng Zhang, Amanda Bower

Abstract: When a model's performance differs across socially or culturally relevant groups--like race, gender, or the intersections of many such groups--it is often called "biased." While much of the work in algorithmic fairness over the last several years has focused on develo** various definitions of model fairness (the absence of group-wise model performance disparities) and eliminating such "bias," mu… ▽ More When a model's performance differs across socially or culturally relevant groups--like race, gender, or the intersections of many such groups--it is often called "biased." While much of the work in algorithmic fairness over the last several years has focused on develo** various definitions of model fairness (the absence of group-wise model performance disparities) and eliminating such "bias," much less work has gone into rigorously measuring it. In practice, it important to have high quality, human digestible measures of model performance disparities and associated uncertainty quantification about them that can serve as inputs into multi-faceted decision-making processes. In this paper, we show both mathematically and through simulation that many of the metrics used to measure group-wise model performance disparities are themselves statistically biased estimators of the underlying quantities they purport to represent. We argue that this can cause misleading conclusions about the relative group-wise model performance disparities along different dimensions, especially in cases where some sensitive variables consist of categories with few members. We propose the "double-corrected" variance estimator, which provides unbiased estimates and uncertainty quantification of the variance of model performance across groups. It is conceptually simple and easily implementable without statistical software package or numerical optimization. We demonstrate the utility of this approach through simulation and show on a real dataset that while statistically biased estimators of group-wise model performance disparities indicate statistically significant differences, when accounting for statistical bias in the estimator, the estimated between-group disparities are no longer statistically significant. △ Less

Submitted 29 June, 2022; v1 submitted 11 May, 2022; originally announced May 2022.

Journal ref: 2022 ACM Conference on Fairness, Accountability, and Transparency (FAccT '22), June 21--24, 2022, Seoul, Republic of Korea

arXiv:2103.11023 [pdf, other]

Individually Fair Ranking

Authors: Amanda Bower, Hamid Eftekhari, Mikhail Yurochkin, Yuekai Sun

Abstract: We develop an algorithm to train individually fair learning-to-rank (LTR) models. The proposed approach ensures items from minority groups appear alongside similar items from majority groups. This notion of fair ranking is based on the definition of individual fairness from supervised learning and is more nuanced than prior fair LTR approaches that simply ensure the ranking model provides underrep… ▽ More We develop an algorithm to train individually fair learning-to-rank (LTR) models. The proposed approach ensures items from minority groups appear alongside similar items from majority groups. This notion of fair ranking is based on the definition of individual fairness from supervised learning and is more nuanced than prior fair LTR approaches that simply ensure the ranking model provides underrepresented items with a basic level of exposure. The crux of our method is an optimal transport-based regularizer that enforces individual fairness and an efficient algorithm for optimizing the regularizer. We show that our approach leads to certifiably individually fair LTR models and demonstrate the efficacy of our method on ranking tasks subject to demographic biases. △ Less

Submitted 19 March, 2021; originally announced March 2021.

Comments: ICLR Camera-Ready Version

arXiv:2002.09615 [pdf, other]

Preference Modeling with Context-Dependent Salient Features

Authors: Amanda Bower, Laura Balzano

Abstract: We consider the problem of estimating a ranking on a set of items from noisy pairwise comparisons given item features. We address the fact that pairwise comparison data often reflects irrational choice, e.g. intransitivity. Our key observation is that two items compared in isolation from other items may be compared based on only a salient subset of features. Formalizing this framework, we propose… ▽ More We consider the problem of estimating a ranking on a set of items from noisy pairwise comparisons given item features. We address the fact that pairwise comparison data often reflects irrational choice, e.g. intransitivity. Our key observation is that two items compared in isolation from other items may be compared based on only a salient subset of features. Formalizing this framework, we propose the salient feature preference model and prove a finite sample complexity result for learning the parameters of our model and the underlying ranking with maximum likelihood estimation. We also provide empirical results that support our theoretical bounds and illustrate how our model explains systematic intransitivity. Finally we demonstrate strong performance of maximum likelihood estimation of our model on both synthetic data and two real data sets: the UT Zappos50K data set and comparison data about the compactness of legislative districts in the US. △ Less

Submitted 26 June, 2020; v1 submitted 21 February, 2020; originally announced February 2020.

Comments: This is the ICML camera ready version. The main difference is that the statements of the theorems now hold with arbitrary probability δinstead of with probability 1-2/d

Journal ref: ICML 2020

arXiv:1907.00020 [pdf, other]

Training individually fair ML models with Sensitive Subspace Robustness

Authors: Mikhail Yurochkin, Amanda Bower, Yuekai Sun

Abstract: We consider training machine learning models that are fair in the sense that their performance is invariant under certain sensitive perturbations to the inputs. For example, the performance of a resume screening system should be invariant under changes to the gender and/or ethnicity of the applicant. We formalize this notion of algorithmic fairness as a variant of individual fairness and develop a… ▽ More We consider training machine learning models that are fair in the sense that their performance is invariant under certain sensitive perturbations to the inputs. For example, the performance of a resume screening system should be invariant under changes to the gender and/or ethnicity of the applicant. We formalize this notion of algorithmic fairness as a variant of individual fairness and develop a distributionally robust optimization approach to enforce it during training. We also demonstrate the effectiveness of the approach on two ML tasks that are susceptible to gender and racial biases. △ Less

Submitted 13 March, 2020; v1 submitted 28 June, 2019; originally announced July 2019.

Comments: ICLR 2020 (spotlight)

arXiv:1707.00391 [pdf, other]

Fair Pipelines

Authors: Amanda Bower, Sarah N. Kitchen, Laura Niss, Martin J. Strauss, Alexander Vargas, Suresh Venkatasubramanian

Abstract: This work facilitates ensuring fairness of machine learning in the real world by decoupling fairness considerations in compound decisions. In particular, this work studies how fairness propagates through a compound decision-making processes, which we call a pipeline. Prior work in algorithmic fairness only focuses on fairness with respect to one decision. However, many decision-making processes re… ▽ More This work facilitates ensuring fairness of machine learning in the real world by decoupling fairness considerations in compound decisions. In particular, this work studies how fairness propagates through a compound decision-making processes, which we call a pipeline. Prior work in algorithmic fairness only focuses on fairness with respect to one decision. However, many decision-making processes require more than one decision. For instance, hiring is at least a two stage model: deciding who to interview from the applicant pool and then deciding who to hire from the interview pool. Perhaps surprisingly, we show that the composition of fair components may not guarantee a fair pipeline under a $(1+\varepsilon)$-equal opportunity definition of fair. However, we identify circumstances that do provide that guarantee. We also propose numerous directions for future work on more general compound machine learning decisions. △ Less

Submitted 2 July, 2017; originally announced July 2017.

Comments: Presented as a poster at the 2017 Workshop on Fairness, Accountability, and Transparency in Machine Learning (FAT/ML 2017)

Showing 1–5 of 5 results for author: Bower, A