Search | arXiv e-print repository

doi 10.1109/JSAIT.2024.3397741

Multi-Group Fairness Evaluation via Conditional Value-at-Risk Testing

Authors: Lucas Monteiro Paes, Ananda Theertha Suresh, Alex Beutel, Flavio P. Calmon, Ahmad Beirami

Abstract: Machine learning (ML) models used in prediction and classification tasks may display performance disparities across population groups determined by sensitive attributes (e.g., race, sex, age). We consider the problem of evaluating the performance of a fixed ML model across population groups defined by multiple sensitive attributes (e.g., race and sex and age). Here, the sample complexity for estim… ▽ More Machine learning (ML) models used in prediction and classification tasks may display performance disparities across population groups determined by sensitive attributes (e.g., race, sex, age). We consider the problem of evaluating the performance of a fixed ML model across population groups defined by multiple sensitive attributes (e.g., race and sex and age). Here, the sample complexity for estimating the worst-case performance gap across groups (e.g., the largest difference in error rates) increases exponentially with the number of group-denoting sensitive attributes. To address this issue, we propose an approach to test for performance disparities based on Conditional Value-at-Risk (CVaR). By allowing a small probabilistic slack on the groups over which a model has approximately equal performance, we show that the sample complexity required for discovering performance violations is reduced exponentially to be at most upper bounded by the square root of the number of groups. As a byproduct of our analysis, when the groups are weighted by a specific prior distribution, we show that Rényi entropy of order 2/3 of the prior distribution captures the sample complexity of the proposed CVaR test algorithm. Finally, we also show that there exists a non-i.i.d. data collection strategy that results in a sample complexity independent of the number of groups. △ Less

Submitted 25 May, 2024; v1 submitted 6 December, 2023; originally announced December 2023.

Comments: Accepted for publication in the IEEE Journal on Selected Areas in Information Theory (JSAIT)

arXiv:2305.19429 [pdf, other]

Adapting Fairness Interventions to Missing Values

Authors: Raymond Feng, Flavio P. Calmon, Hao Wang

Abstract: Missing values in real-world data pose a significant and unique challenge to algorithmic fairness. Different demographic groups may be unequally affected by missing data, and the standard procedure for handling missing values where first data is imputed, then the imputed data is used for classification -- a procedure referred to as "impute-then-classify" -- can exacerbate discrimination. In this p… ▽ More Missing values in real-world data pose a significant and unique challenge to algorithmic fairness. Different demographic groups may be unequally affected by missing data, and the standard procedure for handling missing values where first data is imputed, then the imputed data is used for classification -- a procedure referred to as "impute-then-classify" -- can exacerbate discrimination. In this paper, we analyze how missing values affect algorithmic fairness. We first prove that training a classifier from imputed data can significantly worsen the achievable values of group fairness and average accuracy. This is because imputing data results in the loss of the missing pattern of the data, which often conveys information about the predictive label. We present scalable and adaptive algorithms for fair classification with missing values. These algorithms can be combined with any preexisting fairness-intervention algorithm to handle all possible missing patterns while preserving information encoded within the missing patterns. Numerical experiments with state-of-the-art fairness interventions demonstrate that our adaptive algorithms consistently achieve higher fairness and accuracy than impute-then-classify across different datasets. △ Less

Submitted 10 November, 2023; v1 submitted 30 May, 2023; originally announced May 2023.

Comments: Accepted to NeurIPS 2023

arXiv:2302.14517 [pdf, other]

doi 10.1145/3593013.3594103

Arbitrary Decisions are a Hidden Cost of Differentially Private Training

Authors: Bogdan Kulynych, Hsiang Hsu, Carmela Troncoso, Flavio P. Calmon

Abstract: Mechanisms used in privacy-preserving machine learning often aim to guarantee differential privacy (DP) during model training. Practical DP-ensuring training methods use randomization when fitting model parameters to privacy-sensitive data (e.g., adding Gaussian noise to clipped gradients). We demonstrate that such randomization incurs predictive multiplicity: for a given input example, the output… ▽ More Mechanisms used in privacy-preserving machine learning often aim to guarantee differential privacy (DP) during model training. Practical DP-ensuring training methods use randomization when fitting model parameters to privacy-sensitive data (e.g., adding Gaussian noise to clipped gradients). We demonstrate that such randomization incurs predictive multiplicity: for a given input example, the output predicted by equally-private models depends on the randomness used in training. Thus, for a given input, the predicted output can vary drastically if a model is re-trained, even if the same training dataset is used. The predictive-multiplicity cost of DP training has not been studied, and is currently neither audited for nor communicated to model designers and stakeholders. We derive a bound on the number of re-trainings required to estimate predictive multiplicity reliably. We analyze--both theoretically and through extensive experiments--the predictive-multiplicity cost of three DP-ensuring algorithms: output perturbation, objective perturbation, and DP-SGD. We demonstrate that the degree of predictive multiplicity rises as the level of privacy increases, and is unevenly distributed across individuals and demographic groups in the data. Because randomness used to ensure DP during training explains predictions for some examples, our results highlight a fundamental challenge to the justifiability of decisions supported by differentially private models in high-stakes settings. We conclude that practitioners should audit the predictive multiplicity of their DP-ensuring algorithms before deploying them in applications of individual-level consequence. △ Less

Submitted 15 May, 2023; v1 submitted 28 February, 2023; originally announced February 2023.

Comments: To appear in ACM FAccT 2023

arXiv:2301.11781 [pdf, other]

Aleatoric and Epistemic Discrimination: Fundamental Limits of Fairness Interventions

Authors: Hao Wang, Luxi He, Rui Gao, Flavio P. Calmon

Abstract: Machine learning (ML) models can underperform on certain population groups due to choices made during model development and bias inherent in the data. We categorize sources of discrimination in the ML pipeline into two classes: aleatoric discrimination, which is inherent in the data distribution, and epistemic discrimination, which is due to decisions made during model development. We quantify ale… ▽ More Machine learning (ML) models can underperform on certain population groups due to choices made during model development and bias inherent in the data. We categorize sources of discrimination in the ML pipeline into two classes: aleatoric discrimination, which is inherent in the data distribution, and epistemic discrimination, which is due to decisions made during model development. We quantify aleatoric discrimination by determining the performance limits of a model under fairness constraints, assuming perfect knowledge of the data distribution. We demonstrate how to characterize aleatoric discrimination by applying Blackwell's results on comparing statistical experiments. We then quantify epistemic discrimination as the gap between a model's accuracy when fairness constraints are applied and the limit posed by aleatoric discrimination. We apply this approach to benchmark existing fairness interventions and investigate fairness risks in data with missing values. Our results indicate that state-of-the-art fairness interventions are effective at removing epistemic discrimination on standard (overused) tabular datasets. However, when data has missing values, there is still significant room for improvement in handling aleatoric discrimination. △ Less

Submitted 15 April, 2024; v1 submitted 27 January, 2023; originally announced January 2023.

arXiv:2109.10431 [pdf, other]

Fairness without Imputation: A Decision Tree Approach for Fair Prediction with Missing Values

Authors: Haewon Jeong, Hao Wang, Flavio P. Calmon

Abstract: We investigate the fairness concerns of training a machine learning model using data with missing values. Even though there are a number of fairness intervention methods in the literature, most of them require a complete training set as input. In practice, data can have missing values, and data missing patterns can depend on group attributes (e.g. gender or race). Simply applying off-the-shelf fai… ▽ More We investigate the fairness concerns of training a machine learning model using data with missing values. Even though there are a number of fairness intervention methods in the literature, most of them require a complete training set as input. In practice, data can have missing values, and data missing patterns can depend on group attributes (e.g. gender or race). Simply applying off-the-shelf fair learning algorithms to an imputed dataset may lead to an unfair model. In this paper, we first theoretically analyze different sources of discrimination risks when training with an imputed dataset. Then, we propose an integrated approach based on decision trees that does not require a separate process of imputation and learning. Instead, we train a tree with missing incorporated as attribute (MIA), which does not require explicit imputation, and we optimize a fairness-regularized objective function. We demonstrate that our approach outperforms existing fairness intervention methods applied to an imputed dataset, through several experiments on real-world datasets. △ Less

Submitted 13 April, 2022; v1 submitted 21 September, 2021; originally announced September 2021.

arXiv:2102.02976 [pdf, other]

Generalization Bounds for Noisy Iterative Algorithms Using Properties of Additive Noise Channels

Authors: Hao Wang, Rui Gao, Flavio P. Calmon

Abstract: Machine learning models trained by different optimization algorithms under different data distributions can exhibit distinct generalization behaviors. In this paper, we analyze the generalization of models trained by noisy iterative algorithms. We derive distribution-dependent generalization bounds by connecting noisy iterative algorithms to additive noise channels found in communication and infor… ▽ More Machine learning models trained by different optimization algorithms under different data distributions can exhibit distinct generalization behaviors. In this paper, we analyze the generalization of models trained by noisy iterative algorithms. We derive distribution-dependent generalization bounds by connecting noisy iterative algorithms to additive noise channels found in communication and information theory. Our generalization bounds shed light on several applications, including differentially private stochastic gradient descent (DP-SGD), federated learning, and stochastic gradient Langevin dynamics (SGLD). We demonstrate our bounds through numerical experiments, showing that they can help understand recent empirical observations of the generalization phenomena of neural networks. △ Less

Submitted 27 December, 2022; v1 submitted 4 February, 2021; originally announced February 2021.

arXiv:2102.01258 [pdf, other]

Local Differential Privacy Is Equivalent to Contraction of $E_γ$-Divergence

Authors: Shahab Asoodeh, Maryam Aliakbarpour, Flavio P. Calmon

Abstract: We investigate the local differential privacy (LDP) guarantees of a randomized privacy mechanism via its contraction properties. We first show that LDP constraints can be equivalently cast in terms of the contraction coefficient of the $E_γ$-divergence. We then use this equivalent formula to express LDP guarantees of privacy mechanisms in terms of contraction coefficients of arbitrary $f$-divergen… ▽ More We investigate the local differential privacy (LDP) guarantees of a randomized privacy mechanism via its contraction properties. We first show that LDP constraints can be equivalently cast in terms of the contraction coefficient of the $E_γ$-divergence. We then use this equivalent formula to express LDP guarantees of privacy mechanisms in terms of contraction coefficients of arbitrary $f$-divergences. When combined with standard estimation-theoretic tools (such as Le Cam's and Fano's converse methods), this result allows us to study the trade-off between privacy and utility in several testing and minimax and Bayesian estimation problems. △ Less

Submitted 1 February, 2021; originally announced February 2021.

Comments: arXiv admin note: text overlap with arXiv:2012.11035

arXiv:2012.11035 [pdf, other]

Contraction of $E_γ$-Divergence and Its Applications to Privacy

Authors: Shahab Asoodeh, Mario Diaz, Flavio P. Calmon

Abstract: We investigate the contraction coefficients derived from strong data processing inequalities for the $E_γ$-divergence. By generalizing the celebrated Dobrushin's coefficient from total variation distance to $E_γ$-divergence, we derive a closed-form expression for the contraction of $E_γ$-divergence. This result has fundamental consequences in two privacy settings. First, it implies that local diff… ▽ More We investigate the contraction coefficients derived from strong data processing inequalities for the $E_γ$-divergence. By generalizing the celebrated Dobrushin's coefficient from total variation distance to $E_γ$-divergence, we derive a closed-form expression for the contraction of $E_γ$-divergence. This result has fundamental consequences in two privacy settings. First, it implies that local differential privacy can be equivalently expressed in terms of the contraction of $E_γ$-divergence. This equivalent formula can be used to precisely quantify the impact of local privacy in (Bayesian and minimax) estimation and hypothesis testing problems in terms of the reduction of effective sample size. Second, it leads to a new information-theoretic technique for analyzing privacy guarantees of online algorithms. In this technique, we view such algorithms as a composition of amplitude-constrained Gaussian channels and then relate their contraction coefficients under $E_γ$-divergence to the overall differential privacy guarantees. As an example, we apply our technique to derive the differential privacy parameters of gradient descent. Moreover, we also show that this framework can be tailored to batch learning algorithms that can be implemented with one pass over the training dataset. △ Less

Submitted 10 February, 2023; v1 submitted 20 December, 2020; originally announced December 2020.

Comments: Submitted

arXiv:2008.06529 [pdf, other]

Three Variants of Differential Privacy: Lossless Conversion and Applications

Authors: Shahab Asoodeh, Jiachun Liao, Flavio P. Calmon, Oliver Kosut, Lalitha Sankar

Abstract: We consider three different variants of differential privacy (DP), namely approximate DP, Rényi DP (RDP), and hypothesis test DP. In the first part, we develop a machinery for optimally relating approximate DP to RDP based on the joint range of two $f$-divergences that underlie the approximate DP and RDP. In particular, this enables us to derive the optimal approximate DP parameters of a mechanism… ▽ More We consider three different variants of differential privacy (DP), namely approximate DP, Rényi DP (RDP), and hypothesis test DP. In the first part, we develop a machinery for optimally relating approximate DP to RDP based on the joint range of two $f$-divergences that underlie the approximate DP and RDP. In particular, this enables us to derive the optimal approximate DP parameters of a mechanism that satisfies a given level of RDP. As an application, we apply our result to the moments accountant framework for characterizing privacy guarantees of noisy stochastic gradient descent (SGD). When compared to the state-of-the-art, our bounds may lead to about 100 more stochastic gradient descent iterations for training deep learning models for the same privacy budget. In the second part, we establish a relationship between RDP and hypothesis test DP which allows us to translate the RDP constraint into a tradeoff between type I and type II error probabilities of a certain binary hypothesis test. We then demonstrate that for noisy SGD our result leads to tighter privacy guarantees compared to the recently proposed $f$-DP framework for some range of parameters. △ Less

Submitted 23 January, 2021; v1 submitted 14 August, 2020; originally announced August 2020.

Comments: To appear in IEEE Journal on Selected Areas in Information Theory, Special Issue on Privacy and Security of Information Systems. arXiv admin note: text overlap with arXiv:2001.05990

arXiv:2006.07326 [pdf, other]

CPR: Classifier-Projection Regularization for Continual Learning

Authors: Sungmin Cha, Hsiang Hsu, Taebaek Hwang, Flavio P. Calmon, Taesup Moon

Abstract: We propose a general, yet simple patch that can be applied to existing regularization-based continual learning methods called classifier-projection regularization (CPR). Inspired by both recent results on neural networks with wide local minima and information theory, CPR adds an additional regularization term that maximizes the entropy of a classifier's output probability. We demonstrate that this… ▽ More We propose a general, yet simple patch that can be applied to existing regularization-based continual learning methods called classifier-projection regularization (CPR). Inspired by both recent results on neural networks with wide local minima and information theory, CPR adds an additional regularization term that maximizes the entropy of a classifier's output probability. We demonstrate that this additional term can be interpreted as a projection of the conditional probability given by a classifier's output to the uniform distribution. By applying the Pythagorean theorem for KL divergence, we then prove that this projection may (in theory) improve the performance of continual learning methods. In our extensive experimental results, we apply CPR to several state-of-the-art regularization-based continual learning methods and benchmark performance on popular image recognition datasets. Our results demonstrate that CPR indeed promotes a wide local minima and significantly improves both accuracy and plasticity while simultaneously mitigating the catastrophic forgetting of baseline continual learning methods. The codes and scripts for this work are available at https://github.com/csm9493/CPR_CL. △ Less

Submitted 19 April, 2021; v1 submitted 12 June, 2020; originally announced June 2020.

Comments: ICLR 2021 camera ready version

arXiv:2002.04788 [pdf, other]

To Split or Not to Split: The Impact of Disparate Treatment in Classification

Authors: Hao Wang, Hsiang Hsu, Mario Diaz, Flavio P. Calmon

Abstract: Disparate treatment occurs when a machine learning model yields different decisions for individuals based on a sensitive attribute (e.g., age, sex). In domains where prediction accuracy is paramount, it could potentially be acceptable to fit a model which exhibits disparate treatment. To evaluate the effect of disparate treatment, we compare the performance of split classifiers (i.e., classifiers… ▽ More Disparate treatment occurs when a machine learning model yields different decisions for individuals based on a sensitive attribute (e.g., age, sex). In domains where prediction accuracy is paramount, it could potentially be acceptable to fit a model which exhibits disparate treatment. To evaluate the effect of disparate treatment, we compare the performance of split classifiers (i.e., classifiers trained and deployed separately on each group) with group-blind classifiers (i.e., classifiers which do not use a sensitive attribute). We introduce the benefit-of-splitting for quantifying the performance improvement by splitting classifiers. Computing the benefit-of-splitting directly from its definition could be intractable since it involves solving optimization problems over an infinite-dimensional functional space. Under different performance measures, we (i) prove an equivalent expression for the benefit-of-splitting which can be efficiently computed by solving small-scale convex programs; (ii) provide sharp upper and lower bounds for the benefit-of-splitting which reveal precise conditions where a group-blind classifier will always suffer from a non-trivial performance gap from the split classifiers. In the finite sample regime, splitting is not necessarily beneficial and we provide data-dependent bounds to understand this effect. Finally, we validate our theoretical results through numerical experiments on both synthetic and real-world datasets. △ Less

Submitted 13 April, 2022; v1 submitted 11 February, 2020; originally announced February 2020.

arXiv:2001.06546 [pdf, other]

Privacy Amplification of Iterative Algorithms via Contraction Coefficients

Authors: Shahab Asoodeh, Mario Diaz, Flavio P. Calmon

Abstract: We investigate the framework of privacy amplification by iteration, recently proposed by Feldman et al., from an information-theoretic lens. We demonstrate that differential privacy guarantees of iterative map**s can be determined by a direct application of contraction coefficients derived from strong data processing inequalities for $f$-divergences. In particular, by generalizing the Dobrushin'… ▽ More We investigate the framework of privacy amplification by iteration, recently proposed by Feldman et al., from an information-theoretic lens. We demonstrate that differential privacy guarantees of iterative map**s can be determined by a direct application of contraction coefficients derived from strong data processing inequalities for $f$-divergences. In particular, by generalizing the Dobrushin's contraction coefficient for total variation distance to an $f$-divergence known as $E_γ$-divergence, we derive tighter bounds on the differential privacy parameters of the projected noisy stochastic gradient descent algorithm with hidden intermediate updates. △ Less

Submitted 17 January, 2020; originally announced January 2020.

Comments: Submitted for publication

arXiv:2001.05990 [pdf, other]

A Better Bound Gives a Hundred Rounds: Enhanced Privacy Guarantees via $f$-Divergences

Authors: Shahab Asoodeh, Jiachun Liao, Flavio P. Calmon, Oliver Kosut, Lalitha Sankar

Abstract: We derive the optimal differential privacy (DP) parameters of a mechanism that satisfies a given level of Rényi differential privacy (RDP). Our result is based on the joint range of two $f$-divergences that underlie the approximate and the Rényi variations of differential privacy. We apply our result to the moments accountant framework for characterizing privacy guarantees of stochastic gradient d… ▽ More We derive the optimal differential privacy (DP) parameters of a mechanism that satisfies a given level of Rényi differential privacy (RDP). Our result is based on the joint range of two $f$-divergences that underlie the approximate and the Rényi variations of differential privacy. We apply our result to the moments accountant framework for characterizing privacy guarantees of stochastic gradient descent. When compared to the state-of-the-art, our bounds may lead to about 100 more stochastic gradient descent iterations for training deep learning models for the same privacy budget. △ Less

Submitted 16 January, 2020; originally announced January 2020.

Comments: Submitted for Publication

arXiv:1902.07828 [pdf, other]

Correspondence Analysis Using Neural Networks

Authors: Hsiang Hsu, Salman Salamatian, Flavio P. Calmon

Abstract: Correspondence analysis (CA) is a multivariate statistical tool used to visualize and interpret data dependencies. CA has found applications in fields ranging from epidemiology to social sciences. However, current methods used to perform CA do not scale to large, high-dimensional datasets. By re-interpreting the objective in CA using an information-theoretic tool called the principal inertia compo… ▽ More Correspondence analysis (CA) is a multivariate statistical tool used to visualize and interpret data dependencies. CA has found applications in fields ranging from epidemiology to social sciences. However, current methods used to perform CA do not scale to large, high-dimensional datasets. By re-interpreting the objective in CA using an information-theoretic tool called the principal inertia components, we demonstrate that performing CA is equivalent to solving a functional optimization problem over the space of finite variance functions of two random variable. We show that this optimization problem, in turn, can be efficiently approximated by neural networks. The resulting formulation, called the correspondence analysis neural network (CA-NN), enables CA to be performed at an unprecedented scale. We validate the CA-NN on synthetic data, and demonstrate how it can be used to perform CA on a variety of datasets, including food recipes, wine compositions, and images. Our results outperform traditional methods used in CA, indicating that CA-NN can serve as a new, scalable tool for interpretability and visualization of complex dependencies between random variables. △ Less

Submitted 20 February, 2019; originally announced February 2019.

Comments: Accepted to AISTATS 2019. Overlaps with arXiv:1806.08449

arXiv:1901.10501 [pdf, other]

Repairing without Retraining: Avoiding Disparate Impact with Counterfactual Distributions

Authors: Hao Wang, Berk Ustun, Flavio P. Calmon

Abstract: When the performance of a machine learning model varies over groups defined by sensitive attributes (e.g., gender or ethnicity), the performance disparity can be expressed in terms of the probability distributions of the input and output variables over each group. In this paper, we exploit this fact to reduce the disparate impact of a fixed classification model over a population of interest. Given… ▽ More When the performance of a machine learning model varies over groups defined by sensitive attributes (e.g., gender or ethnicity), the performance disparity can be expressed in terms of the probability distributions of the input and output variables over each group. In this paper, we exploit this fact to reduce the disparate impact of a fixed classification model over a population of interest. Given a black-box classifier, we aim to eliminate the performance gap by perturbing the distribution of input variables for the disadvantaged group. We refer to the perturbed distribution as a counterfactual distribution, and characterize its properties for common fairness criteria. We introduce a descent algorithm to learn a counterfactual distribution from data. We then discuss how the estimated distribution can be used to build a data preprocessor that can reduce disparate impact without training a new model. We validate our approach through experiments on real-world datasets, showing that it can repair different forms of disparity without a significant drop in accuracy. △ Less

Submitted 17 May, 2019; v1 submitted 29 January, 2019; originally announced January 2019.

arXiv:1812.01105 [pdf, other]

Correspondence Analysis of Government Expenditure Patterns

Authors: Hsiang Hsu, Flavio P. Calmon, José Cândido Silveira Santos Filho, Andre P. Calmon, Salman Salamatian

Abstract: We analyze expenditure patterns of discretionary funds by Brazilian congress members. This analysis is based on a large dataset containing over $7$ million expenses made publicly available by the Brazilian government. This dataset has, up to now, remained widely untouched by machine learning methods. Our main contributions are two-fold: (i) we provide a novel dataset benchmark for machine learning… ▽ More We analyze expenditure patterns of discretionary funds by Brazilian congress members. This analysis is based on a large dataset containing over $7$ million expenses made publicly available by the Brazilian government. This dataset has, up to now, remained widely untouched by machine learning methods. Our main contributions are two-fold: (i) we provide a novel dataset benchmark for machine learning-based efforts for government transparency to the broader research community, and (ii) introduce a neural network-based approach for analyzing and visualizing outlying expense patterns. Our hope is that the approach presented here can inspire new machine learning methodologies for government transparency applicable to other develo** nations. △ Less

Submitted 29 November, 2018; originally announced December 2018.

Comments: Presented at NIPS 2018 Workshop on Machine Learning for the Develo** World

arXiv:1806.08449 [pdf, other]

Generalizing Correspondence Analysis for Applications in Machine Learning

Authors: Hsiang Hsu, Salman Salamatian, Flavio P. Calmon

Abstract: Correspondence analysis (CA) is a multivariate statistical tool used to visualize and interpret data dependencies by finding maximally correlated embeddings of pairs of random variables. CA has found applications in fields ranging from epidemiology to social sciences; however, current methods do not scale to large, high-dimensional datasets. In this paper, we provide a novel interpretation of CA i… ▽ More Correspondence analysis (CA) is a multivariate statistical tool used to visualize and interpret data dependencies by finding maximally correlated embeddings of pairs of random variables. CA has found applications in fields ranging from epidemiology to social sciences; however, current methods do not scale to large, high-dimensional datasets. In this paper, we provide a novel interpretation of CA in terms of an information-theoretic quantity called the principal inertia components. We show that estimating the principal inertia components, which consists in solving a functional optimization problem over the space of finite variance functions of two random variable, is equivalent to performing CA. We then leverage this insight to design novel algorithms to perform CA at an unprecedented scale. Particularly, we demonstrate how the principal inertia components can be reliably approximated from data using deep neural networks. Finally, we show how these maximally correlated embeddings of pairs of random variables in CA further play a central role in several learning problems including visualization of classification boundary and training process, and underlying recent multi-view and multi-modal learning methods. △ Less

Submitted 27 June, 2020; v1 submitted 21 June, 2018; originally announced June 2018.

Comments: 30 pages, 7 figures, 6 tables. arXiv admin note: text overlap with arXiv:1902.07828

arXiv:1801.05398 [pdf, other]

On the Direction of Discrimination: An Information-Theoretic Analysis of Disparate Impact in Machine Learning

Authors: Hao Wang, Berk Ustun, Flavio P. Calmon

Abstract: In the context of machine learning, disparate impact refers to a form of systematic discrimination whereby the output distribution of a model depends on the value of a sensitive attribute (e.g., race or gender). In this paper, we propose an information-theoretic framework to analyze the disparate impact of a binary classification model. We view the model as a fixed channel, and quantify disparate… ▽ More In the context of machine learning, disparate impact refers to a form of systematic discrimination whereby the output distribution of a model depends on the value of a sensitive attribute (e.g., race or gender). In this paper, we propose an information-theoretic framework to analyze the disparate impact of a binary classification model. We view the model as a fixed channel, and quantify disparate impact as the divergence in output distributions over two groups. Our aim is to find a correction function that can perturb the input distributions of each group to align their output distributions. We present an optimization problem that can be solved to obtain a correction function that will make the output distributions statistically indistinguishable. We derive closed-form expressions to efficiently compute the correction function, and demonstrate the benefits of our framework on a recidivism prediction problem based on the ProPublica COMPAS dataset. △ Less

Submitted 11 May, 2018; v1 submitted 16 January, 2018; originally announced January 2018.

arXiv:1704.03354 [pdf, other]

Optimized Data Pre-Processing for Discrimination Prevention

Authors: Flavio P. Calmon, Dennis Wei, Karthikeyan Natesan Ramamurthy, Kush R. Varshney

Abstract: Non-discrimination is a recognized objective in algorithmic decision making. In this paper, we introduce a novel probabilistic formulation of data pre-processing for reducing discrimination. We propose a convex optimization for learning a data transformation with three goals: controlling discrimination, limiting distortion in individual data samples, and preserving utility. We characterize the imp… ▽ More Non-discrimination is a recognized objective in algorithmic decision making. In this paper, we introduce a novel probabilistic formulation of data pre-processing for reducing discrimination. We propose a convex optimization for learning a data transformation with three goals: controlling discrimination, limiting distortion in individual data samples, and preserving utility. We characterize the impact of limited sample size in accomplishing this objective, and apply two instances of the proposed optimization to datasets, including one on real-world criminal recidivism. The results demonstrate that all three criteria can be simultaneously achieved and also reveal interesting patterns of bias in American society. △ Less

Submitted 11 April, 2017; originally announced April 2017.

Showing 1–19 of 19 results for author: Calmon, F P