Showing 1–19 of 19 results for author: Calmon, F P

  1. arXiv:2312.03867

    cs.LG cs.CY cs.IT stat.ML

    Multi-Group Fairness Evaluation via Conditional Value-at-Risk Testing

    Authors: Lucas Monteiro Paes, Ananda Theertha Suresh, Alex Beutel, Flavio P. Calmon, Ahmad Beirami

    Abstract: Machine learning (ML) models used in prediction and classification tasks may display performance disparities across population groups determined by sensitive attributes (e.g., race, sex, age). We consider the problem of evaluating the performance of a fixed ML model across population groups defined by multiple sensitive attributes (e.g., race and sex and age). Here, the sample complexity for estim…

    Submitted 25 May, 2024; v1 submitted 6 December, 2023; originally announced December 2023.

    Comments: Accepted for publication in the IEEE Journal on Selected Areas in Information Theory (JSAIT)

  2. arXiv:2305.19429

    cs.LG cs.CY cs.IT stat.ML

    Adapting Fairness Interventions to Missing Values

    Authors: Raymond Feng, Flavio P. Calmon, Hao Wang

    Abstract: Missing values in real-world data pose a significant and unique challenge to algorithmic fairness. Different demographic groups may be unequally affected by missing data, and the standard procedure for handling missing values where first data is imputed, then the imputed data is used for classification -- a procedure referred to as "impute-then-classify" -- can exacerbate discrimination. In this p…

    Submitted 10 November, 2023; v1 submitted 30 May, 2023; originally announced May 2023.

    Comments: Accepted to NeurIPS 2023

  3. arXiv:2302.14517

    cs.LG cs.CR cs.CY stat.ML

    Arbitrary Decisions are a Hidden Cost of Differentially Private Training

    Authors: Bogdan Kulynych, Hsiang Hsu, Carmela Troncoso, Flavio P. Calmon

    Abstract: Mechanisms used in privacy-preserving machine learning often aim to guarantee differential privacy (DP) during model training. Practical DP-ensuring training methods use randomization when fitting model parameters to privacy-sensitive data (e.g., adding Gaussian noise to clipped gradients). We demonstrate that such randomization incurs predictive multiplicity: for a given input example, the output…

    Submitted 15 May, 2023; v1 submitted 28 February, 2023; originally announced February 2023.

    Comments: To appear in ACM FAccT 2023

  4. arXiv:2301.11781

    cs.LG cs.CY cs.IT stat.ML

    Aleatoric and Epistemic Discrimination: Fundamental Limits of Fairness Interventions

    Authors: Hao Wang, Luxi He, Rui Gao, Flavio P. Calmon

    Abstract: Machine learning (ML) models can underperform on certain population groups due to choices made during model development and bias inherent in the data. We categorize sources of discrimination in the ML pipeline into two classes: aleatoric discrimination, which is inherent in the data distribution, and epistemic discrimination, which is due to decisions made during model development. We quantify ale…

    Submitted 15 April, 2024; v1 submitted 27 January, 2023; originally announced January 2023.

  5. arXiv:2109.10431

    cs.LG cs.CY cs.IT stat.ML

    Fairness without Imputation: A Decision Tree Approach for Fair Prediction with Missing Values

    Authors: Haewon Jeong, Hao Wang, Flavio P. Calmon

    Abstract: We investigate the fairness concerns of training a machine learning model using data with missing values. Even though there are a number of fairness intervention methods in the literature, most of them require a complete training set as input. In practice, data can have missing values, and data missing patterns can depend on group attributes (e.g. gender or race). Simply applying off-the-shelf fai…

    Submitted 13 April, 2022; v1 submitted 21 September, 2021; originally announced September 2021.

  6. arXiv:2102.02976

    stat.ML cs.IT cs.LG

    Generalization Bounds for Noisy Iterative Algorithms Using Properties of Additive Noise Channels

    Authors: Hao Wang, Rui Gao, Flavio P. Calmon

    Abstract: Machine learning models trained by different optimization algorithms under different data distributions can exhibit distinct generalization behaviors. In this paper, we analyze the generalization of models trained by noisy iterative algorithms. We derive distribution-dependent generalization bounds by connecting noisy iterative algorithms to additive noise channels found in communication and infor…

    Submitted 27 December, 2022; v1 submitted 4 February, 2021; originally announced February 2021.

  7. arXiv:2102.01258

    cs.IT cs.LG stat.ML

    Local Differential Privacy Is Equivalent to Contraction of $E_γ$-Divergence

    Authors: Shahab Asoodeh, Maryam Aliakbarpour, Flavio P. Calmon

    Abstract: We investigate the local differential privacy (LDP) guarantees of a randomized privacy mechanism via its contraction properties. We first show that LDP constraints can be equivalently cast in terms of the contraction coefficient of the $E_γ$-divergence. We then use this equivalent formula to express LDP guarantees of privacy mechanisms in terms of contraction coefficients of arbitrary $f$-divergen…

    Submitted 1 February, 2021; originally announced February 2021.

    Comments: arXiv admin note: text overlap with arXiv:2012.11035

  8. arXiv:2012.11035

    cs.IT cs.CR cs.LG stat.ML

    Contraction of $E_γ$-Divergence and Its Applications to Privacy

    Authors: Shahab Asoodeh, Mario Diaz, Flavio P. Calmon

    Abstract: We investigate the contraction coefficients derived from strong data processing inequalities for the $E_γ$-divergence. By generalizing the celebrated Dobrushin's coefficient from total variation distance to $E_γ$-divergence, we derive a closed-form expression for the contraction of $E_γ$-divergence. This result has fundamental consequences in two privacy settings. First, it implies that local diff…

    Submitted 10 February, 2023; v1 submitted 20 December, 2020; originally announced December 2020.

    Comments: Submitted

  9. arXiv:2008.06529

    cs.IT cs.AI cs.CR stat.ML

    Three Variants of Differential Privacy: Lossless Conversion and Applications

    Authors: Shahab Asoodeh, Jiachun Liao, Flavio P. Calmon, Oliver Kosut, Lalitha Sankar

    Abstract: We consider three different variants of differential privacy (DP), namely approximate DP, Rényi DP (RDP), and hypothesis test DP. In the first part, we develop a machinery for optimally relating approximate DP to RDP based on the joint range of two $f$-divergences that underlie the approximate DP and RDP. In particular, this enables us to derive the optimal approximate DP parameters of a mechanism…

    Submitted 23 January, 2021; v1 submitted 14 August, 2020; originally announced August 2020.

    Comments: To appear in IEEE Journal on Selected Areas in Information Theory, Special Issue on Privacy and Security of Information Systems. arXiv admin note: text overlap with arXiv:2001.05990

  10. arXiv:2006.07326

    cs.LG cs.CV stat.ML

    CPR: Classifier-Projection Regularization for Continual Learning

    Authors: Sungmin Cha, Hsiang Hsu, Taebaek Hwang, Flavio P. Calmon, Taesup Moon

    Abstract: We propose a general, yet simple patch that can be applied to existing regularization-based continual learning methods called classifier-projection regularization (CPR). Inspired by both recent results on neural networks with wide local minima and information theory, CPR adds an additional regularization term that maximizes the entropy of a classifier's output probability. We demonstrate that this…

    Submitted 19 April, 2021; v1 submitted 12 June, 2020; originally announced June 2020.

    Comments: ICLR 2021 camera ready version

  11. arXiv:2002.04788

    cs.LG cs.CY cs.IT stat.ML

    To Split or Not to Split: The Impact of Disparate Treatment in Classification

    Authors: Hao Wang, Hsiang Hsu, Mario Diaz, Flavio P. Calmon

    Abstract: Disparate treatment occurs when a machine learning model yields different decisions for individuals based on a sensitive attribute (e.g., age, sex). In domains where prediction accuracy is paramount, it could potentially be acceptable to fit a model which exhibits disparate treatment. To evaluate the effect of disparate treatment, we compare the performance of split classifiers (i.e., classifiers…

    Submitted 13 April, 2022; v1 submitted 11 February, 2020; originally announced February 2020.

  12. arXiv:2001.06546

    cs.IT cs.CR cs.LG stat.ML

    Privacy Amplification of Iterative Algorithms via Contraction Coefficients

    Authors: Shahab Asoodeh, Mario Diaz, Flavio P. Calmon

    Abstract: We investigate the framework of privacy amplification by iteration, recently proposed by Feldman et al., from an information-theoretic lens. We demonstrate that differential privacy guarantees of iterative mappings can be determined by a direct application of contraction coefficients derived from strong data processing inequalities for $f$-divergences. In particular, by generalizing the Dobrushin'…

    Submitted 17 January, 2020; originally announced January 2020.

    Comments: Submitted for publication

  13. arXiv:2001.05990

    cs.IT cs.CR cs.LG stat.ML

    A Better Bound Gives a Hundred Rounds: Enhanced Privacy Guarantees via $f$-Divergences

    Authors: Shahab Asoodeh, Jiachun Liao, Flavio P. Calmon, Oliver Kosut, Lalitha Sankar

    Abstract: We derive the optimal differential privacy (DP) parameters of a mechanism that satisfies a given level of Rényi differential privacy (RDP). Our result is based on the joint range of two $f$-divergences that underlie the approximate and the Rényi variations of differential privacy. We apply our result to the moments accountant framework for characterizing privacy guarantees of stochastic gradient d…

    Submitted 16 January, 2020; originally announced January 2020.

    Comments: Submitted for Publication

  14. arXiv:1902.07828

    stat.ML cs.IT cs.LG

    Correspondence Analysis Using Neural Networks

    Authors: Hsiang Hsu, Salman Salamatian, Flavio P. Calmon

    Abstract: Correspondence analysis (CA) is a multivariate statistical tool used to visualize and interpret data dependencies. CA has found applications in fields ranging from epidemiology to social sciences. However, current methods used to perform CA do not scale to large, high-dimensional datasets. By re-interpreting the objective in CA using an information-theoretic tool called the principal inertia compo…

    Submitted 20 February, 2019; originally announced February 2019.

    Comments: Accepted to AISTATS 2019. Overlaps with arXiv:1806.08449

  15. arXiv:1901.10501

    cs.LG cs.CY cs.IT stat.ML

    Repairing without Retraining: Avoiding Disparate Impact with Counterfactual Distributions

    Authors: Hao Wang, Berk Ustun, Flavio P. Calmon

    Abstract: When the performance of a machine learning model varies over groups defined by sensitive attributes (e.g., gender or ethnicity), the performance disparity can be expressed in terms of the probability distributions of the input and output variables over each group. In this paper, we exploit this fact to reduce the disparate impact of a fixed classification model over a population of interest. Given…

    Submitted 17 May, 2019; v1 submitted 29 January, 2019; originally announced January 2019.

  16. arXiv:1812.01105

    cs.CY cs.LG stat.ML

    Correspondence Analysis of Government Expenditure Patterns

    Authors: Hsiang Hsu, Flavio P. Calmon, José Cândido Silveira Santos Filho, Andre P. Calmon, Salman Salamatian

    Abstract: We analyze expenditure patterns of discretionary funds by Brazilian congress members. This analysis is based on a large dataset containing over $7$ million expenses made publicly available by the Brazilian government. This dataset has, up to now, remained widely untouched by machine learning methods. Our main contributions are two-fold: (i) we provide a novel dataset benchmark for machine learning…

    Submitted 29 November, 2018; originally announced December 2018.

    Comments: Presented at NIPS 2018 Workshop on Machine Learning for the Developing World

  17. arXiv:1806.08449

    cs.LG cs.IT stat.ML

    Generalizing Correspondence Analysis for Applications in Machine Learning

    Authors: Hsiang Hsu, Salman Salamatian, Flavio P. Calmon

    Abstract: Correspondence analysis (CA) is a multivariate statistical tool used to visualize and interpret data dependencies by finding maximally correlated embeddings of pairs of random variables. CA has found applications in fields ranging from epidemiology to social sciences; however, current methods do not scale to large, high-dimensional datasets. In this paper, we provide a novel interpretation of CA i…

    Submitted 27 June, 2020; v1 submitted 21 June, 2018; originally announced June 2018.

    Comments: 30 pages, 7 figures, 6 tables. arXiv admin note: text overlap with arXiv:1902.07828

  18. arXiv:1801.05398

    cs.IT cs.LG stat.ML

    On the Direction of Discrimination: An Information-Theoretic Analysis of Disparate Impact in Machine Learning

    Authors: Hao Wang, Berk Ustun, Flavio P. Calmon

    Abstract: In the context of machine learning, disparate impact refers to a form of systematic discrimination whereby the output distribution of a model depends on the value of a sensitive attribute (e.g., race or gender). In this paper, we propose an information-theoretic framework to analyze the disparate impact of a binary classification model. We view the model as a fixed channel, and quantify disparate…

    Submitted 11 May, 2018; v1 submitted 16 January, 2018; originally announced January 2018.

  19. arXiv:1704.03354

    stat.ML cs.CY cs.IT

    Optimized Data Pre-Processing for Discrimination Prevention

    Authors: Flavio P. Calmon, Dennis Wei, Karthikeyan Natesan Ramamurthy, Kush R. Varshney

    Abstract: Non-discrimination is a recognized objective in algorithmic decision making. In this paper, we introduce a novel probabilistic formulation of data pre-processing for reducing discrimination. We propose a convex optimization for learning a data transformation with three goals: controlling discrimination, limiting distortion in individual data samples, and preserving utility. We characterize the imp…

    Submitted 11 April, 2017; originally announced April 2017.