Search | arXiv e-print repository

Theoretical Guarantees of Data Augmented Last Layer Retraining Methods

Authors: Monica Welfert, Nathan Stromberg, Lalitha Sankar

Abstract: Ensuring fair predictions across many distinct subpopulations in the training data can be prohibitive for large models. Recently, simple linear last layer retraining strategies, in combination with data augmentation methods such as upweighting, downsampling and mixup, have been shown to achieve state-of-the-art performance for worst-group accuracy, which quantifies accuracy for the least prevalent subpopulation. For linear last layer retraining and the abovementioned augmentations, we present the optimal worst-group accuracy when modeling the distribution of the latent representations (input to the last layer) as Gaussian for each subpopulation. We evaluate and verify our results for both synthetic and large publicly available datasets.

Submitted 9 May, 2024; originally announced May 2024.

Comments: Extended version of a paper accepted to ISIT 2024. arXiv admin note: text overlap with arXiv:2402.11039
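The Gaussian latent model in "Theoretical Guarantees of Data Augmented Last Layer Retraining Methods" can be made concrete with a standard fact: if a group's latent vector is x ~ N(μ, Σ), the linear score wᵀx + b is Gaussian, so that group's accuracy is Φ(y(wᵀμ + b)/√(wᵀΣw)) for label y ∈ {−1, +1}. The sketch below assumes binary labels, a fixed hand-picked classifier, and hypothetical function names. It only evaluates worst-group accuracy for a given classifier; it does not reproduce the paper's optimal-WGA result or its augmentations.

```python
import math

def normal_cdf(z: float) -> float:
    """Standard normal CDF via the error function."""
    return 0.5 * (1.0 + math.erf(z / math.sqrt(2.0)))

def group_accuracy(w, b, mean, cov, label):
    """Accuracy of sign(w^T x + b) on one group with x ~ N(mean, cov), label in {-1, +1}."""
    d = len(w)
    margin = sum(w[i] * mean[i] for i in range(d)) + b
    var = sum(w[i] * cov[i][j] * w[j] for i in range(d) for j in range(d))
    # P(label * (w^T x + b) > 0) = Phi(label * margin / sqrt(w^T cov w))
    return normal_cdf(label * margin / math.sqrt(var))

def worst_group_accuracy(w, b, groups):
    """groups: list of (mean, cov, label); returns the minimum per-group accuracy."""
    return min(group_accuracy(w, b, m, c, y) for m, c, y in groups)

# Hypothetical toy setup: coordinate 0 is the core feature, coordinate 1 is spurious.
I2 = [[1.0, 0.0], [0.0, 1.0]]
groups = [
    ([1.0, 1.0], I2, +1), ([1.0, -1.0], I2, +1),
    ([-1.0, 1.0], I2, -1), ([-1.0, -1.0], I2, -1),
]
wga_core = worst_group_accuracy([1.0, 0.0], 0.0, groups)      # ignores the spurious feature
wga_spurious = worst_group_accuracy([1.0, 1.0], 0.0, groups)  # also leans on it
```

In this toy setup, a classifier that uses only the core coordinate reaches Φ(1) on every group, while also weighting the spurious coordinate drops the two minority groups to chance, which is the gap worst-group accuracy is meant to expose.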

arXiv:2402.11039

Robustness to Subpopulation Shift with Domain Label Noise via Regularized Annotation of Domains

Authors: Nathan Stromberg, Rohan Ayyagari, Monica Welfert, Sanmi Koyejo, Richard Nock, Lalitha Sankar

Abstract: Existing methods for last layer retraining that aim to optimize worst-group accuracy (WGA) rely heavily on well-annotated groups in the training data. We show, both in theory and practice, that annotation-based data augmentations using either downsampling or upweighting for WGA are susceptible to domain annotation noise, and in high-noise regimes approach the WGA of a model trained with vanilla empirical risk minimization. We introduce Regularized Annotation of Domains (RAD) in order to train robust last layer classifiers without the need for explicit domain annotations. Our results show that RAD is competitive with other recently proposed domain annotation-free techniques. Most importantly, RAD outperforms state-of-the-art annotation-reliant methods even with only 5% noise in the training data for several publicly available datasets.

Submitted 26 June, 2024; v1 submitted 16 February, 2024; originally announced February 2024.

Comments: Generalized Gaussian assumption
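The annotation-based augmentations criticized in "Robustness to Subpopulation Shift with Domain Label Noise via Regularized Annotation of Domains" both consume group labels directly, so any annotation noise flows straight into the weights or the retained subset. The sketch below shows generic upweighting (weights inversely proportional to annotated group size) and downsampling (keep the smallest group's count from every group), plus a hypothetical symmetric flip-noise model for binary domain labels. It is an illustration under those assumptions, not RAD and not the paper's exact noise model.

```python
import random
from collections import Counter

def upweight(groups):
    """Per-example weights inversely proportional to annotated group size (mean weight 1)."""
    counts = Counter(groups)
    n, k = len(groups), len(counts)
    return [n / (k * counts[g]) for g in groups]

def downsample(groups, seed=0):
    """Indices of a subset with an equal number (smallest group size) from each annotated group."""
    rng = random.Random(seed)
    by_group = {}
    for i, g in enumerate(groups):
        by_group.setdefault(g, []).append(i)
    m = min(len(idx) for idx in by_group.values())
    keep = []
    for idx in by_group.values():
        keep.extend(rng.sample(idx, m))
    return sorted(keep)

def flip_annotations(domains, noise, seed=0):
    """Hypothetical noise model: flip each binary domain label independently with prob. `noise`."""
    rng = random.Random(seed)
    return [1 - d if rng.random() < noise else d for d in domains]

# Toy usage: 8 majority-group and 2 minority-group examples.
groups = ["a"] * 8 + ["b"] * 2
weights = upweight(groups)
kept = downsample(groups)
```

Because both functions trust `groups` as given, replacing it with the output of `flip_annotations` changes which examples are upweighted or kept, which is the sensitivity the abstract describes.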

arXiv:2310.18291

Addressing GAN Training Instabilities via Tunable Classification Losses

Authors: Monica Welfert, Gowtham R. Kurri, Kyle Otstot, Lalitha Sankar

Abstract: Generative adversarial networks (GANs), modeled as a zero-sum game between a generator (G) and a discriminator (D), allow generating synthetic data with formal guarantees. Noting that D is a classifier, we begin by reformulating the GAN value function using class probability estimation (CPE) losses. We prove a two-way correspondence between CPE loss GANs and $f$-GANs which minimize $f$-divergences. We also show that all symmetric $f$-divergences are equivalent in convergence. In the finite sample and model capacity setting, we define and obtain bounds on estimation and generalization errors. We specialize these results to $α$-GANs, defined using $α$-loss, a tunable CPE loss family parametrized by $α\in(0,\infty]$. We next introduce a class of dual-objective GANs to address training instabilities of GANs by modeling each player's objective using $α$-loss to obtain $(α_D,α_G)$-GANs. We show that the resulting non-zero-sum game simplifies to minimizing an $f$-divergence under appropriate conditions on $(α_D,α_G)$. Generalizing this dual-objective formulation using CPE losses, we define and obtain upper bounds on an appropriately defined estimation error. Finally, we highlight the value of tuning $(α_D,α_G)$ in alleviating training instabilities for the synthetic 2D Gaussian mixture ring as well as the large publicly available Celeb-A and LSUN Classroom image datasets.

Submitted 27 October, 2023; originally announced October 2023.

Comments: arXiv admin note: text overlap with arXiv:2302.14320
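The tunable CPE loss behind α-GANs in "Addressing GAN Training Instabilities via Tunable Classification Losses" is, in the form used in the α-loss literature, ℓ_α(p) = α/(α−1)·(1 − p^((α−1)/α)) for the probability p a classifier assigns to the true label, with α ∈ (0, ∞]. It recovers log-loss as α → 1, the soft 0-1 loss 1 − p at α = ∞, and 1/p − 1 at α = 1/2. The sketch below assumes that form; the function name is hypothetical.

```python
import math

def alpha_loss(p: float, alpha: float) -> float:
    """alpha-loss of the probability p assigned to the true label, for alpha in (0, inf]."""
    if not 0.0 < p <= 1.0:
        raise ValueError("p must be in (0, 1]")
    if alpha <= 0:
        raise ValueError("alpha must be positive")
    if math.isinf(alpha):
        return 1.0 - p                 # alpha = inf: soft 0-1 loss
    if alpha == 1.0:
        return -math.log(p)            # alpha = 1: log-loss, the limit of the general formula
    return alpha / (alpha - 1.0) * (1.0 - p ** ((alpha - 1.0) / alpha))
```

Treating α = 1 and α = ∞ as explicit branches avoids the 0/0 and ∞/∞ forms of the general expression while keeping the loss continuous in α.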

arXiv:2302.14320

$(α_D,α_G)$-GANs: Addressing GAN Training Instabilities via Dual Objectives

Authors: Monica Welfert, Kyle Otstot, Gowtham R. Kurri, Lalitha Sankar

Abstract: In an effort to address the training instabilities of GANs, we introduce a class of dual-objective GANs with different value functions (objectives) for the generator (G) and discriminator (D). In particular, we model each objective using $α$-loss, a tunable classification loss, to obtain $(α_D,α_G)$-GANs, parameterized by $(α_D,α_G)\in (0,\infty]^2$. For a sufficiently large number of samples and sufficient capacities for G and D, we show that the resulting non-zero-sum game simplifies to minimizing an $f$-divergence under appropriate conditions on $(α_D,α_G)$. In the finite sample and capacity setting, we define estimation error to quantify the gap in the generator's performance relative to the optimal setting with infinite samples and obtain upper bounds on this error, showing it to be order-optimal under certain conditions. Finally, we highlight the value of tuning $(α_D,α_G)$ in alleviating training instabilities for the synthetic 2D Gaussian mixture ring and the Stacked MNIST datasets.

Submitted 3 May, 2023; v1 submitted 28 February, 2023; originally announced February 2023.

Comments: Extended version of a paper accepted to ISIT 2023
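One way to read the dual objective in "$(α_D,α_G)$-GANs: Addressing GAN Training Instabilities via Dual Objectives" is to write the CPE-loss value function as the negative α-loss of D labeling real samples 1 and generated samples 0, V_α = α/(α−1)·(E_real[D^((α−1)/α)] + E_fake[(1 − D)^((α−1)/α)] − 2), and then let D maximize V_{α_D} while G minimizes V_{α_G}. The sketch below assumes exactly that reading and estimates both values from discriminator outputs on a batch; the function names are hypothetical and the exact objectives should be checked against the paper.

```python
import math

def value_alpha(d_real, d_fake, alpha):
    """Empirical V_alpha: minus the mean alpha-loss of D labeling real as 1 and fake as 0."""
    def neg_loss(p):
        # Negative alpha-loss of the probability p assigned to the correct label.
        if math.isinf(alpha):
            return p - 1.0
        if alpha == 1.0:
            return math.log(p)
        k = (alpha - 1.0) / alpha
        return alpha / (alpha - 1.0) * (p ** k - 1.0)
    real = sum(neg_loss(p) for p in d_real) / len(d_real)
    fake = sum(neg_loss(1.0 - p) for p in d_fake) / len(d_fake)
    return real + fake

def dual_objectives(d_real, d_fake, alpha_d, alpha_g):
    """(value D maximizes, value G minimizes) under the assumed dual-objective form."""
    return value_alpha(d_real, d_fake, alpha_d), value_alpha(d_real, d_fake, alpha_g)
```

With α_D = α_G = 1 both values reduce to the vanilla GAN value mean log D(real) + mean log(1 − D(fake)), which equals −log 4 when D outputs 1/2 everywhere. In training, only the fake-sample term of V_{α_G} depends on the generator's parameters.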

arXiv:2205.06393

$α$-GAN: Convergence and Estimation Guarantees

Authors: Gowtham R. Kurri, Monica Welfert, Tyler Sypherd, Lalitha Sankar

Abstract: We prove a two-way correspondence between the min-max optimization of general CPE loss function GANs and the minimization of associated $f$-divergences. We then focus on $α$-GAN, defined via the $α$-loss, which interpolates several GANs (Hellinger, vanilla, Total Variation) and corresponds to the minimization of the Arimoto divergence. We show that the Arimoto divergences induced by $α$-GAN equivalently converge, for all $α\in \mathbb{R}_{>0}\cup\{\infty\}$. However, under restricted learning models and finite samples, we provide estimation bounds which indicate diverse GAN behavior as a function of $α$. Finally, we present empirical results on a toy dataset that highlight the practical utility of tuning the $α$ hyperparameter.

Submitted 12 May, 2022; originally announced May 2022.

Comments: Extended version of a paper accepted to ISIT 2022. 12 pages, 7 figures
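The correspondence with an $f$-divergence in "$α$-GAN: Convergence and Estimation Guarantees" starts from the discriminator that is optimal for a fixed generator. Maximizing the assumed α-GAN integrand p_r·u(d) + p_g·u(1 − d), with u(t) = α/(α−1)·(t^((α−1)/α) − 1), pointwise in d gives p_r·d^(−1/α) = p_g·(1 − d)^(−1/α), hence d* = p_r^α/(p_r^α + p_g^α), which reduces to the familiar p_r/(p_r + p_g) at α = 1. The sketch below checks this closed form against a brute-force grid search for finite α; it is our own derivation under that assumed integrand, and the helper names are hypothetical.

```python
import math

def optimal_discriminator(p_r, p_g, alpha):
    """Closed-form pointwise maximizer d* = p_r^alpha / (p_r^alpha + p_g^alpha), finite alpha."""
    return p_r ** alpha / (p_r ** alpha + p_g ** alpha)

def pointwise_value(d, p_r, p_g, alpha):
    """p_r * u(d) + p_g * u(1 - d) with u(t) = alpha/(alpha-1) * (t^((alpha-1)/alpha) - 1)."""
    if alpha == 1.0:
        return p_r * math.log(d) + p_g * math.log(1.0 - d)   # alpha -> 1 limit
    k = (alpha - 1.0) / alpha
    return alpha / (alpha - 1.0) * (p_r * (d ** k - 1.0) + p_g * ((1.0 - d) ** k - 1.0))

def grid_argmax(p_r, p_g, alpha, n=20_000):
    """Brute-force maximizer over an open grid in (0, 1)."""
    grid = [(i + 0.5) / n for i in range(n)]
    return max(grid, key=lambda d: pointwise_value(d, p_r, p_g, alpha))
```

For example, with densities p_r = 0.6 and p_g = 0.2 at some point, larger α pushes d* toward 1 (about 0.996 at α = 5), while α = 1/2 gives the softer √3/(√3 + 1).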

arXiv:1910.00411

Generating Fair Universal Representations using Adversarial Models

Authors: Peter Kairouz, Jiachun Liao, Chong Huang, Maunil Vyas, Monica Welfert, Lalitha Sankar

Abstract: We present a data-driven framework for learning fair universal representations (FUR) that guarantee statistical fairness for any learning task that may not be known a priori. Our framework leverages recent advances in adversarial learning to allow a data holder to learn representations in which a set of sensitive attributes are decoupled from the rest of the dataset. We formulate this as a constrained minimax game between an encoder and an adversary where the constraint ensures a measure of usefulness (utility) of the representation. The resulting problem is that of censoring, i.e., finding a representation that is least informative about the sensitive attributes given a utility constraint. For appropriately chosen adversarial loss functions, our censoring framework precisely clarifies the optimal adversarial strategy against strong information-theoretic adversaries; it also achieves the fairness measure of demographic parity for the resulting constrained representations. We evaluate the performance of our proposed framework on both synthetic and publicly available datasets. For these datasets, we use two tradeoff measures: censoring vs. representation fidelity and fairness vs. utility for downstream tasks, to amply demonstrate that multiple sensitive features can be effectively censored even as the resulting fair representations ensure accuracy for multiple downstream tasks.

Submitted 11 May, 2022; v1 submitted 27 September, 2019; originally announced October 2019.

Comments: Extended version of a paper accepted to TIFS
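Demographic parity, the fairness measure cited in "Generating Fair Universal Representations using Adversarial Models", asks that a downstream classifier's positive-prediction rate not depend on the sensitive attribute. The sketch below measures the gap |P(ŷ = 1 | s = 0) − P(ŷ = 1 | s = 1)| for binary predictions and a binary sensitive attribute; it is a generic evaluation helper with a hypothetical name, not the paper's adversarial training procedure.

```python
def demographic_parity_gap(y_pred, s):
    """|P(yhat=1 | s=0) - P(yhat=1 | s=1)| for binary predictions y_pred and attribute s."""
    rates = []
    for value in (0, 1):
        preds = [p for p, a in zip(y_pred, s) if a == value]
        if not preds:
            raise ValueError(f"no samples with s={value}")
        rates.append(sum(preds) / len(preds))
    return abs(rates[0] - rates[1])
```

A gap of 0 means the two sensitive groups receive positive predictions at the same rate; in the framework's terms, this would be evaluated on a downstream classifier trained on the censored representation.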

Showing 1–6 of 6 results for author: Welfert, M