Mind the Graph When Balancing Data for Fairness or Robustness

Jessica Schrouff
Google DeepMind
[email protected]
Alexis Bellot
Google DeepMind
Amal Rannen-Triki
Google DeepMind
Alan Malek
Google DeepMind
Isabela Albuquerque
Google DeepMind
Arthur Gretton
Google DeepMind
Gatsby Computational Neuroscience Unit
Alexander D’Amour
Google DeepMind
Silvia Chiappa
Google DeepMind
Abstract

Failures of fairness or robustness in machine learning predictive settings can be due to undesired dependencies between covariates, outcomes and auxiliary factors of variation. A common strategy to mitigate these failures is data balancing, which attempts to remove those undesired dependencies. In this work, we define conditions on the training distribution for data balancing to lead to fair or robust models. Our results show that, in many cases, the balanced distribution does not correspond to selectively removing the undesired dependencies in a causal graph of the task, leading to multiple failure modes and even interference with other mitigation techniques such as regularization. Overall, our results highlight the importance of taking the causal graph into account before performing data balancing.

1 Introduction

When training prediction models, practitioners often desire that the model’s outputs display safety properties in addition to high performance, such as being fair across demographic subgroups [29, 50] or being robust to distribution shifts [e.g. 19, 58]. These objectives can be difficult to attain if there are undesired dependencies between covariates $X$, labels $Y$, and auxiliary factors of variation $Z$, such as confounding factors or hidden stratification [26, 27]. A commonly referenced example is that of an animal classification task from wildlife pictures [e.g. 63]: the model might identify patterns in the background of the images that are indicative of the type of animal (e.g. the presence of snow for polar bears or grass for cows), which might lead to the model failing to recognize the same animal when it is on another background. When the auxiliary factors relate to demographic attributes, the deployment of such models can have societal implications, e.g. patients not being assigned medical resources due to factors related to race [53].

Multiple mitigation strategies have been proposed to remove undesired dependencies pre-, in- or post-processing. Amongst them, balancing the training data is typically considered a straightforward approach and has been used or researched in various settings [e.g. 37, 38, 59, 8, 33, 39, 2]. This approach modifies the training distribution, indicated with $P(X,Y,Z)$, into a new, balanced distribution (which we refer to as $Q(X,Y,Z)$) that aims to approximate an ‘idealized’ training distribution in which the undesired dependencies are absent [47, 14, 76]. Models are then trained on this balanced distribution to attain different fairness or robustness criteria. A popular approach to construct a balanced distribution is by balancing classes (resp. groups), leading to a uniform distribution over $Y$ (resp. $Z$). While successful for addressing failures of robustness [e.g. 33] or of fairness due to under-representation of certain groups [e.g. 74], this approach does not induce independence between $Y$ and $Z$. To approximate independence, a ‘joint’ balancing on $(Y,Z)$ is often performed [e.g. 47, 8]. Joint balancing can be implemented by matching the numbers of samples in all $(y,z)$ groups (only feasible when $Y$ and $Z$ have small, discrete domains) via subsampling the majority groups [e.g. 8], upsampling the minority groups [e.g. 62], resampling the data with weights proportional to $P(Y)P(Z)/P(Y,Z)$, or reweighting the loss [9]. Our work focuses on joint balancing given its suitability to mitigate a marginal dependence between $Y$ and $Z$ (we briefly discuss group or class data balancing in Appendix A.1). While the choice of the method for jointly balancing can impact the results [11, 64, 33], these methods can be seen as modifying $P$ as described in Definition 1.1.

Definition 1.1 (Jointly balanced distribution).

We say that the distribution $Q(X,Y,Z)$ is a jointly balanced version of $P(X,Y,Z)$ if $Q(X,Y,Z)=P(X,Y,Z)\frac{P(Y)P(Z)}{P(Y,Z)}$.
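As an illustration of Definition 1.1, the following minimal sketch estimates the weights $P(Y)P(Z)/P(Y,Z)$ from empirical counts and checks that $Y$ and $Z$ become independent under the reweighted distribution. It assumes discrete $Y$ and $Z$ with every $(y,z)$ cell observed; the function names and the toy data are hypothetical and are not taken from the paper's experiments.

```python
import numpy as np

def joint_balancing_weights(y, z):
    """Per-sample weights P(y) P(z) / P(y, z), estimated from counts."""
    y_vals, y_idx = np.unique(np.asarray(y), return_inverse=True)
    z_vals, z_idx = np.unique(np.asarray(z), return_inverse=True)
    joint = np.zeros((len(y_vals), len(z_vals)))
    np.add.at(joint, (y_idx, z_idx), 1.0)   # contingency table of counts
    joint /= joint.sum()                    # empirical P(Y, Z)
    p_y = joint.sum(axis=1)                 # empirical P(Y)
    p_z = joint.sum(axis=0)                 # empirical P(Z)
    return p_y[y_idx] * p_z[z_idx] / joint[y_idx, z_idx]

def weighted_joint(y, z, w):
    """Weighted 2x2 table over binary (Y, Z), normalized to sum to 1."""
    table = np.array([[w[(y == a) & (z == b)].sum() for b in (0, 1)]
                      for a in (0, 1)])
    return table / table.sum()

# Toy data (assumption): Z agrees with Y 90% of the time under P.
rng = np.random.default_rng(0)
y = rng.integers(0, 2, size=10_000)
z = np.where(rng.random(10_000) < 0.9, y, 1 - y)

w = joint_balancing_weights(y, z)
p_yz = weighted_joint(y, z, np.ones_like(w))  # joint under P
q_yz = weighted_joint(y, z, w)                # joint under Q
print("P(Y,Z):\n", p_yz)
print("Q(Y,Z):\n", q_yz)
```

Under these weights, $Q(Y,Z)$ equals the product of the marginals of $P$, so the marginals are preserved while the dependence between $Y$ and $Z$ is removed. The weights can be used either for resampling or for reweighting the loss.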

In some cases, data balancing has proven to be an effective mitigation strategy for undesired dependencies, performing on par with other, more complex mitigation techniques [33]. Recently, data balancing has also shown promise for mitigation during fine-tuning or partial retraining [40, 43, 48, 78, 74], which is relevant when training large-scale models on large amounts of data. Nevertheless, data balancing has also displayed failure modes in which the obtained models were not fair, robust or optimal [75, 47, 57, 2]. These failure modes have not been thoroughly characterized and can be difficult to predict. Furthermore, the impact of data balancing on other mitigation strategies has not been studied extensively.

Given data balancing’s popularity as a baseline mitigation strategy for undesired dependencies, we aim to formalize some of its promises and pitfalls. Our analysis relies on a causal graphical framework, which allows investigating the impact of data balancing in different data generating processes. Our contributions can be summarized as follows:

  • We display failure modes of data balancing in semi-synthetic tasks and highlight how predicting these failures can be challenging.

  • We introduce necessary and sufficient conditions for data balancing to attain invariance to undesired dependencies as defined by fairness or robustness criteria.

  • We prove that data balancing does not correspond to ‘removing’ undesired dependencies from a causal perspective, and can negatively impact fairness or robustness criteria when combined with regularization strategies.

  • We illustrate how our findings can be used to distinguish between failure modes and identify next steps.

2 Preliminaries

Let $X$, $Y$, $Z$ be random variables with $X\in\mathcal{X}$ corresponding to a set of covariates (e.g. tabular, images or text), $Y\in\mathcal{Y}$ to an outcome to be predicted, and $Z\in\mathcal{Z}$ to an auxiliary factor of variation, such as a sensitive attribute or the type of background of an image, that displays statistical dependence with $Y$ in the original, training distribution $P(X,Y,Z)$. We consider a prediction model $f:\mathcal{X}\rightarrow\mathcal{Y}$ that is trained on data from distribution $P(X,Y,Z)$ to minimize the risk $R_{P}(f):=\mathbb{E}_{X,Y\sim P}[\ell(f;X,Y)]$ where $\ell$ is a loss function. We call $f\in\mathcal{F}$ optimal on $P$ if the risk attains the minimum for $P$.

Definition 2.1 (Optimality).

A prediction model $f\in\mathcal{F}$ is optimal on $P$ if $f=\arg\!\min_{f'\in\mathcal{F}}R_{P}(f')$.

2.1 Desired criteria on a model’s predictions

While a model may be optimal on $P$, it might not be optimal on another distribution of interest $P'(X,Y,Z)$ (e.g. in deployment), and/or might display disparities across subsets of the data (e.g. $P(X,Y\,|\,Z=z)$) [22]. To mitigate this issue, multiple safety criteria have been defined in the fields of fairness and robustness.

Fairness: Fairness criteria can be defined in terms of the dependence between the model’s output $f(X)$ and the auxiliary factor of variation $Z$. We consider established fairness criteria [5, 50], including demographic parity [$f(X)\perp\!\!\!\perp Z$, 23], equalized odds [$f(X)\perp\!\!\!\perp Z\,|\,Y$, 29] and predictive parity [$Y\perp\!\!\!\perp Z\,|\,f(X)$, 24]. Beyond fairness of $f(X)$, we also consider fairness of intermediate representations $\phi(X)$, e.g. $\phi(X)\perp\!\!\!\perp Z$ [80], for their usage in downstream tasks.
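For binary predictions and a binary $Z$, the independence statements above are often summarized by gaps between group-wise rates. The sketch below shows one common convention (an assumption, not necessarily the metric reported in the paper): the demographic parity gap compares positive prediction rates across $Z$, and the equalized odds gap takes the largest such difference after conditioning on $Y$. Function names and the toy arrays are hypothetical.

```python
import numpy as np

def demographic_parity_gap(pred, z):
    """|P(f(X)=1 | Z=0) - P(f(X)=1 | Z=1)| for binary pred and z."""
    pred, z = np.asarray(pred), np.asarray(z)
    return abs(pred[z == 0].mean() - pred[z == 1].mean())

def equalized_odds_gap(pred, y, z):
    """Largest gap over y of |P(f=1 | Y=y, Z=0) - P(f=1 | Y=y, Z=1)|."""
    pred, y, z = map(np.asarray, (pred, y, z))
    gaps = []
    for label in (0, 1):
        rate_z0 = pred[(y == label) & (z == 0)].mean()
        rate_z1 = pred[(y == label) & (z == 1)].mean()
        gaps.append(abs(rate_z0 - rate_z1))
    return max(gaps)

# Toy data (assumption): Y and Z are balanced and independent.
y = np.array([0, 0, 1, 1, 0, 0, 1, 1])
z = np.array([0, 0, 0, 0, 1, 1, 1, 1])

perfect = y.copy()    # hypothetical model that predicts Y exactly
shortcut = z.copy()   # hypothetical model that predicts Z instead of Y

print(demographic_parity_gap(perfect, z), equalized_odds_gap(perfect, y, z))
print(demographic_parity_gap(shortcut, z), equalized_odds_gap(shortcut, y, z))
```

A gap of zero is necessary for the corresponding independence criterion to hold in this binary setting; both functions assume that every conditioning cell is non-empty.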

Robustness: In this field, the focus is typically on finding models $f_{\theta}$ parameterized by $\theta\in\Theta$ that provide the lowest risk across a family of target distributions $\mathcal{P}$. For instance, the ‘worst group performance’ criterion aims to select parameters such that the performance on a ‘worst’ distribution $P'$ is optimized, i.e. $\theta^{*}=\arg\!\min_{\theta\in\Theta}\{\sup_{P'\in\mathcal{P}}R_{P'}(f_{\theta})\}$ [6, 20]. The family $\mathcal{P}$ can be defined so that each distribution $P'$ represents a specific subpopulation [63], with the aim of minimizing the loss in each subgroup, or of making $R_{P'}$ invariant across subgroups [risk-invariance, 47].
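When each $P'$ is taken to be a subpopulation, the inner supremum reduces to a maximum over groups. The sketch below computes worst-group accuracy under the assumption that groups are the $(y,z)$ cells, which is one common convention and not necessarily the grouping used in the paper's tables. All names and data are hypothetical.

```python
import numpy as np

def worst_group_accuracy(pred, y, z):
    """Return (minimum accuracy over non-empty (y, z) cells, per-cell accuracies)."""
    pred, y, z = map(np.asarray, (pred, y, z))
    accs = {}
    for label in np.unique(y):
        for group in np.unique(z):
            mask = (y == label) & (z == group)
            if mask.any():
                accs[(int(label), int(group))] = float((pred[mask] == y[mask]).mean())
    return min(accs.values()), accs

# Toy data (assumption): Z agrees with Y 90% of the time.
rng = np.random.default_rng(0)
y = rng.integers(0, 2, size=5_000)
z = np.where(rng.random(5_000) < 0.9, y, 1 - y)

shortcut = z.copy()  # hypothetical model that relies on Z only
overall = float((shortcut == y).mean())
worst, per_group = worst_group_accuracy(shortcut, y, z)
print(f"overall accuracy: {overall:.3f}, worst-group accuracy: {worst:.3f}")
```

In this toy case the overall accuracy is high because most samples have $Y=Z$, while the accuracy on the minority cells where $Y\neq Z$ is zero.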

Definition 2.2 (Risk-invariance).

A prediction model $f$ is risk-invariant w.r.t. the family of distributions $\mathcal{P}$ if $R_{P'}(f)=R_{P''}(f)$ for all $P',P''\in\mathcal{P}$.

If a model is optimal on $P$ and risk-invariant w.r.t. $\mathcal{P}$, it is also optimal w.r.t. $\mathcal{P}$. The choice of $\mathcal{P}$ is context-specific and reflects some domain knowledge about shifts that are likely to arise in a given application. For instance, a plausible family of target distributions could imply a shift in the dependence between $Y$ and $Z$, also known as a correlation shift [61], and be expressed as $\mathcal{P}=\{P'(X,Y,Z)=P(X\,|\,Y,Z)P'(Z\,|\,Y)P(Y),\ \forall P'(Z\,|\,Y)\}$. Alternatively, we can define $\mathcal{P}$ using a causal framework (see Section 2.2) when the data generation process is known [47].
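Under the correlation-shift family above, $P'$ and $P$ differ only through $P'(Z\,|\,Y)$, so $P'(X,Y,Z)/P(X,Y,Z)=P'(Z\,|\,Y)/P(Z\,|\,Y)$. The following sketch uses this ratio to estimate $R_{P'}(f)$ from samples of $P$ by importance reweighting and to compare risks across a few members of $\mathcal{P}$. It assumes binary $Y$ and $Z$; the per-sample losses are hypothetical stand-ins for a trained model's losses.

```python
import numpy as np

def risk_under_shift(loss, y, z, p_z1_given_y_new):
    """Estimate R_{P'}(f) from samples of P, reweighting by P'(z|y) / P(z|y)."""
    loss, y, z = map(np.asarray, (loss, y, z))
    w = np.empty(len(y), dtype=float)
    for label in (0, 1):
        mask = y == label
        p_z1 = z[mask].mean()              # empirical P(Z=1 | Y=label)
        q_z1 = p_z1_given_y_new[label]     # target  P'(Z=1 | Y=label)
        w[mask] = np.where(z[mask] == 1, q_z1 / p_z1, (1 - q_z1) / (1 - p_z1))
    return float(np.sum(w * loss) / np.sum(w))

# Toy data (assumption): Z agrees with Y 90% of the time under P.
rng = np.random.default_rng(0)
n = 20_000
y = rng.integers(0, 2, size=n)
z = np.where(rng.random(n) < 0.9, y, 1 - y)

shortcut_loss = (z != y).astype(float)         # 0-1 loss of a model that predicts Z
label_only_loss = np.where(y == 1, 0.2, 0.1)   # a loss that depends on Y only

# Each entry lists [P'(Z=1 | Y=0), P'(Z=1 | Y=1)].
shifts = {"train_like": [0.1, 0.9], "no_dependence": [0.5, 0.5], "reversed": [0.9, 0.1]}
shortcut_risks = {k: risk_under_shift(shortcut_loss, y, z, v) for k, v in shifts.items()}
label_only_risks = {k: risk_under_shift(label_only_loss, y, z, v) for k, v in shifts.items()}
print(shortcut_risks)
print(label_only_risks)
```

The risk of the shortcut model changes with $P'(Z\,|\,Y)$, so it is not risk-invariant in the sense of Definition 2.2, whereas a loss that does not depend on $Z$ given $Y$ yields the same risk for every member of the family.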

We acknowledge that selecting amongst those criteria is context-dependent and do not advocate for a specific choice. We call a prediction model $f$ invariant to undesired dependencies, denoted with $f\in\mathcal{F}_{inv}$, if it satisfies one of these criteria. For brevity, we focus on risk-invariance in the main text and consider fairness criteria in the Appendix. Obtaining an invariant model can be performed in different ways, with data balancing being a popular approach.

2.2 Causal framework to analyse data balancing

To understand the effects of data balancing, we need to investigate its impact on the distribution $P$. A causal formalization is useful for studying how distributions change under different interventions. To analyse the implications of data balancing, we use the framework of causal Bayesian networks (CBNs) [e.g. 70, 13, 51, 73, 25, 47]. A Bayesian network [54, 55, 15, 41] is a pair $\langle\mathcal{G},P\rangle$, in which $\mathcal{G}$ is a directed acyclic graph whose nodes $X^{1},\ldots,X^{D}$ represent random variables and in which $P$ is a joint distribution over the nodes. The absence of edges in $\mathcal{G}$ implies a set of statistical independence assumptions satisfied by $P$ that can be expressed by the factorization $P(X^{1},\dots,X^{D})=\prod_{d=1}^{D}P(X^{d}\,|\,\text{pa}(X^{d}))$, where $\text{pa}(X^{d})$ denotes the parents of $X^{d}$, namely the nodes with an edge into $X^{d}$ (we say that $P$ factorizes according to $\mathcal{G}$).
A CBN is a Bayesian network in which an edge expresses causal influence, so that $\text{pa}(X^{d})$ are direct causes of $X^{d}$. A directed path between $X^{i}$ and $X^{j}$ in a CBN is also called a causal path. A non-directed path, also called non-causal path, expresses statistical dependence of non-causal nature. We refer to the statistical dependence between $X^{i}$ and $X^{j}$ that arises only due to the presence of non-causal paths as purely spurious. In our setting $X^{1}\cup\dots\cup X^{D}=X\cup Y\cup Z\cup\mathbf{U}$ where $\mathbf{U}$ are unobserved variables. Inspired by prior work [73, 3, 69, 76], we make the following assumption on the form of the covariates $X$.

[Figure 1 panels, with the nodes of each graph:
(a) Anti-causal, purely spurious: $U$, $Z$, $Y$, $X^{\perp}_{Z}$, $X^{\perp}_{Y}$.
(b) Causal, purely spurious: $U$, $Z$, $Y$, $X^{\perp}_{Z}$, $X^{\perp}_{Y}$.
(c) Anti-causal, factor of variation $V$: $Z$, $Y$, $X^{\perp}_{Z}$, $X^{\perp}_{Y}$, $V$, $U_{1}$, $U_{2}$, $U_{3}$, $X_{V}$.
(d) Anti-causal, entangled data: $U$, $Z$, $Y$, $X^{\perp}_{Z}$, $X_{Y\wedge Z}$.]

Figure 1: Examples of CBNs with undesired dependencies between $Y$ and $Z$ displayed by red edges. Light gray indicates unobserved variables. $X_{Y\wedge Z}=\emptyset$ in (a-b) and there is no entanglement between $Y$ and $Z$ via $X$. In (c), we expand the system to include $V\in\mathbf{U}$ and its influence on $X$, which is given by $X_{V}$.

Assumption 2.3 (Form of Covariates $X$).

In the system defined by $X\cup Y\cup Z\cup U$ with $U\in\mathbf{U}$, $X$ decomposes as $X=X^{\perp}_{Z}\cup X^{\perp}_{Y}\cup X_{Y\wedge Z}$, where $X^{\perp}_{Z}$ is a function of $X$ that does not have causal paths to/from $Z$ but has causal paths to/from $Y$, $X^{\perp}_{Y}$ is a function of $X$ that does not have causal paths to/from $Y$ but has causal paths to/from $Z$, and $X_{Y\wedge Z}$ is a function of $X$ that has causal paths to/from both $Y$ and $Z$, representing entangled signals.

In the animal classification example, $X^{\perp}_{Z}$ would correspond to the animal pixels, $X^{\perp}_{Y}$ to the background pixels (e.g. snowy or grassy landscape), and $X_{Y\wedge Z}$ to characteristics of the animal that depend on its environment (e.g. color of the fur pixels in bears). Intuitively, we want to build a prediction model $f$ that only depends on the animal pixels. While the decomposition may be readily available when a causal graph of the application is available and the data is tabular, we typically do not have direct access to the different functions of $X$ and these need to be isolated algorithmically.

Following Schölkopf et al. [65], we consider both the case in which $X^{\perp}_{Z}\cup X_{Y\wedge Z}$ are direct causes of the label $Y$ (causal task), e.g. estimating the helpfulness of a text review, and the case in which $Y$ is a direct cause of $X^{\perp}_{Z}\cup X_{Y\wedge Z}$ (anti-causal task), as in object detection tasks in computer vision. Figures 1(a-b) display examples of anti-causal and causal tasks with a purely spurious dependence between $Y$ and $Z$. It is important to note that statistical relationships between the different variables and functions of $X$ are determined by the graph: for instance, in Figure 1(a) $X^{\perp}_{Z}\perp\!\!\!\perp Z\,|\,Y$, while in Figure 1(b) $X^{\perp}_{Z}\perp\!\!\!\perp Z$.

Based on a CBN of the task and Assumption 2.3, we characterize undesired dependencies as the presence of undesired paths between $Z$ and $Y$, which we indicate through red edges (Figure 1). Based on this depiction of undesired dependencies, we can define the family of target distributions $\mathcal{P}$ such that black edges are preserved, but those in red may lead to changes in the distribution. For the anti-causal task in Figure 1(a), we can hence write $\mathcal{P}=\{P'(Y,Z,X)=P(Y)P'(Z\,|\,Y)P(X^{\perp}_{Z}\,|\,Y)P(X^{\perp}_{Y}\,|\,Z)\}$ in which $P'(Z\,|\,Y)$ represents any distribution but all other causal mechanisms are fixed [47], which corresponds to a correlation shift.
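The factorization for Figure 1(a) can be turned into a small sampler, which makes explicit which mechanisms are held fixed and which one varies across $\mathcal{P}$. The sketch below is a toy instance under stated assumptions: binary $Y$ and $Z$, a uniform $P(Y)$, and scalar Gaussian features for $X^{\perp}_{Z}$ and $X^{\perp}_{Y}$. None of these choices come from the paper's experiments.

```python
import numpy as np

def sample_anticausal(n, p_z1_given_y, rng):
    """Draw from P(Y) P'(Z|Y) P(X_Z^perp | Y) P(X_Y^perp | Z)."""
    y = rng.integers(0, 2, size=n)                                  # P(Y): fixed
    z = (rng.random(n) < np.asarray(p_z1_given_y)[y]).astype(int)   # P'(Z | Y): varies
    x_perp_z = rng.normal(loc=2.0 * y, scale=1.0)                   # P(X_Z^perp | Y): fixed
    x_perp_y = rng.normal(loc=2.0 * z, scale=1.0)                   # P(X_Y^perp | Z): fixed
    return y, z, x_perp_z, x_perp_y

rng = np.random.default_rng(0)
n = 20_000

# A confounded member of the family: Z agrees with Y 90% of the time.
y_tr, z_tr, _, xy_tr = sample_anticausal(n, [0.1, 0.9], rng)
# A member without the undesired dependence: Z is independent of Y.
y_0, z_0, _, xy_0 = sample_anticausal(n, [0.5, 0.5], rng)

corr_train = np.corrcoef(xy_tr, y_tr)[0, 1]
corr_p0 = np.corrcoef(xy_0, y_0)[0, 1]
print(f"corr(X_Y^perp, Y): confounded={corr_train:.3f}, no dependence={corr_p0:.3f}")
```

In the confounded sample, $X^{\perp}_{Y}$ is predictive of $Y$ only through $Z$, so a model may use it as a shortcut. The correlation vanishes when $P'(Z\,|\,Y)$ does not depend on $Y$, even though every other mechanism is unchanged.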

3 Can we predict when data balancing fails?

As reported previously, data balancing can display failure modes, e.g. due to the presence of other confounders [75, 2], finite sampling effects [47] or a dependence between $Y$ and $Z$ when conditioning on $X$ ($Y\not\perp\!\!\!\perp Z\,|\,X$) [57]. However, this list is non-exhaustive and, to the best of our knowledge, there is no unifying study of those failure modes or of how they could be mitigated. In this section, we perform joint data balancing on different tasks to illustrate that successes and failures of this approach can be difficult to predict. For details of the experiments, see Appendix D.

We first consider semi-synthetic examples generated from the graphs in Figure 1(a,b), i.e. an anti-causal and a causal task with a purely spurious correlation. We aim to obtain a risk-invariant and optimal model on these tasks by training on the jointly balanced distribution $Q$.

Anti-causal task: number detection in MNIST. Inspired by Brown et al. [8], we modify MNIST images [44, 17] by adding a factor of variation $Z$ such that the top of the image is replaced by red noise for $Z=0$ and blue noise for $Z=1$ (Figure 2). We sample a dataset in which the factor of variation and label are dependent ($P(Y=0\,|\,Z=0)=0.95$, $P(Y=1\,|\,Z=0)=0.10$, called the ‘confounded’ data), a jointly balanced dataset, and a dataset from a distribution $P^{0}$ in which the undesired dependency is absent ($P^{0}(Z=0\,|\,Y)=0.5$). We train convolutional networks to predict whether the number in an image is smaller or larger than 5, assessing the models on their training distribution and on $P^{0}$.

Models trained with confounded data (95/10) display biased outputs (Table 1), with low worst group performance and high equalized odds. Performance on $P^{0}$ is also lower compared to that on $P$ ($0.937\pm 0.002$), showing that these models are not risk-invariant w.r.t. $\mathcal{P}$. Models trained from balanced data obtain high overall performance and worst group accuracy, as well as low equalized odds. In addition, we were not able to decode $Z$ from the model representation $\phi(X)$, suggesting that the model has not learned $X^{\perp}_{Y}$. (This result is interesting as an addition across the channels of the raw image allows discriminating red from blue samples, and colors can easily be discriminated by a model trained to predict $Z$ from scratch, with accuracy = 100%. We therefore show that the model is not performing any ‘incidental’ learning of $X^{\perp}_{Y}$.) Our results suggest that data balancing led to a fair/robust and optimal model.

Causal task: helpfulness of reviews with Amazon reviews [52]. Inspired by Veitch et al. [73], we refer to the causal task of predicting the helpfulness rating of an Amazon review (thumbs up or down, $Y$) from its text ($X$). We add a synthetic factor of variation $Z$ such that words like ‘the’ or ‘my’ are replaced by ‘thexxxx’ and ‘myxxxx’ ($Z=0$) or ‘theyyyy’ and ‘myyyyy’ ($Z=1$). We train a BERT [34] model on a class-balanced version of the data for reference (due to high class imbalance), and compare to a model trained on jointly balanced data, both evaluated on their training distribution and on a distribution $P^{0}$ with no association.

In this case, jointly balancing improves fairness and risk-invariance, with the model’s performance on the training distribution (acc.: $0.574\pm 0.016$) being similar to that on $P^{0}$ (Table 1). This however comes at a high performance cost when compared to the class balanced model’s performance on $P$ (acc.: $0.658\pm 0.015$). Therefore, data balancing might not lead to optimality for this causal task.

Refer to caption
Figure 2: MNIST data samples.
Table 1: Model performance on semi-synthetic data, for the tasks in Figure 1. ‘Acc.’ refers to accuracy, ‘Worst Grp’ to worst group accuracy, ‘Encoding’ to confounder encoding as measured by transfer learning and ‘Equ. Odds’ to equalized odds between $Z$ subgroups. $\uparrow$ (resp. $\downarrow$) means the higher (resp. lower), the better.

| Task | Dataset | Acc. ($\uparrow$) | Worst Grp ($\uparrow$) | Encoding ($\sim 0.5$) | Equ. Odds ($\downarrow$) |
|---|---|---|---|---|---|
| Anti-causal (a) | 95/10 | $0.717 \pm 0.027$ | $0.380 \pm 0.062$ | $0.996 \pm 0.004$ | $0.539 \pm 0.015$ |
| Anti-causal (a) | Balanced | $0.880 \pm 0.006$ | $0.836 \pm 0.075$ | $0.486 \pm 0.005$ | $0.018 \pm 0.008$ |
| Causal (b) | Class bal. | $0.558 \pm 0.015$ | $0.092 \pm 0.015$ | $0.690 \pm 0.113$ | $0.542 \pm 0.098$ |
| Causal (b) | Jointly bal. | $0.583 \pm 0.017$ | $0.399 \pm 0.014$ | $0.545 \pm 0.037$ | $0.060 \pm 0.046$ |
| Anti-causal (c) | With $V$ | $0.769 \pm 0.008$ | $0.555 \pm 0.031$ | $0.665 \pm 0.134$ | $0.094 \pm 0.035$ |
| Anti-causal (d) | Entangled | $0.672 \pm 0.004$ | $0.000 \pm 0.001$ | $0.881 \pm 0.223$ | $0.554 \pm 0.028$ |
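The two fairness and robustness metrics reported in Table 1 can be illustrated with a minimal sketch. The exact definitions used for the table are not given in this section, so the code below makes assumptions: worst group accuracy is taken as the minimum accuracy over the $(Y, Z)$ subgroups, and the equalized odds score as the largest gap in positive prediction rate between the two $Z$ subgroups, conditioned on $Y$. The function names and the toy arrays are hypothetical.

```python
import numpy as np

def worst_group_accuracy(y_true, y_pred, z):
    """Minimum accuracy over the (Y, Z) subgroups present in the data (assumed grouping)."""
    accs = []
    for y in np.unique(y_true):
        for g in np.unique(z):
            mask = (y_true == y) & (z == g)
            if mask.any():
                accs.append(np.mean(y_pred[mask] == y_true[mask]))
    return min(accs)

def equalized_odds_gap(y_true, y_pred, z):
    """Largest gap in P(pred = 1 | Y = y, Z = z) between Z = 0 and Z = 1 (assumed definition).

    Assumes binary Y and Z and that every (Y, Z) subgroup is non-empty.
    """
    gaps = []
    for y in (0, 1):
        rates = [np.mean(y_pred[(y_true == y) & (z == g)] == 1) for g in (0, 1)]
        gaps.append(abs(rates[0] - rates[1]))
    return max(gaps)

# Toy data: the predictor makes errors only in the (Y=0, Z=1) and (Y=1, Z=0) subgroups.
y_true = np.array([0, 0, 0, 0, 1, 1, 1, 1])
z      = np.array([0, 0, 1, 1, 0, 0, 1, 1])
y_pred = np.array([0, 0, 0, 1, 1, 0, 1, 1])

wga = worst_group_accuracy(y_true, y_pred, z)
eo = equalized_odds_gap(y_true, y_pred, z)
print(wga, eo)
```

In this toy example overall accuracy is 0.75, yet both metrics flag the two subgroups where $Y \neq Z$, which is the pattern that a gap between ‘Acc.’ and ‘Worst Grp’ in Table 1 points to.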

Using the same framework, we can replicate the failure modes due to another confounder described in Wang et al. [75], Alabdulmohsin et al. [2] as well as that from Puli et al. [57].

Anti-causal task with another factor of variation $V$. It is common for multiple auxiliary factors to influence the data generating process, and they tend to correlate with each other [e.g. 21]. To emulate this case, we introduce additional unobserved variables $U_{2}, U_{3}$ as well as a factor of variation $V$ which affects the data through $X_{V}$ (Figure 1(c)). (Footnote 3: $X_{V}$ and its dependencies on $(X, Y, Z)$ were selected to describe an example without entangled data, but the results hold for $X_{V} \subset X^{\perp}_{Y}$.) We modify the MNIST data generation to include $X_{V}$, depicted by a green cross on the top left or top right of the image, and jointly balance the data on $(Y, Z)$ before training the model. We evaluate the obtained predictor on a distribution where $V$ and $Z$ are not correlated with $Y$ and observe (Table 1) a large gap between worst group accuracy and overall performance, as well as non-null equalized odds. These results suggest that the model is not fair or robust, and that it also displays a decrease in performance compared to the model trained on data without $X_{V}$.

Anti-causal task with entangled data. We map the work in Puli et al. [57] to our decomposition of $X$ and propose the example graph in Figure 1(d), where $X_{Y \wedge Z}$ represents an entangled function of $X$. To match this data generating process, the color of the noise in MNIST samples is defined by $\mathrm{OR}(Y, Z)$ and the evaluation distribution is the disentangled $P^{0}$ with no dependence between $Y$ and $Z$. Once again, the obtained model is not fair, robust or optimal (Table 1). Appendix A.2 discusses this case further.

Motivated by these examples of both successes and failures, we define necessary and sufficient conditions for the success of data balancing, and highlight where the cases above fail to meet these conditions.

4 Conditions for data balancing to produce an invariant and optimal model

In this section, we introduce necessary and sufficient conditions that, taken together, lead to a risk-invariant and optimal prediction model $f$ after training on $Q$ (proofs in Appendix B.1). In Appendix B.2, we derive similar conditions for fairness criteria. Throughout the rest of the paper, we use a subscript to indicate under which of $P$ or $Q$ a statistical independence holds, e.g. $Y \mathrel{\perp\mspace{-10.0mu}\perp}_{P} Z$ to indicate $P(Y \mid Z) = P(Y)$.

We consider the criterion of risk-invariance (Definition 2.2) under correlation shift, i.e. $\mathcal{P} = \{P'(X, Y, Z) = P(X \mid Y, Z)\, P'(Z \mid Y)\, P(Y)\}$. According to our decomposition of $X$, the risk-minimizing function $f(X) := \mathbb{E}_{Q}[Y \mid X]$ should only be a function of $X^{\perp}_{Z}$ and not of $X^{\perp}_{Y}$ or $X_{Y \wedge Z}$. To achieve this result with data balancing, we build on a prior result by Makar et al. [47], which shows that a model trained on a balanced distribution only depends on $X^{\perp}_{Z}$ if $X^{\perp}_{Z}$ represents a sufficient statistic for $Y$, i.e. no other part of $X$ influences $Y$.
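To make the correlation-shift family concrete, the following sketch enumerates a toy discrete distribution of the form $P'(X, Y, Z) = P(X \mid Y, Z)\, P'(Z \mid Y)\, P(Y)$ and computes exact 0-1 risks. Everything in it is an illustrative assumption rather than the paper's setup: $X$ is made of one binary feature `x_core` that copies $Y$ with some flip noise (playing the role of $X^{\perp}_{Z}$) and one binary feature `x_spur` that copies $Z$ with some flip noise (playing the role of $X^{\perp}_{Y}$), and `rho` denotes $P'(Z = Y \mid Y)$.

```python
import itertools

def joint(rho, flip_core=0.2, flip_spur=0.1):
    """Exact joint P'(y, z, x_core, x_spur) for a given correlation strength rho."""
    p = {}
    for y, z, xc, xs in itertools.product((0, 1), repeat=4):
        p_y = 0.5                                          # P(Y), fixed across the family
        p_z = rho if z == y else 1 - rho                   # P'(Z | Y), the only shifting factor
        p_xc = 1 - flip_core if xc == y else flip_core     # P(x_core | Y)
        p_xs = 1 - flip_spur if xs == z else flip_spur     # P(x_spur | Z)
        p[(y, z, xc, xs)] = p_y * p_z * p_xc * p_xs
    return p

def risk(p, predictor):
    """Exact 0-1 risk of a predictor taking (x_core, x_spur) as input."""
    return sum(prob for (y, z, xc, xs), prob in p.items() if predictor(xc, xs) != y)

use_core = lambda xc, xs: xc   # depends only on the stand-in for X^perp_Z
use_spur = lambda xc, xs: xs   # depends only on the stand-in for X^perp_Y

for rho in (0.1, 0.5, 0.9):
    p = joint(rho)
    print(rho, round(risk(p, use_core), 3), round(risk(p, use_spur), 3))
```

The predictor that reads only `x_core` has the same risk for every `rho`, which is the risk-invariance property, whereas the risk of the predictor that reads `x_spur` moves with $P'(Z \mid Y)$.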

Definition 4.1.

(Sufficient Statistic) We say that $X^{\perp}_{Z}$ is a sufficient statistic for $Y$ in $Q$ if $\mathbb{E}_{Q}[Y \mid X] = \mathbb{E}_{Q}[Y \mid X^{\perp}_{Z}]$.

Definition 4.1 implies that the risk-minimizing function $f$ for $Q$ does not vary with $X^{\perp}_{Y}, X_{Y \wedge Z}$. However, this condition is not sufficient on its own to ensure that $f$ is risk-invariant w.r.t. $\mathcal{P}$, as $X^{\perp}_{Z}$ or $Y$ may have non-causal relationships with $Z$. To ensure optimality and risk-invariance w.r.t. $\mathcal{P}$, we derive the sufficient condition in Proposition 4.2.

Proposition 4.2.

If $X^{\perp}_{Z} \mathrel{\perp\mspace{-10.0mu}\perp}_{Q} Z \mid Y$ and $X^{\perp}_{Z}$ is a sufficient statistic for $Y$ in $Q$, then the risk-minimizer $f(X) := \mathbb{E}_{Q}[Y \mid X]$ is risk-invariant and optimal w.r.t. $\mathcal{P}$.

The conditions of Proposition 4.2 concern $Q$. However, it would be of interest to express them in $P$ if it is possible to observe all covariates (e.g. in the case of tabular data). Based on our expression for $Q$, we can derive sufficient conditions on $P$, expressed in Corollary 4.3. We denote $\{X^{\perp}_{Y}, X_{Y \wedge Z}\}$ by $R$.

Corollary 4.3.

If $R \mathrel{\perp\mspace{-10.0mu}\perp}_{P} \{Y, X^{\perp}_{Z}\} \mid Z$ and $X^{\perp}_{Z} \mathrel{\perp\mspace{-10.0mu}\perp}_{P} Z \mid Y$, then the risk-minimizer $f(X) := \mathbb{E}_{Q}[Y \mid X]$ is risk-invariant and optimal w.r.t. $\mathcal{P}$.

In general, we can expect that anti-causal tasks with purely spurious correlations will satisfy these conditions, as per their definition. However, this would not be the case for most causal tasks, as $X^{\perp}_{Z} \centernot{\mathrel{\perp\mspace{-10.0mu}\perp}}_{P} Z \mid Y$. This result is in line with our findings in Section 3, as the MNIST data generated from the graph in Figure 1(a) satisfies the conditions of Corollary 4.3, but the Amazon reviews data generated from Figure 1(b) does not.

It may be less obvious, but the conditions for a sufficient statistic are not met in Figures 1(c,d), as $X_{V} \centernot{\mathrel{\perp\mspace{-10.0mu}\perp}}_{P} \{Y, X^{\perp}_{Z}\} \mid Z$ in the case of another factor of variation $V$, and $X_{Y \wedge Z} \centernot{\mathrel{\perp\mspace{-10.0mu}\perp}}_{P} \{Y, X^{\perp}_{Z}\} \mid Z$ in the case of entangled data. We hence see that when a causal graph of the application is available, Corollary 4.3 can provide indicators on when data balancing might succeed or fail.

While Proposition 4.2 and its corollary provide conditions on the data generating process, prior work [e.g. 10, 31] has demonstrated that the learning strategy of $f$ also influences the model’s fairness and robustness characteristics. As data balancing on its own does not control the learning strategy, we need to define conditions on $f$ to ensure risk-invariance and optimality. To this end, we assume that the penultimate representation $\phi(X)$ can be decomposed into $\phi^{\perp}_{Z}(X)$, $\phi^{\perp}_{Y}(X)$ and $\phi_{Y \wedge Z}(X)$ such that $\phi(X)$ is disentangled, i.e. $\mathbb{E}_{P'}[Y \mid \phi^{\perp}_{Z}(X)] = \mathbb{E}_{P'}[Y \mid X^{\perp}_{Z}] \;\; \forall P' \in \mathcal{P}$. We can define the following condition for risk-invariance and optimality of $f$, where $f$ is a linear transformation of $\phi(X)$.

Proposition 4.4 (Disentangled representation).

Let $\phi(\cdot)$ be disentangled with $\mathbb{E}_{P'}[Y \mid \phi^{\perp}_{Z}(X)] = \mathbb{E}_{P'}[Y \mid X^{\perp}_{Z}] \;\; \forall P' \in \mathcal{P}$ and $h$ be a linear function. The risk-minimizer $f(X) := \mathbb{E}_{Q}[Y \mid X]$ is optimal and risk-invariant w.r.t. $\mathcal{P}$ if $\mathbb{E}_{P'}[Y \mid f(X)] = \mathbb{E}_{P'}[Y \mid h(\phi^{\perp}_{Z}(X))] \;\; \forall P' \in \mathcal{P}$, $X^{\perp}_{Z}$ is a sufficient statistic for $Y$ in $Q$ and $X^{\perp}_{Z} \mathrel{\perp\mspace{-10.0mu}\perp}_{Q} Z \mid Y$.

In Proposition 4.4, we require that the representation $\phi(X)$ does not ‘lose’ information about $X^{\perp}_{Z}$ or mix it with information from $Z$. We note that such a representation can be obtained even if the data is entangled, e.g. by dropping modes of variation during training. Unlike other strategies [4, 47, 57], data balancing cannot enforce this property on its own, and a disentangled representation is considered necessary. This condition hence suggests another failure mode of data balancing, in which the conditions on the data are satisfied but the representation is of low quality. We believe this failure mode is displayed in Kirichenko et al. [40], as the success of their data balancing mitigation only holds when using models pre-trained on large datasets.

In this section, we have identified conditions for data balancing to be successful. In the next section, we go one step further to understand how data balancing impacts the data generating process, and how it interacts with other mitigation strategies for undesired dependencies, focusing on regularization.

5 Impact of data balancing on the CBN

Joint data balancing is assumed to remove the statistical dependence between $Y$ and $Z$ while keeping other relationships in the CBN of the task unaffected [e.g. 47, 76, 14]. This could be interpreted as ‘dropping’ edges in the undesired paths in $\mathcal{G}$, e.g. removing the influence of $U$ on $Y$ and/or $Z$ in Figure 1(a), leading to a new graph $\mathcal{G}^{0}$. While this interpretation is correct for joint balancing in the case of Figure 1(a), Proposition 5.1 below (proof in Appendix C) shows that it can be erroneous in general: the distribution $Q$ underlying the balanced data might not factorize according to $\mathcal{G}^{0}$ and therefore might not obey the statistical dependence relationships implied by $\mathcal{G}^{0}$. Therefore, balancing data to make $Z$ and $Y$ statistically independent, i.e. selecting samples in proportion to $P(Z)P(Y)/P(Z,Y)$, is not equivalent to generating data from a distribution that factorizes according to $\mathcal{G}^{0}$ in general. This factorization is important because downstream distributions $P'(X, Y, Z)$ are often assumed to follow this factorization; in fact, this assumption underlies a number of recommendations for applying regularization methodologies such as in [73].
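As a minimal sketch of the sample selection rule $P(Z)P(Y)/P(Z,Y)$, the code below computes per-sample joint balancing weights from empirical frequencies and checks that the reweighted joint over $(Y, Z)$ factorizes into the product of its marginals. The function names, the synthetic data and the use of importance weights (rather than subsampling or resampling) are assumptions made for illustration.

```python
import numpy as np

def joint_balancing_weights(y, z):
    """Per-sample weights proportional to P(Y)P(Z)/P(Y,Z), using empirical frequencies."""
    y = np.asarray(y)
    z = np.asarray(z)
    w = np.empty(len(y), dtype=float)
    for yv in np.unique(y):
        for zv in np.unique(z):
            mask = (y == yv) & (z == zv)
            if mask.any():
                w[mask] = np.mean(y == yv) * np.mean(z == zv) / np.mean(mask)
    return w

def weighted_joint(y, z, w):
    """Weighted empirical joint Q(Y, Z) for binary Y and Z, as a 2x2 table."""
    q = np.zeros((2, 2))
    for yv in (0, 1):
        for zv in (0, 1):
            q[yv, zv] = w[(y == yv) & (z == zv)].sum()
    return q / q.sum()

# Synthetic confounded labels: Z agrees with Y about 90% of the time.
rng = np.random.default_rng(0)
y = rng.integers(0, 2, size=10_000)
z = np.where(rng.random(10_000) < 0.9, y, 1 - y)

w = joint_balancing_weights(y, z)
q = weighted_joint(y, z, w)
print(q)
```

The check only concerns the $(Y, Z)$ marginal of $Q$. It says nothing about how the remaining conditional distributions involving $X$ behave under $Q$, which is precisely the point made by Proposition 5.1.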

Proposition 5.1.

Let $\langle \mathcal{G}, P \rangle$ be the CBN underlying the data, where $\mathcal{G}$ contains an undesired path between $Z$ and $Y$, and let $\mathcal{G}^{0}$ be a modification of $\mathcal{G}$ in which the undesired path has been removed. The distribution $Q$ obtained by jointly balancing the data need not factorize according to $\mathcal{G}^{0}$.

Proposition 5.1 shows that statistical (in)dependencies that we assumed would remain fixed (i.e. the black edges on the graph) can be modified by the process of joint balancing. As a consequence, further interventions on $Q$ (e.g. the addition of a regularizer) should not be motivated by $\mathcal{G}^{0}$, and we show below that combining data balancing with other mitigation strategies can lead to unexpected results.

5.1 Data balancing can hinder regularization and vice-versa

Figure 3: Accuracy across different values of the MMD hyper-parameter for models trained on balanced data and evaluated on their respective training distribution (dashed) and $P^{0}$ (solid line), averaged across replicates. We consider anti-causal tasks: (left) purely spurious case, (middle) when another confounder $V$ is present, and (right) the entangled dataset. Worst group performance on $P^{0}$ is displayed in red. Markers display individual replicates.

When confronted with a failure mode, it is reasonable to ask whether an additional fairness or robustness regularizer might be beneficial. Based on Proposition 5.1, we see that this question might have a different answer depending on whether we are in $P$ or in $Q$. Below, we consider each failure mode and ask whether performing an additional regularization motivated by the literature would mitigate the undesired dependencies in $Q$. In Appendix C.1.2, we discuss when balancing with regularization is sufficient for different fairness criteria.

Anti-causal task. In the case of an anti-causal task with a dependence between $Y$ and $Z$ (Figures 1(a,c,d)), Veitch et al. [73] recommend imposing an independence between $f(X)$ and $Z$ conditioned on $Y$. If we consider both the purely spurious correlation and the entangled case, we see that regularization and data balancing would have the same effect of blocking any dependence between $\{Y, X^{\perp}_{Z}\}$ and $\{Z, X^{\perp}_{Y}, X_{Y \wedge Z}\}$. We demonstrate that $X^{\perp}_{Z} \mathrel{\perp\mspace{-10.0mu}\perp} Z \mid Y$ in both $P$ and $Q$ (see Appendix C.1), and this regularization is sensible under both distributions. This means that performing the regularization provides the sufficient conditions for a risk-invariant model, whether or not joint data balancing is performed. In theory, data balancing is not needed but is also not harmful. In the case of an added confounder, $X_{V}$ depends on both $Y$ and $Z$ due to non-causal paths through $V$. Therefore, imposing $f(X) \mathrel{\perp\mspace{-10.0mu}\perp}_{Q} Z \mid Y$ might lead to results whereby the model only depends on $V$ or is trivial (e.g. predicts a constant), as the regularization encourages the removal of any dependence on $Z$, which is related to $Y$ via $X_{V}$. This behavior would be observed in both $P$ and $Q$, but data balancing on its own might be less detrimental than regularization in terms of predictive power, even though it does not resolve all undesired dependencies. In this case, regularization hinders data balancing.

Based on the balanced data from Section 3, we add a conditional Maximum Mean Discrepancy [MMD, 28] to encourage $f(X) \mathrel{\perp\mspace{-10.0mu}\perp}_{Q} Z \mid Y$ during training, varying the strength of this regularizer via a hyper-parameter. In the case of the purely spurious statistical dependence between $Y$ and $Z$ (Figure 1(a)), there is little variation between the metrics across MMD strengths, and the model is fair and robust (Figure 3(left)). In the entangled case (Figure 3(right)), the model’s performance on $Q$ and $P^{0}$ are close for medium values of the hyper-parameter (before MMD overpowers the training) and worst group performance improves markedly. This result suggests that, with the added regularizer, $f$ only varies with $X^{\perp}_{Z}$. Performing the same regularization in the presence of another confounder (Figure 3(middle)) leads to a plateau in performance on $Q$, but low performance on $P^{0}$ and chance-level worst group performance. In this case, we posit that the model relies exclusively on $X_{V}$ for its predictions, and the regularizer is detrimental compared to data balancing on its own (MMD=0 on the plot).
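For readers unfamiliar with this type of penalty, here is a minimal NumPy sketch of a conditional MMD term: within each class $Y = y$, it measures the squared MMD between the model outputs for $Z = 0$ and $Z = 1$, and sums over classes. This is not the implementation used for the experiments; the RBF kernel, its bandwidth, the biased estimator, the restriction to scalar outputs and all function names are assumptions chosen for brevity.

```python
import numpy as np

def rbf_kernel(a, b, bandwidth=1.0):
    """RBF kernel matrix between two 1-D arrays of scalar model outputs."""
    sq = (a[:, None] - b[None, :]) ** 2
    return np.exp(-sq / (2 * bandwidth ** 2))

def mmd2(a, b, bandwidth=1.0):
    """Biased (V-statistic) estimate of the squared MMD between samples a and b."""
    return (rbf_kernel(a, a, bandwidth).mean()
            + rbf_kernel(b, b, bandwidth).mean()
            - 2 * rbf_kernel(a, b, bandwidth).mean())

def conditional_mmd2(scores, y, z, bandwidth=1.0):
    """Sum over classes of the squared MMD between outputs with Z=0 and Z=1."""
    total = 0.0
    for yv in np.unique(y):
        a = scores[(y == yv) & (z == 0)]
        b = scores[(y == yv) & (z == 1)]
        if len(a) and len(b):
            total += mmd2(a, b, bandwidth)
    return total

# Synthetic outputs: one model ignores Z, the other shifts with Z inside each class.
rng = np.random.default_rng(0)
n = 400
y = rng.integers(0, 2, size=n)
z = rng.integers(0, 2, size=n)
noise = 0.3 * rng.standard_normal(n)
scores_invariant = y + noise
scores_leaky = y + 2.0 * z + noise

penalty_invariant = conditional_mmd2(scores_invariant, y, z)
penalty_leaky = conditional_mmd2(scores_leaky, y, z)
print(penalty_invariant, penalty_leaky)
```

In training, a term of this kind would typically be added to the task loss with a multiplicative strength, which plays the role of the MMD hyper-parameter varied in Figure 3; a differentiable version would be written in the training framework rather than in NumPy.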

Causal task. Finally, let us consider the causal task in Figure 1(b). In a similar case, Veitch et al. [73] suggest a regularizer such that $f(X) \mathrel{\perp\mspace{-10.0mu}\perp}_{P} Z$, which would encourage the model $f(X)$ to vary only with $X^{\perp}_{Z}$, as $X^{\perp}_{Z} \mathrel{\perp\mspace{-10.0mu}\perp}_{P} Z$. However, data balancing induces a dependence between $X^{\perp}_{Z}$ and $Z$, as expressed below:

$$Q(X^{\perp}_{Z} \mid Z) = \frac{\sum_{X^{\perp}_{Y}, Y} P(X^{\perp}_{Z}, X^{\perp}_{Y} \mid Z, Y)\, P(Z)\, P(Y)}{\sum_{X^{\perp}_{Y}, X^{\perp}_{Z}, Y} P(X^{\perp}_{Z}, X^{\perp}_{Y} \mid Z, Y)\, P(Z)\, P(Y)} = \sum_{Y} P(X^{\perp}_{Z} \mid Z, Y)\, P(Y).$$

The RHS cannot be simplified further because $X^{\perp}_{Z} \centernot{\mathrel{\perp\mspace{-10.0mu}\perp}}_{P} Z \mid Y$, as $Y$ is a collider under $P$. Thus, the left-hand side is a function of $Z$ in general (see Appendix C.1 for further details and a numerical simulation). In this case, regularizing to enforce $f(X) \mathrel{\perp\mspace{-10.0mu}\perp}_{Q} Z$ would destroy information in $X^{\perp}_{Z}$, whereas the same regularization under $P$ would have enabled $f(X)$ to use all of the information in $X^{\perp}_{Z}$. Therefore, data balancing may hinder regularization.
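The induced dependence can be checked exactly on a small discrete example. The sketch below is not the simulation of Appendix C.1; it assumes a toy collider structure in which a single binary covariate `x` (standing in for $X^{\perp}_{Z}$) and a binary $Z$ are independent under $P$ and both influence $Y$, with hypothetical values for $P(Y = 1 \mid X, Z)$. It then builds $Q(X, Y, Z) = P(X \mid Y, Z)\, P(Y)\, P(Z)$ and compares the conditional of `x` given $Z$ under $P$ and under $Q$.

```python
import numpy as np

# Hypothetical P(Y=1 | X=x, Z=z), indexed [x, z]; X and Z are independent fair coins under P.
p_y1 = np.array([[0.1, 0.5],
                 [0.5, 0.9]])

# Joint P with axes (x, z, y).
P = np.zeros((2, 2, 2))
P[:, :, 1] = 0.25 * p_y1
P[:, :, 0] = 0.25 * (1 - p_y1)

P_zy = P.sum(axis=0)            # P(z, y)
P_x_given_zy = P / P_zy         # P(x | z, y), broadcast over the x axis
P_y = P.sum(axis=(0, 1))        # P(y)
P_z = P.sum(axis=(0, 2))        # P(z)

# Jointly balanced distribution: Q(x, z, y) = P(x | z, y) P(z) P(y).
Q = P_x_given_zy * P_z[None, :, None] * P_y[None, None, :]

def cond_x_given_z(joint):
    """Return the table of joint(x | z), indexed [x, z]."""
    joint_xz = joint.sum(axis=2)
    return joint_xz / joint_xz.sum(axis=0, keepdims=True)

print("P(x=1 | z):", cond_x_given_z(P)[1])   # identical for z=0 and z=1
print("Q(x=1 | z):", cond_x_given_z(Q)[1])   # roughly 0.595 for z=0 and 0.405 for z=1
```

Under these assumed numbers, $Y$ and $Z$ are independent in $Q$ by construction, yet `x` and $Z$ become dependent, so a marginal independence penalty between $f(X)$ and $Z$ applied under $Q$ would penalize the use of `x` itself.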

We illustrate this result on the Amazon reviews dataset from Section 3 by imposing a marginal MMD regularization $f(X) \mathrel{\perp\mspace{-10.0mu}\perp} Z$ during training and evaluating risk-invariance across multiple $P' \in \mathcal{P}$. When training on $P$, we observe that the regularization ‘flattens’ the curve, such that from medium to high values of MMD regularization, the model is risk-invariant (Figure 4(a)). On the jointly balanced data, medium values of the regularization degrade risk-invariance (see green curves on Figure 4(b)). Overall, model performance is also lower for the models trained on $Q$ compared to models trained on $P$ across test sets from $P' \in \mathcal{P}$, at similar levels of regularization (see Figure 4(c) for MMD=16). This result shows that $X^{\perp}_{Z}$ is not a sufficient statistic for $Y$ in $Q$.

Figure 4: Accuracy across different values of the confounder strength (i.e. different $P' \in \mathcal{P}$), for each value of MMD regularization considered (displayed by the color gradient). (a) Models trained on $P$. (b) Models trained on $Q$. Results are averaged across seeds for clarity. Notice the different y-scales. (c) Displays the mean and standard deviation across seeds for MMD=16.

6 Case study: distinguishing between failure modes in CelebA

In this section, we show that when $Y$ and $Z$ are available at training time, we can try to distinguish between failure modes of data balancing by using our different observations, even in the absence of a full causal graph. We illustrate this using the benchmark task of detecting blond hair in pictures of celebrities in the CelebA [45] dataset. This label has a strong correlation with perceived gender: half of the non-males have blond hair, while only $\sim 7\%$ of males do. We consider a balanced, subsampled dataset (train: $n=4{,}096$, test/valid: $n=400$; these results were also replicated with a resampled dataset with $n=30{,}000$ for training) and the original, confounded dataset. We train a VGG [67] and four Vision Transformer [ViT, 18] architectures, with the number of parameters ranging from 17 to 690 million.
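A minimal sketch of how a jointly balanced subsample could be drawn, assuming that balancing means keeping the same number of examples in every $(Y, Z)$ group, so that $Y$ and $Z$ are independent in the subsample. The function name and the choice of subsampling each group down to the size of the smallest one are illustrative assumptions, not a description of the pipeline used for CelebA.

```python
import random
from collections import defaultdict


def balance_by_subsampling(y, z, seed=0):
    """Return sorted indices of a subsample with equal counts per (y, z) group.

    Assumption: every (y, z) combination of interest appears at least once;
    groups that are entirely absent from the data cannot be balanced.
    """
    groups = defaultdict(list)
    for i, key in enumerate(zip(y, z)):
        groups[key].append(i)
    n_min = min(len(indices) for indices in groups.values())
    rng = random.Random(seed)
    kept = []
    for indices in groups.values():
        # Sample without replacement so that no example is duplicated.
        kept.extend(rng.sample(indices, n_min))
    return sorted(kept)


# Toy confounded data: most examples have y == z.
labels = [0] * 50 + [1] * 50
attrs = [0] * 40 + [1] * 10 + [0] * 5 + [1] * 45
balanced_indices = balance_by_subsampling(labels, attrs)
```

Subsampling discards data from the larger groups, which is one reason the balanced training set can be much smaller than the original one.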

We observe that, while training with balanced data leads to higher worst group accuracy and lower equalized odds scores than training with the historical data (Table 2), a substantial gap remains between the overall and worst group performances. These results show that data balancing improves downstream fairness and robustness metrics, but does not provide a risk-invariant or fair model on its own. Therefore, it is likely that one of the conditions for data balancing to be sufficient is not fulfilled, and understanding which condition is violated can guide our selection of another technique.
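The two group metrics reported in Table 2 can be sketched as follows. Worst group accuracy is taken here as the minimum accuracy over the $(Y, Z)$ groups, and the equalized odds score as the absolute gap across $Z$ in the rate of positive predictions, averaged over the two values of $Y$. Both definitions are assumptions made for illustration; the exact formulas used in the experiments may differ, and the function names are hypothetical.

```python
def group_accuracies(y_true, y_pred, z):
    """Accuracy within each (label, attribute) group."""
    stats = {}
    for yt, yp, zi in zip(y_true, y_pred, z):
        correct, total = stats.get((yt, zi), (0, 0))
        stats[(yt, zi)] = (correct + int(yt == yp), total + 1)
    return {group: c / t for group, (c, t) in stats.items()}


def worst_group_accuracy(y_true, y_pred, z):
    """Minimum accuracy over all observed (label, attribute) groups."""
    return min(group_accuracies(y_true, y_pred, z).values())


def equalized_odds_gap(y_true, y_pred, z):
    """Gap across Z in P(pred = 1 | Y = y, Z = z), averaged over y in {0, 1}.

    Assumes binary labels, predictions and attributes, and that all four
    (y, z) groups are non-empty.
    """
    gaps = []
    for y_val in (0, 1):
        rates = []
        for z_val in (0, 1):
            preds = [yp for yt, yp, zi in zip(y_true, y_pred, z)
                     if yt == y_val and zi == z_val]
            rates.append(sum(preds) / len(preds))
        gaps.append(abs(rates[0] - rates[1]))
    return sum(gaps) / len(gaps)
```

A model can have a small equalized odds gap while still having low worst group accuracy, for instance if it makes the same errors in both groups, which is why the two metrics are reported side by side.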

Distinguishing between failure modes. We first assume that the task is anti-causal. We then aim to understand whether there is another confounder, the data is entangled, or the representation is entangled (Proposition 4.4). As per Kirichenko et al. [40], we first attempt to improve our representation by pre-training the VGG with ImageNet [16]. While we observe an increase in performance with pre-training, there is no clear decrease in equalized odds. This result suggests that the failure may lie elsewhere. We then train models with MMD on $P$, with the expectation that we would observe a plateau for entangled data when the model learns $f(X^{\perp}_{Z})$, or a stark decrease in worst group performance in the presence of another confounder. While there is no major pattern of correlation between $Y$ and another attribute in the balanced data (see Appendix E.2.2), small effects might combine, or there might be other, unobserved attributes that influence $Y$. For a medium value of the regularization hyper-parameter, the model displays a plateau in performance and poor worst group performance. This result suggests an effect of another confounder and next steps can include methods such as Alabdulmohsin et al. [2], which controls for all (observed) auxiliary factors of variation.

Table 2: VGG model performance on CelebA, when trained on the original distribution $P$, the balanced data $Q$, with ImageNet pre-training (‘Pre-trained’), or with MMD (‘MMD’) on $P$ with regularizer=5. All models are evaluated on $Q$.

| Model | Acc. ($\uparrow$) | Worst Grp ($\uparrow$) | Encoding ($\sim 0.5$) | Equ. Odds ($\downarrow$) |
| --- | --- | --- | --- | --- |
| Original | $0.791 \pm 0.037$ | $0.314 \pm 0.093$ | $0.868 \pm 0.015$ | $0.243 \pm 0.036$ |
| Balanced | $0.839 \pm 0.022$ | $0.674 \pm 0.088$ | $0.709 \pm 0.066$ | $0.125 \pm 0.022$ |
| Pre-trained | $0.874 \pm 0.006$ | $0.726 \pm 0.037$ | $0.740 \pm 0.033$ | $0.111 \pm 0.010$ |
| MMD on $P$ | $0.813 \pm 0.036$ | $0.146 \pm 0.172$ | $0.630 \pm 0.010$ | $0.001 \pm 0.002$ |
Figure 5: Model performance on test sets sampled from $P$ (dotted) and $Q$ (dashed). The model is trained on $P$ with regularization $f(X) \mathrel{\perp\mspace{-10.0mu}\perp} Z \mid Y$.

7 Related works

Balanced data as mitigation for invariant models. Our results extend those of Makar et al. [47], which considered a single causal graph. Wang et al. [75] showed that balancing data did not lead to a reduction in bias amplification. The authors posit that this failure of balanced data to correct for spurious signals is due to unobserved confounding factors, which is confirmed in Alabdulmohsin et al. [2]. Rolf et al. [62] investigated upsampling by relying on a scaling law per group, focusing on the question of fairness vs performance trade-off [22]. Focusing on causal NLP settings, Joshi et al. [36] investigated causal and non-causal features, concluding that data balancing does not help in all cases. Closer to our work is that of Puli et al. [57], in which the authors showed that having $Y \mathrel{\perp\mspace{-10.0mu}\perp}_{Q} Z$ does not imply that $Y \mathrel{\perp\mspace{-10.0mu}\perp}_{Q} Z \mid X$ and the model can learn signals related to $Z$. Puli et al. [57] propose a method to learn a representation $r$ such that $Y \mathrel{\perp\mspace{-10.0mu}\perp} Z \mid r(X)$. Our work provides a framework to understand these different failure modes and proposes strategies to distinguish between them. While we focus on pre-processing mitigation with a fixed distribution $Q(X, Y, Z)$, another line of work considers dynamic resampling in-processing [e.g. 35, 60, 12]. As the resampling converges towards a fixed distribution $P'(Z \mid Y)$, we would expect failure modes in the presence of entangled data or of another confounder. Nevertheless, the variation in $P'(Z \mid Y)$ at the early stages of training might be beneficial, e.g. by disentangling the representation. We leave this investigation for future work.
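The observation attributed to Puli et al. [57] can be checked on a hypothetical toy distribution, chosen here only for illustration: $Y$ and $Z$ are independent fair coins (as in balanced data) and the covariate is $X = Y + Z$. The sketch below enumerates the joint distribution exactly and compares $Q(Y=1 \mid Z)$ with $Q(Y=1 \mid X=1, Z)$; the helper name `prob_y1` is an assumption.

```python
from fractions import Fraction
from itertools import product

# Hypothetical balanced distribution Q: Y and Z are independent fair coins,
# and the covariate X = Y + Z mixes information from both.
joint = {}
for y, z in product((0, 1), repeat=2):
    joint[(y + z, y, z)] = Fraction(1, 4)


def prob_y1(joint, **given):
    """Q(Y = 1 | given), where `given` may fix x and/or z."""
    def matches(x, z):
        return given.get("x", x) == x and given.get("z", z) == z

    num = sum(p for (x, y, z), p in joint.items() if y == 1 and matches(x, z))
    den = sum(p for (x, y, z), p in joint.items() if matches(x, z))
    return num / den


# Marginally, Y does not depend on Z ...
marginal = [prob_y1(joint, z=z) for z in (0, 1)]
# ... but once X is observed, Z becomes informative about Y.
conditional = [prob_y1(joint, x=1, z=z) for z in (0, 1)]
```

In this toy case `marginal` is identical for both values of $Z$, whereas `conditional` differs across $Z$, so a model that conditions on $X$ can still pick up signal related to $Z$ even though the data are balanced.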

Causal feature selection. Some works have used a causal framing to select features such that $f(X)$ has robustness and/or fairness properties [e.g. 46, 70, 68, 25, 66]. Similarly, our work defines independence conditions on covariates to obtain an optimal, invariant model, and can be used to select features. Two major distinctions between feature selection works and ours reside in the fact that we consider the case in which we do not observe $X^{\perp}_{Z}$ explicitly and that we investigate the impact of data balancing.

8 Discussion

In this work, we uncover important results to guide the use of data balancing for mitigating undesired dependencies between covariates, outcomes and auxiliary factors of variation. We first show (Section 3) that joint data balancing might not achieve the desired fairness or robustness criteria, and that the failures may seem difficult to predict. Motivated by these results, we introduce conditions under which data balancing leads to a robust or fair model (Sections 4, B.2). Importantly, we show that data balancing is not equivalent to ‘dropping an edge’ in the causal graph and can lead to distributions that do not factorize according to the desired graph (Section 5). This can have downstream consequences if further mitigation strategies are motivated by the causal graph and highlights why regularization and data balancing might not go ‘hand in hand’. This last result shows that data balancing should not be performed as a ‘default’, and mitigation strategies should be based on the causal graph of the application. Finally, even in the absence of a causal graph, our findings may help to pinpoint which condition(s) are not fulfilled, and guide further mitigation (Section 6).

Limitations. The conditions defined in Section 4 for risk-invariance depend on the expression of $\mathcal{P}$ as a correlation shift [47, 61]. Other expressions are likely to lead to other conditions. In our experiments, we have mostly subsampled datasets to obtain balanced distributions. We would expect similar results for other joint balancing methods. Variations are, however, possible due to the finite-set nature of the computations [47], e.g. with reweighting displaying more variance [33], potentially under-performing in overparametrized settings [11, 64]. We also note that, while we aimed to provide upper bounds for the effectiveness of data balancing, we did not use additional training strategies for mitigation beyond regularization. We believe that our causal framework can be a useful tool to analyze other pre- or in-processing methods that enforce independence between variables in the data generating process [e.g. 1, 57]. On the other hand, our framework might not be suited to analyze the effects of other mitigation strategies, e.g. hyper-parameter optimization [56].

Future work. This work considered a variety of causal graphs in order to provide general insights rather than task-specific conditions. However, investigating specific graphs could make it possible to leverage further strategies, including other balancing techniques [e.g. 71]. We believe that our causal framing could then be a useful resource to analyze the effect of these strategies on downstream fairness and robustness criteria. Finally, we illustrate our propositions with binary classification tasks and confounders. While our reasoning applies to more complex settings, there might be further considerations to account for when generalizing beyond binary variables, especially with respect to estimation.

Broader impact

Our work investigates a common mitigation strategy for failures of fairness or robustness in machine learning predictive settings. We aim to clearly highlight when data balancing is promising, and when it fails, hence advancing the field of trustworthy machine learning. As with most papers addressing fairness questions, we acknowledge that our mathematical formulations of fairness criteria might not correspond to the desired societal impact, e.g. in terms of equity. Specific considerations for our work include the use of the CelebA [45] dataset, and in particular the ‘is-male’ binary label provided. We acknowledge that a binary characterization of gender is not representative and can be harmful. In addition, it would be desirable to have self-reported instead of perceived gender. Our work considers cases for which auxiliary factors of variation $Z$ are observed at train, test or fine-tuning time. This is a limitation of our investigation, as our insights might not be available when $Z$ is unobserved. This is exemplified by the more difficult case of distinguishing between failure modes without a $P^{0}$ in the classification of CelebA images.

Acknowledgments and Disclosure of Funding

We thank Virginia Aglietti for feedback on this work and Victor Veitch for sharing experimental code for the Amazon reviews experiments. This work was funded by Google DeepMind.

References

  • Alabdulmohsin & Lučić [2021] Alabdulmohsin, I. and Lučić, M. A near-optimal algorithm for debiasing trained machine learning models. In Beygelzimer, A., Dauphin, Y., Liang, P., and Vaughan, J. W. (eds.), Advances in Neural Information Processing Systems, 2021. URL https://openreview.net/forum?id=H5TBqNFPKSJ.
  • Alabdulmohsin et al. [2024] Alabdulmohsin, I., Wang, X., Steiner, A., Goyal, P., D’Amour, A., and Zhai, X. CLIP the bias: How useful is balancing data in multimodal learning? In International Conference on Learning Representations, 2024.
  • Anthis & Veitch [2023] Anthis, J. R. and Veitch, V. Causal context connects counterfactual fairness to robust prediction and group fairness. In Advances in Neural Information Processing Systems, volume 37, 2023. URL https://openreview.net/forum?id=AmwgBjXqc3.
  • Arjovsky et al. [2019] Arjovsky, M., Bottou, L., Gulrajani, I., and Lopez-Paz, D. Invariant risk minimization, 2019. Preprint 1907.02893. URL http://arxiv.org/abs/1907.02893.
  • Barocas et al. [2023] Barocas, S., Hardt, M., and Narayanan, A. Fairness and Machine Learning: Limitations and Opportunities. MIT Press, 2023.
  • Ben-Tal et al. [2013] Ben-Tal, A., den Hertog, D., De Waegenaere, A., Melenberg, B., and Rennen, G. Robust solutions of optimization problems affected by uncertain probabilities. Manage. Sci., 59(2):341–357, 2013.
  • Bradbury et al. [2018] Bradbury, J., Frostig, R., Hawkins, P., Johnson, M. J., Leary, C., Maclaurin, D., Necula, G., Paszke, A., VanderPlas, J., Wanderman-Milne, S., and Zhang, Q. JAX: composable transformations of Python+NumPy programs, 2018. URL http://github.com/google/jax.
  • Brown et al. [2023] Brown, A., Tomasev, N., Freyberg, J., Liu, Y., Karthikesalingam, A., and Schrouff, J. Detecting shortcut learning for fair medical AI using shortcut testing. Nat. Commun., 14(1):4314, 2023.
  • Byrd & Lipton [2019] Byrd, J. and Lipton, Z. What is the effect of importance weighting in deep learning? In Chaudhuri, K. and Salakhutdinov, R. (eds.), Proceedings of the 36th International Conference on Machine Learning, volume 97 of Proceedings of Machine Learning Research, pp.  872–881. PMLR, 2019.
  • Carlini & Wagner [2017] Carlini, N. and Wagner, D. Towards evaluating the robustness of neural networks. In 2017 IEEE Symposium on Security and Privacy (SP), pp. 39–57. IEEE, 2017.
  • Celis et al. [2018] Celis, E., Keswani, V., Straszak, D., Deshpande, A., Kathuria, T., and Vishnoi, N. Fair and diverse DPP-based data summarization. In Dy, J. and Krause, A. (eds.), Proceedings of the 35th International Conference on Machine Learning, volume 80 of Proceedings of Machine Learning Research, pp.  716–725. PMLR, 2018. URL https://proceedings.mlr.press/v80/celis18a.html.
  • Chen et al. [2023] Chen, X., Fan, W., Chen, J., Liu, H., Liu, Z., Zhang, Z., and Li, Q. Fairly adaptive negative sampling for recommendations. In Proceedings of the ACM Web Conference 2023, WWW ’23, pp. 3723–3733, New York, NY, USA, 2023. Association for Computing Machinery. ISBN 9781450394161. doi: 10.1145/3543507.3583355. URL https://doi.org/10.1145/3543507.3583355.
  • Chiappa [2019] Chiappa, S. Path-Specific counterfactual fairness. AAAI, 33(01):7801–7808, 2019.
  • Compton et al. [2023] Compton, R., Zhang, L., Puli, A., and Ranganath, R. When more is less: Incorporating additional datasets can hurt performance by introducing spurious correlations, 2023. Preprint 2308.04431. URL http://arxiv.org/abs/2308.04431.
  • Cowell et al. [2007] Cowell, R. G., Dawid, A. P., Lauritzen, S., and Spiegelhalter, D. J. Probabilistic Networks and Expert Systems, Exact Computational Methods for Bayesian Networks. Springer-Verlag, 2007.
  • Deng et al. [2009] Deng, J., Dong, W., Socher, R., Li, L.-J., Li, K., and Fei-Fei, L. ImageNet: A large-scale hierarchical image database. In 2009 IEEE Conference on Computer Vision and Pattern Recognition, pp.  248–255. IEEE, 2009.
  • Deng [2012] Deng, L. The mnist database of handwritten digit images for machine learning research. IEEE Signal Processing Magazine, 29(6):141–142, 2012.
  • Dosovitskiy et al. [2021] Dosovitskiy, A., Beyer, L., Kolesnikov, A., Weissenborn, D., Zhai, X., Unterthiner, T., Dehghani, M., Minderer, M., Heigold, G., Gelly, S., Uszkoreit, J., and Houlsby, N. An image is worth 16x16 words: Transformers for image recognition at scale. In International Conference on Learning Representations, 2021. URL https://openreview.net/forum?id=YicbFdNTTy.
  • Drenkow et al. [2021] Drenkow, N., Sani, N., Shpitser, I., and Unberath, M. A systematic review of robustness in deep learning for computer vision: Mind the gap?, 2021. Preprint 2112.00639. URL http://arxiv.org/abs/2112.00639.
  • Duchi et al. [2016] Duchi, J., Glynn, P., and Namkoong, H. Statistics of robust optimization: A generalized empirical likelihood approach, 2016. Preprint 1610.03425. URL http://arxiv.org/abs/1610.03425.
  • Duffy et al. [2022] Duffy, G., Clarke, S. L., Christensen, M., He, B., Yuan, N., Cheng, S., and Ouyang, D. Confounders mediate AI prediction of demographics in medical imaging. NPJ Digit Med, 5(1):188, 2022.
  • Dutta et al. [2020] Dutta, S., Wei, D., Yueksel, H., Chen, P.-Y., Liu, S., and Varshney, K. Is there a trade-off between fairness and accuracy? A perspective using mismatched hypothesis testing. In III, H. D. and Singh, A. (eds.), Proceedings of the 37th International Conference on Machine Learning, volume 119 of Proceedings of Machine Learning Research, pp.  2803–2813. PMLR, 2020. URL https://proceedings.mlr.press/v119/dutta20a.html.
  • Dwork et al. [2012] Dwork, C., Hardt, M., Pitassi, T., Reingold, O., and Zemel, R. Fairness through awareness. In Proceedings of the 3rd Innovations in Theoretical Computer Science Conference, ITCS ’12, pp.  214–226, New York, NY, USA, 2012. Association for Computing Machinery. ISBN 9781450311151. doi: 10.1145/2090236.2090255. URL https://doi.org/10.1145/2090236.2090255.
  • Flores et al. [2016] Flores, A. W., Bechtel, K., and Lowenkamp, C. T. False positives, false negatives, and false analyses: A rejoinder to “machine bias: There’s software used across the country to predict future criminals. and it’s biased against blacks.”. Fed. Probat., 80(2), 2016.
  • Galhotra et al. [2022] Galhotra, S., Shanmugam, K., Sattigeri, P., and Varshney, K. R. Causal feature selection for algorithmic fairness. In Proceedings of the 2022 International Conference on Management of Data, SIGMOD ’22, pp.  276–285, New York, NY, USA, 2022. Association for Computing Machinery. ISBN 9781450392495. doi: 10.1145/3514221.3517909. URL https://doi.org/10.1145/3514221.3517909.
  • Geirhos et al. [2019] Geirhos, R., Rubisch, P., Michaelis, C., Bethge, M., Wichmann, F. A., and Brendel, W. Imagenet-trained CNNs are biased towards texture; increasing shape bias improves accuracy and robustness. In International Conference on Learning Representations, 2019. URL https://openreview.net/forum?id=Bygh9j09KX.
  • Gichoya et al. [2022] Gichoya, J. W., Banerjee, I., Bhimireddy, A. R., Burns, J. L., Celi, L. A., Chen, L.-C., Correa, R., Dullerud, N., Ghassemi, M., Huang, S.-C., Kuo, P.-C., Lungren, M. P., Palmer, L. J., Price, B. J., Purkayastha, S., Pyrros, A. T., Oakden-Rayner, L., Okechukwu, C., Seyyed-Kalantari, L., Trivedi, H., Wang, R., Zaiman, Z., and Zhang, H. AI recognition of patient race in medical imaging: a modelling study. Lancet Digit Health, 4(6):e406–e414, 2022.
  • Gretton et al. [2012] Gretton, A., Borgwardt, K. M., Rasch, M. J., and Scholkopf, B. A kernel Two-Sample test. J. Mach. Learn. Res., 13(25):723–773, 2012.
  • Hardt et al. [2016] Hardt, M., Price, E., and Srebro, N. Equality of opportunity in supervised learning. In Lee, D., Sugiyama, M., Luxburg, U., Guyon, I., and Garnett, R. (eds.), Advances in Neural Information Processing Systems, volume 29. Curran Associates, Inc., 2016. URL https://proceedings.neurips.cc/paper_files/paper/2016/file/9d2682367c3935defcb1f9e247a97c0d-Paper.pdf.
  • Harris et al. [2020] Harris, C. R., Millman, K. J., van der Walt, S. J., Gommers, R., Virtanen, P., Cournapeau, D., Wieser, E., Taylor, J., Berg, S., Smith, N. J., Kern, R., Picus, M., Hoyer, S., van Kerkwijk, M. H., Brett, M., Haldane, A., Del Río, J. F., Wiebe, M., Peterson, P., Gérard-Marchant, P., Sheppard, K., Reddy, T., Weckesser, W., Abbasi, H., Gohlke, C., and Oliphant, T. E. Array programming with NumPy. Nature, 585(7825):357–362, 2020.
  • Hooker et al. [2020] Hooker, S., Moorosi, N., Clark, G., Bengio, S., and Denton, E. Characterising bias in compressed models, 2020. Preprint 2010.03058. URL http://arxiv.org/abs/2010.03058.
  • Hunter [2007] Hunter, J. D. Matplotlib: A 2D graphics environment. Comput. Sci. Eng., 9(3):90–95, 2007.
  • Idrissi et al. [2022] Idrissi, B. Y., Arjovsky, M., Pezeshki, M., and Lopez-Paz, D. Simple data balancing achieves competitive worst-group-accuracy. In Schölkopf, B., Uhler, C., and Zhang, K. (eds.), Proceedings of the First Conference on Causal Learning and Reasoning, volume 177 of Proceedings of Machine Learning Research, pp.  336–351. PMLR, 2022. URL https://proceedings.mlr.press/v177/idrissi22a.html.
  • Devlin et al. [2019] Devlin, J., Chang, M.-W., Lee, K., and Toutanova, K. BERT: Pre-training of deep bidirectional transformers for language understanding. In Proceedings of the 2019 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies, Volume 1 (Long and Short Papers), volume 1, pp.  2, 2019.
  • Jiang & Nachum [2020] Jiang, H. and Nachum, O. Identifying and correcting label bias in machine learning. In Chiappa, S. and Calandra, R. (eds.), Proceedings of the Twenty Third International Conference on Artificial Intelligence and Statistics, volume 108 of Proceedings of Machine Learning Research, pp.  702–712. PMLR, 2020. URL https://proceedings.mlr.press/v108/jiang20a.html.
  • Joshi et al. [2022] Joshi, N., Pan, X., and He, H. Are all spurious features in natural language alike? an analysis through a causal lens. In Empirical Methods in Natural Language Processing (EMNLP), 2022.
  • Kamiran & Calders [2012] Kamiran, F. and Calders, T. Data preprocessing techniques for classification without discrimination. Knowl. Inf. Syst., 33(1):1–33, 2012.
  • Kehrenberg et al. [2020] Kehrenberg, T., Chen, Z., and Quadrianto, N. Tuning fairness by balancing target labels. Front Artif Intell, 3:33, 2020.
  • Kim et al. [2023] Kim, D., Park, S., Hwang, S., and Byun, H. Fair classification by loss balancing via fairness-aware batch sampling. Neurocomputing, 518:231–241, 2023.
  • Kirichenko et al. [2022] Kirichenko, P., Izmailov, P., and Wilson, A. G. Last layer re-training is sufficient for robustness to spurious correlations, 2022. Preprint 2204.02937. URL http://arxiv.org/abs/2204.02937.
  • Koller & Friedman [2009] Koller, D. and Friedman, N. Probabilistic Graphical Models: Principles and Techniques. MIT Press, 2009.
  • Krizhevsky et al. [2012] Krizhevsky, A., Sutskever, I., and Hinton, G. E. Imagenet classification with deep convolutional neural networks. In Pereira, F., Burges, C., Bottou, L., and Weinberger, K. (eds.), Advances in Neural Information Processing Systems, volume 25. Curran Associates, Inc., 2012. URL https://proceedings.neurips.cc/paper_files/paper/2012/file/c399862d3b9d6b76c8436e924a68c45b-Paper.pdf.
  • LaBonte et al. [2023] LaBonte, T., Muthukumar, V., and Kumar, A. Towards last-layer retraining for group robustness with fewer annotations, 2023. Preprint 2309.08534. URL http://arxiv.org/abs/2309.08534.
  • Lecun et al. [1998] Lecun, Y., Bottou, L., Bengio, Y., and Haffner, P. Gradient-based learning applied to document recognition. Proc. IEEE, 86(11):2278–2324, 1998.
  • Liu et al. [2015] Liu, Z., Luo, P., Wang, X., and Tang, X. Deep learning face attributes in the wild. In 2015 IEEE International Conference on Computer Vision (ICCV). IEEE, 2015.
  • Magliacane et al. [2018] Magliacane, S., van Ommen, T., Claassen, T., Bongers, S., Versteeg, P., and Mooij, J. M. Domain adaptation by using causal inference to predict invariant conditional distributions. In Bengio, S., Wallach, H., Larochelle, H., Grauman, K., Cesa-Bianchi, N., and Garnett, R. (eds.), Advances in Neural Information Processing Systems, volume 31. Curran Associates, Inc., 2018.
  • Makar et al. [2022] Makar, M., Packer, B., Moldovan, D., Blalock, D., Halpern, Y., and D’Amour, A. Causally motivated shortcut removal using auxiliary labels. In Camps-Valls, G., Ruiz, F. J. R., and Valera, I. (eds.), Proceedings of The 25th International Conference on Artificial Intelligence and Statistics, volume 151 of Proceedings of Machine Learning Research, pp.  739–766. PMLR, 2022. URL https://proceedings.mlr.press/v151/makar22a.html.
  • Mao et al. [2023] Mao, Y., Deng, Z., Yao, H., Ye, T., Kawaguchi, K., and Zou, J. Last-layer fairness fine-tuning is simple and effective for neural networks, 2023. Preprint 2304.03935. URL http://arxiv.org/abs/2304.03935.
  • McKinney [2010] McKinney, W. Data structures for statistical computing in python. In Proceedings of the 9th Python in Science Conference. SciPy, 2010.
  • Mehrabi et al. [2021] Mehrabi, N., Morstatter, F., Saxena, N., Lerman, K., and Galstyan, A. A survey on bias and fairness in machine learning. ACM Comput. Surv., 54(6):1–35, 2021.
  • Mooij et al. [2020] Mooij, J. M., Magliacane, S., and Claassen, T. Joint causal inference from multiple contexts. J. Mach. Learn. Res., 21(99):1–108, 2020.
  • Ni et al. [2019] Ni, J., Li, J., and McAuley, J. Justifying recommendations using distantly-labeled reviews and fine-grained aspects. In Proceedings of the 2019 conference on empirical methods in natural language processing and the 9th international joint conference on natural language processing (EMNLP-IJCNLP), pp.  188–197, 2019.
  • Obermeyer et al. [2019] Obermeyer, Z., Powers, B., Vogeli, C., and Mullainathan, S. Dissecting racial bias in an algorithm used to manage the health of populations. Science, 366(6464):447–453, 2019.
  • Pearl [1988] Pearl, J. Probabilistic Reasoning in Intelligent Systems: Networks of Plausible Inference. Morgan Kaufmann Publishers Inc., 1988.
  • Pearl [2000] Pearl, J. Causality: Models, Reasoning, and Inference. Cambridge University Press, 2000.
  • Perrone et al. [2021] Perrone, V., Donini, M., Zafar, M. B., Schmucker, R., Kenthapadi, K., and Archambeau, C. Fair bayesian optimization. In Proceedings of the 2021 AAAI/ACM Conference on AI, Ethics, and Society, pp.  854–863, 2021.
  • Puli et al. [2022] Puli, A. M., Zhang, L. H., Oermann, E. K., and Ranganath, R. Out-of-distribution generalization in the presence of nuisance-induced spurious correlations. In International Conference on Learning Representations, 2022. URL https://openreview.net/forum?id=12RoR2o32T.
  • Quinonero-Candela et al. [2022] Quinonero-Candela, J., Sugiyama, M., Schwaighofer, A., and Lawrence, N. D. (eds.). Dataset shift in machine learning. Neural Information Processing series. MIT Press, London, England, 2022.
  • Rančić et al. [2021] Rančić, S., Radovanović, S., and Delibašić, B. Investigating oversampling techniques for fair machine learning models. In Decision Support Systems XI: Decision Support Systems, Analytics and Technologies in Response to Global Crisis Management, pp. 110–123. Springer International Publishing, 2021.
  • Roh et al. [2021] Roh, Y., Lee, K., Whang, S. E., and Suh, C. Fairbatch: Batch selection for model fairness. In International Conference on Learning Representations, 2021. URL https://openreview.net/forum?id=YNnpaAKeCfx.
  • Roh et al. [2023] Roh, Y., Lee, K., Whang, S. E., and Suh, C. Improving fair training under correlation shifts. In Krause, A., Brunskill, E., Cho, K., Engelhardt, B., Sabato, S., and Scarlett, J. (eds.), Proceedings of the 40th International Conference on Machine Learning, volume 202 of Proceedings of Machine Learning Research, pp.  29179–29209. PMLR, 2023. URL https://proceedings.mlr.press/v202/roh23a.html.
  • Rolf et al. [2021] Rolf, E., Worledge, T. T., Recht, B., and Jordan, M. Representation matters: Assessing the importance of subgroup allocations in training data. In Meila, M. and Zhang, T. (eds.), Proceedings of the 38th International Conference on Machine Learning, volume 139 of Proceedings of Machine Learning Research, pp.  9040–9051. PMLR, 2021. URL https://proceedings.mlr.press/v139/rolf21a.html.
  • Sagawa* et al. [2020] Sagawa*, S., Koh*, P. W., Hashimoto, T. B., and Liang, P. Distributionally robust neural networks. In International Conference on Learning Representations, 2020. URL https://openreview.net/forum?id=ryxGuJrFvS.
  • Sagawa et al. [2020] Sagawa, S., Raghunathan, A., Koh, P. W., and Liang, P. An investigation of why overparameterization exacerbates spurious correlations. In III, H. D. and Singh, A. (eds.), Proceedings of the 37th International Conference on Machine Learning, volume 119 of Proceedings of Machine Learning Research, pp.  8346–8356. PMLR, 2020. URL https://proceedings.mlr.press/v119/sagawa20a.html.
  • Schölkopf et al. [2012] Schölkopf, B., Janzing, D., Peters, J., Sgouritsa, E., Zhang, K., and Mooij, J. On causal and anticausal learning. In International Conference on Machine Learning, pp. 459–466, 2012.
  • Schrouff et al. [2022] Schrouff, J., Harris, N., Koyejo, S., Alabdulmohsin, I. M., Schnider, E., Opsahl-Ong, K., Brown, A., Roy, S., Mincu, D., Chen, C., Dieng, A., Liu, Y., Natarajan, V., Karthikesalingam, A., Heller, K. A., Chiappa, S., and D’Amour, A. Diagnosing failures of fairness transfer across distribution shift in real-world medical settings. In Koyejo, S., Mohamed, S., Agarwal, A., Belgrave, D., Cho, K., and Oh, A. (eds.), Advances in Neural Information Processing Systems, volume 35, pp.  19304–19318. Curran Associates, Inc., 2022.
  • Simonyan & Zisserman [2015] Simonyan, K. and Zisserman, A. Very deep convolutional networks for large-scale image recognition. In International Conference on Learning Representations, 2015.
  • Singh et al. [2021] Singh, H., Singh, R., Mhasawade, V., and Chunara, R. Fairness violations and mitigation under covariate shift. In Proceedings of the 2021 ACM Conference on Fairness, Accountability, and Transparency, pp.  3–13. Association for Computing Machinery, New York, NY, USA, 2021.
  • Sreekumar & Boddeti [2023] Sreekumar, G. and Boddeti, V. N. Spurious correlations and where to find them, 2023. Preprint 2308.11043. URL http://arxiv.org/abs/2308.11043.
  • Subbaswamy & Saria [2018] Subbaswamy, A. and Saria, S. Counterfactual normalization: Proactively addressing dataset shift using causal mechanisms. In 34th Conference on Uncertainty in Artificial Intelligence 2018, UAI 2018, pp.  947–957. Association For Uncertainty in Artificial Intelligence (AUAI), 2018.
  • Sun et al. [2023] Sun, Q., Murphy, K., Ebrahimi, S., and D’Amour, A. Beyond invariance: Test-time label-shift adaptation for distributions with "spurious" correlations, 2023. Preprint 2211.15646. URL http://arxiv.org/abs/2211.15646.
  • Touvron et al. [2021] Touvron, H., Cord, M., Douze, M., Massa, F., Sablayrolles, A., and Jegou, H. Training data-efficient image transformers & distillation through attention. In Meila, M. and Zhang, T. (eds.), Proceedings of the 38th International Conference on Machine Learning, volume 139 of Proceedings of Machine Learning Research, pp.  10347–10357. PMLR, 2021.
  • Veitch et al. [2021] Veitch, V., D’Amour, A., Yadlowsky, S., and Eisenstein, J. Counterfactual invariance to spurious correlations in text classification. In Beygelzimer, A., Dauphin, Y., Liang, P., and Vaughan, J. W. (eds.), Advances in Neural Information Processing Systems, 2021. URL https://openreview.net/forum?id=BdKxQp0iBi8.
  • Wang & Russakovsky [2023] Wang, A. and Russakovsky, O. Overcoming bias in pretrained models by manipulating the finetuning dataset, 2023. Preprint 2303.06167. URL http://arxiv.longhoe.net/abs/2303.06167.
  • Wang et al. [2019] Wang, T., Zhao, J., Yatskar, M., Chang, K., and Ordonez, V. Balanced datasets are not enough: Estimating and mitigating gender bias in deep image representations. In 2019 IEEE/CVF International Conference on Computer Vision (ICCV), pp.  5309–5318, Los Alamitos, CA, USA, 2019. IEEE Computer Society. doi: 10.1109/ICCV.2019.00541. URL https://doi.ieeecomputersociety.org/10.1109/ICCV.2019.00541.
  • Wu et al. [2023] Wu, S., Yuksekgonul, M., Zhang, L., and Zou, J. Discover and cure: concept-aware mitigation of spurious correlation. In Proceedings of the 40th International Conference on Machine Learning, ICML’23. JMLR.org, 2023.
  • Yan et al. [2020] Yan, S., Kao, H.-T., and Ferrara, E. Fair class balancing: Enhancing model fairness without observing sensitive attributes. In Proceedings of the 29th ACM International Conference on Information & Knowledge Management, CIKM ’20, pp.  1715–1724, New York, NY, USA, 2020. Association for Computing Machinery.
  • Yang et al. [2023a] Yang, Y., Nushi, B., Palangi, H., and Mirzasoleiman, B. Mitigating spurious correlations in multi-modal models during fine-tuning. In Krause, A., Brunskill, E., Cho, K., Engelhardt, B., Sabato, S., and Scarlett, J. (eds.), Proceedings of the 40th International Conference on Machine Learning, volume 202 of Proceedings of Machine Learning Research, pp.  39365–39379. PMLR, 2023a. URL https://proceedings.mlr.press/v202/yang23j.html.
  • Yang et al. [2023b] Yang, Y., Zhang, H., Gichoya, J. W., Katabi, D., and Ghassemi, M. The limits of fair medical imaging ai in the wild, 2023b. Preprint 2312.10083. URL http://arxiv.longhoe.net/abs/2312.10083.
  • Zemel et al. [2013] Zemel, R., Wu, Y., Swersky, K., Pitassi, T., and Dwork, C. Learning fair representations. In Dasgupta, S. and McAllester, D. (eds.), Proceedings of the 30th International Conference on Machine Learning, volume 28 of Proceedings of Machine Learning Research, pp.  325–333, Atlanta, Georgia, USA, 2013. PMLR. URL https://proceedings.mlr.press/v28/zemel13.html.

Appendix A Failure modes of data balancing

A.1 Failure mode: Balancing on one variable can increase bias

It is common to consider balancing on classes or groups as it requires fewer labels than joint balancing. However, without further intervention, class or group balancing on its own does not provide an invariant model when $Y$ and $Z$ are marginally dependent [e.g. 43]. In Figure 1(a), this means that $X^{\perp}_{Z} \centernot{\mathrel{\perp\mspace{-10.0mu}\perp}}_{Q} Z\,|\,Y$, invalidating Prop. 4.2. Below, we formalize the observation in Yan et al. [77] that balancing on one variable might affect the representation of the other, and provide bounds on the impact of this strategy.

Formalization and proof.

We formalize this issue in Proposition A.1 for the binary case with a binary attribute.

Proposition A.1.

Consider data balancing of $Y$; the marginal of $Z$ will be farther from uniform than the marginal of $Z$ before balancing if

$$\mathrm{sgn}\left(\frac{P(Z=1)-\frac{1}{2}}{P(Y=1)-\frac{1}{2}}\right)=\mathrm{sgn}\left(\mathbb{E}[Z\,|\,Y=0]-\mathbb{E}[Z\,|\,Y=1]\right).$$

Intuitively, if the biases of $Y$ and $Z$ are in the same (resp. opposite) direction, then this condition is satisfied if $Z$ has a negative (resp. positive) correlation with $Y$. For example, if we have $P(Y=1)=\frac{1}{4}$, $\mathbb{E}[Z\,|\,Y=1]=1$ and $\mathbb{E}[Z\,|\,Y=0]=\frac{1}{3}$, then $\mathbb{E}[Z]=\frac{1}{2}$ before balancing but $\mathbb{E}[Z]=\frac{1}{2}\cdot 1+\frac{1}{2}\cdot\frac{1}{3}=\frac{2}{3}$ after balancing.

Proof of Proposition A.1.

We assume that $Y$ and $Z$, representing the label and confounder, are both binary. We will data-balance on $Y$. Let $Z\,|\,S$ denote the distribution of $Z$ after data balancing. To characterize when the distribution of $Z\,|\,S$ is farther from uniform than the distribution of $Z$, we will first derive

$$\mathbb{E}[Z]-\frac{1}{2}=P(Y=1)\left(\mathbb{E}[Z\,|\,Y=1]-\frac{1}{2}\right)+P(Y=0)\left(\mathbb{E}[Z\,|\,Y=0]-\frac{1}{2}\right)$$

and

$$\mathbb{E}[Z\,|\,S]-\frac{1}{2}=\frac{1}{2}\left(\mathbb{E}[Z\,|\,Y=1]-\frac{1}{2}\right)+\frac{1}{2}\left(\mathbb{E}[Z\,|\,Y=0]-\frac{1}{2}\right).$$

Now, taking the difference, we have

$$\begin{aligned}
\mathbb{E}[Z]-\frac{1}{2} &= \mathbb{E}[Z\,|\,S]-\frac{1}{2}+\left(P(Y=1)-\frac{1}{2}\right)\left(\mathbb{E}[Z\,|\,Y=1]-\frac{1}{2}\right)+\left(P(Y=0)-\frac{1}{2}\right)\left(\mathbb{E}[Z\,|\,Y=0]-\frac{1}{2}\right)\\
&= \mathbb{E}[Z\,|\,S]-\frac{1}{2}+\left(P(Y=1)-\frac{1}{2}\right)\mathbb{E}[Z\,|\,Y=1]+\left(P(Y=0)-\frac{1}{2}\right)\mathbb{E}[Z\,|\,Y=0]\\
&= \mathbb{E}[Z\,|\,S]-\frac{1}{2}+\left(P(Y=1)-\frac{1}{2}\right)\left(\mathbb{E}[Z\,|\,Y=1]-\mathbb{E}[Z\,|\,Y=0]\right).
\end{aligned}$$

We can derive some sufficient conditions for bias increase, which occurs when $\left|\mathbb{E}[Z\,|\,S]-\frac{1}{2}\right|\geq\left|\mathbb{E}[Z]-\frac{1}{2}\right|$. We proceed by cases. If $\mathbb{E}[Z]-\frac{1}{2}>0$, then rearranging the last identity gives

$$\begin{aligned}
\mathbb{E}[Z\,|\,S]-\frac{1}{2} &= \mathbb{E}[Z]-\frac{1}{2}-\left(P(Y=1)-\frac{1}{2}\right)\left(\mathbb{E}[Z\,|\,Y=1]-\mathbb{E}[Z\,|\,Y=0]\right)\\
&= \left|\mathbb{E}[Z]-\frac{1}{2}\right|-\left(P(Y=1)-\frac{1}{2}\right)\left(\mathbb{E}[Z\,|\,Y=1]-\mathbb{E}[Z\,|\,Y=0]\right),
\end{aligned}$$

so, whenever the right-hand side is non-negative, $\left|\mathbb{E}[Z\,|\,S]-\frac{1}{2}\right|=\left|\mathbb{E}[Z]-\frac{1}{2}\right|-\left(P(Y=1)-\frac{1}{2}\right)\left(\mathbb{E}[Z\,|\,Y=1]-\mathbb{E}[Z\,|\,Y=0]\right)$. Thus, the bias gets worse if $\left(P(Y=1)-\frac{1}{2}\right)\left(\mathbb{E}[Z\,|\,Y=1]-\mathbb{E}[Z\,|\,Y=0]\right)<0$.

Similar reasoning shows that if $\mathbb{E}[Z]-\frac{1}{2}<0$, then, whenever the right-hand side is non-negative,

$$\left|\mathbb{E}[Z\,|\,S]-\frac{1}{2}\right|=\left|\mathbb{E}[Z]-\frac{1}{2}\right|+\left(P(Y=1)-\frac{1}{2}\right)\left(\mathbb{E}[Z\,|\,Y=1]-\mathbb{E}[Z\,|\,Y=0]\right),$$

and we can conclude that the bias is worsened if $\left(P(Y=1)-\frac{1}{2}\right)\left(\mathbb{E}[Z\,|\,Y=1]-\mathbb{E}[Z\,|\,Y=0]\right)>0$. Taking both statements together, we obtain the statement of the proposition. ∎

For example, if we have $P(Y=1)=\frac{1}{4}$, $\mathbb{E}[Z\,|\,Y=1]=1$ and $\mathbb{E}[Z\,|\,Y=0]=\frac{1}{3}$, then $\mathbb{E}[Z]=\frac{1}{2}$ but $\mathbb{E}[Z\,|\,S]-\frac{1}{2}=\frac{1}{6}$; despite $Z$ starting as unbiased, the data balancing induces a bias of $\frac{1}{6}$.
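The arithmetic of this example can be checked with exact fractions. The sketch below is illustrative only: the function names `marginals` and `shift` are hypothetical and not part of the paper, and balancing on $Y$ is modelled simply as giving both classes weight $\frac{1}{2}$.

```python
from fractions import Fraction as F

def marginals(p_y1, ez_y1, ez_y0):
    """Return (E[Z], E[Z|S]) for binary Y and Z when balancing on Y."""
    ez = p_y1 * ez_y1 + (1 - p_y1) * ez_y0   # marginal of Z before balancing
    ez_s = F(1, 2) * ez_y1 + F(1, 2) * ez_y0  # marginal of Z after balancing Y
    return ez, ez_s

def shift(p_y1, ez_y1, ez_y0):
    """(P(Y=1) - 1/2) * (E[Z|Y=1] - E[Z|Y=0]), which equals E[Z] - E[Z|S]."""
    return (p_y1 - F(1, 2)) * (ez_y1 - ez_y0)

# Values from the example: P(Y=1) = 1/4, E[Z|Y=1] = 1, E[Z|Y=0] = 1/3.
p_y1, ez_y1, ez_y0 = F(1, 4), F(1), F(1, 3)
ez, ez_s = marginals(p_y1, ez_y1, ez_y0)
print(ez, ez_s, ez_s - F(1, 2))  # 1/2 2/3 1/6
```

Here `shift` evaluates to $-\frac{1}{6}$, so $\mathbb{E}[Z\,|\,S]=\mathbb{E}[Z]+\frac{1}{6}$, matching the induced bias of $\frac{1}{6}$.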

There are a few implications of this derivation. First, we obtain an easy upper bound for the worsening of the bias of $Z$ caused by data balancing: taking absolute values of both sides and using the triangle inequality on the right yields

$$\left|\mathbb{E}[Z]-\frac{1}{2}\right|\leq\left|\mathbb{E}[Z\,|\,S]-\frac{1}{2}\right|+\left|P(Y=1)-\frac{1}{2}\right|\left|\mathbb{E}[Z\,|\,Y=1]-\mathbb{E}[Z\,|\,Y=0]\right|.$$

Bringing the second term over to the left-hand side and applying the same logic produces

$$\left|\mathbb{E}[Z\,|\,S]-\frac{1}{2}\right|\leq\left|\mathbb{E}[Z]-\frac{1}{2}\right|+\left|P(Y=1)-\frac{1}{2}\right|\left|\mathbb{E}[Z\,|\,Y=1]-\mathbb{E}[Z\,|\,Y=0]\right|,$$

and combining both inequalities shows that the difference in bias of $Z$ and $Z\,|\,S$ is bounded by

$$\left|\left|\mathbb{E}[Z]-\frac{1}{2}\right|-\left|\mathbb{E}[Z\,|\,S]-\frac{1}{2}\right|\right|\leq\left|P(Y=1)-\frac{1}{2}\right|\left|\mathbb{E}[Z\,|\,Y=1]-\mathbb{E}[Z\,|\,Y=0]\right|.$$
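As a sanity check, the final bound can be verified exhaustively on a coarse grid of parameter values. This is a minimal sketch, not part of the original analysis: the helper name `bias_gap_and_bound` and the grid resolution are arbitrary choices made for illustration.

```python
from fractions import Fraction as F
from itertools import product

def bias_gap_and_bound(p_y1, ez_y1, ez_y0):
    """Return (| |E[Z]-1/2| - |E[Z|S]-1/2| |, |P(Y=1)-1/2| * |E[Z|Y=1]-E[Z|Y=0]|)."""
    half = F(1, 2)
    ez = p_y1 * ez_y1 + (1 - p_y1) * ez_y0  # before balancing
    ez_s = half * (ez_y1 + ez_y0)           # after balancing on Y
    gap = abs(abs(ez - half) - abs(ez_s - half))
    bound = abs(p_y1 - half) * abs(ez_y1 - ez_y0)
    return gap, bound

# All combinations of P(Y=1), E[Z|Y=1], E[Z|Y=0] in {0, 0.1, ..., 1}.
grid = [F(k, 10) for k in range(11)]
violations = []
for p, a, b in product(grid, repeat=3):
    gap, bound = bias_gap_and_bound(p, a, b)
    if gap > bound:
        violations.append((p, a, b))
print(len(violations))  # 0
```

The bound is tight whenever $\mathbb{E}[Z]-\frac{1}{2}$ and $\mathbb{E}[Z\,|\,S]-\frac{1}{2}$ share the same sign, since the two biases then differ by exactly the product on the right-hand side.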
Simulation.

We present a simple simulation to illustrate our reasoning: $U\sim\mathcal{N}(0,0.1)$ is a common cause of $Z$ and $Y$. More specifically, the continuous distributions of $Y$ and $Z$ both have the form $U+\epsilon$, with $\epsilon\sim\mathcal{N}(0.05,0.02)$. We then binarize $Y$ by thresholding at 0. This creates an imbalance in the marginal of $Y$, such that a random sample of 5,000 examples has $\sim 68\%$ of positive labels. We then want to vary the marginal of $Z$, which also requires affecting their correlation. To this end, we vary the threshold for binarizing $Z$. This leads us to 2 main cases: for thresholds above 0 (i.e. $Y$'s threshold), the marginal of $Z$ is imbalanced in the same direction as that of $Y$. For thresholds smaller than 0, we obtain the opposite, i.e. if $Y=1$ is over-represented, $Z=1$ is under-represented.

We illustrate these 2 cases in Figure 6. We observe that when the marginals are similar, balancing $Y$ brings $Z$ closer to a uniform distribution (top row). However, the marginal distribution of $Z$ becomes more imbalanced after balancing on $Y$ if the two distributions are reversed (bottom row). When the correlation is small, there is little change in the marginal of $Z$ when balancing on $Y$, which is expected.
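A simulation of this kind can be sketched in a few lines of standard-library Python. Several details are assumptions made here rather than taken from the paper: the second parameter of $\mathcal{N}(\cdot,\cdot)$ is read as a standard deviation, $Y$ and $Z$ receive independent noise draws, a variable is set to 1 when its continuous value exceeds the threshold, and balancing is implemented by weighting both classes of $Y$ equally. The function name `simulate` and the two thresholds used below are illustrative; which thresholds produce which case depends on the binarization convention.

```python
import random

def simulate(z_threshold, n=5000, seed=0):
    """Return (P(Y=1), P(Z=1), P(Z=1) after balancing on Y) for one sample."""
    rng = random.Random(seed)
    ys, zs = [], []
    for _ in range(n):
        u = rng.gauss(0.0, 0.1)                            # common cause U
        ys.append(u + rng.gauss(0.05, 0.02) > 0.0)         # binarized Y
        zs.append(u + rng.gauss(0.05, 0.02) > z_threshold)  # binarized Z
    n1 = sum(ys)
    n0 = n - n1
    p_z1_y1 = sum(z for y, z in zip(ys, zs) if y) / n1
    p_z1_y0 = sum(z for y, z in zip(ys, zs) if not y) / n0
    # Balancing on Y gives each class of Y total weight 1/2.
    p_z1_balanced = 0.5 * p_z1_y1 + 0.5 * p_z1_y0
    return n1 / n, sum(zs) / n, p_z1_balanced

# Z imbalanced in the same direction as Y (Z=1 over-represented).
same = simulate(z_threshold=0.0)
# Z imbalanced in the opposite direction (Z=1 under-represented).
reverse = simulate(z_threshold=0.15)
for name, (p_y1, p_z1, p_z1_bal) in (("same", same), ("reverse", reverse)):
    print(f"{name}: P(Y=1)={p_y1:.3f} P(Z=1)={p_z1:.3f} balanced={p_z1_bal:.3f}")
```

Under these assumptions, the first setting moves $P(Z=1)$ toward $\frac{1}{2}$ after balancing, while the second moves it away, in line with the two rows of Figure 6.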

Figure 6: Proportions of $Y\in\{0,1\}$ (grey bars) and $Z\in\{0,1\}$ (purple bars) before (left) and after (right) balancing the data on $Y$. Top row: same direction. Bottom row: reverse direction.

For completeness, we perform 200 simulations with different thresholdings for $Z$ and present the results in Figure 7.

Figure 7: Distribution $P(Z=1)$ before (blue) and after (orange) balancing the data according to $Y$, for different values of the binarization threshold of $Z$, which translates into different correlation coefficients between $Y$ and $Z$. Left: similar direction of under-representation. Right: opposite direction.

A.2 Failure mode: entangled signals

In the case where $X$ includes non-trivial intersection information $X_{Y\wedge Z}$, data balancing will in general be insufficient to ensure that there is no association bias. This is because a risk-minimizing predictor $f(X)$ will condition on $X_{Y\wedge Z}$, and the distribution of these intersection features is influenced by $Z$.

Specifically, we will give a case where $Y$ is marginally independent of $Z$ and there is no uncontrolled confounding, but $E[f(X)\mid Z=0]\neq E[f(X)\mid Z=1]$.

Suppose we have the following data generating process (DGP):

$$\begin{aligned}
P(Y=1) &= 0.5\\
P(Z=1) &= 0.5\\
P(Y=1,Z=1) &= P(Y=1)P(Z=1)\textrm{, i.e., } Y \mathrel{\perp\mspace{-10.0mu}\perp} Z\\
P(X=1\mid Y,Z) &= \left\{\begin{array}{rl}p&\textrm{if $Y$ OR $Z$}\\ q&\textrm{o.w.}\end{array}\right.
\end{aligned}$$

Note that in this case the entirety of $X$ would be classified as intersection information $X_{Y\wedge Z}$.

In this setup, the Bayes-optimal probabilities for classification, $f(X)$, are given by:

$$f(1):=P(Y=1\mid X=1)=\frac{P(X=1\mid Y=1)P(Y=1)}{P(X=1)}=\frac{p\cdot 0.5}{0.75p+0.25q}$$

and

$$f(0):=P(Y=1\mid X=0)=\frac{(1-P(X=1\mid Y=1))P(Y=1)}{P(X=0)}=\frac{(1-p)\cdot 0.5}{1-(0.75p+0.25q)}.$$

Note that when we condition on $Z\in\{0,1\}$, the expectation of $f(X)$ is different whenever (1) $p\neq q$, i.e., whenever the distribution of $X$ actually depends on the function of $Y$ and $Z$, and (2) $f(1)\neq f(0)$, i.e., there is some information in $X$ to predict $Y$:

$$\begin{aligned}
E[f(X)\mid Z=1] &= E[E[f(X)\mid X,Z=1]]=pf(1)+(1-p)f(0)\\
E[f(X)\mid Z=0] &= E[E[f(X)\mid X,Z=0]]\\
&= (0.5p+0.5q)f(1)+(0.5(1-p)+0.5(1-q))f(0)
\end{aligned}$$

In the simple case where $p=1$ and $q=0$ (i.e., $X=Y\textrm{ OR }Z$ deterministically), we get

$$f(X):=P(Y=1\mid X)=\left\{\begin{array}{rl}2/3&\textrm{if }X=1\\ 0&\textrm{if }X=0\end{array}\right.$$

$$E[f(X)\mid Z]=\left\{\begin{array}{rl}2/3&\textrm{if }Z=1\\ 1/3&\textrm{if }Z=0.\end{array}\right.$$
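The quantities in this example can be reproduced by enumerating the four equally likely $(Y,Z)$ cells. The following is a small sketch under the DGP stated above; the function name `entangled` is hypothetical, and the code assumes $0<P(X=1)<1$ so that both conditional probabilities are defined.

```python
from fractions import Fraction as F

def entangled(p, q):
    """Return (f(1), f(0), E[f(X)|Z=1], E[f(X)|Z=0]) for the OR-type DGP."""
    half, quarter = F(1, 2), F(1, 4)

    def px1(y, z):
        # P(X=1 | Y=y, Z=z): p if Y OR Z holds, q otherwise.
        return p if (y or z) else q

    # Y and Z are independent fair coins, so each (y, z) cell has mass 1/4.
    p_x1 = sum(quarter * px1(y, z) for y in (0, 1) for z in (0, 1))
    p_x1_and_y1 = sum(quarter * px1(1, z) for z in (0, 1))
    f1 = p_x1_and_y1 / p_x1               # P(Y=1 | X=1)
    f0 = (half - p_x1_and_y1) / (1 - p_x1)  # P(Y=1 | X=0)

    def expected_f_given_z(z):
        p_x1_given_z = sum(half * px1(y, z) for y in (0, 1))
        return p_x1_given_z * f1 + (1 - p_x1_given_z) * f0

    return f1, f0, expected_f_given_z(1), expected_f_given_z(0)

print(*entangled(F(1), F(0)))  # 2/3 0 2/3 1/3
```

With $p=q$ the two conditional expectations coincide, which matches condition (1) above: the gap only appears when the distribution of $X$ depends on $Y$ OR $Z$.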

Appendix B Conditions for data balancing to lead to an invariant and optimal model

We first investigate the case of a risk-invariant model w.r.t. $\mathcal{P}$, and then discuss fairness criteria.

B.1 Risk-invariant, optimal model

In this section we provide proofs for Section 4.

Recall that $\mathcal{P}=\{P'(X,Y,Z)=P(X^{\perp}_{Z}\,|\,Y,Z)P(X^{\perp}_{Y}\,|\,Y,Z)P(X_{Z\wedge Y}\,|\,Y,Z)P'(Z\,|\,Y)P(Y)\}$ and that we assume a data balancing distribution $Q(X,Y,Z)\in\mathcal{P}$ of the form $Q(X,Y,Z)=P(X\,|\,Y,Z)P(Z)P(Y)$. Also recall that we define $X^{\perp}_{Z}$ to be a sufficient statistic for $Y$ in $Q$ if $\mathbb{E}_{Q}[Y\,|\,X]=\mathbb{E}_{Q}[Y\,|\,X^{\perp}_{Z}]$.

Proposition 4.2.

If $X^{\perp}_{Z}\mathrel{\perp\mspace{-10.0mu}\perp}_{Q}Z\,|\,Y$ and $X^{\perp}_{Z}$ is a sufficient statistic for $Y$ in $Q$, then the risk-minimizer $f(X):=\mathbb{E}_{Q}[Y\,|\,X]$ is risk-invariant and optimal w.r.t. $\mathcal{P}$.

Proof.

Let Psuperscript𝑃P^{\prime}italic_P start_POSTSUPERSCRIPT ′ end_POSTSUPERSCRIPT be an arbitrary distribution in 𝒫𝒫\mathcal{P}caligraphic_P. We have

P(XZ|Y)superscript𝑃conditionalsubscriptsuperscript𝑋perpendicular-to𝑍𝑌\displaystyle P^{\prime}(X^{\perp}_{Z}\,|\,Y)italic_P start_POSTSUPERSCRIPT ′ end_POSTSUPERSCRIPT ( italic_X start_POSTSUPERSCRIPT ⟂ end_POSTSUPERSCRIPT start_POSTSUBSCRIPT italic_Z end_POSTSUBSCRIPT | italic_Y ) =ZP(XZ|Y,Z)P(Z|Y)absentsubscript𝑍superscript𝑃conditionalsubscriptsuperscript𝑋perpendicular-to𝑍𝑌𝑍superscript𝑃conditional𝑍𝑌\displaystyle=\sum_{Z}P^{\prime}(X^{\perp}_{Z}\,|\,Y,Z)P^{\prime}(Z\,|\,Y)= ∑ start_POSTSUBSCRIPT italic_Z end_POSTSUBSCRIPT italic_P start_POSTSUPERSCRIPT ′ end_POSTSUPERSCRIPT ( italic_X start_POSTSUPERSCRIPT ⟂ end_POSTSUPERSCRIPT start_POSTSUBSCRIPT italic_Z end_POSTSUBSCRIPT | italic_Y , italic_Z ) italic_P start_POSTSUPERSCRIPT ′ end_POSTSUPERSCRIPT ( italic_Z | italic_Y )
=(1)ZQ(XZ|Y,Z)P(Z|Y)superscript1absentsubscript𝑍𝑄conditionalsubscriptsuperscript𝑋perpendicular-to𝑍𝑌cancel𝑍superscript𝑃conditional𝑍𝑌\displaystyle\stackrel{{\scriptstyle(1)}}{{=}}\sum_{Z}Q(X^{\perp}_{Z}\,|\,Y,% \cancel{Z})P^{\prime}(Z\,|\,Y)start_RELOP SUPERSCRIPTOP start_ARG = end_ARG start_ARG ( 1 ) end_ARG end_RELOP ∑ start_POSTSUBSCRIPT italic_Z end_POSTSUBSCRIPT italic_Q ( italic_X start_POSTSUPERSCRIPT ⟂ end_POSTSUPERSCRIPT start_POSTSUBSCRIPT italic_Z end_POSTSUBSCRIPT | italic_Y , cancel italic_Z ) italic_P start_POSTSUPERSCRIPT ′ end_POSTSUPERSCRIPT ( italic_Z | italic_Y )
=Q(XZ|Y).absent𝑄conditionalsubscriptsuperscript𝑋perpendicular-to𝑍𝑌\displaystyle=Q(X^{\perp}_{Z}\,|\,Y).= italic_Q ( italic_X start_POSTSUPERSCRIPT ⟂ end_POSTSUPERSCRIPT start_POSTSUBSCRIPT italic_Z end_POSTSUBSCRIPT | italic_Y ) .

where (1) holds as P,Q𝒫superscript𝑃𝑄𝒫P^{\prime},Q\in{\mathcal{P}}italic_P start_POSTSUPERSCRIPT ′ end_POSTSUPERSCRIPT , italic_Q ∈ caligraphic_P and by the independence assumption. As P(Y)=Q(Y)superscript𝑃𝑌𝑄𝑌P^{\prime}(Y)=Q(Y)italic_P start_POSTSUPERSCRIPT ′ end_POSTSUPERSCRIPT ( italic_Y ) = italic_Q ( italic_Y ) we obtain P(XZ,Y)=Q(XZ,Y)superscript𝑃subscriptsuperscript𝑋perpendicular-to𝑍𝑌𝑄subscriptsuperscript𝑋perpendicular-to𝑍𝑌P^{\prime}(X^{\perp}_{Z},Y)=Q(X^{\perp}_{Z},Y)italic_P start_POSTSUPERSCRIPT ′ end_POSTSUPERSCRIPT ( italic_X start_POSTSUPERSCRIPT ⟂ end_POSTSUPERSCRIPT start_POSTSUBSCRIPT italic_Z end_POSTSUBSCRIPT , italic_Y ) = italic_Q ( italic_X start_POSTSUPERSCRIPT ⟂ end_POSTSUPERSCRIPT start_POSTSUBSCRIPT italic_Z end_POSTSUBSCRIPT , italic_Y ). As XZsubscriptsuperscript𝑋perpendicular-to𝑍X^{\perp}_{Z}italic_X start_POSTSUPERSCRIPT ⟂ end_POSTSUPERSCRIPT start_POSTSUBSCRIPT italic_Z end_POSTSUBSCRIPT is a sufficient statistic for Y𝑌Yitalic_Y in Q𝑄Qitalic_Q, f(X):=𝔼Q[Y|X]=𝔼Q[Y|XZ]assign𝑓𝑋subscript𝔼𝑄conditional𝑌𝑋subscript𝔼𝑄conditional𝑌subscriptsuperscript𝑋perpendicular-to𝑍f(X):=\operatorname{\mathbb{E}}_{Q}[Y\,|\,X]=\operatorname{\mathbb{E}}_{Q}[Y\,% |\,X^{\perp}_{Z}]italic_f ( italic_X ) := blackboard_E start_POSTSUBSCRIPT italic_Q end_POSTSUBSCRIPT [ italic_Y | italic_X ] = blackboard_E start_POSTSUBSCRIPT italic_Q end_POSTSUBSCRIPT [ italic_Y | italic_X start_POSTSUPERSCRIPT ⟂ end_POSTSUPERSCRIPT start_POSTSUBSCRIPT italic_Z end_POSTSUBSCRIPT ], that is f(X)𝑓𝑋f(X)italic_f ( italic_X ) (and therefore the loss (f;X,Y)𝑓𝑋𝑌\ell(f;X,Y)roman_ℓ ( italic_f ; italic_X , italic_Y )) remains constant for different values of XY,XYZsubscriptsuperscript𝑋perpendicular-to𝑌subscript𝑋𝑌𝑍X^{\perp}_{Y},X_{Y\wedge Z}italic_X start_POSTSUPERSCRIPT ⟂ end_POSTSUPERSCRIPT start_POSTSUBSCRIPT italic_Y end_POSTSUBSCRIPT , italic_X start_POSTSUBSCRIPT italic_Y ∧ italic_Z end_POSTSUBSCRIPT, giving

$$\mathbb{E}_{X,Y \sim P'}[\ell(f; X, Y)] = \mathbb{E}_{X^{\perp}_{Z},Y \sim P'}[\ell(f; X, Y)] = \mathbb{E}_{X^{\perp}_{Z},Y \sim Q}[\ell(f; X, Y)].$$

The same reasoning can be repeated for $P'' \in \mathcal{P}$, obtaining $\mathbb{E}_{X,Y \sim P'}[\ell(f; X, Y)] = \mathbb{E}_{X,Y \sim P''}[\ell(f; X, Y)]$, which proves that $f$ is risk-invariant w.r.t. $\mathcal{P}$.
As $f = \operatorname*{arg\,min}_{f'} \mathbb{E}_{X,Y \sim Q}[\ell(f'; X, Y)]$ and $\mathbb{E}_{X,Y \sim P'}[\ell(f; X, Y)] = \mathbb{E}_{X,Y \sim Q}[\ell(f; X, Y)]$ $\forall P' \in \mathcal{P}$, we obtain $f = \operatorname*{arg\,min}_{f'} \mathbb{E}_{X,Y \sim P'}[\ell(f'; X, Y)]$, $\forall P' \in \mathcal{P}$, which implies that $f$ is optimal w.r.t. $\mathcal{P}$. ∎

Corollary 4.3.

Let $R = \{X^{\perp}_{Y}, X_{Y \wedge Z}\}$. If $R \mathrel{\perp\mspace{-10.0mu}\perp}_{P} \{X^{\perp}_{Z}, Y\}\,|\,Z$ and $X^{\perp}_{Z} \mathrel{\perp\mspace{-10.0mu}\perp}_{P} Z\,|\,Y$, then $f(X^{\perp}_{Z}) = \mathbb{E}_{Q}[Y\,|\,X^{\perp}_{Z}]$ is risk-invariant and optimal w.r.t. $\mathcal{P}$.

Proof.

We have

\begin{align*}
Q(Y\,|\,R, X^{\perp}_{Z}) &= \frac{\sum_{Z} Q(R, X^{\perp}_{Z}, Y, Z)}{\sum_{Z,Y} Q(R, X^{\perp}_{Z}, Y, Z)} \\
&\stackrel{(1)}{=} \frac{\sum_{Z} P(R, X^{\perp}_{Z}\,|\,Y, Z) P(Z) P(Y)}{\sum_{Z,Y} P(R, X^{\perp}_{Z}\,|\,Y, Z) P(Z) P(Y)} \\
&\stackrel{(2)}{=} \frac{\sum_{Z} P(R\,|\,\cancel{X^{\perp}_{Z}, Y}, Z) P(X^{\perp}_{Z}\,|\,Y, \cancel{Z}) P(Z) P(Y)}{\sum_{Z,Y} P(R\,|\,\cancel{X^{\perp}_{Z}, Y}, Z) P(X^{\perp}_{Z}\,|\,Y, \cancel{Z}) P(Z) P(Y)} \\
&= \frac{P(R) P(X^{\perp}_{Z}\,|\,Y) P(Y)}{P(R) \sum_{Y} P(X^{\perp}_{Z}\,|\,Y) P(Y)} \\
&= P(Y\,|\,X^{\perp}_{Z}),
\end{align*}

where (1) holds by the definition of the balanced distribution $Q$ and (2) holds by the independence assumptions. This derivation shows that $Y \mathrel{\perp\mspace{-10.0mu}\perp}_{Q} R\,|\,X^{\perp}_{Z}$ and therefore that $X^{\perp}_{Z}$ is a sufficient statistic for $Y$ in $Q$. We are in the same conditions as in Proposition 4.2, which implies that $f$ is risk-invariant and optimal w.r.t. $\mathcal{P}$. ∎
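The derivation above can be checked numerically on a small discrete example. The sketch below is only an illustration under assumed values: the probability tables, the binary encoding of $R$, $X^{\perp}_{Z}$, $Y$, $Z$, and all variable names are hypothetical and not taken from the paper. It builds a joint $P$ that satisfies the two independence assumptions of Corollary 4.3, forms $Q = P(R, X^{\perp}_{Z}\,|\,Y, Z) P(Y) P(Z)$, and compares $Q(Y\,|\,R, X^{\perp}_{Z})$ with $P(Y\,|\,X^{\perp}_{Z})$.

```python
import numpy as np

# Hypothetical toy tables; every number here is an illustrative assumption.
p_yz = np.array([[0.4, 0.1],
                 [0.1, 0.4]])          # P(Y, Z), indexed [y, z]
p_r_given_z = np.array([[0.8, 0.2],
                        [0.3, 0.7]])   # P(R | Z), indexed [z, r]
p_x_given_y = np.array([[0.7, 0.3],
                        [0.2, 0.8]])   # P(X_perp | Y), indexed [y, x]

# Joint P(R, X_perp, Y, Z) = P(R | Z) P(X_perp | Y) P(Y, Z), indexed [r, x, y, z].
# By construction R is independent of {X_perp, Y} given Z, and X_perp of Z given Y.
p = np.einsum("zr,yx,yz->rxyz", p_r_given_z, p_x_given_y, p_yz)

# Balanced distribution Q = P(R, X_perp | Y, Z) P(Y) P(Z).
p_y, p_z = p_yz.sum(axis=1), p_yz.sum(axis=0)
q = p / p_yz * np.outer(p_y, p_z)

# Q(Y | R, X_perp): marginalise Z, then normalise over Y.
q_rxy = q.sum(axis=3)
q_y_given_rx = q_rxy / q_rxy.sum(axis=2, keepdims=True)   # indexed [r, x, y]

# P(Y | X_perp) in the original distribution.
p_xy = p.sum(axis=(0, 3))
p_y_given_x = p_xy / p_xy.sum(axis=1, keepdims=True)      # indexed [x, y]

# The same table is recovered for every value of R.
print(np.allclose(q_y_given_rx, p_y_given_x[None, :, :]))
```

In this toy setting the conditional of $Y$ in $Q$ does not change with $R$, which is the sufficiency property used to invoke Proposition 4.2.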

Proposition 4.4.

Let $\phi(\cdot)$ be disentangled with $\mathbb{E}_{P'}[Y\,|\,\phi^{\perp}_{Z}(X)] = \mathbb{E}_{P'}[Y\,|\,X^{\perp}_{Z}]$ $\forall P' \in \mathcal{P}$ and $h$ be a linear function. The risk-minimizer $f(X) := \mathbb{E}_{Q}[Y\,|\,X]$ is optimal and risk-invariant across $\mathcal{P}$ if $\mathbb{E}_{P'}[Y\,|\,f(X)] = \mathbb{E}_{P'}[Y\,|\,h(\phi^{\perp}_{Z}(X))]$ $\forall P' \in \mathcal{P}$, $X^{\perp}_{Z}$ is a sufficient statistic for $Y$ in $Q$ and $X^{\perp}_{Z} \mathrel{\perp\mspace{-10.0mu}\perp}_{Q} Z\,|\,Y$.

Proof.

The proof is straightforward as it directly depends on the definition of a disentangled representation and the previous statements:

\begin{align*}
\mathbb{E}_{P'}[Y\,|\,f(X)] &= \mathbb{E}_{P'}[Y\,|\,h(\phi^{\perp}_{Z}(X))] \\
&\stackrel{(1)}{=} \mathbb{E}_{P'}[Y\,|\,h(X^{\perp}_{Z})] \\
&\stackrel{(2)}{=} \mathbb{E}_{P}[Y\,|\,h(X^{\perp}_{Z})]
\end{align*}

where (1) reflects the assumption of a disentangled representation, and (2) uses the proof of Proposition 4.2. ∎

B.2 Conditions for data balancing to lead to a fair model

This section gives several results to illustrate the fact that data balancing implemented to generate independence between outcomes $Y$ and sensitive attributes $Z$ does not necessarily imply that a function of some covariates $X$ to predict $Y$ will be independent of (or not encode information on) $Z$. The results we describe do not address the case where $X^{\perp}_{Z}$ is not accessible directly.

Proposition B.1 (Demographic parity).

$X^{\perp}_{Z} \mathrel{\perp\mspace{-10.0mu}\perp}_{Q} Z$ if $X^{\perp}_{Z} \mathrel{\perp\mspace{-10.0mu}\perp}_{P} Z\,|\,Y$; that is, balancing successfully induces independence between $X^{\perp}_{Z}$ and $Z$ if $X^{\perp}_{Z}$ and $Z$ are independent given $Y$ in the original data distribution.

Proof.

Let $X^{\perp}_{Z} \mathrel{\perp\mspace{-10.0mu}\perp}_{P} Z\,|\,Y$. The following derivation demonstrates the claim,

\begin{align*}
Q(X^{\perp}_{Z}\,|\,Z) &= \frac{\sum_{Y} Q(X^{\perp}_{Z}, Y, Z)}{\sum_{Y, X^{\perp}_{Z}} Q(X^{\perp}_{Z}, Y, Z)} \\
&\stackrel{(1)}{=} \frac{\sum_{Y} P(X^{\perp}_{Z}\,|\,Y, Z) P(Y) P(Z)}{\sum_{Y, X^{\perp}_{Z}} P(X^{\perp}_{Z}\,|\,Y, Z) P(Y) P(Z)} \\
&\stackrel{(2)}{=} \frac{\sum_{Y} P(X^{\perp}_{Z}\,|\,Y) P(Y) P(Z)}{\sum_{Y, X^{\perp}_{Z}} P(X^{\perp}_{Z}\,|\,Y) P(Y) P(Z)} \\
&= P(X^{\perp}_{Z}),
\end{align*}

where (1) holds by the definition of data balancing on the joint, (2) holds by the assumption of conditional independence. Therefore, the l.h.s is not a function of $Z$ which establishes marginal independence. ∎
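As a sanity check of Proposition B.1, the following minimal sketch balances a small discrete joint and compares $Q(X^{\perp}_{Z}\,|\,Z)$ with $P(X^{\perp}_{Z})$. The tables and the helper names `balance` and `cond_x_given_z` are assumptions introduced for illustration; they are not part of the paper.

```python
import numpy as np

def balance(p_xyz):
    """Return Q(X, Y, Z) = P(X | Y, Z) P(Y) P(Z) for a joint indexed [x, y, z]."""
    p_yz = p_xyz.sum(axis=0)
    p_y, p_z = p_yz.sum(axis=1), p_yz.sum(axis=0)
    return p_xyz / p_yz * np.outer(p_y, p_z)

def cond_x_given_z(joint):
    """Return the conditional of X given Z, indexed [x, z]."""
    xz = joint.sum(axis=1)
    return xz / xz.sum(axis=0, keepdims=True)

# Illustrative numbers (assumptions): Y and Z are dependent in P.
p_yz = np.array([[0.4, 0.1],
                 [0.1, 0.4]])               # P(Y, Z), indexed [y, z]
p_x_given_y = np.array([[0.7, 0.2, 0.1],
                        [0.1, 0.3, 0.6]])   # P(X_perp | Y), indexed [y, x]

# P(X, Y, Z) = P(X | Y) P(Y, Z), so X_perp is independent of Z given Y in P.
p_xyz = np.einsum("yx,yz->xyz", p_x_given_y, p_yz)
q_xyz = balance(p_xyz)
p_x = p_xyz.sum(axis=(1, 2))

print("P(X | Z):\n", cond_x_given_z(p_xyz))   # columns differ across z
print("Q(X | Z):\n", cond_x_given_z(q_xyz))   # columns coincide
print("P(X):", p_x)
```

In this example the columns of $Q(X^{\perp}_{Z}\,|\,Z)$ are identical and equal to $P(X^{\perp}_{Z})$, whereas the columns of $P(X^{\perp}_{Z}\,|\,Z)$ differ because $Y$ and $Z$ are dependent in $P$.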

Proposition B.2.

In general, $X^{\perp}_{Z}$ and $Z$ are not independent in $Q$ if $X^{\perp}_{Z}$ and $Z$ are not independent given $Y$ in $P$; that is, data balancing does not induce independence between $X^{\perp}_{Z}$ and $Z$ if $X^{\perp}_{Z}$ and $Z$ are not independent given $Y$ in the original data distribution.

Proof.

Note first that the reduction in step (2) of the proof of Proposition B.1 does not hold in general without conditional independence. Further, note that,

$$Q(X^{\perp}_{Z}\,|\,Z) = \sum_{Y} Q(X^{\perp}_{Z}\,|\,Z, Y) Q(Y\,|\,Z) = \sum_{Y} Q(X^{\perp}_{Z}\,|\,Z, Y) Q(Y).$$

If $X^{\perp}_{Z}$ and $Z$ are dependent given $Y$ in $P$ then $X^{\perp}_{Z}$ and $Z$ are dependent given $Y$ in $Q$ so that $Q(X^{\perp}_{Z}\,|\,Z, Y)$ varies with $Z$, making the l.h.s a function of $Z$ in general. Therefore, in general, data balancing will not be successful without conditional independence. ∎
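A concrete counterexample makes the failure mode of Proposition B.2 tangible. The sketch below uses an assumed table for $P(X^{\perp}_{Z} = 1\,|\,Y, Z)$ that varies with $Z$ for fixed $Y$; the numbers and variable names are hypothetical. It evaluates $Q(X^{\perp}_{Z}\,|\,Z)$ both from the balanced joint and from the identity $\sum_{Y} Q(X^{\perp}_{Z}\,|\,Z, Y) Q(Y)$, using $Q(X^{\perp}_{Z}\,|\,Z, Y) = P(X^{\perp}_{Z}\,|\,Z, Y)$ and $Q(Y) = P(Y)$.

```python
import numpy as np

# Illustrative numbers (assumptions), with binary X_perp, Y and Z.
p_yz = np.array([[0.4, 0.1],
                 [0.1, 0.4]])                 # P(Y, Z), indexed [y, z]
p_x1_given_yz = np.array([[0.1, 0.5],
                          [0.6, 0.9]])        # P(X_perp = 1 | Y, Z), indexed [y, z]

# P(X_perp | Y, Z) indexed [x, y, z]; it changes with z, so X_perp depends on Z given Y.
p_x_given_yz = np.stack([1.0 - p_x1_given_yz, p_x1_given_yz])

# Balanced joint Q(X, Y, Z) = P(X | Y, Z) P(Y) P(Z).
p_y, p_z = p_yz.sum(axis=1), p_yz.sum(axis=0)
q_xyz = p_x_given_yz * np.outer(p_y, p_z)

# Q(X | Z) from the balanced joint.
q_xz = q_xyz.sum(axis=1)
q_x_given_z = q_xz / q_xz.sum(axis=0, keepdims=True)

# Q(X | Z) from the identity sum_Y Q(X | Z, Y) Q(Y).
q_x_given_z_direct = np.einsum("xyz,y->xz", p_x_given_yz, p_y)

print(q_x_given_z[1])   # Q(X_perp = 1 | Z = z) for z = 0, 1: [0.35 0.7]
```

Although $Y$ and $Z$ are independent in $Q$, the value of $Q(X^{\perp}_{Z} = 1\,|\,Z)$ still changes with $Z$ in this example, so balancing alone does not remove the dependence.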

Proposition B.3 (Predictive parity).

$Y \mathrel{\perp\mspace{-10.0mu}\perp}_{Q} Z\,|\,X^{\perp}_{Z}$ if $X^{\perp}_{Z} \mathrel{\perp\mspace{-10.0mu}\perp}_{P} Z\,|\,Y$; that is, data balancing successfully induces independence between $Y$ and $Z$ given $X^{\perp}_{Z}$ if $X^{\perp}_{Z}$ and $Z$ are independent given $Y$ in the original data distribution.

Proof.

Let $X^{\perp}_{Z} \mathrel{\perp\mspace{-10.0mu}\perp}_{P} Z\,|\,Y$. The following derivation demonstrates the claim,

\begin{align*}
Q(Y\,|\,X^{\perp}_{Z}, Z) &= \frac{Q(X^{\perp}_{Z}, Y, Z)}{\sum_{Y} Q(X^{\perp}_{Z}, Y, Z)} \\
&\stackrel{(1)}{=} \frac{P(X^{\perp}_{Z}\,|\,Y, Z) P(Y) P(Z)}{\sum_{Y} P(X^{\perp}_{Z}\,|\,Y, Z) P(Y) P(Z)} \\
&\stackrel{(2)}{=} \frac{P(X^{\perp}_{Z}\,|\,Y) P(Y) P(Z)}{\sum_{Y} P(X^{\perp}_{Z}\,|\,Y) P(Y) P(Z)} \\
&= P(Y\,|\,X^{\perp}_{Z}),
\end{align*}

where (1) holds by the definition of data balancing on the joint, (2) holds by the assumption of conditional independence. Therefore, the l.h.s is not a function of $Z$ which establishes conditional independence. ∎
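The predictive-parity statement of Proposition B.3 can be illustrated in the same way. The following sketch is a hedged example with assumed probability tables and hypothetical variable names: it constructs $P$ with $X^{\perp}_{Z}$ independent of $Z$ given $Y$, balances it, and checks that $Q(Y\,|\,X^{\perp}_{Z}, Z)$ equals $P(Y\,|\,X^{\perp}_{Z})$ for every value of $Z$.

```python
import numpy as np

# Illustrative numbers (assumptions), with binary X_perp, Y and Z.
p_yz = np.array([[0.4, 0.1],
                 [0.1, 0.4]])              # P(Y, Z), indexed [y, z]
p_x_given_y = np.array([[0.7, 0.3],
                        [0.2, 0.8]])       # P(X_perp | Y), indexed [y, x]

# P(X, Y, Z) = P(X | Y) P(Y, Z), indexed [x, y, z].
p_xyz = np.einsum("yx,yz->xyz", p_x_given_y, p_yz)

# Balanced joint Q(X, Y, Z) = P(X | Y, Z) P(Y) P(Z).
p_y, p_z = p_yz.sum(axis=1), p_yz.sum(axis=0)
q_xyz = p_xyz / p_yz * np.outer(p_y, p_z)

# Conditionals of Y given (X_perp, Z): normalise over the y axis.
q_y_given_xz = q_xyz / q_xyz.sum(axis=1, keepdims=True)
p_y_given_xz = p_xyz / p_xyz.sum(axis=1, keepdims=True)

# P(Y | X_perp) in the original distribution, indexed [x, y].
p_xy = p_xyz.sum(axis=2)
p_y_given_x = p_xy / p_xy.sum(axis=1, keepdims=True)

for z in range(2):
    print(z, np.allclose(q_y_given_xz[:, :, z], p_y_given_x))
```

In the original distribution of this example, `p_y_given_xz` still changes with $Z$, so the independence of $Y$ and $Z$ given $X^{\perp}_{Z}$ is a property of the balanced distribution and not of $P$.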

Proposition B.4.

In general, $Y$ and $Z$ are not independent given $X^{\perp}_{Z}$ in $Q$ if $X^{\perp}_{Z}$ and $Z$ are not independent given $Y$ in $P$; that is, data balancing does not induce independence between $Y$ and $Z$ given $X^{\perp}_{Z}$ if $X^{\perp}_{Z}$ and $Z$ are not independent given $Y$ in the original data distribution.

Proof.

Similarly to the arguments above, the reduction in step (2) of the proof of Proposition B.3 does not hold in general without conditional independence. Therefore, in general, data balancing will not be successful without conditional independence. ∎

Proposition B.5 (Equalized odds).

$(X^{\perp}_{Z} \mathrel{\perp\mspace{-10.0mu}\perp} Z\,|\,Y)_{Q}$ if $(X^{\perp}_{Z} \mathrel{\perp\mspace{-10.0mu}\perp} Z\,|\,Y)_{P}$; that is, data balancing does not disturb independence between $X^{\perp}_{Z}$ and $Z$ given $Y$ if $X^{\perp}_{Z}$ and $Z$ are independent given $Y$ in the original data distribution.

Proof.

Let $(X^{\perp}_{Z} \mathrel{\perp\mspace{-10.0mu}\perp} Z\,|\,Y)_{P}$. Note that in this case we just need to show that data balancing does not disturb the conditional independence $(X^{\perp}_{Z} \mathrel{\perp\mspace{-10.0mu}\perp} Z\,|\,Y)_{P}$ present in the original data (we already had equalized odds in the original data). The following derivation demonstrates the claim,

\begin{align*}
Q(X^{\perp}_{Z} \mid Z, Y) &= \frac{Q(X^{\perp}_{Z}, Y, Z)}{\sum_{X^{\perp}_{Z}} Q(X^{\perp}_{Z}, Y, Z)} \\
&\stackrel{(1)}{=} \frac{P(X^{\perp}_{Z} \mid Y, Z)\, P(Y)\, P(Z)}{\sum_{X^{\perp}_{Z}} P(X^{\perp}_{Z} \mid Y, Z)\, P(Y)\, P(Z)} \\
&\stackrel{(2)}{=} \frac{P(X^{\perp}_{Z} \mid Y)\, P(Y)\, P(Z)}{\sum_{X^{\perp}_{Z}} P(X^{\perp}_{Z} \mid Y)\, P(Y)\, P(Z)} \\
&= P(X^{\perp}_{Z} \mid Y),
\end{align*}

where (1) holds by the definition of data balancing and (2) holds by the assumption of conditional independence. Therefore, the l.h.s. is not a function of $Z$, which establishes conditional independence. ∎
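The derivation above can be checked numerically on a small discrete example. The sketch below is only an illustration under assumptions: all variables are binary, the probability tables are hypothetical, and the helper names (`joint_balance`, `x_given_yz`) are not from the paper. It builds a $P$ in which $X^{\perp}_{Z}$ (called `x`) is independent of $Z$ given $Y$, applies joint balancing, and confirms with exact rational arithmetic that $Q(X^{\perp}_{Z} \mid Y, Z)$ equals $P(X^{\perp}_{Z} \mid Y)$ for both values of $Z$.

```python
from fractions import Fraction as F
from itertools import product

BITS = (0, 1)  # all variables are binary in this toy example

# Hypothetical P with X independent of Z given Y: P(x, y, z) = P(y, z) P(x | y).
p_yz = {(0, 0): F(2, 5), (0, 1): F(1, 10), (1, 0): F(1, 10), (1, 1): F(2, 5)}
p_x1_given_y = {0: F(1, 5), 1: F(7, 10)}  # P(X = 1 | Y = y)


def p_x_given_y(x, y):
    return p_x1_given_y[y] if x == 1 else 1 - p_x1_given_y[y]


P = {(x, y, z): p_yz[(y, z)] * p_x_given_y(x, y)
     for x, y, z in product(BITS, repeat=3)}


def joint_balance(D):
    """Return Q(x, y, z) = D(x | y, z) D(y) D(z) for a joint table D."""
    d_yz = {(y, z): sum(D[(x, y, z)] for x in BITS) for y, z in product(BITS, BITS)}
    d_y = {y: sum(d_yz[(y, z)] for z in BITS) for y in BITS}
    d_z = {z: sum(d_yz[(y, z)] for y in BITS) for z in BITS}
    return {(x, y, z): D[(x, y, z)] / d_yz[(y, z)] * d_y[y] * d_z[z]
            for (x, y, z) in D}


def x_given_yz(D, x, y, z):
    """Conditional D(x | y, z) computed from the joint table D."""
    return D[(x, y, z)] / sum(D[(xx, y, z)] for xx in BITS)


Q = joint_balance(P)
# The conditional in Q does not depend on z and matches P(x | y).
preserved = all(
    x_given_yz(Q, x, y, 0) == x_given_yz(Q, x, y, 1) == p_x_given_y(x, y)
    for x, y in product(BITS, BITS)
)
print(preserved)
```

Exact fractions are used so that the equality check is not affected by floating-point rounding.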

Proposition B.6.

In general, $X^{\perp}_{Z}$ and $Z$ are not independent given $Y$ in $Q$ if $X^{\perp}_{Z}$ and $Z$ are not independent given $Y$ in $P$; that is, data balancing does not induce independence between $X^{\perp}_{Z}$ and $Z$ if $X^{\perp}_{Z}$ and $Z$ are not independent given $Y$ in the original data distribution.

Proof.

Similarly to the arguments above, the reduction in (2) does not hold in general without conditional independence. Therefore, in general, data balancing will not be successful without conditional independence. ∎

Appendix C Impact of data balancing on the CBN

[Figure: causal graph over the nodes $U$, $Z$, $Y$, $X$.]

In the following we assume that $Z$ is discrete, but all the results remain valid for continuous $Z$.

Proposition 5.1.

Let $\langle\mathcal{G}, P\rangle$ be the CBN underlying the data, where $\mathcal{G}$ contains an undesired path between $Z$ and $Y$, and let $\mathcal{G}^{0}$ be a modification of $\mathcal{G}$ in which the undesired path has been removed. The distribution $Q$ obtained by jointly balancing the data to make $Z$ and $Y$ statistically independent, i.e. $Q(Y, X, Z) = P(X \mid Y, Z)\, P(Z)\, P(Y)$, might not factorize according to $\mathcal{G}^{0}$.

Proof.

Example 1: Causal task with causal and non-causal paths. Consider $\mathcal{G} = \{Z \rightarrow X \rightarrow Y,\; Z \leftarrow U \rightarrow Y\}$, for unobserved $U$. We have

\[
Q(Y \mid X, Z) = \frac{Q(X, Y, Z)}{\sum_{Y} Q(X, Y, Z)} = \frac{P(X \mid Y, Z)\, P(Z)\, P(Y)}{\sum_{Y} P(X \mid Y, Z)\, P(Z)\, P(Y)} = \frac{P(X \mid Z, Y)\, P(Y)}{\sum_{Y} P(X \mid Z, Y)\, P(Y)},
\]

where the r.h.s. is a function of $Z$ in general, as $X$ is not independent of $Y$ given $Z$ in $P$. If $Q$ factorized according to $\mathcal{G}^{0} = \{Z \rightarrow X \rightarrow Y\}$, then $Y \perp\!\!\!\perp Z \mid X$ would hold in $Q$. To show the claim it therefore suffices to construct a distribution $P$ such that $X$ is not independent of $Y$ given $Z$.

Example 2: Causal task with non-causal path. Consider $\mathcal{G} = \{X \rightarrow Y,\; Z \leftarrow U \rightarrow Y\}$. We have that

\[
Q(X \mid Z) = \frac{\sum_{Y} Q(X, Y, Z)}{\sum_{Y, X} Q(X, Y, Z)} = \frac{\sum_{Y} P(X \mid Y, Z)\, P(Z)\, P(Y)}{\sum_{Y, X} P(X \mid Y, Z)\, P(Z)\, P(Y)} = \sum_{Y} P(X \mid Y, Z)\, P(Y).
\]

The r.h.s. is a function of $Z$ in general, as $X$ is not independent of $Z$ given $Y$ in a distribution $P$ consistent with $\mathcal{G}$. Therefore, one may not interpret the mutilated graph $\mathcal{G}^{0} = \{X \rightarrow Y\}$ as a correct representation of the conditional independencies implied by the balanced distribution $Q$.

Example 3: Causal task with causal path. Consider $\mathcal{G} = \{Z \rightarrow X \rightarrow Y\}$. We have that

\[
Q(Y \mid X, Z) = \frac{Q(X, Y, Z)}{\sum_{Y} Q(X, Y, Z)} = \frac{P(X \mid Y, Z)\, P(Z)\, P(Y)}{\sum_{Y} P(X \mid Y, Z)\, P(Z)\, P(Y)} = \frac{P(X \mid Z, Y)\, P(Y)}{\sum_{Y} P(X \mid Z, Y)\, P(Y)}.
\]

The r.h.s. is a function of $Z$ in general, as $X$ is not independent of $Z$ given $Y$ in $P$. Therefore, one may not interpret the mutilated graph $\mathcal{G}^{0} = \{Z,\; X \rightarrow Y\}$ as a correct representation of the conditional independencies implied by the balanced distribution $Q$.

Example 4: Anti-causal task. Consider $\mathcal{G} = \{Y \rightarrow X,\; Z \leftarrow U \rightarrow Y,\; Z \rightarrow W \rightarrow X\}$. We have that

\[
Q(X \mid Z) = \frac{\sum_{Y, W} Q(X, Y, Z, W)}{\sum_{Y, X, W} Q(X, Y, Z, W)} = \frac{\sum_{Y, W} P(X, W \mid Y, Z)\, P(Z)\, P(Y)}{\sum_{Y, X, W} P(X, W \mid Y, Z)\, P(Z)\, P(Y)} = \sum_{Y} P(X \mid Y, Z)\, P(Y).
\]

The r.h.s. is a function of $Z$ in general, as $X$ is not independent of $Z$ given $Y$ in a distribution $P$ consistent with $\mathcal{G}$. Therefore, one may not interpret the mutilated graph $\mathcal{G}^{\prime} = \{Y \rightarrow X,\; Z \rightarrow W \rightarrow X\}$ as a correct representation of the conditional independencies implied by the balanced distribution $Q$.
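To make Example 2 concrete, the following sketch instantiates the graph $\{X \rightarrow Y,\; Z \leftarrow U \rightarrow Y\}$ with binary variables and computes $Q(X \mid Z)$ exactly. The conditional probability tables are hypothetical values chosen for illustration, and the helpers `bern` and `marginal` are not from the paper. In $P$, $X$ and $Z$ are marginally independent, yet after joint balancing $Q(X = 1 \mid Z)$ differs across the two values of $Z$, which is incompatible with the mutilated graph $\{X \rightarrow Y\}$ in which $Z$ is isolated.

```python
from fractions import Fraction as F
from itertools import product

BITS = (0, 1)

# Hypothetical parameters for X -> Y, Z <- U -> Y (all variables binary).
p_x1 = F(1, 2)
p_u1 = F(1, 2)
p_z1_given_u = {0: F(1, 5), 1: F(4, 5)}
p_y1_given_xu = {(x, u): F(1, 10) + F(2, 5) * x + F(2, 5) * u
                 for x, u in product(BITS, BITS)}


def bern(p1, v):
    """Probability that a Bernoulli(p1) variable takes value v."""
    return p1 if v == 1 else 1 - p1


# Observed joint P(x, y, z), with the unobserved U summed out.
P = {(x, y, z): sum(bern(p_x1, x) * bern(p_u1, u) * bern(p_z1_given_u[u], z)
                    * bern(p_y1_given_xu[(x, u)], y) for u in BITS)
     for x, y, z in product(BITS, repeat=3)}


def marginal(D, keep):
    """Marginalise the joint table D onto the coordinates listed in keep."""
    out = {}
    for key, value in D.items():
        sub = tuple(key[i] for i in keep)
        out[sub] = out.get(sub, 0) + value
    return out


# Joint balancing: Q(x, y, z) = P(x | y, z) P(y) P(z).
p_y, p_z, p_yz = marginal(P, (1,)), marginal(P, (2,)), marginal(P, (1, 2))
Q = {(x, y, z): P[(x, y, z)] / p_yz[(y, z)] * p_y[(y,)] * p_z[(z,)]
     for (x, y, z) in P}


def x1_given_z(D):
    """Return {z: D(X = 1 | Z = z)}."""
    d_xz, d_z = marginal(D, (0, 2)), marginal(D, (2,))
    return {z: d_xz[(1, z)] / d_z[(z,)] for z in BITS}


p_x1_given_z = x1_given_z(P)  # equal for z = 0 and z = 1
q_x1_given_z = x1_given_z(Q)  # differs between z = 0 and z = 1
print(p_x1_given_z, q_x1_given_z)
```

With these particular tables, $Q(X = 1 \mid Z = 0) = 649/1178 \approx 0.551$ and $Q(X = 1 \mid Z = 1) = 529/1178 \approx 0.449$, while $P(X = 1 \mid Z) = 1/2$ for both values of $Z$.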

C.1 Regularization and data balancing don’t always go hand in hand

C.1.1 Risk-invariance

We first consider the graph in Figure 1(d) and show that $X^{\perp}_{Z} \perp\!\!\!\perp Z \mid Y$ in both $P$ and $Q$, which justifies the use of regularization in addition to data balancing, although there might not be a benefit of using both techniques simultaneously (in theory).

Proposition C.1.

Consider the graph $\mathcal{G}$ in Figure 1(d). Then $X^{\perp}_{Z} \perp\!\!\!\perp Z \mid Y$ in both the training data distribution $P$ (consistent with $\mathcal{G}$) and the distribution after balancing, namely $Q$.

Proof.

$X^{\perp}_{Z} \perp\!\!\!\perp Z \mid Y$ holds in the training data distribution $P$ by $d$-separation. For the conditional independence in $Q$, consider the following derivation:

\begin{align*}
Q(X^{\perp}_{Z} \mid Y, Z) &= \frac{\sum_{X_{Y \wedge Z}} P(X^{\perp}_{Z}, X_{Y \wedge Z} \mid Z, Y)\, P(Z)\, P(Y)}{\sum_{X_{Y \wedge Z}, X^{\perp}_{Z}} P(X^{\perp}_{Z}, X_{Y \wedge Z} \mid Z, Y)\, P(Z)\, P(Y)} \\
&= P(X^{\perp}_{Z} \mid Z, Y) = P(X^{\perp}_{Z} \mid Y) = g(X^{\perp}_{Z} \mid Y).
\end{align*}

The r.h.s. is not a function of $Z$ and therefore $X^{\perp}_{Z} \perp\!\!\!\perp Z \mid Y$ holds in $Q$. ∎

However, when considering the graph in Figure 1(b), balancing introduces a dependence between $X^{\perp}_{Z}$ and $Z$, which can be easily checked with the simulation in Figure 8, in which we consider the simplified graph $Z \rightarrow Y \leftarrow X$. While we are able to obtain marginal independence between $Y$ and $Z$ ($\chi^{2}$: $p = 0.34$), we introduce a dependence between $X$ and $Z$ ($\chi^{2}$: $p < 0.0001$).

import numpy as np
import scipy.stats
# Number of samples.
n = 10000
# Generate binary data from X -> Y <- U -> Z (U is not used after generation),
# which plays the role of the simplified graph Z -> Y <- X.
x = 1*(np.random.normal(size=n) > 0)
u = 1*(np.random.normal(size=n) > 0.3)
y = 1*(x - u + 0.5*np.random.normal(size=n) > 0.5)
z = 1*(u - 0.5*np.random.normal(size=n) > 0.1)
# Marginal of z, evaluated at each sample.
p_z = np.array([np.mean(z==i) for i in z])
# Marginal of y, evaluated at each sample.
p_y = np.array([np.mean(y==i) for i in y])
# Joint of z and y, evaluated at each sample.
p_zy = np.array([np.mean((z==i)&(y==j)) for i, j in zip(z,y)])
# Resampling probabilities, proportional to P(z)P(y)/P(z,y).
indep_probs = p_z * p_y / p_zy
indep_probs /= np.sum(indep_probs)
# Re-sample according to computed probabilities
indices = np.random.choice(n, size=n, replace=True, p=indep_probs)
z_bal, x_bal, y_bal = z[indices], x[indices], y[indices]
# Check that Y and Z are independent
# Create contingency table (crosstab returns the unique elements and the counts).
_, contingency_table_bal_zy = scipy.stats.contingency.crosstab(z_bal, y_bal)
# Implement chi squared test.
statistic_zy, pvalue_zy, _, _ = scipy.stats.chi2_contingency(contingency_table_bal_zy)
# Check whether X and Z are independent
_, contingency_table_bal_xz = scipy.stats.contingency.crosstab(z_bal, x_bal)
statistic_xz, pvalue_xz, _, _ = scipy.stats.chi2_contingency(contingency_table_bal_xz)
Figure 8: Python code to assess the impact of balancing in a numerical simulation of graph Figure 1(b).

C.1.2 When does data-balancing together with regularization lead to fair models?

This section gives several results to analyze the combination of data balancing implemented to generate independence between outcomes $Y$ and sensitive attributes $Z$ and regularization in two variants. First, regularizing to learn representations $W = \phi(X^{\perp}_{Z})$ such that $W \perp\!\!\!\perp_{Q} Z \mid Y$; and second, regularizing to learn representations $W = \phi(X^{\perp}_{Z})$ such that $W \perp\!\!\!\perp_{Q} Z$. We write $X^{\perp}_{Z} \perp\!\!\!\perp_{P} Y$ to state that $X^{\perp}_{Z}$ and $Y$ are independent in distribution $P$.

Regularization such that $\phi(X^{\perp}_{Z}) \perp\!\!\!\perp Z \mid Y$.
Proposition C.2 (Demographic parity).

Balancing and regularization such that $W = \phi(X^{\perp}_{Z})$ and $W \perp\!\!\!\perp_{Q} Z \mid Y$ is sufficient for demographic parity, i.e. $W \perp\!\!\!\perp_{Q} Z$.

Proof.
\[
Q(W \mid Z) = \sum_{Y} Q(W \mid Z, Y)\, Q(Y \mid Z) \stackrel{(1)}{=} \sum_{Y} Q(W \mid Y)\, Q(Y) = Q(W),
\]

where (1) holds by the assumption of balancing, in which $Z \perp\!\!\!\perp_{Q} Y$, and regularization, $W \perp\!\!\!\perp_{Q} Z \mid Y$. ∎

Proposition C.3 (Predictive parity).

Balancing and regularization such that $W = \phi(X^{\perp}_{Z})$ and $W \perp\!\!\!\perp_{Q} Z \mid Y$ is sufficient for predictive parity, i.e. $Y \perp\!\!\!\perp_{Q} Z \mid W$.

Proof.
\[
Q(Z \mid Y, W) = Q(Z \mid Y) = Q(Z),
\]

where the first equality holds by regularization, $W \perp\!\!\!\perp_{Q} Z \mid Y$, and the second by the assumption of balancing, in which $Z \perp\!\!\!\perp_{Q} Y$. ∎
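As a sanity check of the two results above, the sketch below builds a small distribution $Q$ over binary $(W, Y, Z)$ that satisfies both assumptions, namely $Z \perp\!\!\!\perp_{Q} Y$ and $W \perp\!\!\!\perp_{Q} Z \mid Y$, which together amount to $Q(W, Y, Z) = Q(W \mid Y)\, Q(Y)\, Q(Z)$. The numerical tables and the helper `prob` are hypothetical and serve only to illustrate the statements; the code then verifies demographic parity and predictive parity with exact arithmetic.

```python
from fractions import Fraction as F
from itertools import product

BITS = (0, 1)
q_w1_given_y = {0: F(1, 4), 1: F(4, 5)}  # hypothetical Q(W = 1 | Y = y)
q_y1, q_z1 = F(3, 10), F(3, 5)           # hypothetical marginals of Y and Z


def bern(p1, v):
    """Probability that a Bernoulli(p1) variable takes value v."""
    return p1 if v == 1 else 1 - p1


# Q(w, y, z) = Q(w | y) Q(y) Q(z): balancing plus conditional regularization.
Q = {(w, y, z): bern(q_w1_given_y[y], w) * bern(q_y1, y) * bern(q_z1, z)
     for w, y, z in product(BITS, repeat=3)}


def prob(w=None, y=None, z=None):
    """Probability under Q of the event that fixes the given coordinates."""
    return sum(p for (ww, yy, zz), p in Q.items()
               if w in (None, ww) and y in (None, yy) and z in (None, zz))


# Demographic parity: W independent of Z.
demographic_parity = all(
    prob(w=w, z=z) == prob(w=w) * prob(z=z) for w, z in product(BITS, BITS)
)
# Predictive parity: Y independent of Z given W,
# i.e. Q(w, y, z) Q(w) = Q(w, y) Q(w, z) for all values.
predictive_parity = all(
    prob(w=w, y=y, z=z) * prob(w=w) == prob(w=w, y=y) * prob(w=w, z=z)
    for w, y, z in product(BITS, repeat=3)
)
print(demographic_parity, predictive_parity)
```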

Proposition C.4 (Equalized odds).

Balancing and regularization such that $W = \phi(X^{\perp}_{Z})$ and $W \perp\!\!\!\perp_{Q} Z \mid Y$ is sufficient for equalized odds, i.e. $W \perp\!\!\!\perp_{Q} Z \mid Y$.

Proof.

Regularization induces $W \perp\!\!\!\perp_{Q} Z \mid Y$ and so equalized odds is satisfied by design. ∎

Remark: Note that balancing and regularization together are not always necessary; for example, the section above shows that balancing on its own can be successful in some cases.

Regularization such that $\phi(X^{\perp}_{Z}) \perp\!\!\!\perp Z$.
Proposition C.5 (Demographic parity).

Balancing and regularization such that $W = \phi(X^{\perp}_{Z})$ and $W \perp\!\!\!\perp_{Q} Z$ is sufficient for demographic parity, i.e. $W \perp\!\!\!\perp_{Q} Z$.

Proof.

Regularization induces $W \perp\!\!\!\perp_{Q} Z$ and so demographic parity is satisfied by design. ∎

Proposition C.6 (Predictive parity).

Balancing and regularization such that $W = \phi(X^{\perp}_{Z})$ and $W \perp\!\!\!\perp_{Q} Z$ is not sufficient for predictive parity, i.e. $Y \perp\!\!\!\perp_{Q} Z \mid W$ does not hold.

Proof.

We give a counter-example. Let $A, B, C$ be three independent variables, uniformly distributed over $\{0, 1\}$. Let $X^{\perp}_{Z} = \mathbf{1}\{A = B\}$, $Y = \mathbf{1}\{A = C\}$, $Z = \mathbf{1}\{B = C\}$. Let $Q$ be the resulting probability distribution over $(X^{\perp}_{Z}, Y, Z)$. In particular, we could imagine $Q$ to be generated after balancing and regularization, since $W \perp\!\!\!\perp_{Q} Z$ and $Y \perp\!\!\!\perp_{Q} Z$ (taking $W = X^{\perp}_{Z}$). However, conditioned on $X^{\perp}_{Z}$, $Y$ and $Z$ determine each other, and so predictive parity does not hold in $Q$. ∎
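The counter-example is small enough to verify by enumeration. The sketch below assumes that $A$, $B$, $C$ are fair (uniform) bits and takes $W = X^{\perp}_{Z}$; the helper `prob` and the variable names are illustrative only. It enumerates the eight equally likely outcomes, confirms the two marginal independencies, and shows that both predictive parity and equalized odds fail.

```python
from fractions import Fraction as F
from itertools import product

BITS = (0, 1)

# Distribution of (W, Y, Z) induced by three independent fair bits A, B, C
# (uniformity is an assumption), with W = 1{A=B}, Y = 1{A=C}, Z = 1{B=C}.
Q = {}
for a, b, c in product(BITS, repeat=3):
    key = (int(a == b), int(a == c), int(b == c))
    Q[key] = Q.get(key, 0) + F(1, 8)


def prob(w=None, y=None, z=None):
    """Probability under Q of the event that fixes the given coordinates."""
    return sum(p for (ww, yy, zz), p in Q.items()
               if w in (None, ww) and y in (None, yy) and z in (None, zz))


# Marginal independencies that balancing and regularization would enforce.
w_indep_z = all(prob(w=w, z=z) == prob(w=w) * prob(z=z)
                for w, z in product(BITS, BITS))
y_indep_z = all(prob(y=y, z=z) == prob(y=y) * prob(z=z)
                for y, z in product(BITS, BITS))
# Predictive parity (Y indep. Z given W) and equalized odds (W indep. Z given Y).
predictive_parity = all(
    prob(w=w, y=y, z=z) * prob(w=w) == prob(w=w, y=y) * prob(w=w, z=z)
    for w, y, z in product(BITS, repeat=3)
)
equalized_odds = all(
    prob(w=w, y=y, z=z) * prob(y=y) == prob(w=w, y=y) * prob(y=y, z=z)
    for w, y, z in product(BITS, repeat=3)
)
print(w_indep_z, y_indep_z, predictive_parity, equalized_odds)
```

Only four outcomes of $(W, Y, Z)$ have positive probability, each with mass $1/4$: given $W = 1$ we have $Y = Z$, and given $W = 0$ we have $Y = 1 - Z$.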

Proposition C.7 (Equalized odds).

Balancing and regularization such that $W = \phi(X^{\perp}_{Z})$ and $W \perp\!\!\!\perp_{Q} Z$ is not sufficient for equalized odds, i.e. $W \perp\!\!\!\perp_{Q} Z \mid Y$ does not hold.

Proof.

The counter-example above applies. ∎

Appendix D Experiments

D.1 Datasets

This work uses the MNIST [44, 17, http://yann.lecun.com/exdb/mnist/], Amazon reviews [52], ImageNet [16, https://image-net.org/] and CelebA [45, http://mmlab.ie.cuhk.edu.hk/projects/CelebA.html] datasets, which are all openly accessible and can be used for research purposes.

MNIST semi-synthetic data: For simplicity, we binarize the digit recognition task to a label $Y \in \{0, 1\}$ according to whether the number in the image is $< 5$ or $\geq 5$, such that $Y$ matches the ground truth with probability $0.98$. The top of the image is replaced by noise coloured in red for $Z = 0$ and blue for $Z = 1$ (see Figure 2). We can relate the confounder and the label such that $95\%$ (resp. $5\%$) of images with $Y = 0$ have a red (resp. blue) noise pattern, while $10\%$ (resp. $90\%$) of the images with $Y = 1$ have a red (resp. blue) pattern, corresponding to our original distribution $P$. In this distribution, the marginal distributions of $Y$ and $Z$ are (close to) uniform. We sample $n = 30{,}000$ samples from $P$, as well as a dataset jointly balanced on $Y$ and $Z$ ($Q$, $n = 30{,}000$). We also sample test data based on a ground truth $P^{0}$ generated with $P^{0}(Z = 0 \mid Y) = 0.5$ ($n = 2{,}000$). Finally, we generate an $X^{\perp}_{Z}$ dataset that contains white instead of colored noise.
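The label-level part of this construction can be sketched as follows. This is a minimal illustration, not the authors' pipeline: it assumes $Y$ is drawn uniformly, ignores the $0.98$ label noise and the image rendering, and the function names `sample_labels` and `balancing_weights` are hypothetical. It draws $(Y, Z)$ pairs with $P(Z = 0 \mid Y = 0) = 0.95$ and $P(Z = 0 \mid Y = 1) = 0.10$, then computes per-sample weights $P(Y) P(Z) / P(Y, Z)$ under which $Y$ and $Z$ become empirically independent.

```python
import random
from collections import Counter


def sample_labels(n, p_red_given_y=(0.95, 0.10), seed=0):
    """Sample (y, z) pairs; z = 0 (red) with probability p_red_given_y[y].

    Y is assumed uniform here; the 0.98 label noise is not modelled.
    """
    rng = random.Random(seed)
    pairs = []
    for _ in range(n):
        y = rng.randint(0, 1)
        z = 0 if rng.random() < p_red_given_y[y] else 1
        pairs.append((y, z))
    return pairs


def balancing_weights(pairs):
    """Per-sample weights P(y) P(z) / P(y, z) from empirical frequencies."""
    n = len(pairs)
    c_yz = Counter(pairs)
    c_y = Counter(y for y, _ in pairs)
    c_z = Counter(z for _, z in pairs)
    return [(c_y[y] / n) * (c_z[z] / n) / (c_yz[(y, z)] / n) for y, z in pairs]


pairs = sample_labels(30000)
weights = balancing_weights(pairs)

# Weighted joint frequency of each (y, z) cell; it matches P(y) P(z).
weighted_joint = Counter()
for pair, w in zip(pairs, weights):
    weighted_joint[pair] += w / len(pairs)
print(dict(weighted_joint))
```

The weights can be used either directly as sample weights in the loss or, after normalization, as resampling probabilities in the manner of Figure 8.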

MNIST semi-synthetic data with added confounder: We add $V$ and $X_{V}$ to our data generating process, where $X_{V}$ is a green cross either on the left or right of the image, with a fixed vertical position. The horizontal position of the cross is given by $V$, and $V$ is correlated with $Y$ ($P(V = 0 \mid Y = 0) = 0.2$, $P(V = 0 \mid Y = 1) = 0.9$). We generate a confounded dataset (95/10) as previously, which we balance jointly on $Y$ and $Z$. We then train 5 replicates of the same architecture, and test our model on $Q$, as well as on the ground truth $P^{0}$ where $P(V = 0 \mid Y = 0) = P(V = 0 \mid Y = 1) = P(Z = 0 \mid Y = 0) = P(Z = 0 \mid Y = 1) = 0.5$.

MNIST semi-synthetic data, entangled: We define the color of the noise based on an $\textsc{OR}(Y,Z)$. We define $Q$ by generating samples with $P(Z=0|Y=0)=P(Z=0|Y=1)=0.5$, while $P^{0}$ is represented by the disentangled test dataset described above.

Amazon reviews with confounder: We refer to Veitch et al. [73] and define a causal task based on Amazon reviews for the clothing category which predicts whether the review was found to be helpful (i.e. obtained ‘thumbs up’ votes) or not based on the review’s text. We generate a random variable $U$ as the unobserved confounder, and define $Y$ as the binary helpfulness label, randomly flipping the label based on $U$ (association: $p=0.4$). This leads to reviews with $Y=0$ being more associated with $U=0$. We define $Z$ as $Z=\lambda*U+(1-\lambda)*U_{2}$, where $U_{2}$ is another random variable distributed uniformly and $\lambda$ is a parameter that controls the relationship between $U$ and $Z$, and by transitivity, between $Z$ and $Y$. In $P$, $\lambda$ is selected to be 0.8, leading to a correlation of 0.35 between $Y$ and $Z$. To create $X^{\perp}_{Y}$, we add perturbations to the text based on the value of $Z$ that wouldn’t (in theory) affect $Y$. We select the words {and, the, you, my, they} and add a suffix ‘xxxx’ (resp. ‘yyyy’) when $Z=0$ (resp. $Z=1$). Finally, $Y$ is imbalanced, with only $5\%$ of the dataset with $Y=1$. We hence re-balance the classes before the modelling. This operation is also performed by the joint balancing.
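The text perturbation can be sketched as follows. This is only an illustration: whitespace tokenization, case-insensitive matching, and leaving words with attached punctuation untouched are assumptions made here, and the helper name `perturb_review` is hypothetical.

```python
# Words and suffixes taken from the text; everything else is an assumption.
TARGET_WORDS = {"and", "the", "you", "my", "they"}
SUFFIX_BY_Z = {0: "xxxx", 1: "yyyy"}


def perturb_review(text, z):
    """Append the Z-dependent suffix to each selected word of a review."""
    suffix = SUFFIX_BY_Z[z]
    tokens = text.split()  # assumed tokenization: split on whitespace
    perturbed = [
        tok + suffix if tok.lower() in TARGET_WORDS else tok
        for tok in tokens
    ]
    return " ".join(perturbed)


example_z0 = perturb_review("I love the fit and my size", 0)
example_z1 = perturb_review("I love the fit and my size", 1)
```

Because the suffix depends only on $Z$, the perturbed tokens carry information about $Z$ without changing the rest of the review.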

D.2 Metric definitions and operationalization

Our work focuses on statistical group fairness criteria [5]. These can be translated as independence criteria on the model’s predictions.

Definition D.1 (Demographic parity).

A predictor $f(X)$ is said to satisfy demographic parity w.r.t. sensitive attribute $Z$ and distribution $P$ if $f(X) \perp\!\!\!\perp_{P} Z$.

Definition D.2 (Predictive parity).

A predictor $f(X)$ trained to predict an outcome $Y$ is said to satisfy predictive parity w.r.t. sensitive attribute $Z$ and distribution $P$ if $Y \perp\!\!\!\perp_{P} Z\,|\,f(X)$.

Definition D.3 (Equalized odds).

A predictor $f(X)$ trained to predict an outcome $Y$ is said to satisfy equalized odds w.r.t. a sensitive attribute $Z$ and distribution $P$ if $f(X) \perp\!\!\!\perp_{P} Z\,|\,Y$.

In our experiments, we estimate equalized odds as in Alabdulmohsin & Lučić [1]. For this metric, the lower, the better.

$$
\begin{aligned}
EO &= 0.5*\max_{z\in\mathcal{Z}}\,\mathbb{E}_{X}[f(X)\,|\,Z=z,Y=0]\,-\,\min_{z\in\mathcal{Z}}\,\mathbb{E}_{X}[f(X)\,|\,Z=z,Y=0] \\
&\quad +0.5*\max_{z\in\mathcal{Z}}\,\mathbb{E}_{X}[f(X)\,|\,Z=z,Y=1]\,-\,\min_{z\in\mathcal{Z}}\,\mathbb{E}_{X}[f(X)\,|\,Z=z,Y=1].
\end{aligned}
$$
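A minimal sketch of how this quantity could be estimated from finite samples is given below. It reads each $0.5$ factor as applying to the whole max-minus-min gap for the corresponding value of $Y$, which is an assumption about the intended grouping, and it replaces the conditional expectations with per-cell sample means. The function name `equalized_odds_gap` is hypothetical and is not the implementation of [1].

```python
def equalized_odds_gap(scores, y, z):
    """Average over Y of the largest gap in mean prediction across Z groups."""
    total = 0.0
    for label in (0, 1):
        group_means = []
        for group in sorted(set(z)):
            vals = [
                s for s, yi, zi in zip(scores, y, z)
                if yi == label and zi == group
            ]
            if vals:  # skip empty (label, group) cells
                group_means.append(sum(vals) / len(vals))
        if group_means:
            # Assumed grouping: 0.5 * (max - min) for each label value.
            total += 0.5 * (max(group_means) - min(group_means))
    return total


# Toy example: binary predictions, labels and group memberships.
toy_scores = [1, 0, 1, 1, 0, 0, 1, 0]
toy_y = [0, 0, 0, 0, 1, 1, 1, 1]
toy_z = [0, 0, 1, 1, 0, 0, 1, 1]
toy_eo = equalized_odds_gap(toy_scores, toy_y, toy_z)
```

In the toy example, the mean prediction differs by 0.5 between the two groups for both label values, so the estimate is 0.5.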

In terms of robustness metrics, we evaluate a simplified version of risk-invariance by computing model performance on a test set sampled from $P$, and contrasting this result with the model’s performance on a test set sampled from $P^{0}$ (when known), or from $Q$. We also estimate worst-group performance [63] as:

$$WG=\min_{z^{\prime}\in\mathcal{Z}}\,\mathbb{E}_{X,y}[\mathbbm{1}[f(X)=y]\,|\,z=z^{\prime}]$$
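The worst-group estimate amounts to computing the accuracy separately within each value of $Z$ and keeping the minimum. A minimal sketch, with a hypothetical function name and assuming hard (already thresholded) predictions:

```python
def worst_group_accuracy(preds, y, z):
    """Minimum over groups z' of the accuracy restricted to examples with z = z'."""
    accuracies = {}
    for group in set(z):
        hits = [
            int(p == yi) for p, yi, zi in zip(preds, y, z) if zi == group
        ]
        accuracies[group] = sum(hits) / len(hits)
    return min(accuracies.values())


# Toy example: group 0 is fully correct, group 1 is correct half of the time.
toy_wg = worst_group_accuracy(
    preds=[0, 1, 1, 0], y=[0, 1, 0, 0], z=[0, 0, 1, 1]
)
```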

An invariant model that is optimal would hence display high performance on both $P$ and $P^{0}$/$Q$, as well as high worst-group accuracy.

Metrics like risk-invariance or equalized odds provide insights on the model’s outputs, but do not probe the model’s representation. As we are interested in large-scale models that might be further fine-tuned, it is important to understand whether the model’s representation is invariant on $\mathcal{P}$. Defining a representation as $\phi(X)$, we can write $f(X)=h(\phi(X))$ in which we assume the representation to be fixed (i.e. frozen model weights) and $h$ is a learnable function. In Zemel et al. [80], the authors define a fair representation w.r.t. a binary $Z$ as demographic parity on the representation:

$$\mathbb{E}_{X\in X^{Z=z}}\,\phi(X)=\mathbb{E}_{X\in X^{Z=z^{\prime}}}\,\phi(X),\quad\forall z,z^{\prime}\in\mathcal{Z},$$

where $X^{Z=z}$ corresponds to the samples with $Z=z$. This is equivalent to assessing the ‘encoding’ of $Z$ in $\phi(X)$, by training a linear layer $h:\phi(X)\rightarrow Z$ [27, 8]. Chance level performance of $h(\phi(X))$ would then suggest that the representation is independent of $Z$. In the present work, we estimate the encoding of $Z$ using $P^{0}$ or $Q$ such that assessing the encoding of $Z$ is equivalent to assessing the encoding of $Z|Y$. Models that encode less of the auxiliary factor $Z$ have been shown to reach a more ‘global’ optimum compared to models that encode the signal more strongly [independently of whether invariant predictions are obtained 79].

D.3 Model architectures

We consider multiple architectures in this work, with an attempt to cover different model sizes and characteristics.

  • Small convolutional network, similar in spirit to AlexNet [42]. It includes 5 convolution blocks with kernel sizes (4, 3, 2, 2, 2, 2) and output channels (3, 6, 9, 12, 12, 9), with max pooling after each convolution, as well as two dense layers with ReLU non-linearity before the output head.

  • VGG network [67] with square kernels of size 3, output channels of dimensions (64, 64, 128, 128, 128, 256, 256, 256, 512, 512, 512) and strides (1, 1, 2, 1, 1, 2, 1, 1, 2, 1, 1).

  • Vision Transformers [18] of different sizes: ViT-micro (17M parameters), ViT-Tiny (44M), ViT-S (174M) and ViT-B (690M), with the Tiny sizes and up taken from [72].

  • For text data, we use the BERT architecture, as defined in TensorFlow Hub.

We use a stochastic gradient descent optimizer with Nesterov momentum of $0.9$ for all models.

D.3.1 Hyper-parameter searches

We include a hyper-parameter search over the learning rate (5 values in log-scale between $9e-5$ and $0.1$) coupled with a batch size search between sizes of 128, 256 and 512 examples. In terms of regularization, the small convolutional network includes dropout in the dense layers (search on 0.1, 0.2, 0.3), while VGG includes batch normalization in the dense layers (as per their original implementations). We impose an L2-regularization of $1e-4$ during training for all architectures.
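For concreteness, the learning-rate and batch-size grid could be enumerated as in the sketch below. It assumes that the 5 learning rates are evenly spaced on a log scale and include both endpoints, which the text does not state explicitly; the variable names are hypothetical.

```python
from itertools import product

LR_MIN, LR_MAX, NUM_LR = 9e-5, 0.1, 5
BATCH_SIZES = (128, 256, 512)

# Assumed spacing: geometric progression from LR_MIN to LR_MAX, endpoints included.
learning_rates = [
    LR_MIN * (LR_MAX / LR_MIN) ** (i / (NUM_LR - 1)) for i in range(NUM_LR)
]

# One configuration per (learning rate, batch size) pair.
grid = [
    {"learning_rate": lr, "batch_size": bs}
    for lr, bs in product(learning_rates, BATCH_SIZES)
]
```

This yields 15 configurations per architecture, before accounting for the dropout search of the small convolutional network.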

We note that hyper-parameters did not seem to make a difference on the MNIST results. For VGG, there was a larger variation, as well as a larger variance across multiple seeds.

When performing MMD conditional regularization, we vary the strength of the regularizer in $[0.0, 0.1, 0.2, 0.5, 1., 2., 3., 4., 5., 6., 7., 8., 9., 10.]$, with 5 replicates for each value. To minimize computational expenses, we fix the learning rate to $0.001$, dropout rate to $0.1$ and batch size to $64$ (for downsampled datasets) or $256$.

D.4 Assets, code and resources

We use the BERT model bert_en_uncased_L-12_H768_A-12 from TensorFlow Hub. All other models are trained from scratch in our code infrastructure written in Python and JAX [7]. The results are then analyzed with Python and the numpy [30], matplotlib [32, https://matplotlib.org/] and pandas [49, https://pandas.pydata.org/] packages. For the small convolutional networks, training was performed with 4 GPUs (V100) and evaluation used 1 GPU per model instance. BERT used 2 Tensor Processing Units (TPUs) for training and 1 TPU for evaluation. For all other models, we used 4 Tensor Processing Units for training and 1 TPU or GPU (P100) for evaluation. We note that, apart from ViT-B and BERT, all experiments could be run on CPU.

Appendix E Results

E.1 Failure modes of data balancing with MNIST

Other confounder

We notice that correlation between $V$ and $Z$ in $Q$ is decreased ($\rho=-0.16$) compared to $P$ ($\rho=-0.60$) but is not null. In addition, we observe that the model relies on $V$ (accuracy on $Q$: $0.769\pm 0.008$, on $P^{0}$: $0.647\pm 0.023$). As a consequence, models trained on $Q$ display a bias w.r.t. $Z$ (see equalized odds and worst group performance).
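As a sanity check on the value reported for $P$, the population correlation between $V$ and $Z$ can be computed directly from the conditional probabilities given in Appendix D.1. The sketch below assumes that $Y$ is exactly uniform and that $V$ and $Z$ are independent given $Y$; both are assumptions made for this illustration.

```python
import math

# Probabilities from the data description; P(Y) uniform is an assumption.
P_Y = {0: 0.5, 1: 0.5}
P_Z1_GIVEN_Y = {0: 0.05, 1: 0.90}  # 1 - P(Z=0 | Y)
P_V1_GIVEN_Y = {0: 0.80, 1: 0.10}  # 1 - P(V=0 | Y)

e_z = sum(P_Y[y] * P_Z1_GIVEN_Y[y] for y in P_Y)
e_v = sum(P_Y[y] * P_V1_GIVEN_Y[y] for y in P_Y)
# Assumed conditional independence of V and Z given Y.
e_zv = sum(P_Y[y] * P_Z1_GIVEN_Y[y] * P_V1_GIVEN_Y[y] for y in P_Y)

covariance = e_zv - e_z * e_v
rho_p = covariance / math.sqrt(e_z * (1 - e_z) * e_v * (1 - e_v))
```

Under these assumptions the correlation evaluates to approximately $-0.60$, in line with the value reported for $P$.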

Entangled signals

During training, the model reaches $0.903\pm 0.011$ accuracy on $Q$, but only $0.672\pm 0.004$ accuracy on $P^{0}$. Worst-group accuracy is low and equalized odds high, displaying a failure mode of data balancing.

E.2 Celeb-A

E.2.1 Model performance

Model encoding and performance across different model sizes are displayed in Figure 9. We show that all models trained on the subsampled data display an encoding of the auxiliary factor $Z$.

Figure 9: Model encoding, accuracy, worst group accuracy and equalized odds for the VGG architecture, and different sizes of ViT (m: micro, Ti: tiny, S, B) when trained with balanced CelebA data. Each dot is a model replicate, while the dashed line represents the average across replicates.

E.2.2 Distinguishing between failure modes

Correlation patterns in balanced data: We plot the Pearson correlation between $Y$ and all other available attributes (39 in CelebA) in Figure 10 (left), and similarly for $Z$ (right). We note that the correlation that increases most when balancing the data is between $Y$ and the ‘black hair’ label. As this label has a low correlation with $Z$, this does not seem problematic. We also observe smaller changes in attributes related to hair (‘bushy-eyebrows’, ‘bald’) and accessories (‘wearing-hat’).
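The per-attribute correlations can be computed as in the following sketch, which would be applied once to the original sample and once to the balanced sample. The attribute names and toy values are placeholders rather than CelebA data, and the helper names are hypothetical; attributes with zero variance are not handled.

```python
import math


def pearson(a, b):
    """Pearson correlation between two equal-length numeric sequences."""
    n = len(a)
    mean_a, mean_b = sum(a) / n, sum(b) / n
    cov = sum((x - mean_a) * (w - mean_b) for x, w in zip(a, b))
    var_a = sum((x - mean_a) ** 2 for x in a)
    var_b = sum((w - mean_b) ** 2 for w in b)
    return cov / math.sqrt(var_a * var_b)


def correlations_with(target, attributes):
    """Correlation of the target (Y or Z) with each named binary attribute."""
    return {name: pearson(target, values) for name, values in attributes.items()}


# Placeholder data: four examples and two illustrative attributes.
toy_target = [0, 0, 1, 1]
toy_attributes = {"black_hair": [0, 0, 1, 1], "wearing_hat": [0, 1, 0, 1]}
toy_corr = correlations_with(toy_target, toy_attributes)
```

Comparing the two resulting dictionaries attribute by attribute gives the kind of contrast shown in Figure 10.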

Figure 10: Pearson correlation between each attribute and $Y$ (left), or $Z$ (right) in a sample of the original data (teal), compared to a balanced sample (blue) of the training data.