CountARFactuals – Generating plausible model-agnostic counterfactual explanations with adversarial random forests

Susanne Dandl*,1, Kristin Blesch*,2,3, Timo Freiesleben*,5, Gunnar König*,6,
Jan Kapar2,3, Bernd Bischl1, and Marvin N. Wright3,4,5
*Equal contribution as first authors.
   1Munich Center for Machine Learning (MCML) and Department of Statistics, LMU Munich    2Leibniz Institute for Prevention Research & Epidemiology – BIPS    3Faculty of Mathematics and Computer Science, University of Bremen    4Department of Public Health, University of Copenhagen    5Cluster: Machine Learning for Science, University of Tübingen    6Tübingen AI Center and University of Tübingen
[email protected]
Abstract

Counterfactual explanations elucidate algorithmic decisions by pointing to scenarios that would have led to an alternative, desired outcome. By giving insight into the model’s behavior, they point users towards possible actions and give grounds for contesting decisions. To achieve these goals, counterfactuals must be plausible, i.e., describe realistic alternative scenarios within the data manifold. This paper leverages a recently developed generative modeling technique – adversarial random forests (ARFs) – to efficiently generate plausible counterfactuals in a model-agnostic way. ARFs can serve as a plausibility measure or directly generate counterfactual explanations. Our ARF-based approach overcomes the limitations of existing methods that aim to generate plausible counterfactual explanations: it is easy to train and computationally highly efficient, handles continuous and categorical data naturally, and allows integrating additional desiderata such as sparsity in a straightforward manner.

Keywords: counterfactual explanations · explainable artificial intelligence · interpretable machine learning · adversarial random forest · tabular data · plausibility · model-agnostic

1 Introduction

Machine learning (ML) algorithms are increasingly used in high-stakes scenarios. For example, they help to decide whether you receive a loan, if you are suitable for a job, or even which disease you are diagnosed with. While ML-based systems are powerful at detecting complex patterns in data, the reasoning behind their predictions is often not easy to discern for humans. Many ML models are black boxes with a complex mathematical structure that do not follow transparent logical rules [1].

The emerging field of interpretable machine learning (IML) (also known as explainable artificial intelligence or XAI for short) promises to open up these black boxes and aims to make the decisions of ML models transparent to humans (see [2, 3] for overviews). A particularly simple approach is to explain algorithmic decisions to end-users via so-called counterfactual explanations [4].

  • Example: Imagine you apply for a loan. You enter characteristics such as your age, salary, loan amount, etc. in the online application form and after a few seconds you receive the decision – your loan application has been denied. A counterfactual explanation could be: If your salary had been €5,000 higher, your loan would have been approved.

More generally, a counterfactual explanation points to a close alternative scenario (the so-called counterfactual) that, in contrast to the actual scenario, would have resulted in the desired outcome. Counterfactual explanations may be employed for various purposes, such as helping to guide a person’s actions [5, 6], enabling them to contest adverse decisions [7], and providing insights into the decision behavior of the model [8]. For all these goals, counterfactuals must be plausible, which means the alternative scenarios they depict are realistic. For instance, in the example above, suggesting a negative loan amount or a real estate loan with an amount of €500 would not be very plausible counterfactuals.

When adding plausibility as another objective for generating counterfactuals, its trade-off with proximity, i.e., that the counterfactual is close to the point of interest, should be taken into account. Dandl et al. [9] were among the first to address this trade-off by framing the counterfactual search as a multi-objective optimization problem. Their approach – multi-objective counterfactual explanations (MOC) – returns not just a single counterfactual but a Pareto set of counterfactuals, which is advisable to account for the Rashomon effect, i.e., that multiple, diverse, equally good counterfactuals may exist [10].

An intuitive approach to plausibility is searching for only those counterfactuals that are close to actual instances in the dataset [11]. To operationalize this goal, one objective in MOC minimizes the distance between counterfactuals and the actual instances. However, as presented in Section 3.1, this approach has its limitations if, for example, there are low-density gaps close to $\mathbf{x}^{*}$ between high-density regions. Other approaches model plausibility via the joint probability density. They rely on computationally intensive neural network architectures such as variational autoencoders (VAEs) [12, 13, 14, 15] or generative adversarial networks (GANs) [16, 17]. While these architectures have merits for high-dimensional tensor data (e.g., images or text), they are less suited for tabular data (see our discussion in Section 3.2).

Contributions

We leverage a tree-based technique from generative modeling called adversarial random forests (ARF) [18] to generate plausible counterfactuals in a mixed (i.e., categorical and continuous) tabular data setting. We call these countARFactuals and propose two model-agnostic algorithms to generate them:

  1. We integrate ARF into the multi-objective counterfactual explanation (MOC) framework [9] to speed up the counterfactual search and find more plausible counterfactuals (see Section 4.1).

  2. We tailor ARF to directly generate plausible counterfactuals without an optimization algorithm (see Section 4.2).

A simulation study shows the advantages in plausibility and efficiency of our ARF-based approaches compared to competing methods (Section 5). Moreover, we apply our method on a real-world dataset, namely to explain coffee quality predictions (Section 6).

2 Related Work

There is widespread agreement in the counterfactual community that plausibility is an important concern [19, 11, 5, 20, 21, 22]. Various suggestions have been made to incorporate plausibility into the counterfactual search, for example using causal knowledge [6, 14], case-based reasoning [23], outlier detectors [24], restricting the search space [25], imputing feature combinations from real instances [26], respecting paths between datapoints [27], or, as described above, staying close to the training data [9].

Many define plausibility theoretically through the joint probability density [22]. Some works rely on VAEs or standard autoencoders: they directly generate counterfactuals [14, 15], use VAEs in the optimization [13] or just for measuring plausibility [12]. Other works rely on GANs to generate counterfactuals [17, 16]. However, these approaches differ substantially from our work, as they are tailored for neural network models [14], focus only on plausibility thereby ignoring other objectives like sparsity [14, 15] (see Section 3.1), or work only for continuous data [13, 16]. The closest works to ours are Brughmans et al. [12] and Dandl et al. [9]. Both are designed to generate plausible and sparse counterfactuals in mixed tabular data settings. Brughmans et al. [12] use the autoencoder reconstruction loss as a plausibility measure and Dandl et al. [9] use the distance to the $k$-nearest neighbors to evaluate plausibility. We show in our experiments in Section 5 that utilizing ARF to generate counterfactuals improves plausibility compared to those approaches while being computationally fast.

3 Background

Before we present our approaches, we provide background on the two methods we build upon: multi-objective counterfactual explanations (MOC) [9] and adversarial random forests (ARF) [18].

We consider a supervised learning setup with a binary classification or regression problem. (Our framework also generalizes to multi-class problems; we restrict ourselves here only for the sake of simplicity and notation.) $\mathcal{X}$ denotes a $p$-dimensional feature space. The respective vector $\mathbf{X} := (X_1, \dots, X_p)^T$ of random variables may contain both continuous and categorical features. With $Y \in \mathbb{R}$, we denote a random variable reflecting the outcome. In case of a binary classification model, we restrict $Y$ to $\{0, 1\}$.

To predict $Y$ from $\mathbf{X}$, we trained an ML model $\hat{f}: \mathcal{X} \rightarrow \mathbb{R}$ on a dataset $D_{\text{train}} := \{(\mathbf{x}^{(1)}, y^{(1)}), \dots, (\mathbf{x}^{(n_{\text{train}})}, y^{(n_{\text{train}})})\}$ with $n_{\text{train}}$ observations. For binary classification, the model output is restricted to $\hat{f}(\mathbf{x}) \in [0, 1]$, reflecting the probability for $Y = 1$. Most counterfactual explanation methods require access to a dataset for generating counterfactuals. To reflect that this dataset can differ from $D_{\text{train}}$, we denote it as $D$ in the following and assume it to consist of $n$ observations.

3.1 Multi-objective counterfactual explanations

Suppose we want to explain why a certain data point of interest $\mathbf{x}^{*}$ was predicted as $\hat{f}(\mathbf{x}^{*})$ instead of a desired prediction within $Y_{des} \subset \mathbb{R}$. Wachter et al. [4] define counterfactuals as the closest possible input vector $\mathbf{x}^{cf}$ to $\mathbf{x}^{*}$ according to some distance on $\mathcal{X}$ such that $\hat{f}(\mathbf{x}^{cf}) \in Y_{des}$. This definition does not explicitly demand sparse or plausible changes. When integrating all these desiderata into an objective to generate counterfactuals, trade-offs between the different objectives must be taken into account since the objectives conflict with each other. Figure 1(a) illustrates this for the properties plausibility and proximity to the original instance $\mathbf{x}^{*}$. If all high-density regions are far away from the decision boundary, enforcing proximity leads to unrealistic counterfactuals.

[Figure 1, panel (a): Plausibility-proximity trade-off; panel (b): Limitation of MOC’s plausibility]
Figure 1: (a) Proximity and plausibility can be conflicting objectives [9]; enforcing proximity may lead to unrealistic counterfactuals. (b) To have high proximity (i.e., low $o_{\text{prox}}$ in Equation 2) and high plausibility (i.e., low $o_{\text{plaus}}$ in Equation 3, with $k = 1$), the counterfactual may be in a low-density region.

To consider these trade-offs, Dandl et al. [9] turned the search for counterfactuals into a multi-objective optimization problem:

$$\mathbf{x}^{cf} \in \underset{\mathbf{x} \in \mathcal{X}}{\operatorname{arg\,min}} \left( o_{\text{valid}}(\hat{f}(\mathbf{x}), Y_{des}),\; o_{\text{prox}}(\mathbf{x}, \mathbf{x}^{*}),\; o_{\text{plaus}}(\mathbf{x}, D),\; o_{\text{sparse}}(\mathbf{x}, \mathbf{x}^{*}) \right). \tag{1}$$

The different objectives denote:

  1. Validity: Counterfactuals should have a predicted outcome in $Y_{des}$

    $$o_{\text{valid}}(\hat{f}(\mathbf{x}), Y_{des}) := \underset{y \in Y_{des}}{\inf} |\hat{f}(\mathbf{x}) - y|.$$

  2. Proximity: Counterfactuals should be close to $\mathbf{x}^{*}$ according to the Gower distance $d_{\text{Gower}}$ [28]

    $$o_{\text{prox}}(\mathbf{x}, \mathbf{x}^{*}) := d_{\text{Gower}}(\mathbf{x}, \mathbf{x}^{*}). \tag{2}$$

  3. Plausibility: Counterfactuals should describe a realistic data instance, with $\mathbf{x}^{[1]}, \dots, \mathbf{x}^{[k]}$ indicating the $k$-nearest neighbors to $\mathbf{x}$ within data $D$ and $w_i$ denoting weights with $\sum_{i=1}^{k} w_i = 1$

    $$o_{\text{plaus}}(\mathbf{x}, D) := \sum_{i=1}^{k} w_i \, d_{\text{Gower}}(\mathbf{x}, \mathbf{x}^{[i]}). \tag{3}$$

  4. Sparsity: Counterfactuals should vary from $\mathbf{x}^{*}$ in only a few features

    $$o_{\text{sparse}}(\mathbf{x}, \mathbf{x}^{*}) := \|\mathbf{x} - \mathbf{x}^{*}\|_{0} = \frac{1}{p} \sum_{j=1}^{p} \mathbb{1}_{x_j \neq x_j^{*}}.$$
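The four objectives above can be sketched in a few lines of Python. This is a minimal illustration, not the MOC implementation: it assumes rows are tuples mixing floats and strings, that `ranges` holds the observed range of each numeric feature (`None` for categorical ones), and that $Y_{des}$ is a closed interval `(lo, hi)`. All function and parameter names are hypothetical.

```python
from typing import Optional, Sequence, Tuple, Union

Value = Union[float, str]
Row = Sequence[Value]


def gower_distance(a: Row, b: Row, ranges: Sequence[Optional[float]]) -> float:
    """Gower distance: range-scaled absolute difference for numeric
    features, 0/1 mismatch for categorical ones, averaged over features."""
    total = 0.0
    for a_j, b_j, r_j in zip(a, b, ranges):
        if r_j is None:                      # categorical feature
            total += 0.0 if a_j == b_j else 1.0
        elif r_j > 0:                        # numeric feature with nonzero range
            total += abs(a_j - b_j) / r_j
    return total / len(a)


def o_valid(pred: float, y_des: Tuple[float, float]) -> float:
    """Distance of the prediction to the desired interval (0 if inside)."""
    lo, hi = y_des
    if lo <= pred <= hi:
        return 0.0
    return min(abs(pred - lo), abs(pred - hi))


def o_prox(x: Row, x_star: Row, ranges: Sequence[Optional[float]]) -> float:
    """Equation 2: Gower distance to the point of interest."""
    return gower_distance(x, x_star, ranges)


def o_plaus(x: Row, data: Sequence[Row], ranges: Sequence[Optional[float]],
            weights: Sequence[float]) -> float:
    """Equation 3: weighted Gower distance to the k nearest rows of `data`,
    where k = len(weights) and the weights are assumed to sum to 1."""
    nearest = sorted(gower_distance(x, row, ranges) for row in data)[:len(weights)]
    return sum(w * d for w, d in zip(weights, nearest))


def o_sparse(x: Row, x_star: Row) -> float:
    """Share of features that differ from the point of interest."""
    return sum(x_j != s_j for x_j, s_j in zip(x, x_star)) / len(x)
```

For example, with `x_star = (30.0, "rent")`, a candidate `x = (40.0, "own")` and `ranges = (50.0, None)`, the proximity objective is `(10/50 + 1) / 2 = 0.6` and the sparsity objective is `1.0` because both features changed.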

Dandl et al. [9] adapted the nondominated sorting genetic algorithm (NSGA-II) of Deb et al. [29] to solve the multi-objective optimization problem. This algorithm follows three steps:

  1. It generates a set of candidate instances close to the point of interest $\mathbf{x}^{*}$. Among these, it recombines and mutates the candidates that perform best according to the above criteria. By default, the mutator does not take feature dependencies into account. To enhance plausibility, mutation can optionally be performed by sampling from conditional distributions learned on $D$ by conditional trees [30] – we refer to this MOC version as MOCCTREE.

  2. Both new and old candidates are ranked using nondominated and crowding distance sorting. Nondominated sorting ranks according to optimality with respect to the above objectives (with the option to penalize invalid counterfactuals) and crowding distance ranks according to diversity.

  3. Based on these rankings, optimal and diverse candidates are selected for the next iteration. The search for counterfactuals ends either after a predefined number of iterations or when the generated counterfactuals no longer improve significantly according to the hypervolume of the objectives above. As a final step, the algorithm outputs the Pareto-optimal set of counterfactuals over the generations.
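The ranking in step 2 can be illustrated with a short sketch. It assumes every candidate is represented by a tuple of objective values to be minimized. It uses a simple front-peeling loop rather than the faster bookkeeping of NSGA-II, and the standard objective-space crowding distance, so it is not necessarily the variant used in MOC. The function names are hypothetical.

```python
from typing import Dict, List, Sequence

Point = Sequence[float]


def dominates(a: Point, b: Point) -> bool:
    """a dominates b if it is no worse in all objectives and better in one."""
    return (all(a_i <= b_i for a_i, b_i in zip(a, b))
            and any(a_i < b_i for a_i, b_i in zip(a, b)))


def nondominated_fronts(points: Sequence[Point]) -> List[List[int]]:
    """Peel off successive Pareto fronts; front 0 is the Pareto set."""
    remaining = list(range(len(points)))
    fronts = []
    while remaining:
        front = [i for i in remaining
                 if not any(dominates(points[j], points[i])
                            for j in remaining if j != i)]
        fronts.append(front)
        remaining = [i for i in remaining if i not in front]
    return fronts


def crowding_distance(points: Sequence[Point], front: List[int]) -> Dict[int, float]:
    """Larger values mean a candidate lies in a less crowded part of the front."""
    dist = {i: 0.0 for i in front}
    for m in range(len(points[front[0]])):
        ordered = sorted(front, key=lambda i: points[i][m])
        lo, hi = points[ordered[0]][m], points[ordered[-1]][m]
        # Boundary candidates are always kept.
        dist[ordered[0]] = dist[ordered[-1]] = float("inf")
        if hi == lo:
            continue
        for prev, cur, nxt in zip(ordered, ordered[1:], ordered[2:]):
            dist[cur] += (points[nxt][m] - points[prev][m]) / (hi - lo)
    return dist
```

Selection for the next generation would then fill the population front by front and break ties within the last admitted front by descending crowding distance.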

The conceptualization of plausibility as Equation 3 has its limitations, as illustrated in Figure 1(b): with $k = 1$ (the default in MOC), counterfactuals with low values in Equation 3 might still end up in low-density regions.

3.2 Generative modeling and adversarial random forests

Generative modeling is concerned with models that generate synthetic data $\tilde{D}$ that mimic the appearance of real data $D$. One well-known approach is the VAE [31], which encodes original data instances into a set of low-dimensional distribution parameters and then reconstructs these instances with a decoder neural network from samples of these distributions. Another common technique is the GAN [32], where two different neural network models play a zero-sum game – the generator network aims to generate realistic instances, and the discriminator network aims to discriminate these instances from real data. Other generative models based on neural networks include normalizing flows [33], diffusion probabilistic models [34] and transformer-based models [35] (see [36, 37] for overviews). While there exist adaptations of neural network models to tabular data, tree-based approaches may be better suited [38, 39, 40].

ARFs are a tree-based procedure for generative modeling [18]. The ARF approach is similar to that of GANs; however, instead of a neural network as a base learner, ARF relies on random forests. An ARF is trained in three steps: (1) Fitting an unsupervised random forest [41], which generates a naive synthetic dataset $\tilde{D}_1$ and subsequently trains a random forest $\hat{g}_1$ to distinguish between $D$ and $\tilde{D}_1$. (2) Sampling feature values marginally from the instances in the leaves of $\hat{g}_1$ to obtain a more realistic synthetic dataset $\tilde{D}_2$. Another random forest $\hat{g}_2$ is trained to distinguish between $D$ and $\tilde{D}_2$. (3) This process is repeated until the random forest classifier can no longer distinguish synthetic from real data. We denote the final ARF model as $\hat{g}^{*}$. As opposed to GANs, ARFs allow for both density estimation and generative modeling. The two algorithms are called forests for density estimation (FORDE) and forests for generative modeling (FORGE), respectively.

Density estimation with FORDE

leverages the mutual independence across features in the leaves after algorithm convergence, which allows modeling the joint density $p(\mathbf{x})$ as a mixture of univariate feature densities:

$$\text{FORDE}(\mathbf{x}) \coloneqq \hat{p}(\mathbf{x}) = \sum_{l: \mathbf{x} \in X_l} \pi_l \prod_{j=1}^{p} \hat{p}_{l,j}(x_j), \tag{4}$$

where $X_l$ is the hyperrectangle defined by the $l$-th leaf, the corresponding mixture weights $\pi_l$ are calculated as the share of real datapoints that fall into leaf $l$ normalized over all trees, and $\hat{p}_{l,j}$ are (locally) estimated univariate density/mass functions for the $j$-th feature in leaf $l$. The convergence of FORDE to the real data distribution of $\mathbf{X}$ for infinite data is proven under some mild conditions in Watson et al. [18]. A conditional density under a set of conditions $\mathcal{C}$, e.g., fixed values or intervals for certain features $C \subseteq \{1, \dots, p\}$, can be derived from Equation 4 in the following way:
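To make Equation 4 concrete, the following sketch evaluates the mixture for a toy forest. It makes two simplifying assumptions that are not taken from the ARF paper: all features are continuous, and each univariate leaf density $\hat{p}_{l,j}$ is uniform on the leaf's interval. A leaf is a hypothetical `(weight, bounds)` pair, where `bounds` lists one half-open interval per feature.

```python
from typing import List, Sequence, Tuple

Bounds = List[Tuple[float, float]]   # hyperrectangle X_l: one [lo, hi) per feature
Leaf = Tuple[float, Bounds]          # (mixture weight pi_l, bounds)


def leaf_contains(bounds: Bounds, x: Sequence[float]) -> bool:
    """True if x falls into the leaf's hyperrectangle."""
    return all(lo <= x_j < hi for x_j, (lo, hi) in zip(x, bounds))


def forde_density(x: Sequence[float], leaves: Sequence[Leaf]) -> float:
    """Sum over all leaves containing x of pi_l times the product of the
    univariate densities (here: uniform within the leaf, an assumption)."""
    total = 0.0
    for weight, bounds in leaves:
        if not leaf_contains(bounds, x):
            continue
        term = weight
        for lo, hi in bounds:
            term *= 1.0 / (hi - lo)   # uniform density on [lo, hi)
        total += term
    return total


# Toy forest with two trees on [0, 2) x [0, 1); weights sum to 1 over all trees.
toy_leaves: List[Leaf] = [
    (0.3, [(0.0, 1.0), (0.0, 1.0)]),   # tree A, left leaf
    (0.2, [(1.0, 2.0), (0.0, 1.0)]),   # tree A, right leaf
    (0.5, [(0.0, 2.0), (0.0, 1.0)]),   # tree B, single leaf
]
```

In this toy setup, a point in the left half receives density `0.3 * 1 + 0.5 * 0.5 = 0.55` and a point in the right half `0.2 * 1 + 0.5 * 0.5 = 0.45`, so the estimate integrates to one over the unit-height rectangle.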

$$\text{FORDE}(\mathbf{x} \mid \mathcal{C}) \coloneqq \hat{p}(\mathbf{x} \mid \mathcal{C}) = \sum_{l: \mathbf{x} \in X_l} \pi'_l \prod_{j=1}^{p} \hat{p}_{l,j}(x_j \mid \mathcal{C}_j), \tag{5}$$

where $\mathcal{C}_j \subseteq \mathcal{C}$ denotes the subset of conditions concerning feature $j \in C$, and the mixture weights $\pi'_l$ are updated to reflect how likely their corresponding leaves are to fulfill the conditions. More formally, the mixture weights are updated and normalized using the univariate marginals by

$$\pi'_l \coloneqq \frac{\pi_l \prod_{j=1}^{p} \hat{p}_{l,j}(\mathcal{C}_j)}{\sum_{m: \mathbf{x} \in X_m} \pi_m \prod_{j=1}^{p} \hat{p}_{m,j}(\mathcal{C}_j)}$$

if the denominator does not equal $0$ and by $\pi'_l \coloneqq 0$ otherwise. Note that in the case of conditioning on a fixed value or interval for a continuous feature $j$, the univariate densities $\hat{p}_{l,j}$ collapse to the indicator function $\mathbb{1}_{\mathcal{C}_j}$ or the unconditional densities truncated on the conditioning interval, respectively.
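The weight update can be sketched as follows for interval conditions. As an illustrative assumption, each leaf again carries uniform univariate densities, so the mass $\hat{p}_{l,j}(\mathcal{C}_j)$ is simply the share of the leaf's interval that overlaps the conditioning interval. Features without a condition contribute a factor of one, and the sketch normalizes over whichever leaves are passed in. All names are hypothetical.

```python
from typing import Dict, List, Sequence, Tuple

Interval = Tuple[float, float]
Leaf = Tuple[float, List[Interval]]   # (mixture weight pi_l, per-feature bounds)


def interval_mass(bounds_j: Interval, cond_j: Interval) -> float:
    """Mass that a uniform density on bounds_j puts on the interval cond_j."""
    lo, hi = bounds_j
    c_lo, c_hi = cond_j
    overlap = max(0.0, min(hi, c_hi) - max(lo, c_lo))
    return overlap / (hi - lo)


def conditional_weights(leaves: Sequence[Leaf],
                        conditions: Dict[int, Interval]) -> List[float]:
    """Reweight leaves by how much mass they put on the conditions, then
    normalize; all weights are 0 if no leaf is compatible."""
    scores = []
    for weight, bounds in leaves:
        score = weight
        for j, cond_j in conditions.items():
            score *= interval_mass(bounds[j], cond_j)
        scores.append(score)
    total = sum(scores)
    if total == 0:
        return [0.0] * len(scores)
    return [s / total for s in scores]
```

With leaves weighted `0.3`, `0.2` and `0.5` that cover `[0, 1)`, `[1, 2)` and `[0, 2)` in the first feature, conditioning that feature on `[0, 1)` gives unnormalized scores `0.3`, `0` and `0.25`, so the second leaf drops out and the remaining two share the weight.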

Generative modeling with FORGE

is based on drawing a leaf $l$ from the forest according to the mixture weights in FORDE and sampling feature values from the estimated univariate (conditional) densities $\hat{p}_{l,j}$. Thereby, FORGE allows drawing samples that adhere to FORDE as an approximation to the real distribution of $\mathbf{X}$ or $\mathbf{X} \mid \mathcal{C}$.
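The two-stage sampling scheme can be sketched as below for the unconditional case. The leaf representation and the uniform within-leaf densities are illustrative assumptions, not the densities fitted by an actual ARF. Conditional sampling would instead use updated mixture weights and truncated univariate densities.

```python
import random
from typing import List, Sequence, Tuple

Interval = Tuple[float, float]
Leaf = Tuple[float, List[Interval]]   # (mixture weight pi_l, per-feature bounds)


def forge_sample(leaves: Sequence[Leaf], n: int,
                 rng: random.Random) -> List[List[float]]:
    """Draw n synthetic rows: pick a leaf by its mixture weight, then sample
    each feature independently from that leaf's univariate density."""
    weights = [weight for weight, _ in leaves]
    samples = []
    for _ in range(n):
        _, bounds = rng.choices(leaves, weights=weights, k=1)[0]
        # Assumed uniform density on each leaf interval.
        samples.append([rng.uniform(lo, hi) for lo, hi in bounds])
    return samples
```

Passing a seeded `random.Random` instance keeps the draws reproducible, which is convenient when comparing counterfactual candidates across runs.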

4 Methods

Our proposal is to leverage ARF for the efficient generation of counterfactual explanations, i.e., countARFactuals, in mixed tabular data settings. More specifically, we use and modify ARF to account for the desiderata that we discussed in Section 3.1:

  1. Validity: We train ARF on $D$ but replace the target $Y$ with the predictions $\hat{Y}$. Here, $\hat{Y}$ is treated just as any other feature in the data. Since FORGE allows for conditional sampling, we can sample from $\mathbf{X}$ conditioned on our desired outcomes $\hat{Y} \in Y_{des}$. Note, however, that ARF may not learn a perfect representation of the prediction function $\hat{Y} := \hat{f}(\mathbf{X})$. It is therefore not guaranteed that ARF samples are valid; it only becomes more likely. In our algorithms, we only return those candidates with predictions in $Y_{des}$.

  2. Proximity: We restrict the output of our two methods to those counterfactuals in the Pareto set, defined over the four objectives of Section 3.1, including proximity (Equation 2). In the first algorithm described below, we additionally use ARF combined with MOC, which accounts for proximity, as described in Section 3.1.

  3. Plausibility: ARF allows us to both evaluate the plausibility of data points using FORDE (which is also used to determine the returned Pareto set) and efficiently generate plausible data with FORGE.

  4. Sparsity: FORGE allows sampling feature values $X_S$ conditional on the observation $X_C = x_C$. By fixing certain features $C$ to the value of $\mathbf{x}^{*}_{C}$, we only change feature values in the sparse set $S := \{1, \dots, p\} \setminus C$.

With the desiderata in place, several decisions need to be made: Should we integrate plausibility via density estimation (FORDE) or generative modeling (FORGE)? What is an optimal trade-off between proximity and other objectives, such as plausibility and sparsity? How should we search for the conditioning set $C$ for features that should not be changed? In the following, we provide two algorithms that decide on these questions in different ways. The first integrates ARF into MOC (Section 4.1). The second uses ARF as a standalone counterfactual generator (Section 4.2).

4.1 Algorithm 1: Integrating ARF into MOC

In MOC’s optimization problem (Equation 1), we substitute the plausibility measure (Equation 3) by the density estimator of FORDE (Equation 4). Since the individual objectives in MOC must map to a zero-one interval (with low values denoting desired properties), we transform $\hat{p}(\mathbf{x})$, as estimated by FORDE, with the negative exponential function

$$o_{\text{plaus}}^{*}(\mathbf{x}) := e^{-\hat{p}(\mathbf{x})}. \tag{6}$$
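The transformation in Equation 6 can be illustrated with a minimal sketch. The function name and the density values below are hypothetical and chosen for illustration only; in the actual method, the density would be estimated by FORDE.

```python
import math


def plausibility_objective(density: float) -> float:
    """Map a nonnegative density estimate to (0, 1]; lower values are more plausible."""
    if density < 0:
        raise ValueError("a density estimate cannot be negative")
    return math.exp(-density)


# Hypothetical density estimates for three candidates (not taken from the paper).
densities = [0.0, 0.5, 3.0]
objectives = [plausibility_objective(d) for d in densities]
```

A candidate with zero estimated density receives the worst possible objective value of 1, and the objective decreases monotonically towards 0 as the estimated density grows.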

We use FORGE as described above to sample plausible candidates in the mutation step of NSGA-II within MOC. This strategy efficiently limits the search space of MOC to plausible counterfactuals. Concerning sparsity, we find the conditioning set $C$ through iterated mutation and recombination, just like in MOC, and we select candidates using NSGA-II according to optimality and diversity. Finally, the output comprises only the valid Pareto set of counterfactuals over the generations, i.e., counterfactuals that have a prediction in $Y_{des}$ and are not dominated by other generated candidates. For details, we refer to the pseudocode in Appendix A.
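The final filtering step can be sketched as follows. This is an illustrative sketch, not the implementation used in the paper: it assumes that all objectives are minimized, that invalid candidates are discarded before dominance is checked, and the function names and objective values are hypothetical.

```python
from typing import List, Sequence


def dominates(a: Sequence[float], b: Sequence[float]) -> bool:
    """True if a is no worse than b in every objective and strictly better in one."""
    return all(x <= y for x, y in zip(a, b)) and any(x < y for x, y in zip(a, b))


def valid_pareto_set(objectives: List[Sequence[float]], valid: List[bool]) -> List[int]:
    """Indices of valid candidates that no other valid candidate dominates."""
    idx = [i for i, is_valid in enumerate(valid) if is_valid]
    return [
        i for i in idx
        if not any(dominates(objectives[j], objectives[i]) for j in idx if j != i)
    ]


# Hypothetical (proximity, sparsity, plausibility) objective values; lower is better.
candidates = [
    (0.10, 0.5, 0.9),  # closest valid candidate, but implausible
    (0.20, 0.5, 0.3),  # farther away, but plausible
    (0.30, 0.5, 0.4),  # dominated by the second candidate
    (0.05, 0.1, 0.1),  # best in all objectives, but prediction not in Y_des
]
valid = [True, True, True, False]
pareto = valid_pareto_set(candidates, valid)
```

In this toy example, the first two candidates remain because each is better than the other in one objective, the third is removed because it is dominated, and the fourth is removed because it is invalid.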

4.2 Algorithm 2: ARF is all you need

For this algorithm, we leverage the ability of our modified ARF sampler to directly and efficiently generate many relevant counterfactuals. As described above, the modified FORGE method allows us to generate plausible data points. To enforce sparsity, we sample $m$ features with probabilities according to their local feature importance, calculated as the standard deviation of the individual conditional expectation (ICE) curve [9, 42]. The $m$ selected features form the set $S$ of features we aim to change because, according to the local feature importance, they impact the prediction the most. The remaining features then form the conditioning set $C = \{1, \dots, p\} \setminus S$.
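The feature selection step can be sketched as below. This is a simplified illustration under several assumptions that are not taken from the paper: the predictor, the grids, and all function names are hypothetical, the population standard deviation is used, features with zero importance are never selected, and sampling is done without replacement by sequential weighted draws.

```python
import random
import statistics
from typing import Callable, List, Sequence


def ice_importance(predict: Callable[[List[float]], float],
                   x_star: Sequence[float],
                   grids: Sequence[Sequence[float]]) -> List[float]:
    """Standard deviation of each feature's ICE curve at x_star."""
    importance = []
    for j, grid in enumerate(grids):
        curve = []
        for value in grid:
            x = list(x_star)
            x[j] = value  # vary feature j only, keep all other features fixed
            curve.append(predict(x))
        importance.append(statistics.pstdev(curve))
    return importance


def sample_features_to_change(importance: Sequence[float], m: int,
                              rng: random.Random) -> List[int]:
    """Draw m distinct feature indices with probability proportional to importance."""
    remaining = [j for j, w in enumerate(importance) if w > 0]
    if m > len(remaining):
        raise ValueError("m exceeds the number of features with nonzero importance")
    chosen = []
    for _ in range(m):
        weights = [importance[j] for j in remaining]
        pick = rng.choices(remaining, weights=weights, k=1)[0]
        chosen.append(pick)
        remaining.remove(pick)
    return sorted(chosen)


def toy_predict(x: List[float]) -> float:
    """Hypothetical predictor that ignores the third feature."""
    return 2.0 * x[0] + 0.5 * x[1]


x_star = [1.0, 1.0, 1.0]
grids = [[0.0, 1.0, 2.0]] * 3
importance = ice_importance(toy_predict, x_star, grids)
S = sample_features_to_change(importance, m=2, rng=random.Random(0))
C = [j for j in range(len(x_star)) if j not in S]  # conditioning set
```

Because the toy predictor ignores the third feature, its ICE curve is flat, its importance is zero, and it always ends up in the conditioning set.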

As for Algorithm 1, we output only the valid and Pareto-optimal set of counterfactuals. The pseudocode for this method is given in Appendix B.

5 Experiments

We evaluate the quality of our proposed methods with respect to the following research questions:

  1. RQ (1)

    Do our proposed ARF-based methods generate more plausible counterfactuals than competing methods without major sacrifices in sparsity ($o_{\text{sparse}}$), proximity ($o_{\text{prox}}$) and runtime?

  2. RQ (2)

    Does $o^{*}_{\text{plaus}}$ (Equation 6) better reflect the true plausibility than $o_{\text{plaus}}$ (Equation 3)?

To objectively evaluate the plausibility of the generated counterfactuals, we require access to the ground-truth likelihood. Because ground-truth likelihoods are usually unavailable for real-world data, we evaluate our methods on synthetic data. An illustrative real-world application follows in Section 6.

5.1 Data-generating process

For the experiments, we constructed three illustrative two-dimensional datasets, namely cassini (inspired by [43]), two sines (inspired by the two moons dataset), and three blobs (inspired by [15]). Moreover, we generated three datasets from randomly sampled Bayesian networks of dimensionality $5$, $10$, and $20$, namely bn_5, bn_10, and bn_20, which all include both continuous and categorical features as well as nonlinear relationships. An XGBoost model [44] was fitted on sampled datasets $D_{\text{train}}$ of size $5\,000$. For each data-generating process (DGP), ten additional points were sampled as instances of interest $\mathbf{x}^{*}$. The counterfactual generation methods received access to newly sampled datasets $D$ of size $5\,000$. Details on the dataset generation and model fit can be found in Appendix C and in the repository accompanying this paper (https://github.com/bips-hb/countARFactuals).

5.2 Competing methods

We compare our proposed MOC version based on ARF of Section 4.1 (referred to as MOCARF) and the standalone ARF generator of Section 4.2 (referred to as ARF) to the following competitors: MOC and MOCCTREE (MOC with a conditional sampler, see Section 3.1) [9] and NICE [12] with a plausibility reward function (see Equation (4) in [12]). NICE generates counterfactuals by iteratively replacing one feature after the other in $\mathbf{x}^{*}$ by the values of $\mathbf{x}^{\text{nn}}$, which denotes a nearest neighbor of $\mathbf{x}^{*}$ in $D$ with $\hat{f}(\mathbf{x}^{\text{nn}}) \in Y_{des}$. In each iteration, the algorithm keeps the feature change with the highest plausibility reward.

To allow for a fair comparison, all methods generate a set of counterfactual candidates. For NICE, we apply the extension of Dandl et al. [45]: instead of stopping the search once the point with the highest reward has a prediction in $Y_{des}$, the search continues until $\mathbf{x}^{\text{nn}}$ is recovered, and all intermediate instances with predictions in $Y_{des}$ are returned. Where possible, we selected the hyperparameters such that each method generated an equal number of candidates, namely $1\,000$. (Specifying the exact number was possible for all methods besides NICE [45].) ARF requires a maximum set size for $S$, reflecting the maximum number of features that are allowed to change. We set it according to the number of features $p$ as $m_{max} := \min(\lceil \sqrt{p} + 3 \rceil, p)$. Since the maximum number can also be specified for all MOC-based methods, we used the same $m_{max}$ for MOC, MOCARF and MOCCTREE. For the evaluation, we focused only on the unique counterfactuals that have predictions in $Y_{des}$. We further reduced this set to the Pareto set, i.e., the set of counterfactuals that are nondominated according to proximity ($o_{\text{prox}}$), sparsity ($o_{\text{sparse}}$) and plausibility. The definition of the plausibility objective differed between the methods, with $o^{*}_{\text{plaus}}$ for ARF and MOCARF, $o_{\text{plaus}}$ for MOC and MOCCTREE, and the autoencoder reconstruction error for NICE (as proposed by [12]).
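As a worked example, the maximum number of changeable features $m_{max}$ can be evaluated for the dimensionalities used in the experiments. The function name below is hypothetical; the formula is the one stated above.

```python
import math


def max_changed_features(p: int) -> int:
    """m_max = min(ceil(sqrt(p) + 3), p) for a dataset with p features."""
    return min(math.ceil(math.sqrt(p) + 3), p)


# Dimensionalities of the synthetic datasets: 2 (illustrative), 5, 10 and 20.
m_max = {p: max_changed_features(p) for p in (2, 5, 10, 20)}
# m_max == {2: 2, 5: 5, 10: 7, 20: 8}
```

For datasets with up to five features, all features may change, whereas the cap becomes binding for bn_10 (at most 7 features) and bn_20 (at most 8 features).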

5.3 Evaluation criteria

To answer RQ (1), we evaluated the generated counterfactuals with respect to the ground-truth likelihood (denoted as plausibility in the following), validity $o_{\text{valid}}$, proximity $o_{\text{prox}}$ and sparsity $o_{\text{sparse}}$ (see Section 3.1). We aggregated the results per method, dataset and instance of interest $\mathbf{x}^{*}$ by computing (scaled) dominated hypervolumes [46]. We also measured the number of nondominated counterfactuals and the runtime. To investigate the trade-off between plausibility and proximity, we also computed median attainment surfaces according to López-Ibáñez et al. [47] for each method and dataset. These surfaces reveal how the two objectives are distributed on average over the different $\mathbf{x}^{*}$. To answer RQ (2), all generated counterfactuals were evaluated with respect to $o_{\text{plaus}}^{*}$ and $o_{\text{plaus}}$. Per method, dataset and $\mathbf{x}^{*}$, we computed Spearman rank correlations between the true plausibility and $o_{\text{plaus}}^{*}$ and between the true plausibility and $o_{\text{plaus}}$. With a Wilcoxon signed rank test, we tested whether $o_{\text{plaus}}^{*}$ has higher correlations to the true plausibility than $o_{\text{plaus}}$.
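The Spearman rank correlation used for RQ (2) can be sketched in plain Python as the Pearson correlation of average ranks. The data values are hypothetical, and the sign handling is an assumption of this sketch: because the objective is low for plausible points, it is negated before being correlated with the true plausibility.

```python
import math
from typing import List, Sequence


def average_ranks(values: Sequence[float]) -> List[float]:
    """1-based ranks; tied values share the mean of their rank positions."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        mean_rank = (i + j) / 2 + 1
        for k in range(i, j + 1):
            ranks[order[k]] = mean_rank
        i = j + 1
    return ranks


def spearman(x: Sequence[float], y: Sequence[float]) -> float:
    """Spearman rank correlation as the Pearson correlation of the ranks."""
    rx, ry = average_ranks(x), average_ranks(y)
    mx, my = sum(rx) / len(rx), sum(ry) / len(ry)
    cov = sum((a - mx) * (b - my) for a, b in zip(rx, ry))
    sx = math.sqrt(sum((a - mx) ** 2 for a in rx))
    sy = math.sqrt(sum((b - my) ** 2 for b in ry))
    if sx == 0 or sy == 0:
        raise ValueError("correlation is undefined for constant input")
    return cov / (sx * sy)


# Hypothetical true likelihoods and estimated densities for four counterfactuals.
true_plausibility = [0.1, 0.4, 0.2, 0.9]
estimated_density = [0.2, 0.5, 0.1, 0.8]
objective = [math.exp(-d) for d in estimated_density]  # Equation 6

rho_density = spearman(true_plausibility, estimated_density)
rho_objective = spearman(true_plausibility, [-o for o in objective])
```

Since the transformation in Equation 6 is strictly monotone, both correlations coincide (0.8 in this toy example). The resulting pairs of correlations per method, dataset and instance of interest would then enter the paired test.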

5.4 Results

Figure 2 presents the results for RQ (1) and shows the objective values per counterfactual as well as the hypervolume, number of counterfactuals and runtime. On average, ARF and MOCARF generated more plausible counterfactuals than the other MOC-based approaches and NICE. In alignment with previous literature [9, 48], our results suggest that higher plausibility might be associated with lower proximity and sparsity. To further investigate the trade-offs, Figure 4 and Figure 5 in Appendix D detail the median attainment surfaces per dataset and method. The plots reveal that ARF and MOCARF on average dominate the other methods in proximity, sparsity and plausibility, with the differences being greatest in plausibility. The hypervolume was on average similar across methods for low-dimensional datasets (ARF had lower hypervolumes in cassini due to its inferiority in proximity and sparsity); for higher-dimensional datasets (bn_10 and bn_20), ARF and MOCARF performed better than the competing methods. Concerning runtime, ARF generated counterfactuals the fastest on average, followed by NICE and MOC. MOCARF was faster than MOCCTREE for datasets with more than two features. The runtime differences increased with the dimensionality of the data. On average, ARF and MOCARF generated larger sets of nondominated counterfactuals than the other methods.

Figure 2: Boxplots of the plausibility, proximity ($1 - o_{\text{prox}}$), sparsity ($1 - o_{\text{sparse}}$), hypervolume, number of counterfactuals and runtime for each method and dataset. Higher values are better, except for runtime.

Considering RQ (2), the Wilcoxon signed rank test had a p-value close to $0$ ($7.16 \times 10^{-6}$), i.e., the correlation of our proposed plausibility measure $o_{\text{plaus}}^{*}$ to the true plausibility was significantly higher than that of $o_{\text{plaus}}$. The median correlation to the true plausibility over all methods and datasets was 0.84 for $o_{\text{plaus}}^{*}$ and 0.69 for $o_{\text{plaus}}$.

Overall, our study shows that, on average, our proposed methods ARF and MOCARF generate a more plausible set of counterfactuals than the competing methods without major sacrifices in sparsity and proximity. Notably, ARF achieves this with superior runtimes.

6 Real Data Example

We illustrate our approach on the publicly available coffee quality dataset (https://github.com/jldbc/coffee-quality-database). The data details the characteristics of several Arabica coffee beans, such as the country of origin and the altitude at which the beans were cultivated. Further, the dataset includes information on a quality review score (cup points) specified by an expert jury within the Coffee Quality Institute [49].

In this example, we use a random forest to predict coffee quality from selected, actionable characteristics of the coffee beans. For simplicity, we binarize the target score cup points. Aiming for balanced classes of good and bad quality, we use the dataset’s median value of cup points as a cut-off point, i.e.,

$$\texttt{quality} = \begin{cases} \text{good} & \text{if } \texttt{cup points} \geq \operatorname{median}(\texttt{cup points}) \\ \text{bad} & \text{otherwise}. \end{cases} \tag{7}$$
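The binarization in Equation 7 can be sketched as follows. The scores are hypothetical values for illustration and are not taken from the coffee quality dataset; the function name is likewise an assumption.

```python
import statistics
from typing import List, Sequence


def binarize_quality(cup_points: Sequence[float]) -> List[str]:
    """Label scores at or above the median as 'good' and all others as 'bad'."""
    cutoff = statistics.median(cup_points)
    return ["good" if score >= cutoff else "bad" for score in cup_points]


# Hypothetical cup points for five coffees.
scores = [80.5, 82.0, 83.25, 84.0, 79.0]
labels = binarize_quality(scores)
# labels == ["bad", "good", "good", "good", "bad"]
```

Because scores equal to the median are labeled good, the two classes are only approximately balanced when the number of observations is odd or when ties occur at the median.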

For illustration, we generate counterfactual explanations for an instance of bad coffee quality, answering the question: Which characteristics would need to be changed to rate as good quality coffee?

This example illustrates the importance of taking into account the multiple objectives of counterfactual explanations, such as sparsity and plausibility. For example, a company that aims to improve the quality of its coffee may want to change as few coffee characteristics as possible for economic reasons. Similarly, some changes might not be plausible. Consider changing the country of origin independently of the altitude of the coffee plantations or the variety of beans cultivated in the respective country: such a change ignores that the variety must suit the natural conditions in that country.

The generation of counterfactual explanations in this example is performed using Alg. 2 detailed in Section 4.2. In Figure 3, we present a set of the generated countARFactuals explanations for an instance of coffee beans belonging to the bad class that originate from Taiwan.

Figure 3: Exemplary countARFactuals for an instance of bad coffee quality. Arrows indicate changes in comparison to $\mathbf{x}^{*}$, i.e., a feature’s value increase $\uparrow$, decrease $\downarrow$ or change in category $\leftrightarrow$.

Figure 3 illustrates that countARFactuals yield plausible counterfactual explanations. For instance, for countARFactual #3, the country of origin is changed from Taiwan to Colombia and the variety from Typica to Caturra. This seems reasonable because Typica was grown in only a few Colombian instances in the training dataset, whereas Caturra was the most frequently grown variety in Colombia. Further, the altitude at which the beans are grown is raised only a little within Taiwanese countARFactuals (# 4 - 12), but more drastically for countries that, given the data, grow coffee at higher altitudes on average, such as Mexico (# 1) and Colombia (# 3).

7 Discussion

In this paper, we show that adversarial random forests (ARF) can be modified to generate plausible counterfactuals, both as a subroutine to multi-objective counterfactual explanations (MOC) and as a standalone approach. Our experiments in Section 5 demonstrate that ARF can improve the plausibility of counterfactuals and the efficiency of their generation without substantially sacrificing other desiderata such as proximity and sparsity. In contrast to other generative modeling approaches for plausible counterfactuals, ARF handles mixed tabular data directly without, e.g., one-hot-encoding categorical features, thereby improving data efficiency. Moreover, ARF-based counterfactual generation allows for sparsity via conditional sampling and is an off-the-shelf methodology that requires minimal tuning effort and computational resources.

Our work faces some limitations. For example, we define the plausibility of counterfactuals via the joint density. However, as highlighted by Keane et al. [20], there are different conceptualizations of plausibility, for example, based on the feasibility of actions or user-perceived plausibility [5, 6, 27]. One might even question whether staying on the manifold is always desirable: if changing the class requires extrapolation, then so should our counterfactuals. It should be noted that plausible counterfactuals, in general, cannot be interpreted as action recommendations. Although they provide hints about which alternative feature values would yield acceptance by the predictor, they do not guide the user on which interventions yield the desired change in the real world. To guide action, causal knowledge is required [50]. Furthermore, in the context of recourse, improvement of the underlying target is more desirable than acceptance by a specific predictor, which counterfactual explanations do not target [6].

Proximity and plausibility are conflicting objectives [12, 9]. Oftentimes, there is only little data close to the decision boundary, and jumping just over the boundary can lead to implausible counterfactuals [19]. A trade-off between the two objectives is desirable, which we implicitly address by generating a Pareto-optimal set of diverse counterfactuals. In future work, one could incorporate such trade-offs directly into the counterfactual generation, e.g., by a parameter that controls the proximity-plausibility trade-off. Another option would be to set a threshold for plausibility instead of a trade-off parameter, as suggested by Brughmans et al. [12].

Like all works on counterfactual explanations, we face the Rashomon effect: There exist many plausible counterfactuals that explain the same data point. This raises the question of which one we should show to the user [20, 22]. As a baseline, we return only the Pareto-optimal set of counterfactuals, which at least guarantees that no strictly dominated option is shown. In future work, integrating user preferences or considering additional objectives may improve the final selection.

Our framework is tailored for mixed tabular data settings. For other data modalities like image or text data, we advise using neural-network-based approaches for density estimation and generative modeling such as VAEs and GANs. Finally, our framework is designed for binary classification and regression but can be extended to multi-class classification.

In future work, we plan to investigate the role of the ML model in the ARF approach to counterfactuals. We could also generate counterfactuals with ARF without the model by directly training ARF on $Y$ rather than the predictions $\hat{Y}$. We would then get plausible counterfactuals that hint towards improvement instead of acceptance [6]. While such counterfactuals appear different from those discussed in the XAI literature so far, they essentially just turn the generative model that conditions on $\mathbf{X} = \mathbf{x}$ into a prediction algorithm.

Acknowledgments

MNW and KB were supported by the German Research Foundation (DFG), Grant Number 437611051. MNW was supported by the German Research Foundation (DFG), Grant Number 459360854. KB was supported by a PhD grant of the Minds, Media, Machines Integrated Graduate School Bremen. MNW and JK were supported by the U Bremen Research Alliance/AI Center for Health Care, financially supported by the Federal State of Bremen. GK and TF were supported by the German Research Foundation through the Cluster of Excellence “Machine Learning - New Perspectives for Science” (EXC 2064/1 number 390727645). TF has been supported by the Carl Zeiss Foundation through the project “Certification and Foundations of Safe Machine Learning Systems in Healthcare”.

References

  • [1] Jenna Burrell. How the machine ‘thinks’: Understanding opacity in machine learning algorithms. Big Data & Society, 3(1):2053951715622512, 2016.
  • [2] Amina Adadi and Mohammed Berrada. Peeking inside the black-box: a survey on explainable artificial intelligence (XAI). IEEE access, 6:52138–52160, 2018.
  • [3] Christoph Molnar. Interpretable Machine Learning. 2 edition, 2022.
  • [4] Sandra Wachter, Brent Mittelstadt, and Chris Russell. Counterfactual explanations without opening the black box: Automated decisions and the GDPR. Harv. JL & Tech., 31:841, 2017.
  • [5] Amir-Hossein Karimi, Gilles Barthe, Bernhard Schölkopf, and Isabel Valera. A survey of algorithmic recourse: Contrastive explanations and consequential recommendations. ACM Computing Surveys, 55(5):1–29, 2022.
  • [6] Gunnar König, Timo Freiesleben, and Moritz Grosse-Wentrup. Improvement-focused causal recourse (ICR). In Proceedings of the AAAI Conference on Artificial Intelligence, volume 37, pages 11847–11855, 2023.
  • [7] Henrietta Lyons, Eduardo Velloso, and Tim Miller. Conceptualising contestability: Perspectives on contesting algorithmic decisions. Proceedings of the ACM on Human-Computer Interaction, 5(CSCW1):1–25, 2021.
  • [8] Brent Mittelstadt, Chris Russell, and Sandra Wachter. Explaining explanations in ai. In Proceedings of the conference on fairness, accountability, and transparency, pages 279–288, 2019.
  • [9] Susanne Dandl, Christoph Molnar, Martin Binder, and Bernd Bischl. Multi-objective counterfactual explanations. In International Conference on Parallel Problem Solving from Nature, pages 448–469. Springer, 2020.
  • [10] Leo Breiman. Statistical modeling: The two cultures (with comments and a rejoinder by the author). Statistical Science, 16(3):199 – 231, 2001.
  • [11] Riccardo Guidotti. Counterfactual explanations and how to find them: Literature review and benchmarking. Data Mining and Knowledge Discovery, pages 1–55, 2022.
  • [12] Dieter Brughmans, Pieter Leyman, and David Martens. NICE: An algorithm for nearest instance counterfactual explanations. Data Mining and Knowledge Discovery, pages 1–39, 2023.
  • [13] Shalmali Joshi, Oluwasanmi Koyejo, Warut Vijitbenjaronk, Been Kim, and Joydeep Ghosh. Towards realistic individual recourse and actionable explanations in black-box decision making systems. arXiv preprint arXiv:1907.09615, 2019.
  • [14] Divyat Mahajan, Chenhao Tan, and Amit Sharma. Preserving causal constraints in counterfactual explanations for machine learning classifiers. arXiv preprint arXiv:1912.03277, 2019.
  • [15] Martin Pawelczyk, Klaus Broelemann, and Gjergji Kasneci. Learning model-agnostic counterfactual explanations for tabular data. In Proceedings of the web conference 2020, pages 3126–3132, 2020.
  • [16] Daniel Nemirovsky, Nicolas Thiebaut, Ye Xu, and Abhishek Gupta. Countergan: Generating counterfactuals for real-time recourse and interpretability using residual gans. In Uncertainty in Artificial Intelligence, pages 1488–1497. PMLR, 2022.
  • [17] Arnaud Van Looveren, Janis Klaise, Giovanni Vacanti, and Oliver Cobb. Conditional generative models for counterfactual explanations. arXiv preprint arXiv:2101.10123, 2021.
  • [18] David S Watson, Kristin Blesch, Jan Kapar, and Marvin N Wright. Adversarial random forests for density estimation and generative modeling. In Proceedings of the 26th International Conference on Artificial Intelligence and Statistics, pages 5357–5375. PMLR, 2023.
  • [19] Timo Freiesleben. The intriguing relation between counterfactual explanations and adversarial examples. Minds and Machines, 32(1):77–109, 2022.
  • [20] Mark T Keane, Eoin M Kenny, Eoin Delaney, and Barry Smyth. If only we had better counterfactual explanations: Five key deficits to rectify in the evaluation of counterfactual XAI techniques. arXiv preprint arXiv:2103.01035, 2021.
  • [21] Ilia Stepin, Jose M Alonso, Alejandro Catala, and Martín Pereira-Fariña. A survey of contrastive and counterfactual explanation generation methods for explainable artificial intelligence. IEEE Access, 9:11974–12001, 2021.
  • [22] Sahil Verma, Varich Boonsanong, Minh Hoang, Keegan E Hines, John P Dickerson, and Chirag Shah. Counterfactual explanations and algorithmic recourses for machine learning: A review. arXiv preprint arXiv:2010.10596, 2020.
  • [23] Mark T Keane and Barry Smyth. Good counterfactuals and where to find them: A case-based technique for generating counterfactuals for explainable ai (XAI). In Case-Based Reasoning Research and Development: 28th International Conference, ICCBR 2020, Salamanca, Spain, June 8–12, 2020, Proceedings 28, pages 163–178. Springer, 2020.
  • [24] Kentaro Kanamori, Takuya Takagi, Ken Kobayashi, and Hiroki Arimura. Dace: Distribution-aware counterfactual explanation by mixed-integer linear optimization. In IJCAI, pages 2855–2862, 2020.
  • [25] André Artelt and Barbara Hammer. Convex density constraints for computing plausible counterfactual explanations. In Artificial Neural Networks and Machine Learning–ICANN 2020: 29th International Conference on Artificial Neural Networks, Bratislava, Slovakia, September 15–18, 2020, Proceedings, Part I 29, pages 353–365. Springer, 2020.
  • [26] Yash Goyal, Ziyan Wu, Jan Ernst, Dhruv Batra, Devi Parikh, and Stefan Lee. Counterfactual visual explanations. In International Conference on Machine Learning, pages 2376–2384. PMLR, 2019.
  • [27] Rafael Poyiadzi, Kacper Sokol, Raul Santos-Rodriguez, Tijl De Bie, and Peter Flach. Face: feasible and actionable counterfactual explanations. In Proceedings of the AAAI/ACM Conference on AI, Ethics, and Society, pages 344–350, 2020.
  • [28] John C Gower. A general coefficient of similarity and some of its properties. Biometrics, 27(4):857–871, 1971.
  • [29] Kalyanmoy Deb, Amrit Pratap, Sameer Agarwal, and TAMT Meyarivan. A fast and elitist multiobjective genetic algorithm: NSGA-II. IEEE transactions on evolutionary computation, 6(2):182–197, 2002.
  • [30] Torsten Hothorn and Achim Zeileis. Predictive distribution modeling using transformation forests. Journal of Computational and Graphical Statistics, 30(4):1181–1196, March 2021.
  • [31] Diederik P Kingma and Max Welling. Auto-encoding variational bayes. In Proceedings of the 2nd International Conference on Learning Representations, 2014.
  • [32] Ian Goodfellow, Jean Pouget-Abadie, Mehdi Mirza, Bing Xu, David Warde-Farley, Sherjil Ozair, Aaron Courville, and Yoshua Bengio. Generative adversarial nets. In Advances in Neural Information Processing Systems, volume 27, 2014.
  • [33] Danilo Rezende and Shakir Mohamed. Variational inference with normalizing flows. In Proceedings of the 32nd International Conference on Machine Learning, pages 1530–1538. PMLR, 2015.
  • [34] Jonathan Ho, Ajay Jain, and Pieter Abbeel. Denoising diffusion probabilistic models. In Advances in Neural Information Processing Systems, volume 33, pages 6840–6851, 2020.
  • [35] Ashish Vaswani, Noam Shazeer, Niki Parmar, Jakob Uszkoreit, Llion Jones, Aidan N Gomez, Łukasz Kaiser, and Illia Polosukhin. Attention is all you need. In Advances in Neural Information Processing Systems, volume 30, 2017.
  • [36] Sam Bond-Taylor, Adam Leach, Yang Long, and Chris G Willcocks. Deep generative modelling: A comparative review of VAEs, GANs, normalizing flows, energy-based and autoregressive models. IEEE Transactions on Pattern Analysis and Machine Intelligence, 44(11):7327–7347, 2021.
  • [37] David Foster. Generative deep learning: Teaching machines to paint, write, compose, and play. O’Reilly Media, Inc., 2nd edition, 2022.
  • [38] Vadim Borisov, Tobias Leemann, Kathrin Seßler, Johannes Haug, Martin Pawelczyk, and Gjergji Kasneci. Deep neural networks and tabular data: A survey. IEEE Transactions on Neural Networks and Learning Systems, 2022.
  • [39] Léo Grinsztajn, Edouard Oyallon, and Gaël Varoquaux. Why do tree-based models still outperform deep learning on typical tabular data? In Advances in Neural Information Processing Systems, volume 35, pages 507–520, 2022.
  • [40] Ravid Shwartz-Ziv and Amitai Armon. Tabular data: Deep learning is not all you need. Information Fusion, 81:84–90, 2022.
  • [41] Tao Shi and Steve Horvath. Unsupervised learning with random forest predictors. Journal of Computational and Graphical Statistics, 15(1):118–138, 2006.
  • [42] Alex Goldstein, Adam Kapelner, Justin Bleich, and Emil Pitkin. Peeking inside the black box: Visualizing statistical learning with plots of individual conditional expectation. Journal of Computational and Graphical Statistics, 24(1):44–65, January 2015.
  • [43] Friedrich Leisch and Evgenia Dimitriadou. mlbench: Machine learning benchmark problems, 2021. R package version 2.1-3.1.
  • [44] Tianqi Chen and Carlos Guestrin. XGBoost: A scalable tree boosting system. In Proceedings of the 22nd ACM SIGKDD International Conference on Knowledge Discovery and Data Mining, KDD ’16, pages 785–794, New York, NY, USA, 2016. ACM.
  • [45] Susanne Dandl, Andreas Hofheinz, Martin Binder, Bernd Bischl, and Giuseppe Casalicchio. counterfactuals: An R package for counterfactual explanation methods, 2023.
  • [46] Eckart Zitzler and Lothar Thiele. Multiobjective optimization using evolutionary algorithms — A comparative case study, page 292–301. Springer Berlin Heidelberg, 1998.
  • [47] Manuel López-Ibáñez, Luís Paquete, and Thomas Stützle. Exploratory analysis of stochastic local search algorithms in biobjective optimization, page 209–222. Springer Berlin Heidelberg, 2010.
  • [48] Javier Del Ser, Alejandro Barredo-Arrieta, Natalia Díaz-Rodríguez, Francisco Herrera, Anna Saranti, and Andreas Holzinger. On generating trustworthy counterfactual explanations. Information Sciences, 655:119898, 2024.
  • [49] Coffee Quality Institute. https://www.coffeeinstitute.org/. Last accessed: 2024-03-12.
  • [50] Amir-Hossein Karimi, Bernhard Schölkopf, and Isabel Valera. Algorithmic recourse: From counterfactual explanations to interventions. In Proceedings of the 2021 ACM conference on fairness, accountability, and transparency, pages 353–362, 2021.

Appendix A Algorithm 1: Integrating ARF into MOC

The following pseudocode is based on Algorithm 1 in [45]. Blue lines highlight the steps that differ from the original MOC algorithm proposed by [9].

Algorithm 1 MOC with ARF-based Sampler and Evaluation

Inputs:
Datapoint to explain prediction for $\mathbf{x}^{*} \in \mathcal{X}$
Desired outcome (range) $Y_{des}$
Prediction function $\hat{f}: \mathcal{X} \rightarrow \mathbb{R}$
Observed data $D$
ARF $\hat{g}^{*}$ trained on $(\mathbf{x}_i, \hat{f}(\mathbf{x}_i))_{i=1}^{n}$ with $\mathbf{x}_i \in D$
Number of generations $n_{\text{generations}}$
Size of population $\mu$
Recombination and mutation methods including probabilities
Selection method for features in the conditioning set and initialization method
Stop** criterion
(Additional user inputs, e.g., range of numerical features, immutable features, distance function, see [9])

Initialize population $P_0$ with $|P_0| = \mu$ (ICE-curve-based, see [9])
Evaluate candidates according to four objectives:
  • Validity ($L_1$)
  • Sparsity ($L_0$)
  • Proximity (Gower distance)
  • Plausibility (ARF-based likelihood transformed with $e^{-x}$)
Set $t = 0$
for $r \in \{1, \ldots, n_{\text{iterations}}\}$
    $C_t =$ create_offspring($P_t$), $|C_t| = \mu$ with given probabilities
      1. Select best candidates (acc. to validity objective)
      2. Recombine these pairwise
      3. Mutate values jointly using $\hat{g}^{*}$: generate new datapoints with FORGE
    Combine parents and offspring $R_t = C_t \cup P_t$
    Assign candidates to a front according to their objective values: $(F_1, F_2, \ldots, F_m) =$ nondominated_sorting($R_t$)
    for $i = 1, \ldots, m$
        Sort candidates acc. to diversity (objective and feature space): $\tilde{F}_i =$ crowding_distance_sort($F_i$)
    end for
    Set $P_{t+1} = \emptyset$ and $i = 1$
    while $|P_{t+1}| + |\tilde{F}_i| \leq \mu$
        $P_{t+1} = P_{t+1} \cup \tilde{F}_i$
        $i = i + 1$
    end while
    Choose first $\mu - |P_{t+1}|$ elements of $\tilde{F}_i$: $P_{t+1} = P_{t+1} \cup \tilde{F}_i[1 : (\mu - |P_{t+1}|)]$
    $t = t + 1$
end for
Return unique, non-dominated candidates of $\bigcup_{k=0}^{t} P_k \setminus \mathbf{x}^{\star}$ with $\hat{f}(\mathbf{x}_{CF}) \in Y_{des}$
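
The survival step of Algorithm 1 (non-dominated sorting followed by filling $P_{t+1}$ front by front) can be illustrated with a small NumPy sketch. This is not the implementation of the counterfactuals package [45]. The helper names `dominates` and `select_next_population` are hypothetical, all objectives are assumed to be minimized, and the crowding-distance sort within each front is omitted for brevity, so the truncated front is cut in index order.

```python
import numpy as np

def dominates(a, b):
    """True if objective vector a Pareto-dominates b (all objectives minimized)."""
    return bool(np.all(a <= b) and np.any(a < b))

def nondominated_sorting(objectives):
    """Split the row indices of `objectives` into fronts F_1, F_2, ... (F_1 is best)."""
    remaining = list(range(len(objectives)))
    fronts = []
    while remaining:
        # A candidate belongs to the current front if no remaining candidate dominates it.
        front = [i for i in remaining
                 if not any(dominates(objectives[j], objectives[i])
                            for j in remaining if j != i)]
        fronts.append(front)
        remaining = [i for i in remaining if i not in front]
    return fronts

def select_next_population(fronts, mu):
    """Add whole fronts while they fit, then truncate the first front that does not."""
    selected = []
    for front in fronts:
        if len(selected) + len(front) <= mu:
            selected.extend(front)
        else:
            selected.extend(front[: mu - len(selected)])
            break
    return selected

# Toy example with two objectives and four candidates.
objectives = np.array([[0.0, 1.0], [1.0, 0.0], [1.0, 1.0], [2.0, 2.0]])
fronts = nondominated_sorting(objectives)
survivors = select_next_population(fronts, mu=3)
```

In the toy example, the first two candidates form the first front because neither dominates the other, and the remaining candidates follow in later fronts.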

Appendix B Algorithm 2: ARF is all you need

Algorithm 2 ARF-based Counterfactual Generator

Inputs:
Datapoint to explain prediction for $\mathbf{x}^{\star} \in \mathcal{X}$
Desired outcome (range) $Y_{des}$
Prediction function $\hat{f}: \mathcal{X} \rightarrow \mathbb{R}$
Observed data $D$
ARF $\hat{g}^{*}$ trained on data $(\mathbf{x}_i, \hat{f}(\mathbf{x}_i))_{i=1}^{n}$ with $\mathbf{x}_i \in D$
Maximum number of feature changes $m_{max}$
Number of iterations $n_{\text{iterations}}$
Number of samples generated in each iteration $n_{synth}$
(Additional user inputs, e.g., immutable features)

Derive local importances $(\text{fi}_j)_{j=1}^{p}$ for each feature $j \in \{1, \ldots, p\}$ (ICE-curve-based, see [9])
for $r \in \{1, \ldots, n_{\text{iterations}}\}$
    $m \leftarrow \texttt{sample}(1, \ldots, m_{max})$
    Select set $C \subset \{1, \ldots, p\}$ by randomly sampling $m$ features with probability proportional to how unimportant each feature is
    $CF \leftarrow$ sample $n_{synth}$ observations with FORGE derived from $\hat{g}^{*}$ under the condition that $\forall j \in C: X_j = x_j^{\star}$ and $\hat{Y} \in Y_{des}$
    $\mathbf{X}_{CF} \leftarrow (\mathbf{X}_{CF}, CF)$
end for
Return unique, nondominated candidates $\mathbf{x}_{CF} \in \mathbf{X}_{CF}$ with $\hat{f}(\mathbf{x}_{CF}) \in Y_{des}$
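
The selection of the conditioning set $C$ in Algorithm 2 can be sketched as follows. The pseudocode does not specify how "unimportance" is computed from the local importances, so the sketch assumes one possible choice: the gap to the largest importance plus a small offset that keeps every probability positive. The function name `sample_conditioning_set` is hypothetical, and the conditional sampling with FORGE itself is not shown.

```python
import numpy as np

def sample_conditioning_set(importances, m_max, rng):
    """Draw m uniformly from {1, ..., m_max}, then m distinct features, favouring unimportant ones."""
    importances = np.asarray(importances, dtype=float)
    # Assumption: unimportance = distance to the most important feature (plus a tiny offset).
    unimportance = importances.max() - importances + 1e-8
    probs = unimportance / unimportance.sum()
    m = rng.integers(1, m_max + 1)  # upper bound is exclusive
    return rng.choice(len(importances), size=m, replace=False, p=probs)

rng = np.random.default_rng(0)
local_importances = [0.9, 0.1, 0.5, 0.0]  # made-up values for four features
C = sample_conditioning_set(local_importances, m_max=3, rng=rng)
```

Features in `C` would then be held at their values in $\mathbf{x}^{\star}$ while the remaining features are sampled conditionally.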

Appendix C Synthetic Data

In the following, we describe the three illustrative datasets as well as the sampling of the randomly generated data-generating processes (DGPs). The code used to generate the datasets, along with pair plots visualizing their distributions, can be found in the repository accompanying the paper (https://github.com/bips-hb/countARFactuals). For an explanation of how to run the code, we refer to python/README.md. The visualizations can be found in the folder python/visualizations/.

C.1 Illustrative datasets

Cassini

The DGP, inspired by [43], is defined as follows:

$$
\begin{aligned}
Y &\sim Y_1 + Y_1 Y_2 \quad \text{with} \quad Y_1 \sim Bern(2/3),\; Y_2 \sim Bern(0.5) \\
X_1 \mid Y_1 &\sim \begin{cases} N(0, 0.2), & \text{if } Y_1 = 0 \\ N(0, 0.5), & \text{otherwise} \end{cases} \\
X_2 \mid X_1, Y_1, Y_2 &\sim \begin{cases} N(0, 0.2) & \text{if } Y_1 = 0 \\ N((-1)^{Y_2} \cos(X_1), 0.2) & \text{otherwise} \end{cases}
\end{aligned}
$$
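
The Cassini DGP can be sampled ancestrally: first the latent Bernoulli variables, then $X_1$, then $X_2$. The following is a minimal NumPy sketch rather than the code from the accompanying repository. The function name `sample_cassini` is hypothetical, and treating the second argument of $N(\cdot, \cdot)$ as a standard deviation is an assumption, since the text does not state whether it denotes a standard deviation or a variance.

```python
import numpy as np

def sample_cassini(n, rng):
    """Draw n observations (X, Y) from the Cassini-style DGP."""
    y1 = rng.binomial(1, 2 / 3, size=n)
    y2 = rng.binomial(1, 0.5, size=n)
    y = y1 + y1 * y2  # three classes: 0, 1, 2
    # Assumption: the second parameter of N(., .) is a standard deviation.
    x1 = rng.normal(0.0, np.where(y1 == 0, 0.2, 0.5))
    mean_x2 = np.where(y1 == 0, 0.0, (-1.0) ** y2 * np.cos(x1))
    x2 = rng.normal(mean_x2, 0.2)
    return np.column_stack([x1, x2]), y

X, y = sample_cassini(1000, np.random.default_rng(0))
```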

Two Sines

The DGP, inspired by the two moons dataset, is specified as:

$$
Y \sim Bern(0.5), \quad X_1 \mid Y \sim N(Y, 3.0), \quad X_2 \mid Y, X_1 \sim N(\sin(X_1) - 2Y + 1, 0.3)
$$
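
As an illustration, the Two Sines DGP can be drawn in three lines of NumPy. This is a sketch under the same assumption as before, namely that the second argument of $N(\cdot, \cdot)$ is a standard deviation; the function name `sample_two_sines` is hypothetical.

```python
import numpy as np

def sample_two_sines(n, rng):
    """Draw n observations (X, Y) from the Two Sines DGP."""
    y = rng.binomial(1, 0.5, size=n)
    # Assumption: the second parameter of N(., .) is a standard deviation.
    x1 = rng.normal(y, 3.0)
    x2 = rng.normal(np.sin(x1) - 2 * y + 1, 0.3)
    return np.column_stack([x1, x2]), y

X, y = sample_two_sines(2000, np.random.default_rng(0))
# The two classes lie on sine curves shifted by +1 (Y = 0) and -1 (Y = 1).
offset = X[:, 1] - np.sin(X[:, 0])
```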

Pawelczyk

The DGP, taken from [15], is defined below, where $I_2$ refers to the $2 \times 2$ identity matrix and $Cat(\tfrac{1}{3}, \tfrac{1}{3}, \tfrac{1}{3})$ to the uniform categorical distribution with values $0, 1, 2$.

$$
\begin{aligned}
L &\sim Cat(\tfrac{1}{3}, \tfrac{1}{3}, \tfrac{1}{3}) \\
X \mid \mu &\sim N(\mu, I_2), \quad \mu \mid L = \begin{cases} (-10, 5)^T & \text{if } L = 0 \\ (0, 5)^T & \text{if } L = 1 \\ (0, 0)^T & \text{otherwise} \end{cases} \\
Y(X) &:= X_2 > 6
\end{aligned}
$$
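
A short NumPy sketch of this mixture is given below; the function name `sample_pawelczyk` is hypothetical. Because $X_2 \sim N(5, 1)$ for two of the three components and $X_2 \sim N(0, 1)$ for the third, the positive class is rare: $P(Y = 1) \approx \tfrac{2}{3}(1 - \Phi(1)) \approx 0.106$, where $\Phi$ is the standard normal CDF and the contribution of the third component is negligible.

```python
import numpy as np

def sample_pawelczyk(n, rng):
    """Draw n observations (X, Y, L) from the three-component Gaussian mixture."""
    means = np.array([[-10.0, 5.0], [0.0, 5.0], [0.0, 0.0]])
    component = rng.integers(0, 3, size=n)  # L, uniform on {0, 1, 2}
    x = means[component] + rng.standard_normal((n, 2))  # identity covariance
    y = (x[:, 1] > 6).astype(int)
    return x, y, component

X, y, L = sample_pawelczyk(5000, np.random.default_rng(0))
positive_rate = y.mean()
```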

C.2 Randomly generated DGPs

For the generation of bn_5, bn_10, and bn_20, we randomly sample Bayesian networks with categorical and continuous distributions as well as linear and nonlinear relationships.

  1. First, we randomly sample a Directed Acyclic Graph (DAG) using the networkx package. We select $Y$ as the root node. To make sure that $Y$ is related to many of the features, for each node that is not directly neighboring $Y$, a directed edge is added with probability $0.5$ (directed such that the graph remains acyclic).

  2. From all nodes, $20\%$ are randomly selected to be categorical nodes; $Y$ is always selected to be a categorical node.

  3. For every node $j$, an aggregation function $g$ is sampled that maps the parent values $x_{pa(j)}$ to an aggregate, which then parameterizes the distribution of the respective node.

     $$g(x) = \beta + \beta_1 h(x) + \beta_2 h(x)^2 \quad \text{with} \quad h(x) = \sin\left(\sum_{i \in pa(j)} w_i x_i\right)$$

     The weights $w$ are sampled from $Unif(-1, 1) + 3\,Bern(3/d)$. To make it more likely that $Y$ can be predicted well from its covariates, weights concerning $Y$ are increased by $d \sim Unif(3, 4)$ with probability $0.1$. The weight vector $w$ is normalized. The coefficients $\beta$ are sampled from $Unif(-1, 1)$.

  4. If the feature is categorical, the respective Bernoulli is parameterized with the sigmoid of the aggregate of the parents, $Bern(\varsigma(g(x)))$. Continuous features follow $N(\mu, \sigma)$ with $\mu \sim N(g(x), 1)$ and $\sigma \sim N(0, 2)$.
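
To make steps 3 and 4 concrete, the sketch below samples one aggregation function and uses it to parameterize a categorical node. It is a simplified illustration, not the repository code: the additive $3\,Bern(3/d)$ term and the extra boost for weights concerning $Y$ are left out, the normalization of $w$ is assumed to be an L2 normalization, and the names `make_aggregation` and `sigmoid` are hypothetical.

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def make_aggregation(n_parents, rng):
    """Sample g(x) = b0 + b1 * h(x) + b2 * h(x)^2 with h(x) = sin(w . x)."""
    w = rng.uniform(-1, 1, size=n_parents)
    w = w / np.linalg.norm(w)  # assumption: L2 normalization of the weight vector
    beta = rng.uniform(-1, 1, size=3)

    def g(x_parents):
        h = np.sin(x_parents @ w)
        return beta[0] + beta[1] * h + beta[2] * h ** 2

    return g

rng = np.random.default_rng(0)
g = make_aggregation(n_parents=3, rng=rng)
x_parents = rng.normal(size=(100, 3))  # made-up parent values
aggregate = g(x_parents)
# Categorical node: Bernoulli with success probability sigmoid(g(x)).
node_values = rng.binomial(1, sigmoid(aggregate))
```

Since $|h(x)| \leq 1$ and each coefficient lies in $[-1, 1]$, the aggregate is bounded by $3$ in absolute value, which keeps the Bernoulli probabilities away from $0$ and $1$.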

To ensure that a prediction model fitted on the data can discriminate between the classes and that changing the prediction to the desirable class is feasible, we randomly generated datasets until we found one with balanced labels ($0.4 < E[Y] < 0.6$) and for which an xgboost model demonstrated good accuracy ($> 0.95$) and balanced predictions ($0.3 < E[\hat{Y}] < 0.7$).
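
The three acceptance criteria amount to a simple rejection check on the labels and the model predictions. The sketch below encodes the thresholds stated above; the function name `accept_dataset` is hypothetical, and whether accuracy is measured on training or held-out data is not specified in the text, so the function simply takes whichever labels and predictions it is given.

```python
import numpy as np

def accept_dataset(y, y_hat):
    """Check balanced labels, accuracy above 0.95, and balanced predictions."""
    y = np.asarray(y)
    y_hat = np.asarray(y_hat)
    balanced_labels = 0.4 < y.mean() < 0.6
    accurate = (y == y_hat).mean() > 0.95
    balanced_predictions = 0.3 < y_hat.mean() < 0.7
    return bool(balanced_labels and accurate and balanced_predictions)

labels = np.array([0, 1] * 50)
perfect_model = accept_dataset(labels, labels)
constant_model = accept_dataset(labels, np.zeros_like(labels))
```

A perfectly accurate classifier on balanced labels passes the check, whereas a constant predictor fails both the accuracy and the balanced-prediction criteria.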

Appendix D Additional empirical results

Figure 4: Median empirical attainment function [47] for the negative plausibility and negative proximity. Lower values are better. Panels: (a) cassini, (b) pawelczyk, (c) two_sines, (d) bn_5, (e) bn_10, (f) bn_20.
Figure 5: Median empirical attainment function [47] for the negative plausibility and negative sparsity. Lower values are better. Panels: (a) cassini, (b) pawelczyk, (c) two_sines, (d) bn_5, (e) bn_10, (f) bn_20.