CountARFactuals – Generating plausible model-agnostic counterfactual explanations with adversarial random forests

Susanne Dandl^*^,1, Kristin Blesch^*^,2,3, Timo Freiesleben^*^,5, Gunnar König^*^,6, Jan Kapar^2,3, Bernd Bischl^1, and Marvin N. Wright^3,4,5

^*^Equal contribution as first authors.
¹Munich Center for Machine Learning (MCML) and Department of Statistics, LMU Munich
²Leibniz Institute for Prevention Research & Epidemiology – BIPS
³Faculty of Mathematics and Computer Science, University of Bremen
⁴Department of Public Health, University of Copenhagen
⁵Cluster: Machine Learning for Science, University of Tübingen
⁶Tübingen AI Center and University of Tübingen

Abstract

Counterfactual explanations elucidate algorithmic decisions by pointing to scenarios that would have led to an alternative, desired outcome. Giving insight into the model’s behavior, they hint users towards possible actions and give grounds for contesting decisions. As a crucial factor in achieving these goals, counterfactuals must be plausible, i.e., describing realistic alternative scenarios within the data manifold. This paper leverages a recently developed generative modeling technique – adversarial random forests (ARFs) – to efficiently generate plausible counterfactuals in a model-agnostic way. ARFs can serve as a plausibility measure or directly generate counterfactual explanations. Our ARF-based approach surpasses the limitations of existing methods that aim to generate plausible counterfactual explanations: It is easy to train and computationally highly efficient, handles continuous and categorical data naturally, and allows integrating additional desiderata such as sparsity in a straightforward manner.

Keywords counterfactual explanations $\cdot$ explainable artificial intelligence $\cdot$ interpretable machine learning $\cdot$ adversarial random forest $\cdot$ tabular data $\cdot$ plausibility $\cdot$ model-agnostic.

1 Introduction

Machine learning (ML) algorithms are increasingly used in high-stakes scenarios. For example, they help to decide whether you receive a loan, if you are suitable for a job, or even which disease you are diagnosed with. While ML-based systems are powerful at detecting complex patterns in data, the reasoning behind their predictions is often not easy to discern for humans. Many ML models are black boxes with a complex mathematical structure that do not follow transparent logical rules [1].

The emerging field of interpretable machine learning (IML) (also known as explainable artificial intelligence or XAI for short) promises to open up these black boxes and aims to make the decisions of ML models transparent to humans (see [2, 3] for overviews). A particularly simple approach is to explain algorithmic decisions to end-users via so-called counterfactual explanations [4].

Example: Imagine you apply for a loan. You enter characteristics such as your age, salary, loan amount, etc. in the online application form and after a few seconds you receive the decision – your loan application has been denied. A counterfactual explanation could be: If your salary had been €5,000 higher, your loan would have been approved.

More generally, a counterfactual explanation points to a close alternative scenario (the so-called counterfactual) that, in contrast to the actual scenario, would have resulted in the desired outcome. Counterfactual explanations may be employed for various purposes, such as helping to guide a person’s actions [5, 6], enabling them to contest adverse decisions [7], and providing insights into the decision behavior of the model [8]. For all these goals, counterfactuals must be plausible, which means the alternative scenarios they depict are realistic. For instance, in the example above, a negative loan amount or a real estate loan with an amount of €500 would not be plausible counterfactuals.

When adding plausibility as another objective for generating counterfactuals, its trade-off with proximity, i.e., that the counterfactual is close to the point of interest, should be taken into account. Dandl et al. [9] were among the first to address this trade-off by framing the counterfactual search as a multi-objective optimization problem. Their approach – multi-objective counterfactual explanations (MOC) – returns not just a single counterfactual, but a Pareto set of counterfactuals, which is advisable to account for the Rashomon effect, i.e., that multiple, diverse, equally good counterfactuals may exist [10].

An intuitive approach to plausibility is searching for only those counterfactuals that are close to actual instances in the dataset [11]. To operationalize this goal, one objective in MOC minimizes the distance between counterfactuals and the actual instances. However, as presented in Section 3.1, this approach has its limitations if, for example, there are low-density gaps close to $\mathbf{x}^{*}$ between high-density regions. Other approaches model plausibility via the joint probability density. They rely on computationally intensive neural network architectures such as variational autoencoders (VAEs) [12, 13, 14, 15] or generative adversarial networks (GANs) [16, 17]. While these architectures have merits for high-dimensional tensor data (e.g., images or text), they are less suited for tabular data (see our discussion in Section 3.2).

Contributions

We leverage a tree-based technique from generative modeling called adversarial random forests (ARF) [18] to generate plausible counterfactuals in a mixed (i.e., categorical and continuous) tabular data setting. We call these countARFactuals and propose two model-agnostic algorithms to generate them:

1. We integrate ARF into the multi-objective counterfactual explanation (MOC) framework [9] to speed up the counterfactual search and find more plausible counterfactuals (see Section 4.1).
2. We tailor ARF to directly generate plausible counterfactuals without an optimization algorithm (see Section 4.2).

A simulation study shows the advantages in plausibility and efficiency of our ARF-based approaches compared to competing methods (Section 5). Moreover, we apply our method to a real-world dataset, namely to explain coffee quality predictions (Section 6).

2 Related Work

There is widespread agreement in the counterfactual community that plausibility is an important concern [19, 11, 5, 20, 21, 22]. Various suggestions have been made to incorporate plausibility into the counterfactual search, for example using causal knowledge [6, 14], case-based reasoning [23], outlier detectors [24], restricting the search space [25], imputing feature combinations from real instances [26], respecting paths between datapoints [27], or, as described above, staying close to the training data [9].

Many define plausibility theoretically through the joint probability density [22]. Some works rely on VAEs or standard autoencoders: they directly generate counterfactuals [14, 15], use VAEs in the optimization [13] or just for measuring plausibility [12]. Other works rely on GANs to generate counterfactuals [17, 16]. However, these approaches differ substantially from our work, as they are tailored for neural network models [14], focus only on plausibility thereby ignoring other objectives like sparsity [14, 15] (see Section 3.1), or work only for continuous data [13, 16]. The closest works to ours are Brughmans et al. [12] and Dandl et al. [9]. Both are designed to generate plausible and sparse counterfactuals in mixed tabular data settings. Brughmans et al. [12] use the autoencoder reconstruction loss as a plausibility measure and Dandl et al. [9] use the distance to the $k$ nearest neighbors to evaluate plausibility. We show in our experiments in Section 5 that utilizing ARF to generate counterfactuals improves plausibility compared to those approaches while being computationally fast.

3 Background

Before we present our approaches, we provide background on the two methods we build upon: multi-objective counterfactual explanations (MOC) [9] and adversarial random forests (ARF) [18].

We consider a supervised learning setup with a binary classification or regression problem.¹ $\mathcal{X}$ denotes a $p$-dimensional feature space. The respective vector $\mathbf{X}:=(X_{1},\dots,X_{p})^{T}$ of random variables may contain both continuous and categorical features. With $Y\in\mathbb{R}$, we denote a random variable reflecting the outcome. In case of a binary classification model, we restrict $Y$ to $\{0,1\}$.

¹ Our framework also generalizes to multi-class problems; we restrict ourselves here only for the sake of simplicity and notation.

To predict $Y$ from $\mathbf{X}$, we trained an ML model $\hat{f}:\mathcal{X}\rightarrow\mathbb{R}$ on a dataset $D_{\text{train}}:=\{(\mathbf{x}^{(1)},y^{(1)}),\dots,(\mathbf{x}^{(n_{\text{train}})},y^{(n_{\text{train}})})\}$ with $n_{\text{train}}$ observations. For binary classification, the model output is restricted to $\hat{f}(\mathbf{x})\in[0,1]$, reflecting the probability for $Y=1$. Most counterfactual explanation methods require access to a dataset for generating counterfactuals. To reflect that this dataset can differ from $D_{\text{train}}$, we denote it as $D$ in the following and assume it to consist of $n$ observations.

3.1 Multi-objective counterfactual explanations

Suppose we want to explain why a certain data point of interest $\mathbf{x}^{*}$ was predicted as $\hat{f}(\mathbf{x}^{*})$ instead of a desired prediction within $Y_{des}\subset\mathbb{R}$. Wachter et al. [4] define counterfactuals as the closest possible input vector $\mathbf{x}^{cf}$ to $\mathbf{x}^{*}$ according to some distance on $\mathcal{X}$ such that $\hat{f}(\mathbf{x}^{cf})\in Y_{des}$. This definition does not explicitly demand sparse or plausible changes. When integrating all these desiderata into an objective to generate counterfactuals, trade-offs must be taken into account because the objectives conflict with each other. Figure 1(a) illustrates this for plausibility and proximity to the original instance $\mathbf{x}^{*}$. If all high-density regions are far away from the decision boundary, enforcing proximity leads to unrealistic counterfactuals.

Figure 1(a): Plausibility-proximity trade-off.

To consider these trade-offs, Dandl et al. [9] turned the search for counterfactuals into a multi-objective optimization problem:

\mathbf{x}^{cf}\in\underset{\mathbf{x}\in\mathcal{X}}{\operatorname*{arg\,min}}\left(o_{\text{valid}}(\hat{f}(\mathbf{x}),Y_{des}),o_{\text{prox}}(\mathbf{x},\mathbf{x}^{*}),o_{\text{plaus}}(\mathbf{x},D),o_{\text{sparse}}(\mathbf{x},\mathbf{x}^{*})\right).

(1)

The different objectives denote:

Validity: Counterfactuals should have a predicted outcome in $Y_{des}$

o_{\text{valid}}(\hat{f}(\mathbf{x}),Y_{des}):=\underset{y\in Y_{des}}{\inf}|\hat{f}(\mathbf{x})-y|.

Proximity: Counterfactuals should be close to $\mathbf{x}^{*}$ according to the Gower distance $d_{\text{Gower}}$ [28]

o_{\text{prox}}(\mathbf{x},\mathbf{x}^{*}):=d_{\text{Gower}}(\mathbf{x},\mathbf{x}^{*}).

(2)

Plausibility: Counterfactuals should describe a realistic data instance, with $\mathbf{x}^{[1]},\dots,\mathbf{x}^{[k]}$ indicating the $k$ nearest neighbors of $\mathbf{x}$ within data $D$ and $w_{i}$ denoting weights with $\sum_{i=1}^{k}w_{i}=1$

o_{\text{plaus}}(\mathbf{x},D):=\sum\limits_{i=1}^{k}w_{i}d_{\text{Gower}}(\mathbf{x},\mathbf{x}^{[i]}).

(3)

Sparsity: Counterfactuals should vary from $\mathbf{x}^{*}$ in only a few features

o_{\text{sparse}}(\mathbf{x},\mathbf{x}^{*}):=\frac{1}{p}\|\mathbf{x}-\mathbf{x}^{*}\|_{0}=\frac{1}{p}\sum\limits_{j=1}^{p}\mathbbm{1}_{x_{j}\neq x_{j}^{*}}.
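The four objectives can be sketched in a few lines of Python. This is a minimal illustration rather than the MOC implementation: the dictionary-based instance format, the helper names (`gower`, `o_valid`, `o_prox`, `o_plaus`, `o_sparse`), the uniform weights $w_{i}=1/k$, and a desired outcome given as an interval are all assumptions made for the example. Sparsity is reported as the fraction of changed features, following the right-hand side of the formula above.

```python
def gower(x, z, ranges):
    """Gower distance for mixed data: range-scaled absolute difference for
    numeric features, 0/1 mismatch for categorical ones, averaged over features."""
    total = 0.0
    for name, xv in x.items():
        zv = z[name]
        if name in ranges:  # numeric feature with a known value range
            total += abs(xv - zv) / ranges[name] if ranges[name] > 0 else 0.0
        else:  # categorical feature
            total += 0.0 if xv == zv else 1.0
    return total / len(x)

def o_valid(pred, y_des):
    """Distance of the prediction to the desired interval y_des = (low, high)."""
    low, high = y_des
    return max(low - pred, 0.0, pred - high)

def o_prox(x, x_star, ranges):
    """Gower distance between the candidate and the point of interest."""
    return gower(x, x_star, ranges)

def o_plaus(x, data, ranges, k=1):
    """Mean Gower distance to the k nearest neighbors in data (uniform weights)."""
    dists = sorted(gower(x, z, ranges) for z in data)
    return sum(dists[:k]) / k

def o_sparse(x, x_star):
    """Fraction of features whose value differs from x_star."""
    return sum(x[name] != x_star[name] for name in x) / len(x)

# Hypothetical loan example; the numeric ranges are assumed to come from D.
ranges = {"salary": 100_000.0, "age": 50.0}
x_star = {"salary": 30_000.0, "age": 40.0, "job": "clerk"}
x_cf = {"salary": 35_000.0, "age": 40.0, "job": "clerk"}
data = [x_star, {"salary": 36_000.0, "age": 41.0, "job": "clerk"}]

print(o_valid(0.3, (0.5, 1.0)), o_prox(x_cf, x_star, ranges),
      o_plaus(x_cf, data, ranges), o_sparse(x_cf, x_star))
```

In this toy setting the candidate changes one of three features, so its sparsity value is $1/3$, while its nearest neighbor in the data is the second observation rather than $\mathbf{x}^{*}$ itself.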

Dandl et al. [9] adapted the nondominated sorting genetic algorithm II (NSGA-II) of Deb et al. [29] to solve the multi-objective optimization problem. This algorithm follows three steps:

1. It generates a set of candidate instances close to the point of interest $\mathbf{x}^{*}$. Among these, it recombines and mutates the candidates that perform best according to the above criteria. By default, the mutator does not take feature dependencies into account. To enhance plausibility, mutation can optionally be performed by sampling from conditional distributions learned on $D$ by conditional trees [30] – we refer to this MOC version as MOCCTREE.
2. Both new and old candidates are ranked using nondominated and crowding distance sorting. Nondominated sorting ranks according to optimality with respect to the above objectives (with the option to penalize invalid counterfactuals) and crowding distance ranks according to diversity.
3. Based on these rankings, optimal and diverse candidates are selected for the next iteration. The search for counterfactuals ends after either a fixed number of predefined iterations or when the generated counterfactuals are not significantly better according to the hypervolume of the objectives above. As a final step, the algorithm outputs the Pareto optimal set of counterfactuals over the generations.
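The nondominated filtering at the core of the ranking step can be illustrated as follows. This is a simplified sketch of Pareto dominance only (no crowding distance, recombination, or mutation); the function names and the example objective vectors are assumptions for illustration, and all objectives are minimized.

```python
def dominates(a, b):
    """True if a is at least as good as b in every objective and strictly
    better in at least one (all objectives are minimized)."""
    no_worse = all(ai <= bi for ai, bi in zip(a, b))
    strictly_better = any(ai < bi for ai, bi in zip(a, b))
    return no_worse and strictly_better

def pareto_front(points):
    """Return the points that are not dominated by any other point."""
    return [p for p in points if not any(dominates(q, p) for q in points)]

# Hypothetical candidates, each given as (o_prox, o_plaus, o_sparse).
candidates = [
    (0.10, 0.40, 0.25),
    (0.20, 0.10, 0.25),
    (0.20, 0.40, 0.50),  # dominated by the first candidate
    (0.05, 0.60, 0.25),
]
front = pareto_front(candidates)
print(front)
```

The remaining three candidates each win in at least one objective, which is why a set rather than a single counterfactual is returned.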

The conceptualization of plausibility in Equation 3 has limitations, as illustrated in Figure 1(b): With $k=1$ (the default in MOC), counterfactuals with low values in Equation 3 might still end up in low-density regions.

3.2 Generative modeling and adversarial random forests

Generative modeling is concerned with models that generate synthetic data $\tilde{D}$ that mimic the appearance of real data $D$. A well-known approach is the VAE [31], which encodes original data instances into a set of low-dimensional distribution parameters and then reconstructs these instances with a decoder neural network from samples of these distributions. Another common technique is the GAN [32], where two different neural network models play a zero-sum game – the generator network aims to generate realistic instances, and the discriminator network aims to discriminate these instances from real data. Other generative models based on neural networks include normalizing flows [33], diffusion probabilistic models [34] and transformer-based models [35] (see [36, 37] for overviews). While there exist adaptations of neural network models to tabular data, tree-based approaches may be better suited [38, 39, 40].

ARFs are a tree-based procedure for generative modeling [18]. The ARF approach is similar to that of GANs; however, instead of a neural network as a base learner, ARF relies on random forests. An ARF is trained in three steps: (1) Fitting an unsupervised random forest [41], which generates a naive synthetic dataset $\tilde{D}_{1}$ and subsequently trains a random forest $\hat{g}_{1}$ to distinguish between $D$ and $\tilde{D}_{1}$. (2) Sampling feature values marginally from the instances in the leaves of $\hat{g}_{1}$ to obtain a more realistic synthetic dataset $\tilde{D}_{2}$. Another random forest $\hat{g}_{2}$ is trained to distinguish between $D$ and $\tilde{D}_{2}$. (3) This process is repeated until the random forest classifier can no longer distinguish synthetic from real data. We denote the final ARF model as $\hat{g}^{*}$. As opposed to GANs, ARFs allow for both density estimation and generative modeling. The two algorithms are called forests for density estimation (FORDE) and forests for generative modeling (FORGE), respectively.

Density estimation with FORDE

leverages the mutual independence across features in the leaves after algorithm convergence, which allows modeling the joint density $p(\mathbf{x})$ as a mixture of products of univariate feature densities:

\text{FORDE}(\mathbf{x})\coloneqq\hat{p}(\mathbf{x})=\sum_{l:\mathbf{x}\in X_{l}}\pi_{l}\prod_{j=1}^{p}\hat{p}_{l,j}(x_{j}),

(4)

where $X_{l}$ is the hyperrectangle defined by the $l$ -th leaf, the corresponding mixture weights $\pi_{l}$ are calculated as the share of real datapoints that fall into leaf $l$ normalized over all trees, and $\hat{p}_{l,j}$ are (locally) estimated univariate density/mass functions for the $j$ -th feature in leaf $l$ . The convergence of FORDE to the real data distribution of $\mathbf{X}$ for infinite data is proven under some mild conditions in Watson et al. [18]. A conditional density under a set of conditions $\mathcal{C}$ , e.g., fixed values or intervals for certain features $C\subseteq\{1,...,p\}$ , can be derived from Equation 4 in the following way:

\text{FORDE}(\mathbf{x}\mid\mathcal{C})\coloneqq\hat{p}(\mathbf{x}\mid\mathcal{C})=\sum_{l:\mathbf{x}\in X_{l}}\pi^{\prime}_{l}\prod_{j=1}^{p}\hat{p}_{l,j}(x_{j}\mid\mathcal{C}_{j}),

(5)

where $\mathcal{C}_{j}\subseteq\mathcal{C}$ denotes the subset of conditions concerning feature $j\in C$ , and the mixture weights $\pi^{\prime}_{l}$ are updated to reflect how likely their corresponding leaves fulfill the condition. More formally, the mixture weights are updated and normalized using the univariate marginals by

\pi^{\prime}_{l}\coloneqq\frac{\pi_{l}\prod_{j=1}^{p}\hat{p}_{l,j}(\mathcal{C}_{j})}{\sum_{m:\mathbf{x}\in X_{m}}\pi_{m}\prod_{j=1}^{p}\hat{p}_{m,j}(\mathcal{C}_{j})}

if the denominator does not equal $0$ and by $\pi^{\prime}_{l}\coloneqq 0$ otherwise. Note that in the case of conditioning on a fixed value or interval for a continuous feature $j$ , the univariate densities $\hat{p}_{l,j}$ collapse to the indicator function $\mathbbm{1}_{\mathcal{C}_{j}}$ or the unconditional densities truncated on the conditioning interval, respectively.
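To make the mixture structure concrete, the following sketch evaluates a FORDE-style density and the updated mixture weights on a hand-built set of leaves. It is an illustration under strong assumptions, not the ARF implementation: leaves are plain dictionaries, every feature is taken to be uniform within its leaf interval, conditions are intervals on continuous features, and the weights are normalized over all leaves.

```python
def leaf_density(leaf, x):
    """Product of univariate densities in one leaf; each feature is assumed
    uniform on the leaf's interval (a simplification for illustration)."""
    dens = 1.0
    for j, (lo, hi) in enumerate(leaf["bounds"]):
        if not (lo <= x[j] < hi):
            return 0.0  # x lies outside this leaf's hyperrectangle
        dens *= 1.0 / (hi - lo)
    return dens

def forde(leaves, x):
    """Mixture of leaf-wise product densities, in the spirit of Equation 4."""
    return sum(leaf["weight"] * leaf_density(leaf, x) for leaf in leaves)

def interval_mass(lo, hi, c_lo, c_hi):
    """Probability that a Uniform(lo, hi) feature falls into [c_lo, c_hi]."""
    overlap = min(hi, c_hi) - max(lo, c_lo)
    return max(overlap, 0.0) / (hi - lo)

def conditional_weights(leaves, conditions):
    """Updated weights for interval conditions {feature index: (c_lo, c_hi)}."""
    raw = []
    for leaf in leaves:
        w = leaf["weight"]
        for j, (c_lo, c_hi) in conditions.items():
            lo, hi = leaf["bounds"][j]
            w *= interval_mass(lo, hi, c_lo, c_hi)
        raw.append(w)
    total = sum(raw)
    return [w / total if total > 0 else 0.0 for w in raw]

# Two hypothetical leaves over two continuous features.
leaves = [
    {"weight": 0.75, "bounds": [(0.0, 1.0), (0.0, 2.0)]},
    {"weight": 0.25, "bounds": [(1.0, 3.0), (0.0, 2.0)]},
]
density = forde(leaves, [0.5, 1.0])
weights = conditional_weights(leaves, {0: (0.5, 3.0)})
print(density, weights)
```

Conditioning on the first feature lying in $[0.5, 3]$ halves the mass of the first leaf but leaves the second untouched, so the second leaf gains relative weight; a condition that no leaf can satisfy yields all-zero weights, matching the fallback case above.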

Generative modeling with FORGE

is based on drawing a leaf $l$ from the forest according to the mixture weights in FORDE and sampling feature values from the estimated univariate (conditional) densities $\hat{p}_{l,j}$. FORGE thereby allows drawing samples that adhere to FORDE as an approximation to the real distribution of $\mathbf{X}$ or $\mathbf{X}\mid\mathcal{C}$.
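The two-stage sampling scheme can be sketched as below. As before, this is a toy illustration and not the ARF sampler: the leaf dictionaries, the uniform within-leaf densities, and the function name `forge` are assumptions made for the example.

```python
import random

def forge(leaves, n, seed=0):
    """Draw n samples: pick a leaf by its mixture weight, then sample every
    feature independently within that leaf (uniform densities assumed)."""
    rng = random.Random(seed)
    weights = [leaf["weight"] for leaf in leaves]
    samples = []
    for _ in range(n):
        leaf = rng.choices(leaves, weights=weights, k=1)[0]
        samples.append([rng.uniform(lo, hi) for lo, hi in leaf["bounds"]])
    return samples

# Two hypothetical leaves over two continuous features.
leaves = [
    {"weight": 0.75, "bounds": [(0.0, 1.0), (0.0, 2.0)]},
    {"weight": 0.25, "bounds": [(1.0, 3.0), (0.0, 2.0)]},
]
samples = forge(leaves, n=1000)
share_leaf1 = sum(s[0] < 1.0 for s in samples) / len(samples)
print(round(share_leaf1, 2))  # expected to be near the leaf weight 0.75
```

Conditional sampling follows the same pattern, with the mixture weights replaced by the updated weights and the within-leaf densities restricted to the conditioning values or intervals.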

4 Methods

Our proposal is to leverage ARF for the efficient generation of counterfactual explanations, i.e., countARFactuals, in mixed tabular data settings. More specifically, we use and modify ARF to account for the desiderata that we discussed in Section 3.1:

1. Validity: We train ARF on $D$ but replace the target $Y$ with the predictions $\hat{Y}$. Here, $\hat{Y}$ is treated just like any other feature in the data. Since FORGE allows for conditional sampling, we can sample from $\mathbf{X}$ conditioned on our desired outcomes $\hat{Y}\in Y_{des}$. Note, however, that ARF may not learn a perfect representation of the prediction function $\hat{Y}:=\hat{f}(\mathbf{X})$. ARF samples are therefore not guaranteed to be valid; validity only becomes more likely. In our algorithms, we only return those candidates with predictions in $Y_{des}$.
2. Proximity: We restrict the output of our two methods to those counterfactuals in the Pareto set, defined over the four objectives of Section 3.1, including proximity (Equation 2). In the first algorithm described below, we additionally use ARF combined with MOC, which accounts for proximity, as described in Section 3.1.
3. Plausibility: ARF allows us to both evaluate the plausibility of data points using FORDE (which is also used to determine the returned Pareto set) and efficiently generate plausible data with FORGE.
4. Sparsity: FORGE allows sampling feature values $X_{S}$ conditional on the observation $X_{C}=x_{C}$. By fixing certain features $C$ to the value of $\mathbf{x}^{*}_{C}$, we only change feature values in the sparse set $S:=\{1,\dots,p\}\setminus C$.

With the desiderata in place, several decisions need to be made: Should we integrate plausibility via density estimation (FORDE) or generative modeling (FORGE)? What is an optimal trade-off between proximity and other objectives, such as plausibility and sparsity? How should we search for the conditioning set $C$ for features that should not be changed? In the following, we provide two algorithms that decide on these questions in different ways. The first integrates ARF into MOC (Section 4.1). The second uses ARF as a standalone counterfactual generator (Section 4.2).

4.1 Algorithm 1: Integrating ARF into MOC

In MOC’s optimization problem (Equation 1), we replace the plausibility measure (Equation 3) with the density estimator of FORDE (Equation 4). Since the individual objectives in MOC must map to the interval $[0,1]$ (with low values denoting desired properties), we transform $\hat{p}(\mathbf{x})$, as estimated by FORDE, with the negative exponential function

o_{\text{plaus}}^{*}(\mathbf{x}):=e^{-\hat{p}(\mathbf{x})}.

(6)
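A short numerical check illustrates the behavior of this transformation. Because a density estimate is nonnegative, $e^{-\hat{p}(\mathbf{x})}$ always lies in $(0,1]$, even where the density exceeds one; the function name `o_plaus_star` and the example density values are assumptions for illustration.

```python
import math

def o_plaus_star(density):
    """Equation 6: map a nonnegative density estimate to (0, 1];
    lower values indicate more plausible points."""
    return math.exp(-density)

for p_hat in (0.0, 0.5, 2.0):
    print(p_hat, round(o_plaus_star(p_hat), 3))
```

A point with zero estimated density receives the worst possible value of one, and the objective decreases monotonically as the estimated density grows.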

We use FORGE as described above to sample plausible candidates in MOC in the mutation step of the NSGA-II. This is a strategy to efficiently limit the search space of MOC to plausible counterfactuals. Concerning sparsity, we find the conditioning set $C$ through iterated mutation and recombination, just like in MOC, and we select candidates using NSGA-II according to optimality and diversity. Finally, the output comprises only the valid Pareto set of counterfactuals over the generations, i.e., counterfactuals that have a prediction in $Y_{des}$ and are not dominated by other candidates that were generated. For details, we refer to the pseudocode in Appendix A.

4.2 Algorithm 2: ARF is all you need

For this algorithm, we leverage the ability of our modified ARF sampler to directly and efficiently generate many relevant counterfactuals. As described above, the modified FORGE method allows generating plausible data points. To enforce sparsity, we sample $m$ features with probabilities according to their local feature importance, calculated as the standard deviation of the individual conditional expectation (ICE) curve [9, 42]. The $m$ selected features form the set $S$ of features we aim to change because, according to the local feature importance, they impact the prediction the most. The remaining features then form the conditioning set $C=\{1,\dots,p\}\setminus S$.
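The feature-selection step can be sketched as follows. This is a hedged illustration, not the authors' code: the grid of feature values, the use of the population standard deviation, sampling without replacement, and the exclusion of features with zero importance are assumptions, as are the toy model and all function names.

```python
import random
import statistics

def ice_sd(predict, x_star, j, grid):
    """Standard deviation of the ICE curve of feature j at x_star."""
    curve = []
    for value in grid:
        x = list(x_star)
        x[j] = value  # vary feature j, keep all other features fixed
        curve.append(predict(x))
    return statistics.pstdev(curve)

def sample_features(importances, m, rng):
    """Sample up to m distinct feature indices with probability
    proportional to their importance."""
    remaining = {j: w for j, w in enumerate(importances) if w > 0}
    chosen = []
    while remaining and len(chosen) < m:
        idx = list(remaining)
        j = rng.choices(idx, weights=[remaining[i] for i in idx], k=1)[0]
        chosen.append(j)
        del remaining[j]
    return sorted(chosen)

def predict(x):
    """Hypothetical model: strong effect of feature 0, weak effect of
    feature 1, no effect of feature 2."""
    return 2.0 * x[0] + 0.1 * x[1]

x_star = [0.5, 0.5, 0.5]
grid = [0.0, 0.25, 0.5, 0.75, 1.0]
importances = [ice_sd(predict, x_star, j, grid) for j in range(3)]

S = sample_features(importances, m=2, rng=random.Random(1))
C = [j for j in range(3) if j not in S]  # features kept at their x_star values
print(importances, S, C)
```

In this toy case the third feature never affects the prediction, so it always ends up in the conditioning set and keeps its original value.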

As for Algorithm 1, we output only the valid and Pareto-optimal set of counterfactuals. The pseudocode for this method is given in Appendix B.

5 Experiments

We evaluate the quality of our proposed methods with respect to the following research questions:

RQ (1): Do our proposed ARF-based methods generate more plausible counterfactuals compared to competing methods without major sacrifices in sparsity ($o_{\text{sparse}}$), proximity ($o_{\text{prox}}$) and the runtime?

RQ (2): Does $o^{*}_{\text{plaus}}$ (Equation 6) better reflect the true plausibility compared to $o_{\text{plaus}}$ (Equation 3)?

To objectively evaluate the plausibility of the generated counterfactuals, we require access to the ground-truth likelihood. Because ground-truth likelihoods are usually unavailable for real-world data, we evaluate our methods on synthetic data. An illustrative real-world application follows in Section 6.

5.1 Data-generating process

For the experiments, we constructed three illustrative two-dimensional datasets, namely cassini (inspired by [43]), two sines (inspired by the two moons dataset), and three blobs (inspired by [15]). Moreover, we generated three datasets from randomly sampled Bayesian networks of dimensionality $5$, $10$, and $20$, namely bn_5, bn_10, and bn_20, which all include both continuous and categorical features as well as nonlinear relationships. An XGBoost model [44] was fitted on sampled datasets $D_{\text{train}}$ of size $5\,000$. For each data-generating process (DGP), ten additional points were sampled as instances of interest $\mathbf{x}^{*}$. The counterfactual generation methods received access to newly sampled datasets $D$ of size $5\,000$. Details on the dataset generation and model fit can be found in Appendix C and in the repository accompanying this paper.²

² https://github.com/bips-hb/countARFactuals

5.2 Competing methods

We compare our proposed MOC version based on ARF of Section 4.1 (referred to as MOCARF) and the standalone ARF generator of Section 4.2 (referred to as ARF) to the following competitors: MOC and MOCCTREE (MOC with a conditional sampler, see Section 3.1) [9] and NICE [12] with a plausibility reward function (see Equation (4) in [12]). NICE generates counterfactuals by iteratively replacing one feature after the other in $\mathbf{x}^{*}$ by the values of $\mathbf{x}^{\text{nn}}$ , which denotes a nearest neighbor of $\mathbf{x}^{*}$ in $D$ with $\hat{f}(\mathbf{x}^{\text{nn}})\in Y_{des}$ . In each iteration, the algorithm keeps the feature change with the highest plausibility reward.

To allow for a fair comparison, all methods generate a set of counterfactual candidates. For NICE, we apply the extension of Dandl et al. [45]; instead of stopping the search once the point with the highest reward has a prediction in $Y_{des}$, the search continues until $\mathbf{x}^{\text{nn}}$ is recovered and all intermediate instances with predictions in $Y_{des}$ are returned. If possible, we selected the hyperparameters for the methods such that each method generated an equal number of candidates – namely, $1\,000$.³ ARF requires a maximum set size for $S$, reflecting how many features are maximally allowed to be changed. We set it according to the number of features $p$ as $m_{max}:=\min(\lceil\sqrt{p}+3\rceil,p)$. Since the maximum number can also be specified for all MOC-based methods, we used the same $m_{max}$ for MOC, MOCARF and MOCCTREE. For the evaluation, we focused only on the unique counterfactuals that have predictions in $Y_{des}$. We further reduced this set to the Pareto set, i.e., the set of counterfactuals that are nondominated according to proximity ($o_{\text{prox}}$), sparsity ($o_{\text{sparse}}$) and plausibility. The definition of the plausibility objective differed between the methods, with $o^{*}_{\text{plaus}}$ for ARF and MOCARF, $o_{\text{plaus}}$ for MOC and MOCCTREE, and the autoencoder reconstruction error for NICE (as proposed by [12]).

³ Specifying the exact number was possible for all methods besides NICE [45].
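For orientation, the rule for the maximum number of changed features can be evaluated directly for the dimensionalities used in the experiments. The function name `m_max` is chosen here for illustration.

```python
import math

def m_max(p):
    """Maximum number of features allowed to change, given p features."""
    return min(math.ceil(math.sqrt(p) + 3), p)

for p in (2, 5, 10, 20):
    print(p, m_max(p))
```

For the two- and five-dimensional datasets the cap equals $p$, so all features may change; the restriction only becomes active for bn_10 and bn_20, where at most 7 and 8 features may change, respectively.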

5.3 Evaluation criteria

To answer RQ (1), we evaluated the generated counterfactuals with respect to the ground-truth likelihood (denoted as plausibility in the following), validity $o_{\text{valid}}$, proximity $o_{\text{prox}}$ and sparsity $o_{\text{sparse}}$ (see Section 3.1). We aggregated the results per method, dataset and instance of interest $\mathbf{x}^{*}$ by computing (scaled) dominated hypervolumes [46]. We also measured the number of nondominated counterfactuals and the runtime. To investigate the trade-off between plausibility and proximity, we also computed median attainment surfaces according to López-Ibáñez et al. [47] for each method and dataset. These surfaces reveal how the two objectives are distributed on average over the different $\mathbf{x}^{*}$. To answer RQ (2), all generated counterfactuals were evaluated with respect to $o_{\text{plaus}}^{*}$ and $o_{\text{plaus}}$. Per method, dataset and $\mathbf{x}^{*}$, we computed Spearman rank correlations between the true plausibility and $o_{\text{plaus}}^{*}$ and between the true plausibility and $o_{\text{plaus}}$. With a Wilcoxon signed rank test, we tested whether $o_{\text{plaus}}^{*}$ has higher correlations to the true plausibility than $o_{\text{plaus}}$.
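The rank correlation used for RQ (2) can be written out in plain Python as the Pearson correlation of average ranks. The data below are invented for illustration, and the sketch reports the raw correlation; since the plausibility objectives are lower-is-better while the likelihood is higher-is-better, the raw value is negative here, and how the sign is oriented in the experiments is not specified in the text.

```python
def ranks(values):
    """Average ranks (1-based); tied values share their mean rank."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    result = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        mean_rank = (i + j) / 2 + 1
        for k in range(i, j + 1):
            result[order[k]] = mean_rank
        i = j + 1
    return result

def pearson(a, b):
    """Pearson correlation of two equally long sequences."""
    n = len(a)
    mean_a, mean_b = sum(a) / n, sum(b) / n
    cov = sum((x - mean_a) * (y - mean_b) for x, y in zip(a, b))
    var_a = sum((x - mean_a) ** 2 for x in a)
    var_b = sum((y - mean_b) ** 2 for y in b)
    return cov / (var_a * var_b) ** 0.5

def spearman(a, b):
    """Spearman rank correlation: Pearson correlation of the ranks."""
    return pearson(ranks(a), ranks(b))

# Made-up values for five counterfactuals of one instance of interest.
true_density = [0.90, 0.40, 0.70, 0.10, 0.05]
objective = [0.41, 0.67, 0.50, 0.90, 0.95]  # lower = more plausible
rho = spearman(true_density, objective)
print(rho)
```

Because the toy objective orders the counterfactuals exactly opposite to the true density, the two rankings are perfectly reversed.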

5.4 Results

Figure 2 presents the results for RQ (1) and shows the objective values per counterfactual as well as the hypervolume, number of counterfactuals and runtime. On average, ARF and MOCARF generated more plausible counterfactuals compared to the other MOC-based approaches and NICE. In alignment with previous literature [9, 48], our results suggest that higher plausibility might be associated with lower proximity and sparsity. For further investigations on the trade-offs, Figure 4 and Figure 5 in Appendix D detail the median attainment surfaces per dataset and method. The plots reveal that ARF and MOCARF on average dominate the other methods in proximity, sparsity and plausibility, with the differences being greatest in plausibility. The hypervolume was on average similar for the different methods for low-dimensional datasets (ARF had lower hypervolumes in cassini due to its inferiority in proximity and sparsity); for higher-dimensional datasets (bn_10 and bn_20), ARF and MOCARF performed better than the competing methods. Concerning runtime, ARF generated counterfactuals the fastest on average, followed by NICE and MOC. MOCARF was faster than MOCCTREE for datasets with more than two features. The runtime differences increased with higher-dimensional data. On average, ARF and MOCARF generated the largest set of nondominated counterfactuals compared to the other methods.

Considering RQ (2), the Wilcoxon signed rank test yielded a p-value close to $0$ ($7.16\times 10^{-6}$), i.e., the correlation of our proposed plausibility measure $o_{\text{plaus}}^{*}$ to the true plausibility was significantly higher than that of $o_{\text{plaus}}$. The median correlation to the true plausibility over all methods and datasets was 0.84 for $o_{\text{plaus}}^{*}$ and 0.69 for $o_{\text{plaus}}$.

Overall, our study shows that on average our proposed methods – ARF and MOCARF – generate a more plausible set of counterfactuals compared to our competitors without major sacrifices in sparsity and proximity. Notably, ARF achieves this with superiority in runtimes.

6 Real Data Example

We illustrate our approach on the publicly available coffee quality dataset.⁴ The data details the characteristics of several Arabica coffee beans, such as the country of origin and altitude at which the beans were cultivated. Further, the dataset includes information on a quality review score (cup points) specified by an expert jury within the Coffee Quality Institute [49].

⁴ https://github.com/jldbc/coffee-quality-database

In this example, we use a random forest to predict coffee quality from selected, actionable characteristics of the coffee beans. For simplicity, we binarize the target score cup points. Aiming for balanced classes of good and bad quality, we use the dataset’s median value of cup points as a cut-off point, i.e.,

\texttt{quality}=\begin{cases}\texttt{good}&\text{if }\texttt{cup points}\geq\operatorname{median}(\texttt{cup points})\\ \texttt{bad}&\text{otherwise}.\end{cases}

(7)
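The binarization rule amounts to a median split, sketched below with made-up scores. The function name `binarize_quality` and the example values are assumptions for illustration and are not taken from the dataset.

```python
import statistics

def binarize_quality(cup_points):
    """Label scores at or above the median as 'good', all others as 'bad'."""
    cutoff = statistics.median(cup_points)
    return ["good" if score >= cutoff else "bad" for score in cup_points]

scores = [80.5, 82.0, 83.25, 84.0, 86.5]  # made-up cup points
labels = binarize_quality(scores)
print(labels)
```

Since scores equal to the median are assigned to the good class, the two classes are only approximately balanced when ties at the median occur or the number of observations is odd.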

For illustration, we generate counterfactual explanations for an instance of bad coffee quality, answering the question: Which characteristics would need to be changed to rate as good quality coffee?

This example illustrates the importance of taking into account the multiple objectives of counterfactual explanations, such as sparsity and plausibility. For example, a company that aims to improve the quality of its coffee may want to change as few coffee characteristics as possible for economic reasons. Similarly, some changes might not be plausible: consider changing the country of origin independently of the altitude of the coffee plantations or the variety of beans cultivated in the respective country (since the variety must suit the natural conditions in that country).

The generation of counterfactual explanations in this example is performed using Algorithm 2, detailed in Section 4.2. In Figure 3, we present a set of the generated countARFactuals for an instance of coffee beans belonging to the bad class that originate from Taiwan.

Figure 3 illustrates that countARFactuals yield plausible counterfactual explanations. For instance, for countARFactual #3, the country of origin is changed from Taiwan to Colombia and the variety from Typica to Caturra. This seems reasonable because Typica was grown in only a few Colombian instances in the training dataset, whereas Caturra was the most frequently grown variety in Colombia. Further, the altitude at which the beans are grown is elevated only a little within Taiwanese countARFactuals (#4–12), but more drastically for countries that – given the data – grow coffee at higher altitudes on average, such as Mexico (#1) and Colombia (#3).

7 Discussion

In this paper, we show that adversarial random forests (ARF) can be modified to generate plausible counterfactuals, both as a subroutine to multi-objective counterfactual explanations (MOC) and as a standalone approach. Our experiments in Section 5 demonstrate that ARF can improve the plausibility of counterfactuals and the efficiency of their generation without substantially sacrificing other desiderata such as proximity and sparsity. In contrast to other generative modeling approaches for plausible counterfactuals, ARF handles mixed tabular data directly without, e.g., one-hot encoding categorical features, thereby improving data efficiency. Moreover, ARF-based counterfactual generation allows for sparsity via conditional sampling and is an off-the-shelf methodology that requires little tuning effort and few computational resources.

Our work faces some limitations. For example, we define the plausibility of counterfactuals via the joint density. However, as highlighted by Keane et al. [20], there are different conceptualizations of plausibility, for example, based on the feasibility of actions or user-perceived plausibility [5, 6, 27]. One might even question whether staying on the data manifold is always desirable: if changing the class requires extrapolation, then arguably so should our counterfactuals. It should be noted that plausible counterfactuals, in general, cannot be interpreted as action recommendations. Although they provide hints about which alternative feature values would yield acceptance by the predictor, they do not guide the user on which interventions yield the desired change in the real world. To guide action, causal knowledge is required [50]. Furthermore, in the context of recourse, improvement of the underlying target is more desirable than acceptance by a specific predictor, which counterfactual explanations do not target [6].

Proximity and plausibility are conflicting objectives [12, 9]. Oftentimes, there is only little data close to the decision boundary, and jumping just over the boundary can lead to implausible counterfactuals [19]. A trade-off between the two objectives is desirable, which we implicitly address by generating a Pareto-optimal set of diverse counterfactuals. In future work, one could incorporate such trade-offs directly in the counterfactual generation, e.g., by a parameter that controls the proximity-plausibility trade-off. Another option would be to set a threshold for plausibility instead of a trade-off parameter, as suggested by Brughmans et al. [12].

Like all works on counterfactual explanations, we face the Rashomon effect: There exist many plausible counterfactuals that explain the same data point. This raises the question of which one we should show to the user [20, 22]. As a bottom line, we return only the Pareto-optimal set of counterfactuals, which at least guarantees that no strictly dominated option is shown. In future work, integrating user preferences or considering additional objectives may improve the final selection.
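To make the notion of a Pareto-optimal set concrete, the following Python sketch filters out dominated candidates. It is a minimal illustration under the assumption that every objective is to be minimized; the function names and the example objective tuples (proximity, sparsity, implausibility) are hypothetical and not results from the paper.

```python
def dominates(a, b):
    """True if a is no worse than b in every objective and strictly better in one.

    Assumes all objectives are minimized.
    """
    no_worse = all(x <= y for x, y in zip(a, b))
    strictly_better = any(x < y for x, y in zip(a, b))
    return no_worse and strictly_better


def pareto_front(candidates):
    """Keep only the candidates that no other candidate dominates."""
    return [c for c in candidates
            if not any(dominates(other, c) for other in candidates)]


# Hypothetical objective values: (proximity, sparsity, implausibility)
candidates = [(0.1, 2, 0.9), (0.3, 1, 0.4), (0.3, 2, 0.9), (0.5, 3, 0.2)]
front = pareto_front(candidates)
```

Here the third candidate is dropped because the first one is at least as good in every objective and strictly better in proximity, while the remaining three each represent a different trade-off.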

Our framework is tailored for mixed tabular data settings. For other data modalities such as image or text data, we advise using neural-network-based approaches for density estimation and generative modeling such as VAEs and GANs. Finally, our framework is designed for binary classification and regression but can be extended to multi-class classification.

In future work, we plan to investigate the role of the ML model in the ARF approach to counterfactuals. We could also generate counterfactuals with ARF without the model by directly training ARF on $Y$ rather than the predictions $\hat{Y}$ . We would then get plausible counterfactuals that hint towards improvement instead of acceptance [6]. While such counterfactuals appear different from those discussed in the XAI literature so far, in fact, they essentially just turn the generative model that conditions on $\mathbf{X}=\mathbf{x}$ into a prediction algorithm.

Acknowledgments

MNW and KB were supported by the German Research Foundation (DFG), Grant Number 437611051. MNW was supported by the German Research Foundation (DFG), Grant Number 459360854. KB was supported by a PhD grant of the Minds, Media, Machines Integrated Graduate School Bremen. MNW and JK were supported by the U Bremen Research Alliance/AI Center for Health Care, financially supported by the Federal State of Bremen. GK and TF were supported by the German Research Foundation through the Cluster of Excellence “Machine Learning - New Perspectives for Science” (EXC 2064/1 number 390727645). TF has been supported by the Carl Zeiss Foundation through the project “Certification and Foundations of Safe Machine Learning Systems in Healthcare”.

References

[1] Jenna Burrell. How the machine ‘thinks’: Understanding opacity in machine learning algorithms. Big Data & Society, 3(1):2053951715622512, 2016.
[2] Amina Adadi and Mohammed Berrada. Peeking inside the black-box: a survey on explainable artificial intelligence (XAI). IEEE access, 6:52138–52160, 2018.
[3] Christoph Molnar. Interpretable Machine Learning. 2nd edition, 2022.
[4] Sandra Wachter, Brent Mittelstadt, and Chris Russell. Counterfactual explanations without opening the black box: Automated decisions and the GDPR. Harv. JL & Tech., 31:841, 2017.
[5] Amir-Hossein Karimi, Gilles Barthe, Bernhard Schölkopf, and Isabel Valera. A survey of algorithmic recourse: Contrastive explanations and consequential recommendations. ACM Computing Surveys, 55(5):1–29, 2022.
[6] Gunnar König, Timo Freiesleben, and Moritz Grosse-Wentrup. Improvement-focused causal recourse (ICR). In Proceedings of the AAAI Conference on Artificial Intelligence, volume 37, pages 11847–11855, 2023.
[7] Henrietta Lyons, Eduardo Velloso, and Tim Miller. Conceptualising contestability: Perspectives on contesting algorithmic decisions. Proceedings of the ACM on Human-Computer Interaction, 5(CSCW1):1–25, 2021.
[8] Brent Mittelstadt, Chris Russell, and Sandra Wachter. Explaining explanations in ai. In Proceedings of the conference on fairness, accountability, and transparency, pages 279–288, 2019.
[9] Susanne Dandl, Christoph Molnar, Martin Binder, and Bernd Bischl. Multi-objective counterfactual explanations. In International Conference on Parallel Problem Solving from Nature, pages 448–469. Springer, 2020.
[10] Leo Breiman. Statistical modeling: The two cultures (with comments and a rejoinder by the author). Statistical Science, 16(3):199 – 231, 2001.
[11] Riccardo Guidotti. Counterfactual explanations and how to find them: Literature review and benchmarking. Data Mining and Knowledge Discovery, pages 1–55, 2022.
[12] Dieter Brughmans, Pieter Leyman, and David Martens. NICE: An algorithm for nearest instance counterfactual explanations. Data Mining and Knowledge Discovery, pages 1–39, 2023.
[13] Shalmali Joshi, Oluwasanmi Koyejo, Warut Vijitbenjaronk, Been Kim, and Joydeep Ghosh. Towards realistic individual recourse and actionable explanations in black-box decision making systems. arXiv preprint arXiv:1907.09615, 2019.
[14] Divyat Mahajan, Chenhao Tan, and Amit Sharma. Preserving causal constraints in counterfactual explanations for machine learning classifiers. arXiv preprint arXiv:1912.03277, 2019.
[15] Martin Pawelczyk, Klaus Broelemann, and Gjergji Kasneci. Learning model-agnostic counterfactual explanations for tabular data. In Proceedings of the web conference 2020, pages 3126–3132, 2020.
[16] Daniel Nemirovsky, Nicolas Thiebaut, Ye Xu, and Abhishek Gupta. Countergan: Generating counterfactuals for real-time recourse and interpretability using residual gans. In Uncertainty in Artificial Intelligence, pages 1488–1497. PMLR, 2022.
[17] Arnaud Van Looveren, Janis Klaise, Giovanni Vacanti, and Oliver Cobb. Conditional generative models for counterfactual explanations. arXiv preprint arXiv:2101.10123, 2021.
[18] David S Watson, Kristin Blesch, Jan Kapar, and Marvin N Wright. Adversarial random forests for density estimation and generative modeling. In Proceedings of the $26^{th}$ International Conference on Artificial Intelligence and Statistics, pages 5357–5375. PMLR, 2023.
[19] Timo Freiesleben. The intriguing relation between counterfactual explanations and adversarial examples. Minds and Machines, 32(1):77–109, 2022.
[20] Mark T Keane, Eoin M Kenny, Eoin Delaney, and Barry Smyth. If only we had better counterfactual explanations: Five key deficits to rectify in the evaluation of counterfactual XAI techniques. arXiv preprint arXiv:2103.01035, 2021.
[21] Ilia Stepin, Jose M Alonso, Alejandro Catala, and Martín Pereira-Fariña. A survey of contrastive and counterfactual explanation generation methods for explainable artificial intelligence. IEEE Access, 9:11974–12001, 2021.
[22] Sahil Verma, Varich Boonsanong, Minh Hoang, Keegan E Hines, John P Dickerson, and Chirag Shah. Counterfactual explanations and algorithmic recourses for machine learning: A review. arXiv preprint arXiv:2010.10596, 2020.
[23] Mark T Keane and Barry Smyth. Good counterfactuals and where to find them: A case-based technique for generating counterfactuals for explainable ai (XAI). In Case-Based Reasoning Research and Development: 28th International Conference, ICCBR 2020, Salamanca, Spain, June 8–12, 2020, Proceedings 28, pages 163–178. Springer, 2020.
[24] Kentaro Kanamori, Takuya Takagi, Ken Kobayashi, and Hiroki Arimura. Dace: Distribution-aware counterfactual explanation by mixed-integer linear optimization. In IJCAI, pages 2855–2862, 2020.
[25] André Artelt and Barbara Hammer. Convex density constraints for computing plausible counterfactual explanations. In Artificial Neural Networks and Machine Learning–ICANN 2020: 29th International Conference on Artificial Neural Networks, Bratislava, Slovakia, September 15–18, 2020, Proceedings, Part I 29, pages 353–365. Springer, 2020.
[26] Yash Goyal, Ziyan Wu, Jan Ernst, Dhruv Batra, Devi Parikh, and Stefan Lee. Counterfactual visual explanations. In International Conference on Machine Learning, pages 2376–2384. PMLR, 2019.
[27] Rafael Poyiadzi, Kacper Sokol, Raul Santos-Rodriguez, Tijl De Bie, and Peter Flach. Face: feasible and actionable counterfactual explanations. In Proceedings of the AAAI/ACM Conference on AI, Ethics, and Society, pages 344–350, 2020.
[28] John C Gower. A general coefficient of similarity and some of its properties. Biometrics, 27(4):857–871, 1971.
[29] Kalyanmoy Deb, Amrit Pratap, Sameer Agarwal, and TAMT Meyarivan. A fast and elitist multiobjective genetic algorithm: NSGA-II. IEEE transactions on evolutionary computation, 6(2):182–197, 2002.
[30] Torsten Hothorn and Achim Zeileis. Predictive distribution modeling using transformation forests. Journal of Computational and Graphical Statistics, 30(4):1181–1196, March 2021.
[31] Diederik P Kingma and Max Welling. Auto-encoding variational bayes. In Proceedings of the $2^{nd}$ International Conference on Learning Representations, 2014.
[32] Ian Goodfellow, Jean Pouget-Abadie, Mehdi Mirza, Bing Xu, David Warde-Farley, Sherjil Ozair, Aaron Courville, and Yoshua Bengio. Generative adversarial nets. In Advances in Neural Information Processing Systems, volume 27, 2014.
[33] Danilo Rezende and Shakir Mohamed. Variational inference with normalizing flows. In Proceedings of the $32^{nd}$ International Conference on Machine Learning, pages 1530–1538. PMLR, 2015.
[34] Jonathan Ho, Ajay Jain, and Pieter Abbeel. Denoising diffusion probabilistic models. In Advances in Neural Information Processing Systems, volume 33, pages 6840–6851, 2020.
[35] Ashish Vaswani, Noam Shazeer, Niki Parmar, Jakob Uszkoreit, Llion Jones, Aidan N Gomez, Łukasz Kaiser, and Illia Polosukhin. Attention is all you need. In Advances in Neural Information Processing Systems, volume 30, 2017.
[36] Sam Bond-Taylor, Adam Leach, Yang Long, and Chris G Willcocks. Deep generative modelling: A comparative review of VAEs, GANs, normalizing flows, energy-based and autoregressive models. IEEE Transactions on Pattern Analysis and Machine Intelligence, 44(11):7327–7347, 2021.
[37] David Foster. Generative deep learning: Teaching machines to paint, write, compose, and play. O’Reilly Media, Inc., $2^{nd}$ edition, 2022.
[38] Vadim Borisov, Tobias Leemann, Kathrin Seßler, Johannes Haug, Martin Pawelczyk, and Gjergji Kasneci. Deep neural networks and tabular data: A survey. IEEE Transactions on Neural Networks and Learning Systems, 2022.
[39] Léo Grinsztajn, Edouard Oyallon, and Gaël Varoquaux. Why do tree-based models still outperform deep learning on typical tabular data? In Advances in Neural Information Processing Systems, volume 35, pages 507–520, 2022.
[40] Ravid Shwartz-Ziv and Amitai Armon. Tabular data: Deep learning is not all you need. Information Fusion, 81:84–90, 2022.
[41] Tao Shi and Steve Horvath. Unsupervised learning with random forest predictors. Journal of Computational and Graphical Statistics, 15(1):118–138, 2006.
[42] Alex Goldstein, Adam Kapelner, Justin Bleich, and Emil Pitkin. Peeking inside the black box: Visualizing statistical learning with plots of individual conditional expectation. Journal of Computational and Graphical Statistics, 24(1):44–65, January 2015.
[43] Friedrich Leisch and Evgenia Dimitriadou. mlbench: Machine learning benchmark problems, 2021. R package version 2.1-3.1.
[44] Tianqi Chen and Carlos Guestrin. XGBoost: A scalable tree boosting system. In Proceedings of the 22nd ACM SIGKDD International Conference on Knowledge Discovery and Data Mining, KDD ’16, pages 785–794, New York, NY, USA, 2016. ACM.
[45] Susanne Dandl, Andreas Hofheinz, Martin Binder, Bernd Bischl, and Giuseppe Casalicchio. counterfactuals: An R package for counterfactual explanation methods, 2023.
[46] Eckart Zitzler and Lothar Thiele. Multiobjective optimization using evolutionary algorithms — A comparative case study, page 292–301. Springer Berlin Heidelberg, 1998.
[47] Manuel López-Ibáñez, Luís Paquete, and Thomas Stützle. Exploratory analysis of stochastic local search algorithms in biobjective optimization, page 209–222. Springer Berlin Heidelberg, 2010.
[48] Javier Del Ser, Alejandro Barredo-Arrieta, Natalia Díaz-Rodríguez, Francisco Herrera, Anna Saranti, and Andreas Holzinger. On generating trustworthy counterfactual explanations. Information Sciences, 655:119898, 2024.
[49] Coffee Quality Institute. https://www.coffeeinstitute.org/. Last accessed: 2024-03-12.
[50] Amir-Hossein Karimi, Bernhard Schölkopf, and Isabel Valera. Algorithmic recourse: From counterfactual explanations to interventions. In Proceedings of the 2021 ACM conference on fairness, accountability, and transparency, pages 353–362, 2021.

Appendix A Algorithm 1: Integrating ARF into MOC

The following pseudocode is based on Algorithm 1 in [45]. Blue lines highlight the steps that differ from the original MOC algorithm proposed by [9].

Algorithm 1 MOC with ARF-based Sampler and Evaluation

Inputs:
Datapoint to explain prediction for $\mathbf{x}^{\star}\in\mathcal{X}$
Desired outcome (range) $Y_{des}$
Prediction function $\hat{f}:\mathcal{X}\rightarrow\mathbb{R}$
Observed data $D$
ARF $\hat{g}^{*}$ trained on $(\mathbf{x}_{i},\hat{f}(\mathbf{x}_{i}))_{i=1}^{n}$ with $\mathbf{x}_{i}\in D$
Number of generations $n_{\text{generations}}$
Size of population $\mu$
Recombination and mutation methods including probabilities
Selection method for features in the conditioning set and initialization method
Stopping criterion
(Additional user inputs, e.g., range of numerical features, immutable features, distance function, see [9])

1: Initialize population $P_{0}$ with $|P_{0}|=\mu$ (ICE-curve-based, see [9])
2: Evaluate candidates according to four objectives:
    • Validity ($L_{1}$)
    • Sparsity ($L_{0}$)
    • Proximity (Gower distance)
    • Plausibility (ARF-based likelihood transformed with $e^{-x}$)
3: Set $t=0$
4: for $r\in\{1,...,n_{\text{generations}}\}$ do
5:     $C_{t}=$ create_offspring($P_{t}$) with $|C_{t}|=\mu$ and given probabilities:
        1. Select best candidates (acc. to validity objective)
        2. Recombine these pairwise
        3. Mutate values jointly using $\hat{g}^{*}$: generate new datapoints with FORGE
6:     Combine parents and offspring: $R_{t}=C_{t}\cup P_{t}$
7:     Assign candidates to a front according to their objective values: $(F_{1},F_{2},...,F_{m})=$ nondominated_sorting($R_{t}$)
8:     for $i=1,...,m$ do
9:         Sort candidates acc. to diversity (objective and feature space): $\tilde{F}_{i}=$ crowding_distance_sort($F_{i}$)
10:     end for
11:     Set $P_{t+1}=\emptyset$ and $i=1$
12:     while $|P_{t+1}|+|\tilde{F}_{i}|\leq\mu$ do
13:         $P_{t+1}=P_{t+1}\cup\tilde{F}_{i}$
14:         $i=i+1$
15:     end while
16:     Choose first $\mu-|P_{t+1}|$ elements of $\tilde{F}_{i}$: $P_{t+1}=P_{t+1}\cup\tilde{F}_{i}[1:(\mu-|P_{t+1}|)]$
17:     $t=t+1$
18: end for
19: Return unique, non-dominated candidates of $\bigcup_{k=0}^{t}P_{k}\setminus\mathbf{x}^{\star}$ with $\hat{f}(\mathbf{x}_{CF})\in Y_{des}$
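The survivor selection at the end of each generation (filling $P_{t+1}$ front by front and truncating the first front that no longer fits) can be sketched in Python as follows. This is a minimal illustration, not the implementation in the counterfactuals package [45]; the function name `select_survivors` and the representation of the crowding-distance-sorted fronts as a list of lists are assumptions made for this example.

```python
def select_survivors(sorted_fronts, mu):
    """Build the next population of size at most mu from sorted fronts.

    sorted_fronts: list of fronts, best front first; each front is already
    ordered by decreasing diversity (assumed to be done beforehand).
    """
    next_population = []
    i = 0
    # Add whole fronts as long as they fit completely.
    while (i < len(sorted_fronts)
           and len(next_population) + len(sorted_fronts[i]) <= mu):
        next_population.extend(sorted_fronts[i])
        i += 1
    # Fill the remaining slots with the most diverse members of the next front.
    if i < len(sorted_fronts):
        remaining = mu - len(next_population)
        next_population.extend(sorted_fronts[i][:remaining])
    return next_population


# Hypothetical candidate labels grouped into three fronts
fronts = [["a", "b"], ["c", "d", "e"], ["f"]]
survivors = select_survivors(fronts, mu=4)
```

With a population size of four, the first front is taken completely and the second front contributes only its two most diverse members.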

Appendix B Algorithm 2: ARF is all you need

Algorithm 2 ARF-based Counterfactual Generator

Inputs:
Datapoint to explain prediction for $\mathbf{x}^{\star}\in\mathcal{X}$
Desired outcome (range) $Y_{des}$
Prediction function $\hat{f}:\mathcal{X}\rightarrow\mathbb{R}$
Observed data $D$
ARF $\hat{g}^{*}$ trained on data $(\mathbf{x}_{i},\hat{f}(\mathbf{x}_{i}))_{i=1}^{n}$ with $\mathbf{x}_{i}\in D$
Maximum number of feature changes $m_{max}$
Number of iterations $n_{\text{iterations}}$
Number of samples generated in each iteration $n_{synth}$
(Additional user inputs, e.g., immutable features)

1: Derive local importances $(\text{fi}_{j})_{j=1}^{p}$ for each feature $j\in\{1,...,p\}$ (ICE-curve-based, see [9])
2: for $r\in\{1,...,n_{\text{iterations}}\}$ do
3:     $m\leftarrow\texttt{sample}(1,...,m_{max})$
4:     Select set $C\subset\{1,...,p\}$ by randomly sampling $m$ features with probability proportional to how unimportant each feature is
5:     $CF\leftarrow$ sample $n_{synth}$ observations with FORGE derived from $\hat{g}^{*}$ under the condition that $\forall j\in C:X_{j}=x_{j}^{\star}$ and $\hat{Y}\in Y_{des}$
6:     $\mathbf{X}_{CF}\leftarrow(\mathbf{X}_{CF},CF)$
7: end for
8: Return unique, nondominated candidates $\mathbf{x}_{CF}\in\mathbf{X}_{CF}$ with $\hat{f}(\mathbf{x}_{CF})\in Y_{des}$
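The selection of the conditioning set $C$ can be illustrated with a short Python sketch of weighted sampling without replacement. The algorithm above does not specify how unimportance is computed from the local importances; the weight used here (largest importance minus the feature's importance, plus a small constant) is an assumption for illustration, as are the function name and the example values.

```python
import random


def sample_conditioning_set(importances, size, rng):
    """Draw `size` distinct feature indices, favoring unimportant features."""
    top = max(importances)
    # Assumed unimportance weight; the small constant keeps all weights positive.
    weights = {j: (top - fi) + 1e-6 for j, fi in enumerate(importances)}
    chosen = []
    for _ in range(size):
        indices = list(weights)
        j = rng.choices(indices, weights=[weights[k] for k in indices], k=1)[0]
        chosen.append(j)
        del weights[j]  # sample without replacement
    return sorted(chosen)


rng = random.Random(0)
local_importances = [0.9, 0.1, 0.05, 0.5]  # hypothetical values
m_max = 3
m = rng.randint(1, m_max)  # m <- sample(1, ..., m_max)
conditioning_set = sample_conditioning_set(local_importances, m, rng)
```

Features in the returned set would be held at their values in $\mathbf{x}^{\star}$ during conditional sampling, so locally unimportant features are more likely to stay fixed.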

Appendix C Synthetic Data

In the following, we describe the three illustrative datasets as well as the sampling of the randomly generated data-generating processes (DGPs). The code that was used to generate the datasets, together with pair plots visualizing their distributions, can be found in the repository accompanying the paper (https://github.com/bips-hb/countARFactuals). For an explanation of how to run the code, we refer to python/README.md; the visualizations can be found in the folder python/visualizations/.

C.1 Illustrative datasets

Cassini

The DGP, inspired by [43], is defined as follows:

$\displaystyle Y\sim Y_{1}+Y_{1}Y_{2}\quad\text{with}\quad Y_{1}\sim Bern(2/3),\;Y_{2}\sim Bern(0.5)$
$\displaystyle X_{1}\mid Y_{1}\sim\begin{cases}N(0,0.2)&\text{if }Y_{1}=0\\ N(0,0.5)&\text{otherwise}\end{cases}$
$\displaystyle X_{2}\mid X_{1},Y_{1},Y_{2}\sim\begin{cases}N(0,0.2)&\text{if }Y_{1}=0\\ N((-1)^{Y_{2}}\cos(X_{1}),0.2)&\text{otherwise}\end{cases}$
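A minimal Python sketch of sampling from this DGP is given below. It is not the code from the accompanying repository, and it assumes that the second argument of $N(\cdot,\cdot)$ denotes the standard deviation, which the definition above does not state; the function name is hypothetical.

```python
import math
import random


def sample_cassini(n, rng):
    """Draw n rows (x1, x2, y) from the Cassini-style DGP."""
    rows = []
    for _ in range(n):
        y1 = int(rng.random() < 2 / 3)   # Y1 ~ Bern(2/3)
        y2 = int(rng.random() < 0.5)     # Y2 ~ Bern(0.5)
        y = y1 + y1 * y2                 # Y takes values in {0, 1, 2}
        # Assumption: second parameter of N(., .) is the standard deviation.
        x1 = rng.gauss(0.0, 0.2 if y1 == 0 else 0.5)
        mean_x2 = 0.0 if y1 == 0 else (-1) ** y2 * math.cos(x1)
        x2 = rng.gauss(mean_x2, 0.2)
        rows.append((x1, x2, y))
    return rows


cassini_rows = sample_cassini(500, random.Random(42))
```

The class $Y=0$ forms a central cluster, while $Y=1$ and $Y=2$ lie on the curves $\cos(X_{1})$ and $-\cos(X_{1})$, respectively.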

Two Sines

The DGP, inspired by the two moons dataset, is specified as:

\displaystyle Y\sim Bern(0.5),\quad X_{1}\mid Y\sim N(Y,3.0),\quad X_{2}\mid Y,X_{1}\sim N(\sin(X_{1})-2Y+1,0.3)
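The same kind of sketch applies here. Again, treating the second argument of $N(\cdot,\cdot)$ as a standard deviation is an assumption, and the function name is hypothetical.

```python
import math
import random


def sample_two_sines(n, rng):
    """Draw n rows (x1, x2, y) from the Two Sines DGP."""
    rows = []
    for _ in range(n):
        y = int(rng.random() < 0.5)                   # Y ~ Bern(0.5)
        x1 = rng.gauss(y, 3.0)                        # X1 | Y
        x2 = rng.gauss(math.sin(x1) - 2 * y + 1, 0.3) # X2 | Y, X1
        rows.append((x1, x2, y))
    return rows


two_sines_rows = sample_two_sines(500, random.Random(7))
```

The two classes follow the same sine curve, shifted to $\sin(X_{1})+1$ for $Y=0$ and $\sin(X_{1})-1$ for $Y=1$.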

Pawelczyk

The DGP, taken from [15], is defined below, where $I_{2}$ refers to the $2\times 2$ identity matrix and $Cat({\textstyle\frac{1}{3},\frac{1}{3},\frac{1}{3}})$ to the uniform categorical distribution with values $0,1,2$.

$\displaystyle L\sim Cat({\textstyle\frac{1}{3},\frac{1}{3},\frac{1}{3}})$
$\displaystyle X\mid\mu\sim N(\mu,I_{2}),\quad\mu\mid L=\begin{cases}(-10,5)^{T}&\text{if }L=0\\ (0,5)^{T}&\text{if }L=1\\ (0,0)^{T}&\text{otherwise}\end{cases}$
$\displaystyle Y(X):=X_{2}>6$
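Since the covariance is the identity matrix, the two coordinates can be drawn independently with unit standard deviation. The following Python sketch, with hypothetical names, illustrates this; it is not the code from the accompanying repository.

```python
import random

# Component means indexed by the latent label L
COMPONENT_MEANS = {0: (-10.0, 5.0), 1: (0.0, 5.0), 2: (0.0, 0.0)}


def sample_pawelczyk(n, rng):
    """Draw n rows (x1, x2, y) from the three-component Gaussian mixture."""
    rows = []
    for _ in range(n):
        label = rng.randrange(3)        # L ~ Cat(1/3, 1/3, 1/3)
        mu1, mu2 = COMPONENT_MEANS[label]
        x1 = rng.gauss(mu1, 1.0)        # identity covariance: independent,
        x2 = rng.gauss(mu2, 1.0)        # unit-variance coordinates
        y = int(x2 > 6)                 # Y(X) := X2 > 6
        rows.append((x1, x2, y))
    return rows


pawelczyk_rows = sample_pawelczyk(1000, random.Random(3))
```

Positive labels can only arise in the upper tail of the two components centered at $X_{2}=5$, so the positive class is the minority class in this dataset.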

C.2 Randomly generated DGPs

For the generation of bn_5, bn_10, and bn_20, we randomly sample Bayesian networks with categorical and continuous distributions as well as linear and nonlinear relationships.

1. First, we randomly sample a Directed Acyclic Graph (DAG) using the networkx package. We select $Y$ as the root node. To make sure that $Y$ is related to many of the features, for each node that is not directly neighboring $Y$, a directed edge is added with probability $0.5$ (directed such that the graph remains acyclic).

2. From all nodes, $20\%$ are randomly selected to be categorical nodes; $Y$ is always selected to be a categorical node.

3. For every node $j$, an aggregation function $g$ is sampled that maps the parent values $x_{pa(j)}$ to an aggregate, which then parameterizes the distribution of the respective node.

g(x)=\beta+\beta_{1}h(x)+\beta_{2}h(x)^{2}\quad\text{with}\quad h(x)=\sin\left(\sum_{i\in pa(j)}w_{i}x_{i}\right)

The weights $w$ are sampled from $Unif(-1,1)+3Bern(3/d)$. To make it more likely that $Y$ can be predicted well from its covariates, weights concerning $Y$ are increased by $d\sim Unif(3,4)$ with probability $0.1$. The weight vector $w$ is normalized. The coefficients $\beta$ are sampled from $Unif(-1,1)$.

4. If the feature is categorical, the respective Bernoulli is parameterized with the sigmoid of the aggregate of the parents $Bern(\varsigma(g(x)))$. Continuous features follow $N(\mu,\sigma)$ with $\mu\sim N(g(x),1)$ and $\sigma\sim N(0,2)$.
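The aggregation function from step 3 and the Bernoulli parameterization from step 4 can be sketched in Python as follows. The function and argument names are hypothetical; `beta0` stands for the intercept written $\beta$ in the formula above, and the weights and coefficients are passed in rather than sampled.

```python
import math


def aggregate(parent_values, weights, beta0, beta1, beta2):
    """g(x) = beta0 + beta1 * h(x) + beta2 * h(x)^2 with h(x) = sin(sum_i w_i x_i)."""
    h = math.sin(sum(w * x for w, x in zip(weights, parent_values)))
    return beta0 + beta1 * h + beta2 * h ** 2


def sigmoid(z):
    """Logistic function mapping the aggregate to a Bernoulli probability."""
    return 1.0 / (1.0 + math.exp(-z))


# Hypothetical parent values, weights, and coefficients
g_value = aggregate([0.0, 0.0], [0.6, 0.8], beta0=0.2, beta1=0.5, beta2=-0.3)
p_categorical = sigmoid(g_value)  # success probability of a categorical node
```

Because $h(x)$ is bounded in $[-1,1]$, the aggregate stays within a bounded range regardless of the scale of the parent values.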

To ensure that a prediction model fitted on the data can discriminate between the classes and that changing the prediction to the desirable class is feasible, we randomly generated datasets until we found one with balanced labels ( $0.4<E[Y]<0.6$ ) for which an XGBoost model demonstrated good accuracy ( $>0.95$ ) and balanced predictions ( $0.3<E[\hat{Y}]<0.7$ ).
