
LIMEtree: Consistent and Faithful Multi-class Explanations

Kacper Sokol (ORCID: 0000-0002-9869-5896), Intelligent Systems Laboratory, University of Bristol, United Kingdom, and Peter Flach (ORCID: 0000-0001-6857-5810), Intelligent Systems Laboratory, University of Bristol, United Kingdom

Abstract.

Explainable artificial intelligence provides tools to better understand predictive models and their decisions, but many such methods are limited to producing insights with respect to a single class. When generating explanations for several classes, reasoning over them to obtain a complete view may be difficult since they can present competing or contradictory evidence. To address this challenge we introduce the novel paradigm of multi-class explanations. We outline the theory behind such techniques and propose a local surrogate model based on multi-output regression trees – called LIMEtree – that offers faithful and consistent explanations of multiple classes for individual predictions while being post-hoc, model-agnostic and data-universal. On top of strong fidelity guarantees, our implementation delivers a range of diverse explanation types, including counterfactual statements favoured in the literature. We evaluate our algorithm with respect to explainability desiderata, through quantitative experiments and via a pilot user study, on image and tabular data classification tasks, comparing it to LIME, which is a state-of-the-art surrogate explainer. Our contributions demonstrate the benefits of multi-class explanations and wide-ranging advantages of our method across a diverse set of scenarios.

Model-agnostic • Post-hoc • Surrogate • Decision Tree • Explainability • Interpretability • Machine Learning • Artificial Intelligence


Highlights

• The paper proposes a novel XAI paradigm to coherently explain predictions of multiple classes for a selected data point.

• This approach helps to mitigate automation bias and is compatible with the hypothesis-driven decision support conceptualisation of XAI.

• Our explainer – called LIMEtree – is a post-hoc, model-agnostic and data-universal surrogate based on multi-output regression trees.

• It achieves high – and in certain cases full – fidelity, thus offers robust, reliable and trustworthy explanations.

Source Code: https://github.com/So-Cool/bLIMEy/tree/master/ECML-PKDD_2023

1. Introduction

Explainability of predictive systems based on artificial intelligence (AI) algorithms has become one of their most desirable properties (Rudin, 2019; Sokol and Flach, 2021). While a wide array of explanation types – supplemented by numerous techniques to generate them – is available (Guidotti et al., 2018), contrastive statements are dominant (Miller, 2018; Wachter et al., 2017; Poyiadzi et al., 2020; Romashov et al., 2022; Waa et al., 2018). Their particular realisation in the form of counterfactual examples is the most ubiquitous given its everyday usage among humans and solid foundations in social sciences (Miller, 2018) as well as its compliance with various legal frameworks (Wachter et al., 2017). Such insights are usually of the form: “Had certain aspects of the given case been different, the predictive model would behave like so instead.” The conditional part of this proposition usually prescribes a change in the feature vector of a particular data point, whereas the hypothetical fragment of the statement tends to capture the resulting difference in class prediction.

While offering a very appealing recipe for swaying an automated decision, these explanations are intrinsically restricted to a pair of outcomes, which may impact their utility, effectiveness and comprehensibility. They can either highlight an explicit contrast between two classes – “Why $A$ rather than $B$?” – or be implicit instead – “Why $A$ (as opposed to anything else)?” As a result, counterfactuals, but also single-class explanations more broadly, have been shown to simply justify conclusions of AI systems, which may be counterproductive as it implicitly limits the number of possibilities that the explainees consider, thus biasing their perception, impeding independent reasoning and yielding unwarranted reliance on AI or preventing trust from developing altogether (Byrne, 2023).

In human explainability this limitation can be overcome with follow-up questions, progressively exploring and narrowing down the scope of the lack of understanding until finally eliminating it. One could imagine generating multiple counterfactuals across all the foils to mimic this process, e.g., “Why $A$ (and not $B$ or $C$ )?”, “Why $A$ rather than $B$ ?”, “Why $A$ instead of $C$ ?”, “Why $B$ (and not $A$ or $C$ )?”, “Why $B$ instead of $A$ ?”, etc., for three outcomes $A$ , $B$ and $C$ . Other explainability methods could also be employed in this scenario to provide a wider gamut of insights varying in scope, complexity and explanation target. Such approaches embody the recent hypothesis-driven decision support conceptualisation of explainable AI (XAI), which aims to provide diverse evidence for a data-driven prediction instead of offering a recommendation to simply accept or reject a pre-selected AI decision (Miller, 2023); this process keeps the explainees engaged instead of displacing them, utilises their expertise, and mitigates over- and under-dependence on automation.

However, implementing this paradigm with current XAI tools is likely to fail given that they tend to generate independent insights whose one-class limitation prevents them from capturing and communicating a congruent bigger picture. The lack of a single origin and shared context may yield insights that do not overlap or are outright contradictory – different conditionals used by counterfactuals and disparate pieces of evidence output by other techniques – preventing the explainees from drawing coherent conclusions and adversely affecting their trust and decision-making capabilities (Weld and Bansal, 2019). While a promising research direction, to the best of our knowledge the challenge of generating inherently consistent explanations of multiple classes has neither been addressed for counterfactuals nor any other explanation type. In this paper we fill this gap by introducing the novel concept of multi-class explanations, where individual insights pertaining to different predictions (of a selected instance) originate from a single explanatory source.

To this end, we:

(i) define a multi-class explainability optimisation objective;

(ii) operationalise it in the form of a local surrogate;

(iii) offer an algorithm for building multi-class explainers; and

(iv) implement it with multi-output regression trees.

We evaluate our method – called LIMEtree – along three dimensions: analytical assessment of human-centred XAI desiderata; quantitative experiments on tabular and image data measuring explainer fidelity; and qualitative user study capturing explainees’ preferences. We choose to demonstrate multi-class explainability with a surrogate since this design yields an explainer that is post-hoc – i.e., capable of being retrofitted to pre-existing AI systems – model-agnostic – i.e., compatible with any predictive algorithm – and data-universal – i.e., suitable for tabular, text and image domains. Additionally, by using a tree as the surrogate, LIMEtree offers a broad range of explanation types such as model structure visualisation, feature importance, exemplars, logical rules, what-ifs, and, most importantly, counterfactuals (Sokol, 2021). This suite of investigative mechanisms supports diverse explanation scopes spanning model simplification, sub-space approximation and prediction rationales.

LIMEtree offers solutions to many shortcomings of currently available surrogate explainers in addition to addressing limitations found across the social and technical dimensions of XAI. Specifically, by using (shallow) regression trees as the surrogate models, it can guarantee full fidelity of the explanations with respect to the investigated black box under certain conditions, thus addressing one of the major criticisms of post-hoc approaches (Rudin, 2019). The flexible explanation generation process additionally enables it to comply with a range of desiderata such as feasibility and actionability (Poyiadzi et al., 2020) as well as facilitate algorithmic recourse (Karimi et al., 2021), to name just a few (Sokol and Flach, 2020a). The availability of multiple diverse explanation types also allows it to provide explainability to a broad range of stakeholders and satisfy their diverse needs. With all of these contributions we hope to launch multi-class explainability as a novel XAI research direction.

[Figure: (a) Interpretable representation.]

2. Related Work and Background

LIMEtree builds upon two prominent findings in XAI: counterfactuals (Miller, 2018; Wachter et al., 2017) and surrogate explainers (Ribeiro et al., 2016; Sokol et al., 2019; Sokol and Flach, 2024; Sokol, 2021; Sokol et al., 2022a). As noted earlier, the former are lauded for their human-centred aspects, and the latter exhibit numerous appealing technical properties, making them one of the most flexible types of explainers. In a nutshell, surrogates mimic the behaviour of more complex, hence opaque, predictive systems either locally or globally with simpler, inherently interpretable models, thereby offering human-comprehensible insights into their operation (Craven and Shavlik, 1996; Ribeiro et al., 2016). Unlike surrogates and counterfactuals, multi-class explainability is a largely under-explored topic. While counterfactual explanations can be generated for multiple classes (Carlevaro et al., 2023), such insights may not present a coherent perspective given that they can be conditioned on different sets of features. One of the very few pieces of work – if not the only one – that directly addresses this challenge expands Generalised Additive Models (GAMs (Hastie and Tibshirani, 1986)) – which are inherently transparent and powerful predictors popular in high stakes domains (Lou et al., 2012) – to multiple classes (Zhang et al., 2019b).

LIME (Ribeiro et al., 2016) is one of the most popular surrogate approaches; it uses sparse linear regression to explain (probabilistic) black-box predictions. It augments the classic paradigm of surrogate explainers with interpretable representations (IR) of raw data, which makes them compatible with a variety of data domains (such as images and text) and extends their applicability beyond inherently interpretable features (of tabular data). High modularity and flexibility of these explainers (Sokol et al., 2019) encouraged the research community to compose their different variants, some of which use decision trees as the (local) surrogate model (Waa et al., 2018; Shi et al., 2019; Sokol and Flach, 2024; Sokol, 2021). For example, Waa et al. (2018) showed how a local one-vs-rest classification tree can be used to produce contrastive explanations; and Shi et al. (2019) fitted a local shallow regression tree whose structure constitutes an explanation. Interpretability of decision trees and their ensembles has also been investigated outside of the surrogate explainability context (Tolomei et al., 2017; Sokol and Flach, 2018, 2020b; Sokol, 2021). Sokol and Flach (2018, 2020b) demonstrated how to interactively extract personalised counterfactuals from a decision tree, and Tolomei et al. (2017) introduced a method to explain predictions made by tree ensembles also with counterfactuals.

More specifically, LIME builds a local surrogate model $g\in\mathcal{G}$ to explain the prediction of an instance $\mathring{x}\in\mathcal{X}$ with respect to a selected class $c$ for a probabilistic black box $f:\mathcal{X}\mapsto\mathcal{Y}$ , where $\mathcal{G}$ is the space of (sparse linear) surrogate models, $\mathcal{X}$ is the input data domain, $\mathcal{Y}=[0,1]^{n}$ for $n\in\mathcal{N}^{+}$ target classes, and $c\in[1,\ldots,n]$ . To this end, it employs a user-defined interpretable representation transformation function $\mathit{IR}:\mathcal{X}\mapsto\mathcal{X}^{\prime}$ , which encodes presence ( $1$ ) and absence ( $0$ ) of $d\in\mathcal{N}^{+}$ selected human-comprehensible concepts found in a data point $x\in\mathcal{X}$ , i.e., $\mathcal{X}^{\prime}=\{0,1\}^{d}$ . Additionally, $\mathit{IR}$ is defined such that the explained instance is assumed to have all of the concepts present, i.e., $\mathit{IR}(\mathring{x})=\mathring{x}^{\prime}=[1,\ldots,1]$ , which is an all- $1$ vector. This step allows us to generate “conceptual” variations of $\mathring{x}$ by drawing a collection of binary vectors $X^{\prime}=\{x^{\prime}:x^{\prime}\in\mathcal{X}^{\prime}\}$ .

Next, $X^{\prime}$ is converted back to the original data domain $\mathcal{X}$ using the inverse of the interpretable representation transformation function $\mathit{IR}^{-1}:\mathcal{X}^{\prime}\mapsto\mathcal{X}$, i.e., $X=\{\mathit{IR}^{-1}(x^{\prime}):x^{\prime}\in X^{\prime}\}$, which facilitates predicting these instances with the explained black box $f$, focusing on the probabilities of the explained class $c$, i.e., $Y_{c}=\{f_{c}(x):x\in X\}$. These predictions capture the influence of (the presence of) each human-comprehensible concept on the (change in) prediction of class $c$. We can quantify this dependence by fitting sparse linear regression to the binary sample $X^{\prime}$ and probabilities $Y_{c}$. This procedure can be focused on a specific aspect of the data sample by computing the distance $\ell$ between each sampled point and the explained instance either in the original or interpretable representation – i.e., $\ell:\mathcal{X}\times\mathcal{X}\mapsto\mathbb{R}$ or $\ell:\mathcal{X}^{\prime}\times\mathcal{X}^{\prime}\mapsto\mathbb{R}$ – which is then transformed into a similarity measure by passing it through a kernel $\kappa:\mathbb{R}\mapsto\mathbb{R}$ and used as a weight factor for training the surrogate model. This step allows us to prioritise smaller changes to the instance, e.g., give more significance to samples with fewer alterations in the concept space.
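The sampling and weighting steps described above can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions rather than the LIME or LIMEtree implementation: the token list `TOKENS`, the stand-in class probability `black_box_c`, the Hamming distance and the exponential kernel with its `width` parameter are all hypothetical choices made for the example, and the final sparse linear fit is omitted.

```python
import math
import random

TOKENS = ["good", "plot", "bad", "acting"]  # hypothetical concepts, d = 4


def ir_inverse(x_prime):
    """Toy IR^{-1} for text: keep the tokens whose indicator is 1."""
    return [tok for tok, keep in zip(TOKENS, x_prime) if keep == 1]


def black_box_c(tokens):
    """Stand-in for f_c: a made-up probability of the explained class c."""
    score = 0.5 + 0.3 * ("good" in tokens) - 0.4 * ("bad" in tokens)
    return min(1.0, max(0.0, score))


def similarity(x_prime, explained, width=2.0):
    """Kernelised Hamming distance between a sample and the explained instance."""
    distance = sum(a != b for a, b in zip(x_prime, explained))
    return math.exp(-(distance ** 2) / (width ** 2))


rng = random.Random(0)
explained = [1] * len(TOKENS)  # IR of the explained instance is the all-1 vector
# Binary sample X': the explained instance plus nine random concept subsets.
X_prime = [explained] + [[rng.randint(0, 1) for _ in TOKENS] for _ in range(9)]
# Black-box probabilities of class c for the samples mapped back with IR^{-1}.
Y_c = [black_box_c(ir_inverse(x)) for x in X_prime]
# Similarity weights used when fitting the surrogate.
weights = [similarity(x, explained) for x in X_prime]
```

The triple (`X_prime`, `Y_c`, `weights`) is what a weighted surrogate would be trained on; samples with fewer removed concepts receive weights closer to 1.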

LIME optimises fidelity of the surrogate, i.e., its ability to approximate the predictive behaviour of the explained black box, and complexity of the resulting explanation, i.e., its human-comprehensibility, which objective $\mathcal{O}$ is formalised in Equation 1. Complexity $\Omega$ , in case of linear models, is computed as the number of non-zero (or significantly larger than zero) coefficients $\Theta_{g}$ of the surrogate $g$ – see Equation 2. High fidelity entails small loss $\mathcal{L}$ – Equation 3 – calculated between the outputs of the black box $f$ and the surrogate $g$ , which is measured empirically on the data sampled “around” the explained instance. Individual loss components are weighted by similarity scores – $\omega(x;\;\mathring{x})$ for $x\in X$ or $\omega(x^{\prime};\;\mathring{x}^{\prime})$ for $x^{\prime}\in X^{\prime}$ depending on the domain – derived by kernelising distance between the explained instance and sampled data. This loss is inspired by Weighted Least Squares, where the weights are similarity scores.

(1)

\mathcal{O}(\mathcal{G};\;f)=\operatorname*{arg\,min}_{g\in\mathcal{G}}\underbrace{\mathcal{L}(f,g)}_{\text{fidelity}}+\underbrace{\Omega(g)}_{\text{complexity}}

(2)

\Omega(g)=\sum_{\theta\in\Theta_{g}}\mathds{1}(|\theta|>0)\;/\;|\Theta_{g}|

(3)

\begin{split}
\mathcal{L}(f,g;\;X^{\prime},\mathring{x},c)&=\frac{1}{\sum_{x^{\prime}\in X^{\prime}}\omega\left(x^{\prime};\;\mathit{IR}(\mathring{x})\right)}\\
&\sum_{x^{\prime}\in X^{\prime}}\omega\left(x^{\prime};\;\mathit{IR}(\mathring{x})\right)\;\left(f_{c}\left(\mathit{IR}^{-1}(x^{\prime})\right)-g(x^{\prime})\right)^{2}\\
&\text{where}\quad\omega(x^{\prime};\;\mathring{x}^{\prime})=\kappa\left(\ell\left(x^{\prime},\;\mathring{x}^{\prime}\right)\right)
\end{split}
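To make the weighted loss of Equation 3 concrete, the following Python sketch evaluates it for three sampled points. The probabilities, surrogate outputs and similarity weights are made-up numbers chosen for illustration only.

```python
def weighted_loss(targets, predictions, weights):
    """Equation 3: similarity-weighted mean squared error for a single class."""
    total_weight = sum(weights)
    weighted_errors = sum(
        w * (y - p) ** 2 for y, p, w in zip(targets, predictions, weights)
    )
    return weighted_errors / total_weight


# Hypothetical values for three sampled points.
f_c = [0.9, 0.5, 0.1]    # black-box probabilities of the explained class
g = [0.9, 0.0, 0.1]      # surrogate predictions for the same points
omega = [1.0, 0.5, 0.5]  # similarity weights of the samples

loss = weighted_loss(f_c, g, omega)  # (0.5 * 0.25) / 2.0 = 0.0625
```

Only the second sample is mispredicted; because its weight is 0.5, it contributes half of its squared error before normalisation by the total weight.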

The precise definition of the interpretable representation transformation function $\mathit{IR}$ depends on the data domain. For text, $\mathit{IR}$ splits it into $d$ tokens, e.g., using the bag-of-words approach, whose presence ($1$) or absence ($0$) is encoded by $\mathcal{X}^{\prime}$; setting a component of this domain to $0$ is thus equivalent to removing a token from a text excerpt. For images, this domain transformation relies on super-pixel partition of a picture into $d$ non-overlapping patches whose binary vector encoding indicates whether a particular segment is preserved ($1$) or discarded ($0$); since parts of an image cannot be removed directly, an occlusion proxy that replaces selected patches with a predetermined colour is used. Figure 1 shows an interpretable representation of an image and its LIME explanations for the top three predictions. For tabular data, the $\mathit{IR}$ function is more complex; continuous features are first discretised and then, together with any remaining categorical attributes, binarised. The latter step assigns, separately for every feature, $1$ to the discrete partition where the explained instance is located, with all the other partitions merged and represented by $0$. As a result, the mapping between $\mathcal{X}^{\prime}$ and $\mathcal{X}$ tends to be non-deterministic, unlike the corresponding $\mathit{IR}$ transformation for image and text data. Further information about surrogate explainers – including their generalisation and in-depth analysis of individual building blocks – can be found in the literature (Sokol et al., 2019, 2022a; Sokol and Flach, 2024; Sokol, 2021).
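The many-to-one behaviour of the tabular interpretable representation can be illustrated with a short Python sketch. It assumes a simple threshold-based discretisation; the bin edges, the explained instance and the helper name `tabular_ir` are hypothetical and serve only to show why distinct instances may share one binary encoding.

```python
import bisect


def tabular_ir(x, explained, bin_edges):
    """Binarise x feature by feature: 1 if it shares the explained instance's bin."""
    binary = []
    for value, reference, edges in zip(x, explained, bin_edges):
        same_bin = bisect.bisect_right(edges, value) == bisect.bisect_right(
            edges, reference
        )
        binary.append(1 if same_bin else 0)
    return binary


bin_edges = [[10.0, 20.0], [0.5]]  # hypothetical discretisation thresholds
explained = [15.0, 0.7]            # hypothetical explained instance

a = tabular_ir([12.0, 0.9], explained, bin_edges)  # same bins for both features
b = tabular_ir([25.0, 0.9], explained, bin_edges)  # first feature above its bin
c = tabular_ir([5.0, 0.9], explained, bin_edges)   # first feature below its bin
```

Here `b` and `c` receive the same binary vector even though they come from different bins of the first feature, so the inverse transformation has to choose among several candidate instances, which is the source of non-determinism.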

3. LIMEtree

LIME fits a separate surrogate model to the probabilities of each class of interest. This makes the process of discovering the dependencies between multiple classes challenging as each explanation needs to be interpreted in isolation. A surrogate fitted to class $A$ is implicitly a one-vs-rest explainer since it can only answer questions about the probability of this single class, with the complementary probability $p(\neg A)=1-p(A)$ modelling the union of all the other classes $\neg A\equiv B\cup C\cup\cdots$ . Interpreting the magnitude of the probability $p(A)$ output by a surrogate trained for class $A$ can also be problematic when explaining multi-class black boxes. For example, if $p(A)\leq 0.5$ , we cannot be certain whether there is a single class $B$ with $p(B)>p(A)$ , or alternatively the combined probability of all the complementary classes $p(\neg A)$ is greater than or equal to $p(A)$ with no single class dominating over $p(A)$ .

Moreover, linear surrogates are unable to model target variables that are non-linear with respect to input features – a property that does not necessarily hold for high-level features such as the concepts encoded by IRs. Their high inter-dependence may also have adverse effects on explanation quality. Additionally, modelling probabilities with linear regression risks confusing the explainees who expect an output bounded between $0$ and $1$ but may be given a numerical prediction outside of this range.

We address the challenge of simultaneously explaining multiple classes of a prediction output by a probabilistic model by proposing a first-of-a-kind surrogate explainer based on multi-output regression trees. It facilitates multi-class modelling in a regression setting, allowing the surrogate to capture the interactions between multiple classes, hence explain them coherently. Each node of such a tree approximates the probabilities of every explained class – which level of detail is impossible to achieve with surrogate multi-class classifiers – thus reflecting how individual interventions in the interpretable domain affect the predictions. Figure 2 shows an example of a surrogate multi-output regression tree. This is a significant improvement over training a separate regression surrogate for each explained class, which may produce diverse, inconsistent, competing or contradictory explanations – thus risk confusing the explainees and put their trust at stake – whenever these models do not share a common tree structure or split on different feature subsets. Our contributions establish a new direction in XAI research – concerned with consistent and faithful explanations of multiple classes – and offer a pioneering method to address this challenge.

Moreover, using decision trees (Breiman et al., 1984) as surrogates overcomes the shortcomings identified when linear models are used to this end (Sokol et al., 2019; Sokol and Flach, 2024; Sokol, 2021). Trees neither presuppose independence of features nor existence of a linear relationship between them and the target variable. While surrogate regression trees that approximate the probability of a single class are guaranteed to output a number within the $[0,1]$ range – since the estimate is calculated as an average – this may not necessarily hold for multi-output trees. Approximating probabilities of multiple classes by averaging their values across a number of instances may yield estimates whose sum is greater than $1$; nonetheless, these values can be rescaled to avoid confusing the explainees.

While surrogates based on linear models are limited to (interpretable) feature influence explanations – see Figure 1 – employing trees offers a broad selection of diverse explanation types. These include: (1) visualisation of the tree structure; (2) tree-based (interpretable) feature importance (Gini importance (Breiman, 2001)); (3) logical conditions extracted from root-to-leaf paths; (4) exemplar explanations taken from the training data assigned to the same leaf; (5) answers to what-if questions generated based on the tree structure (e.g., by querying the model); and (6) counterfactuals retrieved by comparing and applying logical reasoning to different tree paths (Sokol, 2021). The first two explanation types uncover the behaviour of a black box in a given data sub-space; the remainder targets specific predictions. Since all six explanation types – see Section 5 for their examples – are derived from a single (surrogate) model, they are guaranteed to be coherent and their diversity should appeal to a wide range of audiences.

To ensure low complexity and high fidelity of our multi-output regression trees, we employ the optimisation objective $\mathcal{O}$ from Equation 1. Since we are using surrogate trees, we modify the model complexity function $\Omega$ to measure the depth or width (number of leaves) of the tree as given by Equation 4, where $d$ is the dimensionality of the binary interpretable domain $\mathcal{X}^{\prime}$ . This choice depends on the type of the explanation that we want to extract from the surrogate tree; e.g., depth may be preferred when visualising the tree structure or extracting decision rules. In some cases, such as unbalanced trees, optimising for width or a mixture of the two may be more desirable. We also adapt the loss function $\mathcal{L}$ to account for the surrogate tree $g$ outputting multiple values in a single prediction as shown in Equation 5, where $C\subseteq[1,\ldots,n]$ are the classes to be explained by $g$ , for which the $c$ subscript in $g_{c}(x^{\prime})$ indicates the prediction of a selected class $c\in C$ for the data point $x^{\prime}$ .

(4)

\Omega(g;\;d)=\frac{\text{depth}(g)}{d}\qquad\text{or}\qquad\Omega(g;\;d)=\frac{\text{width}(g)}{2^{d}}

(5)

\begin{gathered}
\mathcal{L}(f,g;\;X^{\prime},\mathring{x},C)=\frac{1}{\sum_{x^{\prime}\in X^{\prime}}\omega(x^{\prime};\;\mathit{IR}(\mathring{x}))}\\
\sum_{x^{\prime}\in X^{\prime}}\left(\frac{\omega(x^{\prime};\;\mathit{IR}(\mathring{x}))}{1+\mathds{1}(|C|>1)}\;\sum_{c\in C}\left(f_{c}\left(\mathit{IR}^{-1}(x^{\prime})\right)-g_{c}(x^{\prime})\right)^{2}\right)
\end{gathered}

Note that the inner sum over the explained classes $\sum_{c\in C}$ is normalised by $(1+\mathds{1}(|C|>1))^{-1}$, which is $1$ when the surrogate is built for a single class and becomes $\nicefrac{{1}}{{2}}$ for more classes. The loss given by Equation 5 is thus equivalent to the one in Equation 3 in the former case, and in the latter the scaling factor ensures that the inner sum is between $0$ and $1$ since the biggest squared difference is $2$, which happens when the predictions of $f$ and $g$ assign a probability of $1$ to two different classes, e.g., $[1,0,0]$ and $[0,0,1]$. An additional assumption is that the sum of values predicted by each leaf of the surrogate tree is at most $1$, which, as noted earlier, may in some cases require normalisation. In practice, the surrogate explainer is built by iteratively adding splits to a multi-output regression tree – thus incrementally increasing its complexity $\Omega(g;\;d)$ but also improving its predictive power – which allows us to progressively minimise the loss $\mathcal{L}$ and optimise the objective $\mathcal{O}$. This procedure terminates when the loss $\mathcal{L}$ (calculated with Equation 5) reaches a certain, user-defined level $\epsilon\in[0,1]$, which corresponds to the fidelity of the local surrogate, i.e., $\mathcal{L}\left(f,g;\;X^{\prime},\mathring{x},C\right)\leq\epsilon$.
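The effect of the scaling factor in Equation 5 can be checked numerically. The Python sketch below is an illustration under simple assumptions (probability vectors given as plain lists, one row per sampled point); the function name `multiclass_loss` is hypothetical.

```python
def multiclass_loss(f_probs, g_probs, weights):
    """Equation 5: weighted squared error summed over the explained classes."""
    n_classes = len(f_probs[0])
    scale = 2.0 if n_classes > 1 else 1.0  # 1 + indicator(|C| > 1)
    total = 0.0
    for f_row, g_row, w in zip(f_probs, g_probs, weights):
        squared = sum((fc - gc) ** 2 for fc, gc in zip(f_row, g_row))
        total += (w / scale) * squared
    return total / sum(weights)


# Worst case from the text: f and g put all probability on different classes.
worst = multiclass_loss([[1, 0, 0]], [[0, 0, 1]], [1.0])  # 2 / 2 = 1.0

# A single explained class reduces to the loss of Equation 3.
single = multiclass_loss([[0.5]], [[0.0]], [1.0])  # 0.25

# A surrogate that reproduces the black box exactly incurs no loss.
perfect = multiclass_loss([[0.2, 0.8]], [[0.2, 0.8]], [1.0])  # 0.0
```

With the halving in place, the loss stays within $[0,1]$, so the user-defined threshold $\epsilon$ has the same scale regardless of how many classes are explained.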

4. Fidelity Guarantees

The flexibility of surrogate explainers – they are post-hoc, model-agnostic and, often, data-universal – also contributes to the instability and occasional unreliability of their explanations (Laugel et al., 2018; Zhang et al., 2019a; Sokol and Flach, 2024; Sokol et al., 2019, 2022a). Their subpar fidelity, i.e., the predictive coherence with respect to the underlying black box, is thus a major barrier for their uptake (Rudin, 2019). In addition to remedying the shortcomings of linear surrogates, LIMEtree comes with strong fidelity guarantees, which can be achieved in practice while keeping low explanation complexity.

To imbue LIMEtree with near-full or full fidelity, we identify the minimal IR set $X^{\prime}_{\textit{min},T}\subseteq\mathcal{X}^{\prime}$ . It is unique to a tree and composed of binary vectors $x^{\prime}_{\textit{min},t}$ drawn from the IR – one per leaf $t\in T$ of the surrogate tree – that have the least number of $0$ components while still being assigned to the leaf $t$ . The construction of this set is formalised in Definition 4.1 and can be understood as seeking instances with the highest number of human-interpretable concepts being present, e.g., minimal occlusion for images, for each leaf.

Definition 4.1 (Minimal Representation).

Assume a binary decision tree $g\in\mathcal{G}$ fitted to a binary $d$ -dimensional data space $\mathcal{X}^{\prime}=\{0,1\}^{d}$ , with $T$ denoting its set of leaves. This tree assigns a leaf $t\in T$ to a data point $x^{\prime}\in\mathcal{X}^{\prime}$ with the function $g_{\mathit{id}}(x^{\prime})=t$ . For a given tree leaf $t$ , its unique minimal data point $x^{\prime}_{\textit{min},t}$ is

x^{\prime}_{\textit{min},t}=\operatorname*{arg\,max}_{x^{\prime}\in\mathcal{X}^{\prime}}\sum_{i=1}^{d}x^{\prime}_{i}\quad\text{s.t.}\quad g_{\mathit{id}}(x^{\prime})=t\text{,}

where $x^{\prime}_{i}$ is the $i$-th component of the binary vector $x^{\prime}$. We can further define a minimal set of data points $X^{\prime}_{\textit{min},T}\subseteq\mathcal{X}^{\prime}$ – uniquely representing a tree $g$ and the set of its leaves $T$ – that is composed of all the minimal data points for this tree:

X^{\prime}_{\textit{min},T}=\{x^{\prime}_{\textit{min},t}:t\in T\}\text{.}
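For a binary tree over binary features, the minimal data point of a leaf can be read directly off its root-to-leaf path: features that the path requires to be $0$ stay at $0$ and every other feature is set to $1$, which maximises the sum in the definition. The Python sketch below illustrates this with a hypothetical three-leaf tree whose paths are encoded as dictionaries mapping zero-based feature indices to required values; this encoding is an assumption made for the example.

```python
def minimal_point(path_constraints, d):
    """Vector with the most 1s that still satisfies the leaf's path constraints."""
    return [path_constraints.get(i, 1) for i in range(d)]


# Hypothetical surrogate over d = 3 concepts: the root splits on feature 0,
# and its "absent" branch splits again on feature 2.
leaves = {
    "t1": {0: 1},
    "t2": {0: 0, 2: 1},
    "t3": {0: 0, 2: 0},
}

# Minimal set of data points: one binary vector per leaf.
X_min = {leaf: minimal_point(path, 3) for leaf, path in leaves.items()}
# X_min == {"t1": [1, 1, 1], "t2": [0, 1, 1], "t3": [0, 1, 0]}
```

The leaf reached when every tested concept is present is always represented by the all-$1$ vector, i.e., the unmodified explained instance.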

Next, we transform this minimal representation set $X^{\prime}_{\textit{min},T}$ from the interpretable into the original domain using the inverse of the IR transformation function: $X_{\textit{min},T}=\{\mathit{IR}^{-1}(x^{\prime}_{\textit{min},t}):x^{\prime}_{\textit{min},t}\in X^{\prime}_{\textit{min},T}\}$. We then predict class probabilities for each instance in $X_{\textit{min},T}$ with the black box $f$ and replace the values estimated by the surrogate tree with these probabilities for each leaf $t\in T$, i.e., modify the surrogate tree by overriding its predictions. Doing so is only feasible for the tree leaves as the minimal data points for some of the splitting nodes are indistinguishable; e.g., all the nodes on the root-to-leaf path that decides every interpretable feature to be $1$ are non-unique and all would be represented by the unmodified explained instance.

This variant of LIMEtree – which we refer to as TREE – guarantees full fidelity of the surrogate tree with respect to the explanations derived from the tree structure such as counterfactuals and root-to-leaf decision rules (see Section 5 for their examples). However, for this property to hold the function $\mathit{IR}$ transforming data from their original domain into the interpretable representation has to be deterministic (Sokol and Flach, 2024), which holds for image and text but not for tabular data (refer back to Section 2). The rationale behind this claim is outlined in Lemma 1, which follows from the subsequent discussion.

Lemma 1 (Structural Fidelity).

A surrogate tree can achieve full fidelity with respect to the explanations derived from its structure – i.e., model-driven explanations – if the interpretable representation transformation function $\mathit{IR}$ is deterministic. Therefore, an instance $x\in\mathcal{X}$ can be translated into a unique point $\mathit{IR}(x)=x^{\prime}\in\mathcal{X}^{\prime}$ and vice versa $\mathit{IR}^{-1}(x^{\prime})=x$, i.e., the mapping is one-to-one.

Lemma 1 guarantees that each leaf in the surrogate tree is associated with only one data point $x_{\textit{min},t}$ in the original representation $\mathcal{X}$ . This instance is derived from the minimal interpretable data point $x^{\prime}_{\textit{min},t}$ by applying the inverse of the interpretable representation transformation function $\mathit{IR}^{-1}$ , i.e., $x_{\textit{min},t}=\mathit{IR}^{-1}(x^{\prime}_{\textit{min},t})$ . Therefore, $x_{\textit{min},t}$ represents the explained instance with the smallest possible number of concepts deleted from it such that $g_{\mathit{id}}(x^{\prime}_{\textit{min},t})=t$ . By assigning the probabilities predicted by the explained black box for each data point $x_{\textit{min},t}$ to the corresponding leaf $t$ of the surrogate, it achieves full fidelity for the minimal representation set $X_{\textit{min},T}$ , which is the backbone of model-driven explanations.
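The leaf-overriding step can be sketched as follows. This is an illustrative Python example, not the authors' implementation: the token list, the deterministic toy `ir_inverse`, the made-up three-class `black_box`, the minimal points and the initial leaf averages are all assumptions.

```python
TOKENS = ["great", "plot", "film"]  # hypothetical concepts, d = 3


def ir_inverse(x_prime):
    """Toy deterministic IR^{-1} for text: keep tokens whose indicator is 1."""
    return [tok for tok, keep in zip(TOKENS, x_prime) if keep == 1]


def black_box(tokens):
    """Made-up probabilistic classifier over three classes."""
    if "great" in tokens:
        return [0.7, 0.2, 0.1]
    if "film" in tokens:
        return [0.2, 0.5, 0.3]
    return [0.1, 0.1, 0.8]


# Minimal interpretable data point of each leaf (assumed to be given).
X_min_prime = {"t1": [1, 1, 1], "t2": [0, 1, 1], "t3": [0, 1, 0]}

# Leaf values as estimated by the fitted surrogate (made-up averages).
leaf_values = {
    "t1": [0.6, 0.3, 0.1],
    "t2": [0.3, 0.4, 0.3],
    "t3": [0.2, 0.2, 0.6],
}

# Override every leaf with the black-box prediction of its minimal instance.
for leaf, x_min_prime in X_min_prime.items():
    leaf_values[leaf] = black_box(ir_inverse(x_min_prime))
```

After the override, the surrogate and the black box agree exactly on every instance in the minimal representation set, which is what full fidelity of model-driven explanations requires.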

While such an approach ensures full fidelity of model-driven explanations, the same is not guaranteed for data-driven explanations such as answers to what-if questions, e.g., “What if concept $x^{\prime}_{i}$ is absent?” Root-to-leaf paths that do not condition on all of the binary interpretable features allow for more than one data point to be assigned to that leaf; e.g., for three binary features $[x^{\prime}_{1},x^{\prime}_{2},x^{\prime}_{3}]\in\{0,1\}^{3}$, a root-to-leaf path with the condition $x^{\prime}_{1}<0.5\land x^{\prime}_{3}<0.5$ assigns $[0,0,0]$ and $[0,1,0]$ to this leaf. This observation motivates the minimal interpretable representation $X^{\prime}_{\textit{min},T}$ (Definition 4.1), which selects a single data point to represent each leaf thereby facilitating full fidelity of model-driven explanations without additional assumptions. However, for data-driven explanations to achieve full fidelity, the surrogate tree must faithfully model the entire interpretable feature space, i.e., have one leaf for every data point in $\mathcal{X}^{\prime}$, which can be thought of as extreme overfitting. Since the cardinality of a binary $d$-dimensional space $\mathbb{B}^{d}=\{0,1\}^{d}$ is given by $|\mathbb{B}^{d}|=2^{d}$, and a complete and balanced binary decision tree of $2^{d}$ width (number of leaves) is $d$ deep, relaxing the tree complexity bound $\Omega$ accordingly guarantees full fidelity of all the explanations, which property is captured by Corollary 2.
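The three-feature example above can be verified by enumerating the interpretable domain. The Python sketch below assumes that a root-to-leaf path is encoded as a dictionary from zero-based feature indices to required binary values; this encoding and the helper name are hypothetical.

```python
from itertools import product


def points_in_leaf(path_constraints, d):
    """All binary vectors in {0,1}^d that satisfy a root-to-leaf path."""
    return [
        list(x)
        for x in product([0, 1], repeat=d)
        if all(x[i] == value for i, value in path_constraints.items())
    ]


# Path x'_1 < 0.5 and x'_3 < 0.5, i.e., zero-based features 0 and 2 equal 0.
leaf_path = {0: 0, 2: 0}
members = points_in_leaf(leaf_path, 3)  # [[0, 0, 0], [0, 1, 0]]

# A complete tree needs one leaf per point of the domain: 2^d leaves.
n_leaves_complete = len(list(product([0, 1], repeat=3)))  # 8
```

Because two points share this leaf, a single leaf value cannot match the black box on both unless the black box happens to predict the same probabilities for them, hence the need for a complete tree when data-driven explanations must be fully faithful.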

Corollary 2 (Full Fidelity).

If the complexity bound (width) $\Omega$ of a surrogate tree $g$ is relaxed to equal the cardinality of the binary interpretable domain $\mathcal{X}^{\prime}$, i.e., $\Omega(g;\;|\mathcal{X}^{\prime}|)=\frac{\text{width}(g)}{2^{|\mathcal{X}^{\prime}|}}=\frac{2^{|\mathcal{X}^{\prime}|}}{2^{|\mathcal{X}^{\prime}|}}=1$, then the surrogate is guaranteed to achieve full fidelity. This property applies to explanations that are both data-driven – i.e., derived from any data point in the interpretable representation – and model-driven – i.e., derived from the structure of the surrogate tree.

Therefore, a surrogate tree that guarantees faithfulness of model-driven explanations (Lemma 1) can only deliver trustworthy counterfactuals and exemplar explanations sourced from the minimal representation set. This may be an attractive alternative to more complex surrogate trees that additionally guarantee faithfulness of data-driven explanations (Corollary 2). The latter surrogate type, which usually yields deeper trees, can deliver a broader spectrum of trustworthy explanations: tree structure-based explanations, feature importance, decision rules (root-to-leaf paths), answers to what-if questions, and exemplar explanations based on any data point, in addition to counterfactuals.

5. Discussion

LIMEtree explanations are versatile and appealing, but achieving their full fidelity presupposes a deterministic IR transformation function (Lemma 1) and a complete surrogate tree (Corollary 2). This is not a problem for image and text data since the corresponding IRs can be built to be deterministic and of low dimensionality (given by the number of desired human-comprehensible concepts). The IR of tabular data, however, is inherently non-deterministic (Sokol and Flach, 2024) – due to the many-to-one mapping introduced by discretisation and binarisation (refer back to Section 2) – with its dimensionality equal to the size of the original feature space. Nonetheless, since uniquely for tabular data the surrogate tree can be trained directly on their original representation, thus implicitly constructing a locally faithful and meaningful IR instead of relying on an external one (Sokol et al., 2019; Sokol and Flach, 2024), the surrogate can be overfitted to maximise its fidelity. While LIMEtree offers a close approximation in both cases, full fidelity cannot be guaranteed since a complete surrogate tree is unable to achieve full coverage for non-deterministic IRs. The consequences of this shortcoming can be seen in our experimental results given later in Table 1, which shows that a complete surrogate tree – labelled TREE^† – can reach full fidelity for image but not for tabular data.
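
The many-to-one mapping behind the non-determinism of tabular IRs can be illustrated with a single numerical feature. The sketch below is an assumption-laden toy: the sample values are made up, and binarisation is taken to mean "1 if the value falls into the same quartile bin as the explained instance, 0 otherwise", which is one common construction and not necessarily the exact one used here.

```python
from bisect import bisect_right
from statistics import quantiles

# Hypothetical one-dimensional sample drawn around the explained instance.
sample = [0.1, 0.4, 0.9, 1.3, 1.8, 2.2, 2.7, 3.1, 3.6, 4.0]
cuts = quantiles(sample, n=4)  # the three quartile boundaries


def discretise(value):
    """Index (0-3) of the quartile bin that contains `value`."""
    return bisect_right(cuts, value)


def binarise(value, explained):
    """1 if `value` shares a bin with the explained instance, else 0 (assumed)."""
    return int(discretise(value) == discretise(explained))


explained = 1.8
# Two different original values receive the same interpretable value ...
same_bin = [binarise(v, explained) for v in (0.9, 1.3)]
# ... so the interpretable value 1 has no unique pre-image in the original domain.
preimage_of_1 = [v for v in sample if binarise(v, explained) == 1]

print(same_bin)       # [1, 1]
print(preimage_of_1)  # [0.9, 1.3, 1.8]
```

Because several original values collapse onto one interpretable value, a leaf that isolates a single interpretable point still has to summarise multiple black-box predictions.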

In practice, full fidelity of surrogates based on deterministic IRs (Lemma 1) is achieved by adjusting the sample size $|X^{\prime}|$ and relaxing the tree complexity bound $\Omega$. Recall that a $d$-dimensional binary interpretable representation $\mathcal{X}^{\prime}\equiv\mathbb{B}^{d}=\{0,1\}^{d}$ has $|\mathcal{X}^{\prime}|=2^{d}$ unique instances, and the width, i.e., the number of leaves, of a complete, balanced binary decision tree of depth $d$ is $2^{d}$ (Corollary 2). Therefore, we can use all of these data points – there is no benefit from oversampling – to easily train a local surrogate with its complexity bound $\Omega$ removed to allow complete trees of depth $d$, i.e., with one leaf per instance, guaranteeing full fidelity and access to a diverse range of faithful and comprehensible explanations. The depth bound and the sample size can be adjusted dynamically prior to training the surrogate to ensure its optimality since the size of the interpretable domain is known beforehand.
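
The effect of removing the depth bound can be checked on a toy problem. The following pure-Python sketch fits a greedy multi-output regression tree, with every split fixed at $\nicefrac{1}{2}$, to the full set of $2^{d}$ interpretable points. The black box, the function names and the plain squared-error loss are all illustrative assumptions; this is not the implementation used in the experiments.

```python
from itertools import product


def black_box(x):
    """Toy stand-in for the explained model: three class probabilities."""
    score = [1 + 2 * x[0] + x[1] * x[2], 1 + x[1], 1 + 3 * x[2] * (1 - x[0])]
    total = sum(score)
    return [s / total for s in score]


def mean(rows):
    return [sum(col) / len(rows) for col in zip(*rows)]


def sse(rows):
    """Sum of squared deviations from the per-class mean."""
    m = mean(rows)
    return sum((v - mu) ** 2 for r in rows for v, mu in zip(r, m))


def fit(X, Y, max_depth):
    """Greedy multi-output regression tree; every split is at 1/2."""
    if max_depth == 0 or sse(Y) == 0:
        return mean(Y)  # leaf: one value per class
    best = None
    for j in range(len(X[0])):
        left = [i for i, x in enumerate(X) if x[j] < 0.5]
        right = [i for i, x in enumerate(X) if x[j] >= 0.5]
        if not left or not right:
            continue  # feature already fixed on this path
        cost = sse([Y[i] for i in left]) + sse([Y[i] for i in right])
        if best is None or cost < best[0]:
            best = (cost, j, left, right)
    if best is None:
        return mean(Y)
    _, j, left, right = best
    return (j,
            fit([X[i] for i in left], [Y[i] for i in left], max_depth - 1),
            fit([X[i] for i in right], [Y[i] for i in right], max_depth - 1))


def predict(tree, x):
    while isinstance(tree, tuple):  # internal nodes are (feature, low, high)
        j, low, high = tree
        tree = low if x[j] < 0.5 else high
    return tree


def loss(tree, X, Y):
    return sum((p - y) ** 2
               for x, row in zip(X, Y)
               for p, y in zip(predict(tree, x), row))


d = 3
X = list(product([0, 1], repeat=d))  # the full interpretable domain
Y = [black_box(x) for x in X]
full_loss = loss(fit(X, Y, max_depth=d), X, Y)         # complete tree
shallow_loss = loss(fit(X, Y, max_depth=d - 1), X, Y)  # depth-bounded tree
```

In this toy setting the complete tree reproduces the black box on every interpretable point, whereas the tree that is one level shallower has to average distinct predictions in at least one leaf and therefore incurs a non-zero loss.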

Since for images as well as text each dimension of the IR captures a human-comprehensible concept, their number is expected to be low, especially given that tokens in text excerpts and segments in images do not have to be adjacent to constitute a single concept. For every additional feature in the interpretable space, the number of sampled data points doubles and the tree depth is incremented by one in order to provide the interpretable domain and the surrogate tree with enough capacity to preserve the full fidelity guarantee. While this exponential growth in the number of interpretable data points may seem overwhelming, training decision trees on binary data spaces is fast given the predetermined $\nicefrac{{1}}{{2}}$ split at every node. The exponential growth of the width of the surrogate tree that guarantees its full fidelity increases its complexity and can have adverse effects on the comprehensibility of some explanation types; however, as we show next, it does not affect the most important and versatile explanation kinds.

Guaranteeing full fidelity of a surrogate tree requires relaxing its complexity bound $\Omega$ , which the optimisation objective $\mathcal{O}$ tries to minimise (Equation 1). Since in this setting a moderate number of interpretable features may yield a relatively large tree, the increased complexity of the resulting explanations is concerning. While a complex surrogate tree may render the explanations based on its structure, e.g., model visualisations, incomprehensible, these are not the most appealing explanation types and their appreciation often requires AI expertise. The (interpretable) feature importance, what-if explanations, counterfactuals and exemplars are not affected by the tree size in any way and remain highly compact and comprehensible – see Figure 3 for some examples. Notably, a complete surrogate tree with full fidelity will produce more counterfactual explanations for every data point, making it more interpretable.
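
To illustrate why these explanation types stay compact regardless of the tree size, the sketch below derives a what-if answer and the closest counterfactuals by querying a surrogate only through its predictions. The surrogate, the class names and the three concepts are hypothetical, and Hamming distance is an assumed notion of closeness.

```python
from itertools import product


def surrogate(x):
    """Hypothetical full-fidelity surrogate over three binary concepts
    (cat, dog and ball segments) that returns two class probabilities."""
    cat, dog, ball = x
    return {
        "tabby": 0.1 + 0.7 * cat - 0.1 * cat * dog,
        "tennis ball": 0.05 + 0.6 * ball,
    }


def top_class(x):
    probs = surrogate(x)
    return max(probs, key=probs.get)


def what_if(x, concept):
    """Change in every class probability when one concept is removed."""
    altered = tuple(0 if i == concept else v for i, v in enumerate(x))
    before, after = surrogate(x), surrogate(altered)
    return {c: after[c] - before[c] for c in before}


def counterfactuals(x):
    """Closest points (by Hamming distance) whose top class differs."""
    def hamming(z):
        return sum(a != b for a, b in zip(x, z))

    original = top_class(x)
    candidates = [z for z in product([0, 1], repeat=len(x))
                  if top_class(z) != original]
    if not candidates:
        return []
    closest = min(hamming(z) for z in candidates)
    return [z for z in candidates if hamming(z) == closest]


explained = (1, 1, 1)                   # all three concepts present
effect = what_if(explained, concept=0)  # "What if the cat is absent?"
cfs = counterfactuals(explained)
```

Each answer names at most a handful of concepts, for example "removing the cat segment changes the top prediction", irrespective of how many leaves the underlying surrogate has.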

The decision rules – logical conditions extracted from root-to-leaf paths – may indeed become overwhelmingly long, in fact as long as the tree depth; however, this does not impact all the data types equally and an appropriate presentation medium can alleviate this issue regardless of the tree complexity. For image and text data such rules will always be comprehensible, no matter their length, since they cannot have more literals than the dimensionality of the underlying interpretable domain, i.e., the number of segments for images and word-based tokens for text. Presenting such a rule in the former case corresponds to displaying an image with its various segments occluded (e.g., see Figure 3(d)) and in the latter producing a text excerpt with selected tokens removed. For tabular data, however, these rules may become relatively long and incomprehensible since this domain lacks a similar human-friendly representation; the exceptions are root-to-leaf paths that impose multiple logical conditions on a single feature (in the original domain), allowing for their compression. Regardless of the presentation medium, a general criticism of rule-based explanations is the difficulty of understanding how each logical condition affects the prediction, making them less appealing than other explanation types.
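
The compression of tabular rules mentioned above amounts to merging all threshold conditions on one feature into a single interval. A minimal sketch follows; the feature names, thresholds and helper functions are hypothetical and only serve to illustrate the idea.

```python
from math import inf


def compress(conditions):
    """Merge threshold conditions on the same feature into one interval.

    `conditions` is a list of (feature, operator, threshold) triples with the
    operator being "<=" or ">", as read off a root-to-leaf path of a tree
    trained in the original tabular domain.
    """
    bounds = {}
    for feature, op, threshold in conditions:
        low, high = bounds.get(feature, (-inf, inf))
        if op == "<=":
            high = min(high, threshold)
        elif op == ">":
            low = max(low, threshold)
        else:
            raise ValueError(f"unsupported operator: {op}")
        bounds[feature] = (low, high)
    return bounds


def describe(bounds):
    """Render the merged intervals as a human-readable rule."""
    parts = []
    for feature, (low, high) in bounds.items():
        if low == -inf:
            parts.append(f"{feature} <= {high}")
        elif high == inf:
            parts.append(f"{feature} > {low}")
        else:
            parts.append(f"{low} < {feature} <= {high}")
    return " and ".join(parts)


# A hypothetical path with three conditions on the same feature.
path = [("alcohol", "<=", 13.5), ("hue", ">", 0.8),
        ("alcohol", ">", 12.0), ("alcohol", "<=", 13.0)]
rule = describe(compress(path))
print(rule)  # 12.0 < alcohol <= 13.0 and hue > 0.8
```

Four literals shrink to two in this example; paths that touch many distinct features benefit far less, which is why long tabular rules remain hard to read.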

In view of these observations, if explanations based on the structure of the surrogate tree are not required for image and text data, and additionally rule-based explanations are not needed for tabular data, the model complexity $\Omega$ does not have to be minimised. It can therefore be removed from the optimisation objective $\mathcal{O}$ given in Equation 1, paving the way for full surrogate fidelity.

6. Evaluation

We assess the explanatory power of LIMEtree with a multi-tier evaluation approach that consists of an assessment guided by XAI desiderata (Section 6.1) as well as functionally-grounded (Section 6.2) and human-grounded (Section 6.3) experiments (Doshi-Velez and Kim, 2017; Sokol and Flach, 2020a). The first judges our approach against a number of criteria important for XAI systems; the second involves a (synthetic) proxy task in which we compare the (numerical) fidelity of LIME with multiple variants of LIMEtree on image and tabular data; the third reports results of a pilot user study, which is based on image classification to enable straightforward qualitative evaluation of explanations by means of visual inspection, thus alleviating the need for technical expertise.

6.1. Desiderata

XAI systems follow two distinct steps: explanation generation and presentation, a separation that allows us to better identify, evaluate and report the unique desiderata important at each stage (Sokol and Vogt, 2024). Given that LIMEtree is a surrogate explainer, the insights that it generates are post-hoc, therefore they may not reflect the true behaviour of the underlying black box (Rudin, 2019). This discrepancy – measured as fidelity – is an important indicator of explanation truthfulness, which should always be communicated to the explainees, especially in high-stakes applications. While LIMEtree can achieve full fidelity without sacrificing explanation comprehensibility, this desideratum is limited to IRs that are deterministic. To take advantage of this property it is therefore important to design an IR that addresses the explainability needs of each particular use case, which may require additional effort to build such a bespoke module despite the explainer itself being model-agnostic (Sokol and Flach, 2024; Mittelstadt et al., 2019; Sokol et al., 2022a). More broadly, the truthfulness is a major advantage of our approach given that it allows explainability to be retrofitted into pre-existing black boxes. Whatever explanation type, presentation format and communication medium are chosen, this property guarantees that the explanatory insights are based on an accurate reflection of the black-box model’s behaviour.

Before reviewing desiderata of specific explanation types, we discuss a set of general properties that are expected of all explanatory insights (Sokol and Flach, 2020a). LIMEtree excels when it comes to explanation plurality and diversity – especially so given their consistency – allowing the explainees to explore distinct aspects of the underlying black box without running into spuriously contradictory observations, further improving the trustworthiness of its explanations. While some of them are inherently static, others can be operationalised within an interactive explanatory protocol (Sokol and Flach, 2020b), enabling the explainees to customise and personalise them in a natural way – see Figure 3 for examples. This breadth of explanatory insights and access to their source – the surrogate tree structure (Figure 2) – enables their contextualisation, which makes them particularly appealing since good explanations do not only communicate what information is used by a predictive model but also how it is used (Rudin, 2019).

By simultaneously accounting for multiple classes, LIMEtree offers a more comprehensive picture of the explained model’s predictive behaviour and facilitates user-driven exploration, which, as noted in Section 1, mitigates the severity of automation bias, especially so for counterfactuals (Byrne, 2023). Also, recall that our method is compatible with hypothesis-driven XAI since the breadth of its insights allows the explainees to consider multiple congruent explanations for different predictions of a given instance instead of only receiving a justification of the top prediction (Miller, 2023). Given that our method operates as a surrogate, we can freely tweak and tune the target, breadth and scope of its explanations by adjusting its configuration, which further adds to its flexibility (Sokol and Flach, 2020a; Sokol et al., 2019; Sokol and Flach, 2024; Sokol et al., 2022a; Sokol and Flach, 2020b).

While LIMEtree offers a broad spectrum of explanation types – whose diversity makes it appealing to a wide range of audiences – we anticipate the counterfactuals to be the most attractive given their ubiquity in XAI (Miller, 2018). Notably, these insights are ante-hoc with respect to the surrogate tree, therefore their truthfulness is guaranteed in this regard (Sokol and Vogt, 2023). Their generation procedure can account for plausibility and actionability of their conditional part as well as other (human-centred) properties that may be desired (Sokol and Flach, 2020a; Keane et al., 2021; Sokol and Flach, 2020b, 2018). Counterfactual explanations are known to be intrinsically comprehensible given their parsimony and low complexity, making them an attractive choice across a diverse range of applications (Sokol and Flach, 2020a; Miller, 2018).

$\times 10^{-2}$		ImageNet + Inception v3				CIFAR-10 + ResNet 56				CIFAR-100 + RepVGG
$\times 10^{-2}$		LIME	TREE@66%	TREE^⋆@75%	TREE^†	LIME	TREE@66%	TREE^⋆@75%	TREE^†	LIME	TREE@66%	TREE^⋆@75%	TREE^†
n^th top	1^st	$3.67\pm 2.18$	$\boldsymbol{0.60}\pm 0.61$	$0.64\pm 0.73$	$\boldsymbol{0}\pm 0$	$7.34\pm 2.96$	$\boldsymbol{2.17}\pm 1.25$	$2.77\pm 1.66$	$\boldsymbol{0}\pm 0$	$3.33\pm 1.80$	$\boldsymbol{0.59}\pm 0.56$	$0.66\pm 0.63$	$\boldsymbol{0}\pm 0$
	2^nd	$1.14\pm 1.77$	$\boldsymbol{0.24}\pm 0.42$	$0.25\pm 0.40$	$\boldsymbol{0}\pm 0$	$3.91\pm 3.98$	$\boldsymbol{1.28}\pm 1.31$	$1.69\pm 1.76$	$\boldsymbol{0}\pm 0$	$0.97\pm 1.46$	$\boldsymbol{0.24}\pm 0.36$	$0.26\pm 0.40$	$\boldsymbol{0}\pm 0$
	3^rd	$0.63\pm 1.36$	$\boldsymbol{0.13}\pm 0.25$	$0.16\pm 0.33$	$\boldsymbol{0}\pm 0$	$2.57\pm 3.37$	$\boldsymbol{0.89}\pm 1.15$	$1.10\pm 1.44$	$\boldsymbol{0}\pm 0$	$0.56\pm 1.13$	$\boldsymbol{0.14}\pm 0.29$	$0.16\pm 0.32$	$\boldsymbol{0}\pm 0$
top n	1	$3.67\pm 2.18$	$\boldsymbol{0.60}\pm 0.61$	$0.64\pm 0.73$	$\boldsymbol{0}\pm 0$	$7.34\pm 2.96$	$\boldsymbol{2.17}\pm 1.25$	$2.77\pm 1.66$	$\boldsymbol{0}\pm 0$	$3.33\pm 1.80$	$\boldsymbol{0.59}\pm 0.56$	$0.66\pm 0.63$	$\boldsymbol{0}\pm 0$
	2	$2.41\pm 1.40$	$\boldsymbol{0.42}\pm 0.42$	$0.44\pm 0.45$	$\boldsymbol{0}\pm 0$	$5.63\pm 2.69$	$\boldsymbol{1.73}\pm 1.03$	$2.23\pm 1.42$	$\boldsymbol{0}\pm 0$	$2.15\pm 1.15$	$\boldsymbol{0.41}\pm 0.36$	$0.46\pm 0.40$	$\boldsymbol{0}\pm 0$
	3	$2.72\pm 1.58$	$\boldsymbol{0.48}\pm 0.47$	$0.53\pm 0.50$	$\boldsymbol{0}\pm 0$	$6.91\pm 3.26$	$\boldsymbol{2.17}\pm 1.28$	$2.78\pm 1.73$	$\boldsymbol{0}\pm 0$	$2.42\pm 1.29$	$\boldsymbol{0.48}\pm 0.41$	$0.54\pm 0.45$	$\boldsymbol{0}\pm 0$

(a) Image data sets and the corresponding (pre-trained) neural networks (Chen, 2021): ImageNet (Deng et al., 2009) (1,659 samples, 256 $\times$ 256 pixels, 1,000 classes) + Inception v3 (77% acc.); CIFAR-10 (Krizhevsky and Hinton, 2009) (9,714 samples, 32 $\times$ 32 pixels, 10 classes) + ResNet 56 (94% acc.); and CIFAR-100 (Krizhevsky and Hinton, 2009) (9,665 samples, 32 $\times$ 32 pixels, 100 classes) + RepVGG (77% acc.). We use all validation set images for which an interpretable representation can be built; however, for ImageNet we first pre-select images that are square and at least 256 $\times$ 256, which we resize to these dimensions. The results are scaled up by $10^{2}$.

$\times 10^{-1}$		Wine + Logistic Regression				Forest Covertypes + Multilayer Perceptron
$\times 10^{-1}$		LIME	TREE@66%	TREE^⋆@100%	TREE^†	LIME	TREE@66%	TREE^⋆@100%	TREE^†
n^th top	1^st	$0.29\pm 0.27$	$\boldsymbol{0.08}\pm 0.11$	$5.54\pm 3.43$	$\boldsymbol{0.07}\pm 0.11$	$0.59\pm 0.26$	$\boldsymbol{0.06}\pm 0.06$	$4.56\pm 2.12$	$\boldsymbol{0.06}\pm 0.06$
	2^nd	$0.14\pm 0.16$	$\boldsymbol{0.03}\pm 0.04$	$2.35\pm 3.26$	$\boldsymbol{0.03}\pm 0.04$	$0.51\pm 0.29$	$\boldsymbol{0.05}\pm 0.05$	$1.88\pm 1.21$	$\boldsymbol{0.05}\pm 0.05$
	3^rd	$0.20\pm 0.28$	$\boldsymbol{0.07}\pm 0.12$	$3.73\pm 4.18$	$\boldsymbol{0.06}\pm 0.11$	$0.13\pm 0.21$	$\boldsymbol{0.02}\pm 0.04$	$0.57\pm 0.94$	$\boldsymbol{0.02}\pm 0.04$
top n	1	$0.29\pm 0.27$	$\boldsymbol{0.08}\pm 0.11$	$5.54\pm 3.43$	$\boldsymbol{0.07}\pm 0.11$	$0.59\pm 0.26$	$\boldsymbol{0.06}\pm 0.06$	$4.56\pm 2.12$	$\boldsymbol{0.06}\pm 0.06$
	2	$0.22\pm 0.19$	$\boldsymbol{0.06}\pm 0.07$	$3.94\pm 2.67$	$\boldsymbol{0.05}\pm 0.07$	$0.55\pm 0.26$	$\boldsymbol{0.06}\pm 0.05$	$3.22\pm 1.04$	$\boldsymbol{0.06}\pm 0.05$
	3	$0.32\pm 0.29$	$\boldsymbol{0.09}\pm 0.12$	$5.80\pm 3.56$	$\boldsymbol{0.08}\pm 0.12$	$0.62\pm 0.29$	$\boldsymbol{0.07}\pm 0.06$	$3.51\pm 1.09$	$\boldsymbol{0.07}\pm 0.06$

(b) Tabular data sets and the corresponding models (trained with scikit-learn (Pedregosa et al., 2011)): Wine (Aeberhard and Forina, 1991) (36 samples, 13 features, 3 classes) + Logistic Regression (93% balanced acc.); and Forest Covertypes (Blackard, 1998) (2,500 samples, 54 features, 7 classes) + Multilayer Perceptron (86% balanced acc.). For Wine we use all the test set samples; given their small number we repeated the study on the entire data set (178 samples) with comparable results. The Forest Covertypes test set has 116,203 samples, from which we draw a stratified subset of size 2,500. The results are scaled up by $10^{1}$.

Table 1. Fidelity loss (mean $\pm$ standard deviation, smaller is better) computed: (n^th top) separately for each of the top three black-box predictions with the LIME loss (Equation 3); and (top n) collectively for the top one, two and three black-box predictions with the LIMEtree loss (Equation 5). We report results for (1(a)) three image and (1(b)) two tabular data sets with four surrogates: LIME and three variants of LIMEtree (TREE, TREE^⋆ & TREE^†). The percentage shown after the explainer name specifies the tree complexity $\Omega$ – i.e., its depth divided by its maximum possible depth determined by the number of features in the interpretable representation – at which loss is computed; TREE^† is equivalent to TREE@100%. See Figure 4 for examples of the loss behaviour.

6.2. Synthetic Experiments

We evaluate the trustworthiness and comprehensibility of LIMEtree explanations using the two components of the optimisation objective $\mathcal{O}$ (Equation 1) – fidelity $\mathcal{L}$ and complexity $\Omega$ – as computational proxies. The former measures the faithfulness of the surrogate with respect to the black box, i.e., its ability to mimic the black box, which is the only metric capable of reporting the reliability of all the diverse explanation types extracted from the surrogate. To this end, we employ the formulation of fidelity used by LIME (Equation 3) and LIMEtree (Equation 5); we compute this property when modelling the top three classes predicted by the black box for each test instance. We additionally analyse the complexity of LIMEtree surrogates (Equation 4), i.e., the tree depth normalised by the dimensionality of the IR, and compare it to the size, i.e., number of coefficients, of LIME surrogates (Equation 2).

We study three variants of LIMEtree, all of which minimise fidelity but differ in complexity constraints and post-processing:

TREE:: optimises a surrogate tree for complexity, i.e., it determines the shallowest tree that offers the desired level of fidelity;
TREE^⋆:: is a variant of TREE whose predictions are post-processed to guarantee full fidelity of model-driven explanations; and
TREE^†:: constructs a surrogate tree without any complexity constraints, allowing the algorithm to build a complete tree.

We compare the fidelity of these explainers to LIME with disabled feature selection, which allows it to achieve maximum fidelity at the expense of explanation size. Our study is limited to fidelity and complexity since XAI lacks metrics suitable for multi-class explainability or for cases when multiple explanation types are derived from a single source as well as for explanations that rely on probabilities instead of crisp predictions (to mitigate automation bias). LIME is our only baseline given the general lack of multi-class explainers or methods whose underlying surrogate model can be accessed.

Table 1, which reports the results of our evaluation, also summarises our experimental setup. We use a collection of popular multi-class image and tabular data sets; with the former we rely on a selection of pre-trained neural networks, and with the latter we split the data 80%–20% into stratified training and test sets and fit the models ourselves. LIME and LIMEtree are implemented following best practice described in the literature (Sokol et al., 2019; Sokol and Flach, 2024; Garreau and Luxburg, 2020; Sokol et al., 2020, 2022b, 2022a). For images we use an IR built upon SLIC (edge-based) segmentation (Achanta et al., 2012) with black colour occlusion as the information removal proxy; given its deterministic transformation function we operate directly on the binary interpretable domain and generate its full set of instances instead of their random sample to enable the surrogate to reach full fidelity. For tabular data we sample 10,000 data points around the explained instance in the original domain – using mixup, which is an explicitly local sampler that accounts for class labels (Zhang et al., 2017; Sokol et al., 2022b) – since the corresponding IR transformation function is non-deterministic; we use quartile-based discretisation applied to the data sample followed by binarisation as our interpretable domain. For images we use cosine distance measured in the IR, and for tabular data we use Euclidean distance measured in the original domain; we use the exponential kernel for both, with its parameter determined experimentally for each data set. Our code is available on GitHub (https://github.com/So-Cool/bLIMEy/tree/master/ECML-PKDD_2023).
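
The sample weighting used for images can be sketched as follows. The cosine distance is standard; the functional form of the exponential kernel is assumed to match the one found in the reference LIME implementation, and the kernel width shown is an arbitrary placeholder rather than a value from our experiments.

```python
from math import exp, sqrt


def cosine_distance(a, b):
    """1 - cosine similarity between two binary interpretable vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = sqrt(sum(x * x for x in a))
    norm_b = sqrt(sum(y * y for y in b))
    if norm_a == 0 or norm_b == 0:
        return 1.0  # convention assumed here for the all-zero vector
    return 1.0 - dot / (norm_a * norm_b)


def exponential_kernel(distance, width):
    """Assumed kernel form: sqrt(exp(-distance^2 / width^2))."""
    return sqrt(exp(-(distance ** 2) / width ** 2))


explained = (1, 1, 1, 1)  # all concepts (segments) present
sample = (1, 0, 1, 1)     # one segment occluded
dist = cosine_distance(explained, sample)
weight = exponential_kernel(dist, width=0.25)  # placeholder width
```

Instances that occlude more segments lie further from the explained instance in the IR and thus receive smaller weights when the surrogate is fitted.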

In our experiments LIME produces three independent linear surrogates, one per class; each LIMEtree variant is either built as a single surrogate that models all of the classes simultaneously (n^th top), or a separate surrogate is constructed for a one-, two- and three-class problem (top n). In deployment, however, LIMEtree fits only a single multi-output tree, whereas LIME requires as many models as explained classes. As a result, since both methods follow the same steps except for the surrogate model training phase, our method tends to be faster for relatively small trees given that they are fitted to binary data with feature thresholds fixed at $\nicefrac{{1}}{{2}}$ – up to the depth of 20 in our experiments – and becomes negligibly slower for large trees – requiring 250 milliseconds more than LIME for trees as deep as 40 – but these measures will fluctuate with the number of explained classes and the IR dimensionality. Since the number of interpretable features should be kept low to improve human comprehensibility of the explanations, which directly limits the surrogate tree depth, we expect LIMEtree to be faster in practice (Sokol and Flach, 2024).

To assess explanation quality we measure multi-class fidelity with the LIMEtree loss as well as the fidelity of each class separately with the LIME loss. The results summarised by Table 1 show that our base method – TREE – provides more faithful explanations than LIME at $\nicefrac{{2}}{{3}}$ of its complexity for tabular and image data. TREE^⋆ – which post-processes the surrogate tree to facilitate full fidelity of model-driven explanations when the IR transformation function is deterministic – also surpasses LIME at $\nicefrac{{3}}{{4}}$ of its complexity for image data given their compliant IR, but its performance is degraded for tabular data even at full tree complexity (100%) due to the stochasticity of the underlying IR. TREE^⋆ requires higher complexity, i.e., deeper trees, than TREE to achieve comparable fidelity since the post-processing step makes the surrogate faithful with respect to the minimal interpretable data points but at the same time sub-optimal for the remainder of the interpretable space, which is especially detrimental for stochastic IRs where each minimal interpretable data point corresponds to multiple instances in the original data domain.
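
The trade-off introduced by post-processing can be made concrete with a toy example. In the sketch below a shallow surrogate splits on only two of three binary concepts; its leaves first predict the mean black-box output of their members and are then overwritten with the black-box output of their minimal point. The black box, the unweighted squared error and the choice of the minimal point as the leaf member with the fewest removed concepts are illustrative assumptions.

```python
from itertools import product


def black_box(x):
    """Toy single-class probability over three binary concepts."""
    return 0.1 + 0.5 * x[0] + 0.2 * x[1] + 0.2 * x[0] * x[2]


def leaf_id(x):
    """Shallow surrogate structure: the tree only splits on x'_1 and x'_2."""
    return (x[0], x[1])


domain = list(product([0, 1], repeat=3))
leaves = {}
for x in domain:
    leaves.setdefault(leaf_id(x), []).append(x)

# Before post-processing: each leaf predicts the mean of its members.
mean_value = {t: sum(map(black_box, pts)) / len(pts)
              for t, pts in leaves.items()}

# Post-processing: each leaf predicts the black-box output of its minimal
# point, taken here to be the member with the fewest concepts removed.
minimal = {t: max(pts, key=sum) for t, pts in leaves.items()}
post_value = {t: black_box(minimal[t]) for t in leaves}


def sq_error(values, points):
    return sum((values[leaf_id(x)] - black_box(x)) ** 2 for x in points)


minimal_set = list(minimal.values())
err_min_before = sq_error(mean_value, minimal_set)
err_min_after = sq_error(post_value, minimal_set)  # zero by construction
err_all_before = sq_error(mean_value, domain)
err_all_after = sq_error(post_value, domain)       # cannot be smaller
```

Since the leaf mean minimises the squared error within a leaf, replacing it can only keep or increase the loss over the whole interpretable space, while the loss on the minimal representation set drops to zero; a deeper tree is then needed to recover the lost fidelity.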

The version of LIMEtree without a depth bound – TREE^†, which is equivalent to TREE@100% – achieves full fidelity across the board for a deterministic IR (images), where it faithfully models the entire interpretable data space by constructing one leaf per instance, but fails to do so for non-deterministic IR (tabular) because each tree leaf has to model multiple distinct data points. By allowing deeper trees we reduce the impurity of their leaves, which improves the overall performance of the surrogates – an intuitive relation between the complexity of the trees and their fidelity, two representative examples of which are shown in Figure 4.

6.3. Pilot User Study

To assess usefulness of our approach we ran a pilot user study with eight participants, exposing them to LIME (Figure 1) and LIMEtree (Figure 2) explanations in a random order without revealing the method’s name. The study consisted of two sections, one per explainer, displaying an image split into three segments, with each part enclosing a unique object, e.g., a cat, a dog and a ball. The two most pertinent black-box predictions for each object were then explained with both methods – e.g., tabby and tiger cat for the cat object, golden retriever and Labrador retriever for the dog object, and tennis ball and croquet ball for the ball object – yielding six LIME explanations and a single multi-output tree spanning all the six predictions. The participants were offered a brief tutorial illustrating how to parse the tree structure to obtain a variety of explanations.

The participants were then asked about the expected behaviour of the black box in relation to any two out of the three displayed objects for each explainer – six questions in total as the relations are assumed to be non-reflexive. For example, “How does the presence of the cat object affect the model’s confidence of a presence of the dog object?”, with three possible answers: confidence decreases, confidence not affected and confidence increases. This question formulation was chosen to avoid a bias towards either explainer since we could neither ask for importance or influence of each object on a particular prediction (LIME’s domain), nor the relation between an object and a prediction, e.g., a what-if question (LIMEtree’s domain). Before viewing the explanations, the participants were asked to answer a similar set of questions using only their intuition, which allowed us to assess whether the explainees still relied on their intuition when explicitly asked to work with the explanations.

Our findings indicate a negligible overlap between the responses based on the participants’ intuition and both explainers; they also show that LIMEtree helped the participants to answer 25% more of the questions correctly as compared to LIME. All of the participants indicated that using LIME was either easy or very easy, and at the same time rated the process of manually extracting LIMEtree explanations as either difficult or very difficult, despite many of the explainees having an AI background. This disparity in conjunction with subpar performance when using LIME suggests that the explainees misinterpreted its explanations and were overconfident (Small et al., 2023; Xuan et al., 2023); good performance when working with LIMEtree despite the difficulty in using its explanations, on the other hand, is promising given that the process of extracting them can be easily automated.

7. Conclusion and Future Work

In this paper we introduced the concept of multi-class explainability and proposed a surrogate explainer based on multi-output regression trees called LIMEtree. We then analysed its various properties and guarantees, and showed how it can achieve full fidelity. Next, we demonstrated how LIMEtree improves upon LIME and discussed the benefits of using trees as the surrogate model. We supported these claims with an assessment of its properties based on XAI desiderata as well as a collection of quantitative experiments and a pilot user study. In future work we will implement methods to algorithmically extract human-centred explanations from (surrogate) trees and evaluate them with large-scale user studies.

Acknowledgements.

This research was supported by the TAILOR project, funded by the EU Horizon 2020 research and innovation programme (grant agreement number 952215). We would also like to acknowledge the contributions of Alexander Hepburn and Raul Santos-Rodriguez, who helped with the development of the code used for the experiments and offered insightful feedback.

References

Achanta et al. (2012) Radhakrishna Achanta, Appu Shaji, Kevin Smith, Aurelien Lucchi, Pascal Fua, and Sabine Süsstrunk. 2012. SLIC superpixels compared to state-of-the-art superpixel methods. IEEE Transactions on Pattern Analysis and Machine Intelligence 34, 11 (2012), 2274–2282.
Aeberhard and Forina (1991) Stefan Aeberhard and M Forina. 1991. Wine. UCI Machine Learning Repository.
Blackard (1998) Jock Blackard. 1998. Forest Covertypes. UCI Machine Learning Repository.
Breiman (2001) Leo Breiman. 2001. Random forests. Machine Learning 45, 1 (2001), 5–32.
Breiman et al. (1984) Leo Breiman, Jerome Friedman, Charles J Stone, and Richard A Olshen. 1984. Classification and regression trees. CRC Press.
Byrne (2023) Ruth MJ Byrne. 2023. Good explanations in explainable artificial intelligence (XAI): Evidence from human explanatory reasoning. In IJCAI. 6536–6544.
Carlevaro et al. (2023) Alberto Carlevaro, Marta Lenatti, Alessia Paglialonga, and Maurizio Mongelli. 2023. Multi-class counterfactual explanations using support vector data description. IEEE Transactions on Artificial Intelligence (2023).
Chen (2021) Yaofo Chen. 2021. PyTorch CIFAR models. https://github.com/chenyaofo/pytorch-cifar-models.
Craven and Shavlik (1996) Mark Craven and Jude W Shavlik. 1996. Extracting tree-structured representations of trained networks. In Advances in Neural Information Processing Systems. 24–30.
Deng et al. (2009) Jia Deng, Wei Dong, Richard Socher, Li-Jia Li, Kai Li, and Li Fei-Fei. 2009. ImageNet: A large-scale hierarchical image database. In 2009 IEEE Conference on Computer Vision and Pattern Recognition. IEEE, 248–255.
Doshi-Velez and Kim (2017) Finale Doshi-Velez and Been Kim. 2017. Towards a rigorous science of interpretable machine learning. (2017). arXiv:1702.08608
Garreau and Luxburg (2020) Damien Garreau and Ulrike Luxburg. 2020. Explaining the explainer: A first theoretical analysis of LIME. In International Conference on Artificial Intelligence and Statistics. PMLR, 1287–1296.
Guidotti et al. (2018) Riccardo Guidotti, Anna Monreale, Salvatore Ruggieri, Franco Turini, Fosca Giannotti, and Dino Pedreschi. 2018. A survey of methods for explaining black box models. ACM Computing Surveys (CSUR) 51, 5 (2018), 1–42.
Hastie and Tibshirani (1986) Trevor Hastie and Robert Tibshirani. 1986. Generalized additive models. Statist. Sci. 1, 3 (1986), 297–310.
Karimi et al. (2021) Amir-Hossein Karimi, Bernhard Schölkopf, and Isabel Valera. 2021. Algorithmic recourse: From counterfactual explanations to interventions. In Proceedings of the 2021 ACM Conference on Fairness, Accountability, and Transparency. 353–362.
Keane et al. (2021) Mark T Keane, Eoin M Kenny, Eoin Delaney, and Barry Smyth. 2021. If only we had better counterfactual explanations: Five key deficits to rectify in the evaluation of counterfactual XAI techniques. In IJCAI. 4466–4474.
Krizhevsky and Hinton (2009) Alex Krizhevsky and Geoffrey Hinton. 2009. Learning multiple layers of features from tiny images.
Laugel et al. (2018) Thibault Laugel, Xavier Renard, Marie-Jeanne Lesot, Christophe Marsala, and Marcin Detyniecki. 2018. Defining locality for surrogates in post-hoc interpretablity. In 2018 ICML Workshop on Human Interpretability in Machine Learning (WHI 2018).
Lou et al. (2012) Yin Lou, Rich Caruana, and Johannes Gehrke. 2012. Intelligible models for classification and regression. In Proceedings of the 18^th ACM SIGKDD International Conference on Knowledge Discovery and Data Mining. 150–158.
Miller (2018) Tim Miller. 2018. Explanation in artificial intelligence: Insights from the social sciences. Artificial Intelligence (2018).
Miller (2023) Tim Miller. 2023. Explainable AI is dead, long live explainable AI! Hypothesis-driven decision support using evaluative AI. In Proceedings of the 2023 ACM Conference on Fairness, Accountability, and Transparency. 333–342.
Mittelstadt et al. (2019) Brent Mittelstadt, Chris Russell, and Sandra Wachter. 2019. Explaining explanations in AI. In Proceedings of the 2019 ACM Conference on Fairness, Accountability, and Transparency. 279–288.
Pedregosa et al. (2011) Fabian Pedregosa, Gaël Varoquaux, Alexandre Gramfort, Vincent Michel, Bertrand Thirion, Olivier Grisel, Mathieu Blondel, Peter Prettenhofer, Ron Weiss, Vincent Dubourg, Jake Vanderplas, Alexandre Passos, David Cournapeau, Matthieu Brucher, Matthieu Perrot, and Édouard Duchesnay. 2011. scikit-learn: Machine learning in Python. Journal of Machine Learning Research 12 (2011), 2825–2830.
Poyiadzi et al. (2020) Rafael Poyiadzi, Kacper Sokol, Raul Santos-Rodriguez, Tijl De Bie, and Peter Flach. 2020. FACE: Feasible and actionable counterfactual explanations. In Proceedings of the AAAI/ACM Conference on AI, Ethics, and Society. 344–350.
Ribeiro et al. (2016) Marco Tulio Ribeiro, Sameer Singh, and Carlos Guestrin. 2016. “Why should I trust you?”: Explaining the predictions of any classifier. In Proceedings of the 22nd ACM SIGKDD International Conference on Knowledge Discovery and Data Mining. 1135–1144.
Romashov et al. (2022) Piotr Romashov, Martin Gjoreski, Kacper Sokol, Maria Vanina Martinez, and Marc Langheinrich. 2022. BayCon: Model-agnostic Bayesian counterfactual generator. In IJCAI. 740–746.
Rudin (2019) Cynthia Rudin. 2019. Stop explaining black box machine learning models for high stakes decisions and use interpretable models instead. Nature Machine Intelligence 1, 5 (2019), 206–215.
Shi et al. (2019) Sheng Shi, Xinfeng Zhang, Haisheng Li, and Wei Fan. 2019. Explaining the predictions of any image classifier via decision trees. (2019). arXiv:1911.01058
Small et al. (2023) Edward Small, Yueqing Xuan, Danula Hettiachchi, and Kacper Sokol. 2023. Helpful, misleading or confusing: How humans perceive fundamental building blocks of artificial intelligence explanations. In Proceedings of the ACM CHI 2023 Workshop on Human-Centered Explainable AI (HCXAI).
Sokol (2021) Kacper Sokol. 2021. Towards intelligible and robust surrogate explainers: A decision tree perspective. Ph. D. Dissertation. University of Bristol.
Sokol and Flach (2020a) Kacper Sokol and Peter Flach. 2020a. Explainability fact sheets: A framework for systematic assessment of explainable approaches. In Proceedings of the 2020 ACM Conference on Fairness, Accountability, and Transparency. 56–67.
Sokol and Flach (2020b) Kacper Sokol and Peter Flach. 2020b. One explanation does not fit all: The promise of interactive explanations for machine learning transparency. KI-Künstliche Intelligenz (2020), 1–16.
Sokol and Flach (2021) Kacper Sokol and Peter Flach. 2021. Explainability is in the mind of the beholder: Establishing the foundations of explainable artificial intelligence. (2021). arXiv:2112.14466
Sokol and Flach (2024) Kacper Sokol and Peter Flach. 2024. Interpretable representations in explainable AI: From theory to practice. Data Mining and Knowledge Discovery (2024), 1–39.
Sokol and Flach (2018) Kacper Sokol and Peter A Flach. 2018. Glass-Box: Explaining AI decisions with counterfactual statements through conversation with a voice-enabled virtual assistant. In IJCAI. 5868–5870.
Sokol et al. (2020) Kacper Sokol, Alexander Hepburn, Rafael Poyiadzi, Matthew Clifford, Raul Santos-Rodriguez, and Peter Flach. 2020. FAT Forensics: A Python toolbox for implementing and deploying fairness, accountability and transparency algorithms in predictive systems. Journal of Open Source Software 5, 49 (2020), 1904.
Sokol et al. (2019) Kacper Sokol, Alexander Hepburn, Raul Santos-Rodriguez, and Peter Flach. 2019. bLIMEy: Surrogate prediction explanations beyond LIME. In 2019 Workshop on Human-Centric Machine Learning (HCML 2019) at the 33rd Conference on Neural Information Processing Systems (NeurIPS 2019).
Sokol et al. (2022a) Kacper Sokol, Alexander Hepburn, Raul Santos-Rodriguez, and Peter Flach. 2022a. What and how of machine learning transparency: Building bespoke explainability tools with interoperable algorithmic components. Journal of Open Source Education 5, 58 (2022), 175.
Sokol et al. (2022b) Kacper Sokol, Raul Santos-Rodriguez, and Peter Flach. 2022b. FAT Forensics: A Python toolbox for algorithmic fairness, accountability and transparency. Software Impacts 14 (2022), 100406.
Sokol and Vogt (2023) Kacper Sokol and Julia E Vogt. 2023. (Un)reasonable allure of ante-hoc interpretability for high-stakes domains: Transparency is necessary but insufficient for comprehensibility. In 3rd Workshop on Interpretable Machine Learning in Healthcare (IMLH) at 2023 International Conference on Machine Learning (ICML).
Sokol and Vogt (2024) Kacper Sokol and Julia E Vogt. 2024. What does evaluation of explainable artificial intelligence actually tell us? A case for compositional and contextual validation of XAI building blocks. In Extended Abstracts of the CHI Conference on Human Factors in Computing Systems. 1–8.
Tolomei et al. (2017) Gabriele Tolomei, Fabrizio Silvestri, Andrew Haines, and Mounia Lalmas. 2017. Interpretable predictions of tree-based ensembles via actionable feature tweaking. In Proceedings of the 23rd ACM SIGKDD International Conference on Knowledge Discovery and Data Mining. 465–474.
Waa et al. (2018) Jasper van der Waa, Marcel Robeer, J van Diggelen, Matthieu Brinkhuis, and Mark Neerincx. 2018. Contrastive explanations with local foil trees. In 2018 ICML Workshop on Human Interpretability in Machine Learning (WHI 2018).
Wachter et al. (2017) Sandra Wachter, Brent Mittelstadt, and Chris Russell. 2017. Counterfactual explanations without opening the black box: Automated decisions and the GDPR. Harvard Journal of Law & Technology 31 (2017), 841.
Weld and Bansal (2019) Daniel S Weld and Gagan Bansal. 2019. The challenge of crafting intelligible intelligence. Commun. ACM 62, 6 (2019), 70–79.
Xuan et al. (2023) Yueqing Xuan, Edward Small, Kacper Sokol, Danula Hettiachchi, and Mark Sanderson. 2023. Can users correctly interpret machine learning explanations and simultaneously identify their limitations? (2023). arXiv:2309.08438
Zhang et al. (2017) Hongyi Zhang, Moustapha Cisse, Yann N Dauphin, and David Lopez-Paz. 2017. mixup: Beyond empirical risk minimization. In International Conference on Learning Representations.
Zhang et al. (2019b) Xuezhou Zhang, Sarah Tan, Paul Koch, Yin Lou, Urszula Chajewska, and Rich Caruana. 2019b. Axiomatic interpretability for multiclass additive models. In Proceedings of the 25th ACM SIGKDD International Conference on Knowledge Discovery and Data Mining. 226–234.
Zhang et al. (2019a) Yujia Zhang, Kuangyan Song, Yiming Sun, Sarah Tan, and Madeleine Udell. 2019a. “Why should you trust my explanation?” Understanding uncertainty in LIME explanations. In AI for Social Good Workshop at the 36th International Conference on Machine Learning (ICML 2019).