
LIMEtree: Consistent and Faithful Multi-class Explanations

Kacper Sokol (ORCID 0000-0002-9869-5896), Intelligent Systems Laboratory, University of Bristol, United Kingdom, and Peter Flach (ORCID 0000-0001-6857-5810), Intelligent Systems Laboratory, University of Bristol, United Kingdom
Abstract.

Explainable artificial intelligence provides tools to better understand predictive models and their decisions, but many such methods are limited to producing insights with respect to a single class. When generating explanations for several classes, reasoning over them to obtain a complete view may be difficult since they can present competing or contradictory evidence. To address this challenge we introduce the novel paradigm of multi-class explanations. We outline the theory behind such techniques and propose a local surrogate model based on multi-output regression trees – called LIMEtree – that offers faithful and consistent explanations of multiple classes for individual predictions while being post-hoc, model-agnostic and data-universal. On top of strong fidelity guarantees, our implementation delivers a range of diverse explanation types, including counterfactual statements favoured in the literature. We evaluate our algorithm with respect to explainability desiderata, through quantitative experiments and via a pilot user study, on image and tabular data classification tasks, comparing it to LIME, which is a state-of-the-art surrogate explainer. Our contributions demonstrate the benefits of multi-class explanations and wide-ranging advantages of our method across a diverse set of scenarios.

Model-agnostic • Post-hoc • Surrogate • Decision Tree • Explainability • Interpretability • Machine Learning • Artificial Intelligence

Highlights

  • The paper proposes a novel XAI paradigm to coherently explain predictions of multiple classes for a selected data point.

  • This approach helps to mitigate automation bias and is compatible with the hypothesis-driven decision support conceptualisation of XAI.

  • Our explainer – called LIMEtree – is a post-hoc, model-agnostic and data-universal surrogate based on multi-output regression trees.

  • It achieves high – and in certain cases full – fidelity, and thus offers robust, reliable and trustworthy explanations.


Source Code
https://github.com/So-Cool/bLIMEy/tree/master/ECML-PKDD_2023

1. Introduction

Explainability of predictive systems based on artificial intelligence (AI) algorithms has become one of their most desirable properties (Rudin, 2019; Sokol and Flach, 2021). While a wide array of explanation types – supplemented by numerous techniques to generate them – is available (Guidotti et al., 2018), contrastive statements are dominant (Miller, 2018; Wachter et al., 2017; Poyiadzi et al., 2020; Romashov et al., 2022; Waa et al., 2018). Their particular realisation in the form of counterfactual examples is the most ubiquitous given its everyday usage among humans and solid foundations in social sciences (Miller, 2018) as well as its compliance with various legal frameworks (Wachter et al., 2017). Such insights are usually of the form: “Had certain aspects of the given case been different, the predictive model would behave like so instead.” The conditional part of this proposition usually prescribes a change in the feature vector of a particular data point, whereas the hypothetical fragment of the statement tends to capture the resulting difference in class prediction.

While offering a very appealing recipe for swaying an automated decision, these explanations are intrinsically restricted to a pair of outcomes, which may impact their utility, effectiveness and comprehensibility. They can either highlight an explicit contrast between two classes – “Why $A$ rather than $B$?” – or be implicit instead – “Why $A$ (as opposed to anything else)?” As a result, counterfactuals, but also single-class explanations more broadly, have been shown to simply justify conclusions of AI systems. This may be counterproductive as it implicitly limits the number of possibilities that the explainees consider, thus biasing their perception, impeding independent reasoning and yielding unwarranted reliance on AI or preventing trust from developing altogether (Byrne, 2023).

In human explainability, this limitation can be overcome with follow-up questions, progressively exploring and narrowing down the scope of the lack of understanding until finally eliminating it. One could imagine generating multiple counterfactuals across all the foils to mimic this process, e.g., “Why $A$ (and not $B$ or $C$)?”, “Why $A$ rather than $B$?”, “Why $A$ instead of $C$?”, “Why $B$ (and not $A$ or $C$)?”, “Why $B$ instead of $A$?”, etc., for three outcomes $A$, $B$ and $C$. Other explainability methods could also be employed in this scenario to provide a wider gamut of insights varying in scope, complexity and explanation target. Such approaches embody the recent hypothesis-driven decision support conceptualisation of explainable AI (XAI), which aims to provide diverse evidence for a data-driven prediction instead of offering a recommendation to simply accept or reject a pre-selected AI decision (Miller, 2023); this process keeps the explainees engaged instead of displacing them, utilises their expertise, and mitigates over- and under-dependence on automation.

However, implementing this paradigm with current XAI tools is likely to fail given that they tend to generate independent insights whose one-class limitation prevents them from capturing and communicating a congruent bigger picture. The lack of a single origin and shared context may yield insights that do not overlap or are outright contradictory – different conditionals used by counterfactuals and disparate pieces of evidence output by other techniques – preventing the explainees from drawing coherent conclusions and adversely affecting their trust and decision-making capabilities (Weld and Bansal, 2019). While this is a promising research direction, to the best of our knowledge the challenge of generating inherently consistent explanations of multiple classes has been addressed neither for counterfactuals nor for any other explanation type. In this paper, we fill this gap by introducing the novel concept of multi-class explanations, where individual insights pertaining to different predictions (of a selected instance) originate from a single explanatory source.

To this end, we:

  (i) define a multi-class explainability optimisation objective;

  (ii) operationalise it in the form of a local surrogate;

  (iii) offer an algorithm for building multi-class explainers; and

  (iv) implement it with multi-output regression trees.

We evaluate our method – called LIMEtree – along three dimensions: an analytical assessment of human-centred XAI desiderata; quantitative experiments on tabular and image data measuring explainer fidelity; and a qualitative user study capturing explainees’ preferences. We choose to demonstrate multi-class explainability with a surrogate since this design yields an explainer that is post-hoc – i.e., capable of being retrofitted to pre-existing AI systems – model-agnostic – i.e., compatible with any predictive algorithm – and data-universal – i.e., suitable for tabular, text and image domains. Additionally, by using a tree as the surrogate, LIMEtree offers a broad range of explanation types such as model structure visualisation, feature importance, exemplars, logical rules, what-ifs, and, most importantly, counterfactuals (Sokol, 2021). This suite of investigative mechanisms supports diverse explanation scopes spanning model simplification, sub-space approximation and prediction rationales.

LIMEtree offers solutions to many shortcomings of currently available surrogate explainers in addition to addressing limitations found across the social and technical dimensions of XAI. Specifically, by using (shallow) regression trees as the surrogate models, it can guarantee full fidelity of the explanations with respect to the investigated black box under certain conditions, thus addressing one of the major criticisms of post-hoc approaches (Rudin, 2019). The flexible explanation generation process additionally enables it to comply with a range of desiderata such as feasibility and actionability (Poyiadzi et al., 2020) as well as to facilitate algorithmic recourse (Karimi et al., 2021), to name just a few (Sokol and Flach, 2020a). The availability of multiple diverse explanation types also allows it to provide explainability to a broad range of stakeholders and satisfy their diverse needs. With all of these contributions, we hope to launch multi-class explainability as a novel XAI research direction.

Figure 1. LIME explanations for the top three classes predicted by a black-box model. Panels: (a) interpretable representation; (b) tennis ball (99.28%); (c) golden retriever (0.67%); (d) Labrador retriever (0.04%). Panel (a) shows the super-pixel interpretable representation of the explained image with $d = 8$ segments. Panels (b), (c) and (d) are LIME explanations, which capture the positive or negative influence of (the presence of) interpretable features on the prediction (probability) of a selected class.

2. Related Work and Background

LIMEtree builds upon two prominent findings in XAI: counterfactuals (Miller, 2018; Wachter et al., 2017) and surrogate explainers (Ribeiro et al., 2016; Sokol et al., 2019; Sokol and Flach, 2024; Sokol, 2021; Sokol et al., 2022a). As noted earlier, the former are lauded for their human-centred aspects, and the latter exhibit numerous appealing technical properties, making them one of the most flexible types of explainers. In a nutshell, surrogates mimic the behaviour of more complex, hence opaque, predictive systems either locally or globally with simpler, inherently interpretable models, thereby offering human-comprehensible insights into their operation (Craven and Shavlik, 1996; Ribeiro et al., 2016). Unlike surrogates and counterfactuals, multi-class explainability is a largely under-explored topic. While counterfactual explanations can be generated for multiple classes (Carlevaro et al., 2023), such insights may not present a coherent perspective given that they can be conditioned on different sets of features. One of the very few pieces of work – if not the only one – that directly addresses this challenge extends Generalised Additive Models (GAMs (Hastie and Tibshirani, 1986)) – which are inherently transparent and powerful predictors popular in high-stakes domains (Lou et al., 2012) – to multiple classes (Zhang et al., 2019b).

LIME (Ribeiro et al., 2016) is one of the most popular surrogate approaches; it uses sparse linear regression to explain (probabilistic) black-box predictions. It augments the classic paradigm of surrogate explainers with interpretable representations (IR) of raw data, which makes them compatible with a variety of data domains (such as images and text) and extends their applicability beyond inherently interpretable features (of tabular data). The high modularity and flexibility of these explainers (Sokol et al., 2019) encouraged the research community to compose their different variants, some of which use decision trees as the (local) surrogate model (Waa et al., 2018; Shi et al., 2019; Sokol and Flach, 2024; Sokol, 2021). For example, Waa et al. (2018) showed how a local one-vs-rest classification tree can be used to produce contrastive explanations; and Shi et al. (2019) fitted a local shallow regression tree whose structure constitutes an explanation. Interpretability of decision trees and their ensembles has also been investigated outside of the surrogate explainability context (Tolomei et al., 2017; Sokol and Flach, 2018, 2020b; Sokol, 2021). Sokol and Flach (2018, 2020b) demonstrated how to interactively extract personalised counterfactuals from a decision tree, and Tolomei et al. (2017) introduced a method to explain predictions made by tree ensembles, also with counterfactuals.

More specifically, LIME builds a local surrogate model $g \in \mathcal{G}$ to explain the prediction of an instance $\mathring{x} \in \mathcal{X}$ with respect to a selected class $c$ for a probabilistic black box $f: \mathcal{X} \mapsto \mathcal{Y}$, where $\mathcal{G}$ is the space of (sparse linear) surrogate models, $\mathcal{X}$ is the input data domain, $\mathcal{Y} = [0, 1]^{n}$ for $n \in \mathcal{N}^{+}$ target classes, and $c \in [1, \ldots, n]$. To this end, it employs a user-defined interpretable representation transformation function $\mathit{IR}: \mathcal{X} \mapsto \mathcal{X}^{\prime}$, which encodes presence ($1$) and absence ($0$) of $d \in \mathcal{N}^{+}$ selected human-comprehensible concepts found in a data point $x \in \mathcal{X}$, i.e., $\mathcal{X}^{\prime} = \{0, 1\}^{d}$. Additionally, $\mathit{IR}$ is defined such that the explained instance is assumed to have all of the concepts present, i.e., $\mathit{IR}(\mathring{x}) = \mathring{x}^{\prime} = [1, \ldots, 1]$, which is an all-$1$ vector. This step allows us to generate “conceptual” variations of $\mathring{x}$ by drawing a collection of binary vectors $X^{\prime} = \{x^{\prime} : x^{\prime} \in \mathcal{X}^{\prime}\}$.
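To make the sampling step concrete, the Python sketch below draws a binary sample $X^\prime$ from $\mathcal{X}^\prime = \{0, 1\}^d$ with NumPy. It is a minimal illustration under our own assumptions: the function name `sample_interpretable` is hypothetical, every concept is switched off independently with probability 0.5, and the all-ones vector $\mathring{x}^\prime$ is placed in the first row. LIME's own sampler may draw these vectors differently.

```python
import numpy as np


def sample_interpretable(d, n_samples, seed=0):
    """Draw n_samples binary vectors from X' = {0, 1}^d (hypothetical helper).

    Assumption: each concept is kept (1) or switched off (0) independently
    with probability 0.5. The first row is the explained instance
    IR(x) = [1, ..., 1].
    """
    rng = np.random.default_rng(seed)
    sample = rng.integers(0, 2, size=(n_samples, d))
    sample[0, :] = 1  # include the explained instance itself
    return sample
```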

Next, $X^{\prime}$ is converted back to the original data domain $\mathcal{X}$ using the inverse of the interpretable representation transformation function $\mathit{IR}^{-1}: \mathcal{X}^{\prime} \mapsto \mathcal{X}$, i.e., $X = \{\mathit{IR}^{-1}(x^{\prime}) : x^{\prime} \in X^{\prime}\}$, which facilitates predicting these instances with the explained black box $f$, focusing on the probabilities of the explained class $c$, i.e., $Y_{c} = \{f_{c}(x) : x \in X\}$. These predictions capture the influence of (the presence of) each human-comprehensible concept on the (change in) prediction of class $c$. We can quantify this dependence by fitting sparse linear regression to the binary sample $X^{\prime}$ and probabilities $Y_{c}$.
This procedure can be focused on the neighbourhood of the explained instance by computing the distance $\ell$ between each sampled data point and the explained instance, either in the original or interpretable representation, i.e., $\ell: \mathcal{X} \times \mathcal{X} \mapsto \mathbb{R}$ or $\ell: \mathcal{X}^{\prime} \times \mathcal{X}^{\prime} \mapsto \mathbb{R}$, transforming it into a similarity measure by passing it through a kernel $\kappa: \mathbb{R} \mapsto \mathbb{R}$, and using the result as a weight factor for training the surrogate model. This step allows us to prioritise smaller changes to the instance, e.g., to give more significance to samples with fewer alterations in the concept space.
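The weighting and fitting steps can be sketched as follows, as a hedged illustration rather than LIME's actual code. For concreteness, we assume that the distance $\ell$ is computed in the interpretable domain as the share of concepts switched off (a normalised Hamming distance to $\mathring{x}^\prime = [1, \ldots, 1]$), that the kernel $\kappa$ is exponential with a hypothetical `kernel_width` parameter, and that scikit-learn's `Ridge` regression, which is not sparse, stands in for the sparse linear surrogate. The helper names `similarity_weights` and `fit_linear_surrogate` are illustrative, and `probs_c` plays the role of $Y_c$ obtained by querying $f$ on $\mathit{IR}^{-1}(X^\prime)$.

```python
import numpy as np
from sklearn.linear_model import Ridge


def similarity_weights(sample, kernel_width=0.25):
    """omega(x'; x0') = kappa(l(x', x0')) with x0' = [1, ..., 1].

    l     -- share of concepts switched off (normalised Hamming distance)
    kappa -- exponential kernel exp(-l^2 / kernel_width^2) (an assumption)
    """
    distance = 1.0 - np.asarray(sample).mean(axis=1)
    return np.exp(-distance ** 2 / kernel_width ** 2)


def fit_linear_surrogate(sample, probs_c, kernel_width=0.25, alpha=1.0):
    """Similarity-weighted (non-sparse) linear surrogate of a single class c."""
    weights = similarity_weights(sample, kernel_width)
    model = Ridge(alpha=alpha)
    model.fit(sample, probs_c, sample_weight=weights)
    return model, weights
```

Samples that differ from the explained instance in many concepts receive weights close to zero, so the fitted coefficients mostly reflect small, local interventions.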

LIME optimises fidelity of the surrogate, i.e., its ability to approximate the predictive behaviour of the explained black box, and complexity of the resulting explanation, i.e., its human-comprehensibility; this objective $\mathcal{O}$ is formalised in Equation 1. Complexity $\Omega$, in the case of linear models, is computed as the number of non-zero (or significantly larger than zero) coefficients $\Theta_{g}$ of the surrogate $g$, normalised by the total number of coefficients $|\Theta_{g}|$ (see Equation 2). High fidelity entails a small loss $\mathcal{L}$ (Equation 3) calculated between the outputs of the black box $f$ and the surrogate $g$, which is measured empirically on the data sampled “around” the explained instance. Individual loss components are weighted by similarity scores, $\omega(x;\; \mathring{x})$ for $x \in X$ or $\omega(x^{\prime};\; \mathring{x}^{\prime})$ for $x^{\prime} \in X^{\prime}$ depending on the domain, derived by kernelising the distance between the explained instance and the sampled data. This loss is inspired by Weighted Least Squares, where the weights are the similarity scores.

\[
\mathcal{O}(\mathcal{G};\; f) = \operatorname*{arg\,min}_{g \in \mathcal{G}} \underbrace{\mathcal{L}(f, g)}_{\text{fidelity}} + \underbrace{\Omega(g)}_{\text{complexity}} \tag{1}
\]

\[
\Omega(g) = \sum_{\theta \in \Theta_{g}} \mathds{1}(|\theta| > 0) \;/\; |\Theta_{g}| \tag{2}
\]

\[
\begin{split}
\mathcal{L}(f, g;\; X^{\prime}, \mathring{x}, c) = {} & \frac{1}{\sum_{x^{\prime} \in X^{\prime}} \omega\left(x^{\prime};\; \mathit{IR}(\mathring{x})\right)} \sum_{x^{\prime} \in X^{\prime}} \omega\left(x^{\prime};\; \mathit{IR}(\mathring{x})\right) \left(f_{c}\left(\mathit{IR}^{-1}(x^{\prime})\right) - g(x^{\prime})\right)^{2} \\
& \text{where} \quad \omega(x^{\prime};\; \mathring{x}^{\prime}) = \kappa\left(\ell\left(x^{\prime},\; \mathring{x}^{\prime}\right)\right)
\end{split}
\tag{3}
\]
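Equations 1, 2 and 3 translate into a few lines of NumPy; the sketch below evaluates the objective for one candidate surrogate (the arg min over $\mathcal{G}$ is not performed). The helper names are hypothetical, and the `tol` argument of `linear_complexity` is our assumption for operationalising “significantly larger than zero”; with `tol=0.0` it counts strictly non-zero coefficients, as in Equation 2.

```python
import numpy as np


def linear_complexity(coefficients, tol=0.0):
    """Equation 2: share of surrogate coefficients with |theta| > tol."""
    coefficients = np.asarray(coefficients, dtype=float)
    return np.count_nonzero(np.abs(coefficients) > tol) / coefficients.size


def weighted_loss(black_box_probs, surrogate_preds, weights):
    """Equation 3: similarity-weighted mean squared error for one class c.

    black_box_probs -- f_c(IR^{-1}(x')) for every x' in X'
    surrogate_preds -- g(x') for every x' in X'
    weights         -- omega(x'; IR(x0)) for every x' in X'
    """
    black_box_probs = np.asarray(black_box_probs, dtype=float)
    surrogate_preds = np.asarray(surrogate_preds, dtype=float)
    weights = np.asarray(weights, dtype=float)
    squared_error = (black_box_probs - surrogate_preds) ** 2
    return np.sum(weights * squared_error) / np.sum(weights)


def objective(black_box_probs, surrogate_preds, weights, coefficients):
    """Equation 1 evaluated for a single candidate surrogate g."""
    return (weighted_loss(black_box_probs, surrogate_preds, weights)
            + linear_complexity(coefficients))
```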

The precise definition of the interpretable representation transformation function $\mathit{IR}$ depends on the data domain. For text, $\mathit{IR}$ splits a document into $d$ tokens, e.g., using the bag-of-words approach, whose presence ($1$) or absence ($0$) is encoded by $\mathcal{X}^{\prime}$; setting a component of this domain to $0$ is thus equivalent to removing a token from a text excerpt. For images, this domain transformation relies on a super-pixel partition of a picture into $d$ non-overlapping patches whose binary vector encoding indicates whether a particular segment is preserved ($1$) or discarded ($0$); since parts of an image cannot be removed directly, an occlusion proxy that replaces selected patches with a predetermined colour is used. Figure 1 shows an interpretable representation of an image and its LIME explanations for the top three predictions. For tabular data, the $\mathit{IR}$ function is more complex; continuous features are first discretised and then, together with any remaining categorical attributes, binarised. The latter step assigns, separately for every feature, $1$ to the discrete partition where the explained instance is located, with all the other partitions merged and represented by $0$. As a result, the mapping from $\mathcal{X}^{\prime}$ back to $\mathcal{X}$ tends to be non-deterministic, unlike the corresponding $\mathit{IR}$ transformation for image and text data. Further information about surrogate explainers – including their generalisation and in-depth analysis of individual building blocks – can be found in the literature (Sokol et al., 2019, 2022a; Sokol and Flach, 2024; Sokol, 2021).
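For images, $\mathit{IR}^{-1}$ amounts to painting over the discarded super-pixels. The following NumPy sketch assumes that the segmentation (an integer super-pixel label per pixel, computed elsewhere, e.g., by a super-pixel algorithm) is already available; the function `occlude` and its default black `colour` are illustrative choices, not part of any particular library.

```python
import numpy as np


def occlude(image, segments, x_prime, colour=0):
    """Hypothetical IR^{-1} for images: occlude segments switched off in x'.

    image    -- H x W x 3 array
    segments -- H x W integer array of super-pixel ids 0..d-1
    x_prime  -- binary vector of length d; 0 means the segment is discarded
    colour   -- value painted over discarded segments (black by default)
    """
    restored = image.copy()  # never modify the explained image in place
    switched_off = np.flatnonzero(np.asarray(x_prime) == 0)
    mask = np.isin(segments, switched_off)  # H x W boolean pixel mask
    restored[mask] = colour
    return restored
```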

Figure 2. Surrogate multi-output regression tree explaining the top three classes – tennis ball, golden retriever and Labrador retriever – predicted by a black box for the image shown in Figure 1(a). The segments marked in blue do not influence the explanation at a given tree node, i.e., they can either be preserved or discarded for the explanation to hold. Super-pixels whose value in the interpretable representation is $1$ are preserved and those with $0$ are “removed” by occluding them with black patches. The class probabilities estimated by each node of the surrogate tree may not sum up to $1$ as these values capture a subset of the modelled classes and are a result of numerical regression, hence they should not be treated as probabilities per se.

3. LIMEtree

LIME fits a separate surrogate model to the probabilities of each class of interest. This makes the process of discovering the dependencies between multiple classes challenging as each explanation needs to be interpreted in isolation. A surrogate fitted to class $A$ is implicitly a one-vs-rest explainer since it can only answer questions about the probability of this single class, with the complementary probability $p(\neg A) = 1 - p(A)$ modelling the union of all the other classes $\neg A \equiv B \cup C \cup \cdots$. Interpreting the magnitude of the probability $p(A)$ output by a surrogate trained for class $A$ can also be problematic when explaining multi-class black boxes. For example, if $p(A) \leq 0.5$, we cannot be certain whether there is a single class $B$ with $p(B) > p(A)$, or alternatively the combined probability of all the complementary classes $p(\neg A)$ is greater than or equal to $p(A)$ with no single class dominating over $p(A)$.
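A small numerical example illustrates this ambiguity. The two probability vectors below are made up for illustration: both give $p(A) = 0.4$, hence an identical one-vs-rest view, yet only in the first does a single class $B$ dominate $A$.

```python
import numpy as np

# Hypothetical black-box outputs over the classes [A, B, C].
dominated = np.array([0.40, 0.50, 0.10])  # class B beats A
spread = np.array([0.40, 0.30, 0.30])     # no single class beats A


def one_vs_rest_view(probs, target=0):
    """All a single-class surrogate of `target` can express: p(A) and p(not A)."""
    return probs[target], 1.0 - probs[target]


same_view = one_vs_rest_view(dominated) == one_vs_rest_view(spread)
winners = (int(np.argmax(dominated)), int(np.argmax(spread)))
print(same_view, winners)  # True (1, 0)
```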

Moreover, linear surrogates are unable to model target variables that are non-linear with respect to the input features, and linearity does not necessarily hold for high-level features such as the concepts encoded by IRs. The high inter-dependence of these concepts may also have adverse effects on explanation quality. Additionally, modelling probabilities with linear regression risks confusing the explainees, who expect an output bounded between $0$ and $1$ but may be given a numerical prediction outside of this range.
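Both limitations can be seen on a toy interaction. In the hypothetical setting below, the class probability is high only when two concepts are present together; an ordinary least-squares fit (a simplification of the weighted, sparse regression described above) cannot represent this interaction and even predicts a negative “probability”.

```python
import numpy as np

# All four states of two binary concepts in the interpretable representation.
X_prime = np.array([[0, 0], [0, 1], [1, 0], [1, 1]], dtype=float)
# Hypothetical class probability: high only when both concepts are present.
p_c = np.array([0.0, 0.0, 0.0, 1.0])

design = np.column_stack([np.ones(len(X_prime)), X_prime])  # intercept + 2 concepts
coef, *_ = np.linalg.lstsq(design, p_c, rcond=None)
fitted = design @ coef  # approximately [-0.25, 0.25, 0.25, 0.75]
```

The least-squares solution is an intercept of $-0.25$ and a slope of $0.5$ per concept, so the fitted value for the instance with no concepts present falls below $0$.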

We address the challenge of simultaneously explaining multiple classes of a prediction output by a probabilistic model by proposing a first-of-a-kind surrogate explainer based on multi-output regression trees. It facilitates multi-class modelling in a regression setting, allowing the surrogate to capture the interactions between multiple classes, hence explain them coherently. Each node of such a tree approximates the probabilities of every explained class – a level of detail that is impossible to achieve with surrogate multi-class classifiers – thus reflecting how individual interventions in the interpretable domain affect the predictions. Figure 2 shows an example of a surrogate multi-output regression tree. This is a significant improvement over training a separate regression surrogate for each explained class, which may produce diverse, inconsistent, competing or contradictory explanations, thus risking confusing the explainees and putting their trust at stake, whenever these models do not share a common tree structure or split on different feature subsets. Our contributions establish a new direction in XAI research – concerned with consistent and faithful explanations of multiple classes – and offer a pioneering method to address this challenge.
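scikit-learn's `DecisionTreeRegressor` accepts a two-dimensional target, so a single tree can model all explained classes at once and every node stores one estimate per class in $C$. The sketch below is one possible realisation under our assumptions: the helper name, the `max_depth` budget and the choice of scikit-learn are ours and are not necessarily those of the reference implementation.

```python
import numpy as np
from sklearn.tree import DecisionTreeRegressor


def fit_multi_output_surrogate(sample, probs, weights, max_depth=3):
    """One regression tree g modelling the probabilities of all classes in C.

    sample  -- |X'| x d binary matrix (interpretable representation)
    probs   -- |X'| x |C| matrix of black-box probabilities f_c(IR^{-1}(x'))
    weights -- similarity scores omega(x'; IR(x0)), one per row of `sample`
    """
    tree = DecisionTreeRegressor(max_depth=max_depth, random_state=0)
    tree.fit(sample, probs, sample_weight=weights)
    return tree
```

Because all classes share one set of splits, every explanation derived from the tree refers to the same partition of the interpretable space.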

Moreover, using decision trees (Breiman et al., 1984) as surrogates overcomes the shortcomings identified when linear models are used to this end (Sokol et al., 2019; Sokol and Flach, 2024; Sokol, 2021). Trees presuppose neither independence of features nor the existence of a linear relationship between them and the target variable. While surrogate regression trees that approximate the probability of a single class are guaranteed to output a number within the $[0, 1]$ range – since the estimate is calculated as an average – this may not necessarily hold for multi-output trees. Approximating probabilities of multiple classes by averaging their values across a number of instances may yield estimates whose sum is greater than $1$; nonetheless, these values can be rescaled to avoid confusing the explainees.
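If a leaf's estimates do sum to more than $1$, one simple option is to divide them by their total. The helper below is a hypothetical post-processing step of our own, shown only to make the rescaling concrete; estimates that already sum to at most $1$ are left untouched.

```python
import numpy as np


def rescale_leaf_values(values):
    """Divide class estimates by their sum whenever that sum exceeds 1.

    values -- one leaf's estimates (1-D) or several leaves (2-D, one row each)
    """
    values = np.asarray(values, dtype=float)
    total = values.sum(axis=-1, keepdims=True)
    return values / np.maximum(total, 1.0)  # totals <= 1 are divided by 1
```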

While surrogates based on linear models are limited to (interpretable) feature influence explanations – see Figure 1 – employing trees offers a broad selection of diverse explanation types. These include: (1) visualisation of the tree structure; (2) tree-based (interpretable) feature importance (Gini importance (Breiman, 2001)); (3) logical conditions extracted from root-to-leaf paths; (4) exemplar explanations taken from the training data assigned to the same leaf; (5) answers to what-if questions generated based on the tree structure (e.g., by querying the model); and (6) counterfactuals retrieved by comparing and applying logical reasoning to different tree paths (Sokol, 2021). The first two explanation types uncover the behaviour of a black box in a given data sub-space; the remainder target specific predictions. Since all six explanation types – see Section 5 for their examples – are derived from a single (surrogate) model, they are guaranteed to be coherent, and their diversity should appeal to a wide range of audiences.
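Two of these explanation types, the logical conditions along a root-to-leaf path and the impurity-based feature importance, can be read off a fitted scikit-learn tree. The sketch below assumes binary interpretable features, so each split threshold lies between 0 and 1, and uses scikit-learn's `decision_path`, `apply`, `tree_` and `feature_importances_`; `path_conditions`, the toy probabilities and the concept names are hypothetical.

```python
import numpy as np
from sklearn.tree import DecisionTreeRegressor


def path_conditions(tree, x_prime, concept_names):
    """Conditions on the root-to-leaf path of `x_prime` and the leaf's estimates.

    Assumes binary features: a sample goes right (value > threshold)
    exactly when the concept is present.
    """
    x_prime = np.asarray(x_prime, dtype=float).reshape(1, -1)
    visited = tree.decision_path(x_prime).indices  # node ids, root first
    leaf = tree.apply(x_prime)[0]
    conditions = []
    for node in visited:
        if node == leaf:
            continue  # leaves carry no split condition
        feature = tree.tree_.feature[node]
        present = x_prime[0, feature] > tree.tree_.threshold[node]
        state = "present" if present else "absent"
        conditions.append(f"{concept_names[feature]} is {state}")
    estimates = tree.predict(x_prime).reshape(-1)  # one value per explained class
    return conditions, estimates


# Usage on a toy sample where concept "a" matters most, followed by "b".
grid = np.array([[(i >> k) & 1 for k in range(3)] for i in range(8)])
p0 = 0.25 + 0.5 * grid[:, 0] + 0.125 * grid[:, 0] * grid[:, 1]
probs = np.column_stack([p0, 1.0 - p0])
surrogate = DecisionTreeRegressor(max_depth=2, random_state=0).fit(grid, probs)
rules, leaf_estimates = path_conditions(surrogate, [1, 1, 1], ["a", "b", "c"])
importance = surrogate.feature_importances_  # impurity-based, one value per concept
```

Here the explained instance (all concepts present) follows the path “a is present” and then “b is present”, while concept “c” receives zero importance.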

To ensure low complexity and high fidelity of our multi-output regression trees, we employ the optimisation objective $\mathcal{O}$ from Equation 1. Since we are using surrogate trees, we modify the model complexity function $\Omega$ to measure the depth or width (number of leaves) of the tree as given by Equation 4, where $d$ is the dimensionality of the binary interpretable domain $\mathcal{X}^{\prime}$. This choice depends on the type of explanation that we want to extract from the surrogate tree; e.g., depth may be preferred when visualising the tree structure or extracting decision rules. In some cases, such as unbalanced trees, optimising for width or a mixture of the two may be more desirable. We also adapt the loss function $\mathcal{L}$ to account for the surrogate tree $g$ outputting multiple values in a single prediction, as shown in Equation 5, where $C \subseteq [1, \ldots, n]$ are the classes to be explained by $g$, for which the $c$ subscript in $g_{c}(x^{\prime})$ indicates the prediction of a selected class $c \in C$ for the data point $x^{\prime}$.

(4) Ω(g;d)=depth(g)dorΩ(g;d)=width(g)2dformulae-sequenceΩ𝑔𝑑depth𝑔𝑑orΩ𝑔𝑑width𝑔superscript2𝑑\Omega(g;\;d)=\frac{\text{depth}(g)}{d}\qquad\text{or}\qquad\Omega(g;\;d)=% \frac{\text{width}(g)}{2^{d}}roman_Ω ( italic_g ; italic_d ) = divide start_ARG depth ( italic_g ) end_ARG start_ARG italic_d end_ARG or roman_Ω ( italic_g ; italic_d ) = divide start_ARG width ( italic_g ) end_ARG start_ARG 2 start_POSTSUPERSCRIPT italic_d end_POSTSUPERSCRIPT end_ARG
$$\mathcal{L}(f, g;\; X^{\prime}, \mathring{x}, C) = \frac{1}{\sum_{x^{\prime} \in X^{\prime}} \omega(x^{\prime};\; \mathit{IR}(\mathring{x}))} \sum_{x^{\prime} \in X^{\prime}} \left( \frac{\omega(x^{\prime};\; \mathit{IR}(\mathring{x}))}{1 + \mathds{1}(|C| > 1)} \sum_{c \in C} \left( f_{c}\left(\mathit{IR}^{-1}(x^{\prime})\right) - g_{c}(x^{\prime}) \right)^{2} \right) \tag{5}$$

Note that the inner sum over the explained classes $\sum_{c \in C}$ is normalised by $(1 + \mathds{1}(|C| > 1))^{-1}$, which is $1$ when the surrogate is built for a single class and becomes $\nicefrac{1}{2}$ for more classes. The loss given by Equation 5 is thus equivalent to the one in Equation 3 in the former case; in the latter, the scaling factor ensures that the inner sum is between $0$ and $1$ since the biggest squared difference is $2$, which happens when the predictions of $f$ and $g$ assign a probability of $1$ to two different classes, e.g., $[1, 0, 0]$ and $[0, 0, 1]$. An additional assumption is that the sum of values predicted by each leaf of the surrogate tree is at most $1$, which, as noted earlier, may in some cases require normalisation. In practice, the surrogate explainer is built by iteratively adding splits to a multi-output regression tree – thus incrementally increasing its complexity $\Omega(g;\; d)$ but also improving its predictive power – which allows us to progressively minimise the loss $\mathcal{L}$ and optimise the objective $\mathcal{O}$. This procedure terminates when the loss $\mathcal{L}$ (calculated with Equation 5) reaches a certain, user-defined level $\epsilon \in [0, 1]$, which corresponds to the fidelity of the local surrogate, i.e., $\mathcal{L}(f, g;\; X^{\prime}, \mathring{x}, C) \leq \epsilon$.
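A minimal sketch of the multi-class loss in Equation 5 and the two complexity measures of Equation 4, written in plain Python. It assumes that the black-box probabilities $f_c(\mathit{IR}^{-1}(x^{\prime}))$, the surrogate outputs $g_c(x^{\prime})$ and the similarity weights $\omega(x^{\prime};\; \mathit{IR}(\mathring{x}))$ have already been computed for every sampled point; the function names are illustrative.

```python
from typing import Sequence


def omega_depth(depth: int, d: int) -> float:
    """Depth-based complexity from Equation 4."""
    return depth / d


def omega_width(width: int, d: int) -> float:
    """Width-based (number of leaves) complexity from Equation 4."""
    return width / 2 ** d


def multi_class_loss(
    f_probs: Sequence[Sequence[float]],  # f_c(IR^{-1}(x')) per sample, one column per class in C
    g_probs: Sequence[Sequence[float]],  # g_c(x') per sample, same class order
    weights: Sequence[float],            # omega(x'; IR(x_ring)) per sample
) -> float:
    """Weighted loss of Equation 5; halves the squared error when |C| > 1."""
    n_classes = len(f_probs[0])
    scale = 1 / (1 + (1 if n_classes > 1 else 0))
    total = 0.0
    for f_row, g_row, w in zip(f_probs, g_probs, weights):
        squared_error = sum((fc - gc) ** 2 for fc, gc in zip(f_row, g_row))
        total += w * scale * squared_error
    return total / sum(weights)


# Two different one-hot predictions give the largest possible per-sample loss of 1.
print(multi_class_loss([[1, 0, 0]], [[0, 0, 1]], [1.0]))  # 1.0
```

Under these assumptions, the iterative growth described above would keep adding splits while `multi_class_loss(...)` exceeds the user-defined $\epsilon$.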

4. Fidelity Guarantees

The flexibility of surrogate explainers – they are post-hoc, model-agnostic and, often, data-universal – also contributes to the instability and occasional unreliability of their explanations (Laugel et al., 2018; Zhang et al., 2019a; Sokol and Flach, 2024; Sokol et al., 2019, 2022a). Their subpar fidelity, i.e., the predictive coherence with respect to the underlying black box, is thus a major barrier to their uptake (Rudin, 2019). In addition to remedying the shortcomings of linear surrogates, LIMEtree comes with strong fidelity guarantees, which can be achieved in practice while keeping explanation complexity low.

To imbue LIMEtree with near-full or full fidelity, we identify the minimal IR set $X^{\prime}_{\textit{min},T} \subseteq \mathcal{X}^{\prime}$. It is unique to a tree and composed of binary vectors $x^{\prime}_{\textit{min},t}$ drawn from the IR – one per leaf $t \in T$ of the surrogate tree – that have the least number of $0$ components while still being assigned to the leaf $t$. The construction of this set is formalised in Definition 4.1 and can be understood as seeking instances with the highest number of human-interpretable concepts being present, e.g., minimal occlusion for images, for each leaf.

Definition 4.1 (Minimal Representation).

Assume a binary decision tree $g \in \mathcal{G}$ fitted to a binary $d$-dimensional data space $\mathcal{X}^{\prime} = \{0,1\}^{d}$, with $T$ denoting its set of leaves. This tree assigns a leaf $t \in T$ to a data point $x^{\prime} \in \mathcal{X}^{\prime}$ with the function $g_{\mathit{id}}(x^{\prime}) = t$. For a given tree leaf $t$, its unique minimal data point $x^{\prime}_{\textit{min},t}$ is

$$x^{\prime}_{\textit{min},t} = \operatorname*{arg\,max}_{x^{\prime} \in \mathcal{X}^{\prime}} \sum_{i=1}^{d} x^{\prime}_{i} \quad \text{s.t.} \quad g_{\mathit{id}}(x^{\prime}) = t\text{,}$$

where $x^{\prime}_{i}$ is the $i$th component of the binary vector $x^{\prime}$. We can further define a minimal set of data points $X^{\prime}_{\textit{min},T} \subseteq \mathcal{X}^{\prime}$ – uniquely representing a tree $g$ and the set of its leaves $T$ – that is composed of all the minimal data points for this tree:

$$X^{\prime}_{\textit{min},T} = \{x^{\prime}_{\textit{min},t} : t \in T\}\text{.}$$
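Because every split in a tree fitted to $\{0,1\}^{d}$ fixes the value of the tested feature, the arg max above can be obtained without search: keep the values required by the leaf's root-to-leaf path and set every untested feature to $1$. A minimal sketch, assuming a hypothetical path encoding that maps a 0-based feature index to the binary value the path requires:

```python
from typing import Dict, List


def minimal_point(conditions: Dict[int, int], d: int) -> List[int]:
    """x'_{min,t}: satisfy the path conditions of leaf t, switch all other features on."""
    return [conditions.get(i, 1) for i in range(d)]


def minimal_set(paths: Dict[str, Dict[int, int]], d: int) -> Dict[str, List[int]]:
    """X'_{min,T}: one minimal point per leaf, keyed by a leaf identifier."""
    return {leaf: minimal_point(conditions, d) for leaf, conditions in paths.items()}


# Leaf reached by x'_1 < 0.5 AND x'_3 < 0.5 (0-based indices 0 and 2).
print(minimal_point({0: 0, 2: 0}, 3))  # [0, 1, 0]
```

Any point assigned to the leaf must satisfy its path conditions, while untested features are free; setting all of them to $1$ maximises $\sum_i x^{\prime}_i$, and the result is unique because no other choice reaches the same sum.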

Next, we transform this minimal representation set $X^{\prime}_{\textit{min},T}$ from the interpretable into the original domain using the inverse of the IR transformation function: $X_{\textit{min},T} = \{\mathit{IR}^{-1}(x^{\prime}_{\textit{min},t}) : x^{\prime}_{\textit{min},t} \in X^{\prime}_{\textit{min},T}\}$. We then predict class probabilities for each instance in $X_{\textit{min},T}$ with the black box $f$ and replace the values estimated by the surrogate tree with these probabilities for each leaf $t \in T$, i.e., modify the surrogate tree by overriding its predictions. Doing so is only feasible for the tree leaves as the minimal data points for some of the splitting nodes are indistinguishable; e.g., all the nodes on the root-to-leaf path that decides every interpretable feature to be $1$ are non-unique and all would be represented by the unmodified explained instance.
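The leaf-overriding step can be sketched as follows for images. The inverse transformation below is a hypothetical stand-in that occludes (fills with a constant) every pixel belonging to a switched-off segment; the image is flattened to a list of intensities with a parallel list of segment labels, and `toy_black_box` is a made-up classifier used only to make the example runnable.

```python
from typing import Callable, Dict, List, Sequence


def ir_inverse(x_prime: Sequence[int], image: Sequence[float],
               segments: Sequence[int], fill: float = 0.0) -> List[float]:
    """Hypothetical IR^{-1} for images: occlude pixels of absent segments."""
    return [pixel if x_prime[seg] == 1 else fill for pixel, seg in zip(image, segments)]


def override_leaves(minimal_points: Dict[str, Sequence[int]],
                    black_box: Callable[[List[float]], List[float]],
                    image: Sequence[float], segments: Sequence[int]) -> Dict[str, List[float]]:
    """Replace each leaf's surrogate estimate with f(IR^{-1}(x'_{min,t}))."""
    return {leaf: black_box(ir_inverse(x, image, segments))
            for leaf, x in minimal_points.items()}


def toy_black_box(pixels: List[float]) -> List[float]:
    """Made-up two-class model: brighter images are more likely class 0."""
    p = sum(pixels) / len(pixels)
    return [p, 1 - p]


image = [1.0, 1.0, 0.5, 0.5]   # four pixels
segments = [0, 0, 1, 1]        # two segments (interpretable features)
print(override_leaves({"t1": [1, 0], "t2": [1, 1]}, toy_black_box, image, segments))
# {'t1': [0.5, 0.5], 't2': [0.75, 0.25]}
```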

This variant of LIMEtree – which we refer to as TREE – guarantees full fidelity of the surrogate tree with respect to the explanations derived from the tree structure, such as counterfactuals and root-to-leaf decision rules (see Section 5 for their examples). However, for this property to hold, the function $\mathit{IR}$ transforming data from their original domain into the interpretable representation has to be deterministic (Sokol and Flach, 2024), which holds for image and text but not for tabular data (refer back to Section 2). The rationale behind this claim is outlined in Lemma 1, which follows from the subsequent discussion.

Lemma 1 (Structural Fidelity).

A surrogate tree can achieve full fidelity with respect to the explanations derived from its structure – i.e., model-driven explanations – if the interpretable representation transformation function $\mathit{IR}$ is deterministic. In this case, an instance $x \in \mathcal{X}$ can be translated into a unique point $\mathit{IR}(x) = x^{\prime} \in \mathcal{X}^{\prime}$ and vice versa, $\mathit{IR}^{-1}(x^{\prime}) = x$, i.e., the mapping is one-to-one.

Lemma 1 guarantees that each leaf in the surrogate tree is associated with only one data point $x_{\textit{min},t}$ in the original representation $\mathcal{X}$. This instance is derived from the minimal interpretable data point $x^{\prime}_{\textit{min},t}$ by applying the inverse of the interpretable representation transformation function $\mathit{IR}^{-1}$, i.e., $x_{\textit{min},t} = \mathit{IR}^{-1}(x^{\prime}_{\textit{min},t})$. Therefore, $x_{\textit{min},t}$ represents the explained instance with the smallest possible number of concepts deleted from it such that $g_{\mathit{id}}(x^{\prime}_{\textit{min},t}) = t$. By assigning the probabilities predicted by the explained black box for each data point $x_{\textit{min},t}$ to the corresponding leaf $t$, the surrogate achieves full fidelity for the minimal representation set $X_{\textit{min},T}$, which is the backbone of model-driven explanations.

While such an approach ensures full fidelity of model-driven explanations, the same is not guaranteed for data-driven explanations such as answers to what-if questions, e.g., “What if concept $x^{\prime}_{i}$ is absent?” Root-to-leaf paths that do not condition on all of the binary interpretable features allow more than one data point to be assigned to the same leaf; e.g., for three binary features $[x^{\prime}_{1}, x^{\prime}_{2}, x^{\prime}_{3}] \in \{0,1\}^{3}$, a root-to-leaf path with an $x^{\prime}_{1} < 0.5 \land x^{\prime}_{3} < 0.5$ condition assigns both $[0,0,0]$ and $[0,1,0]$ to this leaf. This observation motivates the minimal interpretable representation $X^{\prime}_{\textit{min},T}$ (Definition 4.1), which selects a single data point to represent each leaf, thereby facilitating full fidelity of model-driven explanations without additional assumptions.
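The example above can be checked by enumerating every point of $\{0,1\}^{3}$ that satisfies the path condition. A small sketch, with 0-based feature indices (so $x^{\prime}_1$ and $x^{\prime}_3$ become indices 0 and 2) and a hypothetical path encoding mapping each tested feature to the value it must take:

```python
from itertools import product
from typing import Dict, List


def leaf_members(conditions: Dict[int, int], d: int) -> List[List[int]]:
    """All points of {0,1}^d routed to the leaf with the given path conditions."""
    return [list(x) for x in product((0, 1), repeat=d)
            if all(x[i] == v for i, v in conditions.items())]


members = leaf_members({0: 0, 2: 0}, 3)
print(members)                # [[0, 0, 0], [0, 1, 0]]
print(max(members, key=sum))  # [0, 1, 0], the minimal representation of this leaf
```

A leaf whose path tests $k$ of the $d$ features receives $2^{d-k}$ points, so its single overridden prediction is only guaranteed to match the black box on one of them.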
However, for data-driven explanations to achieve full fidelity, the surrogate tree must faithfully model the entire interpretable feature space, i.e., have one leaf for every data point in $\mathcal{X}^{\prime}$, which can be thought of as extreme overfitting. Since the cardinality of a binary $d$-dimensional space $\mathbb{B}^{d} = \{0,1\}^{d}$ is $|\mathbb{B}^{d}| = 2^{d}$, and a complete and balanced binary decision tree of width (number of leaves) $2^{d}$ is $d$ levels deep, relaxing the tree complexity bound $\Omega$ accordingly guarantees full fidelity of all the explanations, a property captured by Corollary 2.
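The counting argument can be written out explicitly. Assuming each binary feature is tested at most once per path (a repeated test at $0.5$ would be redundant), a leaf whose path tests $k_t$ of the $d$ features covers $2^{d-k_t}$ points, so one point per leaf requires every path to test all $d$ features:

```latex
\begin{gather*}
\bigl|\{x^{\prime} \in \mathbb{B}^{d} : g_{\mathit{id}}(x^{\prime}) = t\}\bigr| = 2^{d - k_t},
\qquad k_t = \text{number of features tested on the path to } t,\\
2^{d - k_t} = 1 \;\; \forall t \in T \iff k_t = d \;\; \forall t \in T
\;\Rightarrow\; \text{depth}(g) = d, \quad \text{width}(g) = 2^{d} = |\mathbb{B}^{d}|,\\
\Omega(g;\; d) = \frac{\text{depth}(g)}{d} = \frac{d}{d} = 1
\qquad\text{and}\qquad
\Omega(g;\; d) = \frac{\text{width}(g)}{2^{d}} = \frac{2^{d}}{2^{d}} = 1.
\end{gather*}
```

For the eight image segments used in Figure 3, this amounts to $2^{8} = 256$ leaves at depth $8$.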

Corollary 2 (Full Fidelity).

If the complexity bound (width) $\Omega$ of a surrogate tree $g$ is relaxed so that its width equals the cardinality of the binary interpretable domain $\mathcal{X}^{\prime}$, i.e., $\text{width}(g) = |\mathcal{X}^{\prime}| = 2^{d}$ and hence $\Omega(g;\; d) = \frac{\text{width}(g)}{2^{d}} = \frac{2^{d}}{2^{d}} = 1$, then the surrogate is guaranteed to achieve full fidelity. This property applies to explanations that are both data-driven – i.e., derived from any data point in the interpretable representation – and model-driven – i.e., derived from the structure of the surrogate tree.
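A minimal sketch of a surrogate satisfying this corollary: it splits on the interpretable features one after another at the threshold $0.5$, so that each of the $2^{d}$ leaves holds exactly one point, and stores the black-box output for that point in the leaf. Here `black_box` is a hypothetical callable standing for the composition $f \circ \mathit{IR}^{-1}$, `toy_black_box` is made up, and the nested-tuple tree encoding is chosen only for brevity.

```python
from itertools import product
from typing import Callable, List, Sequence, Tuple


def build_complete_tree(black_box: Callable[[Tuple[int, ...]], List[float]],
                        d: int, prefix: Tuple[int, ...] = ()) -> tuple:
    """Split on x'_0, ..., x'_{d-1} in turn; each leaf stores the black-box output."""
    if len(prefix) == d:
        # All d features are fixed, so this leaf contains exactly one point.
        return ("leaf", black_box(prefix))
    feature = len(prefix)
    return ("split", feature,
            build_complete_tree(black_box, d, prefix + (0,)),   # x'_feature < 0.5
            build_complete_tree(black_box, d, prefix + (1,)))   # x'_feature >= 0.5


def predict(node: tuple, x_prime: Sequence[int]) -> List[float]:
    """Route a binary point from the root to its leaf and return the stored output."""
    while node[0] == "split":
        _, feature, left, right = node
        node = right if x_prime[feature] >= 0.5 else left
    return node[1]


def count_leaves(node: tuple) -> int:
    return 1 if node[0] == "leaf" else count_leaves(node[2]) + count_leaves(node[3])


def toy_black_box(x_prime: Tuple[int, ...]) -> List[float]:
    """Made-up model: probability of class 0 grows with the number of present concepts."""
    p = sum(x_prime) / len(x_prime)
    return [p, 1 - p]


d = 3
tree = build_complete_tree(toy_black_box, d)
faithful = all(predict(tree, x) == toy_black_box(x) for x in product((0, 1), repeat=d))
print(count_leaves(tree), faithful)  # 8 True
```

Because each leaf contains a single point, the surrogate matches the black box everywhere in $\{0,1\}^{d}$ (data-driven fidelity), and its leaves coincide with the minimal representation set (model-driven fidelity).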

Therefore, a surrogate tree that guarantees faithfulness of model-driven explanations (Lemma 1) can only deliver trustworthy counterfactuals and exemplar explanations sourced from the minimal representation set. This may be an attractive alternative to more complex surrogate trees that additionally guarantee faithfulness of data-driven explanations (Corollary 2). The latter surrogate type, which usually yields deeper trees, can deliver a broader spectrum of trustworthy explanations: tree structure-based explanations, feature importance, decision rules (root-to-leaf paths), answers to what-if questions, and exemplar explanations based on any data point, in addition to counterfactuals.

(a) Tree-based feature importance explanation, which is shared between all the three explained classes.
(b) The shortest – i.e., highest number of occlusions with 6 out of 8 segments removed – exemplar explanation of tennis ball (97%).
(c) What-if / counterfactual explanation: “If segment #8 (representing the ball) is removed, the black box predicts golden retriever (97%).”
(d) Visualisation of an explanation based on a root-to-leaf decision rule that maximises the probability of Labrador retriever (98%).
Figure 3. Examples of four LIMEtree explanation types complementing the tree structure visualisation shown in Figure 2: (3(a)) feature importance, (3(b)) exemplar, (3(c)) what-if/counterfactual and (3(d)) decision rule. These insights allow us to uncover the heuristic used by the black box to differentiate between the three explained classes, which is not feasible with the LIME explanations displayed in Figure 1. Panels (3(b)), (3(c)) & (3(d)) show explanations generated to maximise the predicted probability of one of the classes; they are presented here with appealing visualisations, but they can also be communicated via the underlying logical expressions, e.g., $x^{\prime}_{1} = 0 \land x^{\prime}_{2} = 0 \land x^{\prime}_{3} = 1 \land x^{\prime}_{4} = 1 \land x^{\prime}_{5} = 1 \land x^{\prime}_{6} = 1 \land x^{\prime}_{7} = 0 \land x^{\prime}_{8} = 0$ for Panel (3(d)).
Note that LIMEtree explanations can be customised to an individual explainee’s needs, which can be seen in Panels (3(b)), (3(c)) & (3(d)); the user can ask for certain image segments (i.e., interpretable features) to be preserved and others discarded, as well as for the smallest or biggest possible occlusion, while requesting to maximise the probability of a selected class (according to the black box).

5. Discussion

LIMEtree explanations are versatile and appealing, but achieving their full fidelity presupposes a deterministic IR transformation function (Lemma 1) and a complete surrogate tree (Corollary 2). This is not a problem for image and text data since the corresponding IRs can be built to be deterministic and of low dimensionality (given by the number of desired human-comprehensible concepts). The IR of tabular data, however, is inherently non-deterministic (Sokol and Flach, 2024) – due to the many-to-one mapping introduced by discretisation and binarisation (refer back to Section 2) – with its dimensionality equal to the size of the original feature space. Nonetheless, since, uniquely for tabular data, the surrogate tree can be trained directly on the original representation, thus implicitly constructing a locally faithful and meaningful IR instead of relying on an external one (Sokol et al., 2019; Sokol and Flach, 2024), the surrogate can be overfitted to maximise its fidelity. While LIMEtree offers a close approximation in both cases, full fidelity cannot be guaranteed since a complete surrogate tree is unable to achieve full coverage for non-deterministic IRs. The consequences of this shortcoming can be seen in our experimental results given later in Table 1, which shows that a complete surrogate tree – labelled TREE – can reach full fidelity for image but not for tabular data.

In practice, full fidelity of surrogates based on deterministic IRs (Lemma 1) is achieved by adjusting the sample size $|X^{\prime}|$ and relaxing the tree complexity bound $\Omega$. Recall that a $d$-dimensional binary interpretable representation $\mathcal{X}^{\prime} \equiv \mathbb{B}^{d} = \{0,1\}^{d}$ has $|\mathcal{X}^{\prime}| = 2^{d}$ unique instances, and the width, i.e., the number of leaves, of a complete, balanced binary decision tree of depth $d$ is $2^{d}$ (Corollary 2). Therefore, we can use all of these data points – there is no benefit from oversampling – to easily train a local surrogate with its complexity bound $\Omega$ removed to allow complete trees of depth $d$, i.e., with one leaf per instance, guaranteeing full fidelity and access to a diverse range of faithful and comprehensible explanations. The depth bound and the sample size can be adjusted dynamically prior to training the surrogate to ensure its optimality since the size of the interpretable domain is known beforehand.
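Because the interpretable domain is finite and known in advance, the sample and the tree bounds can be derived directly from $d$. A minimal sketch, where `complete_surrogate_config` is a hypothetical helper rather than part of any library, that enumerates $\{0,1\}^{d}$ once instead of sampling it:

```python
from itertools import product
from typing import Dict, List


def full_interpretable_sample(d: int) -> List[List[int]]:
    """Every point of {0,1}^d exactly once; oversampling would only add duplicates."""
    return [list(x) for x in product((0, 1), repeat=d)]


def complete_surrogate_config(d: int) -> Dict[str, int]:
    """Sample size and tree bounds that admit one leaf per interpretable point."""
    return {"sample_size": 2 ** d, "max_depth": d, "max_leaves": 2 ** d}


for d in (4, 8, 12):
    config = complete_surrogate_config(d)
    assert len(full_interpretable_sample(d)) == config["sample_size"]
    print(d, config)
```

Each extra interpretable feature doubles both the sample and the maximum number of leaves while increasing the depth bound by one.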

Since for images as well as text each dimension of the IR captures a human-comprehensible concept, their number is expected to be low, especially since tokens in text excerpts and segments in images do not have to be adjacent to constitute a single concept. For every additional feature in the interpretable space, the number of sampled data points doubles and the tree depth is incremented by one in order to provide the interpretable domain and the surrogate tree with enough capacity to preserve the full fidelity guarantee. While this exponential growth in the number of interpretable data points may seem overwhelming, training decision trees on binary data spaces is fast given the predetermined $\nicefrac{1}{2}$ split at every node. The exponential growth of the width of the surrogate tree that guarantees its full fidelity increases its complexity and can have adverse effects on the comprehensibility of some explanation types; however, as we show next, it does not affect the most important and versatile explanation kinds.

Guaranteeing full fidelity of a surrogate tree requires relaxing its complexity bound $\Omega$, which the optimisation objective $\mathcal{O}$ tries to minimise (Equation 1). Since in this setting a moderate number of interpretable features may yield a relatively large tree, the increased complexity of the resulting explanations is concerning. While a complex surrogate tree may render the explanations based on its structure, e.g., model visualisations, incomprehensible, these are not the most appealing explanation types and their appreciation often requires AI expertise. The (interpretable) feature importance, what-if explanations, counterfactuals and exemplars are not affected by the tree size in any way and remain highly compact and comprehensible – see Figure 3 for some examples. Notably, a complete surrogate tree with full fidelity will produce more counterfactual explanations for every data point, making it more interpretable.

The decision rules – logical conditions extracted from root-to-leaf paths – may indeed become overwhelmingly long, in fact as long as the tree depth; however, this does not impact all data types equally, and an appropriate presentation medium can alleviate this issue regardless of the tree complexity. For image and text data such rules will always be comprehensible, no matter their length, since they cannot have more literals than the dimensionality of the underlying interpretable domain, i.e., the number of segments for images and word-based tokens for text. Presenting such a rule corresponds to displaying an image with some of its segments occluded (e.g., see Figure 3(d)) in the former case, and to producing a text excerpt with selected tokens removed in the latter. For tabular data, however, these rules may become relatively long and incomprehensible since this domain lacks a similar human-friendly representation; the exception being root-to-leaf paths that impose multiple logical conditions on a single feature (in the original domain), which can be compressed. Regardless of the presentation medium, a general criticism of rule-based explanations is the difficulty of understanding how each logical condition affects the prediction, making them less appealing than other explanation types.
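The compression of tabular rules mentioned above can be sketched as follows: thresholds on the same feature along a root-to-leaf path are merged into a single interval. The sketch assumes conditions encoded as hypothetical `(feature, operator, threshold)` triples using the `<=` / `>` convention common to threshold-based decision trees; the feature names are made up.

```python
from typing import Dict, List, Tuple

Condition = Tuple[str, str, float]


def compress_rule(conditions: List[Condition]) -> Dict[str, Tuple[float, float]]:
    """Merge all thresholds on a feature into a single (lower, upper] interval."""
    bounds: Dict[str, Tuple[float, float]] = {}
    for feature, operator, threshold in conditions:
        lower, upper = bounds.get(feature, (float("-inf"), float("inf")))
        if operator == ">":
            lower = max(lower, threshold)
        elif operator == "<=":
            upper = min(upper, threshold)
        else:
            raise ValueError(f"unsupported operator: {operator}")
        bounds[feature] = (lower, upper)
    return bounds


def render(bounds: Dict[str, Tuple[float, float]]) -> str:
    """Turn the merged intervals back into a compact conjunction."""
    parts = []
    for feature, (lower, upper) in bounds.items():
        if lower == float("-inf"):
            parts.append(f"{feature} <= {upper:g}")
        elif upper == float("inf"):
            parts.append(f"{feature} > {lower:g}")
        else:
            parts.append(f"{lower:g} < {feature} <= {upper:g}")
    return " AND ".join(parts)


path = [("age", ">", 30), ("income", "<=", 50000), ("age", ">", 40), ("age", "<=", 60)]
print(render(compress_rule(path)))  # 40 < age <= 60 AND income <= 50000
```

On a valid root-to-leaf path the merged interval is never empty, so the four conditions above collapse into two literals without changing the rule's meaning.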

In view of these observations, if explanations based on the structure of the surrogate tree are not required for image and text data, and additionally rule-based explanations are not needed for tabular data, the model complexity $\Omega$ does not have to be minimised. It can therefore be removed from the optimisation objective $\mathcal{O}$ given in Equation 1, paving the way for full surrogate fidelity.

6. Evaluation

We assess the explanatory power of LIMEtree with a multi-tier evaluation approach that consists of an assessment guided by XAI desiderata (Section 6.1) as well as functionally-grounded (Section 6.2) and human-grounded (Section 6.3) experiments (Doshi-Velez and Kim, 2017; Sokol and Flach, 2020a). The first judges our approach against a number of criteria important for XAI systems; the second involves a (synthetic) proxy task in which we compare the (numerical) fidelity of LIME with multiple variants of LIMEtree on image and tabular data; the third reports results of a pilot user study, which is based on image classification to enable straightforward qualitative evaluation of explanations by means of visual inspection, thus alleviating the need for technical expertise.

6.1. Desiderata

XAI systems follow two distinct steps: explanation generation and presentation, a separation that allows us to better identify, evaluate and report the unique desiderata important at each stage (Sokol and Vogt, 2024). Given that LIMEtree is a surrogate explainer, the insights that it generates are post-hoc; therefore, they may not reflect the true behaviour of the underlying black box (Rudin, 2019). This discrepancy – measured as fidelity – is an important indicator of explanation truthfulness, which should always be communicated to the explainees, especially in high-stakes applications. While LIMEtree can achieve full fidelity without sacrificing explanation comprehensibility, this property is limited to deterministic IRs. To take advantage of it, it is therefore important to design an IR that addresses the explainability needs of each particular use case, which may require additional effort to build such a bespoke module despite the explainer itself being model-agnostic (Sokol and Flach, 2024; Mittelstadt et al., 2019; Sokol et al., 2022a). More broadly, truthfulness is a major advantage of our approach given that it allows us to retrofit explainability into pre-existing black boxes. Whatever explanation type, presentation format and communication medium are chosen, this property guarantees that the explanatory insights are based on an accurate reflection of the black-box model’s behaviour.

Before reviewing desiderata of specific explanation types, we discuss a set of general properties that are expected of all explanatory insights (Sokol and Flach, 2020a). LIMEtree excels when it comes to explanation plurality and diversity – especially so given their consistency – allowing the explainees to explore distinct aspects of the underlying black box without running into spuriously contradictory observations, further improving the trustworthiness of its explanations. While some of them are inherently static, others can be operationalised within an interactive explanatory protocol (Sokol and Flach, 2020b), enabling the explainees to customise and personalise them in a natural way – see Figure 3 for examples. This breadth of explanatory insights and access to their source – the surrogate tree structure (Figure 2) – enables their contextualisation, which makes them particularly appealing since good explanations do not only communicate what information is used by a predictive model but also how it is used (Rudin, 2019).

By simultaneously accounting for multiple classes, LIMEtree offers a more comprehensive picture of the explained model’s predictive behaviour and facilitates user-driven exploration, which, as noted in Section 1, mitigates the severity of automation bias, especially so for counterfactuals (Byrne, 2023). Also, recall that our method is compatible with hypothesis-driven XAI since the breadth of its insights allows the explainees to consider multiple congruent explanations for different predictions of a given instance instead of only receiving a justification of the top prediction (Miller, 2023). Given that our method operates as a surrogate, we can freely tweak and tune the target, breadth and scope of its explanations by adjusting its configuration, which further adds to its flexibility (Sokol and Flach, 2020a; Sokol et al., 2019; Sokol and Flach, 2024; Sokol et al., 2022a; Sokol and Flach, 2020b).

While LIMEtree offers a broad spectrum of explanation types – whose diversity makes it appealing to a wide range of audiences – we anticipate counterfactuals to be the most attractive given their ubiquity in XAI (Miller, 2018). Notably, these insights are ante-hoc with respect to the surrogate tree, so their truthfulness with respect to the surrogate is guaranteed (Sokol and Vogt, 2023). Their generation procedure can account for the plausibility and actionability of their conditional part as well as other (human-centred) properties that may be desired (Sokol and Flach, 2020a; Keane et al., 2021; Sokol and Flach, 2020b, 2018). Counterfactual explanations are known to be intrinsically comprehensible given their parsimony and low complexity, making them an attractive choice across a diverse range of applications (Sokol and Flach, 2020a; Miller, 2018).
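To make the idea concrete, the following is a minimal sketch of how counterfactuals might be read off a multi-output surrogate tree; it is not the authors' implementation. It assumes scikit-learn's `DecisionTreeRegressor` as the surrogate, binary interpretable features split at 1/2 (so the left branch means "component absent"), an explained instance with every component present, and a hypothetical toy black box. For a chosen class, it scans the leaves whose highest predicted probability belongs to that class and returns the smallest set of interpretable features that must change.

```python
import numpy as np
from itertools import product
from sklearn.tree import DecisionTreeRegressor


def leaf_paths(tree):
    """Yield (leaf_id, conditions), where conditions maps feature -> required value."""
    t = tree.tree_
    stack = [(0, {})]
    while stack:
        node, conds = stack.pop()
        if t.children_left[node] == -1:  # scikit-learn marks leaves with -1
            yield node, conds
            continue
        f = int(t.feature[node])
        # Binary features are split at 0.5: left means 0 (absent), right means 1 (present)
        stack.append((t.children_left[node], {**conds, f: 0}))
        stack.append((t.children_right[node], {**conds, f: 1}))


def counterfactual(tree, instance, target_class):
    """Fewest feature changes that move `instance` into a leaf predicting `target_class`."""
    values = tree.tree_.value  # shape (n_nodes, n_outputs, 1) for multi-output regression
    best = None
    for leaf, conds in leaf_paths(tree):
        if np.argmax(values[leaf][:, 0]) != target_class:
            continue
        changes = {f: v for f, v in conds.items() if instance[f] != v}
        if best is None or len(changes) < len(best):
            best = changes
    return best


# Hypothetical black box over three interpretable components (e.g., image segments)
def toy_black_box(Z):
    logits = np.column_stack([3.0 * Z[:, 0], 2.0 * Z[:, 1] + 0.5, 1.0 * Z[:, 2] + 0.2])
    e = np.exp(logits)
    return e / e.sum(axis=1, keepdims=True)


Z = np.array(list(product([0, 1], repeat=3)))  # full binary interpretable domain
tree = DecisionTreeRegressor(random_state=0).fit(Z, toy_black_box(Z))
explained = np.ones(3, dtype=int)  # the explained instance has every component present
print(counterfactual(tree, explained, target_class=1))  # → {0: 0}
```

With all components present the toy model already predicts class 0, so an empty change set is returned for that class, whereas removing the first component suffices to reach class 1. A practical system would additionally filter candidate leaves for plausibility and actionability.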

\begin{tabular}{ll cccc cccc cccc}
\hline
\multicolumn{2}{l}{$\times 10^{-2}$} & \multicolumn{4}{c}{ImageNet + Inception v3} & \multicolumn{4}{c}{CIFAR-10 + ResNet 56} & \multicolumn{4}{c}{CIFAR-100 + RepVGG} \\
 & & LIME & TREE@66\% & TREE@75\% & TREE & LIME & TREE@66\% & TREE@75\% & TREE & LIME & TREE@66\% & TREE@75\% & TREE \\
\hline
nth top & 1st & $3.67 \pm 2.18$ & $\boldsymbol{0.60} \pm 0.61$ & $0.64 \pm 0.73$ & $\boldsymbol{0} \pm 0$ & $7.34 \pm 2.96$ & $\boldsymbol{2.17} \pm 1.25$ & $2.77 \pm 1.66$ & $\boldsymbol{0} \pm 0$ & $3.33 \pm 1.80$ & $\boldsymbol{0.59} \pm 0.56$ & $0.66 \pm 0.63$ & $\boldsymbol{0} \pm 0$ \\
 & 2nd & $1.14 \pm 1.77$ & $\boldsymbol{0.24} \pm 0.42$ & $0.25 \pm 0.40$ & $\boldsymbol{0} \pm 0$ & $3.91 \pm 3.98$ & $\boldsymbol{1.28} \pm 1.31$ & $1.69 \pm 1.76$ & $\boldsymbol{0} \pm 0$ & $0.97 \pm 1.46$ & $\boldsymbol{0.24} \pm 0.36$ & $0.26 \pm 0.40$ & $\boldsymbol{0} \pm 0$ \\
 & 3rd & $0.63 \pm 1.36$ & $\boldsymbol{0.13} \pm 0.25$ & $0.16 \pm 0.33$ & $\boldsymbol{0} \pm 0$ & $2.57 \pm 3.37$ & $\boldsymbol{0.89} \pm 1.15$ & $1.10 \pm 1.44$ & $\boldsymbol{0} \pm 0$ & $0.56 \pm 1.13$ & $\boldsymbol{0.14} \pm 0.29$ & $0.16 \pm 0.32$ & $\boldsymbol{0} \pm 0$ \\
\hline
top n & 1 & $3.67 \pm 2.18$ & $\boldsymbol{0.60} \pm 0.61$ & $0.64 \pm 0.73$ & $\boldsymbol{0} \pm 0$ & $7.34 \pm 2.96$ & $\boldsymbol{2.17} \pm 1.25$ & $2.77 \pm 1.66$ & $\boldsymbol{0} \pm 0$ & $3.33 \pm 1.80$ & $\boldsymbol{0.59} \pm 0.56$ & $0.66 \pm 0.63$ & $\boldsymbol{0} \pm 0$ \\
 & 2 & $2.41 \pm 1.40$ & $\boldsymbol{0.42} \pm 0.42$ & $0.44 \pm 0.45$ & $\boldsymbol{0} \pm 0$ & $5.63 \pm 2.69$ & $\boldsymbol{1.73} \pm 1.03$ & $2.23 \pm 1.42$ & $\boldsymbol{0} \pm 0$ & $2.15 \pm 1.15$ & $\boldsymbol{0.41} \pm 0.36$ & $0.46 \pm 0.40$ & $\boldsymbol{0} \pm 0$ \\
 & 3 & $2.72 \pm 1.58$ & $\boldsymbol{0.48} \pm 0.47$ & $0.53 \pm 0.50$ & $\boldsymbol{0} \pm 0$ & $6.91 \pm 3.26$ & $\boldsymbol{2.17} \pm 1.28$ & $2.78 \pm 1.73$ & $\boldsymbol{0} \pm 0$ & $2.42 \pm 1.29$ & $\boldsymbol{0.48} \pm 0.41$ & $0.54 \pm 0.45$ & $\boldsymbol{0} \pm 0$ \\
\hline
\end{tabular}
(a) Image data sets and the corresponding (pre-trained) neural networks (Chen, 2021): ImageNet (Deng et al., 2009) (1,659 samples, 256×256 pixels, 1,000 classes) + Inception v3 (77% acc.); CIFAR-10 (Krizhevsky and Hinton, 2009) (9,714 samples, 32×32 pixels, 10 classes) + ResNet 56 (94% acc.); and CIFAR-100 (Krizhevsky and Hinton, 2009) (9,665 samples, 32×32 pixels, 100 classes) + RepVGG (77% acc.). We use all validation set images for which an interpretable representation can be built; however, for ImageNet we first pre-select images that are square and at least 256×256 pixels, which we resize to these dimensions. The results are scaled up by $10^{2}$.
\begin{tabular}{ll cccc cccc}
\hline
\multicolumn{2}{l}{$\times 10^{-1}$} & \multicolumn{4}{c}{Wine + Logistic Regression} & \multicolumn{4}{c}{Forest Covertypes + Multilayer Perceptron} \\
 & & LIME & TREE@66\% & TREE@100\% & TREE & LIME & TREE@66\% & TREE@100\% & TREE \\
\hline
nth top & 1st & $0.29 \pm 0.27$ & $\boldsymbol{0.08} \pm 0.11$ & $5.54 \pm 3.43$ & $\boldsymbol{0.07} \pm 0.11$ & $0.59 \pm 0.26$ & $\boldsymbol{0.06} \pm 0.06$ & $4.56 \pm 2.12$ & $\boldsymbol{0.06} \pm 0.06$ \\
 & 2nd & $0.14 \pm 0.16$ & $\boldsymbol{0.03} \pm 0.04$ & $2.35 \pm 3.26$ & $\boldsymbol{0.03} \pm 0.04$ & $0.51 \pm 0.29$ & $\boldsymbol{0.05} \pm 0.05$ & $1.88 \pm 1.21$ & $\boldsymbol{0.05} \pm 0.05$ \\
 & 3rd & $0.20 \pm 0.28$ & $\boldsymbol{0.07} \pm 0.12$ & $3.73 \pm 4.18$ & $\boldsymbol{0.06} \pm 0.11$ & $0.13 \pm 0.21$ & $\boldsymbol{0.02} \pm 0.04$ & $0.57 \pm 0.94$ & $\boldsymbol{0.02} \pm 0.04$ \\
\hline
top n & 1 & $0.29 \pm 0.27$ & $\boldsymbol{0.08} \pm 0.11$ & $5.54 \pm 3.43$ & $\boldsymbol{0.07} \pm 0.11$ & $0.59 \pm 0.26$ & $\boldsymbol{0.06} \pm 0.06$ & $4.56 \pm 2.12$ & $\boldsymbol{0.06} \pm 0.06$ \\
 & 2 & $0.22 \pm 0.19$ & $\boldsymbol{0.06} \pm 0.07$ & $3.94 \pm 2.67$ & $\boldsymbol{0.05} \pm 0.07$ & $0.55 \pm 0.26$ & $\boldsymbol{0.06} \pm 0.05$ & $3.22 \pm 1.04$ & $\boldsymbol{0.06} \pm 0.05$ \\
 & 3 & $0.32 \pm 0.29$ & $\boldsymbol{0.09} \pm 0.12$ & $5.80 \pm 3.56$ & $\boldsymbol{0.08} \pm 0.12$ & $0.62 \pm 0.29$ & $\boldsymbol{0.07} \pm 0.06$ & $3.51 \pm 1.09$ & $\boldsymbol{0.07} \pm 0.06$ \\
\hline
\end{tabular}
(b) Tabular data sets and the corresponding models (trained with scikit-learn (Pedregosa et al., 2011)): Wine (Aeberhard and Forina, 1991) (36 samples, 13 features, 3 classes) + Logistic Regression (93% balanced acc.); and Forest Covertypes (Blackard, 1998) (2,500 samples, 54 features, 7 classes) + Multilayer Perceptron (86% balanced acc.). For Wine we use all the test set samples; given their small number, we repeated the study on the entire data set (178 samples) with comparable results. The Forest Covertypes test set has 116,203 samples, from which we draw a stratified subset of size 2,500. The results are scaled up by $10^{1}$.
Table 1. Fidelity loss (mean ± standard deviation; smaller is better) computed (nth top) separately for each of the top three black-box predictions with the LIME loss (Equation 3) and (top n) collectively for the top one, two and three black-box predictions with the LIMEtree loss (Equation 5). We report results for (1(a)) three image and (1(b)) two tabular data sets with four surrogates: LIME and three variants of LIMEtree (the base, post-processed and unbounded TREE variants). The percentage shown after the explainer name specifies the tree complexity $\Omega$ – i.e., its depth divided by its maximum possible depth determined by the number of features in the interpretable representation – at which the loss is computed; TREE@66% corresponds to the base variant, TREE@75% (images) and TREE@100% (tabular data) to the post-processed variant, and the unbounded TREE is equivalent to the base variant at 100% complexity. See Figure 4 for examples of the loss behaviour.

6.2. Synthetic Experiments

We evaluate the trustworthiness and comprehensibility of LIMEtree explanations using the two components of the optimisation objective $\mathcal{O}$ (Equation 1) – fidelity $\mathcal{L}$ and complexity $\Omega$ – as computational proxies. The former measures the faithfulness of the surrogate with respect to the black box, i.e., its ability to mimic the black box, and is the only metric capable of reporting the reliability of all the diverse explanation types extracted from the surrogate. To this end, we employ the formulations of fidelity used by LIME (Equation 3) and LIMEtree (Equation 5), computing this property when modelling the top three classes predicted by the black box for each test instance. We additionally analyse the complexity of LIMEtree surrogates (Equation 4), i.e., the tree depth normalised by the dimensionality of the IR, and compare it to the size, i.e., number of coefficients, of LIME surrogates (Equation 2).
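The sketch below illustrates the two ways in which fidelity is aggregated in this evaluation: one loss per explained class ("nth top") and a single loss over several classes jointly ("top n"). It assumes a kernel-weighted squared error; the exact normalisation used by Equations 3 and 5 is not reproduced here, and all function names and data are hypothetical.

```python
import numpy as np


def top_k_classes(p_instance, k=3):
    """Indices of the k classes with the highest black-box probability, best first."""
    return [int(c) for c in np.argsort(p_instance)[::-1][:k]]


def per_class_losses(P, P_hat, w, classes):
    """'nth top': an independent kernel-weighted squared error for each explained class."""
    return [float(np.sum(w * (P[:, c] - P_hat[:, c]) ** 2) / np.sum(w)) for c in classes]


def collective_loss(P, P_hat, w, classes):
    """'top n': kernel-weighted squared Euclidean error over the explained classes jointly."""
    cols = list(classes)
    sq = np.sum((P[:, cols] - P_hat[:, cols]) ** 2, axis=1)
    return float(np.sum(w * sq) / np.sum(w))


# Hypothetical black-box (P) and surrogate (P_hat) probabilities for two sampled points
P = np.array([[0.7, 0.2, 0.1], [0.1, 0.6, 0.3]])
P_hat = np.array([[0.6, 0.3, 0.1], [0.1, 0.5, 0.4]])
w = np.array([1.0, 0.5])  # kernel weights of the two sampled points

classes = top_k_classes(P[0], k=2)  # top two classes for the explained (first) point
print(classes, per_class_losses(P, P_hat, w, classes), collective_loss(P, P_hat, w, classes))
```

In this unnormalised form the collective loss equals the sum of the per-class losses; any normalisation applied by the actual loss would rescale it.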

We study three variants of LIMEtree, all of which minimise the fidelity loss $\mathcal{L}$ but differ in their complexity constraints and post-processing:

TREE (base):

optimises a surrogate tree for complexity, i.e., it determines the shallowest tree that offers the desired level of fidelity;

TREE (post-processed):

is a variant of the base TREE whose predictions are post-processed to guarantee full fidelity of model-driven explanations; and

TREE (unbounded):

constructs a surrogate tree without any complexity constraints, allowing the algorithm to build a complete tree.
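The complexity-optimised (base) variant can be sketched as a search over depth limits that stops at the first, i.e., shallowest, tree meeting a fidelity tolerance. This is a minimal illustration rather than the authors' code: it assumes scikit-learn's `DecisionTreeRegressor` as the multi-output surrogate, an assumed kernel-weighted squared error as the loss, a hypothetical `tolerance` parameter and a toy black box, and it reports complexity $\Omega$ as tree depth divided by the number of binary interpretable features.

```python
import numpy as np
from itertools import product
from sklearn.tree import DecisionTreeRegressor


def weighted_loss(P, P_hat, w):
    """Assumed fidelity loss: kernel-weighted squared error averaged over classes."""
    return float(np.sum(w[:, None] * (P - P_hat) ** 2) / (np.sum(w) * P.shape[1]))


def shallowest_tree(Z, P, w, tolerance):
    """Return the first tree (by increasing depth) whose loss is within `tolerance`,
    together with its complexity Omega = depth / number of interpretable features."""
    d = Z.shape[1]  # on binary data a path can split on each feature at most once
    for depth in range(1, d + 1):
        tree = DecisionTreeRegressor(max_depth=depth, random_state=0)
        tree.fit(Z, P, sample_weight=w)
        if weighted_loss(P, tree.predict(Z), w) <= tolerance:
            break
    return tree, tree.get_depth() / d


# Hypothetical black box over three binary interpretable features
Z = np.array(list(product([0, 1], repeat=3)))
logits = np.column_stack([3.0 * Z[:, 0], 2.0 * Z[:, 1] + 0.5, Z[:, 2] + 0.2])
P = np.exp(logits) / np.exp(logits).sum(axis=1, keepdims=True)
w = np.ones(len(Z))  # uniform kernel weights for simplicity

tree, omega = shallowest_tree(Z, P, w, tolerance=1e-12)
print(tree.get_depth(), omega)  # → 3 1.0
```

Relaxing the tolerance trades fidelity for shallower and more comprehensible trees, in line with the loss–complexity curves in Figure 4.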

We compare the fidelity of these explainers to that of LIME with feature selection disabled, which allows it to achieve maximum fidelity at the expense of explanation size. Our study is limited to fidelity and complexity since XAI lacks metrics suitable for multi-class explainability, for cases where multiple explanation types are derived from a single source, and for explanations that rely on probabilities instead of crisp predictions (which we use to mitigate automation bias). LIME is our only baseline given the general lack of multi-class explainers and of methods whose underlying surrogate model can be accessed.

(a) CIFAR-100 (image data set).
(b) Forest Covertypes (tabular data set).
Figure 4. Behaviour of the LIMEtree loss (fidelity $\mathcal{L}$ and its standard deviation, y-axis) computed for the top three classes of the (4(a)) CIFAR-100 and (4(b)) Forest Covertypes data sets and plotted against surrogate complexity ($\Omega$, x-axis), given as the ratio between the depth of the tree and its maximum depth (complete tree) determined by the number of features of the interpretable domain. We report results for three surrogates: LIME and two TREE variants; the plots are representative of the other data sets used in our experiments and complement the fidelity at fixed tree complexity levels (66%, 75% and 100%) reported in Table 1. LIME complexity is constant and given by the number of features in the interpretable representation, i.e., equivalent to 100%.

Table 1, which reports the results of our evaluation, also summarises our experimental setup. We use a collection of popular multi-class image and tabular data sets; for the former we rely on a selection of pre-trained neural networks, and for the latter we split the data 80%–20% into stratified training and test sets and fit the models ourselves. LIME and LIMEtree are implemented following best practice described in the literature (Sokol et al., 2019; Sokol and Flach, 2024; Garreau and Luxburg, 2020; Sokol et al., 2020, 2022b, 2022a). For images we use an IR built upon SLIC (edge-based) segmentation (Achanta et al., 2012) with black colour occlusion as the information removal proxy; given its deterministic transformation function, we operate directly on the binary interpretable domain and generate its full set of instances rather than a random sample of them, enabling the surrogate to reach full fidelity. For tabular data we sample 10,000 data points around the explained instance in the original domain using mixup – an explicitly local sampler that accounts for class labels (Zhang et al., 2017; Sokol et al., 2022b) – since the corresponding IR transformation function is non-deterministic; our interpretable domain is obtained by quartile-based discretisation of the data sample followed by binarisation. For images we use cosine distance measured in the IR, and for tabular data Euclidean distance measured in the original domain; in both cases we use the exponential kernel, with its parameter determined experimentally for each data set. Our code is available on GitHub (https://github.com/So-Cool/bLIMEy/tree/master/ECML-PKDD_2023).
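Two ingredients of this setup are sketched below under simplifying assumptions: enumerating the complete binary interpretable domain with exponential-kernel weights based on cosine distance (images), and a quartile-based binarisation of a tabular sample. The kernel form $\exp(-d^2/w^2)$, the treatment of the all-zero vector, the hypothetical `width` value and the "same quartile bin as the explained instance" binarisation rule (in the style of LIME's tabular explainer) are assumptions; the implementation actually used is in the repository linked above.

```python
import numpy as np
from itertools import product


def full_binary_domain(d):
    """All 2**d binary interpretable vectors; tractable only for a small d."""
    return np.array(list(product([0, 1], repeat=d)))


def cosine_distance_to_ones(Z):
    """Cosine distance from each row of Z to the explained instance (all ones).
    The all-zero vector has no direction, so it is assigned the maximum distance 1."""
    ones = np.ones(Z.shape[1])
    norms = np.linalg.norm(Z, axis=1) * np.linalg.norm(ones)
    sim = np.divide(Z @ ones, norms, out=np.zeros(len(Z)), where=norms > 0)
    return 1.0 - sim


def exponential_kernel(dist, width):
    """Exponential kernel; `width` is a hypothetical, data-set-specific parameter."""
    return np.exp(-(dist ** 2) / width ** 2)


def quartile_binarise(X, x):
    """1 where a sampled value falls in the same quartile bin (edges computed on X)
    as the explained instance x, 0 otherwise."""
    edges = np.quantile(X, [0.25, 0.5, 0.75], axis=0)  # shape (3, n_features)
    bins_X = np.column_stack([np.digitize(X[:, j], edges[:, j]) for j in range(X.shape[1])])
    bins_x = np.array([np.digitize(x[j], edges[:, j]) for j in range(X.shape[1])])
    return (bins_X == bins_x).astype(int)


# Images: weight every vector of a hypothetical 4-segment interpretable domain
Z = full_binary_domain(4)
weights = exponential_kernel(cosine_distance_to_ones(Z), width=0.25)

# Tabular: binarise a hypothetical sample drawn around the explained instance
rng = np.random.default_rng(0)
X_sample = rng.normal(size=(1000, 3))
x_explained = X_sample[0]
B = quartile_binarise(X_sample, x_explained)
```

Enumerating all $2^d$ vectors is only feasible because the image IR is deterministic and kept small; for the stochastic tabular IR, the sample is drawn in the original domain and then mapped to binary features as above.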

In our experiments LIME produces three independent linear surrogates, one per class; each LIMEtree variant is either built as a single surrogate that models all of the classes simultaneously (nth top), or a separate surrogate is constructed for each of the one-, two- and three-class problems (top n). In deployment, however, LIMEtree fits only a single multi-output tree, whereas LIME requires as many models as explained classes. Since both methods follow the same steps except for the surrogate training phase, our method tends to be faster for relatively small trees – up to a depth of 20 in our experiments – given that they are fitted to binary data with feature thresholds fixed at $1/2$; it becomes negligibly slower for large trees, requiring 250 milliseconds more than LIME for trees as deep as 40. These measurements will, however, fluctuate with the number of explained classes and the IR dimensionality. Because the number of interpretable features should be kept low to improve human comprehensibility of the explanations, which directly limits the surrogate tree depth, we expect LIMEtree to be faster in practice (Sokol and Flach, 2024).
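The structural difference described above can be sketched with scikit-learn: explaining three classes LIME-style needs three weighted linear models, whereas a single multi-output regression tree covers all of them. The data and kernel weights are random placeholders, `Ridge` stands in for LIME's linear surrogate with feature selection disabled, and the snippet makes no timing claims. It also shows that, on binary inputs, every split threshold chosen by the tree falls at 1/2.

```python
import numpy as np
from sklearn.linear_model import Ridge
from sklearn.tree import DecisionTreeRegressor

rng = np.random.default_rng(0)
Z = rng.integers(0, 2, size=(200, 5))    # binary interpretable samples (placeholder)
P = rng.dirichlet(np.ones(3), size=200)  # black-box probabilities for 3 classes (placeholder)
w = np.exp(-rng.random(200))             # kernel weights (placeholder)

# LIME-style: one weighted linear surrogate per explained class, all features kept
lime_models = [Ridge(alpha=1.0).fit(Z, P[:, c], sample_weight=w) for c in range(P.shape[1])]

# LIMEtree-style: a single multi-output tree models every explained class at once
tree = DecisionTreeRegressor(max_depth=3, random_state=0).fit(Z, P, sample_weight=w)

internal = tree.tree_.children_left != -1  # non-leaf nodes
print(len(lime_models), tree.n_outputs_, np.unique(tree.tree_.threshold[internal]))
```

Because every feature takes only the values 0 and 1, the tree has a single candidate threshold per feature, which keeps the split search cheap; the linear models, in contrast, must be refitted once per explained class.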

To assess explanation quality, we measure multi-class fidelity with the LIMEtree loss as well as the fidelity of each class separately with the LIME loss. The results summarised in Table 1 show that our base method – TREE – provides more faithful explanations than LIME at $2/3$ of its complexity for both tabular and image data. The post-processed TREE variant – which adjusts the surrogate tree to facilitate full fidelity of model-driven explanations when the IR transformation function is deterministic – also surpasses LIME at $3/4$ of its complexity for image data given their compliant IR, but its performance degrades for tabular data even at full tree complexity (100%) due to the stochasticity of the underlying IR. This post-processed variant requires higher complexity, i.e., deeper trees, than the base TREE to achieve comparable fidelity since the post-processing step makes the surrogate faithful with respect to the minimal interpretable data points but at the same time sub-optimal for the remainder of the interpretable space; this is especially detrimental for stochastic IRs, where each minimal interpretable data point corresponds to multiple instances in the original data domain.

The version of LIMEtree without a depth bound – TREE, which is equivalent to the base TREE at 100% complexity – achieves full fidelity across the board for the deterministic IR (images), where it faithfully models the entire interpretable data space by constructing one leaf per instance, but fails to do so for the non-deterministic IR (tabular data) because each tree leaf has to model multiple distinct data points. Allowing deeper trees reduces the impurity of their leaves, which improves the overall performance of the surrogates; Figure 4 shows two representative examples of this intuitive relation between tree complexity and fidelity.
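The contrast between deterministic and stochastic IRs can be reproduced on a toy problem. The sketch below assumes a hypothetical two-class black box and simulates a non-deterministic IR by mapping several original instances, with noisy predictions, onto the same binary vector; an unbounded scikit-learn tree then attains zero training loss in the deterministic case but not in the stochastic one.

```python
import numpy as np
from itertools import product
from sklearn.tree import DecisionTreeRegressor

Z = np.array(list(product([0, 1], repeat=4)))  # full binary interpretable domain


def toy_black_box(Z):
    """Hypothetical two-class model driven by which components are present."""
    p1 = 1.0 / (1.0 + np.exp(-(Z @ np.array([1.0, -2.0, 0.5, 1.5]))))
    return np.column_stack([1.0 - p1, p1])


# Deterministic IR: every binary vector corresponds to exactly one original instance
P_det = toy_black_box(Z)
tree_det = DecisionTreeRegressor(random_state=0).fit(Z, P_det)  # no depth bound
loss_det = np.mean((P_det - tree_det.predict(Z)) ** 2)

# Stochastic IR: each binary vector stands for several original instances whose
# predictions differ, simulated here by perturbing repeated rows with noise
rng = np.random.default_rng(0)
Z_rep = np.repeat(Z, 5, axis=0)
p1 = np.clip(toy_black_box(Z_rep)[:, 1] + rng.normal(0.0, 0.05, len(Z_rep)), 0.0, 1.0)
P_sto = np.column_stack([1.0 - p1, p1])
tree_sto = DecisionTreeRegressor(random_state=0).fit(Z_rep, P_sto)
loss_sto = np.mean((P_sto - tree_sto.predict(Z_rep)) ** 2)

print(loss_det, loss_sto)
```

Deeper trees cannot close the gap in the stochastic case: identical binary vectors with different targets can never be separated, so each leaf can only predict their mean.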

6.3. Pilot User Study

To assess the usefulness of our approach, we ran a pilot user study with eight participants, exposing them to LIME (Figure 1) and LIMEtree (Figure 2) explanations in a random order without revealing the methods' names. The study consisted of two sections, one per explainer, each displaying an image split into three segments, with each segment enclosing a unique object, e.g., a cat, a dog and a ball. The two most pertinent black-box predictions for each object were then explained with both methods – e.g., tabby and tiger cat for the cat object, golden retriever and Labrador retriever for the dog object, and tennis ball and croquet ball for the ball object – yielding six LIME explanations and a single multi-output tree spanning all six predictions. The participants were offered a brief tutorial illustrating how to parse the tree structure to obtain a variety of explanations.

The participants were then asked about the expected behaviour of the black box in relation to any two of the three displayed objects for each explainer – six questions in total as the relations are assumed to be non-reflexive. For example, “How does the presence of the cat object affect the model’s confidence of a presence of the dog object?”, with three possible answers: confidence decreases, confidence not affected and confidence increases. This question formulation was chosen to avoid a bias towards either explainer since we could ask neither about the importance or influence of each object on a particular prediction (LIME's domain), nor about the relation between an object and a prediction, e.g., a what-if question (LIMEtree's domain). Before viewing the explanations, the participants were asked to answer a similar set of questions using only their intuition, which allowed us to assess whether the explainees still relied on their intuition when explicitly asked to work with the explanations.

Our findings indicate a negligible overlap between the participants' intuition-based responses and those given with either explainer; they also show that LIMEtree helped the participants to answer 25% more of the questions correctly than LIME. All of the participants indicated that using LIME was either easy or very easy, yet rated the process of manually extracting LIMEtree explanations as either difficult or very difficult, despite many of the explainees having an AI background. This disparity, in conjunction with the subpar performance when using LIME, suggests that the explainees misinterpreted its explanations and were overconfident (Small et al., 2023; Xuan et al., 2023). Good performance with LIMEtree despite the difficulty of extracting its explanations, on the other hand, is promising given that this extraction process can be easily automated.

7. Conclusion and Future Work

In this paper we introduced the concept of multi-class explainability and proposed a surrogate explainer based on multi-output regression trees called LIMEtree. We then analysed its various properties and guarantees, and showed how it can achieve full fidelity. Next, we demonstrated how LIMEtree improves upon LIME and discussed the benefits of using trees as the surrogate model. We supported these claims with an assessment of its properties based on XAI desiderata as well as a collection of quantitative experiments and a pilot user study. In future work we will implement methods to algorithmically extract human-centred explanations from (surrogate) trees and evaluate them with large-scale user studies.

Acknowledgements.
This research was supported by the TAILOR project, funded by the EU Horizon 2020 research and innovation programme (grant agreement number 952215). We would also like to acknowledge the contributions of Alexander Hepburn and Raul Santos-Rodriguez, who helped with the development of the code used for the experiments and offered insightful feedback.

References

  • Achanta et al. (2012) Radhakrishna Achanta, Appu Shaji, Kevin Smith, Aurelien Lucchi, Pascal Fua, and Sabine Süsstrunk. 2012. SLIC superpixels compared to state-of-the-art superpixel methods. IEEE Transactions on Pattern Analysis and Machine Intelligence 34, 11 (2012), 2274–2282.
  • Aeberhard and Forina (1991) Stefan Aeberhard and M Forina. 1991. Wine. UCI Machine Learning Repository.
  • Blackard (1998) Jock Blackard. 1998. Forest Covertypes. UCI Machine Learning Repository.
  • Breiman (2001) Leo Breiman. 2001. Random forests. Machine Learning 45, 1 (2001), 5–32.
  • Breiman et al. (1984) Leo Breiman, Jerome Friedman, Charles J Stone, and Richard A Olshen. 1984. Classification and regression trees. CRC Press.
  • Byrne (2023) Ruth MJ Byrne. 2023. Good explanations in explainable artificial intelligence (XAI): Evidence from human explanatory reasoning. In IJCAI. 6536–6544.
  • Carlevaro et al. (2023) Alberto Carlevaro, Marta Lenatti, Alessia Paglialonga, and Maurizio Mongelli. 2023. Multi-class counterfactual explanations using support vector data description. IEEE Transactions on Artificial Intelligence (2023).
  • Chen (2021) Yaofo Chen. 2021. PyTorch CIFAR models. https://github.com/chenyaofo/pytorch-cifar-models.
  • Craven and Shavlik (1996) Mark Craven and Jude W Shavlik. 1996. Extracting tree-structured representations of trained networks. In Advances in Neural Information Processing Systems. 24–30.
  • Deng et al. (2009) Jia Deng, Wei Dong, Richard Socher, Li-Jia Li, Kai Li, and Li Fei-Fei. 2009. ImageNet: A large-scale hierarchical image database. In 2009 IEEE Conference on Computer Vision and Pattern Recognition. IEEE, 248–255.
  • Doshi-Velez and Kim (2017) Finale Doshi-Velez and Been Kim. 2017. Towards a rigorous science of interpretable machine learning. (2017). arXiv:1702.08608
  • Garreau and Luxburg (2020) Damien Garreau and Ulrike Luxburg. 2020. Explaining the explainer: A first theoretical analysis of LIME. In International Conference on Artificial Intelligence and Statistics. PMLR, 1287–1296.
  • Guidotti et al. (2018) Riccardo Guidotti, Anna Monreale, Salvatore Ruggieri, Franco Turini, Fosca Giannotti, and Dino Pedreschi. 2018. A survey of methods for explaining black box models. ACM Computing Surveys (CSUR) 51, 5 (2018), 1–42.
  • Hastie and Tibshirani (1986) Trevor Hastie and Robert Tibshirani. 1986. Generalized additive models. Statist. Sci. 1, 3 (1986), 297–310.
  • Karimi et al. (2021) Amir-Hossein Karimi, Bernhard Schölkopf, and Isabel Valera. 2021. Algorithmic recourse: From counterfactual explanations to interventions. In Proceedings of the 2021 ACM Conference on Fairness, Accountability, and Transparency. 353–362.
  • Keane et al. (2021) Mark T Keane, Eoin M Kenny, Eoin Delaney, and Barry Smyth. 2021. If only we had better counterfactual explanations: Five key deficits to rectify in the evaluation of counterfactual XAI techniques. In IJCAI. 4466–4474.
  • Krizhevsky and Hinton (2009) Alex Krizhevsky and Geoffrey Hinton. 2009. Learning multiple layers of features from tiny images.
  • Laugel et al. (2018) Thibault Laugel, Xavier Renard, Marie-Jeanne Lesot, Christophe Marsala, and Marcin Detyniecki. 2018. Defining locality for surrogates in post-hoc interpretablity. In 2018 ICML Workshop on Human Interpretability in Machine Learning (WHI 2018).
  • Lou et al. (2012) Yin Lou, Rich Caruana, and Johannes Gehrke. 2012. Intelligible models for classification and regression. In Proceedings of the 18th ACM SIGKDD International Conference on Knowledge Discovery and Data Mining. 150–158.
  • Miller (2018) Tim Miller. 2018. Explanation in artificial intelligence: Insights from the social sciences. Artificial Intelligence (2018).
  • Miller (2023) Tim Miller. 2023. Explainable AI is dead, long live explainable AI! Hypothesis-driven decision support using evaluative AI. In Proceedings of the 2023 ACM Conference on Fairness, Accountability, and Transparency. 333–342.
  • Mittelstadt et al. (2019) Brent Mittelstadt, Chris Russell, and Sandra Wachter. 2019. Explaining explanations in AI. In Proceedings of the 2019 ACM Conference on Fairness, Accountability, and Transparency. 279–288.
  • Pedregosa et al. (2011) Fabian Pedregosa, Gaël Varoquaux, Alexandre Gramfort, Vincent Michel, Bertrand Thirion, Olivier Grisel, Mathieu Blondel, Peter Prettenhofer, Ron Weiss, Vincent Dubourg, Jake Vanderplas, Alexandre Passos, David Cournapeau, Matthieu Brucher, Matthieu Perrot, and Édouard Duchesnay. 2011. scikit-learn: Machine learning in Python. Journal of Machine Learning Research 12 (2011), 2825–2830.
  • Poyiadzi et al. (2020) Rafael Poyiadzi, Kacper Sokol, Raul Santos-Rodriguez, Tijl De Bie, and Peter Flach. 2020. FACE: Feasible and actionable counterfactual explanations. In Proceedings of the AAAI/ACM Conference on AI, Ethics, and Society. 344–350.
  • Ribeiro et al. (2016) Marco Tulio Ribeiro, Sameer Singh, and Carlos Guestrin. 2016. “Why should I trust you?”: Explaining the predictions of any classifier. In Proceedings of the 22nd ACM SIGKDD International Conference on Knowledge Discovery and Data Mining. 1135–1144.
  • Romashov et al. (2022) Piotr Romashov, Martin Gjoreski, Kacper Sokol, Maria Vanina Martinez, and Marc Langheinrich. 2022. BayCon: Model-agnostic Bayesian counterfactual generator. In IJCAI. 740–746.
  • Rudin (2019) Cynthia Rudin. 2019. Stop explaining black box machine learning models for high stakes decisions and use interpretable models instead. Nature Machine Intelligence 1, 5 (2019), 206–215.
  • Shi et al. (2019) Sheng Shi, Xinfeng Zhang, Haisheng Li, and Wei Fan. 2019. Explaining the predictions of any image classifier via decision trees. (2019). arXiv:1911.01058
  • Small et al. (2023) Edward Small, Yueqing Xuan, Danula Hettiachchi, and Kacper Sokol. 2023. Helpful, misleading or confusing: How humans perceive fundamental building blocks of artificial intelligence explanations. In Proceedings of the ACM CHI 2023 Workshop on Human-Centered Explainable AI (HCXAI).
  • Sokol (2021) Kacper Sokol. 2021. Towards intelligible and robust surrogate explainers: A decision tree perspective. Ph. D. Dissertation. University of Bristol.
  • Sokol and Flach (2020a) Kacper Sokol and Peter Flach. 2020a. Explainability fact sheets: A framework for systematic assessment of explainable approaches. In Proceedings of the 2020 ACM Conference on Fairness, Accountability, and Transparency. 56–67.
  • Sokol and Flach (2020b) Kacper Sokol and Peter Flach. 2020b. One explanation does not fit all: The promise of interactive explanations for machine learning transparency. KI-Künstliche Intelligenz (2020), 1–16.
  • Sokol and Flach (2021) Kacper Sokol and Peter Flach. 2021. Explainability is in the mind of the beholder: Establishing the foundations of explainable artificial intelligence. (2021). arXiv:2112.14466
  • Sokol and Flach (2024) Kacper Sokol and Peter Flach. 2024. Interpretable representations in explainable AI: From theory to practice. Data Mining and Knowledge Discovery (2024), 1–39.
  • Sokol and Flach (2018) Kacper Sokol and Peter A Flach. 2018. Glass-Box: Explaining AI decisions with counterfactual statements through conversation with a voice-enabled virtual assistant. In IJCAI. 5868–5870.
  • Sokol et al. (2020) Kacper Sokol, Alexander Hepburn, Rafael Poyiadzi, Matthew Clifford, Raul Santos-Rodriguez, and Peter Flach. 2020. FAT Forensics: A Python toolbox for implementing and deploying fairness, accountability and transparency algorithms in predictive systems. Journal of Open Source Software 5, 49 (2020), 1904.
  • Sokol et al. (2019) Kacper Sokol, Alexander Hepburn, Raul Santos-Rodriguez, and Peter Flach. 2019. bLIMEy: Surrogate prediction explanations beyond LIME. In 2019 Workshop on Human-Centric Machine Learning (HCML 2019) at the 33rd Conference on Neural Information Processing Systems (NeurIPS 2019).
  • Sokol et al. (2022a) Kacper Sokol, Alexander Hepburn, Raul Santos-Rodriguez, and Peter Flach. 2022a. What and how of machine learning transparency: Building bespoke explainability tools with interoperable algorithmic components. Journal of Open Source Education 5, 58 (2022), 175.
  • Sokol et al. (2022b) Kacper Sokol, Raul Santos-Rodriguez, and Peter Flach. 2022b. FAT Forensics: A Python toolbox for algorithmic fairness, accountability and transparency. Software Impacts 14 (2022), 100406.
  • Sokol and Vogt (2023) Kacper Sokol and Julia E Vogt. 2023. (Un)reasonable allure of ante-hoc interpretability for high-stakes domains: Transparency is necessary but insufficient for comprehensibility. In 3rd Workshop on Interpretable Machine Learning in Healthcare (IMLH) at 2023 International Conference on Machine Learning (ICML).
  • Sokol and Vogt (2024) Kacper Sokol and Julia E Vogt. 2024. What does evaluation of explainable artificial intelligence actually tell us? A case for compositional and contextual validation of XAI building blocks. In Extended Abstracts of the CHI Conference on Human Factors in Computing Systems. 1–8.
  • Tolomei et al. (2017) Gabriele Tolomei, Fabrizio Silvestri, Andrew Haines, and Mounia Lalmas. 2017. Interpretable predictions of tree-based ensembles via actionable feature tweaking. In Proceedings of the 23rd ACM SIGKDD International Conference on Knowledge Discovery and Data Mining. 465–474.
  • Waa et al. (2018) Jasper van der Waa, Marcel Robeer, J van Diggelen, Matthieu Brinkhuis, and Mark Neerincx. 2018. Contrastive explanations with local foil trees. In 2018 ICML Workshop on Human Interpretability in Machine Learning (WHI 2018).
  • Wachter et al. (2017) Sandra Wachter, Brent Mittelstadt, and Chris Russell. 2017. Counterfactual explanations without opening the black box: Automated decisions and the GDPR. Harvard Journal of Law & Technology 31 (2017), 841.
  • Weld and Bansal (2019) Daniel S Weld and Gagan Bansal. 2019. The challenge of crafting intelligible intelligence. Commun. ACM 62, 6 (2019), 70–79.
  • Xuan et al. (2023) Yueqing Xuan, Edward Small, Kacper Sokol, Danula Hettiachchi, and Mark Sanderson. 2023. Can users correctly interpret machine learning explanations and simultaneously identify their limitations? (2023). arXiv:2309.08438
  • Zhang et al. (2017) Hongyi Zhang, Moustapha Cisse, Yann N Dauphin, and David Lopez-Paz. 2017. mixup: Beyond empirical risk minimization. In International Conference on Learning Representations.
  • Zhang et al. (2019b) Xuezhou Zhang, Sarah Tan, Paul Koch, Yin Lou, Urszula Chajewska, and Rich Caruana. 2019b. Axiomatic interpretability for multiclass additive models. In Proceedings of the 25th ACM SIGKDD International Conference on Knowledge Discovery and Data Mining. 226–234.
  • Zhang et al. (2019a) Yujia Zhang, Kuangyan Song, Yiming Sun, Sarah Tan, and Madeleine Udell. 2019a. “Why should you trust my explanation?” Understanding uncertainty in LIME explanations. In AI for Social Good Workshop at the 36th International Conference on Machine Learning (ICML 2019).