
Tell Me a Story! Narrative-Driven XAI with Large Language Models

Authors: David Martens, James Hinns, Camille Dams, Mark Vergouwen, Theodoros Evgeniou

Abstract: In many AI applications today, the predominance of black-box machine learning models, due to their typically higher accuracy, amplifies the need for Explainable AI (XAI). Existing XAI approaches, such as the widely used SHAP values or counterfactual (CF) explanations, are arguably often too technical for users to understand and act upon. To enhance comprehension of explanations of AI decisions and the overall user experience, we introduce XAIstories, which leverage Large Language Models to provide narratives about how AI predictions are made: SHAPstories do so based on SHAP explanations, while CFstories do so for CF explanations. We study the impact of our approach on users' experience and understanding of AI predictions. Our results are striking: over 90% of the surveyed general audience finds the narratives generated by SHAPstories convincing. Data scientists primarily see the value of SHAPstories in communicating explanations to a general audience, with 83% of data scientists indicating they are likely to use SHAPstories for this purpose. In an image classification setting, CFstories are considered more or equally convincing as the users' own crafted stories by more than 75% of the participants. CFstories additionally bring a tenfold speed gain in creating a narrative. We also find that SHAPstories help users to more accurately summarize and understand AI decisions, in a credit scoring setting we test, correctly answering comprehension questions significantly more often than they do when only SHAP values are provided. The results thereby suggest that XAIstories may significantly help explaining and understanding AI predictions, ultimately supporting better decision-making in various applications.

Submitted 12 June, 2024; v1 submitted 29 September, 2023; originally announced September 2023.

arXiv:2306.13885 [pdf, other]

Manipulation Risks in Explainable AI: The Implications of the Disagreement Problem

Authors: Sofie Goethals, David Martens, Theodoros Evgeniou

Abstract: Artificial Intelligence (AI) systems are increasingly used in high-stakes domains of our life, increasing the need to explain these decisions and to make sure that they are aligned with how we want the decision to be made. The field of Explainable AI (XAI) has emerged in response. However, it faces a significant challenge known as the disagreement problem, where multiple explanations are possible for the same AI decision or prediction. While the existence of the disagreement problem is acknowledged, the potential implications associated with this problem have not yet been widely studied. First, we provide an overview of the different strategies explanation providers could deploy to adapt the returned explanation to their benefit. We make a distinction between strategies that attack the machine learning model or underlying data to influence the explanations, and strategies that leverage the explanation phase directly. Next, we analyse several objectives and concrete scenarios the providers could have to engage in this behavior, and the potential dangerous consequences this manipulative behavior could have on society. We emphasize that it is crucial to investigate this issue now, before these methods are widely implemented, and propose some mitigation strategies.

Submitted 27 June, 2023; v1 submitted 24 June, 2023; originally announced June 2023.

arXiv:2109.01450 [pdf, other]

Epidemic Models for COVID-19 during the First Wave from February to May 2020: a Methodological Review

Authors: Marie Garin, Myrto Limnios, Alice Nicolaï, Ioannis Bargiotas, Olivier Boulant, Stephen Chick, Amir Dib, Theodoros Evgeniou, Mathilde Fekom, Argyris Kalogeratos, Christophe Labourdette, Anton Ovchinnikov, Raphaël Porcher, Camille Pouchol, Nicolas Vayatis

Abstract: We review epidemiological models for the propagation of the COVID-19 pandemic during the early months of the outbreak: from February to May 2020. The aim is to propose a methodological review that highlights the following characteristics: (i) the epidemic propagation models, (ii) the modeling of intervention strategies, (iii) the models and estimation procedures of the epidemic parameters and (iv) the characteristics of the data used. We finally selected 80 articles from open access databases based on criteria such as the theoretical background, the reproducibility, the incorporation of intervention strategies, etc. This mainly resulted in phenomenological, compartmental and individual-level models. A digital companion including an online sheet, a Kibana interface and a markdown document is proposed. Finally, this work provides an opportunity to witness how the scientific community reacted to this unique situation.

Submitted 3 September, 2021; originally announced September 2021.

arXiv:2107.02624 [pdf, other]

Understanding Consumer Preferences for Explanations Generated by XAI Algorithms

Authors: Yanou Ramon, Tom Vermeire, Olivier Toubia, David Martens, Theodoros Evgeniou

Abstract: Explaining firm decisions made by algorithms in customer-facing applications is increasingly required by regulators and expected by customers. While the emerging field of Explainable Artificial Intelligence (XAI) has mainly focused on developing algorithms that generate such explanations, there has not yet been sufficient consideration of customers' preferences for various types and formats of explanations. We discuss theoretically and study empirically people's preferences for explanations of algorithmic decisions. We focus on three main attributes that describe automatically-generated explanations from existing XAI algorithms (format, complexity, and specificity), and capture differences across contexts (online targeted advertising vs. loan applications) as well as heterogeneity in users' cognitive styles. Despite their popularity among academics, we find that counterfactual explanations are not popular among users, unless they follow a negative outcome (e.g., loan application was denied). We also find that users are willing to tolerate some complexity in explanations. Finally, our results suggest that preferences for specific (vs. more abstract) explanations are related to the level at which the decision is construed by the user, and to the deliberateness of the user's cognitive style.

Submitted 6 July, 2021; originally announced July 2021.

Comments: 18 pages, 1 appendix, 3 figures, 4 tables

arXiv:2003.04792 [pdf, other]

doi 10.1007/s10994-021-05981-0

Metafeatures-based Rule-Extraction for Classifiers on Behavioral and Textual Data

Authors: Yanou Ramon, David Martens, Theodoros Evgeniou, Stiene Praet

Abstract: Machine learning models on behavioral and textual data can result in highly accurate prediction models, but are often very difficult to interpret. Rule-extraction techniques have been proposed to combine the desired predictive accuracy of complex "black-box" models with global explainability. However, rule-extraction in the context of high-dimensional, sparse data, where many features are relevant to the predictions, can be challenging, as replacing the black-box model by many rules leaves the user again with an incomprehensible explanation. To address this problem, we develop and test a rule-extraction methodology based on higher-level, less-sparse metafeatures. A key finding of our analysis is that metafeatures-based explanations are better at mimicking the behavior of the black-box prediction model, as measured by the fidelity of explanations.

Submitted 4 March, 2021; v1 submitted 10 March, 2020; originally announced March 2020.

Comments: 31 pages, 13 figures

arXiv:1912.01819 [pdf, other]

doi 10.1007/s11634-020-00418-3

Counterfactual Explanation Algorithms for Behavioral and Textual Data

Authors: Yanou Ramon, David Martens, Foster Provost, Theodoros Evgeniou

Abstract: We study the interpretability of predictive systems that use high-dimensional behavioral and textual data. Examples include predicting product interest based on online browsing data and detecting spam emails or objectionable web content. Recently, counterfactual explanations have been proposed for generating insight into model predictions, which focus on what is relevant to a particular instance. Conducting a complete search to compute counterfactuals is very time-consuming because of the huge dimensionality. To our knowledge, for behavioral and text data, only one model-agnostic heuristic algorithm (SEDC) for finding counterfactual explanations has been proposed in the literature. However, there may be better algorithms for finding counterfactuals quickly. This study aligns the recently proposed Linear Interpretable Model-agnostic Explainer (LIME) and Shapley Additive Explanations (SHAP) with the notion of counterfactual explanations, and empirically benchmarks their effectiveness and efficiency against SEDC using a collection of 13 data sets. Results show that LIME-Counterfactual (LIME-C) and SHAP-Counterfactual (SHAP-C) have low and stable computation times, but mostly, they are less efficient than SEDC. However, for certain instances on certain data sets, SEDC's run time is comparably large. With regard to effectiveness, LIME-C and SHAP-C find reasonable, if not always optimal, counterfactual explanations. SHAP-C, however, seems to have difficulties with highly unbalanced data. Because of its good overall performance, LIME-C seems to be a favorable alternative to SEDC, which failed for some nonlinear models to find counterfactuals because of the particular heuristic search algorithm it uses. A main upshot of this paper is that there is a good deal of room for further research. For example, we propose algorithmic adjustments that are direct upshots of the paper's findings.

Submitted 4 December, 2019; originally announced December 2019.

Comments: 24 pages, 7 figures, currently under review

arXiv:1808.06452 [pdf]

doi 10.1016/j.neuroimage.2018.08.042

Reproducible evaluation of classification methods in Alzheimer's disease: framework and application to MRI and PET data

Authors: Jorge Samper-González, Ninon Burgos, Simona Bottani, Sabrina Fontanella, Pascal Lu, Arnaud Marcoux, Alexandre Routier, Jérémy Guillon, Michael Bacci, Junhao Wen, Anne Bertrand, Hugo Bertin, Marie-Odile Habert, Stanley Durrleman, Theodoros Evgeniou, Olivier Colliot

Abstract: A large number of papers have introduced novel machine learning and feature extraction methods for automatic classification of AD. However, they are difficult to reproduce because key components of the validation are often not readily available. These components include selected participants and input data, image preprocessing and cross-validation procedures. The performance of the different approaches is also difficult to compare objectively. In particular, it is often difficult to assess which part of the method provides a real improvement, if any. We propose a framework for reproducible and objective classification experiments in AD using three publicly available datasets (ADNI, AIBL and OASIS). The framework comprises: i) automatic conversion of the three datasets into BIDS format, ii) a modular set of preprocessing pipelines, feature extraction and classification methods, together with an evaluation framework, that provide a baseline for benchmarking the different components. We demonstrate the use of the framework for a large-scale evaluation on 1960 participants using T1 MRI and FDG PET data. In this evaluation, we assess the influence of different modalities, preprocessing, feature types, classifiers, training set sizes and datasets. Performances were in line with the state-of-the-art. FDG PET outperformed T1 MRI for all classification tasks. No difference in performance was found for the use of different atlases, image smoothing, partial volume correction of FDG PET images, or feature type. Linear SVM and L2-logistic regression resulted in similar performance and both outperformed random forests. The classification performance increased along with the number of subjects used for training. Classifiers trained on ADNI generalized well to AIBL and OASIS. All the code of the framework and the experiments is publicly available at: https://gitlab.icm-institute.org/aramislab/AD-ML.

Submitted 20 August, 2018; originally announced August 2018.

arXiv:1709.07267 [pdf, other]

doi 10.1007/978-3-319-67389-9_7

Yet Another ADNI Machine Learning Paper? Paving The Way Towards Fully-reproducible Research on Classification of Alzheimer's Disease

Authors: Jorge Samper-González, Ninon Burgos, Sabrina Fontanella, Hugo Bertin, Marie-Odile Habert, Stanley Durrleman, Theodoros Evgeniou, Olivier Colliot

Abstract: In recent years, the number of papers on Alzheimer's disease classification has increased dramatically, generating interesting methodological ideas on the use of machine learning and feature extraction methods. However, practical impact is much more limited and, eventually, one could not tell which of these approaches are the most efficient. While over 90% of these works make use of ADNI, an objective comparison between approaches is impossible due to variations in the subjects included, image pre-processing, performance metrics and cross-validation procedures. In this paper, we propose a framework for reproducible classification experiments using multimodal MRI and PET data from ADNI. The core components are: 1) code to automatically convert the full ADNI database into BIDS format; 2) a modular architecture based on Nipype in order to easily plug-in different classification and feature extraction tools; 3) feature extraction pipelines for MRI and PET data; 4) baseline classification approaches for unimodal and multimodal features. This provides a flexible framework for benchmarking different feature extraction and classification tools in a reproducible manner. We demonstrate its use on all (1519) baseline T1 MR images and all (1102) baseline FDG PET images from ADNI 1, GO and 2 with SPM-based feature extraction pipelines and three different classification techniques (linear SVM, anatomically regularized SVM and multiple kernel learning SVM). The highest accuracies achieved were: 91% for AD vs CN, 83% for MCIc vs CN, 75% for MCIc vs MCInc, 94% for AD-Aβ+ vs CN-Aβ- and 72% for MCIc-Aβ+ vs MCInc-Aβ+. The code is publicly available at https://gitlab.icm-institute.org/aramislab/AD-ML (depends on the Clinica software platform, publicly available at http://www.clinica.run).

Submitted 21 September, 2017; originally announced September 2017.

Journal ref: Proc. Machine Learning in Medical Imaging MLMI 2017, MICCAI Workshop, Lecture Notes in Computer Science, volume 10541, pp 53-60, Springer

arXiv:1203.5438 [pdf, ps, other]

A Regularization Approach for Prediction of Edges and Node Features in Dynamic Graphs

Authors: Emile Richard, Andreas Argyriou, Theodoros Evgeniou, Nicolas Vayatis

Abstract: We consider the two problems of predicting links in a dynamic graph sequence and predicting functions defined at each node of the graph. In many applications, the solution of one problem is useful for solving the other. Indeed, if these functions reflect node features, then they are related through the graph structure. In this paper, we formulate a hybrid approach that simultaneously learns the structure of the graph and predicts the values of the node-related functions. Our approach is based on the optimization of a joint regularization objective. We empirically test the benefits of the proposed method with both synthetic and real data. The results indicate that joint regularization improves prediction performance over the graph evolution and the node features.

Submitted 24 March, 2012; originally announced March 2012.

arXiv:0802.1430 [pdf, ps, other]

A New Approach to Collaborative Filtering: Operator Estimation with Spectral Regularization

Authors: Jacob Abernethy, Francis Bach, Theodoros Evgeniou, Jean-Philippe Vert

Abstract: We present a general approach for collaborative filtering (CF) using spectral regularization to learn linear operators from "users" to the "objects" they rate. Recent low-rank type matrix completion approaches to CF are shown to be special cases. However, unlike existing regularization based CF methods, our approach can be used to also incorporate information such as attributes of the users or the objects -- a limitation of existing regularization based CF methods. We then provide novel representer theorems that we use to develop new estimation methods. We provide learning algorithms based on low-rank decompositions, and test them on a standard CF dataset. The experiments indicate the advantages of generalizing the existing regularization based CF methods to incorporate related information about users and objects. Finally, we show that certain multi-task learning methods can be also seen as special cases of our proposed approach.

Submitted 19 December, 2008; v1 submitted 11 February, 2008; originally announced February 2008.

arXiv:cs/0611124 [pdf, ps, other]

Low-rank matrix factorization with attributes

Authors: Jacob Abernethy, Francis Bach, Theodoros Evgeniou, Jean-Philippe Vert

Abstract: We develop a new collaborative filtering (CF) method that combines both previously known users' preferences, i.e. standard CF, as well as product/user attributes, i.e. classical function approximation, to predict a given user's interest in a particular product. Our method is a generalized low rank matrix completion problem, where we learn a function whose inputs are pairs of vectors -- the standard low rank matrix completion problem being a special case where the inputs to the function are the row and column indices of the matrix. We solve this generalized matrix completion problem using tensor product kernels for which we also formally generalize standard kernel properties. Benchmark experiments on movie ratings show the advantages of our generalized matrix completion method over the standard matrix completion one with no information about movies or people, as well as over standard multi-task or single task learning methods.

Submitted 24 November, 2006; originally announced November 2006.

Comments: 12 pages, 2 figures

Report number: N-24/06/MM

Showing 1–11 of 11 results for author: Evgeniou, T