-
Efficient and Accurate Explanation Estimation with Distribution Compression
Authors:
Hubert Baniecki,
Giuseppe Casalicchio,
Bernd Bischl,
Przemyslaw Biecek
Abstract:
Exact computation of various machine learning explanations requires numerous model evaluations and in extreme cases becomes impractical. The computational cost of approximation increases with an ever-increasing size of data and model parameters. Many heuristics have been proposed to approximate post-hoc explanations efficiently. This paper shows that the standard i.i.d. sampling used in a broad sp…
▽ More
Exact computation of various machine learning explanations requires numerous model evaluations and in extreme cases becomes impractical. The computational cost of approximation increases with an ever-increasing size of data and model parameters. Many heuristics have been proposed to approximate post-hoc explanations efficiently. This paper shows that the standard i.i.d. sampling used in a broad spectrum of algorithms for explanation estimation leads to an approximation error worthy of improvement. To this end, we introduce Compress Then Explain (CTE), a new paradigm for more efficient and accurate explanation estimation. CTE uses distribution compression through kernel thinning to obtain a data sample that best approximates the marginal distribution. We show that CTE improves the estimation of removal-based local and global explanations with negligible computational overhead. It often achieves an on-par explanation approximation error using 2-3x less samples, i.e. requiring 2-3x less model evaluations. CTE is a simple, yet powerful, plug-in for any explanation method that now relies on i.i.d. sampling.
△ Less
Submitted 26 June, 2024;
originally announced June 2024.
-
On the Robustness of Global Feature Effect Explanations
Authors:
Hubert Baniecki,
Giuseppe Casalicchio,
Bernd Bischl,
Przemyslaw Biecek
Abstract:
We study the robustness of global post-hoc explanations for predictive models trained on tabular data. Effects of predictor features in black-box supervised learning are an essential diagnostic tool for model debugging and scientific discovery in applied sciences. However, how vulnerable they are to data and model perturbations remains an open research question. We introduce several theoretical bo…
▽ More
We study the robustness of global post-hoc explanations for predictive models trained on tabular data. Effects of predictor features in black-box supervised learning are an essential diagnostic tool for model debugging and scientific discovery in applied sciences. However, how vulnerable they are to data and model perturbations remains an open research question. We introduce several theoretical bounds for evaluating the robustness of partial dependence plots and accumulated local effects. Our experimental results with synthetic and real-world datasets quantify the gap between the best and worst-case scenarios of (mis)interpreting machine learning predictions globally.
△ Less
Submitted 13 June, 2024;
originally announced June 2024.
-
Does context matter in digital pathology?
Authors:
Paulina Tomaszewska,
Mateusz Sperkowski,
Przemysław Biecek
Abstract:
The development of Artificial Intelligence for healthcare is of great importance. Models can sometimes achieve even superior performance to human experts, however, they can reason based on spurious features. This is not acceptable to the experts as it is expected that the models catch the valid patterns in the data following domain expertise. In the work, we analyse whether Deep Learning (DL) mode…
▽ More
The development of Artificial Intelligence for healthcare is of great importance. Models can sometimes achieve even superior performance to human experts, however, they can reason based on spurious features. This is not acceptable to the experts as it is expected that the models catch the valid patterns in the data following domain expertise. In the work, we analyse whether Deep Learning (DL) models for vision follow the histopathologists' practice so that when diagnosing a part of a lesion, they take into account also the surrounding tissues which serve as context. It turns out that the performance of DL models significantly decreases when the amount of contextual information is limited, therefore contextual information is valuable at prediction time. Moreover, we show that the models sometimes behave in an unstable way as for some images, they change the predictions many times depending on the size of the context. It may suggest that partial contextual information can be misleading.
△ Less
Submitted 23 May, 2024;
originally announced May 2024.
-
An Experimental Study on the Rashomon Effect of Balancing Methods in Imbalanced Classification
Authors:
Mustafa Cavus,
Przemysław Biecek
Abstract:
Predictive models may generate biased predictions when classifying imbalanced datasets. This happens when the model favors the majority class, leading to low performance in accurately predicting the minority class. To address this issue, balancing or resampling methods are critical pre-processing steps in the modeling process. However, there have been debates and questioning of the functionality o…
▽ More
Predictive models may generate biased predictions when classifying imbalanced datasets. This happens when the model favors the majority class, leading to low performance in accurately predicting the minority class. To address this issue, balancing or resampling methods are critical pre-processing steps in the modeling process. However, there have been debates and questioning of the functionality of these methods in recent years. In particular, many candidate models may exhibit very similar predictive performance, which is called the Rashomon effect, in model selection. Selecting one of them without considering predictive multiplicity which is the case of yielding conflicting models' predictions for any sample may lead to a loss of using another model. In this study, in addition to the existing debates, the impact of balancing methods on predictive multiplicity is examined through the Rashomon effect. It is important because the blind model selection is risky from a set of approximately equally accurate models. This may lead to serious problems in model selection, validation, and explanation. To tackle this matter, we conducted real dataset experiments to observe the impact of balancing methods on predictive multiplicity through the Rashomon effect. Our findings showed that balancing methods inflate the predictive multiplicity, and they yield varying results. To monitor the trade-off between performance and predictive multiplicity for conducting the modeling process responsibly, we proposed using the extended performance-gain plot for the Rashomon effect.
△ Less
Submitted 24 June, 2024; v1 submitted 22 March, 2024;
originally announced May 2024.
-
Position: Do Not Explain Vision Models Without Context
Authors:
Paulina Tomaszewska,
Przemysław Biecek
Abstract:
Does the stethoscope in the picture make the adjacent person a doctor or a patient? This, of course, depends on the contextual relationship of the two objects. If it's obvious, why don't explanation methods for vision models use contextual information? In this paper, we (1) review the most popular methods of explaining computer vision models by pointing out that they do not take into account conte…
▽ More
Does the stethoscope in the picture make the adjacent person a doctor or a patient? This, of course, depends on the contextual relationship of the two objects. If it's obvious, why don't explanation methods for vision models use contextual information? In this paper, we (1) review the most popular methods of explaining computer vision models by pointing out that they do not take into account context information, (2) show examples of failures of popular XAI methods, (3) provide examples of real-world use cases where spatial context plays a significant role, (4) propose new research directions that may lead to better use of context information in explaining computer vision models, (5) argue that a change in approach to explanations is needed from 'where' to 'how'.
△ Less
Submitted 2 June, 2024; v1 submitted 28 April, 2024;
originally announced April 2024.
-
Resistance Against Manipulative AI: key factors and possible actions
Authors:
Piotr Wilczyński,
Wiktoria Mieleszczenko-Kowszewicz,
Przemysław Biecek
Abstract:
If AI is the new electricity, what should we do to keep ourselves from getting electrocuted? In this work, we explore factors related to the potential of large language models (LLMs) to manipulate human decisions. We describe the results of two experiments designed to determine what characteristics of humans are associated with their susceptibility to LLM manipulation, and what characteristics of…
▽ More
If AI is the new electricity, what should we do to keep ourselves from getting electrocuted? In this work, we explore factors related to the potential of large language models (LLMs) to manipulate human decisions. We describe the results of two experiments designed to determine what characteristics of humans are associated with their susceptibility to LLM manipulation, and what characteristics of LLMs are associated with their manipulativeness potential. We explore human factors by conducting user studies in which participants answer general knowledge questions using LLM-generated hints, whereas LLM factors by provoking language models to create manipulative statements. Then, we analyze their obedience, the persuasion strategies used, and the choice of vocabulary. Based on these experiments, we discuss two actions that can protect us from LLM manipulation. In the long term, we put AI literacy at the forefront, arguing that educating society would minimize the risk of manipulation and its consequences. We also propose an ad hoc solution, a classifier that detects manipulation of LLMs - a Manipulation Fuse.
△ Less
Submitted 22 April, 2024;
originally announced April 2024.
-
Global Counterfactual Directions
Authors:
Bartlomiej Sobieski,
Przemysław Biecek
Abstract:
Despite increasing progress in development of methods for generating visual counterfactual explanations, especially with the recent rise of Denoising Diffusion Probabilistic Models, previous works consider them as an entirely local technique. In this work, we take the first step at globalizing them. Specifically, we discover that the latent space of Diffusion Autoencoders encodes the inference pro…
▽ More
Despite increasing progress in development of methods for generating visual counterfactual explanations, especially with the recent rise of Denoising Diffusion Probabilistic Models, previous works consider them as an entirely local technique. In this work, we take the first step at globalizing them. Specifically, we discover that the latent space of Diffusion Autoencoders encodes the inference process of a given classifier in the form of global directions. We propose a novel proxy-based approach that discovers two types of these directions with the use of only single image in an entirely black-box manner. Precisely, g-directions allow for flip** the decision of a given classifier on an entire dataset of images, while h-directions further increase the diversity of explanations. We refer to them in general as Global Counterfactual Directions (GCDs). Moreover, we show that GCDs can be naturally combined with Latent Integrated Gradients resulting in a new black-box attribution method, while simultaneously enhancing the understanding of counterfactual explanations. We validate our approach on existing benchmarks and show that it generalizes to real-world use-cases.
△ Less
Submitted 18 April, 2024;
originally announced April 2024.
-
CNN-based explanation ensembling for dataset, representation and explanations evaluation
Authors:
Weronika Hryniewska-Guzik,
Luca Longo,
Przemysław Biecek
Abstract:
Explainable Artificial Intelligence has gained significant attention due to the widespread use of complex deep learning models in high-stake domains such as medicine, finance, and autonomous cars. However, different explanations often present different aspects of the model's behavior. In this research manuscript, we explore the potential of ensembling explanations generated by deep classification…
▽ More
Explainable Artificial Intelligence has gained significant attention due to the widespread use of complex deep learning models in high-stake domains such as medicine, finance, and autonomous cars. However, different explanations often present different aspects of the model's behavior. In this research manuscript, we explore the potential of ensembling explanations generated by deep classification models using convolutional model. Through experimentation and analysis, we aim to investigate the implications of combining explanations to uncover a more coherent and reliable patterns of the model's behavior, leading to the possibility of evaluating the representation learned by the model. With our method, we can uncover problems of under-representation of images in a certain class. Moreover, we discuss other side benefits like features' reduction by replacing the original image with its explanations resulting in the removal of some sensitive information. Through the use of carefully selected evaluation metrics from the Quantus library, we demonstrated the method's superior performance in terms of Localisation and Faithfulness, compared to individual explanations.
△ Less
Submitted 16 April, 2024;
originally announced April 2024.
-
A comparative analysis of deep learning models for lung segmentation on X-ray images
Authors:
Weronika Hryniewska-Guzik,
Jakub Bilski,
Bartosz Chrostowski,
Jakub Drak Sbahi,
Przemysław Biecek
Abstract:
Robust and highly accurate lung segmentation in X-rays is crucial in medical imaging. This study evaluates deep learning solutions for this task, ranking existing methods and analyzing their performance under diverse image modifications. Out of 61 analyzed papers, only nine offered implementation or pre-trained models, enabling assessment of three prominent methods: Lung VAE, TransResUNet, and CE-…
▽ More
Robust and highly accurate lung segmentation in X-rays is crucial in medical imaging. This study evaluates deep learning solutions for this task, ranking existing methods and analyzing their performance under diverse image modifications. Out of 61 analyzed papers, only nine offered implementation or pre-trained models, enabling assessment of three prominent methods: Lung VAE, TransResUNet, and CE-Net. The analysis revealed that CE-Net performs best, demonstrating the highest values in dice similarity coefficient and intersection over union metric.
△ Less
Submitted 9 April, 2024;
originally announced April 2024.
-
Red-Teaming Segment Anything Model
Authors:
Krzysztof Jankowski,
Bartlomiej Sobieski,
Mateusz Kwiatkowski,
Jakub Szulc,
Michal Janik,
Hubert Baniecki,
Przemyslaw Biecek
Abstract:
Foundation models have emerged as pivotal tools, tackling many complex tasks through pre-training on vast datasets and subsequent fine-tuning for specific applications. The Segment Anything Model is one of the first and most well-known foundation models for computer vision segmentation tasks. This work presents a multi-faceted red-teaming analysis that tests the Segment Anything Model against chal…
▽ More
Foundation models have emerged as pivotal tools, tackling many complex tasks through pre-training on vast datasets and subsequent fine-tuning for specific applications. The Segment Anything Model is one of the first and most well-known foundation models for computer vision segmentation tasks. This work presents a multi-faceted red-teaming analysis that tests the Segment Anything Model against challenging tasks: (1) We analyze the impact of style transfer on segmentation masks, demonstrating that applying adverse weather conditions and raindrops to dashboard images of city roads significantly distorts generated masks. (2) We focus on assessing whether the model can be used for attacks on privacy, such as recognizing celebrities' faces, and show that the model possesses some undesired knowledge in this task. (3) Finally, we check how robust the model is to adversarial attacks on segmentation masks under text prompts. We not only show the effectiveness of popular white-box attacks and resistance to black-box attacks but also introduce a novel approach - Focused Iterative Gradient Attack (FIGA) that combines white-box approaches to construct an efficient attack resulting in a smaller number of modified pixels. All of our testing methods and analyses indicate a need for enhanced safety measures in foundation models for image segmentation.
△ Less
Submitted 2 April, 2024;
originally announced April 2024.
-
Interpretable Machine Learning for Survival Analysis
Authors:
Sophie Hanna Langbein,
Mateusz Krzyziński,
Mikołaj Spytek,
Hubert Baniecki,
Przemysław Biecek,
Marvin N. Wright
Abstract:
With the spread and rapid advancement of black box machine learning models, the field of interpretable machine learning (IML) or explainable artificial intelligence (XAI) has become increasingly important over the last decade. This is particularly relevant for survival analysis, where the adoption of IML techniques promotes transparency, accountability and fairness in sensitive areas, such as clin…
▽ More
With the spread and rapid advancement of black box machine learning models, the field of interpretable machine learning (IML) or explainable artificial intelligence (XAI) has become increasingly important over the last decade. This is particularly relevant for survival analysis, where the adoption of IML techniques promotes transparency, accountability and fairness in sensitive areas, such as clinical decision making processes, the development of targeted therapies, interventions or in other medical or healthcare related contexts. More specifically, explainability can uncover a survival model's potential biases and limitations and provide more mathematically sound ways to understand how and which features are influential for prediction or constitute risk factors. However, the lack of readily available IML methods may have deterred medical practitioners and policy makers in public health from leveraging the full potential of machine learning for predicting time-to-event data. We present a comprehensive review of the limited existing amount of work on IML methods for survival analysis within the context of the general IML taxonomy. In addition, we formally detail how commonly used IML methods, such as such as individual conditional expectation (ICE), partial dependence plots (PDP), accumulated local effects (ALE), different feature importance measures or Friedman's H-interaction statistics can be adapted to survival outcomes. An application of several IML methods to real data on data on under-5 year mortality of Ghanaian children from the Demographic and Health Surveys (DHS) Program serves as a tutorial or guide for researchers, on how to utilize the techniques in practice to facilitate understanding of model decisions or predictions.
△ Less
Submitted 15 March, 2024;
originally announced March 2024.
-
Red Teaming Models for Hyperspectral Image Analysis Using Explainable AI
Authors:
Vladimir Zaigrajew,
Hubert Baniecki,
Lukasz Tulczyjew,
Agata M. Wijata,
Jakub Nalepa,
Nicolas Longépé,
Przemyslaw Biecek
Abstract:
Remote sensing (RS) applications in the space domain demand machine learning (ML) models that are reliable, robust, and quality-assured, making red teaming a vital approach for identifying and exposing potential flaws and biases. Since both fields advance independently, there is a notable gap in integrating red teaming strategies into RS. This paper introduces a methodology for examining ML models…
▽ More
Remote sensing (RS) applications in the space domain demand machine learning (ML) models that are reliable, robust, and quality-assured, making red teaming a vital approach for identifying and exposing potential flaws and biases. Since both fields advance independently, there is a notable gap in integrating red teaming strategies into RS. This paper introduces a methodology for examining ML models operating on hyperspectral images within the HYPERVIEW challenge, focusing on soil parameters' estimation. We use post-hoc explanation methods from the Explainable AI (XAI) domain to critically assess the best performing model that won the HYPERVIEW challenge and served as an inspiration for the model deployed on board the INTUITION-1 hyperspectral mission. Our approach effectively red teams the model by pinpointing and validating key shortcomings, constructing a model that achieves comparable performance using just 1% of the input features and a mere up to 5% performance loss. Additionally, we propose a novel way of visualizing explanations that integrate domain-specific information about hyperspectral bands (wavelengths) and data transformations to better suit interpreting models for hyperspectral image analysis.
△ Less
Submitted 14 March, 2024; v1 submitted 12 March, 2024;
originally announced March 2024.
-
Position: Explain to Question not to Justify
Authors:
Przemyslaw Biecek,
Wojciech Samek
Abstract:
Explainable Artificial Intelligence (XAI) is a young but very promising field of research. Unfortunately, the progress in this field is currently slowed down by divergent and incompatible goals. We separate various threads tangled within the area of XAI into two complementary cultures of human/value-oriented explanations (BLUE XAI) and model/validation-oriented explanations (RED XAI). This positio…
▽ More
Explainable Artificial Intelligence (XAI) is a young but very promising field of research. Unfortunately, the progress in this field is currently slowed down by divergent and incompatible goals. We separate various threads tangled within the area of XAI into two complementary cultures of human/value-oriented explanations (BLUE XAI) and model/validation-oriented explanations (RED XAI). This position paper argues that the area of RED XAI is currently under-explored, i.e., more methods for explainability are desperately needed to question models (e.g., extract knowledge from well-performing models as well as spotting and fixing bugs in faulty models), and the area of RED XAI hides great opportunities and potential for important research necessary to ensure the safety of AI systems. We conclude this paper by presenting promising challenges in this area.
△ Less
Submitted 28 June, 2024; v1 submitted 21 February, 2024;
originally announced February 2024.
-
Underestimation of lung regions on chest X-ray segmentation masks assessed by comparison with total lung volume evaluated on computed tomography
Authors:
Przemysław Bombiński,
Patryk Szatkowski,
Bartłomiej Sobieski,
Tymoteusz Kwieciński,
Szymon Płotka,
Mariusz Adamek,
Marcin Banasiuk,
Mariusz I. Furmanek,
Przemysław Biecek
Abstract:
Lung mask creation lacks well-defined criteria and standardized guidelines, leading to a high degree of subjectivity between annotators. In this study, we assess the underestimation of lung regions on chest X-ray segmentation masks created according to the current state-of-the-art method, by comparison with total lung volume evaluated on computed tomography (CT). We show, that lung X-ray masks cre…
▽ More
Lung mask creation lacks well-defined criteria and standardized guidelines, leading to a high degree of subjectivity between annotators. In this study, we assess the underestimation of lung regions on chest X-ray segmentation masks created according to the current state-of-the-art method, by comparison with total lung volume evaluated on computed tomography (CT). We show, that lung X-ray masks created by following the contours of the heart, mediastinum, and diaphragm significantly underestimate lung regions and exclude substantial portions of the lungs from further assessment, which may result in numerous clinical errors.
△ Less
Submitted 18 February, 2024;
originally announced February 2024.
-
NormEnsembleXAI: Unveiling the Strengths and Weaknesses of XAI Ensemble Techniques
Authors:
Weronika Hryniewska-Guzik,
Bartosz Sawicki,
Przemysław Biecek
Abstract:
This paper presents a comprehensive comparative analysis of explainable artificial intelligence (XAI) ensembling methods. Our research brings three significant contributions. Firstly, we introduce a novel ensembling method, NormEnsembleXAI, that leverages minimum, maximum, and average functions in conjunction with normalization techniques to enhance interpretability. Secondly, we offer insights in…
▽ More
This paper presents a comprehensive comparative analysis of explainable artificial intelligence (XAI) ensembling methods. Our research brings three significant contributions. Firstly, we introduce a novel ensembling method, NormEnsembleXAI, that leverages minimum, maximum, and average functions in conjunction with normalization techniques to enhance interpretability. Secondly, we offer insights into the strengths and weaknesses of XAI ensemble methods. Lastly, we provide a library, facilitating the practical implementation of XAI ensembling, thus promoting the adoption of transparent and interpretable deep learning models.
△ Less
Submitted 30 January, 2024;
originally announced January 2024.
-
Deep spatial context: when attention-based models meet spatial regression
Authors:
Paulina Tomaszewska,
Elżbieta Sienkiewicz,
Mai P. Hoang,
Przemysław Biecek
Abstract:
We propose 'Deep spatial context' (DSCon) method, which serves for investigation of the attention-based vision models using the concept of spatial context. It was inspired by histopathologists, however, the method can be applied to various domains. The DSCon allows for a quantitative measure of the spatial context's role using three Spatial Context Measures: $SCM_{features}$, $SCM_{targets}$,…
▽ More
We propose 'Deep spatial context' (DSCon) method, which serves for investigation of the attention-based vision models using the concept of spatial context. It was inspired by histopathologists, however, the method can be applied to various domains. The DSCon allows for a quantitative measure of the spatial context's role using three Spatial Context Measures: $SCM_{features}$, $SCM_{targets}$, $SCM_{residuals}$ to distinguish whether the spatial context is observable within the features of neighboring regions, their target values (attention scores) or residuals, respectively. It is achieved by integrating spatial regression into the pipeline. The DSCon helps to verify research questions. The experiments reveal that spatial relationships are much bigger in the case of the classification of tumor lesions than normal tissues. Moreover, it turns out that the larger the size of the neighborhood taken into account within spatial regression, the less valuable contextual information is. Furthermore, it is observed that the spatial context measure is the largest when considered within the feature space as opposed to the targets and residuals.
△ Less
Submitted 10 March, 2024; v1 submitted 18 January, 2024;
originally announced January 2024.
-
Big Tech influence over AI research revisited: memetic analysis of attribution of ideas to affiliation
Authors:
Stanisław Giziński,
Paulina Kaczyńska,
Hubert Ruczyński,
Emilia Wiśnios,
Bartosz Pieliński,
Przemysław Biecek,
Julian Sienkiewicz
Abstract:
There exists a growing discourse around the domination of Big Tech on the landscape of artificial intelligence (AI) research, yet our comprehension of this phenomenon remains cursory. This paper aims to broaden and deepen our understanding of Big Tech's reach and power within AI research. It highlights the dominance not merely in terms of sheer publication volume but rather in the propagation of n…
▽ More
There exists a growing discourse around the domination of Big Tech on the landscape of artificial intelligence (AI) research, yet our comprehension of this phenomenon remains cursory. This paper aims to broaden and deepen our understanding of Big Tech's reach and power within AI research. It highlights the dominance not merely in terms of sheer publication volume but rather in the propagation of new ideas or \textit{memes}. Current studies often oversimplify the concept of influence to the share of affiliations in academic papers, typically sourced from limited databases such as arXiv or specific academic conferences.
The main goal of this paper is to unravel the specific nuances of such influence, determining which AI ideas are predominantly driven by Big Tech entities. By employing network and memetic analysis on AI-oriented paper abstracts and their citation network, we are able to grasp a deeper insight into this phenomenon. By utilizing two databases: OpenAlex and S2ORC, we are able to perform such analysis on a much bigger scale than previous attempts.
Our findings suggest, that while Big Tech-affiliated papers are disproportionately more cited in some areas, the most cited papers are those affiliated with both Big Tech and Academia. Focusing on the most contagious memes, their attribution to specific affiliation groups (Big Tech, Academia, mixed affiliation) seems to be equally distributed between those three groups. This suggests that the notion of Big Tech domination over AI research is oversimplified in the discourse.
Ultimately, this more nuanced understanding of Big Tech's and Academia's influence could inform a more symbiotic alliance between these stakeholders which would better serve the dual goals of societal welfare and the scientific integrity of AI research.
△ Less
Submitted 20 December, 2023;
originally announced December 2023.
-
Be Careful When Evaluating Explanations Regarding Ground Truth
Authors:
Hubert Baniecki,
Maciej Chrabaszcz,
Andreas Holzinger,
Bastian Pfeifer,
Anna Saranti,
Przemyslaw Biecek
Abstract:
Evaluating explanations of image classifiers regarding ground truth, e.g. segmentation masks defined by human perception, primarily evaluates the quality of the models under consideration rather than the explanation methods themselves. Driven by this observation, we propose a framework for $\textit{jointly}$ evaluating the robustness of safety-critical systems that $\textit{combine}$ a deep neural…
▽ More
Evaluating explanations of image classifiers regarding ground truth, e.g. segmentation masks defined by human perception, primarily evaluates the quality of the models under consideration rather than the explanation methods themselves. Driven by this observation, we propose a framework for $\textit{jointly}$ evaluating the robustness of safety-critical systems that $\textit{combine}$ a deep neural network with an explanation method. These are increasingly used in real-world applications like medical image analysis or robotics. We introduce a fine-tuning procedure to (mis)align model$\unicode{x2013}$explanation pipelines with ground truth and use it to quantify the potential discrepancy between worst and best-case scenarios of human alignment. Experiments across various model architectures and post-hoc local interpretation methods provide insights into the robustness of vision transformers and the overall vulnerability of such AI systems to potential adversarial attacks.
△ Less
Submitted 8 November, 2023;
originally announced November 2023.
-
survex: an R package for explaining machine learning survival models
Authors:
Mikołaj Spytek,
Mateusz Krzyziński,
Sophie Hanna Langbein,
Hubert Baniecki,
Marvin N. Wright,
Przemysław Biecek
Abstract:
Due to their flexibility and superior performance, machine learning models frequently complement and outperform traditional statistical survival models. However, their widespread adoption is hindered by a lack of user-friendly tools to explain their internal operations and prediction rationales. To tackle this issue, we introduce the survex R package, which provides a cohesive framework for explai…
▽ More
Due to their flexibility and superior performance, machine learning models frequently complement and outperform traditional statistical survival models. However, their widespread adoption is hindered by a lack of user-friendly tools to explain their internal operations and prediction rationales. To tackle this issue, we introduce the survex R package, which provides a cohesive framework for explaining any survival model by applying explainable artificial intelligence techniques. The capabilities of the proposed software encompass understanding and diagnosing survival models, which can lead to their improvement. By revealing insights into the decision-making process, such as variable effects and importances, survex enables the assessment of model reliability and the detection of biases. Thus, transparency and responsibility may be promoted in sensitive areas, such as biomedical research and healthcare applications.
△ Less
Submitted 21 November, 2023; v1 submitted 30 August, 2023;
originally announced August 2023.
-
Glocal Explanations of Expected Goal Models in Soccer
Authors:
Mustafa Cavus,
Adrian Stando,
Przemyslaw Biecek
Abstract:
The expected goal models have gained popularity, but their interpretability is often limited, especially when trained using black-box methods. Explainable artificial intelligence tools have emerged to enhance model transparency and extract descriptive knowledge for a single observation or for all observations. However, explaining black-box models for a specific group of observations may be more us…
▽ More
The expected goal models have gained popularity, but their interpretability is often limited, especially when trained using black-box methods. Explainable artificial intelligence tools have emerged to enhance model transparency and extract descriptive knowledge for a single observation or for all observations. However, explaining black-box models for a specific group of observations may be more useful in some domains. This paper introduces the glocal explanations (between local and global levels) of the expected goal models to enable performance analysis at the team and player levels by proposing the use of aggregated versions of the SHAP values and partial dependence profiles. This allows knowledge to be extracted from the expected goal model for a player or team rather than just a single shot. In addition, we conducted real-data applications to illustrate the usefulness of aggregated SHAP and aggregated profiles. The paper concludes with remarks on the potential of these explanations for performance analysis in soccer analytics.
△ Less
Submitted 29 August, 2023;
originally announced August 2023.
-
Exploration of the Rashomon Set Assists Trustworthy Explanations for Medical Data
Authors:
Katarzyna Kobylińska,
Mateusz Krzyziński,
Rafał Machowicz,
Mariusz Adamek,
Przemysław Biecek
Abstract:
The machine learning modeling process conventionally culminates in selecting a single model that maximizes a selected performance metric. However, this approach leads to abandoning a more profound analysis of slightly inferior models. Particularly in medical and healthcare studies, where the objective extends beyond predictions to valuable insight generation, relying solely on a single model can r…
▽ More
The machine learning modeling process conventionally culminates in selecting a single model that maximizes a selected performance metric. However, this approach leads to abandoning a more profound analysis of slightly inferior models. Particularly in medical and healthcare studies, where the objective extends beyond predictions to valuable insight generation, relying solely on a single model can result in misleading or incomplete conclusions. This problem is particularly pertinent when dealing with a set of models known as $\textit{Rashomon set}$, with performance close to maximum one. Such a set can be numerous and may contain models describing the data in a different way, which calls for comprehensive analysis. This paper introduces a novel process to explore models in the Rashomon set, extending the conventional modeling approach. We propose the $\texttt{Rashomon_DETECT}$ algorithm to detect models with different behavior. It is based on recent developments in the eXplainable Artificial Intelligence (XAI) field. To quantify differences in variable effects among models, we introduce the Profile Disparity Index (PDI) based on measures from functional data analysis. To illustrate the effectiveness of our approach, we showcase its application in predicting survival among hemophagocytic lymphohistiocytosis (HLH) patients - a foundational case study. Additionally, we benchmark our approach on other medical data sets, demonstrating its versatility and utility in various contexts. If differently behaving models are detected in the Rashomon set, their combined analysis leads to more trustworthy conclusions, which is of vital importance for high-stakes applications such as medical applications.
△ Less
Submitted 18 September, 2023; v1 submitted 22 August, 2023;
originally announced August 2023.
-
Multi-task learning for classification, segmentation, reconstruction, and detection on chest CT scans
Authors:
Weronika Hryniewska-Guzik,
Maria Kędzierska,
Przemysław Biecek
Abstract:
Lung cancer and covid-19 have one of the highest morbidity and mortality rates in the world. For physicians, the identification of lesions is difficult in the early stages of the disease and time-consuming. Therefore, multi-task learning is an approach to extracting important features, such as lesions, from small amounts of medical data because it learns to generalize better. We propose a novel mu…
▽ More
Lung cancer and covid-19 have one of the highest morbidity and mortality rates in the world. For physicians, the identification of lesions is difficult in the early stages of the disease and time-consuming. Therefore, multi-task learning is an approach to extracting important features, such as lesions, from small amounts of medical data because it learns to generalize better. We propose a novel multi-task framework for classification, segmentation, reconstruction, and detection. To the best of our knowledge, we are the first ones who added detection to the multi-task solution. Additionally, we checked the possibility of using two different backbones and different loss functions in the segmentation task.
△ Less
Submitted 2 August, 2023;
originally announced August 2023.
-
Explaining and visualizing black-box models through counterfactual paths
Authors:
Bastian Pfeifer,
Mateusz Krzyzinski,
Hubert Baniecki,
Anna Saranti,
Andreas Holzinger,
Przemyslaw Biecek
Abstract:
Explainable AI (XAI) is an increasingly important area of machine learning research, which aims to make black-box models transparent and interpretable. In this paper, we propose a novel approach to XAI that uses the so-called counterfactual paths generated by conditional permutations of features. The algorithm measures feature importance by identifying sequential permutations of features that most…
▽ More
Explainable AI (XAI) is an increasingly important area of machine learning research, which aims to make black-box models transparent and interpretable. In this paper, we propose a novel approach to XAI that uses the so-called counterfactual paths generated by conditional permutations of features. The algorithm measures feature importance by identifying sequential permutations of features that most influence changes in model predictions. It is particularly suitable for generating explanations based on counterfactual paths in knowledge graphs incorporating domain knowledge. Counterfactual paths introduce an additional graph dimension to current XAI methods in both explaining and visualizing black-box models. Experiments with synthetic and medical data demonstrate the practical applicability of our approach.
△ Less
Submitted 1 August, 2023; v1 submitted 15 July, 2023;
originally announced July 2023.
-
The Effect of Balancing Methods on Model Behavior in Imbalanced Classification Problems
Authors:
Adrian Stando,
Mustafa Cavus,
Przemysław Biecek
Abstract:
Imbalanced data poses a significant challenge in classification as model performance is affected by insufficient learning from minority classes. Balancing methods are often used to address this problem. However, such techniques can lead to problems such as overfitting or loss of information. This study addresses a more challenging aspect of balancing methods - their impact on model behavior. To ca…
▽ More
Imbalanced data poses a significant challenge in classification as model performance is affected by insufficient learning from minority classes. Balancing methods are often used to address this problem. However, such techniques can lead to problems such as overfitting or loss of information. This study addresses a more challenging aspect of balancing methods - their impact on model behavior. To capture these changes, Explainable Artificial Intelligence tools are used to compare models trained on datasets before and after balancing. In addition to the variable importance method, this study uses the partial dependence profile and accumulated local effects techniques. Real and simulated datasets are tested, and an open-source Python package edgaro is developed to facilitate this analysis. The results obtained show significant changes in model behavior due to balancing methods, which can lead to biased models toward a balanced distribution. These findings confirm that balancing analysis should go beyond model performance comparisons to achieve higher reliability of machine learning models. Therefore, we propose a new method performance gain plot for informed data balancing strategy to make an optimal selection of balancing method by analyzing the measure of change in model behavior versus performance gain.
△ Less
Submitted 30 June, 2023;
originally announced July 2023.
-
SeFNet: Bridging Tabular Datasets with Semantic Feature Nets
Authors:
Katarzyna Woźnica,
Piotr Wilczyński,
Przemysław Biecek
Abstract:
Machine learning applications cover a wide range of predictive tasks in which tabular datasets play a significant role. However, although they often address similar problems, tabular datasets are typically treated as standalone tasks. The possibilities of using previously solved problems are limited due to the lack of structured contextual information about their features and the lack of understan…
▽ More
Machine learning applications cover a wide range of predictive tasks in which tabular datasets play a significant role. However, although they often address similar problems, tabular datasets are typically treated as standalone tasks. The possibilities of using previously solved problems are limited due to the lack of structured contextual information about their features and the lack of understanding of the relations between them. To overcome this limitation, we propose a new approach called Semantic Feature Net (SeFNet), capturing the semantic meaning of the analyzed tabular features. By leveraging existing ontologies and domain knowledge, SeFNet opens up new opportunities for sharing insights between diverse predictive tasks. One such opportunity is the Dataset Ontology-based Semantic Similarity (DOSS) measure, which quantifies the similarity between datasets using relations across their features. In this paper, we present an example of SeFNet prepared for a collection of predictive tasks in healthcare, with the features' relations derived from the SNOMED-CT ontology. The proposed SeFNet framework and the accompanying DOSS measure address the issue of limited contextual information in tabular datasets. By incorporating domain knowledge and establishing semantic relations between features, we enhance the potential for meta-learning and enable valuable insights to be shared across different predictive tasks.
△ Less
Submitted 20 June, 2023;
originally announced June 2023.
-
Adversarial attacks and defenses in explainable artificial intelligence: A survey
Authors:
Hubert Baniecki,
Przemyslaw Biecek
Abstract:
Explainable artificial intelligence (XAI) methods are portrayed as a remedy for debugging and trusting statistical and deep learning models, as well as interpreting their predictions. However, recent advances in adversarial machine learning (AdvML) highlight the limitations and vulnerabilities of state-of-the-art explanation methods, putting their security and trustworthiness into question. The po…
▽ More
Explainable artificial intelligence (XAI) methods are portrayed as a remedy for debugging and trusting statistical and deep learning models, as well as interpreting their predictions. However, recent advances in adversarial machine learning (AdvML) highlight the limitations and vulnerabilities of state-of-the-art explanation methods, putting their security and trustworthiness into question. The possibility of manipulating, fooling or fairwashing evidence of the model's reasoning has detrimental consequences when applied in high-stakes decision-making and knowledge discovery. This survey provides a comprehensive overview of research concerning adversarial attacks on explanations of machine learning models, as well as fairness metrics. We introduce a unified notation and taxonomy of methods facilitating a common ground for researchers and practitioners from the intersecting research fields of AdvML and XAI. We discuss how to defend against attacks and design robust interpretation methods. We contribute a list of existing insecurities in XAI and outline the emerging research directions in adversarial XAI (AdvXAI). Future work should address improving explanation methods and evaluation protocols to take into account the reported safety issues.
△ Less
Submitted 13 February, 2024; v1 submitted 6 June, 2023;
originally announced June 2023.
-
Prevention is better than cure: a case study of the abnormalities detection in the chest
Authors:
Weronika Hryniewska,
Piotr Czarnecki,
Jakub Wiśniewski,
Przemysław Bombiński,
Przemysław Biecek
Abstract:
Prevention is better than cure. This old truth applies not only to the prevention of diseases but also to the prevention of issues with AI models used in medicine. The source of malfunctioning of predictive models often lies not in the training process but reaches the data acquisition phase or design of the experiment phase.
In this paper, we analyze in detail a single use case - a Kaggle compet…
▽ More
Prevention is better than cure. This old truth applies not only to the prevention of diseases but also to the prevention of issues with AI models used in medicine. The source of malfunctioning of predictive models often lies not in the training process but reaches the data acquisition phase or design of the experiment phase.
In this paper, we analyze in detail a single use case - a Kaggle competition related to the detection of abnormalities in X-ray lung images. We demonstrate how a series of simple tests for data imbalance exposes faults in the data acquisition and annotation process. Complex models are able to learn such artifacts and it is difficult to remove this bias during or after the training. Errors made at the data collection stage make it difficult to validate the model correctly.
Based on this use case, we show how to monitor data and model balance (fairness) throughout the life cycle of a predictive model, from data acquisition to parity analysis of model scores.
△ Less
Submitted 18 May, 2023;
originally announced May 2023.
-
Towards Evaluating Explanations of Vision Transformers for Medical Imaging
Authors:
Piotr Komorowski,
Hubert Baniecki,
Przemysław Biecek
Abstract:
As deep learning models increasingly find applications in critical domains such as medical imaging, the need for transparent and trustworthy decision-making becomes paramount. Many explainability methods provide insights into how these models make predictions by attributing importance to input features. As Vision Transformer (ViT) becomes a promising alternative to convolutional neural networks fo…
▽ More
As deep learning models increasingly find applications in critical domains such as medical imaging, the need for transparent and trustworthy decision-making becomes paramount. Many explainability methods provide insights into how these models make predictions by attributing importance to input features. As Vision Transformer (ViT) becomes a promising alternative to convolutional neural networks for image classification, its interpretability remains an open research question. This paper investigates the performance of various interpretation methods on a ViT applied to classify chest X-ray images. We introduce the notion of evaluating faithfulness, sensitivity, and complexity of ViT explanations. The obtained results indicate that Layerwise relevance propagation for transformers outperforms Local interpretable model-agnostic explanations and Attention visualization, providing a more accurate and reliable representation of what a ViT has actually learned. Our findings provide insights into the applicability of ViT explanations in medical imaging and highlight the importance of using appropriate evaluation criteria for comparing them.
△ Less
Submitted 12 April, 2023;
originally announced April 2023.
-
Interpretable machine learning for time-to-event prediction in medicine and healthcare
Authors:
Hubert Baniecki,
Bartlomiej Sobieski,
Patryk Szatkowski,
Przemyslaw Bombinski,
Przemyslaw Biecek
Abstract:
Time-to-event prediction, e.g. cancer survival analysis or hospital length of stay, is a highly prominent machine learning task in medical and healthcare applications. However, only a few interpretable machine learning methods comply with its challenges. To facilitate a comprehensive explanatory analysis of survival models, we formally introduce time-dependent feature effects and global feature im…
▽ More
Time-to-event prediction, e.g. cancer survival analysis or hospital length of stay, is a highly prominent machine learning task in medical and healthcare applications. However, only a few interpretable machine learning methods comply with its challenges. To facilitate a comprehensive explanatory analysis of survival models, we formally introduce time-dependent feature effects and global feature importance explanations. We show how post-hoc interpretation methods allow for finding biases in AI systems predicting length of stay using a novel multi-modal dataset created from 1235 X-ray images with textual radiology reports annotated by human experts. Moreover, we evaluate cancer survival models beyond predictive performance to include the importance of multi-omics feature groups based on a large-scale benchmark comprising 11 datasets from The Cancer Genome Atlas (TCGA). Model developers can use the proposed methods to debug and improve machine learning algorithms, while physicians can discover disease biomarkers and assess their significance. We hope the contributed open data and code resources facilitate future work in the emerging research direction of explainable survival analysis.
△ Less
Submitted 27 March, 2024; v1 submitted 17 March, 2023;
originally announced March 2023.
-
Challenges facing the explainability of age prediction models: case study for two modalities
Authors:
Mikolaj Spytek,
Weronika Hryniewska-Guzik,
Jaroslaw Zygierewicz,
Jacek Rogala,
Przemyslaw Biecek
Abstract:
The prediction of age is a challenging task with various practical applications in high-impact fields like the healthcare domain or criminology.
Despite the growing number of models and their increasing performance, we still know little about how these models work. Numerous examples of failures of AI systems show that performance alone is insufficient, thus, new methods are needed to explore and…
▽ More
The prediction of age is a challenging task with various practical applications in high-impact fields like the healthcare domain or criminology.
Despite the growing number of models and their increasing performance, we still know little about how these models work. Numerous examples of failures of AI systems show that performance alone is insufficient, thus, new methods are needed to explore and explain the reasons for the model's predictions.
In this paper, we investigate the use of Explainable Artificial Intelligence (XAI) for age prediction focusing on two specific modalities, EEG signal and lung X-rays. We share predictive models for age to facilitate further research on new techniques to explain models for these modalities.
△ Less
Submitted 12 March, 2023;
originally announced March 2023.
-
Performance is not enough: the story told by a Rashomon quartet
Authors:
Przemyslaw Biecek,
Hubert Baniecki,
Mateusz Krzyzinski,
Dianne Cook
Abstract:
The usual goal of supervised learning is to find the best model, the one that optimizes a particular performance measure. However, what if the explanation provided by this model is completely different from another model and different again from another model despite all having similarly good fit statistics? Is it possible that the equally effective models put the spotlight on different relationsh…
▽ More
The usual goal of supervised learning is to find the best model, the one that optimizes a particular performance measure. However, what if the explanation provided by this model is completely different from another model and different again from another model despite all having similarly good fit statistics? Is it possible that the equally effective models put the spotlight on different relationships in the data? Inspired by Anscombe's quartet, this paper introduces a Rashomon Quartet, i.e. a set of four models built on a synthetic dataset which have practically identical predictive performance. However, the visual exploration reveals distinct explanations of the relations in the data. This illustrative example aims to encourage the use of methods for model visualization to compare predictive models beyond their performance.
△ Less
Submitted 11 April, 2024; v1 submitted 26 February, 2023;
originally announced February 2023.
-
HADES: Homologous Automated Document Exploration and Summarization
Authors:
Piotr Wilczyński,
Artur Żółkowski,
Mateusz Krzyziński,
Emilia Wiśnios,
Bartosz Pieliński,
Stanisław Giziński,
Julian Sienkiewicz,
Przemysław Biecek
Abstract:
This paper introduces HADES, a novel tool for automatic comparative documents with similar structures. HADES is designed to streamline the work of professionals dealing with large volumes of documents, such as policy documents, legal acts, and scientific papers. The tool employs a multi-step pipeline that begins with processing PDF documents using topic modeling, summarization, and analysis of the…
▽ More
This paper introduces HADES, a novel tool for automatic comparative documents with similar structures. HADES is designed to streamline the work of professionals dealing with large volumes of documents, such as policy documents, legal acts, and scientific papers. The tool employs a multi-step pipeline that begins with processing PDF documents using topic modeling, summarization, and analysis of the most important words for each topic. The process concludes with an interactive web app with visualizations that facilitate the comparison of the documents. HADES has the potential to significantly improve the productivity of professionals dealing with high volumes of documents, reducing the time and effort required to complete tasks related to comparative document analysis. Our package is publically available on GitHub.
△ Less
Submitted 25 February, 2023;
originally announced February 2023.
-
Climate Policy Tracker: Pipeline for automated analysis of public climate policies
Authors:
Artur Żółkowski,
Mateusz Krzyziński,
Piotr Wilczyński,
Stanisław Giziński,
Emilia Wiśnios,
Bartosz Pieliński,
Julian Sienkiewicz,
Przemysław Biecek
Abstract:
The number of standardized policy documents regarding climate policy and their publication frequency is significantly increasing. The documents are long and tedious for manual analysis, especially for policy experts, lawmakers, and citizens who lack access or domain expertise to utilize data analytics tools. Potential consequences of such a situation include reduced citizen governance and involvem…
▽ More
The number of standardized policy documents regarding climate policy and their publication frequency is significantly increasing. The documents are long and tedious for manual analysis, especially for policy experts, lawmakers, and citizens who lack access or domain expertise to utilize data analytics tools. Potential consequences of such a situation include reduced citizen governance and involvement in climate policies and an overall surge in analytics costs, rendering less accessibility for the public. In this work, we use a Latent Dirichlet Allocation-based pipeline for the automatic summarization and analysis of 10-years of national energy and climate plans (NECPs) for the period from 2021 to 2030, established by 27 Member States of the European Union. We focus on analyzing policy framing, the language used to describe specific issues, to detect essential nuances in the way governments frame their climate policies and achieve climate goals. The methods leverage topic modeling and clustering for the comparative analysis of policy documents across different countries. It allows for easier integration in potential user-friendly applications for the development of theories and processes of climate policy. This would further lead to better citizen governance and engagement over climate policies and public policy research.
△ Less
Submitted 10 November, 2022;
originally announced November 2022.
-
SurvSHAP(t): Time-dependent explanations of machine learning survival models
Authors:
Mateusz Krzyziński,
Mikołaj Spytek,
Hubert Baniecki,
Przemysław Biecek
Abstract:
Machine and deep learning survival models demonstrate similar or even improved time-to-event prediction capabilities compared to classical statistical learning methods yet are too complex to be interpreted by humans. Several model-agnostic explanations are available to overcome this issue; however, none directly explain the survival function prediction. In this paper, we introduce SurvSHAP(t), the…
▽ More
Machine and deep learning survival models demonstrate similar or even improved time-to-event prediction capabilities compared to classical statistical learning methods yet are too complex to be interpreted by humans. Several model-agnostic explanations are available to overcome this issue; however, none directly explain the survival function prediction. In this paper, we introduce SurvSHAP(t), the first time-dependent explanation that allows for interpreting survival black-box models. It is based on SHapley Additive exPlanations with solid theoretical foundations and a broad adoption among machine learning practitioners. The proposed methods aim to enhance precision diagnostics and support domain experts in making decisions. Experiments on synthetic and medical data confirm that SurvSHAP(t) can detect variables with a time-dependent effect, and its aggregation is a better determinant of the importance of variables for a prediction than SurvLIME. SurvSHAP(t) is model-agnostic and can be applied to all models with functional output. We provide an accessible implementation of time-dependent explanations in Python at http://github.com/MI2DataLab/survshap.
△ Less
Submitted 7 September, 2022; v1 submitted 23 August, 2022;
originally announced August 2022.
-
Performance, Opaqueness, Consequences, and Assumptions: Simple questions for responsible planning of machine learning solutions
Authors:
Przemyslaw Biecek
Abstract:
The data revolution has generated a huge demand for data-driven solutions. This demand propels a growing number of easy-to-use tools and training for aspiring data scientists that enable the rapid building of predictive models. Today, weapons of math destruction can be easily built and deployed without detailed planning and validation. This rapidly extends the list of AI failures, i.e. deployments…
▽ More
The data revolution has generated a huge demand for data-driven solutions. This demand propels a growing number of easy-to-use tools and training for aspiring data scientists that enable the rapid building of predictive models. Today, weapons of math destruction can be easily built and deployed without detailed planning and validation. This rapidly extends the list of AI failures, i.e. deployments that lead to financial losses or even violate democratic values such as equality, freedom and justice. The lack of planning, rules and standards around the model development leads to the ,,anarchisation of AI". This problem is reported under different names such as validation debt, reproducibility crisis, and lack of explainability. Post-mortem analysis of AI failures often reveals mistakes made in the early phase of model development or data acquisition. Thus, instead of curing the consequences of deploying harmful models, we shall prevent them as early as possible by putting more attention to the initial planning stage.
In this paper, we propose a quick and simple framework to support planning of AI solutions. The POCA framework is based on four pillars: Performance, Opaqueness, Consequences, and Assumptions. It helps to set the expectations and plan the constraints for the AI solution before any model is built and any data is collected. With the help of the POCA method, preliminary requirements can be defined for the model-building process, so that costly model misspecification errors can be identified as soon as possible or even avoided. AI researchers, product owners and business analysts can use this framework in the initial stages of building AI solutions.
△ Less
Submitted 21 August, 2022;
originally announced August 2022.
-
Explainable expected goal models for performance analysis in football analytics
Authors:
Mustafa Cavus,
Przemysław Biecek
Abstract:
The expected goal provides a more representative measure of the team and player performance which also suit the low-scoring nature of football instead of score in modern football. The score of a match involves randomness and often may not represent the performance of the teams and players, therefore it has been popular to use the alternative statistics in recent years such as shots on target, ball…
▽ More
The expected goal provides a more representative measure of the team and player performance which also suit the low-scoring nature of football instead of score in modern football. The score of a match involves randomness and often may not represent the performance of the teams and players, therefore it has been popular to use the alternative statistics in recent years such as shots on target, ball possessions, and drills. To measure the probability of a shot being a goal by the expected goal, several features are used to train an expected goal model which is based on the event and tracking football data. The selection of these features, the size and date of the data, and the model which are used as the parameters that may affect the performance of the model. Using black-box machine learning models for increasing the predictive performance of the model decreases its interpretability that causes the loss of information that can be gathered from the model. This paper proposes an accurate expected goal model trained consisting of 315,430 shots from seven seasons between 2014-15 and 2020-21 of the top-five European football leagues. Moreover, this model is explained by using explainable artificial intelligence tool to obtain an explainable expected goal model for evaluating a team or player performance. To the best of our knowledge, this is the first paper that demonstrates a practical application of an explainable artificial intelligence tool aggregated profiles to explain a group of observations on an accurate expected goal model for monitoring the team and player performance. Moreover, these methods can be generalized to other sports branches.
△ Less
Submitted 6 September, 2022; v1 submitted 14 June, 2022;
originally announced June 2022.
-
Exploring Local Explanations of Nonlinear Models Using Animated Linear Projections
Authors:
Nicholas Spyrison,
Dianne Cook,
Przemyslaw Biecek
Abstract:
The increased predictive power of machine learning models comes at the cost of increased complexity and loss of interpretability, particularly in comparison to parametric statistical models. This trade-off has led to the emergence of eXplainable AI (XAI) which provides methods, such as local explanations (LEs) and local variable attributions (LVAs), to shed light on how a model use predictors to a…
▽ More
The increased predictive power of machine learning models comes at the cost of increased complexity and loss of interpretability, particularly in comparison to parametric statistical models. This trade-off has led to the emergence of eXplainable AI (XAI) which provides methods, such as local explanations (LEs) and local variable attributions (LVAs), to shed light on how a model use predictors to arrive at a prediction. These provide a point estimate of the linear variable importance in the vicinity of a single observation. However, LVAs tend not to effectively handle association between predictors. To understand how the interaction between predictors affects the variable importance estimate, we can convert LVAs into linear projections and use the radial tour. This is also useful for learning how a model has made a mistake, or the effect of outliers, or the clustering of observations. The approach is illustrated with examples from categorical (penguin species, chocolate types) and quantitative (soccer/football salaries, house prices) response models. The methods are implemented in the R package cheem, available on CRAN.
△ Less
Submitted 18 January, 2024; v1 submitted 11 May, 2022;
originally announced May 2022.
-
Consolidated learning -- a domain-specific model-free optimization strategy with examples for XGBoost and MIMIC-IV
Authors:
Katarzyna Woźnica,
Mateusz Grzyb,
Zuzanna Trafas,
Przemysław Biecek
Abstract:
For many machine learning models, a choice of hyperparameters is a crucial step towards achieving high performance. Prevalent meta-learning approaches focus on obtaining good hyperparameters configurations with a limited computational budget for a completely new task based on the results obtained from the prior tasks. This paper proposes a new formulation of the tuning problem, called consolidated…
▽ More
For many machine learning models, a choice of hyperparameters is a crucial step towards achieving high performance. Prevalent meta-learning approaches focus on obtaining good hyperparameters configurations with a limited computational budget for a completely new task based on the results obtained from the prior tasks. This paper proposes a new formulation of the tuning problem, called consolidated learning, more suited to practical challenges faced by model developers, in which a large number of predictive models are created on similar data sets. In such settings, we are interested in the total optimization time rather than tuning for a single task. We show that a carefully selected static portfolio of hyperparameters yields good results for anytime optimization, maintaining ease of use and implementation. Moreover, we point out how to construct such a portfolio for specific domains. The improvement in the optimization is possible due to more efficient transfer of hyperparameter configurations between similar tasks. We demonstrate the effectiveness of this approach through an empirical study for XGBoost algorithm and the collection of predictive tasks extracted from the MIMIC-IV medical database; however, consolidated learning is applicable in many others fields.
△ Less
Submitted 27 January, 2022;
originally announced January 2022.
-
LIMEcraft: Handcrafted superpixel selection and inspection for Visual eXplanations
Authors:
Weronika Hryniewska,
Adrianna Grudzień,
Przemysław Biecek
Abstract:
The increased interest in deep learning applications, and their hard-to-detect biases result in the need to validate and explain complex models. However, current explanation methods are limited as far as both the explanation of the reasoning process and prediction results are concerned. They usually only show the location in the image that was important for model prediction. The lack of possibilit…
▽ More
The increased interest in deep learning applications, and their hard-to-detect biases result in the need to validate and explain complex models. However, current explanation methods are limited as far as both the explanation of the reasoning process and prediction results are concerned. They usually only show the location in the image that was important for model prediction. The lack of possibility to interact with explanations makes it difficult to verify and understand exactly how the model works. This creates a significant risk when using the model. The risk is compounded by the fact that explanations do not take into account the semantic meaning of the explained objects. To escape from the trap of static and meaningless explanations, we propose a tool and a process called LIMEcraft. LIMEcraft enhances the process of explanation by allowing a user to interactively select semantically consistent areas and thoroughly examine the prediction for the image instance in case of many image features. Experiments on several models show that our tool improves model safety by inspecting model fairness for image pieces that may indicate model bias. The code is available at: http://github.com/MI2DataLab/LIMEcraft
△ Less
Submitted 30 April, 2022; v1 submitted 15 November, 2021;
originally announced November 2021.
-
Graph-guided random forest for gene set selection
Authors:
Bastian Pfeifer,
Hubert Baniecki,
Anna Saranti,
Przemyslaw Biecek,
Andreas Holzinger
Abstract:
Machine learning methods can detect complex relationships between variables, but usually do not exploit domain knowledge. This is a limitation because in many scientific disciplines, such as systems biology, domain knowledge is available in the form of graphs or networks, and its use can improve model performance. We need network-based algorithms that are versatile and applicable in many research…
▽ More
Machine learning methods can detect complex relationships between variables, but usually do not exploit domain knowledge. This is a limitation because in many scientific disciplines, such as systems biology, domain knowledge is available in the form of graphs or networks, and its use can improve model performance. We need network-based algorithms that are versatile and applicable in many research areas. In this work, we demonstrate subnetwork detection based on multi-modal node features using a novel Greedy Decision Forest with inherent interpretability. The latter will be a crucial factor to retain experts and gain their trust in such algorithms. To demonstrate a concrete application example, we focus on bioinformatics, systems biology and particularly biomedicine, but the presented methodology is applicable in many other domains as well. Systems biology is a good example of a field in which statistical data-driven machine learning enables the analysis of large amounts of multi-modal biomedical data. This is important to reach the future goal of precision medicine, where the complexity of patients is modeled on a system level to best tailor medical decisions, health practices and therapies to the individual patient. Our proposed approach can help to uncover disease-causing network modules from multi-omics data to better understand complex diseases such as cancer.
△ Less
Submitted 6 September, 2022; v1 submitted 26 August, 2021;
originally announced August 2021.
-
MAIR: Framework for mining relationships between research articles, strategies, and regulations in the field of explainable artificial intelligence
Authors:
Stanisław Gizinski,
Michał Kuzba,
Bartosz Pielinski,
Julian Sienkiewicz,
Stanisław Łaniewski,
Przemysław Biecek
Abstract:
The growing number of AI applications, also for high-stake decisions, increases the interest in Explainable and Interpretable Machine Learning (XI-ML). This trend can be seen both in the increasing number of regulations and strategies for develo** trustworthy AI and the growing number of scientific papers dedicated to this topic. To ensure the sustainable development of AI, it is essential to un…
▽ More
The growing number of AI applications, also for high-stake decisions, increases the interest in Explainable and Interpretable Machine Learning (XI-ML). This trend can be seen both in the increasing number of regulations and strategies for develo** trustworthy AI and the growing number of scientific papers dedicated to this topic. To ensure the sustainable development of AI, it is essential to understand the dynamics of the impact of regulation on research papers as well as the impact of scientific discourse on AI-related policies. This paper introduces a novel framework for joint analysis of AI-related policy documents and eXplainable Artificial Intelligence (XAI) research papers. The collected documents are enriched with metadata and interconnections, using various NLP methods combined with a methodology inspired by Institutional Grammar. Based on the information extracted from collected documents, we showcase a series of analyses that help understand interactions, similarities, and differences between documents at different stages of institutionalization. To the best of our knowledge, this is the first work to use automatic language analysis tools to understand the dynamics between XI-ML methods and regulations. We believe that such a system contributes to better cooperation between XAI researchers and AI policymakers.
△ Less
Submitted 29 July, 2021;
originally announced August 2021.
-
Do not explain without context: addressing the blind spot of model explanations
Authors:
Katarzyna Woźnica,
Katarzyna Pękala,
Hubert Baniecki,
Wojciech Kretowicz,
Elżbieta Sienkiewicz,
Przemysław Biecek
Abstract:
The increasing number of regulations and expectations of predictive machine learning models, such as so called right to explanation, has led to a large number of methods promising greater interpretability. High demand has led to a widespread adoption of XAI techniques like Shapley values, Partial Dependence profiles or permutational variable importance. However, we still do not know enough about t…
▽ More
The increasing number of regulations and expectations of predictive machine learning models, such as so called right to explanation, has led to a large number of methods promising greater interpretability. High demand has led to a widespread adoption of XAI techniques like Shapley values, Partial Dependence profiles or permutational variable importance. However, we still do not know enough about their properties and how they manifest in the context in which explanations are created by analysts, reviewed by auditors, and interpreted by various stakeholders. This paper highlights a blind spot which, although critical, is often overlooked when monitoring and auditing machine learning models: the effect of the reference data on the explanation calculation. We discuss that many model explanations depend directly or indirectly on the choice of the referenced data distribution. We showcase examples where small changes in the distribution lead to drastic changes in the explanations, such as a change in trend or, alarmingly, a conclusion. Consequently, we postulate that obtaining robust and useful explanations always requires supporting them with a broader context.
△ Less
Submitted 28 May, 2021;
originally announced May 2021.
-
Fooling Partial Dependence via Data Poisoning
Authors:
Hubert Baniecki,
Wojciech Kretowicz,
Przemyslaw Biecek
Abstract:
Many methods have been developed to understand complex predictive models and high expectations are placed on post-hoc model explainability. It turns out that such explanations are not robust nor trustworthy, and they can be fooled. This paper presents techniques for attacking Partial Dependence (plots, profiles, PDP), which are among the most popular methods of explaining any predictive model trai…
▽ More
Many methods have been developed to understand complex predictive models and high expectations are placed on post-hoc model explainability. It turns out that such explanations are not robust nor trustworthy, and they can be fooled. This paper presents techniques for attacking Partial Dependence (plots, profiles, PDP), which are among the most popular methods of explaining any predictive model trained on tabular data. We showcase that PD can be manipulated in an adversarial manner, which is alarming, especially in financial or medical applications where auditability became a must-have trait supporting black-box machine learning. The fooling is performed via poisoning the data to bend and shift explanations in the desired direction using genetic and gradient algorithms. We believe this to be the first work using a genetic algorithm for manipulating explanations, which is transferable as it generalizes both ways: in a model-agnostic and an explanation-agnostic manner.
△ Less
Submitted 10 July, 2022; v1 submitted 26 May, 2021;
originally announced May 2021.
-
Kleister: Key Information Extraction Datasets Involving Long Documents with Complex Layouts
Authors:
Tomasz Stanisławek,
Filip Graliński,
Anna Wróblewska,
Dawid Lipiński,
Agnieszka Kaliska,
Paulina Rosalska,
Bartosz Topolski,
Przemysław Biecek
Abstract:
The relevance of the Key Information Extraction (KIE) task is increasingly important in natural language processing problems. But there are still only a few well-defined problems that serve as benchmarks for solutions in this area. To bridge this gap, we introduce two new datasets (Kleister NDA and Kleister Charity). They involve a mix of scanned and born-digital long formal English-language docum…
▽ More
The relevance of the Key Information Extraction (KIE) task is increasingly important in natural language processing problems. But there are still only a few well-defined problems that serve as benchmarks for solutions in this area. To bridge this gap, we introduce two new datasets (Kleister NDA and Kleister Charity). They involve a mix of scanned and born-digital long formal English-language documents. In these datasets, an NLP system is expected to find or infer various types of entities by employing both textual and structural layout features. The Kleister Charity dataset consists of 2,788 annual financial reports of charity organizations, with 61,643 unique pages and 21,612 entities to extract. The Kleister NDA dataset has 540 Non-disclosure Agreements, with 3,229 unique pages and 2,160 entities to extract. We provide several state-of-the-art baseline systems from the KIE domain (Flair, BERT, RoBERTa, LayoutLM, LAMBERT), which show that our datasets pose a strong challenge to existing models. The best model achieved an 81.77% and an 83.57% F1-score on respectively the Kleister NDA and the Kleister Charity datasets. We share the datasets to encourage progress on more in-depth and complex information extraction tasks.
△ Less
Submitted 12 May, 2021;
originally announced May 2021.
-
Enabling Machine Learning Algorithms for Credit Scoring -- Explainable Artificial Intelligence (XAI) methods for clear understanding complex predictive models
Authors:
Przemysław Biecek,
Marcin Chlebus,
Janusz Gajda,
Alicja Gosiewska,
Anna Kozak,
Dominik Ogonowski,
Jakub Sztachelski,
Piotr Wojewnik
Abstract:
Rapid development of advanced modelling techniques gives an opportunity to develop tools that are more and more accurate. However as usually, everything comes with a price and in this case, the price to pay is to loose interpretability of a model while gaining on its accuracy and precision. For managers to control and effectively manage credit risk and for regulators to be convinced with model qua…
▽ More
Rapid development of advanced modelling techniques gives an opportunity to develop tools that are more and more accurate. However as usually, everything comes with a price and in this case, the price to pay is to loose interpretability of a model while gaining on its accuracy and precision. For managers to control and effectively manage credit risk and for regulators to be convinced with model quality the price to pay is too high. In this paper, we show how to take credit scoring analytics in to the next level, namely we present comparison of various predictive models (logistic regression, logistic regression with weight of evidence transformations and modern artificial intelligence algorithms) and show that advanced tree based models give best results in prediction of client default. What is even more important and valuable we also show how to boost advanced models using techniques which allow to interpret them and made them more accessible for credit risk practitioners, resolving the crucial obstacle in widespread deployment of more complex, 'black box' models like random forests, gradient boosted or extreme gradient boosted trees. All this will be shown on the large dataset obtained from the Polish Credit Bureau to which all the banks and most of the lending companies in the country do report the credit files. In this paper the data from lending companies were used. The paper then compares state of the art best practices in credit risk modelling with new advanced modern statistical tools boosted by the latest developments in the field of interpretability and explainability of artificial intelligence algorithms. We believe that this is a valuable contribution when it comes to presentation of different modelling tools but what is even more important it is showing which methods might be used to get insight and understanding of AI methods in credit risk context.
△ Less
Submitted 14 April, 2021;
originally announced April 2021.
-
Triplot: model agnostic measures and visualisations for variable importance in predictive models that take into account the hierarchical correlation structure
Authors:
Katarzyna Pekala,
Katarzyna Woznica,
Przemyslaw Biecek
Abstract:
One of the key elements of explanatory analysis of a predictive model is to assess the importance of individual variables. Rapid development of the area of predictive model exploration (also called explainable artificial intelligence or interpretable machine learning) has led to the popularization of methods for local (instance level) and global (dataset level) methods, such as Permutational Varia…
▽ More
One of the key elements of explanatory analysis of a predictive model is to assess the importance of individual variables. Rapid development of the area of predictive model exploration (also called explainable artificial intelligence or interpretable machine learning) has led to the popularization of methods for local (instance level) and global (dataset level) methods, such as Permutational Variable Importance, Shapley Values (SHAP), Local Interpretable Model Explanations (LIME), Break Down and so on. However, these methods do not use information about the correlation between features which significantly reduce the explainability of the model behaviour. In this work, we propose new methods to support model analysis by exploiting the information about the correlation between variables. The dataset level aspect importance measure is inspired by the block permutations procedure, while the instance level aspect importance measure is inspired by the LIME method. We show how to analyze groups of variables (aspects) both when they are proposed by the user and when they should be determined automatically based on the hierarchical structure of correlations between variables. Additionally, we present the new type of model visualisation, triplot, which exploits a hierarchical structure of variable grou** to produce a high information density model visualisation. This visualisation provides a consistent illustration for either local or global model and data exploration. We also show an example of real-world data with 5k instances and 37 features in which a significant correlation between variables affects the interpretation of the effect of variable importance. The proposed method is, to our knowledge, the first to allow direct use of the correlation between variables in exploratory model analysis.
△ Less
Submitted 7 April, 2021;
originally announced April 2021.
-
fairmodels: A Flexible Tool For Bias Detection, Visualization, And Mitigation
Authors:
Jakub Wiśniewski,
Przemysław Biecek
Abstract:
Machine learning decision systems are getting omnipresent in our lives. From dating apps to rating loan seekers, algorithms affect both our well-being and future. Typically, however, these systems are not infallible. Moreover, complex predictive models are really eager to learn social biases present in historical data that can lead to increasing discrimination. If we want to create models responsi…
▽ More
Machine learning decision systems are getting omnipresent in our lives. From dating apps to rating loan seekers, algorithms affect both our well-being and future. Typically, however, these systems are not infallible. Moreover, complex predictive models are really eager to learn social biases present in historical data that can lead to increasing discrimination. If we want to create models responsibly then we need tools for in-depth validation of models also from the perspective of potential discrimination. This article introduces an R package fairmodels that helps to validate fairness and eliminate bias in classification models in an easy and flexible fashion. The fairmodels package offers a model-agnostic approach to bias detection, visualization and mitigation. The implemented set of functions and fairness metrics enables model fairness validation from different perspectives. The package includes a series of methods for bias mitigation that aim to diminish the discrimination in the model. The package is designed not only to examine a single model, but also to facilitate comparisons between multiple models.
△ Less
Submitted 11 February, 2022; v1 submitted 1 April, 2021;
originally announced April 2021.
-
dalex: Responsible Machine Learning with Interactive Explainability and Fairness in Python
Authors:
Hubert Baniecki,
Wojciech Kretowicz,
Piotr Piatyszek,
Jakub Wisniewski,
Przemyslaw Biecek
Abstract:
The increasing amount of available data, computing power, and the constant pursuit for higher performance results in the growing complexity of predictive models. Their black-box nature leads to opaqueness debt phenomenon inflicting increased risks of discrimination, lack of reproducibility, and deflated performance due to data drift. To manage these risks, good MLOps practices ask for better valid…
▽ More
The increasing amount of available data, computing power, and the constant pursuit for higher performance results in the growing complexity of predictive models. Their black-box nature leads to opaqueness debt phenomenon inflicting increased risks of discrimination, lack of reproducibility, and deflated performance due to data drift. To manage these risks, good MLOps practices ask for better validation of model performance and fairness, higher explainability, and continuous monitoring. The necessity of deeper model transparency appears not only from scientific and social domains, but also emerging laws and regulations on artificial intelligence. To facilitate the development of responsible machine learning models, we showcase dalex, a Python package which implements the model-agnostic interface for interactive model exploration. It adopts the design crafted through the development of various tools for responsible machine learning; thus, it aims at the unification of the existing solutions. This library's source code and documentation are available under open license at https://python.drwhy.ai/.
△ Less
Submitted 11 October, 2021; v1 submitted 28 December, 2020;
originally announced December 2020.
-
Checklist for responsible deep learning modeling of medical images based on COVID-19 detection studies
Authors:
Weronika Hryniewska,
Przemysław Bombiński,
Patryk Szatkowski,
Paulina Tomaszewska,
Artur Przelaskowski,
Przemysław Biecek
Abstract:
The sudden outbreak and uncontrolled spread of COVID-19 disease is one of the most important global problems today. In a short period of time, it has led to the development of many deep neural network models for COVID-19 detection with modules for explainability. In this work, we carry out a systematic analysis of various aspects of proposed models. Our analysis revealed numerous mistakes made at…
▽ More
The sudden outbreak and uncontrolled spread of COVID-19 disease is one of the most important global problems today. In a short period of time, it has led to the development of many deep neural network models for COVID-19 detection with modules for explainability. In this work, we carry out a systematic analysis of various aspects of proposed models. Our analysis revealed numerous mistakes made at different stages of data acquisition, model development, and explanation construction. In this work, we overview the approaches proposed in the surveyed Machine Learning articles and indicate typical errors emerging from the lack of deep understanding of the radiography domain. We present the perspective of both: experts in the field - radiologists and deep learning engineers dealing with model explanations. The final result is a proposed checklist with the minimum conditions to be met by a reliable COVID-19 diagnostic model.
△ Less
Submitted 23 April, 2021; v1 submitted 11 December, 2020;
originally announced December 2020.
-
Transparency, Auditability and eXplainability of Machine Learning Models in Credit Scoring
Authors:
Michael Bücker,
Gero Szepannek,
Alicja Gosiewska,
Przemyslaw Biecek
Abstract:
A major requirement for credit scoring models is to provide a maximally accurate risk prediction. Additionally, regulators demand these models to be transparent and auditable. Thus, in credit scoring, very simple predictive models such as logistic regression or decision trees are still widely used and the superior predictive power of modern machine learning algorithms cannot be fully leveraged. Si…
▽ More
A major requirement for credit scoring models is to provide a maximally accurate risk prediction. Additionally, regulators demand these models to be transparent and auditable. Thus, in credit scoring, very simple predictive models such as logistic regression or decision trees are still widely used and the superior predictive power of modern machine learning algorithms cannot be fully leveraged. Significant potential is therefore missed, leading to higher reserves or more credit defaults. This paper works out different dimensions that have to be considered for making credit scoring models understandable and presents a framework for making ``black box'' machine learning models transparent, auditable and explainable. Following this framework, we present an overview of techniques, demonstrate how they can be applied in credit scoring and how results compare to the interpretability of score cards. A real world case study shows that a comparable degree of interpretability can be achieved while machine learning techniques keep their ability to improve predictive power.
△ Less
Submitted 28 September, 2020;
originally announced September 2020.