CausalConceptTS: Causal Attributions for Time Series Classification using High Fidelity Diffusion Models

Authors: Juan Miguel Lopez Alcaraz, Nils Strodthoff

Abstract: Despite the excelling performance of machine learning models, understanding the decisions of machine learning models remains a long-standing goal. While commonly used attribution methods in explainable AI attempt to address this issue, they typically rely on associational rather than causal relationships. In this study, within the context of time series classification, we introduce a novel framework to assess the causal effect of concepts, i.e., predefined segments within a time series, on specific classification outcomes. To achieve this, we leverage state-of-the-art diffusion-based generative models to estimate counterfactual outcomes. Our approach compares these causal attributions with closely related associational attributions, both theoretically and empirically. We demonstrate the insights gained by our approach for a diverse set of qualitatively different time series classification tasks. Although causal and associational attributions might often share some similarities, in all cases they differ in important details, underscoring the risks associated with drawing causal conclusions from associational data alone. We believe that the proposed approach is widely applicable also in other domains, particularly where predefined segmentations are available, to shed some light on the limits of associational attributions.

Submitted 24 May, 2024; originally announced May 2024.

Comments: 17 pages, 8 figures. Source code under https://github.com/AI4HealthUOL/CausalConceptTS

arXiv:2402.17779 [pdf, other]

Assessing the importance of long-range correlations for deep-learning-based sleep staging

Authors: Tiezhi Wang, Nils Strodthoff

Abstract: This study aims to elucidate the significance of long-range correlations for deep-learning-based sleep staging. It is centered around S4Sleep(TS), a recently proposed model for automated sleep staging. This model utilizes electroencephalography (EEG) as raw time series input and relies on structured state space sequence (S4) models as essential model component. Although the model already surpasses state-of-the-art methods for a moderate number of 15 input epochs, recent literature results suggest potential benefits from incorporating very long correlations spanning hundreds of input epochs. In this submission, we explore the possibility of achieving further enhancements by systematically scaling up the model's input size, anticipating potential improvements in prediction accuracy. In contrast to findings in literature, our results demonstrate that augmenting the input size does not yield a significant enhancement in the performance of S4Sleep(TS). These findings, coupled with the distinctive ability of S4 models to capture long-range dependencies in time series data, cast doubt on the diagnostic relevance of very long-range interactions for sleep staging.

Submitted 22 February, 2024; originally announced February 2024.

Comments: 3 pages, 1 figure, Accepted at Workshop Biosignals, 28.2.-1.3.2024, Göttingen, Germany

arXiv:2401.06654 [pdf, other]

Decoupling Pixel Flipping and Occlusion Strategy for Consistent XAI Benchmarks

Authors: Stefan Blücher, Johanna Vielhaben, Nils Strodthoff

Abstract: Feature removal is a central building block for eXplainable AI (XAI), both for occlusion-based explanations (Shapley values) as well as their evaluation (pixel flipping, PF). However, occlusion strategies can vary significantly from simple mean replacement up to inpainting with state-of-the-art diffusion models. This ambiguity limits the usefulness of occlusion-based approaches. For example, PF benchmarks lead to contradicting rankings. This is amplified by competing PF measures: Features are either removed starting with most influential first (MIF) or least influential first (LIF). This study proposes two complementary perspectives to resolve this disagreement problem. Firstly, we address the common criticism of occlusion-based XAI, that artificial samples lead to unreliable model evaluations. We propose to measure the reliability by the R(eference)-Out-of-Model-Scope (OMS) score. The R-OMS score enables a systematic comparison of occlusion strategies and resolves the disagreement problem by grouping consistent PF rankings. Secondly, we show that the insightfulness of MIF and LIF is conversely dependent on the R-OMS score. To leverage this, we combine the MIF and LIF measures into the symmetric relevance gain (SRG) measure. This breaks the inherent connection to the underlying occlusion strategy and leads to consistent rankings. This resolves the disagreement problem, which we verify for a set of 40 different occlusion strategies.

Submitted 12 January, 2024; originally announced January 2024.

Comments: 28 pages, 8 figures

arXiv:2312.11050 [pdf, other]

Prospects for AI-Enhanced ECG as a Unified Screening Tool for Cardiac and Non-Cardiac Conditions -- An Explorative Study in Emergency Care

Authors: Nils Strodthoff, Juan Miguel Lopez Alcaraz, Wilhelm Haverkamp

Abstract: Current deep learning algorithms designed for automatic ECG analysis have exhibited notable accuracy. However, akin to traditional electrocardiography, they tend to be narrowly focused and typically address a singular diagnostic condition. In this exploratory study, we specifically investigate the capability of a single model to predict a diverse range of both cardiac and non-cardiac discharge diagnoses based on a sole ECG collected in the emergency department. We find that 253, 81 cardiac, and 172 non-cardiac, ICD codes can be reliably predicted in the sense of exceeding an AUROC score of 0.8 in a statistically significant manner. This underscores the model's proficiency in handling a wide array of cardiac and non-cardiac diagnostic scenarios which demonstrates potential as a screening tool for diverse medical encounters.

Submitted 13 May, 2024; v1 submitted 18 December, 2023; originally announced December 2023.

Comments: Accepted version EHJDH. 30 pages, 6 figures, code available under https://github.com/AI4HealthUOL/ECG-MIMIC

arXiv:2310.07463 [pdf, other]

Using explainable AI to investigate electrocardiogram changes during healthy aging -- from expert features to raw signals

Authors: Gabriel Ott, Yannik Schaubelt, Juan Miguel Lopez Alcaraz, Wilhelm Haverkamp, Nils Strodthoff

Abstract: Cardiovascular diseases remain the leading global cause of mortality. Age is an important covariate whose effect is most easily investigated in a healthy cohort to properly distinguish the former from disease-related changes. Traditionally, most of such insights have been drawn from the analysis of electrocardiogram (ECG) feature changes in individuals as they age. However, these features, while informative, may potentially obscure underlying data relationships. In this paper we present the following contributions: (1) We employ a deep-learning model and a tree-based model to analyze ECG data from a robust dataset of healthy individuals across varying ages in both raw signals and ECG feature format. (2) We use explainable AI methods to identify the most discriminative ECG features across age groups. (3) Our analysis with tree-based classifiers reveals age-related declines in inferred breathing rates and identifies notably high SDANN values as indicative of elderly individuals, distinguishing them from younger adults. (4) Furthermore, the deep-learning model underscores the pivotal role of the P-wave in age predictions across all age groups, suggesting potential changes in the distribution of different P-wave types with age. These findings shed new light on age-related ECG changes, offering insights that transcend traditional feature-based approaches.

Submitted 22 April, 2024; v1 submitted 11 October, 2023; originally announced October 2023.

Comments: Accepted version by PLOS ONE. 10 pages, 5 figures, code available under https://github.com/AI4HealthUOL/ECG-aging. Publication under https://journals.plos.org/plosone/article?id=10.1371/journal.pone.0302024

arXiv:2310.06715 [pdf, other]

S4Sleep: Elucidating the design space of deep-learning-based sleep stage classification models

Authors: Tiezhi Wang, Nils Strodthoff

Abstract: Scoring sleep stages in polysomnography recordings is a time-consuming task plagued by significant inter-rater variability. Therefore, it stands to benefit from the application of machine learning algorithms. While many algorithms have been proposed for this purpose, certain critical architectural decisions have not received systematic exploration. In this study, we meticulously investigate these design choices within the broad category of encoder-predictor architectures. We identify robust architectures applicable to both time series and spectrogram input representations. These architectures incorporate structured state space models as integral components, leading to statistically significant advancements in performance on the extensive SHHS dataset. These improvements are assessed through both statistical and systematic error estimations. We anticipate that the architectural insights gained from this study will not only prove valuable for future research in sleep staging but also hold relevance for other time series annotation tasks.

Submitted 10 October, 2023; originally announced October 2023.

Comments: 11 pages, 1 figure, code available at https://github.com/AI4HealthUOL/s4sleep

arXiv:2309.03631 [pdf, other]

doi 10.1093/bioinformatics/btae031

Insights Into the Inner Workings of Transformer Models for Protein Function Prediction

Authors: Markus Wenzel, Erik Grüner, Nils Strodthoff

Abstract: Motivation: We explored how explainable artificial intelligence (XAI) can help to shed light into the inner workings of neural networks for protein function prediction, by extending the widely used XAI method of integrated gradients such that latent representations inside of transformer models, which were finetuned to Gene Ontology term and Enzyme Commission number prediction, can be inspected too. Results: The approach enabled us to identify amino acids in the sequences that the transformers pay particular attention to, and to show that these relevant sequence parts reflect expectations from biology and chemistry, both in the embedding layer and inside of the model, where we identified transformer heads with a statistically significant correspondence of attribution maps with ground truth sequence annotations (e.g. transmembrane regions, active sites) across many proteins. Availability and Implementation: Source code can be accessed at https://github.com/markuswenzel/xai-proteins .

Submitted 9 February, 2024; v1 submitted 7 September, 2023; originally announced September 2023.

Comments: 26 pages, 12 figures, 5 tables, source code available at https://github.com/markuswenzel/xai-proteins

Journal ref: Bioinformatics (2024) btae031

arXiv:2308.15291 [pdf, other]

Towards quantitative precision for ECG analysis: Leveraging state space models, self-supervision and patient metadata

Authors: Temesgen Mehari, Nils Strodthoff

Abstract: Deep learning has emerged as the preferred modeling approach for automatic ECG analysis. In this study, we investigate three elements aimed at improving the quantitative accuracy of such systems. These components consistently enhance performance beyond the existing state-of-the-art, which is predominantly based on convolutional models. Firstly, we explore more expressive architectures by exploiting structured state space models (SSMs). These models have shown promise in capturing long-term dependencies in time series data. By incorporating SSMs into our approach, we not only achieve better performance, but also gain insights into long-standing questions in the field. Specifically, for standard diagnostic tasks, we find no advantage in using higher sampling rates such as 500Hz compared to 100Hz. Similarly, extending the input size of the model beyond 3 seconds does not lead to significant improvements. Secondly, we demonstrate that self-supervised learning using contrastive predictive coding can further improve the performance of SSMs. By leveraging self-supervision, we enable the model to learn more robust and representative features, leading to improved analysis accuracy. Lastly, we depart from synthetic benchmarking scenarios and incorporate basic demographic metadata alongside the ECG signal as input. This inclusion of patient metadata departs from the conventional practice of relying solely on the signal itself. Remarkably, this addition consistently yields positive effects on predictive performance. We firmly believe that all three components should be considered when developing next-generation ECG analysis algorithms.

Submitted 29 August, 2023; originally announced August 2023.

Comments: extended version of arXiv:2211.07579

arXiv:2305.17043 [pdf, other]

doi 10.1016/j.compbiomed.2024.108525

Explaining Deep Learning for ECG Analysis: Building Blocks for Auditing and Knowledge Discovery

Authors: Patrick Wagner, Temesgen Mehari, Wilhelm Haverkamp, Nils Strodthoff

Abstract: Deep neural networks have become increasingly popular for analyzing ECG data because of their ability to accurately identify cardiac conditions and hidden clinical factors. However, the lack of transparency due to the black box nature of these models is a common concern. To address this issue, explainable AI (XAI) methods can be employed. In this study, we present a comprehensive analysis of post-hoc XAI methods, investigating the local (attributions per sample) and global (based on domain expert concepts) perspectives. We have established a set of sanity checks to identify sensible attribution methods, and we provide quantitative evidence in accordance with expert rules. This dataset-wide analysis goes beyond anecdotal evidence by aggregating data across patient subgroups. Furthermore, we demonstrate how these XAI techniques can be utilized for knowledge discovery, such as identifying subtypes of myocardial infarction. We believe that these proposed methods can serve as building blocks for a complementary assessment of the internal validity during a certification process, as well as for knowledge discovery in the field of ECG analysis.

Submitted 2 July, 2024; v1 submitted 26 May, 2023; originally announced May 2023.

Journal ref: Computers in Biology and Medicine, Vol. 176, June 2024, 108525

arXiv:2304.02577 [pdf, other]

ECG Feature Importance Rankings: Cardiologists vs. Algorithms

Authors: Temesgen Mehari, Ashish Sundar, Alen Bosnjakovic, Peter Harris, Steven E. Williams, Axel Loewe, Olaf Doessel, Claudia Nagel, Nils Strodthoff, Philip J. Aston

Abstract: Feature importance methods promise to provide a ranking of features according to importance for a given classification task. A wide range of methods exist but their rankings often disagree and they are inherently difficult to evaluate due to a lack of ground truth beyond synthetic datasets. In this work, we put feature importance methods to the test on real-world data in the domain of cardiology, where we try to distinguish three specific pathologies from healthy subjects based on ECG features comparing to features used in cardiologists' decision rules as ground truth. Some methods generally performed well and others performed poorly, while some methods did well on some but not all of the problems considered.

Submitted 5 April, 2023; originally announced April 2023.

arXiv:2301.11911 [pdf, other]

Multi-dimensional concept discovery (MCD): A unifying framework with completeness guarantees

Authors: Johanna Vielhaben, Stefan Blücher, Nils Strodthoff

Abstract: The completeness axiom renders the explanation of a post-hoc XAI method only locally faithful to the model, i.e. for a single decision. For the trustworthy application of XAI, in particular for high-stake decisions, a more global model understanding is required. Recently, concept-based methods have been proposed, which are however not guaranteed to be bound to the actual model reasoning. To circumvent this problem, we propose Multi-dimensional Concept Discovery (MCD) as an extension of previous approaches that fulfills a completeness relation on the level of concepts. Our method starts from general linear subspaces as concepts and does neither require reinforcing concept interpretability nor re-training of model parts. We propose sparse subspace clustering to discover improved concepts and fully leverage the potential of multi-dimensional subspaces. MCD offers two complementary analysis tools for concepts in input space: (1) concept activation maps, that show where a concept is expressed within a sample, allowing for concept characterization through prototypical samples, and (2) concept relevance heatmaps, that decompose the model decision into concept contributions. Both tools together enable a detailed understanding of the model reasoning, which is guaranteed to relate to the model via a completeness relation. This paves the way towards more trustworthy concept-based XAI. We empirically demonstrate the superiority of MCD against more constrained concept definitions.

Submitted 18 June, 2023; v1 submitted 27 January, 2023; originally announced January 2023.

Comments: v2: Version published by Transactions on Machine Learning Research in 2023 (TMLR ISSN 2835-8856) https://openreview.net/forum?id=KxBQPz7HKh. 25 pages, 11 figures. This work builds on an earlier manuscript (arXiv:2203.06043) and crucially extends it. Code is available at https://github.com/jvielhaben/MCD-XAI

Journal ref: Version published by Transactions on Machine Learning Research in 2023 (TMLR ISSN 2835-8856) https://openreview.net/forum?id=KxBQPz7HKh

arXiv:2301.08227 [pdf, other]

Diffusion-based Conditional ECG Generation with Structured State Space Models

Authors: Juan Miguel Lopez Alcaraz, Nils Strodthoff

Abstract: Synthetic data generation is a promising solution to address privacy issues with the distribution of sensitive health data. Recently, diffusion models have set new standards for generative models for different data modalities. Also very recently, structured state space models emerged as a powerful modeling paradigm to capture long-term dependencies in time series. We put forward SSSD-ECG, as the combination of these two technologies, for the generation of synthetic 12-lead electrocardiograms conditioned on more than 70 ECG statements. Due to a lack of reliable baselines, we also propose conditional variants of two state-of-the-art unconditional generative models. We thoroughly evaluate the quality of the generated samples, by evaluating pretrained classifiers on the generated data and by evaluating the performance of a classifier trained only on synthetic data, where SSSD-ECG clearly outperforms its GAN-based competitors. We demonstrate the soundness of our approach through further experiments, including conditional class interpolation and a clinical Turing test demonstrating the high quality of the SSSD-ECG samples across a wide range of conditions.

Submitted 15 June, 2023; v1 submitted 19 January, 2023; originally announced January 2023.

Comments: 12 pages, 9 figures. Accepted version by Computers in Biology and Medicine in 2023 under https://doi.org/10.1016/j.compbiomed.2023.107115. Source code under https://github.com/AI4HealthUOL/SSSD-ECG

Journal ref: volume 163, year 2023, and page 107115

arXiv:2211.07579 [pdf, other]

Advancing the State-of-the-Art for ECG Analysis through Structured State Space Models

Authors: Temesgen Mehari, Nils Strodthoff

Abstract: The field of deep-learning-based ECG analysis has been largely dominated by convolutional architectures. This work explores the prospects of applying the recently introduced structured state space models (SSMs) as a particularly promising approach due to its ability to capture long-term dependencies in time series. We demonstrate that this approach leads to significant improvements over the current state-of-the-art for ECG classification, which we trace back to individual pathologies. Furthermore, the model's ability to capture long-term dependencies allows to shed light on long-standing questions in the literature such as the optimal sampling rate or window size to train classification models. Interestingly, we find no evidence for using data sampled at 500Hz as opposed to 100Hz and no advantages from extending the model's input size beyond 3s. Based on this very promising first assessment, SSMs could develop into a new modeling paradigm for ECG analysis.

Submitted 14 November, 2022; originally announced November 2022.

Comments: Extended Abstract presented at Machine Learning for Health (ML4H) symposium 2022, November 28th, 2022, New Orleans, United States & Virtual, http://www.ml4h.cc, 6+5 pages

arXiv:2208.09399 [pdf, other]

Diffusion-based Time Series Imputation and Forecasting with Structured State Space Models

Authors: Juan Miguel Lopez Alcaraz, Nils Strodthoff

Abstract: The imputation of missing values represents a significant obstacle for many real-world data analysis pipelines. Here, we focus on time series data and put forward SSSD, an imputation model that relies on two emerging technologies, (conditional) diffusion models as state-of-the-art generative models and structured state space models as internal model architecture, which are particularly suited to capture long-term dependencies in time series data. We demonstrate that SSSD matches or even exceeds state-of-the-art probabilistic imputation and forecasting performance on a broad range of data sets and different missingness scenarios, including the challenging blackout-missing scenarios, where prior approaches failed to provide meaningful results.

Submitted 6 May, 2023; v1 submitted 19 August, 2022; originally announced August 2022.

Comments: V3: Updated results for the solar dataset. 36 pages, 13 figures. Version published by Transactions on Machine Learning Research in 2022 (TMLR ISSN 2835-8856) https://openreview.net/forum?id=hHiIbk7ApW. Source code under https://github.com/AI4HealthUOL/SSSD

Journal ref: Version published by Transactions on Machine Learning Research in 2022 (TMLR ISSN 2835-8856) https://openreview.net/forum?id=hHiIbk7ApW

arXiv:2204.05044 [pdf, other]

doi 10.1016/j.media.2023.102809

From Modern CNNs to Vision Transformers: Assessing the Performance, Robustness, and Classification Strategies of Deep Learning Models in Histopathology

Authors: Maximilian Springenberg, Annika Frommholz, Markus Wenzel, Eva Weicken, Jackie Ma, Nils Strodthoff

Abstract: While machine learning is currently transforming the field of histopathology, the domain lacks a comprehensive evaluation of state-of-the-art models based on essential but complementary quality requirements beyond a mere classification accuracy. In order to fill this gap, we developed a new methodology to extensively evaluate a wide range of classification models, including recent vision transformers, and convolutional neural networks such as: ConvNeXt, ResNet (BiT), Inception, ViT and Swin transformer, with and without supervised or self-supervised pretraining. We thoroughly tested the models on five widely used histopathology datasets containing whole slide images of breast, gastric, and colorectal cancer and developed a novel approach using an image-to-image translation model to assess the robustness of a cancer classification model against stain variations. Further, we extended existing interpretability methods to previously unstudied models and systematically reveal insights of the models' classifications strategies that can be transferred to future model architectures.

Submitted 9 May, 2023; v1 submitted 11 April, 2022; originally announced April 2022.

Comments: 14 pages, 6 figures, v2: version accepted by Medical Image Analysis, code available under https://github.com/hhi-aml/histobenchmark

arXiv:2203.06043 [pdf, other]

Sparse Subspace Clustering for Concept Discovery (SSCCD)

Authors: Johanna Vielhaben, Stefan Blücher, Nils Strodthoff

Abstract: Concepts are key building blocks of higher level human understanding. Explainable AI (XAI) methods have shown tremendous progress in recent years, however, local attribution methods do not allow to identify coherent model behavior across samples and therefore miss this essential component. In this work, we study concept-based explanations and put forward a new definition of concepts as low-dimensional subspaces of hidden feature layers. We novelly apply sparse subspace clustering to discover these concept subspaces. Moving forward, we derive insights from concept subspaces in terms of localized input (concept) maps, show how to quantify concept relevances and lastly, evaluate similarities and transferability between concepts. We empirically demonstrate the soundness of the proposed Sparse Subspace Clustering for Concept Discovery (SSCCD) method for a variety of different image classification tasks. This approach allows for deeper insights into the actual model behavior that would remain hidden from conventional input-level heatmaps.

Submitted 11 March, 2022; originally announced March 2022.

Comments: 24 pages, 24 figures, code will be made publicly available

arXiv:2106.13497 [pdf, other]

On the Robustness of Pretraining and Self-Supervision for a Deep Learning-based Analysis of Diabetic Retinopathy

Authors: Vignesh Srinivasan, Nils Strodthoff, Jackie Ma, Alexander Binder, Klaus-Robert Müller, Wojciech Samek

Abstract: There is an increasing number of medical use-cases where classification algorithms based on deep neural networks reach performance levels that are competitive with human medical experts. To alleviate the challenges of small dataset sizes, these systems often rely on pretraining. In this work, we aim to assess the broader implications of these approaches. For diabetic retinopathy grading as exemplary use case, we compare the impact of different training procedures including recently established self-supervised pretraining methods based on contrastive learning. To this end, we investigate different aspects such as quantitative performance, statistics of the learned feature representations, interpretability and robustness to image distortions. Our results indicate that models initialized from ImageNet pretraining report a significant increase in performance, generalization and robustness to image distortions. In particular, self-supervised models show further benefits to supervised models. Self-supervised models with initialization from ImageNet pretraining not only report higher performance, they also reduce overfitting to large lesions along with improvements in taking into account minute lesions indicative of the progression of the disease. Understanding the effects of pretraining in a broader sense that goes beyond simple performance comparisons is of crucial importance for the broader medical imaging community beyond the use-case considered in this work.

Submitted 25 June, 2021; originally announced June 2021.

arXiv:2104.08237 [pdf, other]

Predicting the Binding of SARS-CoV-2 Peptides to the Major Histocompatibility Complex with Recurrent Neural Networks

Authors: Johanna Vielhaben, Markus Wenzel, Eva Weicken, Nils Strodthoff

Abstract: Predicting the binding of viral peptides to the major histocompatibility complex with machine learning can potentially extend the computational immunology toolkit for vaccine development, and serve as a key component in the fight against a pandemic. In this work, we adapt and extend USMPep, a recently proposed, conceptually simple prediction algorithm based on recurrent neural networks. Most notably, we combine regressors (binding affinity data) and classifiers (mass spectrometry data) from qualitatively different data sources to obtain a more comprehensive prediction tool. We evaluate the performance on a recently released SARS-CoV-2 dataset with binding stability measurements. USMPep not only sets new benchmarks on selected single alleles, but consistently turns out to be among the best-performing methods or, for some metrics, to be even the overall best-performing method for this task.

Submitted 16 April, 2021; originally announced April 2021.

Comments: Accepted at ICLR 2021 Workshop: Machine Learning for Preventing and Combating Pandemics; code available at https://github.com/nstrodt/USMPep

arXiv:2103.12676 [pdf, other]

doi 10.1016/j.compbiomed.2021.105114

Self-supervised representation learning from 12-lead ECG data

Authors: Temesgen Mehari, Nils Strodthoff

Abstract: Clinical 12-lead electrocardiography (ECG) is one of the most widely encountered kinds of biosignals. Despite the increased availability of public ECG datasets, label scarcity remains a central challenge in the field. Self-supervised learning represents a promising way to alleviate this issue. In this work, we put forward the first comprehensive assessment of self-supervised representation learning from clinical 12-lead ECG data. To this end, we adapt state-of-the-art self-supervised methods based on instance discrimination and latent forecasting to the ECG domain. In a first step, we learn contrastive representations and evaluate their quality based on linear evaluation performance on a recently established, comprehensive, clinical ECG classification task. In a second step, we analyze the impact of self-supervised pretraining on finetuned ECG classifiers as compared to purely supervised performance. For the best-performing method, an adaptation of contrastive predictive coding, we find a linear evaluation performance only 0.5% below supervised performance. For the finetuned models, we find improvements in downstream performance of roughly 1% compared to supervised performance, label efficiency, as well as robustness against physiological noise. This work clearly establishes the feasibility of extracting discriminative representations from ECG data via self-supervised learning and the numerous advantages when finetuning such representations on downstream tasks as compared to purely supervised training. As first comprehensive assessment of its kind in the ECG domain carried out exclusively on publicly available datasets, we hope to establish a first step towards reproducible progress in the rapidly evolving field of representation learning for biosignals.

Submitted 4 January, 2022; v1 submitted 23 March, 2021; originally announced March 2021.

Comments: 15 pages, 12 figures, matches published version, code available under https://github.com/hhi-aml/ecg-selfsupervised

Journal ref: Comput. Biol. Med. 141 (2022) 105114

arXiv:2102.13519 [pdf, other]

doi 10.1016/j.artint.2022.103774

PredDiff: Explanations and Interactions from Conditional Expectations

Authors: Stefan Blücher, Johanna Vielhaben, Nils Strodthoff

Abstract: PredDiff is a model-agnostic, local attribution method that is firmly rooted in probability theory. Its simple intuition is to measure prediction changes while marginalizing features. In this work, we clarify properties of PredDiff and its close connection to Shapley values. We stress important differences between classification and regression, which require a specific treatment within both formalisms. We extend PredDiff by introducing a new, well-founded measure for interaction effects between arbitrary feature subsets. The study of interaction effects represents an inevitable step towards a comprehensive understanding of black-box models and is particularly important for science applications. Equipped with our novel interaction measure, PredDiff is a promising model-agnostic approach for obtaining reliable, numerically inexpensive and theoretically sound attributions.

Submitted 8 September, 2022; v1 submitted 26 February, 2021; originally announced February 2021.

Comments: 35 pages, 20 Figures, accepted journal version, code available at https://github.com/AI4HealthUOL/preddiff-interactions

Journal ref: Artificial Intelligence 312 (2022) 103774

arXiv:2012.10264 [pdf, other]

doi 10.1103/PhysRevE.103.063304

Generative Neural Samplers for the Quantum Heisenberg Chain

Authors: Johanna Vielhaben, Nils Strodthoff

Abstract: Generative neural samplers offer a complementary approach to Monte Carlo methods for problems in statistical physics and quantum field theory. This work tests the ability of generative neural samplers to estimate observables for real-world low-dimensional spin systems. It maps out how autoregressive models can sample configurations of a quantum Heisenberg chain via a classical approximation based on the Suzuki-Trotter transformation. We present results for energy, specific heat and susceptibility for the isotropic XXX and the anisotropic XY chain that are in good agreement with Monte Carlo results within the same approximation scheme.

Submitted 18 December, 2020; originally announced December 2020.

Comments: 10 figures

Journal ref: Phys. Rev. E 103, 063304 (2021)

arXiv:2010.09622 [pdf, other]

Inferring respiratory and circulatory parameters from electrical impedance tomography with deep recurrent models

Authors: Nils Strodthoff, Claas Strodthoff, Tobias Becher, Norbert Weiler, Inéz Frerichs

Abstract: Electrical impedance tomography (EIT) is a noninvasive imaging modality that allows a continuous assessment of changes in regional bioimpedance of different organs. One of its most common biomedical applications is monitoring regional ventilation distribution in critically ill patients treated in intensive care units. In this work, we put forward a proof-of-principle study that demonstrates how one can reconstruct synchronously measured respiratory or circulatory parameters from the EIT image sequence using a deep learning model trained in an end-to-end fashion. We demonstrate that one can accurately infer absolute volume, absolute flow, normalized airway pressure and within certain limitations even the normalized arterial blood pressure from the EIT signal alone, in a way that generalizes to unseen patients without prior calibration. As an outlook with direct clinical relevance, we furthermore demonstrate the feasibility of reconstructing the absolute transpulmonary pressure from a combination of EIT and absolute airway pressure, as a way to potentially replace the invasive measurement of esophageal pressure. With these results, we hope to stimulate further studies building on the framework put forward in this work.

Submitted 19 October, 2020; originally announced October 2020.

Comments: 6 pages, 3 figures

arXiv:2004.13701 [pdf, other]

Deep Learning for ECG Analysis: Benchmarks and Insights from PTB-XL

Authors: Nils Strodthoff, Patrick Wagner, Tobias Schaeffter, Wojciech Samek

Abstract: Electrocardiography is a very common, non-invasive diagnostic procedure and its interpretation is increasingly supported by automatic interpretation algorithms. The progress in the field of automatic ECG interpretation has up to now been hampered by a lack of appropriate datasets for training as well as a lack of well-defined evaluation procedures to ensure comparability of different algorithms. To alleviate these issues, we put forward first benchmarking results for the recently published, freely accessible PTB-XL dataset, covering a variety of tasks from different ECG statement prediction tasks over age and gender prediction to signal quality assessment. We find that convolutional neural networks, in particular resnet- and inception-based architectures, show the strongest performance across all tasks outperforming feature-based algorithms by a large margin. These results are complemented by deeper insights into the classification algorithm in terms of hidden stratification, model uncertainty and an exploratory interpretability analysis. We also put forward benchmarking results for the ICBEB2018 challenge ECG dataset and discuss prospects of transfer learning using classifiers pretrained on PTB-XL. With this resource, we aim to establish the PTB-XL dataset as a resource for structured benchmarking of ECG analysis algorithms and encourage other researchers in the field to join these efforts.

Submitted 28 April, 2020; originally announced April 2020.

Comments: 12 pages, 8 figures

arXiv:2003.01504 [pdf, other]

doi 10.1103/PhysRevD.101.094507

Towards Novel Insights in Lattice Field Theory with Explainable Machine Learning

Authors: Stefan Bluecher, Lukas Kades, Jan M. Pawlowski, Nils Strodthoff, Julian M. Urban

Abstract: Machine learning has the potential to aid our understanding of phase structures in lattice quantum field theories through the statistical analysis of Monte Carlo samples. Available algorithms, in particular those based on deep learning, often demonstrate remarkable performance in the search for previously unidentified features, but tend to lack transparency if applied naively. To address these shortcomings, we propose representation learning in combination with interpretability methods as a framework for the identification of observables. More specifically, we investigate action parameter regression as a pretext task while using layer-wise relevance propagation (LRP) to identify the most important observables depending on the location in the phase diagram. The approach is put to work in the context of a scalar Yukawa model in (2+1)d. First, we investigate a multilayer perceptron to determine an importance hierarchy of several predefined, standard observables. The method is then applied directly to the raw field configurations using a convolutional network, demonstrating the ability to reconstruct all order parameters from the learned filter weights. Based on our results, we argue that due to its broad applicability, attribution methods such as LRP could prove a useful and versatile tool in our search for new physical insights. In the case of the Yukawa model, it facilitates the construction of an observable that characterises the symmetric phase.

Submitted 18 May, 2020; v1 submitted 3 March, 2020; originally announced March 2020.

Comments: 13 pages, 11 figures

Journal ref: Phys. Rev. D 101, 094507 (2020)

arXiv:1910.13496 [pdf, other]

doi 10.1103/PhysRevE.101.023304

Asymptotically unbiased estimation of physical observables with neural samplers

Authors: Kim A. Nicoli, Shinichi Nakajima, Nils Strodthoff, Wojciech Samek, Klaus-Robert Müller, Pan Kessel

Abstract: We propose a general framework for the estimation of observables with generative neural samplers focusing on modern deep generative neural networks that provide an exact sampling probability. In this framework, we present asymptotically unbiased estimators for generic observables, including those that explicitly depend on the partition function such as free energy or entropy, and derive corresponding variance estimators. We demonstrate their practical applicability by numerical experiments for the 2d Ising model which highlight the superiority over existing methods. Our approach greatly enhances the applicability of generative neural samplers to real-world physical systems.

Submitted 13 February, 2020; v1 submitted 29 October, 2019; originally announced October 2019.

Comments: 5 figures

Journal ref: Phys. Rev. E 101, 023304 (2020)

arXiv:1906.00735 [pdf, other]

doi 10.1007/978-3-030-33676-9_25

Achieving Generalizable Robustness of Deep Neural Networks by Stability Training

Authors: Jan Laermann, Wojciech Samek, Nils Strodthoff

Abstract: We study the recently introduced stability training as a general-purpose method to increase the robustness of deep neural networks against input perturbations. In particular, we explore its use as an alternative to data augmentation and validate its performance against a number of distortion types and transformations including adversarial examples. In our image classification experiments using ImageNet data stability training performs on a par or even outperforms data augmentation for specific transformations, while consistently offering improved robustness against a broader range of distortion strengths and types unseen during training, a considerably smaller hyperparameter dependence and less potentially negative side effects compared to data augmentation.

Submitted 12 November, 2019; v1 submitted 3 June, 2019; originally announced June 2019.

Comments: 18 pages, 25 figures; Camera-ready version

Journal ref: DAGM GCPR 2019. Lecture Notes in Computer Science, vol. 11824, 360-373, 2019

arXiv:1903.11048 [pdf, other]

Comment on "Solving Statistical Mechanics Using VANs": Introducing saVANt - VANs Enhanced by Importance and MCMC Sampling

Authors: Kim Nicoli, Pan Kessel, Nils Strodthoff, Wojciech Samek, Klaus-Robert Müller, Shinichi Nakajima

Abstract: In this comment on "Solving Statistical Mechanics Using Variational Autoregressive Networks" by Wu et al., we propose a subtle yet powerful modification of their approach. We show that the inherent sampling error of their method can be corrected by using neural network-based MCMC or importance sampling which leads to asymptotically unbiased estimators for physical quantities. This modification is possible due to a singular property of VANs, namely that they provide the exact sample probability. With these modifications, we believe that their method could have a substantially greater impact on various important fields of physics, including strongly-interacting field theories and statistical physics.

Submitted 26 March, 2019; originally announced March 2019.

Comments: 6 pages, 4 figures

arXiv:1807.10495 [pdf, other]

doi 10.1109/JSAC.2019.2934001

Enhanced Machine Learning Techniques for Early HARQ Feedback Prediction in 5G

Authors: Nils Strodthoff, Barış Göktepe, Thomas Schierl, Cornelius Hellge, Wojciech Samek

Abstract: We investigate Early Hybrid Automatic Repeat reQuest (E-HARQ) feedback schemes enhanced by machine learning techniques as a path towards ultra-reliable and low-latency communication (URLLC). To this end, we propose machine learning methods to predict the outcome of the decoding process ahead of the end of the transmission. We discuss different input features and classification algorithms ranging from traditional methods to newly developed supervised autoencoders. These methods are evaluated based on their prospects of complying with the URLLC requirements of effective block error rates below $10^{-5}$ at small latency overheads. We provide realistic performance estimates in a system model incorporating scheduling effects to demonstrate the feasibility of E-HARQ across different signal-to-noise ratios, subcode lengths, channel conditions and system loads, and show the benefit over regular HARQ and existing E-HARQ schemes without machine learning.

Submitted 25 October, 2019; v1 submitted 27 July, 2018; originally announced July 2018.

Comments: 14 pages, 15 figures; accepted version

Journal ref: IEEE JSAC 37 (2019), no. 11, 2573-2587

arXiv:1806.07385 [pdf, other]

doi 10.1088/1361-6579/aaf34d

Detecting and interpreting myocardial infarction using fully convolutional neural networks

Authors: Nils Strodthoff, Claas Strodthoff

Abstract: Objective: We aim to provide an algorithm for the detection of myocardial infarction that operates directly on ECG data without any preprocessing and to investigate its decision criteria. Approach: We train an ensemble of fully convolutional neural networks on the PTB ECG dataset and apply state-of-the-art attribution methods. Main results: Our classifier reaches 93.3% sensitivity and 89.7% specificity evaluated using 10-fold cross-validation with sampling based on patients. The presented method outperforms state-of-the-art approaches and reaches the performance level of human cardiologists for detection of myocardial infarction. We are able to discriminate channel-specific regions that contribute most significantly to the neural network's decision. Interestingly, the network's decision is influenced by signs also recognized by human cardiologists as indicative of myocardial infarction. Significance: Our results demonstrate the high prospects of algorithmic ECG analysis for future clinical applications considering both its quantitative performance as well as the possibility of assessing decision criteria on a per-example basis, which enhances the comprehensibility of the approach.

Submitted 5 February, 2019; v1 submitted 18 June, 2018; originally announced June 2018.

Comments: 11 pages, 4 figures

Journal ref: Physiological Measurement, vol. 40, no. 1, p. 015001, 2019

Showing 1–29 of 29 results for author: Strodthoff, N