-
Detecting Edited Knowledge in Language Models
Authors:
Paul Youssef,
Zhixue Zhao,
Jörg Schlötterer,
Christin Seifert
Abstract:
Knowledge editing methods (KEs) can update language models' obsolete or inaccurate knowledge learned from pre-training. However, KEs can be used for malicious applications, e.g., inserting misinformation and toxic content. Knowing whether a generated output is based on edited knowledge or first-hand knowledge from pre-training can increase users' trust in generative models and provide more transparency. Driven by this, we propose a novel task: detecting edited knowledge in language models. Given an edited model and a fact retrieved by a prompt from an edited model, the objective is to classify the knowledge as either unedited (based on the pre-training), or edited (based on subsequent editing). We instantiate the task with four KEs, two LLMs, and two datasets. Additionally, we propose using the hidden state representations and the probability distributions as features for the detection. Our results reveal that using these features as inputs to a simple AdaBoost classifier establishes a strong baseline. This classifier requires only a limited amount of data and maintains its performance even in cross-domain settings. Last, we find it more challenging to distinguish edited knowledge from unedited but related knowledge, highlighting the need for further research. Our work lays the groundwork for addressing malicious model editing, which is a critical challenge associated with the strong generative capabilities of LLMs.
Submitted 1 July, 2024; v1 submitted 4 May, 2024;
originally announced May 2024.
-
LLMs for Generating and Evaluating Counterfactuals: A Comprehensive Study
Authors:
Van Bach Nguyen,
Paul Youssef,
Jörg Schlötterer,
Christin Seifert
Abstract:
As NLP models become more complex, understanding their decisions becomes more crucial. Counterfactuals (CFs), where minimal changes to inputs flip a model's prediction, offer a way to explain these models. While Large Language Models (LLMs) have shown remarkable performance in NLP tasks, their efficacy in generating high-quality CFs remains uncertain. This work fills this gap by investigating how well LLMs generate CFs for two NLU tasks. We conduct a comprehensive comparison of several common LLMs, and evaluate their CFs, assessing both intrinsic metrics, and the impact of these CFs on data augmentation. Moreover, we analyze differences between human and LLM-generated CFs, providing insights for future research directions. Our results show that LLMs generate fluent CFs, but struggle to keep the induced changes minimal. Generating CFs for Sentiment Analysis (SA) is less challenging than NLI where LLMs show weaknesses in generating CFs that flip the original label. This is also reflected in the data augmentation performance, where we observe a large gap between augmenting with human and LLM-generated CFs. Furthermore, we evaluate LLMs' ability to assess CFs in a mislabelled data setting, and show that they have a strong bias towards agreeing with the provided labels. GPT4 is more robust against this bias and its scores correlate well with automatic metrics. Our findings reveal several limitations and point to potential future work directions.
Submitted 26 April, 2024;
originally announced May 2024.
-
Feature importance to explain multimodal prediction models. A clinical use case
Authors:
Jorn-Jan van de Beld,
Shreyasi Pathak,
Jeroen Geerdink,
Johannes H. Hegeman,
Christin Seifert
Abstract:
Surgery to treat elderly hip fracture patients may cause complications that can lead to early mortality. An early warning system for complications could provoke clinicians to monitor high-risk patients more carefully and address potential complications early, or inform the patient. In this work, we develop a multimodal deep-learning model for post-operative mortality prediction using pre-operative and per-operative data from elderly hip fracture patients. Specifically, we include static patient data, hip and chest images before surgery in pre-operative data, vital signals, and medications administered during surgery in per-operative data. We extract features from image modalities using ResNet and from vital signals using LSTM. Explainable model outcomes are essential for clinical applicability, therefore we compute Shapley values to explain the predictions of our multimodal black box model. We find that i) Shapley values can be used to estimate the relative contribution of each modality both locally and globally, and ii) a modified version of the chain rule can be used to propagate Shapley values through a sequence of models supporting interpretable local explanations. Our findings imply that a multimodal combination of black box models can be explained by propagating Shapley values through the model sequence.
Submitted 29 April, 2024;
originally announced April 2024.
-
CEval: A Benchmark for Evaluating Counterfactual Text Generation
Authors:
Van Bach Nguyen,
Jörg Schlötterer,
Christin Seifert
Abstract:
Counterfactual text generation aims to minimally change a text, such that it is classified differently. Judging advancements in method development for counterfactual text generation is hindered by a non-uniform usage of data sets and metrics in related work. We propose CEval, a benchmark for comparing counterfactual text generation methods. CEval unifies counterfactual and text quality metrics, includes common counterfactual datasets with human annotations, standard baselines (MICE, GDBA, CREST) and the open-source language model LLAMA-2. Our experiments found no perfect method for generating counterfactual text. Methods that excel at counterfactual metrics often produce lower-quality text while LLMs with simple prompts generate high-quality text but struggle with counterfactual criteria. By making CEval available as an open-source Python library, we encourage the community to contribute more methods and maintain consistent evaluation in future work.
Submitted 26 April, 2024;
originally announced April 2024.
-
Approximation of Random Evolution Equations
Authors:
Katharina Klioba,
Christian Seifert
Abstract:
In this paper, we present an abstract framework to obtain convergence rates for the approximation of random evolution equations corresponding to a random family of forms determined by finite-dimensional noise. The full discretisation error in space, time, and randomness is considered, where polynomial chaos expansion (PCE) is used for the semi-discretisation in randomness. The main results are regularity conditions on the random forms under which convergence of polynomial order in randomness is obtained depending on the smoothness of the coefficients and the Sobolev regularity of the initial value. In space and time, the same convergence rates as in the deterministic setting are achieved. To this end, we derive error estimates for vector-valued PCE as well as a quantified version of the Trotter-Kato theorem for form-induced semigroups.
Submitted 11 April, 2024;
originally announced April 2024.
-
Comprehensive Study on German Language Models for Clinical and Biomedical Text Understanding
Authors:
Ahmad Idrissi-Yaghir,
Amin Dada,
Henning Schäfer,
Kamyar Arzideh,
Giulia Baldini,
Jan Trienes,
Max Hasin,
Jeanette Bewersdorff,
Cynthia S. Schmidt,
Marie Bauer,
Kaleb E. Smith,
Jiang Bian,
Yonghui Wu,
Jörg Schlötterer,
Torsten Zesch,
Peter A. Horn,
Christin Seifert,
Felix Nensa,
Jens Kleesiek,
Christoph M. Friedrich
Abstract:
Recent advances in natural language processing (NLP) can be largely attributed to the advent of pre-trained language models such as BERT and RoBERTa. While these models demonstrate remarkable performance on general datasets, they can struggle in specialized domains such as medicine, where unique domain-specific terminologies, domain-specific abbreviations, and varying document structures are common. This paper explores strategies for adapting these models to domain-specific requirements, primarily through continuous pre-training on domain-specific data. We pre-trained several German medical language models on 2.4B tokens derived from translated public English medical data and 3B tokens of German clinical data. The resulting models were evaluated on various German downstream tasks, including named entity recognition (NER), multi-label classification, and extractive question answering. Our results suggest that models augmented by clinical and translation-based pre-training typically outperform general domain models in medical contexts. We conclude that continuous pre-training has demonstrated the ability to match or even exceed the performance of clinical models trained from scratch. Furthermore, pre-training on clinical data or leveraging translated texts have proven to be reliable methods for domain adaptation in medical NLP tasks.
Submitted 8 May, 2024; v1 submitted 8 April, 2024;
originally announced April 2024.
-
Prototype-based Interpretable Breast Cancer Prediction Models: Analysis and Challenges
Authors:
Shreyasi Pathak,
Jörg Schlötterer,
Jeroen Veltman,
Jeroen Geerdink,
Maurice van Keulen,
Christin Seifert
Abstract:
Deep learning models have achieved high performance in medical applications; however, their adoption in clinical practice is hindered due to their black-box nature. Self-explainable models, like prototype-based models, can be especially beneficial as they are interpretable by design. However, if the learnt prototypes are of low quality then the prototype-based models are as good as black-box. Having high quality prototypes is a prerequisite for a truly interpretable model. In this work, we propose a prototype evaluation framework for coherence (PEF-C) for quantitatively evaluating the quality of the prototypes based on domain knowledge. We show the use of PEF-C in the context of breast cancer prediction using mammography. Existing works on prototype-based models on breast cancer prediction using mammography have focused on improving the classification performance of prototype-based models compared to black-box models and have evaluated prototype quality through anecdotal evidence. We are the first to go beyond anecdotal evidence and evaluate the quality of the mammography prototypes systematically using our PEF-C. Specifically, we apply three state-of-the-art prototype-based models, ProtoPNet, BRAIxProtoPNet++ and PIP-Net on mammography images for breast cancer prediction and evaluate these models w.r.t. i) classification performance, and ii) quality of the prototypes, on three public datasets. Our results show that prototype-based models are competitive with black-box models in terms of classification performance, and achieve a higher score in detecting ROIs. However, the quality of the prototypes is not yet sufficient and can be improved in aspects of relevance, purity and learning a variety of prototypes. We call on the XAI community to systematically evaluate the quality of the prototypes to check their true usability in high-stakes decisions and improve such models further.
Submitted 21 April, 2024; v1 submitted 29 March, 2024;
originally announced March 2024.
-
PIPNet3D: Interpretable Detection of Alzheimer in MRI Scans
Authors:
Lisa Anita De Santi,
Jörg Schlötterer,
Michael Scheschenja,
Joel Wessendorf,
Meike Nauta,
Vincenzo Positano,
Christin Seifert
Abstract:
Information from neuroimaging examinations (CT, MRI) is increasingly used to support diagnoses of dementia, e.g., Alzheimer's disease. While current clinical practice is mainly based on visual inspection and feature engineering, Deep Learning approaches can be used to automate the analysis and to discover new image-biomarkers. Part-prototype neural networks (PP-NN) are an alternative to standard blackbox models, and have shown promising results in general computer vision. PP-NNs base their reasoning on prototypical image regions that are learned fully unsupervised, and combined with a simple-to-understand decision layer. We present PIPNet3D, a PP-NN for volumetric images. We apply PIPNet3D to the clinical case study of Alzheimer's Disease diagnosis from structural Magnetic Resonance Imaging (sMRI). We assess the quality of prototypes under a systematic evaluation framework, propose new metrics to evaluate brain prototypes and perform an evaluation with domain experts. Our results show that PIPNet3D is an interpretable, compact model for Alzheimer's diagnosis with its reasoning well aligned to medical domain knowledge. Notably, PIPNet3D achieves the same accuracy as its blackbox counterpart; and removing the remaining clinically irrelevant prototypes from its decision process does not decrease predictive performance.
Submitted 27 March, 2024;
originally announced March 2024.
-
A Second Look on BASS -- Boosting Abstractive Summarization with Unified Semantic Graphs -- A Replication Study
Authors:
Osman Alperen Koraş,
Jörg Schlötterer,
Christin Seifert
Abstract:
We present a detailed replication study of the BASS framework, an abstractive summarization system based on the notion of Unified Semantic Graphs. Our investigation includes challenges in replicating key components and an ablation study to systematically isolate error sources rooted in replicating novel components. Our findings reveal discrepancies in performance compared to the original work. We highlight the significance of paying careful attention even to reasonably omitted details for replicating advanced frameworks like BASS, and emphasize key practices for writing replicable papers.
Submitted 25 March, 2024; v1 submitted 5 March, 2024;
originally announced March 2024.
-
On solitary waves for the Korteweg–de Vries equation on metric star graphs
Authors:
Delio Mugnolo,
Diego Noja,
Christian Seifert
Abstract:
We study the Korteweg–de Vries equation on a metric star graph and investigate existence of solitary waves on the metric graph in terms of the coefficients of the equation on each edge, the coupling condition at the central vertex of the star and the speeds of the travelling wave. We show that, with a continuity condition at the vertex, solitary waves can occur exactly when the parameters are chosen in a fairly special manner. We also consider coupling conditions beyond continuity.
Submitted 13 February, 2024;
originally announced February 2024.
-
The Queen of England is not England's Queen: On the Lack of Factual Coherency in PLMs
Authors:
Paul Youssef,
Jörg Schlötterer,
Christin Seifert
Abstract:
Factual knowledge encoded in Pre-trained Language Models (PLMs) enriches their representations and justifies their use as knowledge bases. Previous work has focused on probing PLMs for factual knowledge by measuring how often they can correctly predict an object entity given a subject and a relation, and improving fact retrieval by optimizing the prompts used for querying PLMs. In this work, we consider a complementary aspect, namely the coherency of factual knowledge in PLMs, i.e., how often PLMs can predict the subject entity given their initial prediction of the object entity. This goes beyond evaluating how much PLMs know, and focuses on the internal state of knowledge inside them. Our results indicate that PLMs have low coherency using manually written, optimized and paraphrased prompts, but including an evidence paragraph leads to substantial improvement. This shows that PLMs fail to model inverse relations and need further enhancements to be able to handle retrieving facts from their parameters in a coherent manner, and to be considered as knowledge bases.
Submitted 2 February, 2024;
originally announced February 2024.
-
InfoLossQA: Characterizing and Recovering Information Loss in Text Simplification
Authors:
Jan Trienes,
Sebastian Joseph,
Jörg Schlötterer,
Christin Seifert,
Kyle Lo,
Wei Xu,
Byron C. Wallace,
Junyi Jessy Li
Abstract:
Text simplification aims to make technical texts more accessible to laypeople but often results in deletion of information and vagueness. This work proposes InfoLossQA, a framework to characterize and recover simplification-induced information loss in the form of question-and-answer (QA) pairs. Building on the theory of Question Under Discussion, the QA pairs are designed to help readers deepen their knowledge of a text. We conduct a range of experiments with this framework. First, we collect a dataset of 1,000 linguist-curated QA pairs derived from 104 LLM simplifications of scientific abstracts of medical studies. Our analyses of this data reveal that information loss occurs frequently, and that the QA pairs give a high-level overview of what information was lost. Second, we devise two methods for this task: end-to-end prompting of open-source and commercial language models, and a natural language inference pipeline. With a novel evaluation framework considering the correctness of QA pairs and their linguistic suitability, our expert evaluation reveals that models struggle to reliably identify information loss and to apply the same standards as humans for what constitutes information loss.
Submitted 4 June, 2024; v1 submitted 29 January, 2024;
originally announced January 2024.
-
Explainable Bayesian Optimization
Authors:
Tanmay Chakraborty,
Christin Seifert,
Christian Wirth
Abstract:
In industry, Bayesian optimization (BO) is widely applied in the human-AI collaborative parameter tuning of cyber-physical systems. However, BO's solutions may deviate from human experts' actual goal due to approximation errors and simplified objectives, requiring subsequent tuning. The black-box nature of BO limits the collaborative tuning process because the expert does not trust the BO recommendations. Current explainable AI (XAI) methods are not tailored for optimization and thus fall short of addressing this gap. To bridge this gap, we propose TNTRules (TUNE-NOTUNE Rules), a post-hoc, rule-based explainability method that produces high quality explanations through multiobjective optimization. Our evaluation of benchmark optimization problems and real-world hyperparameter optimization tasks demonstrates TNTRules' superiority over state-of-the-art XAI methods in generating high quality explanations. This work contributes to the intersection of BO and XAI, providing interpretable optimization techniques for real-world applications.
Submitted 24 January, 2024;
originally announced January 2024.
-
Trust your BMS: Designing a Lightweight Authentication Architecture for Industrial Networks
Authors:
Fikret Basic,
Christian Steger,
Christian Seifert,
Robert Kofler
Abstract:
With the advent of clean energy awareness and systems that rely on extensive battery usage, the community has seen an increased interest in the development of more complex and secure Battery Management Systems (BMS). In particular, the inclusion of BMS in modern complex systems like electric vehicles and power grids has presented a new set of security-related challenges. A concern is shown when BMS are intended to extend their communication with external system networks, as their interaction can leave many backdoors open that potential attackers could exploit. Hence, it is highly desirable to find a general design that can be used for BMS and its system inclusion. In this work, a security architecture solution is proposed intended for the communication between BMS and other system devices. The aim of the proposed architecture is to be easily applicable in different industrial settings and systems, while at the same time keeping the design lightweight in nature.
Submitted 9 November, 2023;
originally announced November 2023.
-
Feature Attribution Explanations for Spiking Neural Networks
Authors:
Elisa Nguyen,
Meike Nauta,
Gwenn Englebienne,
Christin Seifert
Abstract:
Third-generation artificial neural networks, Spiking Neural Networks (SNNs), can be efficiently implemented on hardware. Their implementation on neuromorphic chips opens a broad range of applications, such as machine learning-based autonomous control and intelligent biomedical devices. In critical applications, however, insight into the reasoning of SNNs is important, thus SNNs need to be equipped with the ability to explain how decisions are reached. We present Temporal Spike Attribution (TSA), a local explanation method for SNNs. To compute the explanation, we aggregate all information available in model-internal variables: spike times and model weights. We evaluate TSA on artificial and real-world time series data and measure explanation quality w.r.t. multiple quantitative criteria. We find that TSA correctly identifies a small subset of input features relevant to the decision (i.e., is output-complete and compact) and generates similar explanations for similar inputs (i.e., is continuous). Further, our experiments show that incorporating the notion of absent spikes improves explanation quality. Our work can serve as a starting point for explainable SNNs, with future implementations on hardware yielding not only predictions but also explanations in a broad range of application scenarios. Source code is available at https://github.com/ElisaNguyen/tsa-explanations.
Submitted 2 November, 2023;
originally announced November 2023.
-
Give Me the Facts! A Survey on Factual Knowledge Probing in Pre-trained Language Models
Authors:
Paul Youssef,
Osman Alperen Koraş,
Meijie Li,
Jörg Schlötterer,
Christin Seifert
Abstract:
Pre-trained Language Models (PLMs) are trained on vast unlabeled data, rich in world knowledge. This fact has sparked the interest of the community in quantifying the amount of factual knowledge present in PLMs, as this explains their performance on downstream tasks, and potentially justifies their use as knowledge bases. In this work, we survey methods and datasets that are used to probe PLMs for factual knowledge. Our contributions are: (1) We propose a categorization scheme for factual probing methods that is based on how their inputs, outputs and the probed PLMs are adapted; (2) We provide an overview of the datasets used for factual probing; (3) We synthesize insights about knowledge retention and prompt optimization in PLMs, analyze obstacles to adopting PLMs as knowledge bases and outline directions for future work.
Submitted 4 December, 2023; v1 submitted 25 October, 2023;
originally announced October 2023.
-
Weakly Supervised Learning for Breast Cancer Prediction on Mammograms in Realistic Settings
Authors:
Shreyasi Pathak,
Jörg Schlötterer,
Jeroen Geerdink,
Onno Dirk Vijlbrief,
Maurice van Keulen,
Christin Seifert
Abstract:
Automatic methods for early detection of breast cancer on mammography can significantly decrease mortality. Broad uptake of those methods in hospitals is currently hindered because the methods have too many constraints. They assume annotations available for single images or even regions-of-interest (ROIs), and a fixed number of images per patient. Neither assumption holds in a general hospital setting. Relaxing those assumptions results in a weakly supervised learning setting, where labels are available per case, but not for individual images or ROIs. Not all images taken for a patient contain malignant regions and the malignant ROIs cover only a tiny part of an image, whereas most image regions represent benign tissue. In this work, we investigate a two-level multi-instance learning (MIL) approach for case-level breast cancer prediction on two public datasets (1.6k and 5k cases) and an in-house dataset of 21k cases. Observing that breast cancer is usually only present on one side, while images of both breasts are taken as a precaution, we propose a domain-specific MIL pooling variant. We show that two-level MIL can be applied in realistic clinical settings where only case labels, and a variable number of images per patient are available. Data in realistic settings scales with continuous patient intake, while manual annotation efforts do not. Hence, research should focus in particular on unsupervised ROI extraction, in order to improve breast cancer prediction for all patients.
Submitted 19 October, 2023;
originally announced October 2023.
-
Bridging the Gulf of Envisioning: Cognitive Design Challenges in LLM Interfaces
Authors:
Hariharan Subramonyam,
Roy Pea,
Christopher Lawrence Pondoc,
Maneesh Agrawala,
Colleen Seifert
Abstract:
Large language models (LLMs) exhibit dynamic capabilities and appear to comprehend complex and ambiguous natural language prompts. However, calibrating LLM interactions is challenging for interface designers and end-users alike. A central issue is our limited grasp of how human cognitive processes begin with a goal and form intentions for executing actions, a blindspot even in established interaction models such as Norman's gulfs of execution and evaluation. To address this gap, we theorize how end-users 'envision' translating their goals into clear intentions and craft prompts to obtain the desired LLM response. We define a process of Envisioning by highlighting three misalignments: (1) knowing whether LLMs can accomplish the task, (2) how to instruct the LLM to do the task, and (3) how to evaluate the success of the LLM's output in meeting the goal. Finally, we make recommendations to narrow the envisioning gulf in human-LLM interactions.
Submitted 18 March, 2024; v1 submitted 25 September, 2023;
originally announced September 2023.
-
Know What Not To Know: Users' Perception of Abstaining Classifiers
Authors:
Andrea Papenmeier,
Daniel Hienert,
Yvonne Kammerer,
Christin Seifert,
Dagmar Kern
Abstract:
Machine learning systems can help humans to make decisions by providing decision suggestions (i.e., a label for a datapoint). However, individual datapoints do not always provide enough clear evidence to make confident suggestions. Although methods exist that enable systems to identify those datapoints and subsequently abstain from suggesting a label, it remains unclear how users would react to such system behavior. This paper presents first findings from a user study on systems that do or do not abstain from labeling ambiguous datapoints. Our results show that label suggestions on ambiguous datapoints bear a high risk of unconsciously influencing the users' decisions, even toward incorrect ones. Furthermore, participants perceived a system that abstains from labeling uncertain datapoints as equally competent and trustworthy as a system that delivers label suggestions for all datapoints. Consequently, if abstaining does not impair a system's credibility, it can be a useful mechanism to increase decision quality.
Submitted 11 September, 2023;
originally announced September 2023.
-
Is Last Layer Re-Training Truly Sufficient for Robustness to Spurious Correlations?
Authors:
Phuong Quynh Le,
Jörg Schlötterer,
Christin Seifert
Abstract:
Models trained with empirical risk minimization (ERM) are known to learn to rely on spurious features, i.e., their prediction is based on undesired auxiliary features which are strongly correlated with class labels but lack causal reasoning. This behavior particularly degrades accuracy in groups of samples of the correlated class that are missing the spurious feature or samples of the opposite class but with the spurious feature present. The recently proposed Deep Feature Reweighting (DFR) method improves accuracy of these worst groups. Based on the main argument that ERM models can learn core features sufficiently well, DFR only needs to retrain the last layer of the classification model with a small group-balanced data set. In this work, we examine the applicability of DFR to realistic data in the medical domain. Furthermore, we investigate the reasoning behind the effectiveness of last-layer retraining and show that even though DFR has the potential to improve the accuracy of the worst group, it remains susceptible to spurious correlations.
Submitted 9 January, 2024; v1 submitted 1 August, 2023;
originally announced August 2023.
-
The Co-12 Recipe for Evaluating Interpretable Part-Prototype Image Classifiers
Authors:
Meike Nauta,
Christin Seifert
Abstract:
Interpretable part-prototype models are computer vision models that are explainable by design. The models learn prototypical parts and recognise these components in an image, thereby combining classification and explanation. Despite the recent attention for intrinsically interpretable models, there is no comprehensive overview on evaluating the explanation quality of interpretable part-prototype models. Based on the Co-12 properties for explanation quality as introduced in arXiv:2201.08164 (e.g., correctness, completeness, compactness), we review existing work that evaluates part-prototype models, reveal research gaps and outline future approaches for evaluation of the explanation quality of part-prototype models. This paper, therefore, contributes to the progression and maturity of this relatively new research field on interpretable part-prototype models. We additionally provide a "Co-12 cheat sheet" that acts as a concise summary of our findings on evaluating part-prototype models.
Submitted 26 July, 2023;
originally announced July 2023.
-
Guidance in Radiology Report Summarization: An Empirical Evaluation and Error Analysis
Authors:
Jan Trienes,
Paul Youssef,
Jörg Schlötterer,
Christin Seifert
Abstract:
Automatically summarizing radiology reports into a concise impression can reduce the manual burden of clinicians and improve the consistency of reporting. Previous work aimed to enhance content selection and factuality through guided abstractive summarization. However, two key issues persist. First, current methods heavily rely on domain-specific resources to extract the guidance signal, limiting their transferability to domains and languages where those resources are unavailable. Second, while automatic metrics like ROUGE show progress, we lack a good understanding of the errors and failure modes in this task. To bridge these gaps, we first propose a domain-agnostic guidance signal in form of variable-length extractive summaries. Our empirical results on two English benchmarks demonstrate that this guidance signal improves upon unguided summarization while being competitive with domain-specific methods. Additionally, we run an expert evaluation of four systems according to a taxonomy of 11 fine-grained errors. We find that the most pressing differences between automatic summaries and those of radiologists relate to content selection including omissions (up to 52%) and additions (up to 57%). We hypothesize that latent reporting factors and corpus-level inconsistencies may limit models to reliably learn content selection from the available data, presenting promising directions for future work.
Submitted 24 July, 2023;
originally announced July 2023.
-
Interpreting and Correcting Medical Image Classification with PIP-Net
Authors:
Meike Nauta,
Johannes H. Hegeman,
Jeroen Geerdink,
Jörg Schlötterer,
Maurice van Keulen,
Christin Seifert
Abstract:
Part-prototype models are explainable-by-design image classifiers, and a promising alternative to black box AI. This paper explores the applicability and potential of interpretable machine learning, in particular PIP-Net, for automated diagnosis support on real-world medical imaging data. PIP-Net learns human-understandable prototypical image parts and we evaluate its accuracy and interpretability for fracture detection and skin cancer diagnosis. We find that PIP-Net's decision making process is in line with medical classification standards, while only provided with image-level class labels. Because of PIP-Net's unsupervised pretraining of prototypes, data quality problems such as undesired text in an X-ray or labelling errors can be easily identified. Additionally, we are the first to show that humans can manually correct the reasoning of PIP-Net by directly disabling undesired prototypes. We conclude that part-prototype models are promising for medical applications due to their interpretability and potential for advanced model debugging.
Submitted 11 September, 2023; v1 submitted 19 July, 2023;
originally announced July 2023.
-
Spectral Theory for Schrödinger operators on compact metric graphs with $δ$ and $δ'$ couplings: a survey
Authors:
Jonathan Rohleder,
Christian Seifert
Abstract:
Spectral properties of Schrödinger operators on compact metric graphs are studied and special emphasis is put on differences in the spectral behavior between different classes of vertex conditions. We survey recent results especially for $δ$ and $δ'$ couplings and demonstrate the spectral properties on many examples. Amongst other things, properties of the ground state eigenvalue and eigenfunction and the spectral behavior under various perturbations of the metric graph or the vertex conditions are considered.
Submitted 3 July, 2023; v1 submitted 3 March, 2023;
originally announced March 2023.
-
Perturbations of non-autonomous second-order abstract Cauchy problems
Authors:
Christian Budde,
Christian Seifert
Abstract:
In this paper we present time-dependent perturbations of second-order non-autonomous abstract Cauchy problems associated to a family of operators with constant domain. We make use of the equivalence to a first-order non-autonomous abstract Cauchy problem in a product space, which we elaborate in full detail. As an application we provide a perturbed non-autonomous wave equation.
Submitted 22 February, 2023;
originally announced February 2023.
-
How Accurate Does It Feel? -- Human Perception of Different Types of Classification Mistakes
Authors:
Andrea Papenmeier,
Dagmar Kern,
Daniel Hienert,
Yvonne Kammerer,
Christin Seifert
Abstract:
Supervised machine learning utilizes large datasets, often with ground truth labels annotated by humans. While some data points are easy to classify, others are hard to classify, which reduces the inter-annotator agreement. This causes noise for the classifier and might affect the user's perception of the classifier's performance. In our research, we investigated whether the classification difficulty of a data point influences how strongly a prediction mistake reduces the "perceived accuracy". In an experimental online study, 225 participants interacted with three fictive classifiers with equal accuracy (73%). The classifiers made prediction mistakes on three different types of data points (easy, difficult, impossible). After the interaction, participants judged the classifier's accuracy. We found that not all prediction mistakes reduced the perceived accuracy equally. Furthermore, the perceived accuracy differed significantly from the calculated accuracy. To conclude, accuracy and related measures seem unsuitable to represent how users perceive the performance of classifiers.
Submitted 13 February, 2023;
originally announced February 2023.
-
Explaining Machine Learning Models in Natural Conversations: Towards a Conversational XAI Agent
Authors:
Van Bach Nguyen,
Jörg Schlötterer,
Christin Seifert
Abstract:
The goal of Explainable AI (XAI) is to design methods to provide insights into the reasoning process of black-box models, such as deep neural networks, in order to explain them to humans. Social science research states that such explanations should be conversational, similar to human-to-human explanations. In this work, we show how to incorporate XAI in a conversational agent, using a standard design for the agent comprising natural language understanding and generation components. We build upon an XAI question bank which we extend by quality-controlled paraphrases to understand the user's information needs. We further systematically survey the literature for suitable explanation methods that provide the information to answer those questions, and present a comprehensive list of suggestions. Our work is the first step towards truly natural conversations about machine learning models with an explanation agent. The comprehensive list of XAI questions and the corresponding explanation methods may support other researchers in providing the necessary information to address users' demands.
Submitted 28 August, 2023; v1 submitted 6 September, 2022;
originally announced September 2022.
-
Human-AI Guidelines in Practice: Leaky Abstractions as an Enabler in Collaborative Software Teams
Authors:
Hariharan Subramonyam,
Jane Im,
Colleen Seifert,
Eytan Adar
Abstract:
In conventional software development, user experience (UX) designers and engineers collaborate through separation of concerns (SoC): designers create human interface specifications, and engineers build to those specifications. However, we argue that Human-AI systems thwart SoC because human needs must shape the design of the AI interface, the underlying AI sub-components, and training data. How do designers and engineers currently collaborate on AI and UX design? To find out, we interviewed 21 industry professionals (UX researchers, AI engineers, data scientists, and managers) across 14 organizations about their collaborative work practices and associated challenges. We find that hidden information encapsulated by SoC challenges collaboration across design and engineering concerns. Practitioners describe inventing ad-hoc representations exposing low-level design and implementation details (which we characterize as leaky abstractions) to "puncture" SoC and share information across expertise boundaries. We identify how leaky abstractions are employed to collaborate at the AI-UX boundary and formalize a process of creating and using leaky abstractions.
Submitted 4 July, 2022;
originally announced July 2022.
-
A note on the Lumer–Phillips theorem for bi-continuous semigroups
Authors:
Karsten Kruse,
Christian Seifert
Abstract:
Given a Banach space $X$ and an additional coarser Hausdorff locally convex topology $τ$ on $X$ we characterise the generators of $τ$-bi-continuous semigroups in the spirit of the Lumer–Phillips theorem, i.e. by means of dissipativity w.r.t. a directed system of seminorms and a range condition.
Submitted 8 November, 2022; v1 submitted 2 June, 2022;
originally announced June 2022.
-
Final state observability estimates and cost-uniform approximate null-controllability for bi-continuous semigroups
Authors:
Karsten Kruse,
Christian Seifert
Abstract:
We consider final state observability estimates for bi-continuous semigroups on Banach spaces, i.e. for every initial value, estimating the state at a final time $T>0$ by taking into account the orbit of the initial value under the semigroup for $t\in [0,T]$, measured in a suitable norm. We state a sufficient criterion based on an uncertainty relation and a dissipation estimate and provide two examples of bi-continuous semigroups which share a final state observability estimate, namely the Gauss-Weierstrass semigroup and the Ornstein-Uhlenbeck semigroup on the space of bounded continuous functions. Moreover, we generalise the duality between cost-uniform approximate null-controllability and final state observability estimates to the setting of locally convex spaces for the case of bounded and continuous control functions, which seems to be new even for the Banach spaces case.
Submitted 31 March, 2023; v1 submitted 1 June, 2022;
originally announced June 2022.
-
Survey on Automated Short Answer Grading with Deep Learning: from Word Embeddings to Transformers
Authors:
Stefan Haller,
Adina Aldea,
Christin Seifert,
Nicola Strisciuglio
Abstract:
Automated short answer grading (ASAG) has gained attention in education as a means to scale educational tasks to the growing number of students. Recent progress in Natural Language Processing and Machine Learning has largely influenced the field of ASAG, of which we survey the recent research advancements. We complement previous surveys by providing a comprehensive analysis of recently published methods that deploy deep learning approaches. In particular, we focus our analysis on the transition from hand-engineered features to representation learning approaches, which learn representative features for the task at hand automatically from large corpora of data. We structure our analysis of deep learning methods along three categories: word embeddings, sequential models, and attention-based methods. Deep learning impacted ASAG differently than other fields of NLP, as we noticed that the learned representations alone do not achieve the best results, but rather work in a complementary way with hand-engineered features. The best performance is indeed achieved by methods that combine the carefully hand-engineered features with the power of the semantic descriptions provided by the latest models, like transformer architectures. We identify challenges and provide an outlook on research directions that can be addressed in the future.
Submitted 11 March, 2022;
originally announced April 2022.
-
Observability for Non-autonomous Systems
Authors:
Clemens Bombach,
Fabian Gabel,
Christian Seifert,
Martin Tautenhahn
Abstract:
We study non-autonomous observation systems \begin{align*}
\dot{x}(t) = A(t) x(t),\quad y(t) = C(t) x(t),\quad x(0) = x_0\in X, \end{align*} where $(A(t))$ is a strongly measurable family of closed operators on a Banach space $X$ and $(C(t))$ is a family of bounded observation operators from $X$ to a Banach space $Y$. Based on an abstract uncertainty principle and a dissipation estimate, we prove that the observation system satisfies a final-state observability estimate in $\mathrm{L}^r(E; Y)$ for measurable subsets $E \subseteq [0,T], T > 0$. We present applications of the above result to families $(A(t))$ of uniformly strongly elliptic differential operators as well as non-autonomous Ornstein-Uhlenbeck operators $P(t)$ on $\mathrm{L}^p(\mathbb{R}^d)$ with observation operators $C(t)u = u|_{Ω(t)}$. In the setting of non-autonomous strongly elliptic operators, we derive necessary and sufficient geometric conditions on the family of sets $(Ω(t))$ such that the corresponding observation system satisfies a final-state observability estimate.
Submitted 27 February, 2023; v1 submitted 16 March, 2022;
originally announced March 2022.
-
Final State Observability in Banach spaces with applications to Subordination and Semigroups induced by Lévy processes
Authors:
Dennis Gallaun,
Jan Meichsner,
Christian Seifert
Abstract:
This paper generalizes the abstract method of proving an observability estimate by combining an uncertainty principle and a dissipation estimate. In these estimates we allow for a large class of growth/decay rates satisfying an integrability condition. In contrast to previous results, we use an iterative argument which enables us to give an asymptotically sharp estimate for the observation constant and which is explicit in the model parameters. We give two types of applications where the extension of the growth/decay rates naturally appears. By exploiting subordination techniques we show how the dissipation estimate of a semigroup transfers to subordinated semigroups. Furthermore, we apply our results to semigroups related to Lévy processes.
Submitted 3 January, 2023; v1 submitted 11 February, 2022;
originally announced February 2022.
-
From Anecdotal Evidence to Quantitative Evaluation Methods: A Systematic Review on Evaluating Explainable AI
Authors:
Meike Nauta,
Jan Trienes,
Shreyasi Pathak,
Elisa Nguyen,
Michelle Peters,
Yasmin Schmitt,
Jörg Schlötterer,
Maurice van Keulen,
Christin Seifert
Abstract:
The rising popularity of explainable artificial intelligence (XAI) to understand high-performing black boxes raised the question of how to evaluate explanations of machine learning (ML) models. While interpretability and explainability are often presented as a subjectively validated binary property, we consider it a multi-faceted concept. We identify 12 conceptual properties, such as Compactness and Correctness, that should be evaluated for comprehensively assessing the quality of an explanation. Our so-called Co-12 properties serve as categorization scheme for systematically reviewing the evaluation practices of more than 300 papers published in the last 7 years at major AI and ML conferences that introduce an XAI method. We find that 1 in 3 papers evaluate exclusively with anecdotal evidence, and 1 in 5 papers evaluate with users. This survey also contributes to the call for objective, quantifiable evaluation methods by presenting an extensive overview of quantitative XAI evaluation methods. Our systematic collection of evaluation methods provides researchers and practitioners with concrete tools to thoroughly validate, benchmark and compare new and existing XAI methods. The Co-12 categorization scheme and our identified evaluation methods open up opportunities to include quantitative metrics as optimization criteria during model training in order to optimize for accuracy and interpretability simultaneously.
Submitted 24 February, 2023; v1 submitted 20 January, 2022;
originally announced January 2022.
-
Towards a trustworthy, secure and reliable enclave for machine learning in a hospital setting: The Essen Medical Computing Platform (EMCP)
Authors:
Hendrik F. R. Schmidt,
Jörg Schlötterer,
Marcel Bargull,
Enrico Nasca,
Ryan Aydelott,
Christin Seifert,
Folker Meyer
Abstract:
AI/Computing at scale is a difficult problem, especially in a health care setting. We outline the requirements, planning and implementation choices as well as the guiding principles that led to the implementation of our secure research computing enclave, the Essen Medical Computing Platform (EMCP), affiliated with a major German hospital. Compliance, data privacy and usability were the immutable requirements of the system. We will discuss the features of our computing enclave and we will provide our recipe for groups wishing to adopt a similar setup.
Submitted 13 January, 2022;
originally announced January 2022.
-
Living-Off-The-Land Command Detection Using Active Learning
Authors:
Talha Ongun,
Jack W. Stokes,
Jonathan Bar Or,
Ke Tian,
Farid Tajaddodianfar,
Joshua Neil,
Christian Seifert,
Alina Oprea,
John C. Platt
Abstract:
In recent years, enterprises have been targeted by advanced adversaries who leverage creative ways to infiltrate their systems and move laterally to gain access to critical data. One increasingly common evasive method is to hide the malicious activity behind a benign program by using tools that are already installed on user computers. These programs are usually part of the operating system distribution or another user-installed binary, therefore this type of attack is called "Living-Off-The-Land". Detecting these attacks is challenging, as adversaries may not create malicious files on the victim computers and anti-virus scans fail to detect them. We propose the design of an Active Learning framework called LOLAL for detecting Living-Off-the-Land attacks that iteratively selects a set of uncertain and anomalous samples for labeling by a human analyst. LOLAL is specifically designed to work well when a limited number of labeled samples are available for training machine learning models to detect attacks. We investigate methods to represent command-line text using word-embedding techniques, and design ensemble boosting classifiers to distinguish malicious and benign samples based on the embedding representation. We leverage a large, anonymized dataset collected by an endpoint security product and demonstrate that our ensemble classifiers achieve an average F1 score of 0.96 at classifying different attack classes. We show that our active learning method consistently improves the classifier performance, as more training data is labeled, and converges in less than 30 iterations when starting with a small number of labeled instances.
Submitted 29 November, 2021;
originally announced November 2021.
-
Non-autonomous Desch-Schappacher perturbations
Authors:
Christian Budde,
Christian Seifert
Abstract:
We consider time-dependent Desch-Schappacher perturbations of non-autonomous abstract Cauchy problems and apply our result to non-autonomous uniformly strongly elliptic differential operators on $\mathrm{L}^p$-spaces.
Submitted 29 June, 2022; v1 submitted 1 September, 2021;
originally announced September 2021.
-
Sufficient criteria for stabilization properties in Banach spaces
Authors:
Michela Egidi,
Dennis Gallaun,
Christian Seifert,
Martin Tautenhahn
Abstract:
We study abstract sufficient criteria for open-loop stabilizability of linear control systems in a Banach space with a bounded control operator, which build up and generalize a sufficient condition for null-controllability in Banach spaces given by an uncertainty principle and a dissipation estimate. For stabilizability these estimates are only needed for a single spectral parameter and, in particular, their constants do not depend on the growth rate w.r.t. this parameter. Our result unifies and generalizes earlier results obtained in the context of Hilbert spaces. As an application we consider fractional powers of elliptic differential operators with constant coefficients in $L_p(\mathbb{R}^d)$ for $p\in [1,\infty)$ and thick control sets.
Submitted 20 August, 2021;
originally announced August 2021.
-
Towards A Process Model for Co-Creating AI Experiences
Authors:
Hariharan Subramonyam,
Colleen Seifert,
Eytan Adar
Abstract:
Thinking of technology as a design material is appealing. It encourages designers to explore the material's properties to understand its capabilities and limitations, a prerequisite to generative design thinking. However, as a material, AI resists this approach because its properties emerge as part of the design process itself. Therefore, designers and AI engineers must collaborate in new ways to create both the material and its application experience. We investigate the co-creation process through a design study with 10 pairs of designers and engineers. We find that design 'probes' with user data are a useful tool in defining AI materials. Through data probes, designers construct designerly representations of the envisioned AI experience (AIX) to identify desirable AI characteristics. Data probes facilitate divergent thinking, material testing, and design validation. Based on our findings, we propose a process model for co-creating AIX and offer design considerations for incorporating data probes in design tools.
Submitted 6 May, 2021; v1 submitted 15 April, 2021;
originally announced April 2021.
-
Neural Prototype Trees for Interpretable Fine-grained Image Recognition
Authors:
Meike Nauta,
Ron van Bree,
Christin Seifert
Abstract:
Prototype-based methods use interpretable representations to address the black-box nature of deep learning models, in contrast to post-hoc explanation methods that only approximate such models. We propose the Neural Prototype Tree (ProtoTree), an intrinsically interpretable deep learning method for fine-grained image recognition. ProtoTree combines prototype learning with decision trees, and thus results in a globally interpretable model by design. Additionally, ProtoTree can locally explain a single prediction by outlining a decision path through the tree. Each node in our binary tree contains a trainable prototypical part. The presence or absence of this learned prototype in an image determines the routing through a node. Decision making is therefore similar to human reasoning: Does the bird have a red throat? And an elongated beak? Then it's a hummingbird! We tune the accuracy-interpretability trade-off using ensemble methods, pruning and binarizing. We apply pruning without sacrificing accuracy, resulting in a small tree with only 8 learned prototypes along a path to classify a bird from 200 species. An ensemble of 5 ProtoTrees achieves competitive accuracy on the CUB-200-2011 and Stanford Cars data sets. Code is available at https://github.com/M-Nauta/ProtoTree
Submitted 15 April, 2021; v1 submitted 3 December, 2020;
originally announced December 2020.
-
This Looks Like That, Because ... Explaining Prototypes for Interpretable Image Recognition
Authors:
Meike Nauta,
Annemarie Jutte,
Jesper Provoost,
Christin Seifert
Abstract:
Image recognition with prototypes is considered an interpretable alternative for black box deep learning models. Classification depends on the extent to which a test image "looks like" a prototype. However, perceptual similarity for humans can be different from the similarity learned by the classification model. Hence, only visualising prototypes can be insufficient for a user to understand what a prototype exactly represents, and why the model considers a prototype and an image to be similar. We address this ambiguity and argue that prototypes should be explained. We improve interpretability by automatically enhancing visual prototypes with textual quantitative information about visual characteristics deemed important by the classification model. Specifically, our method clarifies the meaning of a prototype by quantifying the influence of colour hue, shape, texture, contrast and saturation and can generate both global and local explanations. Because of the generality of our approach, it can improve the interpretability of any similarity-based method for prototypical image recognition. In our experiments, we apply our method to the existing Prototypical Part Network (ProtoPNet). Our analysis confirms that the global explanations are generalisable, and often correspond to the visually perceptible properties of a prototype. Our explanations are especially relevant for prototypes which might have been interpreted incorrectly otherwise. By explaining such 'misleading' prototypes, we improve the interpretability and simulatability of a prototype-based classification model. We also use our method to check whether visually similar prototypes have similar explanations, and are able to discover redundancy. Code is available at https://github.com/M-Nauta/Explaining_Prototypes .
Submitted 31 March, 2021; v1 submitted 5 November, 2020;
originally announced November 2020.
-
Improvement of Brain MRI at 7T Using an Inductively Coupled RF Resonator Array
Authors:
Akbar Alipour,
Alan C Seifert,
Bradley Delman,
Gregor Adriany,
Priti Balchandani
Abstract:
It is well known that magnetic resonance imaging (MRI) at 7 Tesla (7T) and above can provide much better signal sensitivity compared with lower field strengths. However, the variety of commercially available ultra-high-field MRI coils is still limited, due to the technical challenges associated with the wavelength effect, such as flip angle inhomogeneity and asymmetric transmit and receive RF field patterns. We aimed to develop a passive RF shimming technique using an inductively coupled RF resonator array to improve brain MRI at 7T, focusing on the cerebellum. Method: An inductively coupled RF resonator array was designed for placement inside the commercial head coil to enhance the transmit field homogeneity and to improve the receive signal sensitivity. Each element of the array is a coupled-split-ring resonator (CSRR); the elements are decoupled from each other using the critical overlap technique. Phantom and ex-vivo MRI experiments were performed to evaluate the transmit efficiency and signal sensitivity in the presence of the array. Results: MRI experiments showed a 2- to 4-fold improvement in signal-to-noise ratio (SNR) and a 2-fold improvement in contrast-to-noise ratio (CNR) in the cerebellum. Conclusion: We modeled a wireless RF resonator array that can improve the transmit efficiency of the standard head coil and enhance the signal sensitivity in brain MRI without compromising RF safety.
Submitted 3 June, 2020;
originally announced June 2020.
-
Observability and null-controllability for parabolic equations in $L_p$-spaces
Authors:
Clemens Bombach,
Dennis Gallaun,
Christian Seifert,
Martin Tautenhahn
Abstract:
We study (approximate) null-controllability of parabolic equations in $L_p(\mathbb{R}^d)$ and provide explicit bounds on the control cost. In particular we consider systems of the form $\dot{x}(t) = -A_p x(t) + \mathbf{1}_E u(t)$, $x(0) = x_0\in L_p (\mathbb{R}^d)$, with interior control on a so-called thick set $E \subset \mathbb{R}^d$, where $p\in [1,\infty)$, and where $A$ is an elliptic operator of order $m \in \mathbb{N}$ in $L_p(\mathbb{R}^d)$. We prove null-controllability of this system via duality and a sufficient condition for observability. This condition is given by an uncertainty principle and a dissipation estimate. Our result unifies and generalizes earlier results obtained in the context of Hilbert and Banach spaces. In particular, our result applies to the case $p=1$.
Submitted 28 October, 2022; v1 submitted 29 May, 2020;
originally announced May 2020.
-
Automated Retrieval of ATT&CK Tactics and Techniques for Cyber Threat Reports
Authors:
Valentine Legoy,
Marco Caselli,
Christin Seifert,
Andreas Peter
Abstract:
Over the last years, threat intelligence sharing has steadily grown, leading cybersecurity professionals to access increasingly larger amounts of heterogeneous data. Among those, cyber attacks' Tactics, Techniques and Procedures (TTPs) have proven to be particularly valuable to characterize threat actors' behaviors and, thus, improve defensive countermeasures. Unfortunately, this information is often hidden within human-readable textual reports and must be extracted manually. In this paper, we evaluate several classification approaches to automatically retrieve TTPs from unstructured text. To implement these approaches, we take advantage of the MITRE ATT&CK framework, an open knowledge base of adversarial tactics and techniques, to train classifiers and label results. Finally, we present rcATT, a tool built on top of our findings and freely distributed to the security community to support cyber threat report automated analysis.
Submitted 29 April, 2020;
originally announced April 2020.
-
Evolutionary Equations
Authors:
Christian Seifert,
Sascha Trostorff,
Marcus Waurick
Abstract:
This is the final version of the lecture notes of the 23rd Internet Seminar on Evolutionary Equations, see also https://www.mat.tuhh.de/isem23/.
Submitted 27 March, 2020;
originally announced March 2020.
-
Comparing Rule-based, Feature-based and Deep Neural Methods for De-identification of Dutch Medical Records
Authors:
Jan Trienes,
Dolf Trieschnigg,
Christin Seifert,
Djoerd Hiemstra
Abstract:
Unstructured information in electronic health records provides an invaluable resource for medical research. To protect the confidentiality of patients and to conform to privacy regulations, de-identification methods automatically remove personally identifying information from these medical records. However, due to the unavailability of labeled data, most existing research is constrained to English medical text and little is known about the generalizability of de-identification methods across languages and domains. In this study, we construct a varied dataset consisting of the medical records of 1260 patients by sampling data from 9 institutes and three domains of Dutch healthcare. We test the generalizability of three de-identification methods across languages and domains. Our experiments show that an existing rule-based method specifically developed for the Dutch language fails to generalize to this new data. Furthermore, a state-of-the-art neural architecture performs strongly across languages and domains, even with limited training data. Compared to feature-based and rule-based methods, the neural method requires significantly less configuration effort and domain knowledge. We make all code and pre-trained de-identification models available to the research community, allowing practitioners to apply them to their datasets and enabling future benchmarks.
Submitted 16 January, 2020;
originally announced January 2020.
-
Limit operators techniques on general metric measure spaces of bounded geometry
Authors:
Raffael Hagger,
Christian Seifert
Abstract:
We study band-dominated operators on (subspaces of) $L_p$-spaces over metric measure spaces of bounded geometry satisfying an additional property. We single out core assumptions to obtain, in an abstract setting, definitions of limit operators, characterizations of compactness and Fredholmness using limit operators; and thus also spectral consequences. In this way, we recover and unify the classical and recent results on limit operator techniques, but also gain new insights and are able to treat further applications.
Submitted 22 April, 2020; v1 submitted 6 August, 2019;
originally announced August 2019.
-
How model accuracy and explanation fidelity influence user trust
Authors:
Andrea Papenmeier,
Gwenn Englebienne,
Christin Seifert
Abstract:
Machine learning systems have become popular in fields such as marketing, financing, or data mining. While they are highly accurate, complex machine learning systems pose challenges for engineers and users. Their inherent complexity makes it impossible to easily judge their fairness and the correctness of statistically learned relations between variables and classes. Explainable AI aims to solve this challenge by modelling explanations alongside the classifiers, potentially improving user trust and acceptance. However, users should not be fooled by persuasive, yet untruthful explanations. We therefore conduct a user study in which we investigate the effects of model accuracy and explanation fidelity, i.e. how truthfully the explanation represents the underlying model, on user trust. Our findings show that accuracy is more important for user trust than explainability. Adding an explanation for a classification result can potentially harm trust, e.g. when adding nonsensical explanations. We also found that users cannot be tricked by high-fidelity explanations into trusting a bad classifier. Furthermore, we found a mismatch between observed (implicit) and self-reported (explicit) trust.
Submitted 26 July, 2019;
originally announced July 2019.
-
Sufficient criteria and sharp geometric conditions for observability in Banach spaces
Authors:
Dennis Gallaun,
Christian Seifert,
Martin Tautenhahn
Abstract:
Let $X,Y$ be Banach spaces, $(S_t)_{t \geq 0}$ a $C_0$-semigroup on $X$, $-A$ the corresponding infinitesimal generator on $X$, $C$ a bounded linear operator from $X$ to $Y$, and $T > 0$. We consider the system \[ \dot{x}(t) = -Ax(t), \quad y(t) = Cx(t) \quad t\in (0,T], \quad x(0) = x_0 \in X. \] We provide sufficient conditions such that this system satisfies a final state observability estimate in $L_r ((0,T) ; Y)$, $r \in [1,\infty]$. These sufficient conditions are given by an uncertainty relation and a dissipation estimate. Our approach unifies and generalizes the respective advantages from earlier results obtained in the context of Hilbert spaces. As an application we consider the example where $A$ is an elliptic operator in $L_p(\mathbb{R}^d)$ for $1<p<\infty$, and where $C = \mathbf{1}_\omega$ is the restriction onto a thick set $\omega \subset \mathbb{R}^d$. In this case, we show that the above system satisfies a final state observability estimate if and only if $\omega \subset \mathbb{R}^d$ is a thick set. Finally, we make use of the well-known relation between observability and null-controllability of the predual system, and investigate bounds on the corresponding control costs.
Submitted 16 November, 2020; v1 submitted 24 May, 2019;
originally announced May 2019.
-
On the harmonic extension approach to fractional powers in Banach spaces
Authors:
Jan Meichsner,
Christian Seifert
Abstract:
We show that fractional powers of general sectorial operators on Banach spaces can be obtained by the harmonic extension approach. Moreover, for the corresponding second order ordinary differential equation with incomplete data describing the harmonic extension we prove existence and uniqueness of a bounded solution (i.e. of the harmonic extension).
Submitted 4 September, 2020; v1 submitted 16 May, 2019;
originally announced May 2019.