-
Statistical Multicriteria Benchmarking via the GSD-Front
Authors:
Christoph Jansen,
Georg Schollmeyer,
Julian Rodemann,
Hannah Blocher,
Thomas Augustin
Abstract:
Given the vast number of classifiers that have been (and continue to be) proposed, reliable methods for comparing them are becoming increasingly important. The desire for reliability is broken down into three main aspects: (1) Comparisons should allow for different quality metrics simultaneously. (2) Comparisons should take into account the statistical uncertainty induced by the choice of benchmar…
▽ More
Given the vast number of classifiers that have been (and continue to be) proposed, reliable methods for comparing them are becoming increasingly important. The desire for reliability is broken down into three main aspects: (1) Comparisons should allow for different quality metrics simultaneously. (2) Comparisons should take into account the statistical uncertainty induced by the choice of benchmark suite. (3) The robustness of the comparisons under small deviations in the underlying assumptions should be verifiable. To address (1), we propose to compare classifiers using a generalized stochastic dominance ordering (GSD) and present the GSD-front as an information-efficient alternative to the classical Pareto-front. For (2), we propose a consistent statistical estimator for the GSD-front and construct a statistical test for whether a (potentially new) classifier lies in the GSD-front of a set of state-of-the-art classifiers. For (3), we relax our proposed test using techniques from robust statistics and imprecise probabilities. We illustrate our concepts on the benchmark suite PMLB and on the platform OpenML.
△ Less
Submitted 6 June, 2024;
originally announced June 2024.
-
Semi-Supervised Learning guided by the Generalized Bayes Rule under Soft Revision
Authors:
Stefan Dietrich,
Julian Rodemann,
Christoph Jansen
Abstract:
We provide a theoretical and computational investigation of the Gamma-Maximin method with soft revision, which was recently proposed as a robust criterion for pseudo-label selection (PLS) in semi-supervised learning. Opposed to traditional methods for PLS we use credal sets of priors ("generalized Bayes") to represent the epistemic modeling uncertainty. These latter are then updated by the Gamma-M…
▽ More
We provide a theoretical and computational investigation of the Gamma-Maximin method with soft revision, which was recently proposed as a robust criterion for pseudo-label selection (PLS) in semi-supervised learning. Opposed to traditional methods for PLS we use credal sets of priors ("generalized Bayes") to represent the epistemic modeling uncertainty. These latter are then updated by the Gamma-Maximin method with soft revision. We eventually select pseudo-labeled data that are most likely in light of the least favorable distribution from the so updated credal set. We formalize the task of finding optimal pseudo-labeled data w.r.t. the Gamma-Maximin method with soft revision as an optimization problem. A concrete implementation for the class of logistic models then allows us to compare the predictive power of the method with competing approaches. It is observed that the Gamma-Maximin method with soft revision can achieve very promising results, especially when the proportion of labeled data is low.
△ Less
Submitted 4 June, 2024; v1 submitted 24 May, 2024;
originally announced May 2024.
-
Deep learning-based auto-segmentation of paraganglioma for growth monitoring
Authors:
E. M. C. Sijben,
J. C. Jansen,
M. de Ridder,
P. A. N. Bosman,
T. Alderliesten
Abstract:
Volume measurement of a paraganglioma (a rare neuroendocrine tumor that typically forms along major blood vessels and nerve pathways in the head and neck region) is crucial for monitoring and modeling tumor growth in the long term. However, in clinical practice, using available tools to do these measurements is time-consuming and suffers from tumor-shape assumptions and observer-to-observer variat…
▽ More
Volume measurement of a paraganglioma (a rare neuroendocrine tumor that typically forms along major blood vessels and nerve pathways in the head and neck region) is crucial for monitoring and modeling tumor growth in the long term. However, in clinical practice, using available tools to do these measurements is time-consuming and suffers from tumor-shape assumptions and observer-to-observer variation. Growth modeling could play a significant role in solving a decades-old dilemma (stemming from uncertainty regarding how the tumor will develop over time). By giving paraganglioma patients treatment, severe symptoms can be prevented. However, treating patients who do not actually need it, comes at the cost of unnecessary possible side effects and complications. Improved measurement techniques could enable growth model studies with a large amount of tumor volume data, possibly giving valuable insights into how these tumors develop over time. Therefore, we propose an automated tumor volume measurement method based on a deep learning segmentation model using no-new-UNnet (nnUNet). We assess the performance of the model based on visual inspection by a senior otorhinolaryngologist and several quantitative metrics by comparing model outputs with manual delineations, including a comparison with variation in manual delineation by multiple observers. Our findings indicate that the automatic method performs (at least) equal to manual delineation. Finally, using the created model, and a linking procedure that we propose to track the tumor over time, we show how additional volume measurements affect the fit of known growth functions.
△ Less
Submitted 19 March, 2024;
originally announced April 2024.
-
Function Class Learning with Genetic Programming: Towards Explainable Meta Learning for Tumor Growth Functionals
Authors:
E. M. C. Sijben,
J. C. Jansen,
P. A. N. Bosman,
T. Alderliesten
Abstract:
Paragangliomas are rare, primarily slow-growing tumors for which the underlying growth pattern is unknown. Therefore, determining the best care for a patient is hard. Currently, if no significant tumor growth is observed, treatment is often delayed, as treatment itself is not without risk. However, by doing so, the risk of (irreversible) adverse effects due to tumor growth may increase. Being able…
▽ More
Paragangliomas are rare, primarily slow-growing tumors for which the underlying growth pattern is unknown. Therefore, determining the best care for a patient is hard. Currently, if no significant tumor growth is observed, treatment is often delayed, as treatment itself is not without risk. However, by doing so, the risk of (irreversible) adverse effects due to tumor growth may increase. Being able to predict the growth accurately could assist in determining whether a patient will need treatment during their lifetime and, if so, the timing of this treatment. The aim of this work is to learn the general underlying growth pattern of paragangliomas from multiple tumor growth data sets, in which each data set contains a tumor's volume over time. To do so, we propose a novel approach based on genetic programming to learn a function class, i.e., a parameterized function that can be fit anew for each tumor. We do so in a unique, multi-modal, multi-objective fashion to find multiple potentially interesting function classes in a single run. We evaluate our approach on a synthetic and a real-world data set. By analyzing the resulting function classes, we can effectively explain the general patterns in the data.
△ Less
Submitted 9 April, 2024; v1 submitted 19 February, 2024;
originally announced February 2024.
-
Joining Forces for Pathology Diagnostics with AI Assistance: The EMPAIA Initiative
Authors:
Norman Zerbe,
Lars Ole Schwen,
Christian Geißler,
Katja Wiesemann,
Tom Bisson,
Peter Boor,
Rita Carvalho,
Michael Franz,
Christoph Jansen,
Tim-Rasmus Kiehl,
Björn Lindequist,
Nora Charlotte Pohlan,
Sarah Schmell,
Klaus Strohmenger,
Falk Zakrzewski,
Markus Plass,
Michael Takla,
Tobias Küster,
André Homeyer,
Peter Hufnagl
Abstract:
Over the past decade, artificial intelligence (AI) methods in pathology have advanced substantially. However, integration into routine clinical practice has been slow due to numerous challenges, including technical and regulatory hurdles in translating research results into clinical diagnostic products and the lack of standardized interfaces. The open and vendor-neutral EMPAIA initiative addresses…
▽ More
Over the past decade, artificial intelligence (AI) methods in pathology have advanced substantially. However, integration into routine clinical practice has been slow due to numerous challenges, including technical and regulatory hurdles in translating research results into clinical diagnostic products and the lack of standardized interfaces. The open and vendor-neutral EMPAIA initiative addresses these challenges. Here, we provide an overview of EMPAIA's achievements and lessons learned. EMPAIA integrates various stakeholders of the pathology AI ecosystem, i.e., pathologists, computer scientists, and industry. In close collaboration, we developed technical interoperability standards, recommendations for AI testing and product development, and explainability methods. We implemented the modular and open-source EMPAIA platform and successfully integrated 14 AI-based image analysis apps from 8 different vendors, demonstrating how different apps can use a single standardized interface. We prioritized requirements and evaluated the use of AI in real clinical settings with 14 different pathology laboratories in Europe and Asia. In addition to technical developments, we created a forum for all stakeholders to share information and experiences on digital pathology and AI. Commercial, clinical, and academic stakeholders can now adopt EMPAIA's common open-source interfaces, providing a unique opportunity for large-scale standardization and streamlining of processes. Further efforts are needed to effectively and broadly establish AI assistance in routine laboratory use. To this end, a sustainable infrastructure, the non-profit association EMPAIA International, has been established to continue standardization and support broad implementation and advocacy for an AI-assisted digital pathology future.
△ Less
Submitted 16 April, 2024; v1 submitted 22 December, 2023;
originally announced January 2024.
-
Comparing Machine Learning Algorithms by Union-Free Generic Depth
Authors:
Hannah Blocher,
Georg Schollmeyer,
Malte Nalenz,
Christoph Jansen
Abstract:
We propose a framework for descriptively analyzing sets of partial orders based on the concept of depth functions. Despite intensive studies in linear and metric spaces, there is very little discussion on depth functions for non-standard data types such as partial orders. We introduce an adaptation of the well-known simplicial depth to the set of all partial orders, the union-free generic (ufg) de…
▽ More
We propose a framework for descriptively analyzing sets of partial orders based on the concept of depth functions. Despite intensive studies in linear and metric spaces, there is very little discussion on depth functions for non-standard data types such as partial orders. We introduce an adaptation of the well-known simplicial depth to the set of all partial orders, the union-free generic (ufg) depth. Moreover, we utilize our ufg depth for a comparison of machine learning algorithms based on multidimensional performance measures. Concretely, we provide two examples of classifier comparisons on samples of standard benchmark data sets. Our results demonstrate promisingly the wide variety of different analysis approaches based on ufg methods. Furthermore, the examples outline that our approach differs substantially from existing benchmarking approaches, and thus adds a new perspective to the vivid debate on classifier comparison.
△ Less
Submitted 21 February, 2024; v1 submitted 20 December, 2023;
originally announced December 2023.
-
Robust Statistical Comparison of Random Variables with Locally Varying Scale of Measurement
Authors:
Christoph Jansen,
Georg Schollmeyer,
Hannah Blocher,
Julian Rodemann,
Thomas Augustin
Abstract:
Spaces with locally varying scale of measurement, like multidimensional structures with differently scaled dimensions, are pretty common in statistics and machine learning. Nevertheless, it is still understood as an open question how to exploit the entire information encoded in them properly. We address this problem by considering an order based on (sets of) expectations of random variables mappin…
▽ More
Spaces with locally varying scale of measurement, like multidimensional structures with differently scaled dimensions, are pretty common in statistics and machine learning. Nevertheless, it is still understood as an open question how to exploit the entire information encoded in them properly. We address this problem by considering an order based on (sets of) expectations of random variables map** into such non-standard spaces. This order contains stochastic dominance and expectation order as extreme cases when no, or respectively perfect, cardinal structure is given. We derive a (regularized) statistical test for our proposed generalized stochastic dominance (GSD) order, operationalize it by linear optimization, and robustify it by imprecise probability models. Our findings are illustrated with data from multidimensional poverty measurement, finance, and medicine.
△ Less
Submitted 4 March, 2024; v1 submitted 22 June, 2023;
originally announced June 2023.
-
Depth Functions for Partial Orders with a Descriptive Analysis of Machine Learning Algorithms
Authors:
Hannah Blocher,
Georg Schollmeyer,
Christoph Jansen,
Malte Nalenz
Abstract:
We propose a framework for descriptively analyzing sets of partial orders based on the concept of depth functions. Despite intensive studies of depth functions in linear and metric spaces, there is very little discussion on depth functions for non-standard data types such as partial orders. We introduce an adaptation of the well-known simplicial depth to the set of all partial orders, the union-fr…
▽ More
We propose a framework for descriptively analyzing sets of partial orders based on the concept of depth functions. Despite intensive studies of depth functions in linear and metric spaces, there is very little discussion on depth functions for non-standard data types such as partial orders. We introduce an adaptation of the well-known simplicial depth to the set of all partial orders, the union-free generic (ufg) depth. Moreover, we utilize our ufg depth for a comparison of machine learning algorithms based on multidimensional performance measures. Concretely, we analyze the distribution of different classifier performances over a sample of standard benchmark data sets. Our results promisingly demonstrate that our approach differs substantially from existing benchmarking approaches and, therefore, adds a new perspective to the vivid debate on the comparison of classifiers.
△ Less
Submitted 9 February, 2024; v1 submitted 19 April, 2023;
originally announced April 2023.
-
In all LikelihoodS: How to Reliably Select Pseudo-Labeled Data for Self-Training in Semi-Supervised Learning
Authors:
Julian Rodemann,
Christoph Jansen,
Georg Schollmeyer,
Thomas Augustin
Abstract:
Self-training is a simple yet effective method within semi-supervised learning. The idea is to iteratively enhance training data by adding pseudo-labeled data. Its generalization performance heavily depends on the selection of these pseudo-labeled data (PLS). In this paper, we aim at rendering PLS more robust towards the involved modeling assumptions. To this end, we propose to select pseudo-label…
▽ More
Self-training is a simple yet effective method within semi-supervised learning. The idea is to iteratively enhance training data by adding pseudo-labeled data. Its generalization performance heavily depends on the selection of these pseudo-labeled data (PLS). In this paper, we aim at rendering PLS more robust towards the involved modeling assumptions. To this end, we propose to select pseudo-labeled data that maximize a multi-objective utility function. The latter is constructed to account for different sources of uncertainty, three of which we discuss in more detail: model selection, accumulation of errors and covariate shift. In the absence of second-order information on such uncertainties, we furthermore consider the generic approach of the generalized Bayesian alpha-cut updating rule for credal sets. As a practical proof of concept, we spotlight the application of three of our robust extensions on simulated and real-world data. Results suggest that in particular robustness w.r.t. model choice can lead to substantial accuracy gains.
△ Less
Submitted 2 March, 2023;
originally announced March 2023.
-
Multi-Target Decision Making under Conditions of Severe Uncertainty
Authors:
Christoph Jansen,
Georg Schollmeyer,
Thomas Augustin
Abstract:
The quality of consequences in a decision making problem under (severe) uncertainty must often be compared among different targets (goals, objectives) simultaneously. In addition, the evaluations of a consequence's performance under the various targets often differ in their scale of measurement, classically being either purely ordinal or perfectly cardinal. In this paper, we transfer recent develo…
▽ More
The quality of consequences in a decision making problem under (severe) uncertainty must often be compared among different targets (goals, objectives) simultaneously. In addition, the evaluations of a consequence's performance under the various targets often differ in their scale of measurement, classically being either purely ordinal or perfectly cardinal. In this paper, we transfer recent developments from abstract decision theory with incomplete preferential and probabilistic information to this multi-target setting and show how -- by exploiting the (potentially) partial cardinal and partial probabilistic information -- more informative orders for comparing decisions can be given than the Pareto order. We discuss some interesting properties of the proposed orders between decision options and show how they can be concretely computed by linear optimization. We conclude the paper by demonstrating our framework in an artificial (but quite real-world) example in the context of comparing algorithms under different performance measures.
△ Less
Submitted 13 December, 2022;
originally announced December 2022.
-
Anonymization of Whole Slide Images in Histopathology for Research and Education
Authors:
Tom Bisson,
Michael Franz,
Isil Dogan O,
Daniel Romberg,
Christoph Jansen,
Peter Hufnagl,
Norman Zerbe
Abstract:
Objective: The exchange of health-related data is subject to regional laws and regulations, such as the General Data Protection Regulation (GDPR) in the EU or the Health Insurance Portability and Accountability Act (HIPAA) in the United States, resulting in non-trivial challenges for researchers and educators when working with these data. In pathology, the digitization of diagnostic tissue samples…
▽ More
Objective: The exchange of health-related data is subject to regional laws and regulations, such as the General Data Protection Regulation (GDPR) in the EU or the Health Insurance Portability and Accountability Act (HIPAA) in the United States, resulting in non-trivial challenges for researchers and educators when working with these data. In pathology, the digitization of diagnostic tissue samples inevitably generates identifying data that can consist of sensitive but also acquisition-related information stored in vendor-specific file formats. Distribution and off-clinical use of these Whole Slide Images (WSI) is usually done in these formats, as an industry-wide standardization such as DICOM is yet only tentatively adopted and slide scanner vendors currently do not provide anonymization functionality.
Methods: We developed a guideline for the proper handling of histopathological image data particularly for research and education with regard to the GDPR. In this context, we evaluated existing anonymization methods and examined proprietary format specifications to identify all sensitive information for the most common WSI formats. This work results in a software library that enables GDPR-compliant anonymization of WSIs while preserving the native formats.
Results: Based on the analysis of proprietary formats, all occurrences of sensitive information were identified for file formats frequently used in clinical routine, and finally, an open-source programming library with an executable CLI-tool and wrappers for different programming languages was developed.
Conclusions: Our analysis showed that there is no straightforward software solution to anonymize WSIs in a GDPR-compliant way while maintaining the data format. We closed this gap with our extensible open-source library that works instantaneously and offline.
△ Less
Submitted 11 November, 2022;
originally announced November 2022.
-
Statistical Comparisons of Classifiers by Generalized Stochastic Dominance
Authors:
Christoph Jansen,
Malte Nalenz,
Georg Schollmeyer,
Thomas Augustin
Abstract:
Although being a crucial question for the development of machine learning algorithms, there is still no consensus on how to compare classifiers over multiple data sets with respect to several criteria. Every comparison framework is confronted with (at least) three fundamental challenges: the multiplicity of quality criteria, the multiplicity of data sets and the randomness of the selection of data…
▽ More
Although being a crucial question for the development of machine learning algorithms, there is still no consensus on how to compare classifiers over multiple data sets with respect to several criteria. Every comparison framework is confronted with (at least) three fundamental challenges: the multiplicity of quality criteria, the multiplicity of data sets and the randomness of the selection of data sets. In this paper, we add a fresh view to the vivid debate by adopting recent developments in decision theory. Based on so-called preference systems, our framework ranks classifiers by a generalized concept of stochastic dominance, which powerfully circumvents the cumbersome, and often even self-contradictory, reliance on aggregates. Moreover, we show that generalized stochastic dominance can be operationalized by solving easy-to-handle linear programs and moreover statistically tested employing an adapted two-sample observation-randomization test. This yields indeed a powerful framework for the statistical comparison of classifiers over multiple data sets with respect to multiple quality criteria simultaneously. We illustrate and investigate our framework in a simulation study and with a set of standard benchmark data sets.
△ Less
Submitted 5 July, 2023; v1 submitted 5 September, 2022;
originally announced September 2022.
-
Information efficient learning of complexly structured preferences: Elicitation procedures and their application to decision making under uncertainty
Authors:
Christoph Jansen,
Hannah Blocher,
Thomas Augustin,
Georg Schollmeyer
Abstract:
In this paper we propose efficient methods for elicitation of complexly structured preferences and utilize these in problems of decision making under (severe) uncertainty. Based on the general framework introduced in Jansen, Schollmeyer and Augustin (2018, Int. J. Approx. Reason), we now design elicitation procedures and algorithms that enable decision makers to reveal their underlying preference…
▽ More
In this paper we propose efficient methods for elicitation of complexly structured preferences and utilize these in problems of decision making under (severe) uncertainty. Based on the general framework introduced in Jansen, Schollmeyer and Augustin (2018, Int. J. Approx. Reason), we now design elicitation procedures and algorithms that enable decision makers to reveal their underlying preference system (i.e. two relations, one encoding the ordinal, the other the cardinal part of the preferences) while having to answer as few as possible simple ranking questions. Here, two different approaches are followed. The first approach directly utilizes the collected ranking data for obtaining the ordinal part of the preferences, while their cardinal part is constructed implicitly by measuring meta data on the decision maker's consideration times. In contrast, the second approach explicitly elicits also the cardinal part of the decision maker's preference system, however, only an approximate version of it. This approximation is obtained by additionally collecting labels of preference strength during the elicitation procedure. For both approaches, we give conditions under which they produce the decision maker's true preference system and investigate how their efficiency can be improved. For the latter purpose, besides data-free approaches, we also discuss ways for effectively guiding the elicitation procedure if data from previous elicitation rounds is available. Finally, we demonstrate how the proposed elicitation methods can be utilized in problems of decision under (severe) uncertainty. Precisely, we show that under certain conditions optimal decisions can be found without fully specifying the preference system.
△ Less
Submitted 1 February, 2022; v1 submitted 19 October, 2021;
originally announced October 2021.
-
A global model for predicting the arrival of imported dengue infections
Authors:
Jessica Liebig,
Cassie Jansen,
Dean Paini,
Lauren Gardner,
Raja Jurdak
Abstract:
With approximately half of the world's population at risk of contracting dengue, this mosquito-borne disease is of global concern. International travellers significantly contribute to dengue's rapid and large-scale spread by importing the disease from endemic into non-endemic countries. To prevent future outbreaks and dengue from establishing in non-endemic countries, knowledge about the arrival t…
▽ More
With approximately half of the world's population at risk of contracting dengue, this mosquito-borne disease is of global concern. International travellers significantly contribute to dengue's rapid and large-scale spread by importing the disease from endemic into non-endemic countries. To prevent future outbreaks and dengue from establishing in non-endemic countries, knowledge about the arrival time and location of infected travellers is crucial. We propose a network model that predicts the monthly number of dengue-infected air passengers arriving at any given airport. We consider international air travel volumes to construct weighted networks, representing passenger flows between airports. We further calculate the probability of passengers, who travel through the international air transport network, being infected with dengue. The probability of being infected depends on the destination, duration and timing of travel. Our findings shed light onto dengue importation routes and reveal country-specific reporting rates that have been until now largely unknown. This paper provides important new knowledge about the spreading dynamics of dengue that is highly beneficial for public health authorities to strategically allocate the often limited resources to more efficiently prevent the spread of dengue.
△ Less
Submitted 10 November, 2019; v1 submitted 31 August, 2018;
originally announced August 2018.
-
Graph-Based Shape Analysis Beyond Context-Freeness
Authors:
Hannah Arndt,
Christina Jansen,
Christoph Matheja,
Thomas Noll
Abstract:
We develop a shape analysis for reasoning about relational properties of data structures. Both the concrete and the abstract domain are represented by hypergraphs. The analysis is parameterized by user-supplied indexed graph grammars to guide concretization and abstraction. This novel extension of context-free graph grammars is powerful enough to model complex data structures such as balanced bina…
▽ More
We develop a shape analysis for reasoning about relational properties of data structures. Both the concrete and the abstract domain are represented by hypergraphs. The analysis is parameterized by user-supplied indexed graph grammars to guide concretization and abstraction. This novel extension of context-free graph grammars is powerful enough to model complex data structures such as balanced binary trees with parent pointers, while preserving most desirable properties of context-free graph grammars. One strength of our analysis is that no artifacts apart from grammars are required from the user; it thus offers a high degree of automation. We implemented our analysis and successfully applied it to various programs manipulating AVL trees, (doubly-linked) lists, and combinations of both.
△ Less
Submitted 19 April, 2018; v1 submitted 10 May, 2017;
originally announced May 2017.
-
Unified Reasoning about Robustness Properties of Symbolic-Heap Separation Logic
Authors:
Christina Jansen,
Jens Katelaan,
Christoph Matheja,
Thomas Noll,
Florian Zuleger
Abstract:
We introduce heap automata, a formalism for automatic reasoning about robustness properties of the symbolic heap fragment of separation logic with user-defined inductive predicates. Robustness properties, such as satisfiability, reachability, and acyclicity, are important for a wide range of reasoning tasks in automated program analysis and verification based on separation logic. Previously, such…
▽ More
We introduce heap automata, a formalism for automatic reasoning about robustness properties of the symbolic heap fragment of separation logic with user-defined inductive predicates. Robustness properties, such as satisfiability, reachability, and acyclicity, are important for a wide range of reasoning tasks in automated program analysis and verification based on separation logic. Previously, such properties have appeared in many places in the separation logic literature, but have not been studied in a systematic manner. In this paper, we develop an algorithmic framework based on heap automata that allows us to derive asymptotically optimal decision procedures for a wide range of robustness properties in a uniform way. We implemented a protoype of our framework and obtained promising results for all of the aforementioned robustness properties. Further, we demonstrate the applicability of heap automata beyond robustness properties. We apply our algorithmic framework to the model checking and the entailment problem for symbolic-heap separation logic.
△ Less
Submitted 22 October, 2016;
originally announced October 2016.