
Overcoming Common Flaws in the Evaluation of Selective Classification Systems

Authors: Jeremias Traub, Till J. Bungert, Carsten T. Lüth, Michael Baumgartner, Klaus H. Maier-Hein, Lena Maier-Hein, Paul F Jaeger

Abstract: Selective Classification, wherein models can reject low-confidence predictions, promises reliable translation of machine-learning based classification systems to real-world scenarios such as clinical diagnostics. While current evaluation of these systems typically assumes fixed working points based on pre-defined rejection thresholds, methodological progress requires benchmarking the general performance of systems akin to the $\mathrm{AUROC}$ in standard classification. In this work, we define 5 requirements for multi-threshold metrics in selective classification regarding task alignment, interpretability, and flexibility, and show how current approaches fail to meet them. We propose the Area under the Generalized Risk Coverage curve ($\mathrm{AUGRC}$), which meets all requirements and can be directly interpreted as the average risk of undetected failures. We empirically demonstrate the relevance of $\mathrm{AUGRC}$ on a comprehensive benchmark spanning 6 data sets and 13 confidence scoring functions. We find that the proposed metric substantially changes metric rankings on 5 out of the 6 data sets.
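To make the phrase "average risk of undetected failures" concrete, here is a minimal sketch of how such an area could be estimated from per-sample confidences and failure labels. It assumes the generalized risk at a given coverage is the fraction of all samples that are both accepted and misclassified; the abstract does not spell out this definition, so treat it, the function name `augrc_estimate`, and the simple averaging over coverage levels (no tie handling, no trapezoidal rule) as illustrative assumptions rather than the authors' implementation.

```python
from typing import Sequence


def augrc_estimate(confidences: Sequence[float], failures: Sequence[int]) -> float:
    """Hypothetical estimate of the area under a generalized risk-coverage curve.

    Assumption: at coverage k/N (the k most confident samples are accepted),
    the generalized risk is (number of accepted failures) / N.
    """
    if len(confidences) != len(failures) or not confidences:
        raise ValueError("need equally long, non-empty inputs")
    n = len(confidences)
    # Accept samples in order of decreasing confidence (ties keep input order).
    order = sorted(range(n), key=lambda i: -confidences[i])
    accepted_failures = 0
    total = 0.0
    for i in order:
        accepted_failures += failures[i]
        total += accepted_failures / n  # generalized risk at this coverage level
    return total / n  # average over the N coverage levels


# Failures ranked last (good confidence scores) versus first (bad scores).
good = augrc_estimate([0.9, 0.8, 0.6, 0.3], [0, 0, 1, 1])
bad = augrc_estimate([0.9, 0.8, 0.6, 0.3], [1, 1, 0, 0])
```

Under these assumptions, a scoring function that assigns low confidence to failures yields a smaller value, since failures only enter the accepted set at high coverage.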

Submitted 1 July, 2024; originally announced July 2024.

arXiv:2406.03323

Comparative Benchmarking of Failure Detection Methods in Medical Image Segmentation: Unveiling the Role of Confidence Aggregation

Authors: Maximilian Zenk, David Zimmerer, Fabian Isensee, Jeremias Traub, Tobias Norajitra, Paul F. Jäger, Klaus Maier-Hein

Abstract: Semantic segmentation is an essential component of medical image analysis research, with recent deep learning algorithms offering out-of-the-box applicability across diverse datasets. Despite these advancements, segmentation failures remain a significant concern for real-world clinical applications, necessitating reliable detection mechanisms. This paper introduces a comprehensive benchmarking framework aimed at evaluating failure detection methodologies within medical image segmentation. Through our analysis, we identify the strengths and limitations of current failure detection metrics, advocating for the risk-coverage analysis as a holistic evaluation approach. Utilizing a collective dataset comprising five public 3D medical image collections, we assess the efficacy of various failure detection strategies under realistic test-time distribution shifts. Our findings highlight the importance of pixel confidence aggregation and we observe superior performance of the pairwise Dice score (Roy et al., 2019) between ensemble predictions, positioning it as a simple and robust baseline for failure detection in medical image segmentation. To promote ongoing research, we make the benchmarking framework available to the community.
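As an illustration of the pairwise Dice baseline mentioned above, the following sketch averages the Dice overlap over all pairs of ensemble predictions and treats the result as an image-level confidence. It assumes binary masks given as flat 0/1 sequences and a plain mean over pairs; the function names and the handling of empty masks are hypothetical choices, not taken from the benchmarking framework.

```python
from itertools import combinations
from typing import Sequence

Mask = Sequence[int]  # assumed: flattened binary segmentation mask (0 or 1 per voxel)


def dice(a: Mask, b: Mask) -> float:
    """Dice overlap 2|A∩B| / (|A| + |B|) of two binary masks of equal length."""
    if len(a) != len(b):
        raise ValueError("masks must have the same number of voxels")
    intersection = sum(1 for x, y in zip(a, b) if x and y)
    size = sum(1 for x in a if x) + sum(1 for y in b if y)
    # Assumption: two empty masks count as perfect agreement.
    return 1.0 if size == 0 else 2.0 * intersection / size


def mean_pairwise_dice(masks: Sequence[Mask]) -> float:
    """Average Dice over all unordered pairs of ensemble member predictions."""
    pairs = list(combinations(masks, 2))
    if not pairs:
        raise ValueError("need at least two ensemble predictions")
    return sum(dice(a, b) for a, b in pairs) / len(pairs)


# Three hypothetical ensemble predictions for one image with four voxels.
ensemble = [[1, 1, 0, 0], [1, 1, 1, 0], [0, 1, 1, 0]]
confidence = mean_pairwise_dice(ensemble)
```

Low agreement between ensemble members gives a low score, which could then be used to flag the image as a likely segmentation failure.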

Submitted 24 June, 2024; v1 submitted 5 June, 2024; originally announced June 2024.

Comments: This work has been submitted for possible publication. Copyright may be transferred without notice, after which this version may no longer be accessible

arXiv:2210.11058

Representation Learning with Diffusion Models

Authors: Jeremias Traub

Abstract: Diffusion models (DMs) have achieved state-of-the-art results for image synthesis tasks as well as density estimation. Applied in the latent space of a powerful pretrained autoencoder (LDM), their immense computational requirements can be significantly reduced without sacrificing sampling quality. However, DMs and LDMs lack a semantically meaningful representation space as the diffusion process gradually destroys information in the latent variables. We introduce a framework for learning such representations with diffusion models (LRDM). To that end, an LDM is conditioned on the representation extracted from the clean image by a separate encoder. In particular, the DM and the representation encoder are trained jointly in order to learn rich representations specific to the generative denoising process. By introducing a tractable representation prior, we can efficiently sample from the representation distribution for unconditional image synthesis without training any additional model. We demonstrate that i) competitive image generation results can be achieved with image-parameterized LDMs, and ii) LRDMs are capable of learning semantically meaningful representations, allowing for faithful image reconstructions and semantic interpolations. Our implementation is available at https://github.com/jeremiastraub/diffusion.

Submitted 20 October, 2022; originally announced October 2022.

arXiv:1912.04648

SENSE: Scalable Data Acquisition from Distributed Sensors with Guaranteed Time Coherence

Authors: Jonas Traub, Julius Hülsmann, Sebastian Breß, Tilmann Rabl, Volker Markl

Abstract: Data analysis in the Internet of Things (IoT) requires us to combine event streams from a huge number of sensors. This combination (join) of events is usually based on the timestamps associated with the events. We address two challenges in environments which acquire and join events in the IoT: First, due to the growing number of sensors, we are facing the performance limits of central joins with respect to throughput, latency, and network utilization. Second, in the IoT, diverse sensor nodes are operated by different organizations and use different time synchronization techniques. Thus, events with the same timestamps are not necessarily recorded at the exact same time and joined data tuples have an unknown time incoherence. This can cause undetected failures, such as false correlations and wrong predictions. We present SENSE, a system for scalable data acquisition from distributed sensors. SENSE introduces time coherence measures as a fundamental data characteristic in addition to common time synchronization techniques. The time coherence of a data tuple is the time span in which all values contained in the tuple have been read from sensors. We explore concepts and algorithms to quantify and optimize time coherence and show that SENSE scales to thousands of sensors, operates efficiently under latency and coherence constraints, and adapts to changing network conditions.
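The time coherence definition above can be illustrated with a small sketch: for a joined tuple, it is the span between the earliest and the latest time at which any contained value was read. The `Reading` type, its field names, and the millisecond unit are assumptions made for this example and are not taken from SENSE itself.

```python
from dataclasses import dataclass
from typing import Sequence


@dataclass(frozen=True)
class Reading:
    """Hypothetical sensor reading; read_at_ms is the true read time in milliseconds."""
    sensor_id: str
    value: float
    read_at_ms: int


def time_coherence_ms(readings: Sequence[Reading]) -> int:
    """Time span within which all values of a joined tuple were read."""
    if not readings:
        raise ValueError("a data tuple needs at least one reading")
    times = [r.read_at_ms for r in readings]
    return max(times) - min(times)


# A joined tuple whose values were read within a 40 ms window.
joined = [
    Reading("temperature", 21.5, 1000),
    Reading("humidity", 0.43, 1040),
    Reading("pressure", 1013.2, 1015),
]
coherence = time_coherence_ms(joined)
```

A smaller span means the values in the tuple describe more nearly the same moment, so a join could, for example, reject tuples whose span exceeds a chosen coherence bound.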

Submitted 10 December, 2019; originally announced December 2019.

arXiv:1910.07867

The NebulaStream Platform: Data and Application Management for the Internet of Things

Authors: Steffen Zeuch, Ankit Chaudhary, Bonaventura Del Monte, Haralampos Gavriilidis, Dimitrios Giouroukis, Philipp M. Grulich, Sebastian Bress, Jonas Traub, Volker Markl

Abstract: The Internet of Things (IoT) presents a novel computing architecture for data management: a distributed, highly dynamic, and heterogeneous environment of massive scale. Applications for the IoT introduce new challenges for integrating the concepts of fog and cloud computing as well as sensor networks in one unified environment. In this paper, we highlight these major challenges and outline how existing systems handle them. To address these challenges, we introduce the NebulaStream platform, a general-purpose, end-to-end data management system for the IoT. NebulaStream addresses the heterogeneity and distribution of compute and data, supports diverse data and programming models going beyond relational algebra, deals with potentially unreliable communication, and enables constant evolution under continuous operation. In our evaluation, we demonstrate the effectiveness of our approach by providing early results on partial aspects.

Submitted 4 March, 2020; v1 submitted 17 October, 2019; originally announced October 2019.

arXiv:1909.03026

Agora: A Unified Asset Ecosystem Going Beyond Marketplaces and Cloud Services

Authors: Jonas Traub, Jorge-Arnulfo Quiané-Ruiz, Zoi Kaoudi, Volker Markl

Abstract: Data, algorithms, and compute/storage infrastructure are key assets that drive data science and artificial intelligence applications. As providing all these assets requires a huge investment, data science and artificial intelligence technologies are currently dominated by a small number of providers who can afford these investments. This leads to lock-in effects and hinders features that require a flexible exchange of assets among users. In this vision paper, we present Agora, a unified asset ecosystem. The Agora system provides the technical infrastructure that allows for offering and using data and algorithms, as well as physical infrastructure components. Agora is designed as an open ecosystem of asset marketplaces and provides to a broad audience not only data but the entire data value chain (including computational resources and human expertise). Agora (i) leverages a fine-grained exchange of assets, (ii) allows for combining assets into novel applications, and (iii) flexibly executes such applications on available resources. As a result, Agora overcomes lock-in effects and removes entry barriers for new asset providers. In contrast to existing data management systems, Agora operates in a heavily decentralized and dynamic environment: Data, algorithms, and even compute resources are dynamically created, modified, and removed by different stakeholders. Agora presents novel research directions for the data management community as a whole: It requires combining our traditional expertise in scalable data processing and management with infrastructure provisioning as well as economic and application aspects of data, algorithms, and infrastructure.

Submitted 19 July, 2020; v1 submitted 6 September, 2019; originally announced September 2019.

Showing 1–6 of 6 results for author: Traub, J