-
QUBIQ: Uncertainty Quantification for Biomedical Image Segmentation Challenge
Authors:
Hongwei Bran Li,
Fernando Navarro,
Ivan Ezhov,
Amirhossein Bayat,
Dhritiman Das,
Florian Kofler,
Suprosanna Shit,
Diana Waldmannstetter,
Johannes C. Paetzold,
Xiaobin Hu,
Benedikt Wiestler,
Lucas Zimmer,
Tamaz Amiranashvili,
Chinmay Prabhakar,
Christoph Berger,
Jonas Weidner,
Michelle Alonso-Basant,
Arif Rashid,
Ujjwal Baid,
Wesam Adel,
Deniz Ali,
Bhakti Baheti,
Yingbin Bai,
Ishaan Bhatt,
Sabri Can Cetindag
, et al. (55 additional authors not shown)
Abstract:
Uncertainty in medical image segmentation tasks, especially inter-rater variability, arising from differences in interpretations and annotations by various experts, presents a significant challenge in achieving consistent and reliable image segmentation. This variability not only reflects the inherent complexity and subjective nature of medical image interpretation but also directly impacts the de…
▽ More
Uncertainty in medical image segmentation tasks, especially inter-rater variability, arising from differences in interpretations and annotations by various experts, presents a significant challenge in achieving consistent and reliable image segmentation. This variability not only reflects the inherent complexity and subjective nature of medical image interpretation but also directly impacts the development and evaluation of automated segmentation algorithms. Accurately modeling and quantifying this variability is essential for enhancing the robustness and clinical applicability of these algorithms. We report the set-up and summarize the benchmark results of the Quantification of Uncertainties in Biomedical Image Quantification Challenge (QUBIQ), which was organized in conjunction with International Conferences on Medical Image Computing and Computer-Assisted Intervention (MICCAI) 2020 and 2021. The challenge focuses on the uncertainty quantification of medical image segmentation which considers the omnipresence of inter-rater variability in imaging datasets. The large collection of images with multi-rater annotations features various modalities such as MRI and CT; various organs such as the brain, prostate, kidney, and pancreas; and different image dimensions 2D-vs-3D. A total of 24 teams submitted different solutions to the problem, combining various baseline models, Bayesian neural networks, and ensemble model techniques. The obtained results indicate the importance of the ensemble models, as well as the need for further research to develop efficient 3D methods for uncertainty quantification methods in 3D segmentation tasks.
△ Less
Submitted 24 June, 2024; v1 submitted 19 March, 2024;
originally announced May 2024.
-
An Industrial Experience Report about Challenges from Continuous Monitoring, Improvement, and Deployment for Autonomous Driving Features
Authors:
Ali Nouri,
Christian Berger,
Fredrik Torner
Abstract:
Using continuous development, deployment, and monitoring (CDDM) to understand and improve applications in a customer's context is widely used for non-safety applications such as smartphone apps or web applications to enable rapid and innovative feature improvements. Having demonstrated its potential in such domains, it may have the potential to also improve the software development for automotive…
▽ More
Using continuous development, deployment, and monitoring (CDDM) to understand and improve applications in a customer's context is widely used for non-safety applications such as smartphone apps or web applications to enable rapid and innovative feature improvements. Having demonstrated its potential in such domains, it may have the potential to also improve the software development for automotive functions as some OEMs described on a high level in their financial company communiqus. However, the application of a CDDM strategy also faces challenges from a process adherence and documentation perspective as required by safety-related products such as autonomous driving systems (ADS) and guided by industry standards such as ISO-26262 and ISO21448. There are publications on CDDM in safety-relevant contexts that focus on safety-critical functions on a rather generic level and thus, not specifically ADS or automotive, or that are concentrating only on software and hence, missing out the particular context of an automotive OEM: Well-established legacy processes and the need of their adaptations, and aspects originating from the role of being a system integrator for software/software, hardware/hardware, and hardware/software. In this paper, particular challenges from the automotive domain to better adopt CDDM are identified and discussed to shed light on research gaps to enhance CDDM, especially for the software development of safe ADS. The challenges are identified from today's industrial well-established ways of working by conducting interviews with domain experts and complemented by a literature study.
△ Less
Submitted 14 March, 2024;
originally announced March 2024.
-
Are we using appropriate segmentation metrics? Identifying correlates of human expert perception for CNN training beyond rolling the DICE coefficient
Authors:
Florian Kofler,
Ivan Ezhov,
Fabian Isensee,
Fabian Balsiger,
Christoph Berger,
Maximilian Koerner,
Beatrice Demiray,
Julia Rackerseder,
Johannes Paetzold,
Hongwei Li,
Suprosanna Shit,
Richard McKinley,
Marie Piraud,
Spyridon Bakas,
Claus Zimmer,
Nassir Navab,
Jan Kirschke,
Benedikt Wiestler,
Bjoern Menze
Abstract:
Metrics optimized in complex machine learning tasks are often selected in an ad-hoc manner. It is unknown how they align with human expert perception. We explore the correlations between established quantitative segmentation quality metrics and qualitative evaluations by professionally trained human raters. Therefore, we conduct psychophysical experiments for two complex biomedical semantic segmen…
▽ More
Metrics optimized in complex machine learning tasks are often selected in an ad-hoc manner. It is unknown how they align with human expert perception. We explore the correlations between established quantitative segmentation quality metrics and qualitative evaluations by professionally trained human raters. Therefore, we conduct psychophysical experiments for two complex biomedical semantic segmentation problems. We discover that current standard metrics and loss functions correlate only moderately with the segmentation quality assessment of experts. Importantly, this effect is particularly pronounced for clinically relevant structures, such as the enhancing tumor compartment of glioma in brain magnetic resonance and grey matter in ultrasound imaging. It is often unclear how to optimize abstract metrics, such as human expert perception, in convolutional neural network (CNN) training. To cope with this challenge, we propose a novel strategy employing techniques of classical statistics to create complementary compound loss functions to better approximate human expert perception. Across all rating experiments, human experts consistently scored computer-generated segmentations better than the human-curated reference labels. Our results, therefore, strongly question many current practices in medical image segmentation and provide meaningful cues for future research.
△ Less
Submitted 2 May, 2023; v1 submitted 10 March, 2021;
originally announced March 2021.
-
Traction Adaptive Motion Planning at the Limits of Handling
Authors:
Lars Svensson,
Monimoy Bujarbaruah,
Arpit Karsolia,
Christian Berger,
Martin Törngren
Abstract:
In this paper, we address the problem of motion planning and control at the limits of handling, under locally varying traction conditions. We propose a novel solution method where traction variations over the prediction horizon are represented by time-varying tire force constraints, derived from a predictive friction estimate. A constrained finite time optimal control problem is solved in a recedi…
▽ More
In this paper, we address the problem of motion planning and control at the limits of handling, under locally varying traction conditions. We propose a novel solution method where traction variations over the prediction horizon are represented by time-varying tire force constraints, derived from a predictive friction estimate. A constrained finite time optimal control problem is solved in a receding horizon fashion, imposing these time-varying constraints. Furthermore, our method features an integrated sampling augmentation procedure that addresses the problems of infeasibility and sensitivity to local minima that arise at abrupt constraint alterations, e.g., due to sudden friction changes.
We validate the proposed algorithm on a Volvo FH16 heavy-duty vehicle, in a range of critical scenarios. Experimental results indicate that traction adaptive motion planning and control improves the vehicle's capacity to avoid accidents, both when adapting to low local traction, by ensuring dynamic feasibility of the planned motion, and when adapting to high local traction, by realizing high traction utilization.
△ Less
Submitted 18 November, 2021; v1 submitted 9 September, 2020;
originally announced September 2020.
-
ModelHub.AI: Dissemination Platform for Deep Learning Models
Authors:
Ahmed Hosny,
Michael Schwier,
Christoph Berger,
Evin P Örnek,
Mehmet Turan,
Phi V Tran,
Leon Weninger,
Fabian Isensee,
Klaus H Maier-Hein,
Richard McKinley,
Michael T Lu,
Udo Hoffmann,
Bjoern Menze,
Spyridon Bakas,
Andriy Fedorov,
Hugo JWL Aerts
Abstract:
Recent advances in artificial intelligence research have led to a profusion of studies that apply deep learning to problems in image analysis and natural language processing among others. Additionally, the availability of open-source computational frameworks has lowered the barriers to implementing state-of-the-art methods across multiple domains. Albeit leading to major performance breakthroughs…
▽ More
Recent advances in artificial intelligence research have led to a profusion of studies that apply deep learning to problems in image analysis and natural language processing among others. Additionally, the availability of open-source computational frameworks has lowered the barriers to implementing state-of-the-art methods across multiple domains. Albeit leading to major performance breakthroughs in some tasks, effective dissemination of deep learning algorithms remains challenging, inhibiting reproducibility and benchmarking studies, impeding further validation, and ultimately hindering their effectiveness in the cumulative scientific progress. In develo** a platform for sharing research outputs, we present ModelHub.AI (www.modelhub.ai), a community-driven container-based software engine and platform for the structured dissemination of deep learning models. For contributors, the engine controls data flow throughout the inference cycle, while the contributor-facing standard template exposes model-specific functions including inference, as well as pre- and post-processing. Python and RESTful Application programming interfaces (APIs) enable users to interact with models hosted on ModelHub.AI and allows both researchers and developers to utilize models out-of-the-box. ModelHub.AI is domain-, data-, and framework-agnostic, catering to different workflows and contributors' preferences.
△ Less
Submitted 26 November, 2019;
originally announced November 2019.