Search | arXiv e-print repository

Data-driven Modeling in Metrology -- A Short Introduction, Current Developments and Future Perspectives

Authors: Linda-Sophie Schneider, Patrick Krauss, Nadine Schiering, Christopher Syben, Richard Schielein, Andreas Maier

Abstract: Mathematical models are vital to the field of metrology, playing a key role in the derivation of measurement results and the calculation of uncertainties from measurement data, informed by an understanding of the measurement process. These models generally represent the correlation between the quantity being measured and all other pertinent quantities. Such relationships are used to construct meas… ▽ More Mathematical models are vital to the field of metrology, playing a key role in the derivation of measurement results and the calculation of uncertainties from measurement data, informed by an understanding of the measurement process. These models generally represent the correlation between the quantity being measured and all other pertinent quantities. Such relationships are used to construct measurement systems that can interpret measurement data to generate conclusions and predictions about the measurement system itself. Classic models are typically analytical, built on fundamental physical principles. However, the rise of digital technology, expansive sensor networks, and high-performance computing hardware have led to a growing shift towards data-driven methodologies. This trend is especially prominent when dealing with large, intricate networked sensor systems in situations where there is limited expert understanding of the frequently changing real-world contexts. Here, we demonstrate the variety of opportunities that data-driven modeling presents, and how they have been already implemented in various real-world applications. △ Less

Submitted 24 June, 2024; originally announced June 2024.

Comments: 31 pages, Preprint

arXiv:2406.14576 [pdf, other]

Towards Intelligent Speech Assistants in Operating Rooms: A Multimodal Model for Surgical Workflow Analysis

Authors: Kubilay Can Demir, Belen Lojo Rodriguez, Tobias Weise, Andreas Maier, Seung Hee Yang

Abstract: To develop intelligent speech assistants and integrate them seamlessly with intra-operative decision-support frameworks, accurate and efficient surgical phase recognition is a prerequisite. In this study, we propose a multimodal framework based on Gated Multimodal Units (GMU) and Multi-Stage Temporal Convolutional Networks (MS-TCN) to recognize surgical phases of port-catheter placement operations… ▽ More To develop intelligent speech assistants and integrate them seamlessly with intra-operative decision-support frameworks, accurate and efficient surgical phase recognition is a prerequisite. In this study, we propose a multimodal framework based on Gated Multimodal Units (GMU) and Multi-Stage Temporal Convolutional Networks (MS-TCN) to recognize surgical phases of port-catheter placement operations. Our method merges speech and image models and uses them separately in different surgical phases. Based on the evaluation of 28 operations, we report a frame-wise accuracy of 92.65 $\pm$ 3.52% and an F1-score of 92.30 $\pm$ 3.82%. Our results show approximately 10% improvement in both metrics over previous work and validate the effectiveness of integrating multimodal data for the surgical phase recognition task. We further investigate the contribution of individual data channels by comparing mono-modal models with multimodal models. △ Less

Submitted 17 June, 2024; originally announced June 2024.

Comments: 5 Pages, Interspeech 2024

MSC Class: 00b20

arXiv:2405.19079 [pdf, other]

On the Influence of Smoothness Constraints in Computed Tomography Motion Compensation

Authors: Mareike Thies, Fabian Wagner, Noah Maul, Siyuan Mei, Mingxuan Gu, Laura Pfaff, Nastassia Vysotskaya, Haijun Yu, Andreas Maier

Abstract: Computed tomography (CT) relies on precise patient immobilization during image acquisition. Nevertheless, motion artifacts in the reconstructed images can persist. Motion compensation methods aim to correct such artifacts post-acquisition, often incorporating temporal smoothness constraints on the estimated motion patterns. This study analyzes the influence of a spline-based motion model within an… ▽ More Computed tomography (CT) relies on precise patient immobilization during image acquisition. Nevertheless, motion artifacts in the reconstructed images can persist. Motion compensation methods aim to correct such artifacts post-acquisition, often incorporating temporal smoothness constraints on the estimated motion patterns. This study analyzes the influence of a spline-based motion model within an existing rigid motion compensation algorithm for cone-beam CT on the recoverable motion frequencies. Results demonstrate that the choice of motion model crucially influences recoverable frequencies. The optimization-based motion compensation algorithm is able to accurately fit the spline nodes for frequencies almost up to the node-dependent theoretical limit according to the Nyquist-Shannon theorem. Notably, a higher node count does not compromise reconstruction performance for slow motion patterns, but can extend the range of recoverable high frequencies for the investigated algorithm. Eventually, the optimal motion model is dependent on the imaged anatomy, clinical use case, and scanning protocol and should be tailored carefully to the expected motion frequency spectrum to ensure accurate motion compensation. △ Less

Submitted 29 May, 2024; originally announced May 2024.

arXiv:2404.08064 [pdf]

The Impact of Speech Anonymization on Pathology and Its Limits

Authors: Soroosh Tayebi Arasteh, Tomas Arias-Vergara, Paula Andrea Perez-Toro, Tobias Weise, Kai Packhaeuser, Maria Schuster, Elmar Noeth, Andreas Maier, Seung Hee Yang

Abstract: Integration of speech into healthcare has intensified privacy concerns due to its potential as a non-invasive biomarker containing individual biometric information. In response, speaker anonymization aims to conceal personally identifiable information while retaining crucial linguistic content. However, the application of anonymization techniques to pathological speech, a critical area where priva… ▽ More Integration of speech into healthcare has intensified privacy concerns due to its potential as a non-invasive biomarker containing individual biometric information. In response, speaker anonymization aims to conceal personally identifiable information while retaining crucial linguistic content. However, the application of anonymization techniques to pathological speech, a critical area where privacy is especially vital, has not been extensively examined. This study investigates anonymization's impact on pathological speech across over 2,700 speakers from multiple German institutions, focusing on privacy, pathological utility, and demographic fairness. We explore both deep-learning-based and signal processing-based anonymization methods, and document substantial privacy improvements across disorders-evidenced by equal error rate increases up to 1933%, with minimal overall impact on utility. Specific disorders such as Dysarthria, Dysphonia, and Cleft Lip and Palate experienced minimal utility changes, while Dysglossia showed slight improvements. Our findings underscore that the impact of anonymization varies substantially across different disorders. This necessitates disorder-specific anonymization strategies to optimally balance privacy with diagnostic utility. Additionally, our fairness analysis revealed consistent anonymization effects across most of the demographics. This study demonstrates the effectiveness of anonymization in pathological speech for enhancing privacy, while also highlighting the importance of customized and disorder-specific approaches to account for inversion attacks. △ Less

Submitted 22 June, 2024; v1 submitted 11 April, 2024; originally announced April 2024.

arXiv:2404.03541 [pdf, other]

Segmentation-Guided Knee Radiograph Generation using Conditional Diffusion Models

Authors: Siyuan Mei, Fuxin Fan, Fabian Wagner, Mareike Thies, Mingxuan Gu, Yipeng Sun, Andreas Maier

Abstract: Deep learning-based medical image processing algorithms require representative data during development. In particular, surgical data might be difficult to obtain, and high-quality public datasets are limited. To overcome this limitation and augment datasets, a widely adopted solution is the generation of synthetic images. In this work, we employ conditional diffusion models to generate knee radiog… ▽ More Deep learning-based medical image processing algorithms require representative data during development. In particular, surgical data might be difficult to obtain, and high-quality public datasets are limited. To overcome this limitation and augment datasets, a widely adopted solution is the generation of synthetic images. In this work, we employ conditional diffusion models to generate knee radiographs from contour and bone segmentations. Remarkably, two distinct strategies are presented by incorporating the segmentation as a condition into the sampling and training process, namely, conditional sampling and conditional training. The results demonstrate that both methods can generate realistic images while adhering to the conditioning segmentation. The conditional training method outperforms the conditional sampling method and the conventional U-Net. △ Less

Submitted 4 April, 2024; originally announced April 2024.

arXiv:2403.14440 [pdf, other]

Analysing Diffusion Segmentation for Medical Images

Authors: Mathias Öttl, Siyuan Mei, Frauke Wilm, Jana Steenpass, Matthias Rübner, Arndt Hartmann, Matthias Beckmann, Peter Fasching, Andreas Maier, Ramona Erber, Katharina Breininger

Abstract: Denoising Diffusion Probabilistic models have become increasingly popular due to their ability to offer probabilistic modeling and generate diverse outputs. This versatility inspired their adaptation for image segmentation, where multiple predictions of the model can produce segmentation results that not only achieve high quality but also capture the uncertainty inherent in the model. Here, powerf… ▽ More Denoising Diffusion Probabilistic models have become increasingly popular due to their ability to offer probabilistic modeling and generate diverse outputs. This versatility inspired their adaptation for image segmentation, where multiple predictions of the model can produce segmentation results that not only achieve high quality but also capture the uncertainty inherent in the model. Here, powerful architectures were proposed for improving diffusion segmentation performance. However, there is a notable lack of analysis and discussions on the differences between diffusion segmentation and image generation, and thorough evaluations are missing that distinguish the improvements these architectures provide for segmentation in general from their benefit for diffusion segmentation specifically. In this work, we critically analyse and discuss how diffusion segmentation for medical images differs from diffusion image generation, with a particular focus on the training behavior. Furthermore, we conduct an assessment how proposed diffusion segmentation architectures perform when trained directly for segmentation. Lastly, we explore how different medical segmentation tasks influence the diffusion segmentation behavior and the diffusion process could be adapted accordingly. With these analyses, we aim to provide in-depth insights into the behavior of diffusion segmentation that allow for a better design and evaluation of diffusion segmentation methods in the future. △ Less

Submitted 21 March, 2024; originally announced March 2024.

arXiv:2403.10695 [pdf, other]

EAGLE: An Edge-Aware Gradient Localization Enhanced Loss for CT Image Reconstruction

Authors: Yipeng Sun, Yixing Huang, Linda-Sophie Schneider, Mareike Thies, Mingxuan Gu, Siyuan Mei, Siming Bayer, Andreas Maier

Abstract: Computed Tomography (CT) image reconstruction is crucial for accurate diagnosis and deep learning approaches have demonstrated significant potential in improving reconstruction quality. However, the choice of loss function profoundly affects the reconstructed images. Traditional mean squared error loss often produces blurry images lacking fine details, while alternatives designed to improve may in… ▽ More Computed Tomography (CT) image reconstruction is crucial for accurate diagnosis and deep learning approaches have demonstrated significant potential in improving reconstruction quality. However, the choice of loss function profoundly affects the reconstructed images. Traditional mean squared error loss often produces blurry images lacking fine details, while alternatives designed to improve may introduce structural artifacts or other undesirable effects. To address these limitations, we propose Eagle-Loss, a novel loss function designed to enhance the visual quality of CT image reconstructions. Eagle-Loss applies spectral analysis of localized features within gradient changes to enhance sharpness and well-defined edges. We evaluated Eagle-Loss on two public datasets across low-dose CT reconstruction and CT field-of-view extension tasks. Our results show that Eagle-Loss consistently improves the visual quality of reconstructed images, surpassing state-of-the-art methods across various network architectures. Code and data are available at \url{https://github.com/sypsyp97/Eagle_Loss}. △ Less

Submitted 15 March, 2024; originally announced March 2024.

Comments: Preprint

arXiv:2403.03326 [pdf, other]

AnatoMix: Anatomy-aware Data Augmentation for Multi-organ Segmentation

Authors: Chang Liu, Fuxin Fan, Annette Schwarz, Andreas Maier

Abstract: Multi-organ segmentation in medical images is a widely researched task and can save much manual efforts of clinicians in daily routines. Automating the organ segmentation process using deep learning (DL) is a promising solution and state-of-the-art segmentation models are achieving promising accuracy. In this work, We proposed a novel data augmentation strategy for increasing the generalizibility… ▽ More Multi-organ segmentation in medical images is a widely researched task and can save much manual efforts of clinicians in daily routines. Automating the organ segmentation process using deep learning (DL) is a promising solution and state-of-the-art segmentation models are achieving promising accuracy. In this work, We proposed a novel data augmentation strategy for increasing the generalizibility of multi-organ segmentation datasets, namely AnatoMix. By object-level matching and manipulation, our method is able to generate new images with correct anatomy, i.e. organ segmentation mask, exponentially increasing the size of the segmentation dataset. Initial experiments have been done to investigate the segmentation performance influenced by our method on a public CT dataset. Our augmentation method can lead to mean dice of 76.1, compared with 74.8 of the baseline method. △ Less

Submitted 5 March, 2024; originally announced March 2024.

arXiv:2402.12114 [pdf, other]

doi 10.1109/ISBI53787.2023.10230526

A Spatiotemporal Illumination Model for 3D Image Fusion in Optical Coherence Tomography

Authors: Stefan Ploner, Jungeun Won, Julia Schottenhamml, Jessica Girgis, Kenneth Lam, Nadia Waheed, James Fujimoto, Andreas Maier

Abstract: Optical coherence tomography (OCT) is a non-invasive, micrometer-scale imaging modality that has become a clinical standard in ophthalmology. By raster-scanning the retina, sequential cross-sectional image slices are acquired to generate volumetric data. In-vivo imaging suffers from discontinuities between slices that show up as motion and illumination artifacts. We present a new illumination mode… ▽ More Optical coherence tomography (OCT) is a non-invasive, micrometer-scale imaging modality that has become a clinical standard in ophthalmology. By raster-scanning the retina, sequential cross-sectional image slices are acquired to generate volumetric data. In-vivo imaging suffers from discontinuities between slices that show up as motion and illumination artifacts. We present a new illumination model that exploits continuity in orthogonally raster-scanned volume data. Our novel spatiotemporal parametrization adheres to illumination continuity both temporally, along the imaged slices, as well as spatially, in the transverse directions. Yet, our formulation does not make inter-slice assumptions, which could have discontinuities. This is the first optimization of a 3D inverse model in an image reconstruction context in OCT. Evaluation in 68 volumes from eyes with pathology showed reduction of illumination artifacts in 88\% of the data, and only 6\% showed moderate residual illumination artifacts. The method enables the use of forward-warped motion corrected data, which is more accurate, and enables supersampling and advanced 3D image reconstruction in OCT. △ Less

Submitted 19 February, 2024; originally announced February 2024.

Comments: Presented orally & as poster on 20th April 2023 at the IEEE International Symposium on Biomedical Imaging (ISBI) in Cartagena, Colombia. 6 pages, 3 figures. You can find the official version with broken equations and bad contrast figures under https://ieeexplore.ieee.org/document/10230526

arXiv:2401.16104 [pdf, other]

A 2D Sinogram-Based Approach to Defect Localization in Computed Tomography

Authors: Yuzhong Zhou, Linda-Sophie Schneider, Fuxin Fan, Andreas Maier

Abstract: The rise of deep learning has introduced a transformative era in the field of image processing, particularly in the context of computed tomography. Deep learning has made a significant contribution to the field of industrial Computed Tomography. However, many defect detection algorithms are applied directly to the reconstructed domain, often disregarding the raw sensor data. This paper shifts the… ▽ More The rise of deep learning has introduced a transformative era in the field of image processing, particularly in the context of computed tomography. Deep learning has made a significant contribution to the field of industrial Computed Tomography. However, many defect detection algorithms are applied directly to the reconstructed domain, often disregarding the raw sensor data. This paper shifts the focus to the use of sinograms. Within this framework, we present a comprehensive three-step deep learning algorithm, designed to identify and analyze defects within objects without resorting to image reconstruction. These three steps are defect segmentation, mask isolation, and defect analysis. We use a U-Net-based architecture for defect segmentation. Our method achieves the Intersection over Union of 92.02% on our simulated data, with an average position error of 1.3 pixels for defect detection on a 512-pixel-wide detector. △ Less

Submitted 29 January, 2024; originally announced January 2024.

arXiv:2401.16039 [pdf, other]

Data-Driven Filter Design in FBP: Transforming CT Reconstruction with Trainable Fourier Series

Authors: Yipeng Sun, Linda-Sophie Schneider, Fuxin Fan, Mareike Thies, Mingxuan Gu, Siyuan Mei, Yuzhong Zhou, Siming Bayer, Andreas Maier

Abstract: In this study, we introduce a Fourier series-based trainable filter for computed tomography (CT) reconstruction within the filtered backprojection (FBP) framework. This method overcomes the limitation in noise reduction, inherent in conventional FBP methods, by optimizing Fourier series coefficients to construct the filter. This method enables robust performance across different resolution scales… ▽ More In this study, we introduce a Fourier series-based trainable filter for computed tomography (CT) reconstruction within the filtered backprojection (FBP) framework. This method overcomes the limitation in noise reduction, inherent in conventional FBP methods, by optimizing Fourier series coefficients to construct the filter. This method enables robust performance across different resolution scales and maintains computational efficiency with minimal increment for the trainable parameters compared to other deep learning frameworks. Additionally, we propose Gaussian edge-enhanced (GEE) loss function that prioritizes the $L_1$ norm of high-frequency magnitudes, effectively countering the blurring problems prevalent in mean squared error (MSE) approaches. The model's foundation in the FBP algorithm ensures excellent interpretability, as it relies on a data-driven filter with all other parameters derived through rigorous mathematical procedures. Designed as a plug-and-play solution, our Fourier series-based filter can be easily integrated into existing CT reconstruction models, making it a versatile tool for a wide range of practical applications. Our research presents a robust and scalable method that expands the utility of FBP in both medical and scientific imaging. △ Less

Submitted 29 January, 2024; originally announced January 2024.

Comments: Preprint

arXiv:2401.12725 [pdf, other]

Two-View Topogram-Based Anatomy-Guided CT Reconstruction for Prospective Risk Minimization

Authors: Chang Liu, Laura Klein, Yixing Huang, Edith Baader, Michael Lell, Marc Kachelrieß, Andreas Maier

Abstract: To facilitate a prospective estimation of CT effective dose and risk minimization process, a prospective spatial dose estimation and the known anatomical structures are expected. To this end, a CT reconstruction method is required to reconstruct CT volumes from as few projections as possible, i.e. by using the topograms, with anatomical structures as correct as possible. In this work, an optimized… ▽ More To facilitate a prospective estimation of CT effective dose and risk minimization process, a prospective spatial dose estimation and the known anatomical structures are expected. To this end, a CT reconstruction method is required to reconstruct CT volumes from as few projections as possible, i.e. by using the topograms, with anatomical structures as correct as possible. In this work, an optimized CT reconstruction model based on a generative adversarial network (GAN) is proposed. The GAN is trained to reconstruct 3D volumes from an anterior-posterior and a lateral CT projection. To enhance anatomical structures, a pre-trained organ segmentation network and the 3D perceptual loss are applied during the training phase, so that the model can then generate both organ-enhanced CT volume and the organ segmentation mask. The proposed method can reconstruct CT volumes with PSNR of 26.49, RMSE of 196.17, and SSIM of 0.64, compared to 26.21, 201.55 and 0.63 using the baseline method. In terms of the anatomical structure, the proposed method effectively enhances the organ shape and boundary and allows for a straight-forward identification of the relevant anatomical structures. We note that conventional reconstruction metrics fail to indicate the enhancement of anatomical structures. In addition to such metrics, the evaluation is expanded with assessing the organ segmentation performance. The average organ dice of the proposed method is 0.71 compared with 0.63 in baseline model, indicating the enhancement of anatomical structures. △ Less

Submitted 23 January, 2024; originally announced January 2024.

arXiv:2401.09283 [pdf, other]

A gradient-based approach to fast and accurate head motion compensation in cone-beam CT

Authors: Mareike Thies, Fabian Wagner, Noah Maul, Haijun Yu, Manuela Meier, Linda-Sophie Schneider, Mingxuan Gu, Siyuan Mei, Lukas Folle, Andreas Maier

Abstract: Cone-beam computed tomography (CBCT) systems, with their portability, present a promising avenue for direct point-of-care medical imaging, particularly in critical scenarios such as acute stroke assessment. However, the integration of CBCT into clinical workflows faces challenges, primarily linked to long scan duration resulting in patient motion during scanning and leading to image quality degrad… ▽ More Cone-beam computed tomography (CBCT) systems, with their portability, present a promising avenue for direct point-of-care medical imaging, particularly in critical scenarios such as acute stroke assessment. However, the integration of CBCT into clinical workflows faces challenges, primarily linked to long scan duration resulting in patient motion during scanning and leading to image quality degradation in the reconstructed volumes. This paper introduces a novel approach to CBCT motion estimation using a gradient-based optimization algorithm, which leverages generalized derivatives of the backprojection operator for cone-beam CT geometries. Building on that, a fully differentiable target function is formulated which grades the quality of the current motion estimate in reconstruction space. We drastically accelerate motion estimation yielding a 19-fold speed-up compared to existing methods. Additionally, we investigate the architecture of networks used for quality metric regression and propose predicting voxel-wise quality maps, favoring autoencoder-like architectures over contracting ones. This modification improves gradient flow, leading to more accurate motion estimation. The presented method is evaluated through realistic experiments on head anatomy. It achieves a reduction in reprojection error from an initial average of 3mm to 0.61mm after motion compensation and consistently demonstrates superior performance compared to existing approaches. The analytic Jacobian for the backprojection operation, which is at the core of the proposed method, is made publicly available. In summary, this paper contributes to the advancement of CBCT integration into clinical workflows by proposing a robust motion estimation approach that enhances efficiency and accuracy, addressing critical challenges in time-sensitive scenarios. △ Less

Submitted 17 January, 2024; originally announced January 2024.

Comments: This work has been submitted to the IEEE for possible publication. Copyright may be transferred without notice, after which this version may no longer be accessible

arXiv:2401.03912 [pdf, other]

Attention-Guided Erasing: A Novel Augmentation Method for Enhancing Downstream Breast Density Classification

Authors: Adarsh Bhandary Panambur, Hui Yu, Sheethal Bhat, Prathmesh Madhu, Siming Bayer, Andreas Maier

Abstract: The assessment of breast density is crucial in the context of breast cancer screening, especially in populations with a higher percentage of dense breast tissues. This study introduces a novel data augmentation technique termed Attention-Guided Erasing (AGE), devised to enhance the downstream classification of four distinct breast density categories in mammography following the BI-RADS recommendat… ▽ More The assessment of breast density is crucial in the context of breast cancer screening, especially in populations with a higher percentage of dense breast tissues. This study introduces a novel data augmentation technique termed Attention-Guided Erasing (AGE), devised to enhance the downstream classification of four distinct breast density categories in mammography following the BI-RADS recommendation in the Vietnamese cohort. The proposed method integrates supplementary information during transfer learning, utilizing visual attention maps derived from a vision transformer backbone trained using the self-supervised DINO method. These maps are utilized to erase background regions in the mammogram images, unveiling only the potential areas of dense breast tissues to the network. Through the incorporation of AGE during transfer learning with varying random probabilities, we consistently surpass classification performance compared to scenarios without AGE and the traditional random erasing transformation. We validate our methodology using the publicly available VinDr-Mammo dataset. Specifically, we attain a mean F1-score of 0.5910, outperforming values of 0.5594 and 0.5691 corresponding to scenarios without AGE and with random erasing (RE), respectively. This superiority is further substantiated by t-tests, revealing a p-value of p<0.0001, underscoring the statistical significance of our approach. △ Less

Submitted 8 January, 2024; originally announced January 2024.

arXiv:2312.13744 [pdf]

Modelling of Networked Measuring Systems -- From White-Box Models to Data Based Approaches

Authors: Klaus-Dieter Sommer, Peter Harris, Sascha Eichstädt, Roland Füssl, Tanja Dorst, Andreas Schütze, Michael Heizmann, Nadine Schiering, Andreas Maier, Yuhui Luo, Christos Tachtatzis, Ivan Andonovic, Gordon Gourlay

Abstract: Mathematical modelling is at the core of metrology as it transforms raw measured data into useful measurement results. A model captures the relationship between the measurand and all relevant quantities on which the measurand depends, and is used to design measuring systems, analyse measured data, make inferences and predictions, and is the basis for evaluating measurement uncertainties. Tradition… ▽ More Mathematical modelling is at the core of metrology as it transforms raw measured data into useful measurement results. A model captures the relationship between the measurand and all relevant quantities on which the measurand depends, and is used to design measuring systems, analyse measured data, make inferences and predictions, and is the basis for evaluating measurement uncertainties. Traditional modelling approaches are typically analytical, for example, based on principles of physics. But with the increasing use of digital technologies, large sensor networks and powerful computing hardware, these traditional approaches are being replaced more and more by data-driven methods. This paradigm shift holds true in particular for the digital future of measurement in all spheres of our lives and the environment, where data provided by large and complex interconnected systems of sensors are to be analysed. Additionally, there is a requirement for existing guidelines and standards in metrology to take the paradigm shift into account. In this paper we lay the foundation for the development from traditional to data-driven modelling approaches. We identify key aspects from traditional modelling approaches and discuss their transformation to data-driven modelling. △ Less

Submitted 21 December, 2023; originally announced December 2023.

arXiv:2312.08255 [pdf, other]

doi 10.1038/s41597-024-03182-7

OCTDL: Optical Coherence Tomography Dataset for Image-Based Deep Learning Methods

Authors: Mikhail Kulyabin, Aleksei Zhdanov, Anastasia Nikiforova, Andrey Stepichev, Anna Kuznetsova, Mikhail Ronkin, Vasilii Borisov, Alexander Bogachev, Sergey Korotkich, Paul A Constable, Andreas Maier

Abstract: Optical coherence tomography (OCT) is a non-invasive imaging technique with extensive clinical applications in ophthalmology. OCT enables the visualization of the retinal layers, playing a vital role in the early detection and monitoring of retinal diseases. OCT uses the principle of light wave interference to create detailed images of the retinal microstructures, making it a valuable tool for dia… ▽ More Optical coherence tomography (OCT) is a non-invasive imaging technique with extensive clinical applications in ophthalmology. OCT enables the visualization of the retinal layers, playing a vital role in the early detection and monitoring of retinal diseases. OCT uses the principle of light wave interference to create detailed images of the retinal microstructures, making it a valuable tool for diagnosing ocular conditions. This work presents an open-access OCT dataset (OCTDL) comprising over 2000 OCT images labeled according to disease group and retinal pathology. The dataset consists of OCT records of patients with Age-related Macular Degeneration (AMD), Diabetic Macular Edema (DME), Epiretinal Membrane (ERM), Retinal Artery Occlusion (RAO), Retinal Vein Occlusion (RVO), and Vitreomacular Interface Disease (VID). The images were acquired with an Optovue Avanti RTVue XR using raster scanning protocols with dynamic scan length and image resolution. Each retinal b-scan was acquired by centering on the fovea and interpreted and cataloged by an experienced retinal specialist. In this work, we applied Deep Learning classification techniques to this new open-access dataset. △ Less

Submitted 31 March, 2024; v1 submitted 13 December, 2023; originally announced December 2023.

arXiv:2306.14596 [pdf, other]

Deep Learning for Cancer Prognosis Prediction Using Portrait Photos by StyleGAN Embedding

Authors: Amr Hagag, Ahmed Gomaa, Dominik Kornek, Andreas Maier, Rainer Fietkau, Christoph Bert, Florian Putz, Yixing Huang

Abstract: Survival prediction for cancer patients is critical for optimal treatment selection and patient management. Current patient survival prediction methods typically extract survival information from patients' clinical record data or biological and imaging data. In practice, experienced clinicians can have a preliminary assessment of patients' health status based on patients' observable physical appea… ▽ More Survival prediction for cancer patients is critical for optimal treatment selection and patient management. Current patient survival prediction methods typically extract survival information from patients' clinical record data or biological and imaging data. In practice, experienced clinicians can have a preliminary assessment of patients' health status based on patients' observable physical appearances, which are mainly facial features. However, such assessment is highly subjective. In this work, the efficacy of objectively capturing and using prognostic information contained in conventional portrait photographs using deep learning for survival predication purposes is investigated for the first time. A pre-trained StyleGAN2 model is fine-tuned on a custom dataset of our cancer patients' photos to empower its generator with generative ability suitable for patients' photos. The StyleGAN2 is then used to embed the photographs to its highly expressive latent space. Utilizing the state-of-the-art survival analysis models and based on StyleGAN's latent space photo embeddings, this approach achieved a C-index of 0.677, which is notably higher than chance and evidencing the prognostic value embedded in simple 2D facial images. In addition, thanks to StyleGAN's interpretable latent space, our survival prediction model can be validated for relying on essential facial features, eliminating any biases from extraneous information like clothing or background. Moreover, a health attribute is obtained from regression coefficients, which has important potential value for patient care. △ Less

Submitted 28 June, 2023; v1 submitted 26 June, 2023; originally announced June 2023.

arXiv:2306.02901 [pdf, other]

doi 10.1007/978-3-658-41657-7_11

A Vessel-Segmentation-Based CycleGAN for Unpaired Multi-modal Retinal Image Synthesis

Authors: Aline Sindel, Andreas Maier, Vincent Christlein

Abstract: Unpaired image-to-image translation of retinal images can efficiently increase the training dataset for deep-learning-based multi-modal retinal registration methods. Our method integrates a vessel segmentation network into the image-to-image translation task by extending the CycleGAN framework. The segmentation network is inserted prior to a UNet vision transformer generator network and serves as… ▽ More Unpaired image-to-image translation of retinal images can efficiently increase the training dataset for deep-learning-based multi-modal retinal registration methods. Our method integrates a vessel segmentation network into the image-to-image translation task by extending the CycleGAN framework. The segmentation network is inserted prior to a UNet vision transformer generator network and serves as a shared representation between both domains. We reformulate the original identity loss to learn the direct map** between the vessel segmentation and the real image. Additionally, we add a segmentation loss term to ensure shared vessel locations between fake and real images. In the experiments, our method shows a visually realistic look and preserves the vessel structures, which is a prerequisite for generating multi-modal training data for image registration. △ Less

Submitted 5 June, 2023; originally announced June 2023.

Comments: Accepted to BVM 2023

Journal ref: BVM 2023

arXiv:2306.01752 [pdf, other]

Handling Label Uncertainty on the Example of Automatic Detection of Shepherd's Crook RCA in Coronary CT Angiography

Authors: Felix Denzinger, Michael Wels, Oliver Taubmann, Florian Kordon, Fabian Wagner, Stephanie Mehltretter, Mehmet A. Gülsün, Max Schöbinger, Florian André, Sebastian Buss, Johannes Görich, Michael Sühling, Andreas Maier

Abstract: Coronary artery disease (CAD) is often treated minimally invasively with a catheter being inserted into the diseased coronary vessel. If a patient exhibits a Shepherd's Crook (SC) Right Coronary Artery (RCA) - an anatomical norm variant of the coronary vasculature - the complexity of this procedure is increased. Automated reporting of this variant from coronary CT angiography screening would ease… ▽ More Coronary artery disease (CAD) is often treated minimally invasively with a catheter being inserted into the diseased coronary vessel. If a patient exhibits a Shepherd's Crook (SC) Right Coronary Artery (RCA) - an anatomical norm variant of the coronary vasculature - the complexity of this procedure is increased. Automated reporting of this variant from coronary CT angiography screening would ease prior risk assessment. We propose a 1D convolutional neural network which leverages a sequence of residual dilated convolutions to automatically determine this norm variant from a prior extracted vessel centerline. As the SC RCA is not clearly defined with respect to concrete measurements, labeling also includes qualitative aspects. Therefore, 4.23% samples in our dataset of 519 RCA centerlines were labeled as unsure SC RCAs, with 5.97% being labeled as sure SC RCAs. We explore measures to handle this label uncertainty, namely global/model-wise random assignment, exclusion, and soft label assignment. Furthermore, we evaluate how this uncertainty can be leveraged for the determination of a rejection class. With our best configuration, we reach an area under the receiver operating characteristic curve (AUC) of 0.938 on confident labels. Moreover, we observe an increase of up to 0.020 AUC when rejecting 10% of the data and leveraging the labeling uncertainty information in the exclusion process. △ Less

Submitted 22 May, 2023; originally announced June 2023.

Comments: Accepted at ISBI 2023

arXiv:2305.11284 [pdf, other]

doi 10.21437/Interspeech.2023-2108

Federated learning for secure development of AI models for Parkinson's disease detection using speech from different languages

Authors: Soroosh Tayebi Arasteh, Cristian David Rios-Urrego, Elmar Noeth, Andreas Maier, Seung Hee Yang, Jan Rusz, Juan Rafael Orozco-Arroyave

Abstract: Parkinson's disease (PD) is a neurological disorder impacting a person's speech. Among automatic PD assessment methods, deep learning models have gained particular interest. Recently, the community has explored cross-pathology and cross-language models which can improve diagnostic accuracy even further. However, strict patient data privacy regulations largely prevent institutions from sharing pati… ▽ More Parkinson's disease (PD) is a neurological disorder impacting a person's speech. Among automatic PD assessment methods, deep learning models have gained particular interest. Recently, the community has explored cross-pathology and cross-language models which can improve diagnostic accuracy even further. However, strict patient data privacy regulations largely prevent institutions from sharing patient speech data with each other. In this paper, we employ federated learning (FL) for PD detection using speech signals from 3 real-world language corpora of German, Spanish, and Czech, each from a separate institution. Our results indicate that the FL model outperforms all the local models in terms of diagnostic accuracy, while not performing very differently from the model based on centrally combined training sets, with the advantage of not requiring any data sharing among collaborators. This will simplify inter-institutional collaborations, resulting in enhancement of patient outcomes. △ Less

Submitted 21 August, 2023; v1 submitted 18 May, 2023; originally announced May 2023.

Comments: INTERSPEECH 2023, pp. 5003--5007, Dublin, Ireland

Journal ref: INTERSPEECH 2023

arXiv:2305.08227 [pdf, other]

DeepFilterNet: Perceptually Motivated Real-Time Speech Enhancement

Authors: Hendrik Schröter, Tobias Rosenkranz, Alberto N. Escalante-B., Andreas Maier

Abstract: Multi-frame algorithms for single-channel speech enhancement are able to take advantage from short-time correlations within the speech signal. Deep Filtering (DF) was proposed to directly estimate a complex filter in frequency domain to take advantage of these correlations. In this work, we present a real-time speech enhancement demo using DeepFilterNet. DeepFilterNet's efficiency is enabled by ex… ▽ More Multi-frame algorithms for single-channel speech enhancement are able to take advantage from short-time correlations within the speech signal. Deep Filtering (DF) was proposed to directly estimate a complex filter in frequency domain to take advantage of these correlations. In this work, we present a real-time speech enhancement demo using DeepFilterNet. DeepFilterNet's efficiency is enabled by exploiting domain knowledge of speech production and psychoacoustic perception. Our model is able to match state-of-the-art speech enhancement benchmarks while achieving a real-time-factor of 0.19 on a single threaded notebook CPU. The framework as well as pretrained weights have been published under an open source license. △ Less

Submitted 14 May, 2023; originally announced May 2023.

Comments: Accepted as show and tell demo to interspeech 2023

arXiv:2305.08225 [pdf, other]

Deep Multi-Frame Filtering for Hearing Aids

Authors: Hendrik Schröter, Tobias Rosenkranz, Alberto N. Escalante-B., Andreas Maier

Abstract: Multi-frame algorithms for single-channel speech enhancement are able to take advantage from short-time correlations within the speech signal. Deep filtering (DF) recently demonstrated its capabilities for low-latency scenarios like hearing aids with its complex multi-frame (MF) filter. Alternatively, the complex filter can be estimated via an MF minimum variance distortionless response (MVDR), or… ▽ More Multi-frame algorithms for single-channel speech enhancement are able to take advantage from short-time correlations within the speech signal. Deep filtering (DF) recently demonstrated its capabilities for low-latency scenarios like hearing aids with its complex multi-frame (MF) filter. Alternatively, the complex filter can be estimated via an MF minimum variance distortionless response (MVDR), or MF Wiener filter (WF). Previous studies have shown that incorporating algorithm domain knowledge using an MVDR filter might be beneficial compared to the direct filter estimation via DF. In this work, we compare the usage of various multi-frame filters such as DF, MF-MVDR, or MF-WF for HAs. We assess different covariance estimation methods for both MF-MVDR and MF-WF and objectively demonstrate an improved performance compared to direct DF estimation, significantly outperforming related work while improving the runtime performance. △ Less

Submitted 14 May, 2023; originally announced May 2023.

Comments: Submitted to Interspeech 2023

arXiv:2303.14711 [pdf, other]

Unsupervised detection of small hyperreflective features in ultrahigh resolution optical coherence tomography

Authors: Marcel Reimann, Jungeun Won, Hiroyuki Takahashi, Antonio Yaghy, Yunchan Hwang, Stefan Ploner, Junhong Lin, Jessica Girgis, Kenneth Lam, Siyu Chen, Nadia K. Waheed, Andreas Maier, James G. Fujimoto

Abstract: Recent advances in optical coherence tomography such as the development of high speed ultrahigh resolution scanners and corresponding signal processing techniques may reveal new potential biomarkers in retinal diseases. Newly visible features are, for example, small hyperreflective specks in age-related macular degeneration. Identifying these new markers is crucial to investigate potential associa… ▽ More Recent advances in optical coherence tomography such as the development of high speed ultrahigh resolution scanners and corresponding signal processing techniques may reveal new potential biomarkers in retinal diseases. Newly visible features are, for example, small hyperreflective specks in age-related macular degeneration. Identifying these new markers is crucial to investigate potential association with disease progression and treatment outcomes. Therefore, it is necessary to reliably detect these features in 3D volumetric scans. Because manual labeling of entire volumes is infeasible a need for automatic detection arises. Labeled datasets are often not publicly available and there are usually large variations in scan protocols and scanner types. Thus, this work focuses on an unsupervised approach that is based on local peak-detection and random walker segmentation to detect small features on each B-scan of the volume. △ Less

Submitted 26 March, 2023; originally announced March 2023.

Comments: Accepted as poster at BVM workshop 2023 (https://www.bvm-workshop.org/). The arXiv version provides full quality figures. 6 pages content (2 figures)

arXiv:2303.11724 [pdf, other]

Task-based Generation of Optimized Projection Sets using Differentiable Ranking

Authors: Linda-Sophie Schneider, Mareike Thies, Christopher Syben, Richard Schielein, Mathias Unberath, Andreas Maier

Abstract: We present a method for selecting valuable projections in computed tomography (CT) scans to enhance image reconstruction and diagnosis. The approach integrates two important factors, projection-based detectability and data completeness, into a single feed-forward neural network. The network evaluates the value of projections, processes them through a differentiable ranking function and makes the f… ▽ More We present a method for selecting valuable projections in computed tomography (CT) scans to enhance image reconstruction and diagnosis. The approach integrates two important factors, projection-based detectability and data completeness, into a single feed-forward neural network. The network evaluates the value of projections, processes them through a differentiable ranking function and makes the final selection using a straight-through estimator. Data completeness is ensured through the label provided during training. The approach eliminates the need for heuristically enforcing data completeness, which may exclude valuable projections. The method is evaluated on simulated data in a non-destructive testing scenario, where the aim is to maximize the reconstruction quality within a specified region of interest. We achieve comparable results to previous methods, laying the foundation for using reconstruction-based loss functions to learn the selection of projections. △ Less

Submitted 21 March, 2023; originally announced March 2023.

arXiv:2303.00500 [pdf, other]

Inherently Interpretable Multi-Label Classification Using Class-Specific Counterfactuals

Authors: Susu Sun, Stefano Woerner, Andreas Maier, Lisa M. Koch, Christian F. Baumgartner

Abstract: Interpretability is essential for machine learning algorithms in high-stakes application fields such as medical image analysis. However, high-performing black-box neural networks do not provide explanations for their predictions, which can lead to mistrust and suboptimal human-ML collaboration. Post-hoc explanation techniques, which are widely used in practice, have been shown to suffer from sever… ▽ More Interpretability is essential for machine learning algorithms in high-stakes application fields such as medical image analysis. However, high-performing black-box neural networks do not provide explanations for their predictions, which can lead to mistrust and suboptimal human-ML collaboration. Post-hoc explanation techniques, which are widely used in practice, have been shown to suffer from severe conceptual problems. Furthermore, as we show in this paper, current explanation techniques do not perform adequately in the multi-label scenario, in which multiple medical findings may co-occur in a single image. We propose Attri-Net, an inherently interpretable model for multi-label classification. Attri-Net is a powerful classifier that provides transparent, trustworthy, and human-understandable explanations. The model first generates class-specific attribution maps based on counterfactuals to identify which image regions correspond to certain medical findings. Then a simple logistic regression classifier is used to make predictions based solely on these attribution maps. We compare Attri-Net to five post-hoc explanation techniques and one inherently interpretable classifier on three chest X-ray datasets. We find that Attri-Net produces high-quality multi-label explanations consistent with clinical knowledge and has comparable classification performance to state-of-the-art classification models. △ Less

Submitted 8 August, 2023; v1 submitted 1 March, 2023; originally announced March 2023.

Comments: Accepted to MIDL 2023

arXiv:2302.11612 [pdf]

Retinal blood flow speed quantification at the capillary level using temporal autocorrelation fitting OCTA

Authors: Yunchan Hwang, Jungeun Won, Antonio Yaghy, Hiroyuki Takahashi, Jessica M. Girgis, Kenneth Lam, Siyu Chen, Eric M. Moult, Stefan B. Ploner, Andreas Maier, Nadia K. Waheed, James G. Fujimoto

Abstract: Optical coherence tomography angiography (OCTA) can visualize vasculature structures, but provides limited information about the blood flow speeds. Here, we present a second generation variable interscan time analysis (VISTA) OCTA, which evaluates a quantitative surrogate marker for blood flow speed in vasculature. At the capillary level, spatially compiled OCTA and a simple temporal autocorrelati… ▽ More Optical coherence tomography angiography (OCTA) can visualize vasculature structures, but provides limited information about the blood flow speeds. Here, we present a second generation variable interscan time analysis (VISTA) OCTA, which evaluates a quantitative surrogate marker for blood flow speed in vasculature. At the capillary level, spatially compiled OCTA and a simple temporal autocorrelation model, ρ(τ) = exp(-ατ), were used to evaluate a temporal autocorrelation decay constant, α, as the blood flow speed marker. A 600 kHz A-scan rate swept-source provides short interscan time OCTA and fine A-scan spacing acquisition, while maintaining multi mm2 field of views for human retinal imaging. We demonstrate the cardiac pulsatility and repeatability of α measured with VISTA. We show different α for different retinal capillary plexuses in healthy eyes and present representative VISTA OCTA of eyes with diabetic retinopathy. △ Less

Submitted 22 February, 2023; originally announced February 2023.

arXiv:2302.06251 [pdf, other]

Optimizing CT Scan Geometries With and Without Gradients

Authors: Mareike Thies, Fabian Wagner, Noah Maul, Laura Pfaff, Linda-Sophie Schneider, Christopher Syben, Andreas Maier

Abstract: In computed tomography (CT), the projection geometry used for data acquisition needs to be known precisely to obtain a clear reconstructed image. Rigid patient motion is a cause for misalignment between measured data and employed geometry. Commonly, such motion is compensated by solving an optimization problem that, e.g., maximizes the quality of the reconstructed image with respect to the project… ▽ More In computed tomography (CT), the projection geometry used for data acquisition needs to be known precisely to obtain a clear reconstructed image. Rigid patient motion is a cause for misalignment between measured data and employed geometry. Commonly, such motion is compensated by solving an optimization problem that, e.g., maximizes the quality of the reconstructed image with respect to the projection geometry. So far, gradient-free optimization algorithms have been utilized to find the solution for this problem. Here, we show that gradient-based optimization algorithms are a possible alternative and compare the performance to their gradient-free counterparts on a benchmark motion compensation problem. Gradient-based algorithms converge substantially faster while being comparable to gradient-free algorithms in terms of capture range and robustness to the number of free parameters. Hence, gradient-based optimization is a viable alternative for the given type of problems. △ Less

Submitted 13 February, 2023; originally announced February 2023.

arXiv:2301.09282 [pdf, other]

Classification of Luminal Subtypes in Full Mammogram Images Using Transfer Learning

Authors: Adarsh Bhandary Panambur, Prathmesh Madhu, Andreas Maier

Abstract: Automatic identification of patients with luminal and non-luminal subtypes during a routine mammography screening can support clinicians in streamlining breast cancer therapy planning. Recent machine learning techniques have shown promising results in molecular subtype classification in mammography; however, they are highly dependent on pixel-level annotations, handcrafted, and radiomic features.… ▽ More Automatic identification of patients with luminal and non-luminal subtypes during a routine mammography screening can support clinicians in streamlining breast cancer therapy planning. Recent machine learning techniques have shown promising results in molecular subtype classification in mammography; however, they are highly dependent on pixel-level annotations, handcrafted, and radiomic features. In this work, we provide initial insights into the luminal subtype classification in full mammogram images trained using only image-level labels. Transfer learning is applied from a breast abnormality classification task, to finetune a ResNet-18-based luminal versus non-luminal subtype classification task. We present and compare our results on the publicly available CMMD dataset and show that our approach significantly outperforms the baseline classifier by achieving a mean AUC score of 0.6688 and a mean F1 score of 0.6693 on the test dataset. The improvement over baseline is statistically significant, with a p-value of p<0.0001. △ Less

Submitted 23 January, 2023; originally announced January 2023.

Comments: Submitted to IEEE ISBI 2023

arXiv:2301.04423 [pdf, other]

Multi-Scanner Canine Cutaneous Squamous Cell Carcinoma Histopathology Dataset

Authors: Frauke Wilm, Marco Fragoso, Christof A. Bertram, Nikolas Stathonikos, Mathias Öttl, **gna Qiu, Robert Klopfleisch, Andreas Maier, Katharina Breininger, Marc Aubreville

Abstract: In histopathology, scanner-induced domain shifts are known to impede the performance of trained neural networks when tested on unseen data. Multi-domain pre-training or dedicated domain-generalization techniques can help to develop domain-agnostic algorithms. For this, multi-scanner datasets with a high variety of slide scanning systems are highly desirable. We present a publicly available multi-s… ▽ More In histopathology, scanner-induced domain shifts are known to impede the performance of trained neural networks when tested on unseen data. Multi-domain pre-training or dedicated domain-generalization techniques can help to develop domain-agnostic algorithms. For this, multi-scanner datasets with a high variety of slide scanning systems are highly desirable. We present a publicly available multi-scanner dataset of canine cutaneous squamous cell carcinoma histopathology images, composed of 44 samples digitized with five slide scanners. This dataset provides local correspondences between images and thereby isolates the scanner-induced domain shift from other inherent, e.g. morphology-induced domain shifts. To highlight scanner differences, we present a detailed evaluation of color distributions, sharpness, and contrast of the individual scanner subsets. Additionally, to quantify the inherent scanner-induced domain shift, we train a tumor segmentation network on each scanner subset and evaluate the performance both in- and cross-domain. We achieve a class-averaged in-domain intersection over union coefficient of up to 0.86 and observe a cross-domain performance decrease of up to 0.38, which confirms the inherent domain shift of the presented dataset and its negative impact on the performance of deep neural networks. △ Less

Submitted 27 February, 2023; v1 submitted 11 January, 2023; originally announced January 2023.

Comments: 6 pages, 3 figures, 1 table, accepted at BVM workshop 2023

arXiv:2212.04832 [pdf, other]

Noise2Contrast: Multi-Contrast Fusion Enables Self-Supervised Tomographic Image Denoising

Authors: Fabian Wagner, Mareike Thies, Laura Pfaff, Noah Maul, Sabrina Pechmann, Mingxuan Gu, Jonas Utz, Oliver Aust, Daniela Weidner, Georgiana Neag, Stefan Uderhardt, Jang-Hwan Choi, Andreas Maier

Abstract: Self-supervised image denoising techniques emerged as convenient methods that allow training denoising models without requiring ground-truth noise-free data. Existing methods usually optimize loss metrics that are calculated from multiple noisy realizations of similar images, e.g., from neighboring tomographic slices. However, those approaches fail to utilize the multiple contrasts that are routin… ▽ More Self-supervised image denoising techniques emerged as convenient methods that allow training denoising models without requiring ground-truth noise-free data. Existing methods usually optimize loss metrics that are calculated from multiple noisy realizations of similar images, e.g., from neighboring tomographic slices. However, those approaches fail to utilize the multiple contrasts that are routinely acquired in medical imaging modalities like MRI or dual-energy CT. In this work, we propose the new self-supervised training scheme Noise2Contrast that combines information from multiple measured image contrasts to train a denoising model. We stack denoising with domain-transfer operators to utilize the independent noise realizations of different image contrasts to derive a self-supervised loss. The trained denoising operator achieves convincing quantitative and qualitative results, outperforming state-of-the-art self-supervised methods by 4.7-11.0%/4.8-7.3% (PSNR/SSIM) on brain MRI data and by 43.6-50.5%/57.1-77.1% (PSNR/SSIM) on dual-energy CT X-ray microscopy data with respect to the noisy baseline. Our experiments on different real measured data sets indicate that Noise2Contrast training generalizes to other multi-contrast imaging modalities. △ Less

Submitted 9 December, 2022; originally announced December 2022.

arXiv:2212.02177 [pdf, other]

doi 10.1088/1361-6560/acf90e

Gradient-Based Geometry Learning for Fan-Beam CT Reconstruction

Authors: Mareike Thies, Fabian Wagner, Noah Maul, Lukas Folle, Manuela Meier, Maximilian Rohleder, Linda-Sophie Schneider, Laura Pfaff, Mingxuan Gu, Jonas Utz, Felix Denzinger, Michael Manhart, Andreas Maier

Abstract: Incorporating computed tomography (CT) reconstruction operators into differentiable pipelines has proven beneficial in many applications. Such approaches usually focus on the projection data and keep the acquisition geometry fixed. However, precise knowledge of the acquisition geometry is essential for high quality reconstruction results. In this paper, the differentiable formulation of fan-beam C… ▽ More Incorporating computed tomography (CT) reconstruction operators into differentiable pipelines has proven beneficial in many applications. Such approaches usually focus on the projection data and keep the acquisition geometry fixed. However, precise knowledge of the acquisition geometry is essential for high quality reconstruction results. In this paper, the differentiable formulation of fan-beam CT reconstruction is extended to the acquisition geometry. This allows to propagate gradient information from a loss function on the reconstructed image into the geometry parameters. As a proof-of-concept experiment, this idea is applied to rigid motion compensation. The cost function is parameterized by a trained neural network which regresses an image quality metric from the motion affected reconstruction alone. Using the proposed method, we are the first to optimize such an autofocus-inspired algorithm based on analytical gradients. The algorithm achieves a reduction in MSE by 35.5 % and an improvement in SSIM by 12.6 % over the motion affected reconstruction. Next to motion compensation, we see further use cases of our differentiable method for scanner calibration or hybrid techniques employing deep models. △ Less

Submitted 5 December, 2022; originally announced December 2022.

arXiv:2211.16219 [pdf, other]

Metal-conscious Embedding for CBCT Projection Inpainting

Authors: Fuxin Fan, Yangkong Wang, Ludwig Ritschl, Ramyar Biniazan, Marcel Beister, Björn Kreher, Yixing Huang, Steffen Kappler, Andreas Maier

Abstract: The existence of metallic implants in projection images for cone-beam computed tomography (CBCT) introduces undesired artifacts which degrade the quality of reconstructed images. In order to reduce metal artifacts, projection inpainting is an essential step in many metal artifact reduction algorithms. In this work, a hybrid network combining the shift window (Swin) vision transformer (ViT) and a c… ▽ More The existence of metallic implants in projection images for cone-beam computed tomography (CBCT) introduces undesired artifacts which degrade the quality of reconstructed images. In order to reduce metal artifacts, projection inpainting is an essential step in many metal artifact reduction algorithms. In this work, a hybrid network combining the shift window (Swin) vision transformer (ViT) and a convolutional neural network is proposed as a baseline network for the inpainting task. To incorporate metal information for the Swin ViT-based encoder, metal-conscious self-embedding and neighborhood-embedding methods are investigated. Both methods have improved the performance of the baseline network. Furthermore, by choosing appropriate window size, the model with neighborhood-embedding could achieve the lowest mean absolute error of 0.079 in metal regions and the highest peak signal-to-noise ratio of 42.346 in CBCT projections. At the end, the efficiency of metal-conscious embedding on both simulated and real cadaver CBCT data has been demonstrated, where the inpainting capability of the baseline network has been enhanced. △ Less

Submitted 29 November, 2022; originally announced November 2022.

arXiv:2211.16141 [pdf, other]

Mind the Gap: Scanner-induced domain shifts pose challenges for representation learning in histopathology

Authors: Frauke Wilm, Marco Fragoso, Christof A. Bertram, Nikolas Stathonikos, Mathias Öttl, **gna Qiu, Robert Klopfleisch, Andreas Maier, Marc Aubreville, Katharina Breininger

Abstract: Computer-aided systems in histopathology are often challenged by various sources of domain shift that impact the performance of these algorithms considerably. We investigated the potential of using self-supervised pre-training to overcome scanner-induced domain shifts for the downstream task of tumor segmentation. For this, we present the Barlow Triplets to learn scanner-invariant representations… ▽ More Computer-aided systems in histopathology are often challenged by various sources of domain shift that impact the performance of these algorithms considerably. We investigated the potential of using self-supervised pre-training to overcome scanner-induced domain shifts for the downstream task of tumor segmentation. For this, we present the Barlow Triplets to learn scanner-invariant representations from a multi-scanner dataset with local image correspondences. We show that self-supervised pre-training successfully aligned different scanner representations, which, interestingly only results in a limited benefit for our downstream task. We thereby provide insights into the influence of scanner characteristics for downstream applications and contribute to a better understanding of why established self-supervised methods have not yet shown the same success on histopathology data as they have for natural images. △ Less

Submitted 29 November, 2022; originally announced November 2022.

Comments: 5 pages, 4 figures, 1 table. This work has been submitted to the IEEE for possible publication. Copyright may be transferred without notice, after which this version may no longer be accessible

arXiv:2211.06150 [pdf, other]

doi 10.1109/ISBI53787.2023.10230503

Improved HER2 Tumor Segmentation with Subtype Balancing using Deep Generative Networks

Authors: Mathias Öttl, Jana Mönius, Matthias Rübner, Carol I. Geppert, **gna Qiu, Frauke Wilm, Arndt Hartmann, Matthias W. Beckmann, Peter A. Fasching, Andreas Maier, Ramona Erber, Katharina Breininger

Abstract: Tumor segmentation in histopathology images is often complicated by its composition of different histological subtypes and class imbalance. Oversampling subtypes with low prevalence features is not a satisfactory solution since it eventually leads to overfitting. We propose to create synthetic images with semantically-conditioned deep generative networks and to combine subtype-balanced synthetic i… ▽ More Tumor segmentation in histopathology images is often complicated by its composition of different histological subtypes and class imbalance. Oversampling subtypes with low prevalence features is not a satisfactory solution since it eventually leads to overfitting. We propose to create synthetic images with semantically-conditioned deep generative networks and to combine subtype-balanced synthetic images with the original dataset to achieve better segmentation performance. We show the suitability of Generative Adversarial Networks (GANs) and especially diffusion models to create realistic images based on subtype-conditioning for the use case of HER2-stained histopathology. Additionally, we show the capability of diffusion models to conditionally inpaint HER2 tumor areas with modified subtypes. Combining the original dataset with the same amount of diffusion-generated images increased the tumor Dice score from 0.833 to 0.854 and almost halved the variance between the HER2 subtype recalls. These results create the basis for more reliable automatic HER2 analysis with lower performance variance between individual HER2 subtypes. △ Less

Submitted 11 November, 2022; originally announced November 2022.

Comments: 5 pages, 6 figures

arXiv:2211.06146 [pdf, other]

An unobtrusive quality supervision approach for medical image annotation

Authors: Sonja Kunzmann, Mathias Öttl, Prathmesh Madhu, Felix Denzinger, Andreas Maier

Abstract: Image annotation is one essential prior step to enable data-driven algorithms. In medical imaging, having large and reliably annotated data sets is crucial to recognize various diseases robustly. However, annotator performance varies immensely, thus impacts model training. Therefore, often multiple annotators should be employed, which is however expensive and resource-intensive. Hence, it is desir… ▽ More Image annotation is one essential prior step to enable data-driven algorithms. In medical imaging, having large and reliably annotated data sets is crucial to recognize various diseases robustly. However, annotator performance varies immensely, thus impacts model training. Therefore, often multiple annotators should be employed, which is however expensive and resource-intensive. Hence, it is desirable that users should annotate unseen data and have an automated system to unobtrusively rate their performance during this process. We examine such a system based on whole slide images (WSIs) showing lung fluid cells. We evaluate two methods the generation of synthetic individual cell images: conditional Generative Adversarial Networks and Diffusion Models (DM). For qualitative and quantitative evaluation, we conduct a user study to highlight the suitability of generated cells. Users could not detect 52.12% of generated images by DM proofing the feasibility to replace the original cells with synthetic cells without being noticed. △ Less

Submitted 22 November, 2022; v1 submitted 11 November, 2022; originally announced November 2022.

Comments: 4 pages, 4 figures

arXiv:2211.01323 [pdf, other]

Generation of Anonymous Chest Radiographs Using Latent Diffusion Models for Training Thoracic Abnormality Classification Systems

Authors: Kai Packhäuser, Lukas Folle, Florian Thamm, Andreas Maier

Abstract: The availability of large-scale chest X-ray datasets is a requirement for develo** well-performing deep learning-based algorithms in thoracic abnormality detection and classification. However, biometric identifiers in chest radiographs hinder the public sharing of such data for research purposes due to the risk of patient re-identification. To counteract this issue, synthetic data generation off… ▽ More The availability of large-scale chest X-ray datasets is a requirement for develo** well-performing deep learning-based algorithms in thoracic abnormality detection and classification. However, biometric identifiers in chest radiographs hinder the public sharing of such data for research purposes due to the risk of patient re-identification. To counteract this issue, synthetic data generation offers a solution for anonymizing medical images. This work employs a latent diffusion model to synthesize an anonymous chest X-ray dataset of high-quality class-conditional images. We propose a privacy-enhancing sampling strategy to ensure the non-transference of biometric information during the image generation process. The quality of the generated images and the feasibility of serving as exclusive training data are evaluated on a thoracic abnormality classification task. Compared to a real classifier, we achieve competitive results with a performance gap of only 3.5% in the area under the receiver operating characteristic curve. △ Less

Submitted 4 November, 2022; v1 submitted 2 November, 2022; originally announced November 2022.

Comments: This work has been submitted to the IEEE for possible publication. Copyright may be transferred without notice, after which this version may no longer be accessible

arXiv:2211.01111 [pdf, other]

On the Benefit of Dual-domain Denoising in a Self-supervised Low-dose CT Setting

Authors: Fabian Wagner, Mareike Thies, Laura Pfaff, Oliver Aust, Sabrina Pechmann, Daniela Weidner, Noah Maul, Maximilian Rohleder, Mingxuan Gu, Jonas Utz, Felix Denzinger, Andreas Maier

Abstract: Computed tomography (CT) is routinely used for three-dimensional non-invasive imaging. Numerous data-driven image denoising algorithms were proposed to restore image quality in low-dose acquisitions. However, considerably less research investigates methods already intervening in the raw detector data due to limited access to suitable projection data or correct reconstruction algorithms. In this wo… ▽ More Computed tomography (CT) is routinely used for three-dimensional non-invasive imaging. Numerous data-driven image denoising algorithms were proposed to restore image quality in low-dose acquisitions. However, considerably less research investigates methods already intervening in the raw detector data due to limited access to suitable projection data or correct reconstruction algorithms. In this work, we present an end-to-end trainable CT reconstruction pipeline that contains denoising operators in both the projection and the image domain and that are optimized simultaneously without requiring ground-truth high-dose CT data. Our experiments demonstrate that including an additional projection denoising operator improved the overall denoising performance by 82.4-94.1%/12.5-41.7% (PSNR/SSIM) on abdomen CT and 1.5-2.9%/0.4-0.5% (PSNR/SSIM) on XRM data relative to the low-dose baseline. We make our entire helical CT reconstruction framework publicly available that contains a raw projection rebinning step to render helical projection data suitable for differentiable fan-beam reconstruction operators and end-to-end learning. △ Less

Submitted 3 November, 2022; v1 submitted 2 November, 2022; originally announced November 2022.

Comments: This work has been submitted to the IEEE for possible publication. Copyright may be transferred without notice, after which this version may no longer be accessible

arXiv:2210.07611 [pdf, other]

Self-Supervised 2D/3D Registration for X-Ray to CT Image Fusion

Authors: Srikrishna Jaganathan, Maximilian Kukla, Jian Wang, Karthik Shetty, Andreas Maier

Abstract: Deep Learning-based 2D/3D registration enables fast, robust, and accurate X-ray to CT image fusion when large annotated paired datasets are available for training. However, the need for paired CT volume and X-ray images with ground truth registration limits the applicability in interventional scenarios. An alternative is to use simulated X-ray projections from CT volumes, thus removing the need fo… ▽ More Deep Learning-based 2D/3D registration enables fast, robust, and accurate X-ray to CT image fusion when large annotated paired datasets are available for training. However, the need for paired CT volume and X-ray images with ground truth registration limits the applicability in interventional scenarios. An alternative is to use simulated X-ray projections from CT volumes, thus removing the need for paired annotated datasets. Deep Neural Networks trained exclusively on simulated X-ray projections can perform significantly worse on real X-ray images due to the domain gap. We propose a self-supervised 2D/3D registration framework combining simulated training with unsupervised feature and pixel space domain adaptation to overcome the domain gap and eliminate the need for paired annotated datasets. Our framework achieves a registration accuracy of 1.83$\pm$1.16 mm with a high success ratio of 90.1% on real X-ray images showing a 23.9% increase in success ratio compared to reference annotation-free algorithms. △ Less

Submitted 14 October, 2022; originally announced October 2022.

Comments: Accepted at WACV 2023

arXiv:2209.14970 [pdf, other]

doi 10.1109/ISSCS52333.2021.9497438

3D Rendering Framework for Data Augmentation in Optical Character Recognition

Authors: Andreas Spruck, Maximiliane Hawesch, Anatol Maier, Christian Riess, Jürgen Seiler, André Kaup

Abstract: In this paper, we propose a data augmentation framework for Optical Character Recognition (OCR). The proposed framework is able to synthesize new viewing angles and illumination scenarios, effectively enriching any available OCR dataset. Its modular structure allows to be modified to match individual user requirements. The framework enables to comfortably scale the enlargement factor of the availa… ▽ More In this paper, we propose a data augmentation framework for Optical Character Recognition (OCR). The proposed framework is able to synthesize new viewing angles and illumination scenarios, effectively enriching any available OCR dataset. Its modular structure allows to be modified to match individual user requirements. The framework enables to comfortably scale the enlargement factor of the available dataset. Furthermore, the proposed method is not restricted to single frame OCR but can also be applied to video OCR. We demonstrate the performance of our framework by augmenting a 15% subset of the common Brno Mobile OCR dataset. Our proposed framework is capable of leveraging the performance of OCR applications especially for small datasets. Applying the proposed method, improvements of up to 2.79 percentage points in terms of Character Error Rate (CER), and up to 7.88 percentage points in terms of Word Error Rate (WER) are achieved on the subset. Especially the recognition of challenging text lines can be improved. The CER may be decreased by up to 14.92 percentage points and the WER by up to 18.19 percentage points for this class. Moreover, we are able to achieve smaller error rates when training on the 15% subset augmented with the proposed method than on the original non-augmented full dataset. △ Less

Submitted 27 September, 2022; originally announced September 2022.

Comments: IEEE International Symposium on Signals, Circuits and Systems (ISSCS), 1-4, July 2021

arXiv:2209.14448 [pdf, other]

Synthesizing Annotated Image and Video Data Using a Rendering-Based Pipeline for Improved License Plate Recognition

Authors: Andreas Spruck, Maximilane Gruber, Anatol Maier, Denise Moussa, Jürgen Seiler, Christian Riess, André Kaup

Abstract: An insufficient number of training samples is a common problem in neural network applications. While data augmentation methods require at least a minimum number of samples, we propose a novel, rendering-based pipeline for synthesizing annotated data sets. Our method does not modify existing samples but synthesizes entirely new samples. The proposed rendering-based pipeline is capable of generating… ▽ More An insufficient number of training samples is a common problem in neural network applications. While data augmentation methods require at least a minimum number of samples, we propose a novel, rendering-based pipeline for synthesizing annotated data sets. Our method does not modify existing samples but synthesizes entirely new samples. The proposed rendering-based pipeline is capable of generating and annotating synthetic and partly-real image and video data in a fully automatic procedure. Moreover, the pipeline can aid the acquisition of real data. The proposed pipeline is based on a rendering process. This process generates synthetic data. Partly-real data bring the synthetic sequences closer to reality by incorporating real cameras during the acquisition process. The benefits of the proposed data generation pipeline, especially for machine learning scenarios with limited available training data, are demonstrated by an extensive experimental validation in the context of automatic license plate recognition. The experiments demonstrate a significant reduction of the character error rate and miss rate from 73.74% and 100% to 14.11% and 41.27% respectively, compared to an OCR algorithm trained on a real data set solely. These improvements are achieved by training the algorithm on synthesized data solely. When additionally incorporating real data, the error rates can be decreased further. Thereby, the character error rate and miss rate can be reduced to 11.90% and 39.88% respectively. All data used during the experiments as well as the proposed rendering-based pipeline for the automated data generation is made publicly available under (URL will be revealed upon publication). △ Less

Submitted 28 September, 2022; originally announced September 2022.

Comments: submitted to IEEE Transactions on Intelligent Transportation Systems

arXiv:2209.11531 [pdf, other]

Deep Learning-based Anonymization of Chest Radiographs: A Utility-preserving Measure for Patient Privacy

Authors: Kai Packhäuser, Sebastian Gündel, Florian Thamm, Felix Denzinger, Andreas Maier

Abstract: Robust and reliable anonymization of chest radiographs constitutes an essential step before publishing large datasets of such for research purposes. The conventional anonymization process is carried out by obscuring personal information in the images with black boxes and removing or replacing meta-information. However, such simple measures retain biometric information in the chest radiographs, all… ▽ More Robust and reliable anonymization of chest radiographs constitutes an essential step before publishing large datasets of such for research purposes. The conventional anonymization process is carried out by obscuring personal information in the images with black boxes and removing or replacing meta-information. However, such simple measures retain biometric information in the chest radiographs, allowing patients to be re-identified by a linkage attack. Therefore, there is an urgent need to obfuscate the biometric information appearing in the images. We propose the first deep learning-based approach (PriCheXy-Net) to targetedly anonymize chest radiographs while maintaining data utility for diagnostic and machine learning purposes. Our model architecture is a composition of three independent neural networks that, when collectively used, allow for learning a deformation field that is able to impede patient re-identification. Quantitative results on the ChestX-ray14 dataset show a reduction of patient re-identification from 81.8% to 57.7% (AUC) after re-training with little impact on the abnormality classification performance. This indicates the ability to preserve underlying abnormality patterns while increasing patient privacy. Lastly, we compare our proposed anonymization approach with two other obfuscation-based methods (Privacy-Net, DP-Pix) and demonstrate the superiority of our method towards resolving the privacy-utility trade-off for chest radiographs. △ Less

Submitted 24 July, 2023; v1 submitted 23 September, 2022; originally announced September 2022.

Comments: Accepted at MICCAI 2023

arXiv:2209.09733 [pdf, other]

Metal Inpainting in CBCT Projections Using Score-based Generative Model

Authors: Siyuan Mei, Fuxin Fan, Andreas Maier

Abstract: During orthopaedic surgery, the inserting of metallic implants or screws are often performed under mobile C-arm systems. Due to the high attenuation of metals, severe metal artifacts occur in 3D reconstructions, which degrade the image quality greatly. To reduce the artifacts, many metal artifact reduction algorithms have been developed and metal inpainting in projection domain is an essential ste… ▽ More During orthopaedic surgery, the inserting of metallic implants or screws are often performed under mobile C-arm systems. Due to the high attenuation of metals, severe metal artifacts occur in 3D reconstructions, which degrade the image quality greatly. To reduce the artifacts, many metal artifact reduction algorithms have been developed and metal inpainting in projection domain is an essential step. In this work, a score-based generative model is trained on simulated knee projections and the inpainted image is obtained by removing the noise in conditional resampling process. The result implies that the inpainted images by score-based generative model have more detailed information and achieve the lowest mean absolute error and the highest peak-signal-to-noise-ratio compared with interpolation and CNN based method. Besides, the score-based model can also recover projections with big circlar and rectangular masks, showing its generalization in inpainting task. △ Less

Submitted 20 September, 2022; originally announced September 2022.

arXiv:2209.07232 [pdf, other]

A Spatiotemporal Model for Precise and Efficient Fully-automatic 3D Motion Correction in OCT

Authors: Stefan Ploner, Siyu Chen, Jungeun Won, Lennart Husvogt, Katharina Breininger, Julia Schottenhamml, James Fujimoto, Andreas Maier

Abstract: Optical coherence tomography (OCT) is a micrometer-scale, volumetric imaging modality that has become a clinical standard in ophthalmology. OCT instruments image by raster-scanning a focused light spot across the retina, acquiring sequential cross-sectional images to generate volumetric data. Patient eye motion during the acquisition poses unique challenges: Non-rigid, discontinuous distortions ca… ▽ More Optical coherence tomography (OCT) is a micrometer-scale, volumetric imaging modality that has become a clinical standard in ophthalmology. OCT instruments image by raster-scanning a focused light spot across the retina, acquiring sequential cross-sectional images to generate volumetric data. Patient eye motion during the acquisition poses unique challenges: Non-rigid, discontinuous distortions can occur, leading to gaps in data and distorted topographic measurements. We present a new distortion model and a corresponding fully-automatic, reference-free optimization strategy for computational motion correction in orthogonally raster-scanned, retinal OCT volumes. Using a novel, domain-specific spatiotemporal parametrization of forward-war** displacements, eye motion can be corrected continuously for the first time. Parameter estimation with temporal regularization improves robustness and accuracy over previous spatial approaches. We correct each A-scan individually in 3D in a single map**, including repeated acquisitions used in OCT angiography protocols. Specialized 3D forward image war** reduces median runtime to < 9 s, fast enough for clinical use. We present a quantitative evaluation on 18 subjects with ocular pathology and demonstrate accurate correction during microsaccades. Transverse correction is limited only by ocular tremor, whereas submicron repeatability is achieved axially (0.51 um median of medians), representing a dramatic improvement over previous work. This allows assessing longitudinal changes in focal retinal pathologies as a marker of disease progression or treatment response, and promises to enable multiple new capabilities such as supersampled/super-resolution volume reconstruction and analysis of pathological eye motion occuring in neurological diseases. △ Less

Submitted 15 September, 2022; originally announced September 2022.

Comments: Presented at MICCAI 2022 (main conference). The arXiv version provides full quality figures. 9 pages content (5 figures) + 2 pages references + 2 pages supplementary material (2 figures)

arXiv:2208.13224 [pdf]

doi 10.3389/fonc.2023.1115258

Deep learning for automatic head and neck lymph node level delineation provides expert-level accuracy

Authors: Thomas Weissmann, Yixing Huang, Stefan Fischer, Johannes Roesch, Sina Mansoorian, Horacio Ayala Gaona, Antoniu-Oreste Gostian, Markus Hecht, Sebastian Lettmaier, Lisa Deloch, Benjamin Frey, Udo S. Gaipl, Luitpold V. Distel, Andreas Maier, Heinrich Iro, Sabine Semrau, Christoph Bert, Rainer Fietkau, Florian Putz

Abstract: Background: Deep learning (DL)-based head and neck lymph node level (HN_LNL) autodelineation is of high relevance to radiotherapy research and clinical treatment planning but still underinvestigated in academic literature. Methods: An expert-delineated cohort of 35 planning CTs was used for training of an nnU-net 3D-fullres/2D-ensemble model for autosegmentation of 20 different HN_LNL. A second co… ▽ More Background: Deep learning (DL)-based head and neck lymph node level (HN_LNL) autodelineation is of high relevance to radiotherapy research and clinical treatment planning but still underinvestigated in academic literature. Methods: An expert-delineated cohort of 35 planning CTs was used for training of an nnU-net 3D-fullres/2D-ensemble model for autosegmentation of 20 different HN_LNL. A second cohort acquired at the same institution later in time served as the test set (n=20). In a completely blinded evaluation, 3 clinical experts rated the quality of DL autosegmentations in a head-to-head comparison with expert-created contours. For a subgroup of 10 cases, intraobserver variability was compared to the average DL autosegmentation accuracy on the original and recontoured set of expert segmentations. A postprocessing step to adjust craniocaudal boundaries of level autosegmentations to the CT slice plane was introduced and the effect on geometric accuracy and expert rating was investigated. Results: Blinded expert ratings for DL segmentations and expert-created contours were not significantly different. DL segmentations with slice plane adjustment were rated numerically higher (mean, 81.0 vs. 79.6,p=0.185) and DL segmentations without slice plane adjustment were rated numerically lower (77.2 vs. 79.6,p=0.167) than manually drawn contours. DL segmentations with CT slice plane adjustment were rated significantly better than DL contours without slice plane adjustment (81.0 vs. 77.2,p=0.004). Geometric accuracy of DL segmentations was not different from intraobserver variability (mean, 0.76 vs. 0.77, p=0.307). Conclusions: We show that a nnU-net 3D-fullres/2D-ensemble model can be used for highly accurate autodelineation of HN_LNL using only a limited training dataset that is ideally suited for large-scale standardized autodelineation of HN_LNL in the research setting. △ Less

Submitted 1 March, 2023; v1 submitted 28 August, 2022; originally announced August 2022.

Comments: 14 pages, 6 figures, published in Frontiers in Oncology

Journal ref: Front. Oncol. 13:1115258

arXiv:2207.14650 [pdf]

SYNTA: A novel approach for deep learning-based image analysis in muscle histopathology using photo-realistic synthetic data

Authors: Leonid Mill, Oliver Aust, Jochen A. Ackermann, Philipp Burger, Monica Pascual, Katrin Palumbo-Zerr, Gerhard Krönke, Stefan Uderhardt, Georg Schett, Christoph S. Clemen, Rolf Schröder, Christian Holtzhausen, Samir Jabari, Andreas Maier, Anika Grüneboom

Abstract: Artificial intelligence (AI), machine learning, and deep learning (DL) methods are becoming increasingly important in the field of biomedical image analysis. However, to exploit the full potential of such methods, a representative number of experimentally acquired images containing a significant number of manually annotated objects is needed as training data. Here we introduce SYNTA (synthetic dat… ▽ More Artificial intelligence (AI), machine learning, and deep learning (DL) methods are becoming increasingly important in the field of biomedical image analysis. However, to exploit the full potential of such methods, a representative number of experimentally acquired images containing a significant number of manually annotated objects is needed as training data. Here we introduce SYNTA (synthetic data) as a novel approach for the generation of synthetic, photo-realistic, and highly complex biomedical images as training data for DL systems. We show the versatility of our approach in the context of muscle fiber and connective tissue analysis in histological sections. We demonstrate that it is possible to perform robust and expert-level segmentation tasks on previously unseen real-world data, without the need for manual annotations using synthetic training data alone. Being a fully parametric technique, our approach poses an interpretable and controllable alternative to Generative Adversarial Networks (GANs) and has the potential to significantly accelerate quantitative image analysis in a variety of biomedical applications in microscopy and beyond. △ Less

Submitted 3 January, 2024; v1 submitted 29 July, 2022; originally announced July 2022.

arXiv:2207.07368 [pdf, other]

doi 10.1038/s41598-022-22530-4

Trainable Joint Bilateral Filters for Enhanced Prediction Stability in Low-dose CT

Authors: Fabian Wagner, Mareike Thies, Felix Denzinger, Mingxuan Gu, Mayank Patwari, Stefan Ploner, Noah Maul, Laura Pfaff, Yixing Huang, Andreas Maier

Abstract: Low-dose computed tomography (CT) denoising algorithms aim to enable reduced patient dose in routine CT acquisitions while maintaining high image quality. Recently, deep learning~(DL)-based methods were introduced, outperforming conventional denoising algorithms on this task due to their high model capacity. However, for the transition of DL-based denoising to clinical practice, these data-driven… ▽ More Low-dose computed tomography (CT) denoising algorithms aim to enable reduced patient dose in routine CT acquisitions while maintaining high image quality. Recently, deep learning~(DL)-based methods were introduced, outperforming conventional denoising algorithms on this task due to their high model capacity. However, for the transition of DL-based denoising to clinical practice, these data-driven approaches must generalize robustly beyond the seen training data. We, therefore, propose a hybrid denoising approach consisting of a set of trainable joint bilateral filters (JBFs) combined with a convolutional DL-based denoising network to predict the guidance image. Our proposed denoising pipeline combines the high model capacity enabled by DL-based feature extraction with the reliability of the conventional JBF. The pipeline's ability to generalize is demonstrated by training on abdomen CT scans without metal implants and testing on abdomen scans with metal implants as well as on head CT data. When embedding two well-established DL-based denoisers (RED-CNN/QAE) in our pipeline, the denoising performance is improved by $10\,\%$/$82\,\%$ (RMSE) and $3\,\%$/$81\,\%$ (PSNR) in regions containing metal and by $6\,\%$/$78\,\%$ (RMSE) and $2\,\%$/$4\,\%$ (PSNR) on head CT data, compared to the respective vanilla model. Concluding, the proposed trainable JBFs limit the error bound of deep neural networks to facilitate the applicability of DL-based denoisers in low-dose CT pipelines. △ Less

Submitted 15 July, 2022; originally announced July 2022.

Journal ref: Sci.Rep. 12 (2022) 17540

arXiv:2207.02392 [pdf, other]

AutoSpeed: A Linked Autoencoder Approach for Pulse-Echo Speed-of-Sound Imaging for Medical Ultrasound

Authors: Farnaz Khun Jush, Markus Biele, Peter M. Dueppenbecker, Andreas Maier

Abstract: Quantitative ultrasound, e.g., speed-of-sound (SoS) in tissues, provides information about tissue properties that have diagnostic value. Recent studies showed the possibility of extracting SoS information from pulse-echo ultrasound raw data (a.k.a. RF data) using deep neural networks that are fully trained on simulated data. These methods take sensor domain data, i.e., RF data, as input and train… ▽ More Quantitative ultrasound, e.g., speed-of-sound (SoS) in tissues, provides information about tissue properties that have diagnostic value. Recent studies showed the possibility of extracting SoS information from pulse-echo ultrasound raw data (a.k.a. RF data) using deep neural networks that are fully trained on simulated data. These methods take sensor domain data, i.e., RF data, as input and train a network in an end-to-end fashion to learn the implicit map** between the RF data domain and SoS domain. However, such networks are prone to overfitting to simulated data which results in poor performance and instability when tested on measured data. We propose a novel method for SoS map** employing learned representations from two linked autoencoders. We test our approach on simulated and measured data acquired from human breast mimicking phantoms. We show that SoS map** is possible using linked autoencoders. The proposed method has a Mean Absolute Percentage Error (MAPE) of 2.39% on the simulated data. On the measured data, the predictions of the proposed method are close to the expected values with MAPE of 1.1%. Compared to an end-to-end trained network, the proposed method shows higher stability and reproducibility. △ Less

Submitted 4 July, 2022; originally announced July 2022.

Comments: 12 pages, 7 figures, submitted to Medical Image Analysis

arXiv:2206.12320 [pdf, other]

PoCaP Corpus: A Multimodal Dataset for Smart Operating Room Speech Assistant using Interventional Radiology Workflow Analysis

Authors: Kubilay Can Demir, Matthias May, Axel Schmid, Michael Uder, Katharina Breininger, Tobias Weise, Andreas Maier, Seung Hee Yang

Abstract: This paper presents a new multimodal interventional radiology dataset, called PoCaP (Port Catheter Placement) Corpus. This corpus consists of speech and audio signals in German, X-ray images, and system commands collected from 31 PoCaP interventions by six surgeons with average duration of 81.4 $\pm$ 41.0 minutes. The corpus aims to provide a resource for develo** a smart speech assistant in ope… ▽ More This paper presents a new multimodal interventional radiology dataset, called PoCaP (Port Catheter Placement) Corpus. This corpus consists of speech and audio signals in German, X-ray images, and system commands collected from 31 PoCaP interventions by six surgeons with average duration of 81.4 $\pm$ 41.0 minutes. The corpus aims to provide a resource for develo** a smart speech assistant in operating rooms. In particular, it may be used to develop a speech controlled system that enables surgeons to control the operation parameters such as C-arm movements and table positions. In order to record the dataset, we acquired consent by the institutional review board and workers council in the University Hospital Erlangen and by the patients for data privacy. We describe the recording set-up, data structure, workflow and preprocessing steps, and report the first PoCaP Corpus speech recognition analysis results with 11.52 $\%$ word error rate using pretrained models. The findings suggest that the data has the potential to build a robust command recognition system and will allow the development of a novel intervention support systems using speech and image processing in the medical domain. △ Less

Submitted 24 June, 2022; originally announced June 2022.

Comments: 8 pages, 4 figures, Text, Speech and Dialogue 2022 Conference

MSC Class: 00b20

arXiv:2205.13297 [pdf, other]

DeepTechnome: Mitigating Unknown Bias in Deep Learning Based Assessment of CT Images

Authors: Simon Langer, Oliver Taubmann, Felix Denzinger, Andreas Maier, Alexander Mühlberg

Abstract: Reliably detecting diseases using relevant biological information is crucial for real-world applicability of deep learning techniques in medical imaging. We debias deep learning models during training against unknown bias - without preprocessing/filtering the input beforehand or assuming specific knowledge about its distribution or precise nature in the dataset. We use control regions as surrogate… ▽ More Reliably detecting diseases using relevant biological information is crucial for real-world applicability of deep learning techniques in medical imaging. We debias deep learning models during training against unknown bias - without preprocessing/filtering the input beforehand or assuming specific knowledge about its distribution or precise nature in the dataset. We use control regions as surrogates that carry information regarding the bias, employ the classifier model to extract features, and suppress biased intermediate features with our custom, modular DecorreLayer. We evaluate our method on a dataset of 952 lung computed tomography scans by introducing simulated biases w.r.t. reconstruction kernel and noise level and propose including an adversarial test set in evaluations of bias reduction techniques. In a moderately sized model architecture, applying the proposed method to learn from data exhibiting a strong bias, it near-perfectly recovers the classification performance observed when training with corresponding unbiased data. △ Less

Submitted 26 May, 2022; originally announced May 2022.

ACM Class: I.2.6; I.4; I.5

arXiv:2205.05474 [pdf, other]

DeepFilterNet2: Towards Real-Time Speech Enhancement on Embedded Devices for Full-Band Audio

Authors: Hendrik Schröter, Alberto N. Escalante-B., Tobias Rosenkranz, Andreas Maier

Abstract: Deep learning-based speech enhancement has seen huge improvements and recently also expanded to full band audio (48 kHz). However, many approaches have a rather high computational complexity and require big temporal buffers for real time usage e.g. due to temporal convolutions or attention. Both make those approaches not feasible on embedded devices. This work further extends DeepFilterNet, which… ▽ More Deep learning-based speech enhancement has seen huge improvements and recently also expanded to full band audio (48 kHz). However, many approaches have a rather high computational complexity and require big temporal buffers for real time usage e.g. due to temporal convolutions or attention. Both make those approaches not feasible on embedded devices. This work further extends DeepFilterNet, which exploits harmonic structure of speech allowing for efficient speech enhancement (SE). Several optimizations in the training procedure, data augmentation, and network structure result in state-of-the-art SE performance while reducing the real-time factor to 0.04 on a notebook Core-i5 CPU. This makes the algorithm applicable to run on embedded devices in real-time. The DeepFilterNet framework can be obtained under an open source license. △ Less

Submitted 11 May, 2022; originally announced May 2022.

Comments: Submitted to IWAENC 2022

Showing 1–50 of 160 results for author: Maier, A