Search | arXiv e-print repository

A comparative study between vision transformers and CNNs in digital pathology

Authors: Luca Deininger, Bernhard Stimpel, Anil Yuce, Samaneh Abbasi-Sureshjani, Simon Schönenberger, Paolo Ocampo, Konstanty Korski, Fabien Gaire

Abstract: Recently, vision transformers were shown to be capable of outperforming convolutional neural networks when pretrained on sufficient amounts of data. In comparison to convolutional neural networks, vision transformers have a weaker inductive bias and therefore allow more flexible feature detection. Due to their promising feature detection, this work explores vision transformers for tumor detection in digital pathology whole-slide images in four tissue types, and for tissue type identification. We compared the patch-wise classification performance of the vision transformer DeiT-Tiny to the state-of-the-art convolutional neural network ResNet18. Due to the sparse availability of annotated whole-slide images, we further compared both models pretrained on large amounts of unlabeled whole-slide images using state-of-the-art self-supervised approaches. The results show that the vision transformer performed slightly better than the ResNet18 for three of four tissue types for tumor detection, while the ResNet18 performed slightly better for the remaining tasks. The aggregated predictions of both models on slide level were correlated, indicating that the models captured similar imaging features. Altogether, the vision transformer models performed on par with the ResNet18 while requiring more effort to train. In order to surpass the performance of convolutional neural networks, vision transformers might require more challenging tasks to benefit from their weak inductive bias.

Submitted 1 June, 2022; originally announced June 2022.

Comments: 8 pages, 2 figures, accepted for workshop T4Vision (CVPR 2022)

ACM Class: I.4.9; I.5.4

arXiv:2005.03472 [pdf, other]

doi 10.1007/978-3-658-29267-6_26

Fully-automatic CT data preparation for interventional X-ray skin dose simulation

Authors: Philipp Roser, Annette Birkhold, Alexander Preuhs, Bernhard Stimpel, Christopher Syben, Norbert Strobel, Markus Kowarschik, Rebecca Fahrig, Andreas Maier

Abstract: Recently, deep learning (DL) found its way to interventional X-ray skin dose estimation. While its performance was found to be acceptable, even more accurate results could be achieved if more data sets were available for training. One possibility is to turn to computed tomography (CT) data sets. Typically, CT scans can be mapped to tissue labels and mass densities to obtain training data. However, care has to be taken to make sure that the different clinical settings are properly accounted for. First, the interventional environment is characterized by a wide variety of table setups that are significantly different from the typical patient tables used in conventional CT. This cannot be ignored, since tables play a crucial role in sound skin dose estimation in an interventional setup, e.g., when the X-ray source is directly underneath a patient (posterior-anterior view). Second, due to interpolation errors, most CT scans do not facilitate a clean segmentation of the skin border. As a solution to these problems, we applied connected component labeling (CCL) and Canny edge detection to (a) robustly separate the patient from the table and (b) identify the outermost skin layer. Our results show that these extensions enable fully-automatic, generalized pre-processing of CT scans for further simulation of both skin dose and corresponding X-ray projections.

Submitted 7 May, 2020; originally announced May 2020.

Comments: 6 pages, 4 figures, Bildverarbeitung für die Medizin 2020, code will be accessible soon (url)

arXiv:1911.13162 [pdf, other]

Deep autofocus with cone-beam CT consistency constraint

Authors: Alexander Preuhs, Michael Manhart, Philipp Roser, Bernhard Stimpel, Christopher Syben, Marios Psychogios, Markus Kowarschik, Andreas Maier

Abstract: High-quality reconstruction with interventional C-arm cone-beam computed tomography (CBCT) requires exact geometry information. If the geometry information is corrupted, e.g., by unexpected patient or system movement, the measured signal is misplaced in the backprojection operation. With prolonged acquisition times of interventional C-arm CBCT, the likelihood of rigid patient motion increases. To adapt the backprojection operation accordingly, a motion estimation strategy is necessary. Recently, a novel learning-based approach was proposed, capable of compensating motions within the acquisition plane. We extend this method by a CBCT consistency constraint, which was proven to be efficient for motions perpendicular to the acquisition plane. By the synergistic combination of these two measures, in-plane and out-of-plane motion is well detectable, achieving an average artifact suppression of 93%. This outperforms the entropy-based state-of-the-art autofocus measure, which achieves an average artifact suppression of 54%.

Submitted 4 December, 2019; v1 submitted 29 November, 2019; originally announced November 2019.

Comments: Accepted at BVM 2020, review score under Top-6 of the conference

arXiv:1911.08163 [pdf, other]

Projection-to-Projection Translation for Hybrid X-ray and Magnetic Resonance Imaging

Authors: Bernhard Stimpel, Christopher Syben, Tobias Würfl, Katharina Breininger, Philipp Hoelter, Arnd Dörfler, Andreas Maier

Abstract: Hybrid X-ray and magnetic resonance (MR) imaging promises large potential in interventional medical imaging applications due to the broad variety of contrast of MRI combined with fast imaging of X-ray-based modalities. To fully utilize the potential of the vast amount of existing image enhancement techniques, the corresponding information from both modalities must be present in the same domain. For image-guided interventional procedures, X-ray fluoroscopy has proven to be the modality of choice. Synthesizing one modality from another in this case is an ill-posed problem due to ambiguous signal and overlapping structures in projective geometry. To take on these challenges, we present a learning-based solution to MR to X-ray projection-to-projection translation. We propose an image generator network that focuses on high representation capacity in higher resolution layers to allow for accurate synthesis of fine details in the projection images. Additionally, a weighting scheme in the loss computation that favors high-frequency structures is proposed to focus on the important details and contours in projection imaging. The proposed extensions prove valuable in generating X-ray projection images with natural appearance. Our approach achieves a deviation from the ground truth of only $6$% and structural similarity measure of $0.913\,\pm\,0.005$. In particular, the high-frequency weighting assists in generating projection images with sharp appearance and reduces erroneously synthesized fine details.

Submitted 19 November, 2019; originally announced November 2019.

arXiv:1911.07731 [pdf, other]

doi 10.1109/TMI.2019.2955184

Multi-modal Deep Guided Filtering for Comprehensible Medical Image Processing

Authors: Bernhard Stimpel, Christopher Syben, Franziska Schirrmacher, Philipp Hoelter, Arnd Dörfler, Andreas Maier

Abstract: Deep learning-based image processing is capable of creating highly appealing results. However, it is still widely considered a "black-box" transformation. In medical imaging, this lack of comprehensibility of the results is a sensitive issue. The integration of known operators into the deep learning environment has proven to be advantageous for the comprehensibility and reliability of the computations. Consequently, we propose the use of the locally linear guided filter in combination with a learned guidance map for general purpose medical image processing. The output images are only processed by the guided filter while the guidance map can be trained to be task-optimal in an end-to-end fashion. We investigate the performance based on two popular tasks: image super resolution and denoising. The evaluation is conducted based on pairs of multi-modal magnetic resonance imaging and cross-modal computed tomography and magnetic resonance imaging datasets. For both tasks, the proposed approach is on par with state-of-the-art approaches. Additionally, we can show that the input image's content is almost unchanged after the processing, which is not the case for conventional deep learning approaches. In addition, the proposed pipeline offers increased robustness against degraded input as well as adversarial attacks.

Submitted 28 May, 2020; v1 submitted 18 November, 2019; originally announced November 2019.

Journal ref: IEEE Transactions on Medical Imaging, vol. 39, no. 5, pp. 1703-1711, May 2020

arXiv:1910.04254 [pdf, other]

Image Quality Assessment for Rigid Motion Compensation

Authors: Alexander Preuhs, Michael Manhart, Philipp Roser, Bernhard Stimpel, Christopher Syben, Marios Psychogios, Markus Kowarschik, Andreas Maier

Abstract: Diagnostic stroke imaging with C-arm cone-beam computed tomography (CBCT) enables reduction of time-to-therapy for endovascular procedures. However, the prolonged acquisition time compared to helical CT increases the likelihood of rigid patient motion. Rigid motion corrupts the geometry alignment assumed during reconstruction, resulting in image blurring or streaking artifacts. To reestablish the geometry, we estimate the motion trajectory by an autofocus method guided by a neural network, which was trained to regress the reprojection error, based on the image information of a reconstructed slice. The network was trained with CBCT scans from 19 patients and evaluated using an additional test patient. It adapts well to unseen motion amplitudes and achieves superior results in a motion estimation benchmark compared to the commonly used entropy-based method.

Submitted 29 November, 2019; v1 submitted 9 October, 2019; originally announced October 2019.

Comments: Accepted at MedNeurips 2019

Showing 1–6 of 6 results for author: Stimpel, B