Search | arXiv e-print repository

SurgT challenge: Benchmark of Soft-Tissue Trackers for Robotic Surgery

Authors: Joao Cartucho, Alistair Weld, Samyakh Tukra, Haozheng Xu, Hiroki Matsuzaki, Taiyo Ishikawa, Minjun Kwon, Yong Eun Jang, Kwang-Ju Kim, Gwang Lee, Bizhe Bai, Lueder Kahrs, Lars Boecking, Simeon Allmendinger, Leopold Muller, Yitong Zhang, Yueming **, Sophia Bano, Francisco Vasconcelos, Wolfgang Reiter, Jonas Hajek, Bruno Silva, Estevao Lima, Joao L. Vilaca, Sandro Queiros , et al. (1 additional authors not shown)

Abstract: This paper introduces the ``SurgT: Surgical Tracking" challenge which was organised in conjunction with MICCAI 2022. There were two purposes for the creation of this challenge: (1) the establishment of the first standardised benchmark for the research community to assess soft-tissue trackers; and (2) to encourage the development of unsupervised deep learning methods, given the lack of annotated da… ▽ More This paper introduces the ``SurgT: Surgical Tracking" challenge which was organised in conjunction with MICCAI 2022. There were two purposes for the creation of this challenge: (1) the establishment of the first standardised benchmark for the research community to assess soft-tissue trackers; and (2) to encourage the development of unsupervised deep learning methods, given the lack of annotated data in surgery. A dataset of 157 stereo endoscopic videos from 20 clinical cases, along with stereo camera calibration parameters, have been provided. Participants were assigned the task of develo** algorithms to track the movement of soft tissues, represented by bounding boxes, in stereo endoscopic videos. At the end of the challenge, the developed methods were assessed on a previously hidden test subset. This assessment uses benchmarking metrics that were purposely developed for this challenge, to verify the efficacy of unsupervised deep learning algorithms in tracking soft-tissue. The metric used for ranking the methods was the Expected Average Overlap (EAO) score, which measures the average overlap between a tracker's and the ground truth bounding boxes. Coming first in the challenge was the deep learning submission by ICVS-2Ai with a superior EAO score of 0.617. This method employs ARFlow to estimate unsupervised dense optical flow from cropped images, using photometric and regularization losses. Second, Jmees with an EAO of 0.583, uses deep learning for surgical tool segmentation on top of a non-deep learning baseline method: CSRT. CSRT by itself scores a similar EAO of 0.563. The results from this challenge show that currently, non-deep learning methods are still competitive. The dataset and benchmarking tool created for this challenge have been made publicly available at https://surgt.grand-challenge.org/. △ Less

Submitted 30 August, 2023; v1 submitted 6 February, 2023; originally announced February 2023.

arXiv:2104.12376 [pdf, other]

Recalibration of Aleatoric and Epistemic Regression Uncertainty in Medical Imaging

Authors: Max-Heinrich Laves, Sontje Ihler, Jacob F. Fast, Lüder A. Kahrs, Tobias Ortmaier

Abstract: The consideration of predictive uncertainty in medical imaging with deep learning is of utmost importance. We apply estimation of both aleatoric and epistemic uncertainty by variational Bayesian inference with Monte Carlo dropout to regression tasks and show that predictive uncertainty is systematically underestimated. We apply $ σ$ scaling with a single scalar value; a simple, yet effective calib… ▽ More The consideration of predictive uncertainty in medical imaging with deep learning is of utmost importance. We apply estimation of both aleatoric and epistemic uncertainty by variational Bayesian inference with Monte Carlo dropout to regression tasks and show that predictive uncertainty is systematically underestimated. We apply $ σ$ scaling with a single scalar value; a simple, yet effective calibration method for both types of uncertainty. The performance of our approach is evaluated on a variety of common medical regression data sets using different state-of-the-art convolutional network architectures. In our experiments, $ σ$ scaling is able to reliably recalibrate predictive uncertainty. It is easy to implement and maintains the accuracy. Well-calibrated uncertainty in regression allows robust rejection of unreliable predictions or detection of out-of-distribution samples. Our source code is available at https://github.com/mlaves/well-calibrated-regression-uncertainty △ Less

Submitted 26 April, 2021; originally announced April 2021.

Comments: Accepted for publication at the Journal of Machine Learning for Biomedical Imaging (MELBA) https://melba-journal.org

arXiv:1904.00790 [pdf, other]

Retinal OCT disease classification with variational autoencoder regularization

Authors: Max-Heinrich Laves, Sontje Ihler, Lüder A. Kahrs, Tobias Ortmaier

Abstract: According to the World Health Organization, 285 million people worldwide live with visual impairment. The most commonly used imaging technique for diagnosis in ophthalmology is optical coherence tomography (OCT). However, analysis of retinal OCT requires trained ophthalmologists and time, making a comprehensive early diagnosis unlikely. A recent study established a diagnostic tool based on convolu… ▽ More According to the World Health Organization, 285 million people worldwide live with visual impairment. The most commonly used imaging technique for diagnosis in ophthalmology is optical coherence tomography (OCT). However, analysis of retinal OCT requires trained ophthalmologists and time, making a comprehensive early diagnosis unlikely. A recent study established a diagnostic tool based on convolutional neural networks (CNN), which was trained on a large database of retinal OCT images. The performance of the tool in classifying retinal conditions was on par to that of trained medical experts. However, the training of these networks is based on an enormous amount of labeled data, which is expensive and difficult to obtain. Therefore, this paper describes a method based on variational autoencoder regularization that improves classification performance when using a limited amount of labeled data. This work uses a two-path CNN model combining a classification network with an autoencoder (AE) for regularization. The key idea behind this is to prevent overfitting when using a limited training dataset size with small number of patients. Results show superior classification performance compared to a pre-trained and fully fine-tuned baseline ResNet-34. Clustering of the latent space in relation to the disease class is distinct. Neural networks for disease classification on OCTs can benefit from regularization using variational autoencoders when trained with limited amount of patient data. Especially in the medical imaging domain, data annotated by experts is expensive to obtain. △ Less

Submitted 23 March, 2019; originally announced April 2019.

Comments: Accepted for publication at 33rd international conference on Computer Assisted Radiology and Surgery (CARS) 2019

arXiv:1903.09809 [pdf, other]

Semantic denoising autoencoders for retinal optical coherence tomography

Authors: Max-Heinrich Laves, Sontje Ihler, Lüder Alexander Kahrs, Tobias Ortmaier

Abstract: Noise in speckle-prone optical coherence tomography tends to obfuscate important details necessary for medical diagnosis. In this paper, a denoising approach that preserves disease characteristics on retinal optical coherence tomography images in ophthalmology is presented. By combining a deep convolutional autoencoder with a priorly trained ResNet image classifier as regularizer, the perceptibili… ▽ More Noise in speckle-prone optical coherence tomography tends to obfuscate important details necessary for medical diagnosis. In this paper, a denoising approach that preserves disease characteristics on retinal optical coherence tomography images in ophthalmology is presented. By combining a deep convolutional autoencoder with a priorly trained ResNet image classifier as regularizer, the perceptibility of delicate details is encouraged and only information-less background noise is filtered out. With our approach, higher peak signal-to-noise ratios with $ \mathrm{PSNR} = 31.2\,\mathrm{dB} $ and higher classification accuracy of $\mathrm{ACC} = 85.0\,\%$ can be achieved for denoised images compared to state-of-the-art denoising with $ \mathrm{PSNR} = 29.4\,\mathrm{dB} $ or $\mathrm{ACC} = 70.3\,\%$, depending on the method. It is shown that regularized autoencoders are capable of denoising retinal OCT images without blurring details of diseases. △ Less

Submitted 23 March, 2019; originally announced March 2019.

Comments: Accepted for publication at the SPIE/OSA European Conferences on Biomedical Optics (ECBO) 2019

arXiv:1901.06490 [pdf, other]

Endoscopic vs. volumetric OCT imaging of mastoid bone structure for pose estimation in minimally invasive cochlear implant surgery

Authors: Max-Heinrich Laves, Sarah Latus, Jan Bergmeier, Tobias Ortmaier, Lüder A. Kahrs, Alexander Schlaefer

Abstract: Purpose: The facial recess is a delicate structure that must be protected in minimally invasive cochlear implant surgery. Current research estimates the drill trajectory by using endoscopy of the unique mastoid patterns. However, missing depth information limits available features for a registration to preoperative CT data. Therefore, this paper evaluates OCT for enhanced imaging of drill holes in… ▽ More Purpose: The facial recess is a delicate structure that must be protected in minimally invasive cochlear implant surgery. Current research estimates the drill trajectory by using endoscopy of the unique mastoid patterns. However, missing depth information limits available features for a registration to preoperative CT data. Therefore, this paper evaluates OCT for enhanced imaging of drill holes in mastoid bone and compares OCT data to original endoscopic images. Methods: A catheter-based OCT probe is inserted into a drill trajectory of a mastoid phantom in a translation-rotation manner to acquire the inner surface state. The images are undistorted and stitched to create volumentric data of the drill hole. The mastoid cell pattern is segmented automatically and compared to ground truth. Results: The mastoid pattern segmented on images acquired with OCT show a similarity of J = 73.6 % to ground truth based on endoscopic images and measured with the Jaccard metric. Leveraged by additional depth information, automated segmentation tends to be more robust and fail-safe compared to endoscopic images. Conclusion: The feasibility of using a clinically approved OCT probe for imaging the drill hole in cochlear implantation is shown. The resulting volumentric images provide additional information on the shape of caveties in the bone structure, which will be useful for image-to-patient registration and to estimate the drill trajectory. This will be another step towards safe minimally invasive cochlear implantation. △ Less

Submitted 23 March, 2019; v1 submitted 19 January, 2019; originally announced January 2019.

Comments: Accepted for publication at the 33rd international conference on Computer Assisted Radiology and Surgery (CARS) 2019

arXiv:1810.11205 [pdf, other]

Deep learning based 2.5D flow field estimation for maximum intensity projections of 4D optical coherence tomography

Authors: Max-Heinrich Laves, Lüder A. Kahrs, Tobias Ortmaier

Abstract: In microsurgery, lasers have emerged as precise tools for bone ablation. A challenge is automatic control of laser bone ablation with 4D optical coherence tomography (OCT). OCT as high resolution imaging modality provides volumetric images of tissue and foresees information of bone position and orientation (pose) as well as thickness. However, existing approaches for OCT based laser ablation contr… ▽ More In microsurgery, lasers have emerged as precise tools for bone ablation. A challenge is automatic control of laser bone ablation with 4D optical coherence tomography (OCT). OCT as high resolution imaging modality provides volumetric images of tissue and foresees information of bone position and orientation (pose) as well as thickness. However, existing approaches for OCT based laser ablation control rely on external tracking systems or invasively ablated artificial landmarks for tracking the pose of the OCT probe relative to the tissue. This can be superseded by estimating the scene flow caused by relative movement between OCT-based laser ablation system and patient. Therefore, this paper deals with 2.5D scene flow estimation of volumetric OCT images for application in laser ablation. We present a semi-supervised convolutional neural network based tracking scheme for subsequent 3D OCT volumes and apply it to a realistic semi-synthetic data set of ex vivo human temporal bone specimen. The scene flow is estimated in a two-stage approach. In the first stage, 2D lateral scene flow is computed on census-transformed en-face arguments-of-maximum intensity projections. Subsequent to this, the projections are warped by predicted lateral flow and 1D depth flow is estimated. The neural network is trained semi-supervised by combining error to ground truth and the reconstruction error of warped images with assumptions of spatial flow smoothness. Quantitative evaluation reveals a mean endpoint error of $ (4.7\pm{}3.5) $ voxel or $ 27.5 \pm 20.5 μ\mathrm{m} $ for scene flow estimation caused by simulated relative movement between the OCT probe and bone. The scene flow estimation for 4D OCT enables its use for markerless tracking of mastoid bone structures for image guidance in general, and automated laser ablation control. △ Less

Submitted 18 February, 2019; v1 submitted 26 October, 2018; originally announced October 2018.

Comments: Accepted for SPIE Medical Imaging 2019

arXiv:1807.06081 [pdf, other]

A Dataset of Laryngeal Endoscopic Images with Comparative Study on Convolution Neural Network Based Semantic Segmentation

Authors: Max-Heinrich Laves, Jens Bicker, Lüder A. Kahrs, Tobias Ortmaier

Abstract: Purpose Automated segmentation of anatomical structures in medical image analysis is a prerequisite for autonomous diagnosis as well as various computer and robot aided interventions. Recent methods based on deep convolutional neural networks (CNN) have outperformed former heuristic methods. However, those methods were primarily evaluated on rigid, real-world environments. In this study, existing… ▽ More Purpose Automated segmentation of anatomical structures in medical image analysis is a prerequisite for autonomous diagnosis as well as various computer and robot aided interventions. Recent methods based on deep convolutional neural networks (CNN) have outperformed former heuristic methods. However, those methods were primarily evaluated on rigid, real-world environments. In this study, existing segmentation methods were evaluated for their use on a new dataset of transoral endoscopic exploration. Methods Four machine learning based methods SegNet, UNet, ENet and ErfNet were trained with supervision on a novel 7-class dataset of the human larynx. The dataset contains 536 manually segmented images from two patients during laser incisions. The Intersection-over-Union (IoU) evaluation metric was used to measure the accuracy of each method. Data augmentation and network ensembling were employed to increase segmentation accuracy. Stochastic inference was used to show uncertainties of the individual models. Patient-to-patient transfer was investigated using patient-specific fine-tuning. Results In this study, a weighted average ensemble network of UNet and ErfNet was best suited for the segmentation of laryngeal soft tissue with a mean IoU of 84.7 %. The highest efficiency was achieved by ENet with a mean inference time of 9.22 ms per image. It is shown that 10 additional images from a new patient are sufficient for patient-specific fine-tuning. Conclusion CNN-based methods for semantic segmentation are applicable to endoscopic images of laryngeal soft tissue. The segmentation can be used for active constraints or to monitor morphological changes and autonomously detect pathologies. Further improvements could be achieved by using a larger dataset or training the models in a self-supervised manner on additional unlabeled data. △ Less

Submitted 21 September, 2020; v1 submitted 16 July, 2018; originally announced July 2018.

Comments: Accepted for publication in International Journal of Computer Assisted Radiology and Surgery

Showing 1–7 of 7 results for author: Kahrs, L