Search | arXiv e-print repository

arXiv:2403.19265 [pdf, other]

Neural Fields for 3D Tracking of Anatomy and Surgical Instruments in Monocular Laparoscopic Video Clips

Authors: Beerend G. A. Gerats, Jelmer M. Wolterink, Seb P. Mol, Ivo A. M. J. Broeders

Abstract: Laparoscopic video tracking primarily focuses on two target types: surgical instruments and anatomy. The former could be used for skill assessment, while the latter is necessary for the projection of virtual overlays. Where instrument and anatomy tracking have often been considered two separate problems, in this paper, we propose a method for joint tracking of all structures simultaneously. Based… ▽ More Laparoscopic video tracking primarily focuses on two target types: surgical instruments and anatomy. The former could be used for skill assessment, while the latter is necessary for the projection of virtual overlays. Where instrument and anatomy tracking have often been considered two separate problems, in this paper, we propose a method for joint tracking of all structures simultaneously. Based on a single 2D monocular video clip, we train a neural field to represent a continuous spatiotemporal scene, used to create 3D tracks of all surfaces visible in at least one frame. Due to the small size of instruments, they generally cover a small part of the image only, resulting in decreased tracking accuracy. Therefore, we propose enhanced class weighting to improve the instrument tracks. We evaluate tracking on video clips from laparoscopic cholecystectomies, where we find mean tracking accuracies of 92.4% for anatomical structures and 87.4% for instruments. Additionally, we assess the quality of depth maps obtained from the method's scene reconstructions. We show that these pseudo-depths have comparable quality to a state-of-the-art pre-trained depth estimator. On laparoscopic videos in the SCARED dataset, the method predicts depth with an MAE of 2.9 mm and a relative error of 9.2%. These results show the feasibility of using neural fields for monocular 3D reconstruction of laparoscopic scenes. △ Less

Submitted 28 March, 2024; originally announced March 2024.

ACM Class: I.4.5; I.4.9; I.4.10

arXiv:2308.14369 [pdf, other]

Improving Lesion Volume Measurements on Digital Mammograms

Authors: Nikita Moriakov, Jim Peters, Ritse Mann, Nico Karssemeijer, Jos van Dijck, Mireille Broeders, Jonas Teuwen

Abstract: Lesion volume is an important predictor for prognosis in breast cancer. We make a step towards a more accurate lesion volume measurement on digital mammograms by develo** a model that allows to estimate lesion volumes on processed mammograms, which are the images routinely used by radiologists in clinical practice as well as in breast cancer screening and are available in medical centers. Proces… ▽ More Lesion volume is an important predictor for prognosis in breast cancer. We make a step towards a more accurate lesion volume measurement on digital mammograms by develo** a model that allows to estimate lesion volumes on processed mammograms, which are the images routinely used by radiologists in clinical practice as well as in breast cancer screening and are available in medical centers. Processed mammograms are obtained from raw mammograms, which are the X-ray data coming directly from the scanner, by applying certain vendor-specific non-linear transformations. At the core of our volume estimation method is a physics-based algorithm for measuring lesion volumes on raw mammograms. We subsequently extend this algorithm to processed mammograms via a deep learning image-to-image translation model that produces synthetic raw mammograms from processed mammograms in a multi-vendor setting. We assess the reliability and validity of our method using a dataset of 1778 mammograms with an annotated mass. Firstly, we investigate the correlations between lesion volumes computed from mediolateral oblique and craniocaudal views, with a resulting Pearson correlation of 0.93 [95% confidence interval (CI) 0.92 - 0.93]. Secondly, we compare the resulting lesion volumes from true and synthetic raw data, with a resulting Pearson correlation of 0.998 [95% CI 0.998 - 0.998] . Finally, for a subset of 100 mammograms with a malign mass and concurrent MRI examination available, we analyze the agreement between lesion volume on mammography and MRI, resulting in an intraclass correlation coefficient of 0.81 [95% CI 0.73 - 0.87] for consistency and 0.78 [95% CI 0.66 - 0.86] for absolute agreement. In conclusion, we developed an algorithm to measure mammographic lesion volume that reached excellent reliability and good validity, when using MRI as ground truth. △ Less

Submitted 28 August, 2023; originally announced August 2023.

arXiv:2211.12436 [pdf, other]

Dynamic Depth-Supervised NeRF for Multi-View RGB-D Operating Room Images

Authors: Beerend G. A. Gerats, Jelmer M. Wolterink, Ivo A. M. J. Broeders

Abstract: The operating room (OR) is an environment of interest for the development of sensing systems, enabling the detection of people, objects, and their semantic relations. Due to frequent occlusions in the OR, these systems often rely on input from multiple cameras. While increasing the number of cameras generally increases algorithm performance, there are hard limitations to the number and locations o… ▽ More The operating room (OR) is an environment of interest for the development of sensing systems, enabling the detection of people, objects, and their semantic relations. Due to frequent occlusions in the OR, these systems often rely on input from multiple cameras. While increasing the number of cameras generally increases algorithm performance, there are hard limitations to the number and locations of cameras in the OR. Neural Radiance Fields (NeRF) can be used to render synthetic views from arbitrary camera positions, virtually enlarging the number of cameras in the dataset. In this work, we explore the use of NeRF for view synthesis of dynamic scenes in the OR, and we show that regularisation with depth supervision from RGB-D sensor data results in higher image quality. We optimise a dynamic depth-supervised NeRF with up to six synchronised cameras that capture the surgical field in five distinct phases before and during a knee replacement surgery. We qualitatively inspect views rendered by a virtual camera that moves 180 degrees around the surgical field at differing time values. Quantitatively, we evaluate view synthesis from an unseen camera position in terms of PSNR, SSIM and LPIPS for the colour channels and in MAE and error percentage for the estimated depth. We find that NeRFs can be used to generate geometrically consistent views, also from interpolated camera positions and at interpolated time intervals. Views are generated from an unseen camera pose with an average PSNR of 18.2 and a depth estimation error of 2.0%. Our results show the potential of a dynamic NeRF for view synthesis in the OR and stress the relevance of depth supervision in a clinical setting. △ Less

Submitted 30 August, 2023; v1 submitted 22 November, 2022; originally announced November 2022.

Comments: Accepted to the Workshop on Ambient Intelligence for HealthCare 2023

ACM Class: I.4.5; I.4.9; I.4.10

arXiv:2210.11826 [pdf, other]

doi 10.1080/21681163.2022.2155580

3D Human Pose Estimation in Multi-View Operating Room Videos Using Differentiable Camera Projections

Authors: Beerend G. A. Gerats, Jelmer M. Wolterink, Ivo A. M. J. Broeders

Abstract: 3D human pose estimation in multi-view operating room (OR) videos is a relevant asset for person tracking and action recognition. However, the surgical environment makes it challenging to find poses due to sterile clothing, frequent occlusions, and limited public data. Methods specifically designed for the OR are generally based on the fusion of detected poses in multiple camera views. Typically,… ▽ More 3D human pose estimation in multi-view operating room (OR) videos is a relevant asset for person tracking and action recognition. However, the surgical environment makes it challenging to find poses due to sterile clothing, frequent occlusions, and limited public data. Methods specifically designed for the OR are generally based on the fusion of detected poses in multiple camera views. Typically, a 2D pose estimator such as a convolutional neural network (CNN) detects joint locations. Then, the detected joint locations are projected to 3D and fused over all camera views. However, accurate detection in 2D does not guarantee accurate localisation in 3D space. In this work, we propose to directly optimise for localisation in 3D by training 2D CNNs end-to-end based on a 3D loss that is backpropagated through each camera's projection parameters. Using videos from the MVOR dataset, we show that this end-to-end approach outperforms optimisation in 2D space. △ Less

Submitted 21 October, 2022; originally announced October 2022.

Comments: 12 pages, 4 figures, submitted to the 2022 AE-CAI Special Issue of Computer Methods in Biomechanics and Biomedical Engineering: Imaging & Visualization

ACM Class: I.4.8; I.4.9

Journal ref: Computer Methods in Biomechanics and Biomedical Engineering: Imaging & Visualization. 11 (2023) 1197-1205

Showing 1–4 of 4 results for author: Broeders, M