
Leveraging Near-Field Lighting for Monocular Depth Estimation from Endoscopy Videos

Authors: Akshay Paruchuri, Samuel Ehrenstein, Shuxian Wang, Inbar Fried, Stephen M. Pizer, Marc Niethammer, Roni Sengupta

Abstract: Monocular depth estimation in endoscopy videos can enable assistive and robotic surgery to obtain better coverage of the organ and detection of various health issues. Despite promising progress in mainstream natural-image depth estimation, such techniques perform poorly on endoscopy images due to a lack of strong geometric features and challenging illumination effects. In this paper, we utilize photometric cues, i.e., the light emitted from an endoscope and reflected by the surface, to improve monocular depth estimation. We first create two novel loss functions, with supervised and self-supervised variants, that utilize a per-pixel shading representation. We then propose a novel depth refinement network (PPSNet) that leverages the same per-pixel shading representation. Finally, we introduce teacher-student transfer learning to produce better depth maps from both synthetic data with supervision and clinical data with self-supervision. We achieve state-of-the-art results on the C3VD dataset while estimating high-quality depth maps from clinical data. Our code, pre-trained models, and supplementary materials can be found on our project page: https://ppsnet.github.io/

Submitted 26 March, 2024; originally announced March 2024.

Comments: 26 pages, 7 tables, 7 figures
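The abstract does not spell out how the per-pixel shading representation is computed. As a rough illustration only, the sketch below assumes a single isotropic point light at the camera center, Lambertian reflectance, and a pinhole camera with intrinsics fx, fy, cx, cy; the function names are hypothetical and this is not PPSNet's actual representation. Each pixel is shaded by the cosine between the surface normal and the direction to the light, divided by the squared distance to the light (near-field inverse-square falloff).

```python
import numpy as np

def backproject(depth, fx, fy, cx, cy):
    """Lift an HxW depth map (z along the optical axis) to camera-space points, HxWx3."""
    h, w = depth.shape
    v, u = np.mgrid[0:h, 0:w].astype(np.float64)
    x = (u - cx) / fx * depth
    y = (v - cy) / fy * depth
    return np.stack([x, y, depth], axis=-1)

def normals_from_points(points):
    """Approximate unit surface normals from finite differences of neighboring 3D points."""
    dpdu = np.gradient(points, axis=1)  # change along image columns
    dpdv = np.gradient(points, axis=0)  # change along image rows
    n = np.cross(dpdu, dpdv)
    n /= np.linalg.norm(n, axis=-1, keepdims=True) + 1e-12
    # Orient normals toward the camera (which looks down +z), i.e. negative z component.
    n[n[..., 2] > 0] *= -1
    return n

def near_field_shading(depth, fx, fy, cx, cy):
    """Hypothetical per-pixel Lambertian shading from a point light at the camera center."""
    points = backproject(depth, fx, fy, cx, cy)
    normals = normals_from_points(points)
    to_light = -points  # the light sits at the origin (camera center)
    dist = np.linalg.norm(to_light, axis=-1, keepdims=True)
    light_dir = to_light / dist
    cos_term = np.clip(np.sum(normals * light_dir, axis=-1), 0.0, None)
    return cos_term / dist[..., 0] ** 2  # inverse-square falloff
```

Under these assumptions, a fronto-parallel plane at depth 2 gives a center-pixel shading of about 1/2² = 0.25, and shading falls off toward the image corners, where the surface is both farther from the light and lit more obliquely.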

arXiv:2303.07264

A Surface-normal Based Neural Framework for Colonoscopy Reconstruction

Authors: Shuxian Wang, Yubo Zhang, Sarah K. McGill, Julian G. Rosenman, Jan-Michael Frahm, Soumyadip Sengupta, Stephen M. Pizer

Abstract: Reconstructing a 3D surface from colonoscopy video is challenging due to illumination and reflectivity variations in the video frames that can cause defective shape predictions. Aiming to overcome this challenge, we utilize the characteristics of surface normal vectors and develop a two-step neural framework that significantly improves colonoscopy reconstruction quality. The normal-based depth initialization network, trained with a self-supervised normal consistency loss, provides the depth map initialization for the normal-depth refinement module, which utilizes the relationship between illumination and surface normals to refine the frame-wise normal and depth predictions recursively. The depth accuracy of our framework on phantom colonoscopy data demonstrates the value of exploiting surface normals in colonoscopy reconstruction, especially on en face views. Due to their low depth error, our framework's predictions will require only limited post-processing to be clinically applicable for real-time colonoscopy reconstruction.

Submitted 13 March, 2023; originally announced March 2023.

Comments: Accepted at IPMI 2023; first two authors contributed equally
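One plausible reading of a "normal consistency loss" is a penalty on disagreement between the normals implied by a predicted depth map and normals predicted directly by a network. The sketch below assumes that reading, a pinhole camera (fx, fy, cx, cy), and normals oriented toward the camera; the function names are hypothetical, and the paper's actual loss may compare different quantities (for example, normals across frames).

```python
import numpy as np

def depth_to_normals(depth, fx, fy, cx, cy):
    """Unit normals (HxWx3) of the surface implied by a depth map, via back-projection."""
    h, w = depth.shape
    v, u = np.mgrid[0:h, 0:w].astype(np.float64)
    pts = np.stack([(u - cx) / fx * depth, (v - cy) / fy * depth, depth], axis=-1)
    n = np.cross(np.gradient(pts, axis=1), np.gradient(pts, axis=0))
    n /= np.linalg.norm(n, axis=-1, keepdims=True) + 1e-12
    n[n[..., 2] > 0] *= -1  # orient toward the camera (negative z)
    return n

def normal_consistency_loss(depth, pred_normals, fx, fy, cx, cy):
    """Hypothetical loss: mean (1 - cosine) between depth-derived and predicted normals."""
    n_depth = depth_to_normals(depth, fx, fy, cx, cy)
    pred = pred_normals / (np.linalg.norm(pred_normals, axis=-1, keepdims=True) + 1e-12)
    cos = np.sum(n_depth * pred, axis=-1)
    return float(np.mean(1.0 - cos))
```

The loss is 0 when the two normal fields agree, 1 when they are perpendicular, and 2 when they point in opposite directions, so it is bounded regardless of scene scale.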

arXiv:2111.10371

ColDE: A Depth Estimation Framework for Colonoscopy Reconstruction

Authors: Yubo Zhang, Jan-Michael Frahm, Samuel Ehrenstein, Sarah K. McGill, Julian G. Rosenman, Shuxian Wang, Stephen M. Pizer

Abstract: One of the key elements of reconstructing a 3D mesh from a monocular video is generating every frame's depth map. However, in colonoscopy video reconstruction, producing good-quality depth estimates is challenging. Neural networks can easily be fooled by photometric distractions or fail to capture the complex shape of the colon surface, predicting defective shapes that result in broken meshes. Aiming to fundamentally improve depth estimation quality for colonoscopy 3D reconstruction, in this work we have designed a set of training losses to deal with the special challenges of colonoscopy data. For better training, a set of geometric consistency objectives was developed, using both depth and surface normal information. Also, the classic photometric loss was extended with feature matching to compensate for illumination noise. With these sufficiently powerful training losses, our self-supervised framework, named ColDE, produces better depth maps of colonoscopy data than previous work that utilizes prior depth knowledge. Used in reconstruction, our network reconstructs good-quality colon meshes in real time without any post-processing, making it the first to be clinically applicable.

Submitted 18 November, 2021; originally announced November 2021.

Comments: 13 pages, 5 figures
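The abstract extends the photometric loss with a feature-matching term. The sketch below shows the general shape of such a combined loss, assuming the source frame has already been warped into the target view (warping not shown) and that a validity mask is available. As a clearly labeled stand-in for learned features, it uses image gradients, which are invariant only to additive brightness offsets; ColDE's actual features and weighting are not described here, and `lam` is a hypothetical weight.

```python
import numpy as np

def gradient_features(img):
    """Stand-in features (hypothetical): horizontal and vertical intensity gradients, HxWx2."""
    gy, gx = np.gradient(img.astype(np.float64))
    return np.stack([gx, gy], axis=-1)

def photometric_feature_loss(target, warped, mask, lam=1.0):
    """Return (L1 photometric term, lam * L1 feature term), averaged over valid pixels."""
    m = mask.astype(bool)
    photo = np.abs(target - warped)[m].mean()
    feat_diff = np.abs(gradient_features(target) - gradient_features(warped)).sum(axis=-1)
    return float(photo), float(lam * feat_diff[m].mean())
```

With a uniform +0.1 brightness offset between the two images, the photometric term is 0.1 while the gradient-feature term is zero, which illustrates why a feature term can be less sensitive to illumination changes than raw intensities.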

arXiv:2103.10310

Lighting Enhancement Aids Reconstruction of Colonoscopic Surfaces

Authors: Yubo Zhang, Shuxian Wang, Ruibin Ma, Sarah K. McGill, Julian G. Rosenman, Stephen M. Pizer

Abstract: High screening coverage during colonoscopy is crucial to effectively preventing colon cancer. Previous work has made it possible to alert the doctor to unsurveyed regions by reconstructing the 3D colonoscopic surface from colonoscopy videos in real time. However, the lighting inconsistency of colonoscopy videos can cause a key component of the colonoscopic reconstruction system, the SLAM optimization, to fail. In this work we focus on the lighting problem in colonoscopy videos. To successfully improve the lighting consistency of colonoscopy videos, we found it necessary to apply a lighting correction that adapts to the intensity distribution of recent video frames. To achieve this in real time, we have designed and trained an RNN that adapts the gamma value in a gamma-correction process. Applied in the colonoscopic surface reconstruction system, our lightweight model significantly boosts the reconstruction success rate, making a larger proportion of colonoscopy video segments reconstructable and improving the reconstruction quality of already reconstructed segments.

Submitted 18 March, 2021; originally announced March 2021.

Comments: Accepted at IPMI 2021 (The 27th international conference on Information Processing in Medical Imaging)
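The paper trains an RNN to choose the gamma value for each frame. The sketch below does NOT reproduce that network; it is a hand-crafted stand-in, assuming intensities in [0, 1], that tracks only the mean intensity of the last `window` frames (a simplification of the "intensity distribution" in the abstract) and picks gamma so that this running mean maps to a hypothetical `target` brightness, i.e. gamma = log(target) / log(mean).

```python
import math
from collections import deque

import numpy as np

class RunningGammaCorrector:
    """Hand-crafted stand-in for a learned gamma predictor (NOT the paper's RNN)."""

    def __init__(self, window=5, target=0.5, eps=1e-6):
        self.means = deque(maxlen=window)  # mean intensities of recent frames
        self.target = target               # hypothetical desired mean brightness
        self.eps = eps

    def __call__(self, frame):
        """frame: float array in [0, 1]. Returns (gamma-corrected frame, gamma)."""
        self.means.append(float(np.mean(frame)))
        mu = sum(self.means) / len(self.means)
        mu = min(max(mu, self.eps), 1.0 - self.eps)  # keep log(mu) finite and nonzero
        gamma = math.log(self.target) / math.log(mu)  # solves mu ** gamma == target
        return np.clip(frame, 0.0, 1.0) ** gamma, gamma
```

Because the mean is pooled over recent frames, a sudden change in brightness is corrected gradually rather than instantly; for example, a frame of mean 0.5 that follows one of mean 0.25 still receives a gamma below 1.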

arXiv:1904.07087

Recurrent Neural Network for (Un-)supervised Learning of Monocular Video Visual Odometry and Depth

Authors: Rui Wang, Stephen M. Pizer, Jan-Michael Frahm

Abstract: Deep learning-based, single-view depth estimation methods have recently shown highly promising results. However, such methods ignore one of the most important features for determining depth in the human vision system: motion. We propose a learning-based, multi-view dense depth map and odometry estimation method that uses Recurrent Neural Networks (RNNs) and is trained with multi-view image reprojection and forward-backward flow-consistency losses. Our model can be trained in a supervised or even an unsupervised mode. It is designed for depth and visual odometry estimation from video where the input frames are temporally correlated; however, it also generalizes to single-view depth estimation. Our method outperforms state-of-the-art approaches for single-view and multi-view learning-based depth estimation on the KITTI driving dataset.

Submitted 15 April, 2019; originally announced April 2019.
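Forward-backward flow consistency rests on a simple check: following the forward flow from a pixel and then the backward flow from where it lands should return to the starting point. The sketch below computes that round-trip error with nearest-neighbor sampling and thresholds it into a mask; the function name and threshold are hypothetical, and how the paper turns this check into a loss (and whether it samples bilinearly) is not stated in the abstract.

```python
import numpy as np

def fb_consistency_mask(flow_fw, flow_bw, thresh=1.0):
    """Return an HxW bool mask of pixels whose forward flow is undone by the backward flow.

    flow_fw: HxWx2 (dx, dy) from frame t to t+1; flow_bw: HxWx2 from t+1 back to t.
    """
    h, w, _ = flow_fw.shape
    ys, xs = np.mgrid[0:h, 0:w]
    # Where each pixel lands in frame t+1, rounded to the nearest pixel.
    x2 = np.rint(xs + flow_fw[..., 0]).astype(int)
    y2 = np.rint(ys + flow_fw[..., 1]).astype(int)
    inside = (x2 >= 0) & (x2 < w) & (y2 >= 0) & (y2 < h)
    x2c, y2c = np.clip(x2, 0, w - 1), np.clip(y2, 0, h - 1)
    round_trip = flow_fw + flow_bw[y2c, x2c]  # close to zero where the two flows agree
    err = np.linalg.norm(round_trip, axis=-1)
    return inside & (err < thresh)
```

Pixels that leave the image or whose round trip misses by more than `thresh` pixels (typically occlusions) are marked invalid and could be excluded from a reprojection loss, or the round-trip error itself could be penalized.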

arXiv:1805.06558

Recurrent Neural Network for Learning Dense Depth and Ego-Motion from Video

Authors: Rui Wang, Jan-Michael Frahm, Stephen M. Pizer

Abstract: Learning-based, single-view depth estimation often generalizes poorly to unseen datasets. While learning-based, two-frame depth estimation solves this problem to some extent by learning to match features across frames, it performs poorly at large depths, where uncertainty is high. There exist few learning-based, multi-view depth estimation methods. In this paper, we present a learning-based, multi-view dense depth map and ego-motion estimation method that uses Recurrent Neural Networks (RNNs). Our model is designed for 3D reconstruction from video where the input frames are temporally correlated. It generalizes to single- or two-view dense depth estimation. Compared to recent single- or two-view CNN-based depth estimation methods, our model leverages more views and achieves more accurate results, especially at large distances. Our method outperforms state-of-the-art learning-based, single- or two-view depth estimation methods on both indoor and outdoor benchmark datasets. We also demonstrate that our method can work even on extremely difficult sequences, such as endoscopic video, where none of the assumptions of traditional 3D reconstruction methods (static scene, constant lighting, Lambertian reflection, etc.) hold.

Submitted 16 May, 2018; originally announced May 2018.
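The claim that two-frame methods degrade at large depth is consistent with standard rectified two-view geometry, where depth z is recovered from disparity d, focal length f (in pixels), and baseline b. Differentiating shows that a fixed matching error δd produces a depth error that grows quadratically with depth:

```latex
z = \frac{f\,b}{d}
\quad\Longrightarrow\quad
\left|\frac{\partial z}{\partial d}\right| = \frac{f\,b}{d^{2}} = \frac{z^{2}}{f\,b}
\quad\Longrightarrow\quad
\delta z \approx \frac{z^{2}}{f\,b}\,\delta d
```

For example, with f = 500 px, b = 0.1 m, and δd = 0.5 px, the error is about 0.25 m at z = 5 m but about 1 m at z = 10 m. Aggregating more frames from a moving camera can effectively lengthen the baseline b, which is one intuition (not necessarily the paper's own argument) for why multi-view methods help at large distances.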

Showing 1–6 of 6 results for author: Pizer, S M