Search | arXiv e-print repository

Tell Me What You See: Text-Guided Real-World Image Denoising

Abstract: Image reconstruction from noisy sensor measurements is a challenging problem. Many solutions have been proposed for it, where the main approach is learning good natural images prior along with modeling the true statistics of the noise in the scene. In the presence of very low lighting conditions, such approaches are usually not enough, and additional information is required, e.g., in the form of u… ▽ More Image reconstruction from noisy sensor measurements is a challenging problem. Many solutions have been proposed for it, where the main approach is learning good natural images prior along with modeling the true statistics of the noise in the scene. In the presence of very low lighting conditions, such approaches are usually not enough, and additional information is required, e.g., in the form of using multiple captures. We suggest as an alternative to add a description of the scene as prior, which can be easily done by the photographer capturing the scene. Inspired by the remarkable success of diffusion models for image generation, using a text-guided diffusion model we show that adding image caption information significantly improves image denoising and reconstruction on both synthetic and real-world images. △ Less

Submitted 29 May, 2024; v1 submitted 15 December, 2023; originally announced December 2023.

arXiv:2212.05315 [pdf, other]

Mind The Edge: Refining Depth Edges in Sparsely-Supervised Monocular Depth Estimation

Authors: Lior Talker, Aviad Cohen, Erez Yosef, Alexandra Dana, Michael Dinerstein

Abstract: Monocular Depth Estimation (MDE) is a fundamental problem in computer vision with numerous applications. Recently, LIDAR-supervised methods have achieved remarkable per-pixel depth accuracy in outdoor scenes. However, significant errors are typically found in the proximity of depth discontinuities, i.e., depth edges, which often hinder the performance of depth-dependent applications that are sensi… ▽ More Monocular Depth Estimation (MDE) is a fundamental problem in computer vision with numerous applications. Recently, LIDAR-supervised methods have achieved remarkable per-pixel depth accuracy in outdoor scenes. However, significant errors are typically found in the proximity of depth discontinuities, i.e., depth edges, which often hinder the performance of depth-dependent applications that are sensitive to such inaccuracies, e.g., novel view synthesis and augmented reality. Since direct supervision for the location of depth edges is typically unavailable in sparse LIDAR-based scenes, encouraging the MDE model to produce correct depth edges is not straightforward. To the best of our knowledge this paper is the first attempt to address the depth edges issue for LIDAR-supervised scenes. In this work we propose to learn to detect the location of depth edges from densely-supervised synthetic data, and use it to generate supervision for the depth edges in the MDE training. To quantitatively evaluate our approach, and due to the lack of depth edges GT in LIDAR-based scenes, we manually annotated subsets of the KITTI and the DDAD datasets with depth edges ground truth. We demonstrate significant gains in the accuracy of the depth edges with comparable per-pixel depth accuracy on several challenging datasets. Code and datasets are available at \url{https://github.com/liortalker/MindTheEdge}. △ Less

Submitted 3 April, 2024; v1 submitted 10 December, 2022; originally announced December 2022.

Comments: Appears in CVPR24'

arXiv:2112.14768 [pdf, other]

Video Reconstruction from a Single Motion Blurred Image using Learned Dynamic Phase Coding

Authors: Erez Yosef, Shay Elmalem, Raja Giryes

Abstract: Video reconstruction from a single motion-blurred image is a challenging problem, which can enhance the capabilities of existing cameras. Recently, several works addressed this task using conventional imaging and deep learning. Yet, such purely-digital methods are inherently limited, due to direction ambiguity and noise sensitivity. Some works proposed to address these limitations using non-conven… ▽ More Video reconstruction from a single motion-blurred image is a challenging problem, which can enhance the capabilities of existing cameras. Recently, several works addressed this task using conventional imaging and deep learning. Yet, such purely-digital methods are inherently limited, due to direction ambiguity and noise sensitivity. Some works proposed to address these limitations using non-conventional image sensors, however, such sensors are extremely rare and expensive. To circumvent these limitations with simpler means, we propose a hybrid optical-digital method for video reconstruction that requires only simple modifications to existing optical systems. We use a learned dynamic phase-coding in the lens aperture during the image acquisition to encode the motion trajectories, which serve as prior information for the video reconstruction process. The proposed computational camera generates a sharp frame burst of the scene at various frame rates from a single coded motion-blurred image, using an image-to-video convolutional neural network. We present advantages and improved performance compared to existing methods, using both simulations and a real-world camera prototype. We extend our optical coding also to video frame interpolation and present robust and improved results for noisy videos. △ Less

Submitted 18 December, 2022; v1 submitted 27 December, 2021; originally announced December 2021.

Showing 1–3 of 3 results for author: Yosef, E