
Multistable Shape from Shading Emerges from Patch Diffusion

Authors: Xinran Nicole Han, Todd Zickler, Ko Nishino

Abstract: Models for monocular shape reconstruction of surfaces with diffuse reflection -- shape from shading -- ought to produce distributions of outputs, because there are fundamental mathematical ambiguities of both continuous (e.g., bas-relief) and discrete (e.g., convex/concave) varieties which are also experienced by humans. Yet, the outputs of current models are limited to point estimates or tight distributions around single modes, which prevent them from capturing these effects. We introduce a model that reconstructs a multimodal distribution of shapes from a single shading image, which aligns with the human experience of multistable perception. We train a small denoising diffusion process to generate surface normal fields from $16\times 16$ patches of synthetic images of everyday 3D objects. We deploy this model patch-wise at multiple scales, with guidance from inter-patch shape consistency constraints. Despite its relatively small parameter count and predominantly bottom-up structure, we show that multistable shape explanations emerge from this model for "ambiguous" test images that humans experience as being multistable. At the same time, the model produces veridical shape estimates for object-like images that include distinctive occluding contours and appear less ambiguous. This may inspire new architectures for stochastic 3D shape perception that are more efficient and better aligned with human experience.

Submitted 23 May, 2024; originally announced May 2024.

arXiv:2401.00935 [pdf, other]

Boundary Attention: Learning to Localize Boundaries under High Noise

Authors: Mia Gaia Polansky, Charles Herrmann, Junhwa Hur, Deqing Sun, Dor Verbin, Todd Zickler

Abstract: We present a differentiable model that infers explicit boundaries, including curves, corners and junctions, using a mechanism that we call boundary attention. Boundary attention is a boundary-aware local attention operation that, when applied densely and repeatedly, progressively refines a field of variables that specify an unrasterized description of the local boundary structure in every overlapping patch within an image. It operates in a bottom-up fashion, similar to classical methods for sub-pixel edge localization and edge-linking, but with a higher-dimensional description of local boundary structure, a notion of spatial consistency that is learned instead of designed, and a sequence of operations that is end-to-end differentiable. We train our model using simple synthetic data and then evaluate it using photographs that were captured under low-light conditions with variable amounts of noise. We find that our method generalizes to natural images corrupted by real sensor noise, and predicts consistent boundaries under increasingly noisy conditions where other state-of-the-art methods fail.

Submitted 18 March, 2024; v1 submitted 1 January, 2024; originally announced January 2024.

Comments: Project website at http://boundaryattention.github.io

arXiv:2307.08106 [pdf, other]

Polarization Multi-Image Synthesis with Birefringent Metasurfaces

Authors: Dean Hazineh, Soon Wei Daniel Lim, Qi Guo, Federico Capasso, Todd Zickler

Abstract: Optical metasurfaces composed of precisely engineered nanostructures have gained significant attention for their ability to manipulate light and implement distinct functionalities based on the properties of the incident field. Computational imaging systems have started harnessing this capability to produce sets of coded measurements that benefit certain tasks when paired with digital post-processing. Inspired by these works, we introduce a new system that uses a birefringent metasurface with a polarizer-mosaicked photosensor to capture four optically-coded measurements in a single exposure. We apply this system to the task of incoherent opto-electronic filtering, where digital spatial-filtering operations are replaced by simpler, per-pixel sums across the four polarization channels, independent of the spatial filter size. In contrast to previous work on incoherent opto-electronic filtering that can realize only one spatial filter, our approach can realize a continuous family of filters from a single capture, with filters being selected from the family by adjusting the post-capture digital summation weights. To find a metasurface that can realize a set of user-specified spatial filters, we introduce a form of gradient descent with a novel regularizer that encourages light efficiency and a high signal-to-noise ratio. We demonstrate several examples in simulation and with fabricated prototypes, including some with spatial filters that have prescribed variations with respect to depth and wavelength. Visit the Project Page at https://deanhazineh.github.io/publications/Multi_Image_Synthesis/MIS_Home.html

Submitted 11 August, 2023; v1 submitted 16 July, 2023; originally announced July 2023.

Comments: Published in the Proceedings of the 2023 IEEE International Conference of Computational Photography

arXiv:2305.16321 [pdf, other]

Eclipse: Disambiguating Illumination and Materials using Unintended Shadows

Authors: Dor Verbin, Ben Mildenhall, Peter Hedman, Jonathan T. Barron, Todd Zickler, Pratul P. Srinivasan

Abstract: Decomposing an object's appearance into representations of its materials and the surrounding illumination is difficult, even when the object's 3D shape is known beforehand. This problem is especially challenging for diffuse objects: it is ill-conditioned because diffuse materials severely blur incoming light, and it is ill-posed because diffuse materials under high-frequency lighting can be indistinguishable from shiny materials under low-frequency lighting. We show that it is possible to recover precise materials and illumination -- even from diffuse objects -- by exploiting unintended shadows, like the ones cast onto an object by the photographer who moves around it. These shadows are a nuisance in most previous inverse rendering pipelines, but here we exploit them as signals that improve conditioning and help resolve material-lighting ambiguities. We present a method based on differentiable Monte Carlo ray tracing that uses images of an object to jointly recover its spatially-varying materials, the surrounding illumination environment, and the shapes of the unseen light occluders who inadvertently cast shadows upon it.

Submitted 13 December, 2023; v1 submitted 25 May, 2023; originally announced May 2023.

Comments: Project page: https://dorverbin.github.io/eclipse/

arXiv:2112.03907 [pdf, other]

Ref-NeRF: Structured View-Dependent Appearance for Neural Radiance Fields

Authors: Dor Verbin, Peter Hedman, Ben Mildenhall, Todd Zickler, Jonathan T. Barron, Pratul P. Srinivasan

Abstract: Neural Radiance Fields (NeRF) is a popular view synthesis technique that represents a scene as a continuous volumetric function, parameterized by multilayer perceptrons that provide the volume density and view-dependent emitted radiance at each location. While NeRF-based techniques excel at representing fine geometric structures with smoothly varying view-dependent appearance, they often fail to accurately capture and reproduce the appearance of glossy surfaces. We address this limitation by introducing Ref-NeRF, which replaces NeRF's parameterization of view-dependent outgoing radiance with a representation of reflected radiance and structures this function using a collection of spatially-varying scene properties. We show that together with a regularizer on normal vectors, our model significantly improves the realism and accuracy of specular reflections. Furthermore, we show that our model's internal representation of outgoing radiance is interpretable and useful for scene editing.

Submitted 7 December, 2021; originally announced December 2021.

Comments: Project page: https://dorverbin.github.io/refnerf/

arXiv:2109.03464 [pdf, other]

Level Set Binocular Stereo with Occlusions

Authors: Jialiang Wang, Todd Zickler

Abstract: Localizing stereo boundaries and predicting nearby disparities are difficult because stereo boundaries induce occluded regions where matching cues are absent. Most modern computer vision algorithms treat occlusions secondarily (e.g., via left-right consistency checks after matching) or rely on high-level cues to improve nearby disparities (e.g., via deep networks and large training sets). They ignore the geometry of stereo occlusions, which dictates that the spatial extent of occlusion must equal the amplitude of the disparity jump that causes it. This paper introduces an energy and level-set optimizer that improves boundaries by encoding occlusion geometry. Our model applies to two-layer, figure-ground scenes, and it can be implemented cooperatively using messages that pass predominantly between parents and children in an undecimated hierarchy of multi-scale image patches. In a small collection of figure-ground scenes curated from Middlebury and Falling Things stereo datasets, our model provides more accurate boundaries than previous occlusion-handling stereo techniques. This suggests new directions for creating cooperative stereo systems that incorporate occlusion cues in a human-like manner.

Submitted 8 September, 2021; originally announced September 2021.

Comments: extended journal version of arXiv:2006.16094

arXiv:2104.12290 [pdf, other]

StegaPos: Preventing Unwanted Crops and Replacements with Imperceptible Positional Embeddings

Authors: Gokhan Egri, Todd Zickler

Abstract: We present a learned, spatially-varying steganography system that allows detecting when and how images have been altered by cropping, splicing or inpainting after publication. The system comprises a learned encoder that imperceptibly hides distinct positional signatures in every local image region before publication, and an accompanying learned decoder that extracts the steganographic signatures to determine, for each local image region, its 2D positional coordinates within the originally-published image. Crop and replacement edits become detectable by the inconsistencies they cause in the hidden positional signatures. Using a prototype system for small $(400 \times 400)$ images, we show experimentally that simple CNN encoder and decoder architectures can be trained jointly to achieve detection that is reliable and robust, without introducing perceptible distortion. This approach could help individuals and image-sharing platforms certify that an image was published by a trusted source, and also know which parts of such an image, if any, have been substantially altered since publication.

Submitted 7 December, 2022; v1 submitted 25 April, 2021; originally announced April 2021.

Comments: For CVPR 2022 submission, 8 pages (main)

arXiv:2011.13866 [pdf, other]

Field of Junctions: Extracting Boundary Structure at Low SNR

Authors: Dor Verbin, Todd Zickler

Abstract: We introduce a bottom-up model for simultaneously finding many boundary elements in an image, including contours, corners and junctions. The model explains boundary shape in each small patch using a 'generalized M-junction' comprising M angles and a freely-moving vertex. Images are analyzed using non-convex optimization to cooperatively find M+2 junction values at every location, with spatial consistency being enforced by a novel regularizer that reduces curvature while preserving corners and junctions. The resulting 'field of junctions' is simultaneously a contour detector, corner/junction detector, and boundary-aware smoothing of regional appearance. Notably, its unified analysis of contours, corners, junctions and uniform regions allows it to succeed at high noise levels, where other methods for segmentation and boundary detection fail.

Submitted 11 November, 2021; v1 submitted 27 November, 2020; originally announced November 2020.

Comments: ICCV 2021. Project page with demo, video, and code: https://vision.seas.harvard.edu/foj/

arXiv:2006.16094 [pdf, other]

Level Set Stereo for Cooperative Grouping with Occlusion

Authors: Jialiang Wang, Todd Zickler

Abstract: Localizing stereo boundaries is difficult because matching cues are absent in the occluded regions that are adjacent to them. We introduce an energy and level-set optimizer that improves boundaries by encoding the essential geometry of occlusions: The spatial extent of an occlusion must equal the amplitude of the disparity jump that causes it. In a collection of figure-ground scenes from Middlebury and Falling Things stereo datasets, the model provides more accurate boundaries than previous occlusion-handling techniques.

Submitted 18 June, 2021; v1 submitted 29 June, 2020; originally announced June 2020.

Comments: ICIP 2021. Code and data: https://github.com/jialiangw/levelsetstereo

arXiv:2003.08885 [pdf, other]

doi 10.1109/TPAMI.2021.3081360

Unique Geometry and Texture from Corresponding Image Patches

Authors: Dor Verbin, Steven J. Gortler, Todd Zickler

Abstract: We present a sufficient condition for recovering unique texture and viewpoints from unknown orthographic projections of a flat texture process. We show that four observations are sufficient in general, and we characterize the ambiguous cases. The results are applicable to shape from texture and texture-based structure from motion.

Submitted 6 November, 2021; v1 submitted 19 March, 2020; originally announced March 2020.

Journal ref: IEEE Transactions on Pattern Analysis and Machine Intelligence, Volume: 43, Issue: 12, Dec. 1 2021

arXiv:1807.10376 [pdf, other]

Tackling 3D ToF Artifacts Through Learning and the FLAT Dataset

Authors: Qi Guo, Iuri Frosio, Orazio Gallo, Todd Zickler, Jan Kautz

Abstract: Scene motion, multiple reflections, and sensor noise introduce artifacts in the depth reconstruction performed by time-of-flight cameras. We propose a two-stage, deep-learning approach to address all of these sources of artifacts simultaneously. We also introduce FLAT, a synthetic dataset of 2000 ToF measurements that capture all of these nonidealities, and allows to simulate different camera hardware. Using the Kinect 2 camera as a baseline, we show improved reconstruction errors over state-of-the-art methods, on both simulated and real data.

Submitted 26 July, 2018; originally announced July 2018.

Comments: ECCV 2018

arXiv:1601.00088 [pdf, ps, other]

doi 10.1109/TIP.2017.2731208

Understanding Symmetric Smoothing Filters: A Gaussian Mixture Model Perspective

Authors: Stanley H. Chan, Todd Zickler, Yue M. Lu

Abstract: Many patch-based image denoising algorithms can be formulated as applying a smoothing filter to the noisy image. Expressed as matrices, the smoothing filters must be row normalized so that each row sums to unity. Surprisingly, if we apply a column normalization before the row normalization, the performance of the smoothing filter can often be significantly improved. Prior works showed that such performance gain is related to the Sinkhorn-Knopp balancing algorithm, an iterative procedure that symmetrizes a row-stochastic matrix to a doubly-stochastic matrix. However, a complete understanding of the performance gain phenomenon is still lacking. In this paper, we study the performance gain phenomenon from a statistical learning perspective. We show that Sinkhorn-Knopp is equivalent to an Expectation-Maximization (EM) algorithm of learning a Gaussian mixture model of the image patches. By establishing the correspondence between the steps of Sinkhorn-Knopp and the EM algorithm, we provide a geometrical interpretation of the symmetrization process. This observation allows us to develop a new denoising algorithm called Gaussian mixture model symmetric smoothing filter (GSF). GSF is an extension of the Sinkhorn-Knopp and is a generalization of the original smoothing filters. Despite its simple formulation, GSF outperforms many existing smoothing filters and has a similar performance compared to several state-of-the-art denoising algorithms.

Submitted 22 September, 2016; v1 submitted 1 January, 2016; originally announced January 2016.

Comments: 14 pages

arXiv:1411.4894 [pdf, other]

Low-level Vision by Consensus in a Spatial Hierarchy of Regions

Authors: Ayan Chakrabarti, Ying Xiong, Steven J. Gortler, Todd Zickler

Abstract: We introduce a multi-scale framework for low-level vision, where the goal is estimating physical scene values from image data---such as depth from stereo image pairs. The framework uses a dense, overlapping set of image regions at multiple scales and a "local model," such as a slanted-plane model for stereo disparity, that is expected to be valid piecewise across the visual field. Estimation is cast as optimization over a dichotomous mixture of variables, simultaneously determining which regions are inliers with respect to the local model (binary variables) and the correct co-ordinates in the local model space for each inlying region (continuous variables). When the regions are organized into a multi-scale hierarchy, optimization can occur in an efficient and parallel architecture, where distributed computational units iteratively perform calculations and share information through sparse connections between parents and children. The framework performs well on a standard benchmark for binocular stereo, and it produces a distributional scene representation that is appropriate for combining with higher-level reasoning and other low-level cues.

Submitted 14 April, 2015; v1 submitted 18 November, 2014; originally announced November 2014.

Comments: Accepted to CVPR 2015. Project page: http://www.ttic.edu/chakrabarti/consensus/

arXiv:1312.7366 [pdf, ps, other]

doi 10.1109/TIP.2014.2327813

Monte Carlo non local means: Random sampling for large-scale image filtering

Authors: Stanley H. Chan, Todd Zickler, Yue M. Lu

Abstract: We propose a randomized version of the non-local means (NLM) algorithm for large-scale image filtering. The new algorithm, called Monte Carlo non-local means (MCNLM), speeds up the classical NLM by computing a small subset of image patch distances, which are randomly selected according to a designed sampling pattern. We make two contributions. First, we analyze the performance of the MCNLM algorithm and show that, for large images or large external image databases, the random outcomes of MCNLM are tightly concentrated around the deterministic full NLM result. In particular, our error probability bounds show that, at any given sampling ratio, the probability for MCNLM to have a large deviation from the original NLM solution decays exponentially as the size of the image or database grows. Second, we derive explicit formulas for optimal sampling patterns that minimize the error probability bound by exploiting partial knowledge of the pairwise similarity weights. Numerical experiments show that MCNLM is competitive with other state-of-the-art fast NLM algorithms for single-image denoising. When applied to denoising images using an external database containing ten billion patches, MCNLM returns a randomized solution that is within 0.2 dB of the full NLM solution while reducing the runtime by three orders of magnitude.

Submitted 14 May, 2014; v1 submitted 27 December, 2013; originally announced December 2013.

Comments: submitted for publication

arXiv:1311.6887 [pdf, other]

doi 10.1109/TPAMI.2014.2318713

Modeling Radiometric Uncertainty for Vision with Tone-mapped Color Images

Authors: Ayan Chakrabarti, Ying Xiong, Baochen Sun, Trevor Darrell, Daniel Scharstein, Todd Zickler, Kate Saenko

Abstract: To produce images that are suitable for display, tone-mapping is widely used in digital cameras to map linear color measurements into narrow gamuts with limited dynamic range. This introduces non-linear distortion that must be undone, through a radiometric calibration process, before computer vision systems can analyze such photographs radiometrically. This paper considers the inherent uncertainty of undoing the effects of tone-mapping. We observe that this uncertainty varies substantially across color space, making some pixels more reliable than others. We introduce a model for this uncertainty and a method for fitting it to a given camera or imaging pipeline. Once fit, the model provides for each pixel in a tone-mapped digital photograph a probability distribution over linear scene colors that could have induced it. We demonstrate how these distributions can be useful for visual inference by incorporating them into estimation algorithms for a representative set of vision tasks.

Submitted 9 April, 2014; v1 submitted 27 November, 2013; originally announced November 2013.

Journal ref: IEEE Trans. PAMI 36 (2014) 2185-2198

arXiv:1310.2916 [pdf, other]

doi 10.1109/TPAMI.2014.2343211

From Shading to Local Shape

Authors: Ying Xiong, Ayan Chakrabarti, Ronen Basri, Steven J. Gortler, David W. Jacobs, Todd Zickler

Abstract: We develop a framework for extracting a concise representation of the shape information available from diffuse shading in a small image patch. This produces a mid-level scene descriptor, comprised of local shape distributions that are inferred separately at every image patch across multiple scales. The framework is based on a quadratic representation of local shape that, in the absence of noise, has guarantees on recovering accurate local shape and lighting. And when noise is present, the inferred local shape distributions provide useful shape information without over-committing to any particular image explanation. These local shape distributions naturally encode the fact that some smooth diffuse regions are more informative than others, and they enable efficient and robust reconstruction of object-scale shape. Experimental results show that this approach to surface reconstruction compares well against the state-of-art on both synthetic images and captured photographs.

Submitted 7 April, 2014; v1 submitted 10 October, 2013; originally announced October 2013.

Journal ref: IEEE Trans. PAMI 37 (2015) 67-79

arXiv:1204.2994 [pdf, other]

Image Restoration with Signal-dependent Camera Noise

Authors: Ayan Chakrabarti, Todd Zickler

Abstract: This article describes a fast iterative algorithm for image denoising and deconvolution with signal-dependent observation noise. We use an optimization strategy based on variable splitting that adapts traditional Gaussian noise-based restoration algorithms to account for the observed image being corrupted by mixed Poisson-Gaussian noise and quantization errors.

Submitted 13 April, 2012; originally announced April 2012.

Comments: 6 pages, 3 figures, 2 tables

Showing 1–17 of 17 results for author: Zickler, T