Search | arXiv e-print repository

doi 10.4204/EPTCS.403.15

Optimal Generation of Strictly Increasing Binary Trees and Beyond

Authors: Olivier Bodini, Francis Durand, Philippe Marchal

Abstract: This article presents two novel algorithms for generating random increasing trees. The first algorithm efficiently generates strictly increasing binary trees using an ad hoc method. The second algorithm improves the recursive method for weighted strictly increasing unary-binary increasing trees, optimizing randomness usage. This article presents two novel algorithms for generating random increasing trees. The first algorithm efficiently generates strictly increasing binary trees using an ad hoc method. The second algorithm improves the recursive method for weighted strictly increasing unary-binary increasing trees, optimizing randomness usage. △ Less

Submitted 24 June, 2024; originally announced June 2024.

Comments: In Proceedings GASCom 2024, arXiv:2406.14588

Journal ref: EPTCS 403, 2024, pp. 60-65

arXiv:2405.14867 [pdf, other]

Improved Distribution Matching Distillation for Fast Image Synthesis

Authors: Tianwei Yin, Michaël Gharbi, Taesung Park, Richard Zhang, Eli Shechtman, Fredo Durand, William T. Freeman

Abstract: Recent approaches have shown promises distilling diffusion models into efficient one-step generators. Among them, Distribution Matching Distillation (DMD) produces one-step generators that match their teacher in distribution, without enforcing a one-to-one correspondence with the sampling trajectories of their teachers. However, to ensure stable training, DMD requires an additional regression loss… ▽ More Recent approaches have shown promises distilling diffusion models into efficient one-step generators. Among them, Distribution Matching Distillation (DMD) produces one-step generators that match their teacher in distribution, without enforcing a one-to-one correspondence with the sampling trajectories of their teachers. However, to ensure stable training, DMD requires an additional regression loss computed using a large set of noise-image pairs generated by the teacher with many steps of a deterministic sampler. This is costly for large-scale text-to-image synthesis and limits the student's quality, tying it too closely to the teacher's original sampling paths. We introduce DMD2, a set of techniques that lift this limitation and improve DMD training. First, we eliminate the regression loss and the need for expensive dataset construction. We show that the resulting instability is due to the fake critic not estimating the distribution of generated samples accurately and propose a two time-scale update rule as a remedy. Second, we integrate a GAN loss into the distillation procedure, discriminating between generated samples and real images. This lets us train the student model on real data, mitigating the imperfect real score estimation from the teacher model, and enhancing quality. Lastly, we modify the training procedure to enable multi-step sampling. We identify and address the training-inference input mismatch problem in this setting, by simulating inference-time generator samples during training time. Taken together, our improvements set new benchmarks in one-step image generation, with FID scores of 1.28 on ImageNet-64x64 and 8.35 on zero-shot COCO 2014, surpassing the original teacher despite a 500X reduction in inference cost. Further, we show our approach can generate megapixel images by distilling SDXL, demonstrating exceptional visual quality among few-step methods. △ Less

Submitted 24 May, 2024; v1 submitted 23 May, 2024; originally announced May 2024.

Comments: Code, model, and dataset are available at https://tianweiy.github.io/dmd2

arXiv:2405.08564 [pdf, other]

Anytime Sorting Algorithms (Extended Version)

Authors: Emma Caizergues, François Durand, Fabien Mathieu

Abstract: This paper addresses the anytime sorting problem, aiming to develop algorithms providing tentative estimates of the sorted list at each execution step. Comparisons are treated as steps, and the Spearman's footrule metric evaluates estimation accuracy. We propose a general approach for making any sorting algorithm anytime and introduce two new algorithms: multizip sort and Corsort. Simulations show… ▽ More This paper addresses the anytime sorting problem, aiming to develop algorithms providing tentative estimates of the sorted list at each execution step. Comparisons are treated as steps, and the Spearman's footrule metric evaluates estimation accuracy. We propose a general approach for making any sorting algorithm anytime and introduce two new algorithms: multizip sort and Corsort. Simulations showcase the superior performance of both algorithms compared to existing methods. Multizip sort keeps a low global complexity, while Corsort produces intermediate estimates surpassing previous algorithms. △ Less

Submitted 14 May, 2024; originally announced May 2024.

Comments: 33rd International Joint Conference on Artificial Intelligence (IJCAI 2024), Aug 2024, Jeju City, Jeju Island, South Korea

arXiv:2312.02970 [pdf, other]

Alchemist: Parametric Control of Material Properties with Diffusion Models

Authors: Prafull Sharma, Varun Jampani, Yuanzhen Li, Xuhui Jia, Dmitry Lagun, Fredo Durand, William T. Freeman, Mark Matthews

Abstract: We propose a method to control material attributes of objects like roughness, metallic, albedo, and transparency in real images. Our method capitalizes on the generative prior of text-to-image models known for photorealism, employing a scalar value and instructions to alter low-level material properties. Addressing the lack of datasets with controlled material attributes, we generated an object-ce… ▽ More We propose a method to control material attributes of objects like roughness, metallic, albedo, and transparency in real images. Our method capitalizes on the generative prior of text-to-image models known for photorealism, employing a scalar value and instructions to alter low-level material properties. Addressing the lack of datasets with controlled material attributes, we generated an object-centric synthetic dataset with physically-based materials. Fine-tuning a modified pre-trained text-to-image model on this synthetic dataset enables us to edit material properties in real-world images while preserving all other attributes. We show the potential application of our model to material edited NeRFs. △ Less

Submitted 5 December, 2023; originally announced December 2023.

arXiv:2311.18828 [pdf, other]

One-step Diffusion with Distribution Matching Distillation

Authors: Tianwei Yin, Michaël Gharbi, Richard Zhang, Eli Shechtman, Fredo Durand, William T. Freeman, Taesung Park

Abstract: Diffusion models generate high-quality images but require dozens of forward passes. We introduce Distribution Matching Distillation (DMD), a procedure to transform a diffusion model into a one-step image generator with minimal impact on image quality. We enforce the one-step image generator match the diffusion model at distribution level, by minimizing an approximate KL divergence whose gradient c… ▽ More Diffusion models generate high-quality images but require dozens of forward passes. We introduce Distribution Matching Distillation (DMD), a procedure to transform a diffusion model into a one-step image generator with minimal impact on image quality. We enforce the one-step image generator match the diffusion model at distribution level, by minimizing an approximate KL divergence whose gradient can be expressed as the difference between 2 score functions, one of the target distribution and the other of the synthetic distribution being produced by our one-step generator. The score functions are parameterized as two diffusion models trained separately on each distribution. Combined with a simple regression loss matching the large-scale structure of the multi-step diffusion outputs, our method outperforms all published few-step diffusion approaches, reaching 2.62 FID on ImageNet 64x64 and 11.49 FID on zero-shot COCO-30k, comparable to Stable Diffusion but orders of magnitude faster. Utilizing FP16 inference, our model generates images at 20 FPS on modern hardware. △ Less

Submitted 5 December, 2023; v1 submitted 30 November, 2023; originally announced November 2023.

Comments: Project page: https://tianweiy.github.io/dmd/

Journal ref: Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR) 2024

arXiv:2309.02005 [pdf, other]

Aggregating Correlated Estimations with (Almost) no Training

Authors: Theo Delemazure, François Durand, Fabien Mathieu

Abstract: Many decision problems cannot be solved exactly and use several estimation algorithms that assign scores to the different available options. The estimation errors can have various correlations, from low (e.g. between two very different approaches) to high (e.g. when using a given algorithm with different hyperparameters). Most aggregation rules would suffer from this diversity of correlations. In… ▽ More Many decision problems cannot be solved exactly and use several estimation algorithms that assign scores to the different available options. The estimation errors can have various correlations, from low (e.g. between two very different approaches) to high (e.g. when using a given algorithm with different hyperparameters). Most aggregation rules would suffer from this diversity of correlations. In this article, we propose different aggregation rules that take correlations into account, and we compare them to naive rules in various experiments based on synthetic data. Our results show that when sufficient information is known about the correlations between errors, a maximum likelihood aggregation should be preferred. Otherwise, typically with limited training data, we recommend a method that we call Embedded Voting (EV). △ Less

Submitted 5 September, 2023; originally announced September 2023.

Journal ref: 26th European Conference on Artificial Intelligence (ECAI 2023), Sep 2023, Krak{ó}w, Poland

arXiv:2306.11719 [pdf, other]

Diffusion with Forward Models: Solving Stochastic Inverse Problems Without Direct Supervision

Authors: Ayush Tewari, Tianwei Yin, George Cazenavette, Semon Rezchikov, Joshua B. Tenenbaum, Frédo Durand, William T. Freeman, Vincent Sitzmann

Abstract: Denoising diffusion models are a powerful type of generative models used to capture complex distributions of real-world signals. However, their applicability is limited to scenarios where training samples are readily available, which is not always the case in real-world applications. For example, in inverse graphics, the goal is to generate samples from a distribution of 3D scenes that align with… ▽ More Denoising diffusion models are a powerful type of generative models used to capture complex distributions of real-world signals. However, their applicability is limited to scenarios where training samples are readily available, which is not always the case in real-world applications. For example, in inverse graphics, the goal is to generate samples from a distribution of 3D scenes that align with a given image, but ground-truth 3D scenes are unavailable and only 2D images are accessible. To address this limitation, we propose a novel class of denoising diffusion probabilistic models that learn to sample from distributions of signals that are never directly observed. Instead, these signals are measured indirectly through a known differentiable forward model, which produces partial observations of the unknown signal. Our approach involves integrating the forward model directly into the denoising process. This integration effectively connects the generative modeling of observations with the generative modeling of the underlying signals, allowing for end-to-end training of a conditional generative model over signals. During inference, our approach enables sampling from the distribution of underlying signals that are consistent with a given partial observation. We demonstrate the effectiveness of our method on three challenging computer vision tasks. For instance, in the context of inverse graphics, our model enables direct sampling from the distribution of 3D scenes that align with a single 2D input image. △ Less

Submitted 16 November, 2023; v1 submitted 20 June, 2023; originally announced June 2023.

Comments: Project page: https://diffusion-with-forward-models.github.io/

arXiv:2305.13291 [pdf, other]

Materialistic: Selecting Similar Materials in Images

Authors: Prafull Sharma, Julien Philip, Michaël Gharbi, William T. Freeman, Fredo Durand, Valentin Deschaintre

Abstract: Separating an image into meaningful underlying components is a crucial first step for both editing and understanding images. We present a method capable of selecting the regions of a photograph exhibiting the same material as an artist-chosen area. Our proposed approach is robust to shading, specular highlights, and cast shadows, enabling selection in real images. As we do not rely on semantic seg… ▽ More Separating an image into meaningful underlying components is a crucial first step for both editing and understanding images. We present a method capable of selecting the regions of a photograph exhibiting the same material as an artist-chosen area. Our proposed approach is robust to shading, specular highlights, and cast shadows, enabling selection in real images. As we do not rely on semantic segmentation (different woods or metal should not be selected together), we formulate the problem as a similarity-based grou** problem based on a user-provided image location. In particular, we propose to leverage the unsupervised DINO features coupled with a proposed Cross-Similarity module and an MLP head to extract material similarities in an image. We train our model on a new synthetic image dataset, that we release. We show that our method generalizes well to real-world images. We carefully analyze our model's behavior on varying material properties and lighting. Additionally, we evaluate it against a hand-annotated benchmark of 50 real photographs. We further demonstrate our model on a set of applications, including material editing, in-video selection, and retrieval of object photographs with similar materials. △ Less

Submitted 22 May, 2023; originally announced May 2023.

arXiv:2305.10431 [pdf, other]

FastComposer: Tuning-Free Multi-Subject Image Generation with Localized Attention

Authors: Guangxuan Xiao, Tianwei Yin, William T. Freeman, Frédo Durand, Song Han

Abstract: Diffusion models excel at text-to-image generation, especially in subject-driven generation for personalized images. However, existing methods are inefficient due to the subject-specific fine-tuning, which is computationally intensive and hampers efficient deployment. Moreover, existing methods struggle with multi-subject generation as they often blend features among subjects. We present FastCompo… ▽ More Diffusion models excel at text-to-image generation, especially in subject-driven generation for personalized images. However, existing methods are inefficient due to the subject-specific fine-tuning, which is computationally intensive and hampers efficient deployment. Moreover, existing methods struggle with multi-subject generation as they often blend features among subjects. We present FastComposer which enables efficient, personalized, multi-subject text-to-image generation without fine-tuning. FastComposer uses subject embeddings extracted by an image encoder to augment the generic text conditioning in diffusion models, enabling personalized image generation based on subject images and textual instructions with only forward passes. To address the identity blending problem in the multi-subject generation, FastComposer proposes cross-attention localization supervision during training, enforcing the attention of reference subjects localized to the correct regions in the target images. Naively conditioning on subject embeddings results in subject overfitting. FastComposer proposes delayed subject conditioning in the denoising step to maintain both identity and editability in subject-driven image generation. FastComposer generates images of multiple unseen individuals with different styles, actions, and contexts. It achieves 300$\times$-2500$\times$ speedup compared to fine-tuning-based methods and requires zero extra storage for new subjects. FastComposer paves the way for efficient, personalized, and high-quality multi-subject image creation. Code, model, and dataset are available at https://github.com/mit-han-lab/fastcomposer. △ Less

Submitted 21 May, 2023; v1 submitted 17 May, 2023; originally announced May 2023.

Comments: The first two authors contributed equally to this work

arXiv:2304.11952 [pdf, ps, other]

Sorting wild pigs

Authors: Emma Caizergues, François Durand, Fabien Mathieu

Abstract: Chjara, breeder in Carg{è}se, has n wild pigs. She would like to sort her herd by weight to better meet the demands of her buyers. Each beast has a distinct weight, alas unknown to Chjara. All she has at her disposal is a Roberval scale, which allows her to compare two pigs only at the cost of an acrobatic manoeuvre. The balance, quite old, can break at any time. Chjara therefore wants to sort his… ▽ More Chjara, breeder in Carg{è}se, has n wild pigs. She would like to sort her herd by weight to better meet the demands of her buyers. Each beast has a distinct weight, alas unknown to Chjara. All she has at her disposal is a Roberval scale, which allows her to compare two pigs only at the cost of an acrobatic manoeuvre. The balance, quite old, can break at any time. Chjara therefore wants to sort his herd in a minimum of weighings, but also to have a good estimate of the result after each weighing.To help Chjara, we pose the problem of finding a good anytime sorting algorithm, in the sense of Kendall's tau distance between provisional result and perfectly sorted list, and we bring the following contributions:- We introduce Corsort, a family of anytime sorting algorithms based on estimators.- By simulation, we show that a well-configured Corsort has a near-optimal termination time, and provides better intermediate estimates than the best sorting algorithms we are aware of. △ Less

Submitted 24 April, 2023; originally announced April 2023.

Comments: in French language, AlgoTel 2023 - 25{è}mes Rencontres Francophones sur les Aspects Algorithmiques des T{é}l{é}communications, May 2023, Cargese, France

arXiv:2209.10507 [pdf, other]

Gemino: Practical and Robust Neural Compression for Video Conferencing

Authors: Vibhaalakshmi Sivaraman, Pantea Karimi, Vedantha Venkatapathy, Mehrdad Khani, Sadjad Fouladi, Mohammad Alizadeh, Frédo Durand, Vivienne Sze

Abstract: Video conferencing systems suffer from poor user experience when network conditions deteriorate because current video codecs simply cannot operate at extremely low bitrates. Recently, several neural alternatives have been proposed that reconstruct talking head videos at very low bitrates using sparse representations of each frame such as facial landmark information. However, these approaches produ… ▽ More Video conferencing systems suffer from poor user experience when network conditions deteriorate because current video codecs simply cannot operate at extremely low bitrates. Recently, several neural alternatives have been proposed that reconstruct talking head videos at very low bitrates using sparse representations of each frame such as facial landmark information. However, these approaches produce poor reconstructions in scenarios with major movement or occlusions over the course of a call, and do not scale to higher resolutions. We design Gemino, a new neural compression system for video conferencing based on a novel high-frequency-conditional super-resolution pipeline. Gemino upsamples a very low-resolution version of each target frame while enhancing high-frequency details (e.g., skin texture, hair, etc.) based on information extracted from a single high-resolution reference image. We use a multi-scale architecture that runs different components of the model at different resolutions, allowing it to scale to resolutions comparable to 720p, and we personalize the model to learn specific details of each person, achieving much better fidelity at low bitrates. We implement Gemino atop aiortc, an open-source Python implementation of WebRTC, and show that it operates on 1024x1024 videos in real-time on a Titan X GPU, and achieves 2.2-5x lower bitrate than traditional video codecs for the same perceptual quality. △ Less

Submitted 19 October, 2023; v1 submitted 21 September, 2022; originally announced September 2022.

Comments: 13 pages, 5 appendix

Journal ref: USENIX NSDI 2024

arXiv:2209.10077 [pdf, other]

Can Shadows Reveal Biometric Information?

Authors: Safa C. Medin, Amir Weiss, Frédo Durand, William T. Freeman, Gregory W. Wornell

Abstract: We study the problem of extracting biometric information of individuals by looking at shadows of objects cast on diffuse surfaces. We show that the biometric information leakage from shadows can be sufficient for reliable identity inference under representative scenarios via a maximum likelihood analysis. We then develop a learning-based method that demonstrates this phenomenon in real settings, e… ▽ More We study the problem of extracting biometric information of individuals by looking at shadows of objects cast on diffuse surfaces. We show that the biometric information leakage from shadows can be sufficient for reliable identity inference under representative scenarios via a maximum likelihood analysis. We then develop a learning-based method that demonstrates this phenomenon in real settings, exploiting the subtle cues in the shadows that are the source of the leakage without requiring any labeled real data. In particular, our approach relies on building synthetic scenes composed of 3D face models obtained from a single photograph of each identity. We transfer what we learn from the synthetic data to the real data using domain adaptation in a completely unsupervised way. Our model is able to generalize well to the real domain and is robust to several variations in the scenes. We report high classification accuracies in an identity classification task that takes place in a scene with unknown geometry and occluding objects. △ Less

Submitted 4 October, 2022; v1 submitted 20 September, 2022; originally announced September 2022.

arXiv:2207.11232 [pdf, other]

Neural Groundplans: Persistent Neural Scene Representations from a Single Image

Authors: Prafull Sharma, Ayush Tewari, Yilun Du, Sergey Zakharov, Rares Ambrus, Adrien Gaidon, William T. Freeman, Fredo Durand, Joshua B. Tenenbaum, Vincent Sitzmann

Abstract: We present a method to map 2D image observations of a scene to a persistent 3D scene representation, enabling novel view synthesis and disentangled representation of the movable and immovable components of the scene. Motivated by the bird's-eye-view (BEV) representation commonly used in vision and robotics, we propose conditional neural groundplans, ground-aligned 2D feature grids, as persistent a… ▽ More We present a method to map 2D image observations of a scene to a persistent 3D scene representation, enabling novel view synthesis and disentangled representation of the movable and immovable components of the scene. Motivated by the bird's-eye-view (BEV) representation commonly used in vision and robotics, we propose conditional neural groundplans, ground-aligned 2D feature grids, as persistent and memory-efficient scene representations. Our method is trained self-supervised from unlabeled multi-view observations using differentiable rendering, and learns to complete geometry and appearance of occluded regions. In addition, we show that we can leverage multi-view videos at training time to learn to separately reconstruct static and movable components of the scene from a single image at test time. The ability to separately reconstruct movable objects enables a variety of downstream tasks using simple heuristics, such as extraction of object-centric 3D representations, novel view synthesis, instance-level segmentation, 3D bounding box prediction, and scene editing. This highlights the value of neural groundplans as a backbone for efficient 3D scene understanding models. △ Less

Submitted 9 April, 2023; v1 submitted 22 July, 2022; originally announced July 2022.

Comments: Project page: https://prafullsharma.net/neural_groundplans/

arXiv:2206.05344 [pdf, other]

Differentiable Rendering of Neural SDFs through Reparameterization

Authors: Sai Praveen Bangaru, Michaël Gharbi, Tzu-Mao Li, Fujun Luan, Kalyan Sunkavalli, Miloš Hašan, Sai Bi, Zexiang Xu, Gilbert Bernstein, Frédo Durand

Abstract: We present a method to automatically compute correct gradients with respect to geometric scene parameters in neural SDF renderers. Recent physically-based differentiable rendering techniques for meshes have used edge-sampling to handle discontinuities, particularly at object silhouettes, but SDFs do not have a simple parametric form amenable to sampling. Instead, our approach builds on area-sampli… ▽ More We present a method to automatically compute correct gradients with respect to geometric scene parameters in neural SDF renderers. Recent physically-based differentiable rendering techniques for meshes have used edge-sampling to handle discontinuities, particularly at object silhouettes, but SDFs do not have a simple parametric form amenable to sampling. Instead, our approach builds on area-sampling techniques and develops a continuous war** function for SDFs to account for these discontinuities. Our method leverages the distance to surface encoded in an SDF and uses quadrature on sphere tracer points to compute this war** function. We further show that this can be done by subsampling the points to make the method tractable for neural SDFs. Our differentiable renderer can be used to optimize neural shapes from multi-view images and produces comparable 3D reconstructions to recent SDF-based inverse rendering methods, without the need for 2D segmentation masks to guide the geometry optimization and no volumetric approximations to the geometry. △ Less

Submitted 10 June, 2022; originally announced June 2022.

arXiv:2205.03923 [pdf, other]

Unsupervised Discovery and Composition of Object Light Fields

Authors: Cameron Smith, Hong-Xing Yu, Sergey Zakharov, Fredo Durand, Joshua B. Tenenbaum, Jiajun Wu, Vincent Sitzmann

Abstract: Neural scene representations, both continuous and discrete, have recently emerged as a powerful new paradigm for 3D scene understanding. Recent efforts have tackled unsupervised discovery of object-centric neural scene representations. However, the high cost of ray-marching, exacerbated by the fact that each object representation has to be ray-marched separately, leads to insufficiently sampled ra… ▽ More Neural scene representations, both continuous and discrete, have recently emerged as a powerful new paradigm for 3D scene understanding. Recent efforts have tackled unsupervised discovery of object-centric neural scene representations. However, the high cost of ray-marching, exacerbated by the fact that each object representation has to be ray-marched separately, leads to insufficiently sampled radiance fields and thus, noisy renderings, poor framerates, and high memory and time complexity during training and rendering. Here, we propose to represent objects in an object-centric, compositional scene representation as light fields. We propose a novel light field compositor module that enables reconstructing the global light field from a set of object-centric light fields. Dubbed Compositional Object Light Fields (COLF), our method enables unsupervised learning of object-centric neural scene representations, state-of-the-art reconstruction and novel view synthesis performance on standard datasets, and rendering and training speeds at orders of magnitude faster than existing 3D approaches. △ Less

Submitted 15 July, 2023; v1 submitted 8 May, 2022; originally announced May 2022.

Comments: Project website: https://cameronosmith.github.io/colf. TMLR 2023

arXiv:2203.12691 [pdf, other]

Learning to generate line drawings that convey geometry and semantics

Authors: Caroline Chan, Fredo Durand, Phillip Isola

Abstract: This paper presents an unpaired method for creating line drawings from photographs. Current methods often rely on high quality paired datasets to generate line drawings. However, these datasets often have limitations due to the subjects of the drawings belonging to a specific domain, or in the amount of data collected. Although recent work in unsupervised image-to-image translation has shown much… ▽ More This paper presents an unpaired method for creating line drawings from photographs. Current methods often rely on high quality paired datasets to generate line drawings. However, these datasets often have limitations due to the subjects of the drawings belonging to a specific domain, or in the amount of data collected. Although recent work in unsupervised image-to-image translation has shown much progress, the latest methods still struggle to generate compelling line drawings. We observe that line drawings are encodings of scene information and seek to convey 3D shape and semantic meaning. We build these observations into a set of objectives and train an image translation to map photographs into line drawings. We introduce a geometry loss which predicts depth information from the image features of a line drawing, and a semantic loss which matches the CLIP features of a line drawing with its corresponding photograph. Our approach outperforms state-of-the-art unpaired image translation and line drawing generation methods on creating line drawings from arbitrary photographs. For code and demo visit our webpage carolineec.github.io/informative_drawings △ Less

Submitted 28 March, 2022; v1 submitted 23 March, 2022; originally announced March 2022.

Comments: Corrected and added references

arXiv:2108.13027 [pdf, other]

What You Can Learn by Staring at a Blank Wall

Authors: Prafull Sharma, Miika Aittala, Yoav Y. Schechner, Antonio Torralba, Gregory W. Wornell, William T. Freeman, Fredo Durand

Abstract: We present a passive non-line-of-sight method that infers the number of people or activity of a person from the observation of a blank wall in an unknown room. Our technique analyzes complex imperceptible changes in indirect illumination in a video of the wall to reveal a signal that is correlated with motion in the hidden part of a scene. We use this signal to classify between zero, one, or two m… ▽ More We present a passive non-line-of-sight method that infers the number of people or activity of a person from the observation of a blank wall in an unknown room. Our technique analyzes complex imperceptible changes in indirect illumination in a video of the wall to reveal a signal that is correlated with motion in the hidden part of a scene. We use this signal to classify between zero, one, or two moving people, or the activity of a person in the hidden scene. We train two convolutional neural networks using data collected from 20 different scenes, and achieve an accuracy of $\approx94\%$ for both tasks in unseen test environments and real-time online settings. Unlike other passive non-line-of-sight methods, the technique does not rely on known occluders or controllable light sources, and generalizes to unknown rooms with no re-calibration. We analyze the generalization and robustness of our method with both real and synthetic data, and study the effect of the scene parameters on the signal quality. △ Less

Submitted 30 August, 2021; originally announced August 2021.

arXiv:2106.02634 [pdf, other]

Light Field Networks: Neural Scene Representations with Single-Evaluation Rendering

Authors: Vincent Sitzmann, Semon Rezchikov, William T. Freeman, Joshua B. Tenenbaum, Fredo Durand

Abstract: Inferring representations of 3D scenes from 2D observations is a fundamental problem of computer graphics, computer vision, and artificial intelligence. Emerging 3D-structured neural scene representations are a promising approach to 3D scene understanding. In this work, we propose a novel neural scene representation, Light Field Networks or LFNs, which represent both geometry and appearance of the… ▽ More Inferring representations of 3D scenes from 2D observations is a fundamental problem of computer graphics, computer vision, and artificial intelligence. Emerging 3D-structured neural scene representations are a promising approach to 3D scene understanding. In this work, we propose a novel neural scene representation, Light Field Networks or LFNs, which represent both geometry and appearance of the underlying 3D scene in a 360-degree, four-dimensional light field parameterized via a neural implicit representation. Rendering a ray from an LFN requires only a single network evaluation, as opposed to hundreds of evaluations per ray for ray-marching or volumetric based renderers in 3D-structured neural scene representations. In the setting of simple scenes, we leverage meta-learning to learn a prior over LFNs that enables multi-view consistent light field reconstruction from as little as a single image observation. This results in dramatic reductions in time and memory complexity, and enables real-time rendering. The cost of storing a 360-degree light field via an LFN is two orders of magnitude lower than conventional methods such as the Lumigraph. Utilizing the analytical differentiability of neural implicit representations and a novel parameterization of light space, we further demonstrate the extraction of sparse depth maps from LFNs. △ Less

Submitted 18 January, 2022; v1 submitted 4 June, 2021; originally announced June 2021.

Comments: First two authors contributed equally. Project website: https://vsitzmann.github.io/lfns/

arXiv:2101.04822 [pdf, other]

Plug-and-Play Algorithms for Video Snapshot Compressive Imaging

Authors: Xin Yuan, Yang Liu, **li Suo, Frédo Durand, Qionghai Dai

Abstract: We consider the reconstruction problem of video snapshot compressive imaging (SCI), which captures high-speed videos using a low-speed 2D sensor (detector). The underlying principle of SCI is to modulate sequential high-speed frames with different masks and then these encoded frames are integrated into a snapshot on the sensor and thus the sensor can be of low-speed. On one hand, video SCI enjoys… ▽ More We consider the reconstruction problem of video snapshot compressive imaging (SCI), which captures high-speed videos using a low-speed 2D sensor (detector). The underlying principle of SCI is to modulate sequential high-speed frames with different masks and then these encoded frames are integrated into a snapshot on the sensor and thus the sensor can be of low-speed. On one hand, video SCI enjoys the advantages of low-bandwidth, low-power and low-cost. On the other hand, applying SCI to large-scale problems (HD or UHD videos) in our daily life is still challenging and one of the bottlenecks lies in the reconstruction algorithm. Exiting algorithms are either too slow (iterative optimization algorithms) or not flexible to the encoding process (deep learning based end-to-end networks). In this paper, we develop fast and flexible algorithms for SCI based on the plug-and-play (PnP) framework. In addition to the PnP-ADMM method, we further propose the PnP-GAP (generalized alternating projection) algorithm with a lower computational workload. We first employ the image deep denoising priors to show that PnP can recover a UHD color video with 30 frames from a snapshot measurement. Since videos have strong temporal correlation, by employing the video deep denoising priors, we achieve a significant improvement in the results. Furthermore, we extend the proposed PnP algorithms to the color SCI system using mosaic sensors, where each pixel only captures the red, green or blue channels. A joint reconstruction and demosaicing paradigm is developed for flexible and high quality reconstruction of color video SCI systems. Extensive results on both simulation and real datasets verify the superiority of our proposed algorithm. △ Less

Submitted 12 January, 2021; originally announced January 2021.

Comments: 18 pages, 12 figures and 4 tables. Journal extension of arXiv:2003.13654. Code available at https://github.com/liuyang12/PnP-SCI_python

arXiv:2012.08141 [pdf, other]

AsyncTaichi: On-the-fly Inter-kernel Optimizations for Imperative and Spatially Sparse Programming

Authors: Yuanming Hu, Mingkuan Xu, Ye Kuang, Frédo Durand

Abstract: Leveraging spatial sparsity has become a popular approach to accelerate 3D computer graphics applications. Spatially sparse data structures and efficient sparse kernels (such as parallel stencil operations on active voxels), are key to achieve high performance. Existing work focuses on improving performance within a single sparse computational kernel. We show that a system that looks beyond a sing… ▽ More Leveraging spatial sparsity has become a popular approach to accelerate 3D computer graphics applications. Spatially sparse data structures and efficient sparse kernels (such as parallel stencil operations on active voxels), are key to achieve high performance. Existing work focuses on improving performance within a single sparse computational kernel. We show that a system that looks beyond a single kernel, plus additional domain-specific sparse data structure analysis, opens up exciting new space for optimizing sparse computations. Specifically, we propose a domain-specific data-flow graph model of imperative and sparse computation programs, which describes kernel relationships and enables easy analysis and optimization. Combined with an asynchronous execution engine that exposes a wide window of kernels, the inter-kernel optimizer can then perform effective sparse computation optimizations, such as eliminating unnecessary voxel list generations and removing voxel activation checks. These domain-specific optimizations further make way for classical general-purpose optimizations that are originally challenging to directly apply to computations with sparse data structures. Without any computational code modification, our new system leads to $4.02\times$ fewer kernel launches and $1.87\times$ speed up on our GPU benchmarks, including computations on Eulerian grids, Lagrangian particles, meshes, and automatic differentiation. △ Less

Submitted 22 June, 2021; v1 submitted 15 December, 2020; originally announced December 2020.

Comments: 18 pages, 20 figures, submitted to ACM SIGGRAPH Asia

ACM Class: D.3.2; I.3.6; I.2.5

arXiv:2007.15721 [pdf, ps, other]

Dimension Groups and Dynamical Systems

Authors: Fabien Durand, Dominique Perrin

Abstract: We give a description of the link between topological dynamical systems and their dimension groups. The focus is on minimal systems and, in particular, on substitution shifts. We describe in detail the various classes of systems including Sturmian shifts and interval exchange shifts. This is a preliminary version of a book which will be published by Cambridge University Press. Any comments are of… ▽ More We give a description of the link between topological dynamical systems and their dimension groups. The focus is on minimal systems and, in particular, on substitution shifts. We describe in detail the various classes of systems including Sturmian shifts and interval exchange shifts. This is a preliminary version of a book which will be published by Cambridge University Press. Any comments are of course welcome. △ Less

Submitted 11 December, 2020; v1 submitted 30 July, 2020; originally announced July 2020.

arXiv:2007.08032 [pdf, other]

When and how CNNs generalize to out-of-distribution category-viewpoint combinations

Authors: Spandan Madan, Timothy Henry, Jamell Dozier, Helen Ho, Nishchal Bhandari, Tomotake Sasaki, Frédo Durand, Hanspeter Pfister, Xavier Boix

Abstract: Object recognition and viewpoint estimation lie at the heart of visual understanding. Recent works suggest that convolutional neural networks (CNNs) fail to generalize to out-of-distribution (OOD) category-viewpoint combinations, ie. combinations not seen during training. In this paper, we investigate when and how such OOD generalization may be possible by evaluating CNNs trained to classify both… ▽ More Object recognition and viewpoint estimation lie at the heart of visual understanding. Recent works suggest that convolutional neural networks (CNNs) fail to generalize to out-of-distribution (OOD) category-viewpoint combinations, ie. combinations not seen during training. In this paper, we investigate when and how such OOD generalization may be possible by evaluating CNNs trained to classify both object category and 3D viewpoint on OOD combinations, and identifying the neural mechanisms that facilitate such OOD generalization. We show that increasing the number of in-distribution combinations (ie. data diversity) substantially improves generalization to OOD combinations, even with the same amount of training data. We compare learning category and viewpoint in separate and shared network architectures, and observe starkly different trends on in-distribution and OOD combinations, ie. while shared networks are helpful in-distribution, separate networks significantly outperform shared ones at OOD combinations. Finally, we demonstrate that such OOD generalization is facilitated by the neural mechanism of specialization, ie. the emergence of two types of neurons -- neurons selective to category and invariant to viewpoint, and vice versa. △ Less

Submitted 17 November, 2021; v1 submitted 15 July, 2020; originally announced July 2020.

arXiv:2001.01026 [pdf, other]

Painting Many Pasts: Synthesizing Time Lapse Videos of Paintings

Authors: Amy Zhao, Guha Balakrishnan, Kathleen M. Lewis, Frédo Durand, John V. Guttag, Adrian V. Dalca

Abstract: We introduce a new video synthesis task: synthesizing time lapse videos depicting how a given painting might have been created. Artists paint using unique combinations of brushes, strokes, and colors. There are often many possible ways to create a given painting. Our goal is to learn to capture this rich range of possibilities. Creating distributions of long-term videos is a challenge for learni… ▽ More We introduce a new video synthesis task: synthesizing time lapse videos depicting how a given painting might have been created. Artists paint using unique combinations of brushes, strokes, and colors. There are often many possible ways to create a given painting. Our goal is to learn to capture this rich range of possibilities. Creating distributions of long-term videos is a challenge for learning-based video synthesis methods. We present a probabilistic model that, given a single image of a completed painting, recurrently synthesizes steps of the painting process. We implement this model as a convolutional neural network, and introduce a novel training scheme to enable learning from a limited dataset of painting time lapses. We demonstrate that this model can be used to sample many time steps, enabling long-term stochastic video synthesis. We evaluate our method on digital and watercolor paintings collected from video websites, and show that human raters find our synthetic videos to be similar to time lapse videos produced by real artists. Our code is available at https://xamyzhao.github.io/timecraft. △ Less

Submitted 25 April, 2020; v1 submitted 3 January, 2020; originally announced January 2020.

Comments: 10 pages, CVPR 2020

arXiv:1912.02314 [pdf, other]

Computational Mirrors: Blind Inverse Light Transport by Deep Matrix Factorization

Authors: Miika Aittala, Prafull Sharma, Lukas Murmann, Adam B. Yedidia, Gregory W. Wornell, William T. Freeman, Fredo Durand

Abstract: We recover a video of the motion taking place in a hidden scene by observing changes in indirect illumination in a nearby uncalibrated visible region. We solve this problem by factoring the observed video into a matrix product between the unknown hidden scene video and an unknown light transport matrix. This task is extremely ill-posed, as any non-negative factorization will satisfy the data. Insp… ▽ More We recover a video of the motion taking place in a hidden scene by observing changes in indirect illumination in a nearby uncalibrated visible region. We solve this problem by factoring the observed video into a matrix product between the unknown hidden scene video and an unknown light transport matrix. This task is extremely ill-posed, as any non-negative factorization will satisfy the data. Inspired by recent work on the Deep Image Prior, we parameterize the factor matrices using randomly initialized convolutional neural networks trained in a one-off manner, and show that this results in decompositions that reflect the true motion in the hidden scene. △ Less

Submitted 4 December, 2019; originally announced December 2019.

Comments: 14 pages, 5 figures, Advances in Neural Information Processing Systems 2019

Journal ref: Aittala, Miika, et al. "Computational Mirrors: Blind Inverse Light Transport by Deep Matrix Factorization." Advances in Neural Information Processing Systems. 2019

arXiv:1910.08131 [pdf, other]

A Dataset of Multi-Illumination Images in the Wild

Authors: Lukas Murmann, Michael Gharbi, Miika Aittala, Fredo Durand

Abstract: Collections of images under a single, uncontrolled illumination have enabled the rapid advancement of core computer vision tasks like classification, detection, and segmentation. But even with modern learning techniques, many inverse problems involving lighting and material understanding remain too severely ill-posed to be solved with single-illumination datasets. To fill this gap, we introduce a… ▽ More Collections of images under a single, uncontrolled illumination have enabled the rapid advancement of core computer vision tasks like classification, detection, and segmentation. But even with modern learning techniques, many inverse problems involving lighting and material understanding remain too severely ill-posed to be solved with single-illumination datasets. To fill this gap, we introduce a new multi-illumination dataset of more than 1000 real scenes, each captured under 25 lighting conditions. We demonstrate the richness of this dataset by training state-of-the-art models for three challenging applications: single-image illumination estimation, image relighting, and mixed-illuminant white balance. △ Less

Submitted 17 October, 2019; originally announced October 2019.

Comments: ICCV 2019

arXiv:1910.00935 [pdf, other]

DiffTaichi: Differentiable Programming for Physical Simulation

Authors: Yuanming Hu, Luke Anderson, Tzu-Mao Li, Qi Sun, Nathan Carr, Jonathan Ragan-Kelley, Frédo Durand

Abstract: We present DiffTaichi, a new differentiable programming language tailored for building high-performance differentiable physical simulators. Based on an imperative programming language, DiffTaichi generates gradients of simulation steps using source code transformations that preserve arithmetic intensity and parallelism. A light-weight tape is used to record the whole simulation program structure a… ▽ More We present DiffTaichi, a new differentiable programming language tailored for building high-performance differentiable physical simulators. Based on an imperative programming language, DiffTaichi generates gradients of simulation steps using source code transformations that preserve arithmetic intensity and parallelism. A light-weight tape is used to record the whole simulation program structure and replay the gradient kernels in a reversed order, for end-to-end backpropagation. We demonstrate the performance and productivity of our language in gradient-based learning and optimization tasks on 10 different physical simulators. For example, a differentiable elastic object simulator written in our language is 4.2x shorter than the hand-engineered CUDA version yet runs as fast, and is 188x faster than the TensorFlow implementation. Using our differentiable programs, neural network controllers are typically optimized within only tens of iterations. △ Less

Submitted 14 February, 2020; v1 submitted 1 October, 2019; originally announced October 2019.

Comments: Published at ICLR 2020

arXiv:1909.00475 [pdf, other]

Visual Deprojection: Probabilistic Recovery of Collapsed Dimensions

Authors: Guha Balakrishnan, Adrian V. Dalca, Amy Zhao, John V. Guttag, Fredo Durand, William T. Freeman

Abstract: We introduce visual deprojection: the task of recovering an image or video that has been collapsed along a dimension. Projections arise in various contexts, such as long-exposure photography, where a dynamic scene is collapsed in time to produce a motion-blurred image, and corner cameras, where reflected light from a scene is collapsed along a spatial dimension because of an edge occluder to yield… ▽ More We introduce visual deprojection: the task of recovering an image or video that has been collapsed along a dimension. Projections arise in various contexts, such as long-exposure photography, where a dynamic scene is collapsed in time to produce a motion-blurred image, and corner cameras, where reflected light from a scene is collapsed along a spatial dimension because of an edge occluder to yield a 1D video. Deprojection is ill-posed-- often there are many plausible solutions for a given input. We first propose a probabilistic model capturing the ambiguity of the task. We then present a variational inference strategy using convolutional neural networks as functional approximators. Sampling from the inference network at test time yields plausible candidates from the distribution of original signals that are consistent with a given input projection. We evaluate the method on several datasets for both spatial and temporal deprojection tasks. We first demonstrate the method can recover human gait videos and face images from spatial projections, and then show that it can recover videos of moving digits from dramatically motion-blurred images obtained via temporal projection. △ Less

Submitted 1 September, 2019; originally announced September 2019.

Comments: ICCV 2019

arXiv:1906.11557 [pdf, other]

Flexible SVBRDF Capture with a Multi-Image Deep Network

Authors: Valentin Deschaintre, Miika Aittala, Fredo Durand, George Drettakis, Adrien Bousseau

Abstract: Empowered by deep learning, recent methods for material capture can estimate a spatially-varying reflectance from a single photograph. Such lightweight capture is in stark contrast with the tens or hundreds of pictures required by traditional optimization-based approaches. However, a single image is often simply not enough to observe the rich appearance of real-world materials. We present a deep-l… ▽ More Empowered by deep learning, recent methods for material capture can estimate a spatially-varying reflectance from a single photograph. Such lightweight capture is in stark contrast with the tens or hundreds of pictures required by traditional optimization-based approaches. However, a single image is often simply not enough to observe the rich appearance of real-world materials. We present a deep-learning method capable of estimating material appearance from a variable number of uncalibrated and unordered pictures captured with a handheld camera and flash. Thanks to an order-independent fusing layer, this architecture extracts the most useful information from each picture, while benefiting from strong priors learned from data. The method can handle both view and light direction variation without calibration. We show how our method improves its prediction with the number of input pictures, and reaches high quality reconstructions with as little as 1 to 10 images -- a sweet spot between existing single-image and complex multi-image approaches. △ Less

Submitted 27 June, 2019; originally announced June 2019.

Comments: Accepted to EGSR 2019 in the CGF track

ACM Class: I.3

Journal ref: Computer Graphics Forum (EGSR Conference Proceedings), 38, 4(July 2019), 13 pages

arXiv:1904.08825 [pdf, other]

Generating Training Data for Denoising Real RGB Images via Camera Pipeline Simulation

Authors: Ronnachai Jaroensri, Camille Biscarrat, Miika Aittala, Frédo Durand

Abstract: Image reconstruction techniques such as denoising often need to be applied to the RGB output of cameras and cellphones. Unfortunately, the commonly used additive white noise (AWGN) models do not accurately reproduce the noise and the degradation encountered on these inputs. This is particularly important for learning-based techniques, because the mismatch between training and real world data will… ▽ More Image reconstruction techniques such as denoising often need to be applied to the RGB output of cameras and cellphones. Unfortunately, the commonly used additive white noise (AWGN) models do not accurately reproduce the noise and the degradation encountered on these inputs. This is particularly important for learning-based techniques, because the mismatch between training and real world data will hurt their generalization. This paper aims to accurately simulate the degradation and noise transformation performed by camera pipelines. This allows us to generate realistic degradation in RGB images that can be used to train machine learning models. We use our simulation to study the importance of noise modeling for learning-based denoising. Our study shows that a realistic noise model is required for learning to denoise real JPEG images. A neural network trained on realistic noise outperforms the one trained with AWGN by 3 dB. An ablation study of our pipeline shows that simulating denoising and demosaicking is important to this improvement and that realistic demosaicking algorithms, which have been rarely considered, is needed. We believe this simulation will also be useful for other image reconstruction tasks, and we will distribute our code publicly. △ Less

Submitted 18 April, 2019; originally announced April 2019.

arXiv:1902.09383 [pdf, other]

Data augmentation using learned transformations for one-shot medical image segmentation

Authors: Amy Zhao, Guha Balakrishnan, Frédo Durand, John V. Guttag, Adrian V. Dalca

Abstract: Image segmentation is an important task in many medical applications. Methods based on convolutional neural networks attain state-of-the-art accuracy; however, they typically rely on supervised training with large labeled datasets. Labeling medical images requires significant expertise and time, and typical hand-tuned approaches for data augmentation fail to capture the complex variations in such… ▽ More Image segmentation is an important task in many medical applications. Methods based on convolutional neural networks attain state-of-the-art accuracy; however, they typically rely on supervised training with large labeled datasets. Labeling medical images requires significant expertise and time, and typical hand-tuned approaches for data augmentation fail to capture the complex variations in such images. We present an automated data augmentation method for synthesizing labeled medical images. We demonstrate our method on the task of segmenting magnetic resonance imaging (MRI) brain scans. Our method requires only a single segmented scan, and leverages other unlabeled scans in a semi-supervised approach. We learn a model of transformations from the images, and use the model along with the labeled example to synthesize additional labeled examples. Each transformation is comprised of a spatial deformation field and an intensity change, enabling the synthesis of complex effects such as variations in anatomy and image acquisition procedures. We show that training a supervised segmenter with these new examples provides significant improvements over state-of-the-art methods for one-shot biomedical image segmentation. Our code is available at https://github.com/xamyzhao/brainstorm. △ Less

Submitted 6 April, 2019; v1 submitted 25 February, 2019; originally announced February 2019.

Comments: 9 pages, CVPR 2019

arXiv:1811.03942 [pdf, ps, other]

Decidability, arithmetic subsequences and eigenvalues of morphic subshifts

Authors: Fabien Durand, Valérie Goyheneche

Abstract: We prove decidability results on the existence of constant subsequences of uniformly recurrent morphic sequences along arithmetic progressions. We use spectral properties of the subshifts they generate to give a first algorithm deciding whether, given p $\in$ N, there exists such a constant subsequence along an arithmetic progression of common difference p. In the special case of uniformly recurre… ▽ More We prove decidability results on the existence of constant subsequences of uniformly recurrent morphic sequences along arithmetic progressions. We use spectral properties of the subshifts they generate to give a first algorithm deciding whether, given p $\in$ N, there exists such a constant subsequence along an arithmetic progression of common difference p. In the special case of uniformly recurrent automatic sequences we explicitely describe the sets of such p by means of automata. △ Less

Submitted 16 November, 2018; v1 submitted 9 November, 2018; originally announced November 2018.

arXiv:1810.09718 [pdf, other]

doi 10.1145/3197517.3201378

Single-Image SVBRDF Capture with a Rendering-Aware Deep Network

Authors: Valentin Deschaintre, Miika Aittala, Fredo Durand, George Drettakis, Adrien Bousseau

Abstract: Texture, highlights, and shading are some of many visual cues that allow humans to perceive material appearance in single pictures. Yet, recovering spatially-varying bi-directional reflectance distribution functions (SVBRDFs) from a single image based on such cues has challenged researchers in computer graphics for decades. We tackle lightweight appearance capture by training a deep neural network… ▽ More Texture, highlights, and shading are some of many visual cues that allow humans to perceive material appearance in single pictures. Yet, recovering spatially-varying bi-directional reflectance distribution functions (SVBRDFs) from a single image based on such cues has challenged researchers in computer graphics for decades. We tackle lightweight appearance capture by training a deep neural network to automatically extract and make sense of these visual cues. Once trained, our network is capable of recovering per-pixel normal, diffuse albedo, specular albedo and specular roughness from a single picture of a flat surface lit by a hand-held flash. We achieve this goal by introducing several innovations on training data acquisition and network design. For training, we leverage a large dataset of artist-created, procedural SVBRDFs which we sample and render under multiple lighting directions. We further amplify the data by material mixing to cover a wide diversity of shading effects, which allows our network to work across many material classes. Motivated by the observation that distant regions of a material sample often offer complementary visual cues, we design a network that combines an encoder-decoder convolutional track for local feature extraction with a fully-connected track for global feature extraction and propagation. Many important material effects are view-dependent, and as such ambiguous when observed in a single image. We tackle this challenge by defining the loss as a differentiable SVBRDF similarity metric that compares the renderings of the predicted maps against renderings of the ground truth from several lighting and viewing directions. Combined together, these novel ingredients bring clear improvement over state of the art methods for single-shot capture of spatially varying BRDFs. △ Less

Submitted 23 October, 2018; originally announced October 2018.

Comments: 15 pages, presented at Siggraph 2018

ACM Class: I.3

Journal ref: ACM Trans. Graph. 37, 4, Article 128 (August 2018), 15 pages

arXiv:1809.07851 [pdf, other]

FFT Convolutions are Faster than Winograd on Modern CPUs, Here is Why

Authors: Aleksandar Zlateski, Zhen Jia, Kai Li, Fredo Durand

Abstract: Winograd-based convolution has quickly gained traction as a preferred approach to implement convolutional neural networks (ConvNet) on various hardware platforms because it requires fewer floating point operations than FFT-based or direct convolutions. This paper compares three highly optimized implementations (regular FFT--, Gauss--FFT--, and Winograd--based convolutions) on modern multi-- and… ▽ More Winograd-based convolution has quickly gained traction as a preferred approach to implement convolutional neural networks (ConvNet) on various hardware platforms because it requires fewer floating point operations than FFT-based or direct convolutions. This paper compares three highly optimized implementations (regular FFT--, Gauss--FFT--, and Winograd--based convolutions) on modern multi-- and many--core CPUs. Although all three implementations employed the same optimizations for modern CPUs, our experimental results with two popular ConvNets (VGG and AlexNet) show that the FFT--based implementations generally outperform the Winograd--based approach, contrary to the popular belief. To understand the results, we use a Roofline performance model to analyze the three implementations in detail, by looking at each of their computation phases and by considering not only the number of floating point operations, but also the memory bandwidth and the cache sizes. The performance analysis explains why, and under what conditions, the FFT--based implementations outperform the Winograd--based one, on modern CPUs. △ Less

Submitted 20 September, 2018; originally announced September 2018.

arXiv:1807.10441 [pdf, other]

doi 10.1109/PacificVis52677.2021.00012

Synthetically Trained Icon Proposals for Parsing and Summarizing Infographics

Authors: Spandan Madan, Zoya Bylinskii, Matthew Tancik, Adrià Recasens, Kimberli Zhong, Sami Alsheikh, Hanspeter Pfister, Aude Oliva, Fredo Durand

Abstract: Widely used in news, business, and educational media, infographics are handcrafted to effectively communicate messages about complex and often abstract topics including `ways to conserve the environment' and `understanding the financial crisis'. Composed of stylistically and semantically diverse visual and textual elements, infographics pose new challenges for computer vision. While automatic text… ▽ More Widely used in news, business, and educational media, infographics are handcrafted to effectively communicate messages about complex and often abstract topics including `ways to conserve the environment' and `understanding the financial crisis'. Composed of stylistically and semantically diverse visual and textual elements, infographics pose new challenges for computer vision. While automatic text extraction works well on infographics, computer vision approaches trained on natural images fail to identify the stand-alone visual elements in infographics, or `icons'. To bridge this representation gap, we propose a synthetic data generation strategy: we augment background patches in infographics from our Visually29K dataset with Internet-scraped icons which we use as training data for an icon proposal mechanism. On a test set of 1K annotated infographics, icons are located with 38% precision and 34% recall (the best model trained with natural images achieves 14% precision and 7% recall). Combining our icon proposals with icon classification and text extraction, we present a multi-modal summarization application. Our application takes an infographic as input and automatically produces text tags and visual hashtags that are textually and visually representative of the infographic's topics respectively. △ Less

Submitted 27 July, 2018; originally announced July 2018.

arXiv:1806.04891 [pdf, other]

Decidability of the isomorphism and the factorization between minimal substitution subshifts

Authors: Fabien Durand, Julien Leroy

Abstract: Classification is a central problem for dynamical systems, in particular for families that arise in a wide range of topics, like substitution subshifts. It is important to be able to distinguish whether two such subshifts are isomorphic, but the existing invariants are not sufficient for this purpose. We first show that given two minimal substitution subshifts, there exists a computable constant… ▽ More Classification is a central problem for dynamical systems, in particular for families that arise in a wide range of topics, like substitution subshifts. It is important to be able to distinguish whether two such subshifts are isomorphic, but the existing invariants are not sufficient for this purpose. We first show that given two minimal substitution subshifts, there exists a computable constant $R$ such that any factor map between these subshifts (if any) is the composition of a factor map with a radius smaller than $R$ and some power of the shift map. Then we prove that it is decidable to check whether a given sliding block code is a factor map between two prescribed minimal substitution subshifts. As a consequence of these two results, we provide an algorithm that, given two minimal substitution subshifts, decides whether one is a factor of the other and, as a straightforward corollary, whether they are isomorphic. △ Less

Submitted 23 August, 2022; v1 submitted 13 June, 2018; originally announced June 2018.

Comments: 65 pages

MSC Class: 37B10; 68R15

arXiv:1804.07739 [pdf, other]

Synthesizing Images of Humans in Unseen Poses

Authors: Guha Balakrishnan, Amy Zhao, Adrian V. Dalca, Fredo Durand, John Guttag

Abstract: We address the computational problem of novel human pose synthesis. Given an image of a person and a desired pose, we produce a depiction of that person in that pose, retaining the appearance of both the person and background. We present a modular generative neural network that synthesizes unseen poses using training pairs of images and poses taken from human action videos. Our network separates a… ▽ More We address the computational problem of novel human pose synthesis. Given an image of a person and a desired pose, we produce a depiction of that person in that pose, retaining the appearance of both the person and background. We present a modular generative neural network that synthesizes unseen poses using training pairs of images and poses taken from human action videos. Our network separates a scene into different body part and background layers, moves body parts to new locations and refines their appearances, and composites the new foreground with a hole-filled background. These subtasks, implemented with separate modules, are trained jointly using only a single target image as a supervised label. We use an adversarial discriminator to force our network to synthesize realistic details conditioned on pose. We demonstrate image synthesis results on three action classes: golf, yoga/workouts and tennis, and show that our method produces accurate results within action classes as well as across action classes. Given a sequence of desired poses, we also produce coherent videos of actions. △ Less

Submitted 20 April, 2018; originally announced April 2018.

Comments: CVPR 2018

arXiv:1804.02684 [pdf, other]

Learning-based Video Motion Magnification

Authors: Tae-Hyun Oh, Ronnachai Jaroensri, Changil Kim, Mohamed Elgharib, Frédo Durand, William T. Freeman, Wojciech Matusik

Abstract: Video motion magnification techniques allow us to see small motions previously invisible to the naked eyes, such as those of vibrating airplane wings, or swaying buildings under the influence of the wind. Because the motion is small, the magnification results are prone to noise or excessive blurring. The state of the art relies on hand-designed filters to extract representations that may not be op… ▽ More Video motion magnification techniques allow us to see small motions previously invisible to the naked eyes, such as those of vibrating airplane wings, or swaying buildings under the influence of the wind. Because the motion is small, the magnification results are prone to noise or excessive blurring. The state of the art relies on hand-designed filters to extract representations that may not be optimal. In this paper, we seek to learn the filters directly from examples using deep convolutional neural networks. To make training tractable, we carefully design a synthetic dataset that captures small motion well, and use two-frame input for training. We show that the learned filters achieve high-quality results on real videos, with less ringing artifacts and better noise characteristics than previous methods. While our model is not trained with temporal filters, we found that the temporal filters can be used with our extracted representations up to a moderate magnification, enabling a frequency-based motion selection. Finally, we analyze the learned filters and show that they behave similarly to the derivative filters used in previous works. Our code, trained model, and datasets will be available online. △ Less

Submitted 31 July, 2018; v1 submitted 8 April, 2018; originally announced April 2018.

Comments: Accepted as ECCV 2018 Oral. The 1st and 2nd authors equally contributed. Video result: https://youtu.be/GrMLeEcSNzY , Project page: http://people.csail.mit.edu/tiam/deepmag/ Some bibliography information was fixed

arXiv:1709.09215 [pdf, other]

Understanding Infographics through Textual and Visual Tag Prediction

Authors: Zoya Bylinskii, Sami Alsheikh, Spandan Madan, Adria Recasens, Kimberli Zhong, Hanspeter Pfister, Fredo Durand, Aude Oliva

Abstract: We introduce the problem of visual hashtag discovery for infographics: extracting visual elements from an infographic that are diagnostic of its topic. Given an infographic as input, our computational approach automatically outputs textual and visual elements predicted to be representative of the infographic content. Concretely, from a curated dataset of 29K large infographic images sampled across… ▽ More We introduce the problem of visual hashtag discovery for infographics: extracting visual elements from an infographic that are diagnostic of its topic. Given an infographic as input, our computational approach automatically outputs textual and visual elements predicted to be representative of the infographic content. Concretely, from a curated dataset of 29K large infographic images sampled across 26 categories and 391 tags, we present an automated two step approach. First, we extract the text from an infographic and use it to predict text tags indicative of the infographic content. And second, we use these predicted text tags as a supervisory signal to localize the most diagnostic visual elements from within the infographic i.e. visual hashtags. We report performances on a categorization and multi-label tag prediction problem and compare our proposed visual hashtags to human annotations. △ Less

Submitted 26 September, 2017; originally announced September 2017.

arXiv:1708.02660 [pdf, other]

doi 10.1145/3126594.3126653

Learning Visual Importance for Graphic Designs and Data Visualizations

Authors: Zoya Bylinskii, Nam Wook Kim, Peter O'Donovan, Sami Alsheikh, Spandan Madan, Hanspeter Pfister, Fredo Durand, Bryan Russell, Aaron Hertzmann

Abstract: Knowing where people look and click on visual designs can provide clues about how the designs are perceived, and where the most important or relevant content lies. The most important content of a visual design can be used for effective summarization or to facilitate retrieval from a database. We present automated models that predict the relative importance of different elements in data visualizati… ▽ More Knowing where people look and click on visual designs can provide clues about how the designs are perceived, and where the most important or relevant content lies. The most important content of a visual design can be used for effective summarization or to facilitate retrieval from a database. We present automated models that predict the relative importance of different elements in data visualizations and graphic designs. Our models are neural networks trained on human clicks and importance annotations on hundreds of designs. We collected a new dataset of crowdsourced importance, and analyzed the predictions of our models with respect to ground truth importance and human eye movements. We demonstrate how such predictions of importance can be used for automatic design retargeting and thumbnailing. User studies with hundreds of MTurk participants validate that, with limited post-processing, our importance-driven applications are on par with, or outperform, current state-of-the-art methods, including natural image saliency. We also provide a demonstration of how our importance predictions can be built into interactive design tools to offer immediate feedback during the design process. △ Less

Submitted 8 August, 2017; originally announced August 2017.

ACM Class: H.5.1

Journal ref: UIST 2017

arXiv:1707.02880 [pdf, other]

doi 10.1145/3072959.3073592

Deep Bilateral Learning for Real-Time Image Enhancement

Authors: Michaël Gharbi, Jiawen Chen, Jonathan T. Barron, Samuel W. Hasinoff, Frédo Durand

Abstract: Performance is a critical challenge in mobile image processing. Given a reference imaging pipeline, or even human-adjusted pairs of images, we seek to reproduce the enhancements and enable real-time evaluation. For this, we introduce a new neural network architecture inspired by bilateral grid processing and local affine color transforms. Using pairs of input/output images, we train a convolutiona… ▽ More Performance is a critical challenge in mobile image processing. Given a reference imaging pipeline, or even human-adjusted pairs of images, we seek to reproduce the enhancements and enable real-time evaluation. For this, we introduce a new neural network architecture inspired by bilateral grid processing and local affine color transforms. Using pairs of input/output images, we train a convolutional neural network to predict the coefficients of a locally-affine model in bilateral space. Our architecture learns to make local, global, and content-dependent decisions to approximate the desired image transformation. At runtime, the neural network consumes a low-resolution version of the input image, produces a set of affine transformations in bilateral space, upsamples those transformations in an edge-preserving fashion using a new slicing node, and then applies those upsampled transformations to the full-resolution image. Our algorithm processes high-resolution images on a smartphone in milliseconds, provides a real-time viewfinder at 1080p resolution, and matches the quality of state-of-the-art approximation techniques on a large class of image operators. Unlike previous work, our model is trained off-line from data and therefore does not require access to the original operator at runtime. This allows our model to learn complex, scene-dependent transformations for which no reference implementation is available, such as the photographic edits of a human retoucher. △ Less

Submitted 22 August, 2017; v1 submitted 10 July, 2017; originally announced July 2017.

Comments: 12 pages, 14 figures, Siggraph 2017

Journal ref: ACM Trans. Graph. 36, 4, Article 118 (2017)

arXiv:1702.05150 [pdf, other]

doi 10.1145/3131275

BubbleView: an interface for crowdsourcing image importance maps and tracking visual attention

Authors: Nam Wook Kim, Zoya Bylinskii, Michelle A. Borkin, Krzysztof Z. Gajos, Aude Oliva, Fredo Durand, Hanspeter Pfister

Abstract: In this paper, we present BubbleView, an alternative methodology for eye tracking using discrete mouse clicks to measure which information people consciously choose to examine. BubbleView is a mouse-contingent, moving-window interface in which participants are presented with a series of blurred images and click to reveal "bubbles" - small, circular areas of the image at original resolution, simila… ▽ More In this paper, we present BubbleView, an alternative methodology for eye tracking using discrete mouse clicks to measure which information people consciously choose to examine. BubbleView is a mouse-contingent, moving-window interface in which participants are presented with a series of blurred images and click to reveal "bubbles" - small, circular areas of the image at original resolution, similar to having a confined area of focus like the eye fovea. Across 10 experiments with 28 different parameter combinations, we evaluated BubbleView on a variety of image types: information visualizations, natural images, static webpages, and graphic designs, and compared the clicks to eye fixations collected with eye-trackers in controlled lab settings. We found that BubbleView clicks can both (i) successfully approximate eye fixations on different images, and (ii) be used to rank image and design elements by importance. BubbleView is designed to collect clicks on static images, and works best for defined tasks such as describing the content of an information visualization or measuring image importance. BubbleView data is cleaner and more consistent than related methodologies that use continuous mouse movements. Our analyses validate the use of mouse-contingent, moving-window methodologies as approximating eye fixations for different image and task types. △ Less

Submitted 9 August, 2017; v1 submitted 16 February, 2017; originally announced February 2017.

Journal ref: TOCHI 2017

arXiv:1612.04007 [pdf, other]

A Video-Based Method for Objectively Rating Ataxia

Authors: Ronnachai Jaroensri, Amy Zhao, Guha Balakrishnan, Derek Lo, Jeremy Schmahmann, John Guttag, Fredo Durand

Abstract: For many movement disorders, such as Parkinson's disease and ataxia, disease progression is visually assessed by a clinician using a numerical disease rating scale. These tests are subjective, time-consuming, and must be administered by a professional. This can be problematic where specialists are not available, or when a patient is not consistently evaluated by the same clinician. We present an a… ▽ More For many movement disorders, such as Parkinson's disease and ataxia, disease progression is visually assessed by a clinician using a numerical disease rating scale. These tests are subjective, time-consuming, and must be administered by a professional. This can be problematic where specialists are not available, or when a patient is not consistently evaluated by the same clinician. We present an automated method for quantifying the severity of motion impairment in patients with ataxia, using only video recordings. We consider videos of the finger-to-nose test, a common movement task used as part of the assessment of ataxia progression during the course of routine clinical checkups. Our method uses neural network-based pose estimation and optical flow techniques to track the motion of the patient's hand in a video recording. We extract features that describe qualities of the motion such as speed and variation in performance. Using labels provided by an expert clinician, we train a supervised learning model that predicts severity according to the Brief Ataxia Rating Scale (BARS). The performance of our system is comparable to that of a group of ataxia specialists in terms of mean error and correlation, and our system's predictions were consistently within the range of inter-rater variability. This work demonstrates the feasibility of using computer vision and machine learning to produce consistent and clinically useful measures of motor impairment. △ Less

Submitted 7 September, 2017; v1 submitted 12 December, 2016; originally announced December 2016.

Comments: MLHC 2017

arXiv:1610.05577 [pdf, ps, other]

The constant of recognizability is computable for primitive morphisms

Authors: Fabien Durand, Julien Leroy

Abstract: Mossé proved that primitive morphisms are recognizable. In this paper we give a computable upper bound for the constant of recognizability of such a morphism. This bound can be expressed only using the cardinality of the alphabet and the length of the longest image under the morphism of a letter. Mossé proved that primitive morphisms are recognizable. In this paper we give a computable upper bound for the constant of recognizability of such a morphism. This bound can be expressed only using the cardinality of the alphabet and the length of the longest image under the morphism of a letter. △ Less

Submitted 18 October, 2016; originally announced October 2016.

MSC Class: 68R15

arXiv:1610.02769 [pdf, other]

Inverse Diffusion Curves using Shape Optimization

Authors: Shuang Zhao, Fredo Durand, Changxi Zheng

Abstract: The inverse diffusion curve problem focuses on automatic creation of diffusion curve images that resemble user provided color fields. This problem is challenging since the 1D curves have a nonlinear and global impact on resulting color fields via a partial differential equation (PDE). We introduce a new approach complementary to previous methods by optimizing curve geometry. In particular, we prop… ▽ More The inverse diffusion curve problem focuses on automatic creation of diffusion curve images that resemble user provided color fields. This problem is challenging since the 1D curves have a nonlinear and global impact on resulting color fields via a partial differential equation (PDE). We introduce a new approach complementary to previous methods by optimizing curve geometry. In particular, we propose a novel iterative algorithm based on the theory of shape derivatives. The resulting diffusion curves are clean and well-shaped, and the final image closely approximates the input. Our method provides a user-controlled parameter to regularize curve complexity, and generalizes to handle input color fields represented in a variety of formats. △ Less

Submitted 9 October, 2016; originally announced October 2016.

Comments: 14 pages

arXiv:1604.03605 [pdf, other]

What do different evaluation metrics tell us about saliency models?

Authors: Zoya Bylinskii, Tilke Judd, Aude Oliva, Antonio Torralba, Frédo Durand

Abstract: How best to evaluate a saliency model's ability to predict where humans look in images is an open research question. The choice of evaluation metric depends on how saliency is defined and how the ground truth is represented. Metrics differ in how they rank saliency models, and this results from how false positives and false negatives are treated, whether viewing biases are accounted for, whether s… ▽ More How best to evaluate a saliency model's ability to predict where humans look in images is an open research question. The choice of evaluation metric depends on how saliency is defined and how the ground truth is represented. Metrics differ in how they rank saliency models, and this results from how false positives and false negatives are treated, whether viewing biases are accounted for, whether spatial deviations are factored in, and how the saliency maps are pre-processed. In this paper, we provide an analysis of 8 different evaluation metrics and their properties. With the help of systematic experiments and visualizations of metric computations, we add interpretability to saliency scores and more transparency to the evaluation of saliency models. Building off the differences in metric properties and behaviors, we make recommendations for metric selections under specific assumptions and for specific applications. △ Less

Submitted 6 April, 2017; v1 submitted 12 April, 2016; originally announced April 2016.

arXiv:1511.01303 [pdf, other]

doi 10.1007/978-3-319-23114-3_12

Geometry on the Utility Space

Authors: François Durand, Benoît Kloeckner, Fabien Mathieu, Ludovic Noirie

Abstract: We study the geometrical properties of the utility space (the space of expected utilities over a finite set of options), which is commonly used to model the preferences of an agent in a situation of uncertainty. We focus on the case where the model is neutral with respect to the available options, i.e. treats them, a priori, as being symmetrical from one another. Specifically, we prove that the on… ▽ More We study the geometrical properties of the utility space (the space of expected utilities over a finite set of options), which is commonly used to model the preferences of an agent in a situation of uncertainty. We focus on the case where the model is neutral with respect to the available options, i.e. treats them, a priori, as being symmetrical from one another. Specifically, we prove that the only Riemannian metric that respects the geometrical properties and the natural symmetries of the utility space is the round metric. This canonical metric allows to define a uniform probability over the utility space and to naturally generalize the Impartial Culture to a model with expected utilities. △ Less

Submitted 4 November, 2015; originally announced November 2015.

Comments: in Fourth International Conference on Algorithmic Decision Theory, Sep 2015, Lexington, United States. pp.16, 2015, Fourth International Conference on Algorithmic Decision Theory

arXiv:1208.6376 [pdf, ps, other]

Towards a statement of the S-adic conjecture through examples

Authors: Fabien Durand, Julien Leroy, Gwénaël Richomme

Abstract: The $S$-adic conjecture claims that there exists a condition $C$ such that a sequence has a sub-linear complexity if and only if it is an $S$-adic sequence satisfying Condition $C$ for some finite set $S$ of morphisms. We present an overview of the factor complexity of $S$-adic sequences and we give some examples that either illustrate some interesting properties or that are counter-examples to wh… ▽ More The $S$-adic conjecture claims that there exists a condition $C$ such that a sequence has a sub-linear complexity if and only if it is an $S$-adic sequence satisfying Condition $C$ for some finite set $S$ of morphisms. We present an overview of the factor complexity of $S$-adic sequences and we give some examples that either illustrate some interesting properties or that are counter-examples to what could be believed to be "a good Condition $C$". △ Less

Submitted 31 August, 2012; originally announced August 2012.

Comments: 25

arXiv:1204.6455 [pdf, other]

On the Manipulability of Voting Systems: Application to Multi-Carrier Networks

Authors: François Durand, Fabien Mathieu, Ludovic Noirie

Abstract: Today, Internet involves many actors who are making revenues on it (operators, companies, service providers,...). It is therefore important to be able to make fair decisions in this large-scale and highly competitive economical ecosystem. One of the main issues is to prevent actors from manipulating the natural outcome of the decision process. For that purpose, game theory is a natural framework.… ▽ More Today, Internet involves many actors who are making revenues on it (operators, companies, service providers,...). It is therefore important to be able to make fair decisions in this large-scale and highly competitive economical ecosystem. One of the main issues is to prevent actors from manipulating the natural outcome of the decision process. For that purpose, game theory is a natural framework. In that context, voting systems represent an interesting alternative that, to our knowledge, has not yet been considered. They allow competing entities to decide among different options. Strong theoretical results showed that all voting systems are susceptible to be manipulated by one single voter, except for some "degenerated" and non-acceptable cases. However, very little is known about how much a voting system is manipulable in practical scenarios. In this paper, we investigate empirically the use of voting systems for choosing end-to-end paths in multi-carrier networks, analyzing their manipulability and their economical efficiency. We show that one particular system, called \Single Transferable Vote (STV), is largely more resistant to manipulability than the natural system which tries to get the economical optimum. Moreover, STV manages to select paths close to the economical optimum, whether the participants try to cheat or not. △ Less

Submitted 29 April, 2012; originally announced April 2012.

Journal ref: N° 2012-04-001 (2012)

arXiv:1204.5393 [pdf, ps, other]

Decidability of uniform recurrence of morphic sequences

Authors: Fabien Durand

Abstract: We prove that the uniform recurrence of morphic sequences is decidable. For this we show that the number of derived sequences of uniformly recurrent morphic sequences is bounded. As a corollary we obtain that uniformly recurrent morphic sequences are primitive substitutive sequences. We prove that the uniform recurrence of morphic sequences is decidable. For this we show that the number of derived sequences of uniformly recurrent morphic sequences is bounded. As a corollary we obtain that uniformly recurrent morphic sequences are primitive substitutive sequences. △ Less

Submitted 31 August, 2012; v1 submitted 24 April, 2012; originally announced April 2012.

arXiv:1111.3268 [pdf, ps, other]

Decidability of the HD0L ultimate periodicity problem

Authors: Fabien Durand

Abstract: In this paper we prove the decidability of the HD0L ultimate periodicity problem. In this paper we prove the decidability of the HD0L ultimate periodicity problem. △ Less

Submitted 2 January, 2013; v1 submitted 14 November, 2011; originally announced November 2011.

Showing 1–50 of 56 results for author: Durand, F