
Showing 1–50 of 100 results for author: Efros, A

Searching in archive cs.
  1. arXiv:2406.14314 [pdf, other]

    cs.CL cs.AI

    Identifying User Goals from UI Trajectories

    Authors: Omri Berkovitch, Sapir Caduri, Noam Kahlon, Anatoly Efros, Avi Caciularu, Ido Dagan

    Abstract: Autonomous agents that interact with graphical user interfaces (GUIs) hold significant potential for enhancing user experiences. To further improve these experiences, agents need to be personalized and proactive. By effectively comprehending user intentions through their actions and interactions with GUIs, agents will be better positioned to achieve these goals. This paper introduces the task of g…

    Submitted 30 June, 2024; v1 submitted 20 June, 2024; originally announced June 2024.

  2. arXiv:2406.09417 [pdf, other]

    cs.CV cs.GR cs.LG

    Rethinking Score Distillation as a Bridge Between Image Distributions

    Authors: David McAllister, Songwei Ge, Jia-Bin Huang, David W. Jacobs, Alexei A. Efros, Aleksander Holynski, Angjoo Kanazawa

    Abstract: Score distillation sampling (SDS) has proven to be an important tool, enabling the use of large-scale diffusion priors for tasks operating in data-poor domains. Unfortunately, SDS has a number of characteristic artifacts that limit its usefulness in general-purpose applications. In this paper, we make progress toward understanding the behavior of SDS and its variants by viewing them as solving an…

    Submitted 13 June, 2024; originally announced June 2024.

    Comments: Project webpage: https://sds-bridge.github.io/

  3. arXiv:2406.09413 [pdf, other]

    cs.CV cs.GR cs.LG

    Interpreting the Weight Space of Customized Diffusion Models

    Authors: Amil Dravid, Yossi Gandelsman, Kuan-Chieh Wang, Rameen Abdal, Gordon Wetzstein, Alexei A. Efros, Kfir Aberman

    Abstract: We investigate the space of weights spanned by a large collection of customized diffusion models. We populate this space by creating a dataset of over 60,000 models, each of which is a base model fine-tuned to insert a different person's visual identity. We model the underlying manifold of these weights as a subspace, which we term weights2weights. We demonstrate three immediate applications of th…

    Submitted 13 June, 2024; originally announced June 2024.

    Comments: Project Page: https://snap-research.github.io/weights2weights

  4. arXiv:2406.09408 [pdf, other]

    cs.CV cs.LG

    Data Attribution for Text-to-Image Models by Unlearning Synthesized Images

    Authors: Sheng-Yu Wang, Aaron Hertzmann, Alexei A. Efros, Jun-Yan Zhu, Richard Zhang

    Abstract: The goal of data attribution for text-to-image models is to identify the training images that most influence the generation of a new image. We can define "influence" by saying that, for a given output, if a model is retrained from scratch without that output's most influential images, the model should then fail to generate that output image. Unfortunately, directly searching for these influential…

    Submitted 13 June, 2024; originally announced June 2024.

    Comments: Project page: https://peterwang512.github.io/AttributeByUnlearning Code: https://github.com/PeterWang512/AttributeByUnlearning

  5. arXiv:2406.04341 [pdf, other]

    cs.CV

    Interpreting the Second-Order Effects of Neurons in CLIP

    Authors: Yossi Gandelsman, Alexei A. Efros, Jacob Steinhardt

    Abstract: We interpret the function of individual neurons in CLIP by automatically describing them using text. Analyzing the direct effects (i.e. the flow from a neuron through the residual stream to the output) or the indirect effects (overall contribution) fails to capture the neurons' function in CLIP. Therefore, we present the "second-order lens", analyzing the effect flowing from a neuron through the l…

    Submitted 23 June, 2024; v1 submitted 6 June, 2024; originally announced June 2024.

    Comments: project page: https://yossigandelsman.github.io/clip_neurons/index.html

  6. arXiv:2405.10320 [pdf, other]

    cs.CV

    Toon3D: Seeing Cartoons from a New Perspective

    Authors: Ethan Weber, Riley Peterlinz, Rohan Mathur, Frederik Warburg, Alexei A. Efros, Angjoo Kanazawa

    Abstract: In this work, we recover the underlying 3D structure of non-geometrically consistent scenes. We focus our analysis on hand-drawn images from cartoons and anime. Many cartoons are created by artists without a 3D rendering engine, which means that any new image of a scene is hand-drawn. The hand-drawn images are usually faithful representations of the world, but only in a qualitative sense, since it…

    Submitted 17 May, 2024; v1 submitted 16 May, 2024; originally announced May 2024.

    Comments: Please see our project page: https://toon3d.studio

  7. arXiv:2402.16936 [pdf, other]

    cs.CV cs.LG

    Disentangled 3D Scene Generation with Layout Learning

    Authors: Dave Epstein, Ben Poole, Ben Mildenhall, Alexei A. Efros, Aleksander Holynski

    Abstract: We introduce a method to generate 3D scenes that are disentangled into their component objects. This disentanglement is unsupervised, relying only on the knowledge of a large pretrained text-to-image model. Our key insight is that objects can be discovered by finding parts of a 3D scene that, when rearranged spatially, still produce valid configurations of the same scene. Concretely, our method jo…

    Submitted 26 February, 2024; originally announced February 2024.

  8. arXiv:2401.14391 [pdf, other]

    cs.CV

    Rethinking Patch Dependence for Masked Autoencoders

    Authors: Letian Fu, Long Lian, Renhao Wang, Baifeng Shi, Xudong Wang, Adam Yala, Trevor Darrell, Alexei A. Efros, Ken Goldberg

    Abstract: In this work, we re-examine inter-patch dependencies in the decoding mechanism of masked autoencoders (MAE). We decompose this decoding mechanism for masked patch reconstruction in MAE into self-attention and cross-attention. Our investigations suggest that self-attention between mask patches is not essential for learning good representations. To this end, we propose a novel pretraining framework:…

    Submitted 25 January, 2024; originally announced January 2024.
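The abstract contrasts self-attention among mask tokens with cross-attention from mask tokens to visible tokens. As a rough sketch, assuming a single-head attention without learned projections and made-up toy vectors (this is not the paper's decoder), a cross-attention-only step lets each masked-patch query attend to the visible-patch tokens alone, so masked queries never exchange information with each other:

```python
import math

def softmax(scores):
    """Numerically stable softmax over a list of floats."""
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]

def cross_attention(queries, keys, values):
    """Single-head scaled dot-product attention.

    queries: mask-token vectors (one per masked patch)
    keys/values: visible-patch token vectors
    Each query attends only to the visible tokens; there is no
    attention among the queries themselves.
    """
    d = len(keys[0])
    outputs = []
    for q in queries:
        scores = [sum(qi * ki for qi, ki in zip(q, k)) / math.sqrt(d) for k in keys]
        weights = softmax(scores)
        out = [sum(w * v[j] for w, v in zip(weights, values))
               for j in range(len(values[0]))]
        outputs.append(out)
    return outputs

# Hypothetical toy tokens: 3 visible patches, 2 masked-patch queries, dim 2.
visible = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]
mask_queries = [[0.5, 0.5], [2.0, -1.0]]
decoded = cross_attention(mask_queries, visible, visible)
```

Because each query's output depends only on that query and the visible tokens, decoding one masked patch alone gives the same result as decoding it together with the others.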

  9. arXiv:2401.10889 [pdf, other]

    cs.CV cs.AI

    Synthesizing Moving People with 3D Control

    Authors: Boyi Li, Jathushan Rajasegaran, Yossi Gandelsman, Alexei A. Efros, Jitendra Malik

    Abstract: In this paper, we present a diffusion model-based framework for animating people from a single image for a given target 3D motion sequence. Our approach has two core components: a) learning priors about invisible parts of the human body and clothing, and b) rendering novel body poses with proper clothing and texture. For the first part, we learn an in-filling diffusion model to hallucinate unseen…

    Submitted 19 January, 2024; originally announced January 2024.

  10. arXiv:2312.07504 [pdf, other]

    cs.CV

    COLMAP-Free 3D Gaussian Splatting

    Authors: Yang Fu, Sifei Liu, Amey Kulkarni, Jan Kautz, Alexei A. Efros, Xiaolong Wang

    Abstract: While neural rendering has led to impressive advances in scene reconstruction and novel view synthesis, it relies heavily on accurately pre-computed camera poses. To relax this constraint, multiple efforts have been made to train Neural Radiance Fields (NeRFs) without pre-processed camera poses. However, the implicit representations of NeRFs provide extra challenges to optimize the 3D structure an…

    Submitted 12 December, 2023; originally announced December 2023.

    Comments: Project Page: https://oasisyang.github.io/colmap-free-3dgs

  11. arXiv:2312.00785 [pdf, other]

    cs.CV

    Sequential Modeling Enables Scalable Learning for Large Vision Models

    Authors: Yutong Bai, Xinyang Geng, Karttikeya Mangalam, Amir Bar, Alan Yuille, Trevor Darrell, Jitendra Malik, Alexei A Efros

    Abstract: We introduce a novel sequential modeling approach which enables learning a Large Vision Model (LVM) without making use of any linguistic data. To do this, we define a common format, "visual sentences", in which we can represent raw images and videos as well as annotated data sources such as semantic segmentations and depth reconstructions without needing any meta-knowledge beyond the pixels. Once…

    Submitted 1 December, 2023; originally announced December 2023.

    Comments: Website: https://yutongbai.com/lvm.html

  12. arXiv:2311.01462 [pdf, other]

    cs.CV cs.LG

    Idempotent Generative Network

    Authors: Assaf Shocher, Amil Dravid, Yossi Gandelsman, Inbar Mosseri, Michael Rubinstein, Alexei A. Efros

    Abstract: We propose a new approach for generative modeling based on training a neural network to be idempotent. An idempotent operator is one that can be applied sequentially without changing the result beyond the initial application, namely $f(f(z))=f(z)$. The proposed model $f$ is trained to map a source distribution (e.g., Gaussian noise) to a target distribution (e.g. realistic images) using the followi…

    Submitted 2 November, 2023; originally announced November 2023.
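The idempotence condition $f(f(z))=f(z)$ can be checked numerically. The toy below only illustrates the property, not the paper's training objective: `project` is a hypothetical stand-in for a trained model that maps any input onto a fixed "target" set (here, the nearest of a few made-up points), and `idem_gap` measures how far a function is from idempotent at a given input. A projection onto a set satisfies the condition exactly, while a generic map such as doubling does not.

```python
def project(z, targets=(-1.0, 0.5, 2.0)):
    """Hypothetical 'model': snap z to the nearest point of a fixed target set."""
    return min(targets, key=lambda t: abs(t - z))

def double(z):
    """A map that is not idempotent: double(double(z)) != double(z) for z != 0."""
    return 2.0 * z

def idem_gap(f, z):
    """Distance between f(f(z)) and f(z); zero wherever f is idempotent."""
    return abs(f(f(z)) - f(z))

# Stand-ins for draws from a source distribution.
samples = [-3.0, -0.2, 0.7, 1.4, 5.0]
gaps_project = [idem_gap(project, z) for z in samples]
gaps_double = [idem_gap(double, z) for z in samples]
```

Points already in the target set are left unchanged by `project`, which is the intuition behind an idempotent generator: outputs that already look like target samples should be fixed points.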

  13. arXiv:2310.05916 [pdf, other]

    cs.CV cs.AI

    Interpreting CLIP's Image Representation via Text-Based Decomposition

    Authors: Yossi Gandelsman, Alexei A. Efros, Jacob Steinhardt

    Abstract: We investigate the CLIP image encoder by analyzing how individual model components affect the final representation. We decompose the image representation as a sum across individual image patches, model layers, and attention heads, and use CLIP's text representation to interpret the summands. Interpreting the attention heads, we characterize each head's role by automatically finding text representa…

    Submitted 28 March, 2024; v1 submitted 9 October, 2023; originally announced October 2023.

    Comments: Project page and code: https://yossigandelsman.github.io/clip_decomposition/
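The layer-wise part of this decomposition follows from the residual structure of a transformer: if each layer adds its output to a running residual stream, the final representation is exactly the input plus the sum of per-layer contributions. The toy sketch below assumes purely additive layers and omits the layer normalization and output projection of a real CLIP encoder, which an actual decomposition has to handle; the layer functions are hypothetical.

```python
def run_residual(x, layers):
    """Apply residual layers h <- h + layer(h), recording each layer's contribution."""
    h = list(x)
    contributions = []
    for layer in layers:
        delta = layer(h)
        contributions.append(delta)
        h = [hi + di for hi, di in zip(h, delta)]
    return h, contributions

# Hypothetical toy 'layers' acting on 2-D vectors.
layers = [
    lambda h: [0.5 * h[1], -0.25 * h[0]],
    lambda h: [h[0] * h[1], 1.0],
    lambda h: [-h[0], 0.1 * h[1]],
]
x = [1.0, 2.0]
out, parts = run_residual(x, layers)

# The output decomposes as input + sum of per-layer contributions,
# even though the layers themselves are nonlinear.
recon = [x[i] + sum(p[i] for p in parts) for i in range(len(x))]
```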

  14. arXiv:2307.05473 [pdf, other]

    cs.CV

    Differentiable Blocks World: Qualitative 3D Decomposition by Rendering Primitives

    Authors: Tom Monnier, Jake Austin, Angjoo Kanazawa, Alexei A. Efros, Mathieu Aubry

    Abstract: Given a set of calibrated images of a scene, we present an approach that produces a simple, compact, and actionable 3D world representation by means of 3D primitives. While many approaches focus on recovering high-fidelity 3D scenes, we focus on parsing a scene into mid-level 3D representations made of a small set of textured primitives. Such representations are interpretable, easy to manipulate a…

    Submitted 26 December, 2023; v1 submitted 11 July, 2023; originally announced July 2023.

    Comments: Project webpage with code and videos: https://www.tmonnier.com/DBW. V2 update includes comparisons based on NeuS, hyperparameter analysis and failure cases

  15. arXiv:2307.05014 [pdf, other]

    cs.CV cs.LG

    Test-Time Training on Video Streams

    Authors: Renhao Wang, Yu Sun, Yossi Gandelsman, Xinlei Chen, Alexei A. Efros, Xiaolong Wang

    Abstract: Prior work has established test-time training (TTT) as a general framework to further improve a trained model at test time. Before making a prediction on each test instance, the model is trained on the same instance using a self-supervised task, such as image reconstruction with masked autoencoders. We extend TTT to the streaming setting, where multiple test instances - video frames in our case -…

    Submitted 12 July, 2023; v1 submitted 11 July, 2023; originally announced July 2023.

    Comments: Project website with videos, dataset and code: https://video-ttt.github.io/
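The control flow of test-time training, and the difference between resetting the weights per instance and carrying them over in a stream, can be sketched with a deliberately tiny model. Everything here is a hypothetical stand-in: the "model" is a single scalar `w`, the self-supervised loss on an instance `x` is `(w - x)**2` in place of masked-image reconstruction, and the prediction is `w` itself.

```python
def ttt_predictions(frames, w0, lr=0.1, steps=5, online=True):
    """Toy test-time training loop over a stream of test instances.

    With online=False, weights are reset to w0 before every instance
    (per-instance TTT); with online=True, the updated weights carry
    over from one frame to the next.
    """
    w = w0
    preds = []
    for x in frames:
        if not online:
            w = w0  # discard adaptation from previous instances
        for _ in range(steps):
            grad = 2.0 * (w - x)  # derivative of (w - x)**2 with respect to w
            w -= lr * grad
        preds.append(w)  # predict only after adapting on this instance
    return preds

frames = [1.0] * 10  # a stationary toy 'video'
online = ttt_predictions(frames, w0=0.0, online=True)
reset = ttt_predictions(frames, w0=0.0, online=False)
```

On this stationary stream the online variant keeps improving, while the reset variant repeats the same partial adaptation for every frame; whether carrying weights over helps on real video is the empirical question the paper studies.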

  16. arXiv:2306.09346 [pdf, other]

    cs.CV

    Rosetta Neurons: Mining the Common Units in a Model Zoo

    Authors: Amil Dravid, Yossi Gandelsman, Alexei A. Efros, Assaf Shocher

    Abstract: Do different neural networks, trained for various vision tasks, share some common representations? In this paper, we demonstrate the existence of common features we call "Rosetta Neurons" across a range of models with different architectures, different tasks (generative and discriminative), and different types of supervision (class-supervised, text-supervised, self-supervised). We present an algor…

    Submitted 16 June, 2023; v1 submitted 15 June, 2023; originally announced June 2023.

    Comments: Project page: https://yossigandelsman.github.io/rosetta_neurons/

  17. arXiv:2306.09345 [pdf, other]

    cs.CV cs.LG

    Evaluating Data Attribution for Text-to-Image Models

    Authors: Sheng-Yu Wang, Alexei A. Efros, Jun-Yan Zhu, Richard Zhang

    Abstract: While large text-to-image models are able to synthesize "novel" images, these images are necessarily a reflection of the training data. The problem of data attribution in such models -- which of the images in the training set are most responsible for the appearance of a given generated image -- is a difficult yet important one. As an initial step toward this problem, we evaluate attribution throug…

    Submitted 8 August, 2023; v1 submitted 15 June, 2023; originally announced June 2023.

    Comments: Updated v2 -- ICCV 2023 camera ready version. Project page: https://peterwang512.github.io/GenDataAttribution Code: https://github.com/PeterWang512/GenDataAttribution

  18. arXiv:2306.00986 [pdf, other]

    cs.CV cs.LG stat.ML

    Diffusion Self-Guidance for Controllable Image Generation

    Authors: Dave Epstein, Allan Jabri, Ben Poole, Alexei A. Efros, Aleksander Holynski

    Abstract: Large-scale generative models are capable of producing high-quality images from detailed text descriptions. However, many aspects of an image are difficult or impossible to convey through text. We introduce self-guidance, a method that provides greater control over generated images by guiding the internal representations of diffusion models. We demonstrate that properties such as the shape, locati…

    Submitted 11 June, 2023; v1 submitted 1 June, 2023; originally announced June 2023.

    Comments: Project page at https://dave.ml/selfguidance/

  19. arXiv:2305.01649 [pdf, other]

    cs.CV cs.AI cs.LG

    Generalizing Dataset Distillation via Deep Generative Prior

    Authors: George Cazenavette, Tongzhou Wang, Antonio Torralba, Alexei A. Efros, Jun-Yan Zhu

    Abstract: Dataset Distillation aims to distill an entire dataset's knowledge into a few synthetic images. The idea is to synthesize a small number of synthetic data points that, when given to a learning algorithm as training data, result in a model approximating one trained on the original data. Despite recent progress in the field, existing dataset distillation methods fail to generalize to new architectur…

    Submitted 3 May, 2023; v1 submitted 2 May, 2023; originally announced May 2023.

    Comments: CVPR 2023; Project Page at https://georgecazenavette.github.io/glad Code at https://github.com/GeorgeCazenavette/glad

  20. arXiv:2304.14406 [pdf, other]

    cs.CV cs.AI cs.GR cs.LG

    Putting People in Their Place: Affordance-Aware Human Insertion into Scenes

    Authors: Sumith Kulal, Tim Brooks, Alex Aiken, Jiajun Wu, Jimei Yang, Jingwan Lu, Alexei A. Efros, Krishna Kumar Singh

    Abstract: We study the problem of inferring scene affordances by presenting a method for realistically inserting people into scenes. Given a scene image with a marked region and an image of a person, we insert the person into the scene while respecting the scene affordances. Our model can infer the set of realistic poses given the scene context, re-pose the reference person, and harmonize the composition. W…

    Submitted 27 April, 2023; originally announced April 2023.

    Comments: CVPR 2023. Project page with code: https://sumith1896.github.io/affordance-insertion/

  21. arXiv:2303.12789 [pdf, other]

    cs.CV cs.GR

    Instruct-NeRF2NeRF: Editing 3D Scenes with Instructions

    Authors: Ayaan Haque, Matthew Tancik, Alexei A. Efros, Aleksander Holynski, Angjoo Kanazawa

    Abstract: We propose a method for editing NeRF scenes with text-instructions. Given a NeRF of a scene and the collection of images used to reconstruct it, our method uses an image-conditioned diffusion model (InstructPix2Pix) to iteratively edit the input images while optimizing the underlying scene, resulting in an optimized 3D scene that respects the edit instruction. We demonstrate that our proposed meth…

    Submitted 1 June, 2023; v1 submitted 22 March, 2023; originally announced March 2023.

    Comments: Project website: https://instruct-nerf2nerf.github.io; v1. Revisions to related work and discussion

  22. arXiv:2302.14051 [pdf, other]

    cs.LG cs.AI cs.CV cs.NE cs.RO

    Internet Explorer: Targeted Representation Learning on the Open Web

    Authors: Alexander C. Li, Ellis Brown, Alexei A. Efros, Deepak Pathak

    Abstract: Modern vision models typically rely on fine-tuning general-purpose models pre-trained on large, static datasets. These general-purpose models only capture the knowledge within their pre-training datasets, which are tiny, out-of-date snapshots of the Internet -- where billions of images are uploaded each day. We suggest an alternate approach: rather than hoping our static datasets transfer to our d…

    Submitted 6 September, 2023; v1 submitted 27 February, 2023; originally announced February 2023.

    Comments: In ICML 2023. Website at https://internet-explorer-ssl.github.io/

  23. arXiv:2211.09800 [pdf, other]

    cs.CV cs.AI cs.CL cs.GR cs.LG

    InstructPix2Pix: Learning to Follow Image Editing Instructions

    Authors: Tim Brooks, Aleksander Holynski, Alexei A. Efros

    Abstract: We propose a method for editing images from human instructions: given an input image and a written instruction that tells the model what to do, our model follows these instructions to edit the image. To obtain training data for this problem, we combine the knowledge of two large pretrained models -- a language model (GPT-3) and a text-to-image model (Stable Diffusion) -- to generate a large datase…

    Submitted 18 January, 2023; v1 submitted 17 November, 2022; originally announced November 2022.

    Comments: Project page with code: https://www.timothybrooks.com/instruct-pix2pix

  24. arXiv:2209.15007 [pdf, other]

    cs.LG cs.AI cs.CV cs.NE cs.RO

    Understanding Collapse in Non-Contrastive Siamese Representation Learning

    Authors: Alexander C. Li, Alexei A. Efros, Deepak Pathak

    Abstract: Contrastive methods have led a recent surge in the performance of self-supervised representation learning (SSL). Recent methods like BYOL or SimSiam purportedly distill these contrastive methods down to their essence, removing bells and whistles, including the negative examples, that do not contribute to downstream performance. These "non-contrastive" methods work surprisingly well without using n…

    Submitted 2 November, 2022; v1 submitted 29 September, 2022; originally announced September 2022.

    Comments: Published at ECCV 2022. Project page at https://alexanderli.com/noncontrastive-ssl/

  25. arXiv:2209.12892 [pdf, other]

    cs.LG cs.CV

    Learning to Learn with Generative Models of Neural Network Checkpoints

    Authors: William Peebles, Ilija Radosavovic, Tim Brooks, Alexei A. Efros, Jitendra Malik

    Abstract: We explore a data-driven approach for learning to optimize neural networks. We construct a dataset of neural network checkpoints and train a generative model on the parameters. In particular, our model is a conditional diffusion transformer that, given an initial input parameter vector and a prompted loss, error, or return, predicts the distribution over parameter updates that achieve the desired…

    Submitted 26 September, 2022; originally announced September 2022.

    Comments: Code available at https://www.github.com/wpeebles/G.pt . Project page and videos available at https://www.wpeebles.com/Gpt

  26. arXiv:2209.07522 [pdf, other]

    cs.CV cs.LG

    Test-Time Training with Masked Autoencoders

    Authors: Yossi Gandelsman, Yu Sun, Xinlei Chen, Alexei A. Efros

    Abstract: Test-time training adapts to a new test distribution on the fly by optimizing a model for each test input using self-supervision. In this paper, we use masked autoencoders for this one-sample learning problem. Empirically, our simple method improves generalization on many visual benchmarks for distribution shifts. Theoretically, we characterize this improvement in terms of the bias-variance trade-…

    Submitted 15 September, 2022; originally announced September 2022.

    Comments: Project page: https://yossigandelsman.github.io/ttt_mae/index.html

  27. arXiv:2209.02836 [pdf, other]

    cs.CV cs.LG

    Studying Bias in GANs through the Lens of Race

    Authors: Vongani H. Maluleke, Neerja Thakkar, Tim Brooks, Ethan Weber, Trevor Darrell, Alexei A. Efros, Angjoo Kanazawa, Devin Guillory

    Abstract: In this work, we study how the performance and evaluation of generative image models are impacted by the racial composition of their training datasets. By examining and controlling the racial distributions in various training datasets, we are able to observe the impacts of different training distributions on generated image quality and the racial distributions of the generated images. Our results…

    Submitted 14 September, 2022; v1 submitted 6 September, 2022; originally announced September 2022.

    Comments: ECCV 2022. Project Page: https://neerja.me/bias-gans/

    ACM Class: I.4

  28. arXiv:2209.00647 [pdf, other]

    cs.CV

    Visual Prompting via Image Inpainting

    Authors: Amir Bar, Yossi Gandelsman, Trevor Darrell, Amir Globerson, Alexei A. Efros

    Abstract: How does one adapt a pre-trained visual model to novel downstream tasks without task-specific finetuning or any model modification? Inspired by prompting in NLP, this paper investigates visual prompting: given input-output image example(s) of a new task at test time and a new input image, the goal is to automatically produce the output image, consistent with the given examples. We show that posing…

    Submitted 1 September, 2022; originally announced September 2022.

    Comments: Project page: https://yossigandelsman.github.io/visual_prompt
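One way to picture visual prompting via inpainting: the example pair and the query are tiled into a single canvas with the answer cell left blank, and an inpainting model fills the blank. The sketch below only builds such a canvas and its hole mask with plain nested lists; the 2×2 layout, the grayscale toy images, and the `build_prompt_canvas` helper are assumptions for illustration, and no inpainting model is included.

```python
def build_prompt_canvas(example_in, example_out, query, fill=0.0):
    """Tile three equally sized H x W grayscale images into a 2H x 2W canvas.

    Assumed layout:  [example_in | example_out]
                     [query      | <hole>     ]
    Returns the canvas and a boolean mask marking the hole to inpaint.
    """
    h, w = len(query), len(query[0])
    for img in (example_in, example_out):
        if len(img) != h or len(img[0]) != w:
            raise ValueError("all images must share the same size")
    canvas = [[fill] * (2 * w) for _ in range(2 * h)]
    mask = [[False] * (2 * w) for _ in range(2 * h)]
    for r in range(h):
        for c in range(w):
            canvas[r][c] = example_in[r][c]
            canvas[r][c + w] = example_out[r][c]
            canvas[r + h][c] = query[r][c]
            mask[r + h][c + w] = True  # bottom-right cell is the hole
    return canvas, mask

# Toy 2x2 'images': the example task inverts intensities.
ex_in = [[0.0, 1.0], [1.0, 0.0]]
ex_out = [[1.0, 0.0], [0.0, 1.0]]
query = [[1.0, 1.0], [0.0, 0.0]]
canvas, mask = build_prompt_canvas(ex_in, ex_out, query)
```

An inpainting model given `canvas` and `mask` would ideally fill the hole with the query's inverted intensities, inferring the task from the example pair alone.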

  29. arXiv:2206.03429 [pdf, other]

    cs.CV cs.AI cs.LG cs.NE

    Generating Long Videos of Dynamic Scenes

    Authors: Tim Brooks, Janne Hellsten, Miika Aittala, Ting-Chun Wang, Timo Aila, Jaakko Lehtinen, Ming-Yu Liu, Alexei A. Efros, Tero Karras

    Abstract: We present a video generation model that accurately reproduces object motion, changes in camera viewpoint, and new content that arises over time. Existing video generation methods often fail to produce new content as a function of time while maintaining consistencies expected in real environments, such as plausible dynamics and object persistence. A common failure case is for content to never chan…

    Submitted 9 June, 2022; v1 submitted 7 June, 2022; originally announced June 2022.

  30. arXiv:2205.02837 [pdf, other]

    cs.CV

    BlobGAN: Spatially Disentangled Scene Representations

    Authors: Dave Epstein, Taesung Park, Richard Zhang, Eli Shechtman, Alexei A. Efros

    Abstract: We propose an unsupervised, mid-level representation for a generative model of scenes. The representation is mid-level in that it is neither per-pixel nor per-image; rather, scenes are modeled as a collection of spatial, depth-ordered "blobs" of features. Blobs are differentiably placed onto a feature grid that is decoded into an image by a generative adversarial network. Due to the spatial unifor…

    Submitted 29 July, 2022; v1 submitted 5 May, 2022; originally announced May 2022.

    Comments: ECCV 2022. Project webpage available at https://www.dave.ml/blobgan

  31. arXiv:2204.10310 [pdf, other]

    cs.CV cs.GR

    Share With Thy Neighbors: Single-View Reconstruction by Cross-Instance Consistency

    Authors: Tom Monnier, Matthew Fisher, Alexei A. Efros, Mathieu Aubry

    Abstract: Approaches for single-view reconstruction typically rely on viewpoint annotations, silhouettes, the absence of background, multiple views of the same instance, a template shape, or symmetry. We avoid all such supervision and assumptions by explicitly leveraging the consistency between images of different object instances. As a result, our method can learn from large collections of unlabelled image…

    Submitted 25 July, 2022; v1 submitted 21 April, 2022; originally announced April 2022.

    Comments: ECCV 2022. Project webpage with code and videos: http://imagine.enpc.fr/~monniert/UNICORN/

  32. arXiv:2203.11932 [pdf, other]

    cs.CV cs.AI cs.LG

    Dataset Distillation by Matching Training Trajectories

    Authors: George Cazenavette, Tongzhou Wang, Antonio Torralba, Alexei A. Efros, Jun-Yan Zhu

    Abstract: Dataset distillation is the task of synthesizing a small dataset such that a model trained on the synthetic set will match the test accuracy of the model trained on the full dataset. In this paper, we propose a new formulation that optimizes our distilled data to guide networks to a similar state as those trained on real data across many training steps. Given a network, we train it for several ite…

    Submitted 22 March, 2022; originally announced March 2022.

    Comments: CVPR 2022 website: https://georgecazenavette.github.io/mtt-distillation/ code: https://github.com/GeorgeCazenavette/mtt-distillation
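The abstract is cut off before the objective, but guiding networks "to a similar state as those trained on real data" can be expressed as a normalized parameter-distance loss: start a student from an expert checkpoint at step t, train it for N steps on the synthetic data, and compare where it lands with the expert's checkpoint at step t + M, normalized by how far the expert itself moved. That normalized form is assumed here; the flat parameter lists, the 1-D linear model, and the helper names are hypothetical, not the authors' code.

```python
def sq_dist(a, b):
    """Squared Euclidean distance between two flat parameter lists."""
    return sum((x - y) ** 2 for x, y in zip(a, b))

def trajectory_matching_loss(student_end, expert_start, expert_target):
    """Distance from the student's final parameters to the expert's later
    checkpoint, divided by the distance the expert itself covered."""
    denom = sq_dist(expert_start, expert_target)
    if denom == 0.0:
        raise ValueError("expert checkpoints must differ")
    return sq_dist(student_end, expert_target) / denom

def train_student(params, synthetic_batch, lr, steps):
    """Toy inner loop: gradient descent on a 1-D linear model y = w * x
    with mean squared error over the synthetic (x, y) pairs."""
    (w,) = params
    for _ in range(steps):
        grad = sum(2.0 * (w * x - y) * x for x, y in synthetic_batch) / len(synthetic_batch)
        w -= lr * grad
    return [w]

expert_start, expert_target = [0.0], [2.0]  # hypothetical expert checkpoints
synthetic = [(1.0, 2.0), (2.0, 4.0)]        # toy distilled data (y = 2x)
student_end = train_student(expert_start, synthetic, lr=0.1, steps=20)
loss = trajectory_matching_loss(student_end, expert_start, expert_target)
```

A student that does not move scores exactly 1, and one that reaches the expert's later checkpoint scores 0. In a full method, the gradient of this loss would be propagated back through the student's unrolled updates to the synthetic data; the toy only evaluates it.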

  33. arXiv:2201.08379 [pdf, other]

    cs.CV

    Learning Pixel Trajectories with Multiscale Contrastive Random Walks

    Authors: Zhangxing Bian, Allan Jabri, Alexei A. Efros, Andrew Owens

    Abstract: A range of video modeling tasks, from optical flow to multiple object tracking, share the same fundamental challenge: establishing space-time correspondence. Yet, approaches that dominate each space differ. We take a step towards bridging this gap by extending the recent contrastive random walk formulation to much denser, pixel-level space-time graphs. The main contribution is introducing hierarch…

    Submitted 4 April, 2022; v1 submitted 20 January, 2022; originally announced January 2022.
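The contrastive random walk that this paper extends treats nodes in consecutive frames as a graph, turns feature affinities into row-stochastic transition matrices with a softmax, and rewards walks that return to their starting node after going forward and then backward in time. The following is a minimal single-scale sketch with made-up unit-norm features and temperature; it does not include the hierarchical, pixel-level extension the abstract refers to.

```python
import math

def transition(feats_a, feats_b, tau):
    """Row-stochastic matrix: softmax over dot-product affinities / tau."""
    rows = []
    for fa in feats_a:
        scores = [sum(x * y for x, y in zip(fa, fb)) / tau for fb in feats_b]
        m = max(scores)
        exps = [math.exp(s - m) for s in scores]
        total = sum(exps)
        rows.append([e / total for e in exps])
    return rows

def matmul(a, b):
    """Multiply two matrices given as lists of rows."""
    return [[sum(a[i][k] * b[k][j] for k in range(len(b))) for j in range(len(b[0]))]
            for i in range(len(a))]

def cycle_loss(frames, tau=0.1):
    """Walk forward through the frames and back again (a palindrome); the loss
    is the mean negative log-probability of returning to the start node."""
    path = frames + frames[-2::-1]
    walk = None
    for fa, fb in zip(path, path[1:]):
        step = transition(fa, fb, tau)
        walk = step if walk is None else matmul(walk, step)
    n = len(walk)
    return -sum(math.log(walk[i][i]) for i in range(n)) / n

# Hypothetical node features: 3 frames, 2 nodes per frame.
frame = [[1.0, 0.0], [0.0, 1.0]]   # distinctive nodes
ambiguous = [[1.0, 0.0], [1.0, 0.0]]  # indistinguishable nodes
static_video = [frame, frame, frame]
ambiguous_video = [ambiguous, ambiguous, ambiguous]
```

With distinctive features the walk almost surely returns home and the loss is near zero; with indistinguishable nodes every step is a coin flip and the loss equals log 2.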

  34. arXiv:2112.06909 [pdf, other]

    cs.CV

    Hallucinating Pose-Compatible Scenes

    Authors: Tim Brooks, Alexei A. Efros

    Abstract: What does human pose tell us about a scene? We propose a task to answer this question: given human pose as input, hallucinate a compatible scene. Subtle cues captured by human pose -- action semantics, environment affordances, object interactions -- provide surprising insight into which scenes are compatible. We present a large-scale generative adversarial network for pose-conditioned scene genera…

    Submitted 30 September, 2022; v1 submitted 13 December, 2021; originally announced December 2021.

  35. arXiv:2112.05143 [pdf, other]

    cs.CV

    GAN-Supervised Dense Visual Alignment

    Authors: William Peebles, Jun-Yan Zhu, Richard Zhang, Antonio Torralba, Alexei A. Efros, Eli Shechtman

    Abstract: We propose GAN-Supervised Learning, a framework for learning discriminative models and their GAN-generated training data jointly end-to-end. We apply our framework to the dense visual alignment problem. Inspired by the classic Congealing method, our GANgealing algorithm trains a Spatial Transformer to map random samples from a GAN trained on unaligned data to a common, jointly-learned target mode.…

    Submitted 4 April, 2022; v1 submitted 9 December, 2021; originally announced December 2021.

    Comments: An updated version of our CVPR 2022 paper (oral); v2 features additional references and minor text changes. Code available at https://www.github.com/wpeebles/gangealing . Project page and videos available at https://www.wpeebles.com/gangealing

  36. arXiv:2110.15904 [pdf, other]

    cs.CV

    Learning Co-segmentation by Segment Swapping for Retrieval and Discovery

    Authors: Xi Shen, Alexei A. Efros, Armand Joulin, Mathieu Aubry

    Abstract: The goal of this work is to efficiently identify visually similar patterns in images, e.g. identifying an artwork detail copied between an engraving and an oil painting, or recognizing parts of a night-time photograph visible in its daytime counterpart. Lack of training data is a key challenge for this co-segmentation task. We present a simple yet surprisingly effective approach to overcome this d…

    Submitted 27 March, 2022; v1 submitted 29 October, 2021; originally announced October 2021.

    Comments: add results of unsupervised saliency detection

  37. arXiv:2110.02951 [pdf, other]

    cs.CV cs.LG

    Video Autoencoder: self-supervised disentanglement of static 3D structure and motion

    Authors: Zihang Lai, Sifei Liu, Alexei A. Efros, Xiaolong Wang

    Abstract: A video autoencoder is proposed for learning disentangled representations of 3D structure and camera pose from videos in a self-supervised manner. Relying on temporal continuity in videos, our work assumes that the 3D scene structure in nearby video frames remains static. Given a sequence of video frames as input, the video autoencoder extracts a disentangled representation of the scene includ…

    Submitted 6 October, 2021; originally announced October 2021.

    Comments: Accepted to ICCV 2021. Project page: https://zlai0.github.io/VideoAutoencoder

  38. arXiv:2104.14553 [pdf, other]

    cs.CV

    MarioNette: Self-Supervised Sprite Learning

    Authors: Dmitriy Smirnov, Michael Gharbi, Matthew Fisher, Vitor Guizilini, Alexei A. Efros, Justin Solomon

    Abstract: Artists and video game designers often construct 2D animations using libraries of sprites -- textured patches of objects and characters. We propose a deep learning approach that decomposes sprite-based video animations into a disentangled representation of recurring graphic elements in a self-supervised manner. By jointly learning a dictionary of possibly transparent patches and training a network…

    Submitted 20 October, 2021; v1 submitted 29 April, 2021; originally announced April 2021.

    Comments: Accepted to NeurIPS 2021

  39. arXiv:2104.06820 [pdf, other]

    cs.CV cs.GR cs.LG

    Few-shot Image Generation via Cross-domain Correspondence

    Authors: Utkarsh Ojha, Yijun Li, Jingwan Lu, Alexei A. Efros, Yong Jae Lee, Eli Shechtman, Richard Zhang

    Abstract: Training generative models, such as GANs, on a target domain containing limited examples (e.g., 10) can easily result in overfitting. In this work, we seek to utilize a large source domain for pretraining and transfer the diversity information from source to target. We propose to preserve the relative similarities and differences between instances in the source via a novel cross-domain distance co…

    Submitted 13 April, 2021; originally announced April 2021.

    Comments: CVPR 2021
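The abstract is cut off at the cross-domain distance term, but the idea of preserving "relative similarities and differences between instances" can be sketched under stated assumptions: cosine similarities between instances, a softmax over each instance's similarities to the others, and a KL divergence between the target-domain and source-domain distributions (the direction of the KL is also assumed). The feature vectors are made up. A target generator that keeps the source's similarity structure incurs no penalty, while one that collapses diversity does.

```python
import math

def cosine(a, b):
    """Cosine similarity of two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    return dot / (math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b)))

def similarity_distribution(feats, i):
    """Softmax over cosine similarities of instance i to every other instance."""
    sims = [cosine(feats[i], feats[j]) for j in range(len(feats)) if j != i]
    m = max(sims)
    exps = [math.exp(s - m) for s in sims]
    total = sum(exps)
    return [e / total for e in exps]

def kl(p, q):
    """KL divergence KL(p || q) for strictly positive distributions."""
    return sum(pi * math.log(pi / qi) for pi, qi in zip(p, q))

def distance_consistency(source_feats, target_feats):
    """Sum over instances of KL(target distribution || source distribution)."""
    return sum(kl(similarity_distribution(target_feats, i),
                  similarity_distribution(source_feats, i))
               for i in range(len(source_feats)))

# Hypothetical features of the same 3 latent codes in each domain.
src = [[1.0, 0.0], [0.8, 0.6], [0.0, 1.0]]
tgt_same = [[2.0, 0.0], [1.6, 1.2], [0.0, 2.0]]       # rescaled: same cosines
tgt_collapsed = [[1.0, 0.0], [1.0, 0.0], [1.0, 0.1]]  # diversity lost
```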

  40. arXiv:2104.02687 [pdf, other]

    cs.CV cs.AI cs.MM

    Strumming to the Beat: Audio-Conditioned Contrastive Video Textures

    Authors: Medhini Narasimhan, Shiry Ginosar, Andrew Owens, Alexei A. Efros, Trevor Darrell

    Abstract: We introduce a non-parametric approach for infinite video texture synthesis using a representation learned via contrastive learning. We take inspiration from Video Textures, which showed that plausible new videos could be generated from a single one by stitching its frames together in a novel yet consistent order. This classic work, however, was constrained by its use of hand-designed distance met…

    Submitted 6 April, 2021; originally announced April 2021.

    Comments: Project website at https://medhini.github.io/audio_video_textures/

  41. arXiv:2012.09811 [pdf, other]

    cs.RO cs.CV cs.LG

    Learning Cross-Domain Correspondence for Control with Dynamics Cycle-Consistency

    Authors: Qiang Zhang, Tete Xiao, Alexei A. Efros, Lerrel Pinto, Xiaolong Wang

    Abstract: At the heart of many robotics problems is the challenge of learning correspondences across domains. For instance, imitation learning requires obtaining correspondence between humans and robots; sim-to-real requires correspondence between physics simulators and the real world; transfer learning requires correspondences between different robotics environments. This paper aims to learn correspondence…

    Submitted 17 December, 2020; originally announced December 2020.

    Comments: Project page: https://sjtuzq.github.io/cycle_dynamics.html

  42. arXiv:2008.10599  [pdf, other]

    cs.CV cs.GR cs.LG cs.NE

    The Hessian Penalty: A Weak Prior for Unsupervised Disentanglement

    Authors: William Peebles, John Peebles, Jun-Yan Zhu, Alexei Efros, Antonio Torralba

    Abstract: Existing disentanglement methods for deep generative models rely on hand-picked priors and complex encoder-based architectures. In this paper, we propose the Hessian Penalty, a simple regularization term that encourages the Hessian of a generative model with respect to its input to be diagonal. We introduce a model-agnostic, unbiased stochastic approximation of this term based on Hutchinson's esti…

    Submitted 24 August, 2020; originally announced August 2020.

    Comments: ECCV 2020 (Spotlight). Code available at https://github.com/wpeebles/hessian_penalty . Project page and videos available at https://www.wpeebles.com/hessian-penalty
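The stochastic approximation mentioned in the abstract rests on a standard identity: for a Rademacher vector v (entries +1 or -1 with equal probability), Var[v^T H v] = 2 * sum_{i != j} H_ij^2, which is zero exactly when H is diagonal. The sketch below estimates that variance for a scalar-valued function G, approximating v^T H v with a central second difference of step `eps`. The scalar output, the finite-difference step, the sample count, and the helper `quadratic` are illustrative assumptions; the linked repository is the reference for how the authors apply the penalty to full generator outputs.

```python
import random
import statistics

def hessian_penalty(G, z, num_samples=2, eps=0.1, seed=None):
    """Estimate Var_v[v^T H v] for a scalar function G at point z (list of floats).

    v is a Rademacher vector; v^T H v is approximated by the central second
    difference (G(z + eps*v) - 2*G(z) + G(z - eps*v)) / eps**2. The quantity
    being estimated equals 2 * sum_{i != j} H_ij**2, so it vanishes for a
    diagonal Hessian."""
    if num_samples < 2:
        raise ValueError("at least two samples are needed to estimate a variance")
    rng = random.Random(seed)
    g0 = G(z)
    quad_forms = []
    for _ in range(num_samples):
        v = [rng.choice((-1.0, 1.0)) for _ in z]
        plus = [zi + eps * vi for zi, vi in zip(z, v)]
        minus = [zi - eps * vi for zi, vi in zip(z, v)]
        quad_forms.append((G(plus) - 2.0 * g0 + G(minus)) / eps ** 2)
    # statistics.variance is the unbiased sample variance
    return statistics.variance(quad_forms)

def quadratic(H):
    """Demo function G(z) = 0.5 * z^T H z, whose Hessian is the symmetric matrix H."""
    n = len(H)
    return lambda z: 0.5 * sum(z[i] * H[i][j] * z[j] for i in range(n) for j in range(n))
```

For a quadratic G the second difference is exact, so a diagonal H gives a penalty of (numerically) zero, while any off-diagonal coupling makes the sampled quadratic forms disagree and the penalty positive.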

  43. arXiv:2008.05659  [pdf, other]

    cs.CV

    What Should Not Be Contrastive in Contrastive Learning

    Authors: Tete Xiao, Xiaolong Wang, Alexei A. Efros, Trevor Darrell

    Abstract: Recent self-supervised contrastive methods have been able to produce impressive transferable visual representations by learning to be invariant to different data augmentations. However, these methods implicitly assume a particular set of representational invariances (e.g., invariance to color), and can perform poorly when a downstream task violates this assumption (e.g., distinguishing red vs. yel…

    Submitted 18 March, 2021; v1 submitted 12 August, 2020; originally announced August 2020.

    Comments: Published as a conference paper at ICLR 2021

  44. arXiv:2008.02796  [pdf, other]

    cs.CV cs.GR

    Learning to Factorize and Relight a City

    Authors: Andrew Liu, Shiry Ginosar, Tinghui Zhou, Alexei A. Efros, Noah Snavely

    Abstract: We propose a learning-based framework for disentangling outdoor scenes into temporally-varying illumination and permanent scene factors. Inspired by the classic intrinsic image decomposition, our learning signal builds upon two insights: 1) combining the disentangled factors should reconstruct the original image, and 2) the permanent factors should stay constant across multiple temporal samples of…

    Submitted 6 August, 2020; originally announced August 2020.

    Comments: ECCV 2020 (Spotlight). Supplemental Material attached
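The two insights in the abstract map naturally onto two training signals. The sketch below is a toy version, assuming a per-pixel multiplicative model (image = permanent factor times illumination, as in classic intrinsic image decomposition) and images given as flat lists of intensities: a reconstruction loss checks that the factors recombine into each observed image, and a permanence loss penalizes variation of the permanent factor across the temporal samples of one scene. The function names and the multiplicative model are assumptions, not the paper's architecture.

```python
def mse(a, b):
    """Mean squared error between two equal-length lists."""
    return sum((x - y) ** 2 for x, y in zip(a, b)) / len(a)

def reconstruction_loss(images, permanent, illumination):
    """Insight 1: recombining the factors should reproduce each image.

    images[t], permanent[t], illumination[t] are per-pixel lists for time t;
    the multiplicative recombination is an assumption of this sketch."""
    total = 0.0
    for img, perm, illum in zip(images, permanent, illumination):
        recon = [r * s for r, s in zip(perm, illum)]
        total += mse(img, recon)
    return total / len(images)

def permanence_loss(permanent):
    """Insight 2: permanent factors predicted at different times should agree.

    Penalizes their squared deviation from the per-pixel mean over time."""
    T = len(permanent)
    mean = [sum(column) / T for column in zip(*permanent)]
    return sum(mse(perm, mean) for perm in permanent) / T
```

Both terms are needed: a degenerate decomposition that copies each image into the permanent factor and sets the illumination to one reconstructs perfectly, but the permanence term penalizes it.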

  45. arXiv:2007.15651  [pdf, other]

    cs.CV cs.LG

    Contrastive Learning for Unpaired Image-to-Image Translation

    Authors: Taesung Park, Alexei A. Efros, Richard Zhang, Jun-Yan Zhu

    Abstract: In image-to-image translation, each patch in the output should reflect the content of the corresponding patch in the input, independent of domain. We propose a straightforward method for doing so -- maximizing mutual information between the two, using a framework based on contrastive learning. The method encourages two elements (corresponding patches) to map to a similar point in a learned feature…

    Submitted 20 August, 2020; v1 submitted 30 July, 2020; originally announced July 2020.

    Comments: ECCV 2020. Please visit https://taesungp.github.io/ContrastiveUnpairedTranslation/ for introduction videos and more. v3 contains typo fixes and citation update
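The contrastive objective described in the abstract can be sketched as an InfoNCE loss over patches: the output patch at a location is the query, the input patch at the same location is the positive, and input patches at other locations serve as negatives. This is a minimal, framework-free sketch; the L2 normalization, the temperature `tau`, drawing negatives from the same image, and the name `patch_nce_loss` are assumptions here, and the project page linked above is the reference for the actual training setup.

```python
import math

def normalize(v):
    """Scale a non-zero vector to unit L2 norm."""
    n = math.sqrt(sum(x * x for x in v))
    return [x / n for x in v]

def patch_nce_loss(out_feats, in_feats, tau=0.07):
    """Mean InfoNCE loss over patch locations.

    out_feats[i]: feature of output patch i (the query).
    in_feats[i]:  feature of input patch i (the positive for query i);
    in_feats[j] for j != i act as negatives."""
    out_n = [normalize(v) for v in out_feats]
    in_n = [normalize(v) for v in in_feats]
    total = 0.0
    for i, q in enumerate(out_n):
        logits = [sum(a * b for a, b in zip(q, k)) / tau for k in in_n]
        m = max(logits)  # log-sum-exp with max subtraction for stability
        log_denominator = m + math.log(sum(math.exp(l - m) for l in logits))
        total += log_denominator - logits[i]  # equals -log softmax(logits)[i]
    return total / len(out_n)
```

When each output patch stays closest to its own input patch the loss is near zero; if the output rearranges content so a patch resembles a different input location, the loss grows sharply.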

  46. arXiv:2007.15068  [pdf, other]

    cs.CV

    Unselfie: Translating Selfies to Neutral-pose Portraits in the Wild

    Authors: Liqian Ma, Zhe Lin, Connelly Barnes, Alexei A. Efros, Jingwan Lu

    Abstract: Due to the ubiquity of smartphones, it is popular to take photos of one's self, or "selfies." Such photos are convenient to take, because they do not require specialized equipment or a third-party photographer. However, in selfies, constraints such as human arm length often make the body pose look unnatural. To address this issue, we introduce unselfie, a novel photographic transformati…

    Submitted 29 July, 2020; originally announced July 2020.

    Comments: To appear in ECCV 2020

  47. arXiv:2007.04309  [pdf, other]

    cs.LG cs.CV cs.RO stat.ML

    Self-Supervised Policy Adaptation during Deployment

    Authors: Nicklas Hansen, Rishabh Jangir, Yu Sun, Guillem Alenyà, Pieter Abbeel, Alexei A. Efros, Lerrel Pinto, Xiaolong Wang

    Abstract: In most real world scenarios, a policy trained by reinforcement learning in one environment needs to be deployed in another, potentially quite different environment. However, generalization across different environments is known to be hard. A natural solution would be to keep training after deployment in the new environment, but this cannot be done if the new environment offers no reward signal. O…

    Submitted 8 April, 2021; v1 submitted 8 July, 2020; originally announced July 2020.

    Comments: Website: https://nicklashansen.github.io/PAD/ Code: https://github.com/nicklashansen/policy-adaptation-during-deployment ICLR 2021

  48. arXiv:2007.00653  [pdf, other]

    cs.CV cs.GR cs.LG

    Swapping Autoencoder for Deep Image Manipulation

    Authors: Taesung Park, Jun-Yan Zhu, Oliver Wang, Jingwan Lu, Eli Shechtman, Alexei A. Efros, Richard Zhang

    Abstract: Deep generative models have become increasingly effective at producing realistic images from randomly sampled seeds, but using such models for controllable manipulation of existing images remains challenging. We propose the Swapping Autoencoder, a deep model designed specifically for image manipulation, rather than random sampling. The key idea is to encode an image with two independent components…

    Submitted 14 December, 2020; v1 submitted 1 July, 2020; originally announced July 2020.

    Comments: NeurIPS 2020. Please visit https://taesung.me/SwappingAutoencoder/ for an introductory video. v2 mainly contains reorganization of the Introduction and Broader Impact section

  49. arXiv:2006.14613  [pdf, other]

    cs.CV cs.LG eess.IV

    Space-Time Correspondence as a Contrastive Random Walk

    Authors: Allan Jabri, Andrew Owens, Alexei A. Efros

    Abstract: This paper proposes a simple self-supervised approach for learning a representation for visual correspondence from raw video. We cast correspondence as prediction of links in a space-time graph constructed from video. In this graph, the nodes are patches sampled from each frame, and nodes adjacent in time can share a directed edge. We learn a representation in which pairwise similarity defines tra…

    Submitted 3 December, 2020; v1 submitted 25 June, 2020; originally announced June 2020.

    Comments: NeurIPS 2020 camera ready version -- Code at github.com/ajabri/videowalk
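The space-time graph in the abstract can be sketched as follows: softmax-normalized similarities between patch embeddings of consecutive frames form row-stochastic transition matrices, and multiplying them chains the steps of a random walk. The loss below walks forward through the frames and back again (a palindrome) and penalizes probability mass that fails to return to the patch where it started. The temperature `tau`, the palindrome construction, and the exact form of the loss are assumptions for illustration; patch embeddings are assumed to be L2-normalized vectors.

```python
import math

def transition(frame_a, frame_b, tau=0.1):
    """A[i][j]: probability of stepping from patch i of frame_a to patch j of frame_b."""
    A = []
    for a in frame_a:
        sims = [sum(x * y for x, y in zip(a, b)) / tau for b in frame_b]
        m = max(sims)  # subtract the max for a numerically stable softmax
        exps = [math.exp(s - m) for s in sims]
        total = sum(exps)
        A.append([e / total for e in exps])
    return A

def matmul(A, B):
    """Plain matrix product of nested lists."""
    return [[sum(A[i][k] * B[k][j] for k in range(len(B))) for j in range(len(B[0]))]
            for i in range(len(A))]

def cycle_walk_loss(frames, tau=0.1):
    """Mean negative log-probability that a walk along the palindrome
    frames[0] -> ... -> frames[-1] -> ... -> frames[0] returns to its start patch."""
    path = frames + frames[-2::-1]  # e.g. [f0, f1, f2] -> [f0, f1, f2, f1, f0]
    n = len(frames[0])
    walk = [[1.0 if i == j else 0.0 for j in range(n)] for i in range(n)]  # identity
    for a, b in zip(path, path[1:]):
        walk = matmul(walk, transition(a, b, tau))
    return -sum(math.log(walk[i][i]) for i in range(n)) / n
```

Distinctive, temporally consistent embeddings let the walk return home with high probability, while ambiguous embeddings (all patches alike) spread the mass uniformly and the loss approaches log of the number of patches.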

  50. arXiv:2004.01526  [pdf, other]

    cs.CV

    RANSAC-Flow: generic two-stage image alignment

    Authors: Xi Shen, François Darmon, Alexei A. Efros, Mathieu Aubry

    Abstract: This paper considers the generic problem of dense alignment between two images, whether they be two frames of a video, two widely different views of a scene, two paintings depicting similar content, etc. Whereas each such task is typically addressed with a domain-specific solution, we show that a simple unsupervised approach performs surprisingly well across a range of tasks. Our main insight is t…

    Submitted 17 July, 2020; v1 submitted 3 April, 2020; originally announced April 2020.

    Comments: Accepted to ECCV 2020 as a spotlight. Project page: http://imagine.enpc.fr/~shenx/RANSAC-Flow/
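The title names RANSAC, and the truncated abstract does not say how it is used, so the following is only background: a minimal sketch of the classic RANSAC loop for robustly fitting a 2D affine transform to point matches that include outliers. The affine model, the inlier threshold `thresh`, the iteration count `iters`, and all function names are illustrative assumptions and should not be read as the paper's method.

```python
import math
import random

def det3(m):
    """Determinant of a 3x3 matrix given as nested lists."""
    return (m[0][0] * (m[1][1] * m[2][2] - m[1][2] * m[2][1])
            - m[0][1] * (m[1][0] * m[2][2] - m[1][2] * m[2][0])
            + m[0][2] * (m[1][0] * m[2][1] - m[1][1] * m[2][0]))

def solve3(m, rhs):
    """Solve m x = rhs by Cramer's rule; return None if m is (near) singular."""
    d = det3(m)
    if abs(d) < 1e-12:
        return None
    solution = []
    for col in range(3):
        mc = [row[:] for row in m]
        for r in range(3):
            mc[r][col] = rhs[r]
        solution.append(det3(mc) / d)
    return solution

def fit_affine(src, dst):
    """Exact fit of x' = a*x + b*y + c, y' = d*x + e*y + f from three matches."""
    m = [[float(x), float(y), 1.0] for x, y in src]
    abc = solve3(m, [p[0] for p in dst])
    def_ = solve3(m, [p[1] for p in dst])
    if abc is None or def_ is None:
        return None  # collinear sample: no unique affine map
    return abc, def_

def apply_affine(model, p):
    """Map point p = (x, y) through the affine model."""
    (a, b, c), (d, e, f) = model
    x, y = p
    return (a * x + b * y + c, d * x + e * y + f)

def ransac_affine(src, dst, iters=200, thresh=1.0, seed=0):
    """Return the affine model with the most inliers and the inlier indices."""
    rng = random.Random(seed)
    indices = list(range(len(src)))
    best_model, best_inliers = None, []
    for _ in range(iters):
        sample = rng.sample(indices, 3)  # minimal sample for an affine map
        model = fit_affine([src[i] for i in sample], [dst[i] for i in sample])
        if model is None:
            continue
        inliers = [i for i in indices
                   if math.dist(apply_affine(model, src[i]), dst[i]) < thresh]
        if len(inliers) > len(best_inliers):
            best_model, best_inliers = model, inliers
    return best_model, best_inliers
```

A practical pipeline would usually refit the model on all inliers by least squares after the loop; that step is left out to keep the sketch short.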