Showing 1–50 of 76 results for author: Isola, P

Searching in archive cs.
  1. arXiv:2405.14813  [pdf, other]

    cs.LG

    Scalable Optimization in the Modular Norm

    Authors: Tim Large, Yang Liu, Minyoung Huh, Hyojin Bahng, Phillip Isola, Jeremy Bernstein

    Abstract: To improve performance in contemporary deep learning, one is interested in scaling up the neural network in terms of both the number and the size of the layers. When ramping up the width of a single layer, graceful scaling of training has been linked to the need to normalize the weights and their updates in the "natural norm" particular to that layer. In this paper, we significantly generalize thi…

    Submitted 23 May, 2024; originally announced May 2024.

  2. arXiv:2405.07987  [pdf, other]

    cs.LG cs.AI cs.CV cs.NE

    The Platonic Representation Hypothesis

    Authors: Minyoung Huh, Brian Cheung, Tongzhou Wang, Phillip Isola

    Abstract: We argue that representations in AI models, particularly deep networks, are converging. First, we survey many examples of convergence in the literature: over time and across multiple domains, the ways by which different neural networks represent data are becoming more aligned. Next, we demonstrate convergence across data modalities: as vision models and language models get larger, they measure dis…

    Submitted 13 May, 2024; originally announced May 2024.

    Comments: Equal contributions

  3. arXiv:2402.16828  [pdf, other]

    cs.LG cs.AI cs.CV

    Training Neural Networks from Scratch with Parallel Low-Rank Adapters

    Authors: Minyoung Huh, Brian Cheung, Jeremy Bernstein, Phillip Isola, Pulkit Agrawal

    Abstract: The scalability of deep learning models is fundamentally limited by computing resources, memory, and communication. Although methods like low-rank adaptation (LoRA) have reduced the cost of model finetuning, its application in model pre-training remains largely unexplored. This paper explores extending LoRA to model pre-training, identifying the inherent constraints and limitations of standard LoR…

    Submitted 26 February, 2024; originally announced February 2024.

  4. arXiv:2401.01862  [pdf, other]

    cs.CV cs.CL cs.LG

    A Vision Check-up for Language Models

    Authors: Pratyusha Sharma, Tamar Rott Shaham, Manel Baradad, Stephanie Fu, Adrian Rodriguez-Munoz, Shivam Duggal, Phillip Isola, Antonio Torralba

    Abstract: What does learning to model relationships between strings teach large language models (LLMs) about the visual world? We systematically evaluate LLMs' abilities to generate and recognize an assortment of visual concepts of increasing complexity and then demonstrate how a preliminary visual representation learning system can be trained using models of text. As language models lack the ability to con…

    Submitted 3 January, 2024; originally announced January 2024.

  5. arXiv:2312.17742  [pdf, other]

    cs.CV

    Learning Vision from Models Rivals Learning Vision from Data

    Authors: Yonglong Tian, Lijie Fan, Kaifeng Chen, Dina Katabi, Dilip Krishnan, Phillip Isola

    Abstract: We introduce SynCLR, a novel approach for learning visual representations exclusively from synthetic images and synthetic captions, without any real data. We synthesize a large dataset of image captions using LLMs, then use an off-the-shelf text-to-image model to generate multiple images corresponding to each synthetic caption. We perform visual representation learning on these synthetic images vi…

    Submitted 28 December, 2023; originally announced December 2023.

    Comments: Code is available at https://github.com/google-research/syn-rep-learn

  6. arXiv:2312.04567  [pdf, other]

    cs.CV

    Scaling Laws of Synthetic Images for Model Training ... for Now

    Authors: Lijie Fan, Kaifeng Chen, Dilip Krishnan, Dina Katabi, Phillip Isola, Yonglong Tian

    Abstract: Recent significant advances in text-to-image models unlock the possibility of training vision systems using synthetic images, potentially overcoming the difficulty of collecting curated data at scale. It is unclear, however, how these models behave at scale, as more synthetic data is added to the training set. In this paper we study the scaling laws of synthetic images generated by state of the ar…

    Submitted 7 December, 2023; originally announced December 2023.

  7. arXiv:2311.03736  [pdf, other]

    cs.AI cs.LG cs.MA

    Neural MMO 2.0: A Massively Multi-task Addition to Massively Multi-agent Learning

    Authors: Joseph Suárez, Phillip Isola, Kyoung Whan Choe, David Bloomin, Hao Xiang Li, Nikhil Pinnaparaju, Nishaanth Kanna, Daniel Scott, Ryan Sullivan, Rose S. Shuman, Lucas de Alcântara, Herbie Bradley, Louis Castricato, Kirsty You, Yuhao Jiang, Qimai Li, Jiaxin Chen, Xiaolong Zhu

    Abstract: Neural MMO 2.0 is a massively multi-agent environment for reinforcement learning research. The key feature of this new version is a flexible task system that allows users to define a broad range of objectives and reward signals. We challenge researchers to train agents capable of generalizing to tasks, maps, and opponents never seen during training. Neural MMO features procedurally generated maps…

    Submitted 7 November, 2023; originally announced November 2023.

  8. arXiv:2311.03707  [pdf, other]

    cs.AI cs.LG cs.MA

    The NeurIPS 2022 Neural MMO Challenge: A Massively Multiagent Competition with Specialization and Trade

    Authors: Enhong Liu, Joseph Suarez, Chenhui You, Bo Wu, Bingcheng Chen, Jun Hu, Jiaxin Chen, Xiaolong Zhu, Clare Zhu, Julian Togelius, Sharada Mohanty, Weijun Hong, Rui Du, Yibing Zhang, Qinwen Wang, Xinhang Li, Zheng Yuan, Xiang Li, Yuejia Huang, Kun Zhang, Hanhui Yang, Shiqi Tang, Phillip Isola

    Abstract: In this paper, we present the results of the NeurIPS-2022 Neural MMO Challenge, which attracted 500 participants and received over 1,600 submissions. Like the previous IJCAI-2022 Neural MMO Challenge, it involved agents from 16 populations surviving in procedurally generated worlds by collecting resources and defeating opponents. This year's competition runs on the latest v1.6 Neural MMO, which in…

    Submitted 6 November, 2023; originally announced November 2023.

  9. arXiv:2310.07889  [pdf, other]

    cs.CV cs.AI cs.CL cs.RO

    LangNav: Language as a Perceptual Representation for Navigation

    Authors: Bowen Pan, Rameswar Panda, SouYoung Jin, Rogerio Feris, Aude Oliva, Phillip Isola, Yoon Kim

    Abstract: We explore the use of language as a perceptual representation for vision-and-language navigation (VLN), with a focus on low-data settings. Our approach uses off-the-shelf vision systems for image captioning and object detection to convert an agent's egocentric panoramic view at each time step into natural language descriptions. We then finetune a pretrained language model to select an action, base…

    Submitted 30 March, 2024; v1 submitted 11 October, 2023; originally announced October 2023.

  10. arXiv:2308.15802  [pdf, other]

    cs.AI

    Benchmarking Robustness and Generalization in Multi-Agent Systems: A Case Study on Neural MMO

    Authors: Yangkun Chen, Joseph Suarez, Junjie Zhang, Chenghui Yu, Bo Wu, Hanmo Chen, Hengman Zhu, Rui Du, Shanliang Qian, Shuai Liu, Weijun Hong, Jinke He, Yibing Zhang, Liang Zhao, Clare Zhu, Julian Togelius, Sharada Mohanty, Jiaxin Chen, Xiu Li, Xiaolong Zhu, Phillip Isola

    Abstract: We present the results of the second Neural MMO challenge, hosted at IJCAI 2022, which received 1600+ submissions. This competition targets robustness and generalization in multi-agent systems: participants train teams of agents to complete a multi-task objective against opponents not seen during training. The competition combines relatively complex environment design with large numbers of agents…

    Submitted 30 August, 2023; originally announced August 2023.

  11. arXiv:2308.07931  [pdf, other]

    cs.CV cs.AI cs.CL cs.LG cs.RO

    Distilled Feature Fields Enable Few-Shot Language-Guided Manipulation

    Authors: William Shen, Ge Yang, Alan Yu, Jansen Wong, Leslie Pack Kaelbling, Phillip Isola

    Abstract: Self-supervised and language-supervised image models contain rich knowledge of the world that is important for generalization. Many robotic tasks, however, require a detailed understanding of 3D geometry, which is often lacking in 2D image features. This work bridges this 2D-to-3D gap for robotic manipulation by leveraging distilled feature fields to combine accurate 3D geometry with rich semantic…

    Submitted 29 December, 2023; v1 submitted 27 July, 2023; originally announced August 2023.

    Comments: Project website at https://f3rm.csail.mit.edu, Accepted at the 7th Annual Conference on Robot Learning (CoRL), 2023 in Atlanta, US

  12. arXiv:2306.09344  [pdf, other]

    cs.CV cs.LG

    DreamSim: Learning New Dimensions of Human Visual Similarity using Synthetic Data

    Authors: Stephanie Fu, Netanel Tamir, Shobhita Sundaram, Lucy Chai, Richard Zhang, Tali Dekel, Phillip Isola

    Abstract: Current perceptual similarity metrics operate at the level of pixels and patches. These metrics compare images in terms of their low-level colors and textures, but fail to capture mid-level similarities and differences in image layout, object pose, and semantic content. In this paper, we develop a perceptual metric that assesses images holistically. Our first step is to collect a new dataset of hu…

    Submitted 8 December, 2023; v1 submitted 15 June, 2023; originally announced June 2023.

    Comments: Website: https://dreamsim-nights.github.io/ Code: https://github.com/ssundaram21/dreamsim

  13. arXiv:2306.04738  [pdf, other]

    cs.CV cs.AI

    MultiEarth 2023 -- Multimodal Learning for Earth and Environment Workshop and Challenge

    Authors: Miriam Cha, Gregory Angelides, Mark Hamilton, Andy Soszynski, Brandon Swenson, Nathaniel Maidel, Phillip Isola, Taylor Perron, Bill Freeman

    Abstract: The Multimodal Learning for Earth and Environment Workshop (MultiEarth 2023) is the second annual CVPR workshop aimed at the monitoring and analysis of the health of Earth ecosystems by leveraging the vast amount of remote sensing data that is continuously being collected. The primary objective of this workshop is to bring together the Earth and environmental science communities as well as the mul…

    Submitted 7 June, 2023; originally announced June 2023.

  14. arXiv:2306.00984  [pdf, other]

    cs.CV

    StableRep: Synthetic Images from Text-to-Image Models Make Strong Visual Representation Learners

    Authors: Yonglong Tian, Lijie Fan, Phillip Isola, Huiwen Chang, Dilip Krishnan

    Abstract: We investigate the potential of learning visual representations using synthetic images generated by text-to-image models. This is a natural question in the light of the excellent performance of such models in generating high-quality images. We consider specifically the Stable Diffusion, one of the leading open source text-to-image models. We show that (1) when the generative model is configured wi…

    Submitted 26 October, 2023; v1 submitted 1 June, 2023; originally announced June 2023.

    Comments: code is available at: https://github.com/google-research/syn-rep-learn

  15. arXiv:2305.20088  [pdf, other]

    cs.CV cs.CL cs.LG

    Improving CLIP Training with Language Rewrites

    Authors: Lijie Fan, Dilip Krishnan, Phillip Isola, Dina Katabi, Yonglong Tian

    Abstract: Contrastive Language-Image Pre-training (CLIP) stands as one of the most effective and scalable methods for training transferable vision models using paired image and text data. CLIP models are trained using contrastive loss, which typically relies on data augmentations to prevent overfitting and shortcuts. However, in the CLIP training paradigm, data augmentations are exclusively applied to image…

    Submitted 28 October, 2023; v1 submitted 31 May, 2023; originally announced May 2023.

    Comments: NeurIPS 2023

  16. arXiv:2305.08842  [pdf, other]

    cs.LG cs.AI

    Straightening Out the Straight-Through Estimator: Overcoming Optimization Challenges in Vector Quantized Networks

    Authors: Minyoung Huh, Brian Cheung, Pulkit Agrawal, Phillip Isola

    Abstract: This work examines the challenges of training neural networks using vector quantization using straight-through estimation. We find that a primary cause of training instability is the discrepancy between the model embedding and the code-vector distribution. We identify the factors that contribute to this issue, including the codebook gradient sparsity and the asymmetric nature of the commitment los…

    Submitted 15 May, 2023; originally announced May 2023.

  17. arXiv:2304.01203  [pdf, other]

    cs.LG

    Optimal Goal-Reaching Reinforcement Learning via Quasimetric Learning

    Authors: Tongzhou Wang, Antonio Torralba, Phillip Isola, Amy Zhang

    Abstract: In goal-reaching reinforcement learning (RL), the optimal value function has a particular geometry, called quasimetric structure. This paper introduces Quasimetric Reinforcement Learning (QRL), a new RL method that utilizes quasimetric models to learn optimal value functions. Distinct from prior approaches, the QRL objective is specifically designed for quasimetrics, and provides strong theoretica…

    Submitted 26 November, 2023; v1 submitted 3 April, 2023; originally announced April 2023.

    Comments: Project Page: https://www.tongzhouwang.info/quasimetric_rl/ Code: https://github.com/quasimetric-learning/quasimetric-rl/

    Journal ref: International Conference on Machine Learning (ICML) 2023

  18. arXiv:2303.13515  [pdf, other]

    cs.CV cs.LG

    Persistent Nature: A Generative Model of Unbounded 3D Worlds

    Authors: Lucy Chai, Richard Tucker, Zhengqi Li, Phillip Isola, Noah Snavely

    Abstract: Despite increasingly realistic image quality, recent 3D image generative models often operate on 3D volumes of fixed extent with limited camera motions. We investigate the task of unconditionally synthesizing unbounded nature scenes, enabling arbitrarily large camera motion while maintaining a persistent 3D world model. Our scene representation consists of an extendable, planar scene layout grid,…

    Submitted 23 March, 2023; originally announced March 2023.

    Comments: CVPR camera ready version, project page: https://chail.github.io/persistent-nature/

  19. arXiv:2302.11349  [pdf, other]

    cs.CV

    Steerable Equivariant Representation Learning

    Authors: Sangnie Bhardwaj, Willie McClinton, Tongzhou Wang, Guillaume Lajoie, Chen Sun, Phillip Isola, Dilip Krishnan

    Abstract: Pre-trained deep image representations are useful for post-training tasks such as classification through transfer learning, image retrieval, and object detection. Data augmentations are a crucial aspect of pre-training robust representations in both supervised and self-supervised settings. Data augmentations explicitly or implicitly promote invariance in the embedding space to the input image tran…

    Submitted 22 February, 2023; originally announced February 2023.

  20. arXiv:2212.06088  [pdf, other]

    cs.RO

    MIRA: Mental Imagery for Robotic Affordances

    Authors: Lin Yen-Chen, Pete Florence, Andy Zeng, Jonathan T. Barron, Yilun Du, Wei-Chiu Ma, Anthony Simeonov, Alberto Rodriguez Garcia, Phillip Isola

    Abstract: Humans form mental images of 3D scenes to support counterfactual imagination, planning, and motor control. Our abilities to predict the appearance and affordance of the scene from previously unobserved viewpoints aid us in performing manipulation tasks (e.g., 6-DoF kitting) with a level of ease that is currently out of reach for existing robot learning frameworks. In this work, we aim to build art…

    Submitted 12 December, 2022; originally announced December 2022.

    Comments: CoRL 2022, webpage: https://yenchenlin.me/mira

  21. arXiv:2211.16412  [pdf, other]

    cs.CV cs.LG

    Procedural Image Programs for Representation Learning

    Authors: Manel Baradad, Chun-Fu Chen, Jonas Wulff, Tongzhou Wang, Rogerio Feris, Antonio Torralba, Phillip Isola

    Abstract: Learning image representations using synthetic data allows training neural networks without some of the concerns associated with real images, such as privacy and bias. Existing work focuses on a handful of curated generative processes which require expert knowledge to design, making it hard to scale up. To overcome this, we propose training with a large dataset of twenty-one thousand programs, eac…

    Submitted 6 November, 2023; v1 submitted 29 November, 2022; originally announced November 2022.

    Comments: 29 pages, Accepted in the Conference on Neural Information Processing Systems 2022 (NeurIPS 2022)

    Journal ref: NeurIPS 2022

  22. arXiv:2211.15120  [pdf, other]

    cs.LG

    Improved Representation of Asymmetrical Distances with Interval Quasimetric Embeddings

    Authors: Tongzhou Wang, Phillip Isola

    Abstract: Asymmetrical distance structures (quasimetrics) are ubiquitous in our lives and are gaining more attention in machine learning applications. Imposing such quasimetric structures in model representations has been shown to improve many tasks, including reinforcement learning (RL) and causal relation learning. In this work, we present four desirable properties in such quasimetric models, and show how…

    Submitted 5 January, 2024; v1 submitted 28 November, 2022; originally announced November 2022.

    Comments: NeurIPS 2022 NeurReps Workshop Proceedings Track

  23. arXiv:2211.13051  [pdf, other]

    cs.AI cs.LG

    Powderworld: A Platform for Understanding Generalization via Rich Task Distributions

    Authors: Kevin Frans, Phillip Isola

    Abstract: One of the grand challenges of reinforcement learning is the ability to generalize to new tasks. However, general agents require a set of rich, diverse tasks to train on. Designing a `foundation environment' for such tasks is tricky -- the ideal environment would support a range of emergent phenomena, an expressive task space, and fast runtime. To take a step towards addressing this research bottl…

    Submitted 15 October, 2023; v1 submitted 23 November, 2022; originally announced November 2022.

  24. arXiv:2209.13032  [pdf, other]

    cs.CV

    Totems: Physical Objects for Verifying Visual Integrity

    Authors: Jingwei Ma, Lucy Chai, Minyoung Huh, Tongzhou Wang, Ser-Nam Lim, Phillip Isola, Antonio Torralba

    Abstract: We introduce a new approach to image forensics: placing physical refractive objects, which we call totems, into a scene so as to protect any photograph taken of that scene. Totems bend and redirect light rays, thus providing multiple, albeit distorted, views of the scene within a single image. A defender can use these distorted totem pixels to detect if an image has been manipulated. Our approach…

    Submitted 26 September, 2022; originally announced September 2022.

    Comments: ECCV 2022 camera ready version; project page https://jingweim.github.io/totems/

  25. arXiv:2207.10074  [pdf, other]

    cs.CV cs.AI cs.LG stat.ML

    Semantic uncertainty intervals for disentangled latent spaces

    Authors: Swami Sankaranarayanan, Anastasios N. Angelopoulos, Stephen Bates, Yaniv Romano, Phillip Isola

    Abstract: Meaningful uncertainty quantification in computer vision requires reasoning about semantic information -- say, the hair color of the person in a photo or the location of a car on the street. To this end, recent breakthroughs in generative modeling allow us to represent semantic information in disentangled latent spaces, but providing uncertainties on the semantic latent variables has remained chal…

    Submitted 30 November, 2022; v1 submitted 20 July, 2022; originally announced July 2022.

    Comments: Accepted to NeurIPS 2022. Project page: https://swamiviv.github.io/semantic_uncertainty_intervals/

  26. arXiv:2207.07033  [pdf, other]

    cs.AI cs.CY

    Developing a Series of AI Challenges for the United States Department of the Air Force

    Authors: Vijay Gadepally, Gregory Angelides, Andrei Barbu, Andrew Bowne, Laura J. Brattain, Tamara Broderick, Armando Cabrera, Glenn Carl, Ronisha Carter, Miriam Cha, Emilie Cowen, Jesse Cummings, Bill Freeman, James Glass, Sam Goldberg, Mark Hamilton, Thomas Heldt, Kuan Wei Huang, Phillip Isola, Boris Katz, Jamie Koerner, Yen-Chen Lin, David Mayo, Kyle McAlpin, Taylor Perron, et al. (17 additional authors not shown)

    Abstract: Through a series of federal initiatives and orders, the U.S. Government has been making a concerted effort to ensure American leadership in AI. These broad strategy documents have influenced organizations such as the United States Department of the Air Force (DAF). The DAF-MIT AI Accelerator is an initiative between the DAF and MIT to bridge the gap between AI researchers and DAF mission requireme…

    Submitted 14 July, 2022; originally announced July 2022.

  27. arXiv:2206.15478  [pdf, other]

    cs.LG

    On the Learning and Learnability of Quasimetrics

    Authors: Tongzhou Wang, Phillip Isola

    Abstract: Our world is full of asymmetries. Gravity and wind can make reaching a place easier than coming back. Social artifacts such as genealogy charts and citation graphs are inherently directed. In reinforcement learning and control, optimal goal-reaching strategies are rarely reversible (symmetrical). Distance functions supported on these asymmetrical structures are called quasimetrics. Despite their c…

    Submitted 3 October, 2022; v1 submitted 30 June, 2022; originally announced June 2022.

    Comments: Project page: https://ssnl.github.io/quasimetric/ Code: https://github.com/SsnL/poisson_quasimetric_embedding

    Journal ref: International Conference on Learning Representations (ICLR) 2022

  28. arXiv:2206.15477  [pdf, other]

    cs.LG

    Denoised MDPs: Learning World Models Better Than the World Itself

    Authors: Tongzhou Wang, Simon S. Du, Antonio Torralba, Phillip Isola, Amy Zhang, Yuandong Tian

    Abstract: The ability to separate signal from noise, and reason with clean abstractions, is critical to intelligence. With this ability, humans can efficiently perform real world tasks without considering all possible nuisance factors. How can artificial agents do the same? What kind of information can agents safely discard as noises? In this work, we categorize information out in the wild into four types…

    Submitted 6 April, 2023; v1 submitted 30 June, 2022; originally announced June 2022.

    Comments: Project page: https://ssnl.github.io/denoised_mdp/ Code: https://github.com/facebookresearch/denoised_mdp

  29. arXiv:2204.07649  [pdf, other]

    cs.CV

    MultiEarth 2022 -- Multimodal Learning for Earth and Environment Workshop and Challenge

    Authors: Miriam Cha, Kuan Wei Huang, Morgan Schmidt, Gregory Angelides, Mark Hamilton, Sam Goldberg, Armando Cabrera, Phillip Isola, Taylor Perron, Bill Freeman, Yen-Chen Lin, Brandon Swenson, Jean Piou

    Abstract: The Multimodal Learning for Earth and Environment Challenge (MultiEarth 2022) will be the first competition aimed at the monitoring and analysis of deforestation in the Amazon rainforest at any time and in any weather conditions. The goal of the Challenge is to provide a common benchmark for multimodal information processing and to bring together the earth and environmental science communities as…

    Submitted 31 May, 2022; v1 submitted 15 April, 2022; originally announced April 2022.

  30. arXiv:2204.07156  [pdf, other]

    cs.CV cs.LG

    Any-resolution Training for High-resolution Image Synthesis

    Authors: Lucy Chai, Michael Gharbi, Eli Shechtman, Phillip Isola, Richard Zhang

    Abstract: Generative models operate at fixed resolution, even though natural images come in a variety of sizes. As high-resolution details are downsampled away and low-resolution images are discarded altogether, precious supervision is lost. We argue that every pixel matters and create datasets with variable-size images, collected at their native resolutions. To take advantage of varied-size data, we introd…

    Submitted 4 August, 2022; v1 submitted 14 April, 2022; originally announced April 2022.

    Comments: ECCV 2022 camera ready version; project page https://chail.github.io/anyres-gan/

  31. arXiv:2203.17274  [pdf, other]

    cs.CV

    Exploring Visual Prompts for Adapting Large-Scale Models

    Authors: Hyojin Bahng, Ali Jahanian, Swami Sankaranarayanan, Phillip Isola

    Abstract: We investigate the efficacy of visual prompting to adapt large-scale models in vision. Following the recent approach from prompt tuning and adversarial reprogramming, we learn a single image perturbation such that a frozen model prompted with this perturbation performs a new task. Through comprehensive experiments, we demonstrate that visual prompting is particularly effective for CLIP and robust…

    Submitted 3 June, 2022; v1 submitted 31 March, 2022; originally announced March 2022.

    Comments: 16 pages, 10 figures

  32. arXiv:2203.12691  [pdf, other]

    cs.CV cs.GR

    Learning to generate line drawings that convey geometry and semantics

    Authors: Caroline Chan, Fredo Durand, Phillip Isola

    Abstract: This paper presents an unpaired method for creating line drawings from photographs. Current methods often rely on high quality paired datasets to generate line drawings. However, these datasets often have limitations due to the subjects of the drawings belonging to a specific domain, or in the amount of data collected. Although recent work in unsupervised image-to-image translation has shown much…

    Submitted 28 March, 2022; v1 submitted 23 March, 2022; originally announced March 2022.

    Comments: Corrected and added references

  33. arXiv:2203.01913  [pdf, other]

    cs.RO cs.CV

    NeRF-Supervision: Learning Dense Object Descriptors from Neural Radiance Fields

    Authors: Lin Yen-Chen, Pete Florence, Jonathan T. Barron, Tsung-Yi Lin, Alberto Rodriguez, Phillip Isola

    Abstract: Thin, reflective objects such as forks and whisks are common in our daily lives, but they are particularly challenging for robot perception because it is hard to reconstruct them using commodity RGB-D cameras or multi-view stereo techniques. While traditional pipelines struggle with objects like these, Neural Radiance Fields (NeRFs) have recently been shown to be remarkably effective for performin…

    Submitted 27 April, 2022; v1 submitted 3 March, 2022; originally announced March 2022.

    Comments: ICRA 2022, Website: https://yenchenlin.me/nerf-supervision/

  34. arXiv:2111.06934  [pdf, other]

    cs.CV cs.LG

    Contrastive Feature Loss for Image Prediction

    Authors: Alex Andonian, Taesung Park, Bryan Russell, Phillip Isola, Jun-Yan Zhu, Richard Zhang

    Abstract: Training supervised image synthesis models requires a critic to compare two images: the ground truth to the result. Yet, this basic functionality remains an open problem. A popular line of approaches uses the L1 (mean absolute error) loss, either in the pixel or the feature space of pretrained deep networks. However, we observe that these losses tend to produce overly blurry and grey images, and o…

    Submitted 12 November, 2021; originally announced November 2021.

    Comments: Appeared in Advances in Image Manipulation Workshop at ICCV 2021. GitHub: https://github.com/alexandonian/contrastive-feature-loss

  35. arXiv:2110.15349  [pdf, other]

    cs.LG cs.AI cs.CL cs.MA

    Learning to Ground Multi-Agent Communication with Autoencoders

    Authors: Toru Lin, Minyoung Huh, Chris Stauffer, Ser-Nam Lim, Phillip Isola

    Abstract: Communication requires having a common language, a lingua franca, between agents. This language could emerge via a consensus process, but it may require many generations of trial and error. Alternatively, the lingua franca can be given by the environment, where agents ground their language in representations of the observed world. We demonstrate a simple way to ground language in learned represent…

    Submitted 28 October, 2021; originally announced October 2021.

    Comments: Project page, code, and videos can be found at https://toruowo.github.io/marl-ae-comm/

  36. arXiv:2110.07594  [pdf, other]

    cs.LG cs.AI cs.MA

    The Neural MMO Platform for Massively Multiagent Research

    Authors: Joseph Suarez, Yilun Du, Clare Zhu, Igor Mordatch, Phillip Isola

    Abstract: Neural MMO is a computationally accessible research platform that combines large agent populations, long time horizons, open-ended tasks, and modular game systems. Existing environments feature subsets of these properties, but Neural MMO is the first to combine them all. We present Neural MMO as free and open source software with active support, ongoing development, documentation, and additional t…

    Submitted 14 October, 2021; originally announced October 2021.

  37. arXiv:2110.06912  [pdf, other]

    cs.RO cs.AI cs.LG

    OPEn: An Open-ended Physics Environment for Learning Without a Task

    Authors: Chuang Gan, Abhishek Bhandwaldar, Antonio Torralba, Joshua B. Tenenbaum, Phillip Isola

    Abstract: Humans have mental models that allow them to plan, experiment, and reason in the physical world. How should an intelligent agent go about learning such models? In this paper, we will study if models of the world learned in an open-ended physics environment, without any specific tasks, can be reused for downstream physics reasoning tasks. To this end, we build a benchmark Open-ended Physics ENviron…

    Submitted 13 October, 2021; originally announced October 2021.

    Comments: IROS 2021. Project page: http://open.csail.mit.edu/

  38. arXiv:2107.07506  [pdf, other]

    cs.LG cs.NE

    Adaptable Agent Populations via a Generative Model of Policies

    Authors: Kenneth Derek, Phillip Isola

    Abstract: In the natural world, life has found innumerable ways to survive and often thrive. Between and even within species, each individual is in some manner unique, and this diversity lends adaptability and robustness to life. In this work, we aim to learn a space of diverse and high-reward policies on any given environment. To this end, we introduce a generative model of policies, which maps a low-dimen…

    Submitted 15 July, 2021; originally announced July 2021.

    Comments: Website at https://kennyderek.github.io/adap/

  39. arXiv:2107.00646  [pdf, other]

    cs.RO cs.CV

    Learning to See before Learning to Act: Visual Pre-training for Manipulation

    Authors: Lin Yen-Chen, Andy Zeng, Shuran Song, Phillip Isola, Tsung-Yi Lin

    Abstract: Does having visual priors (e.g. the ability to detect objects) facilitate learning to perform vision-based manipulation (e.g. picking up objects)? We study this problem under the framework of transfer learning, where the model is first trained on a passive vision task, and adapted to perform an active manipulation task. We find that pre-training on vision tasks significantly improves generalizatio…

    Submitted 1 July, 2021; originally announced July 2021.

    Comments: Accepted to ICRA 2020. Project page: http://yenchenlin.me/vision2action/

  40. arXiv:2106.05963  [pdf, other]

    cs.CV cs.AI

    Learning to See by Looking at Noise

    Authors: Manel Baradad, Jonas Wulff, Tongzhou Wang, Phillip Isola, Antonio Torralba

    Abstract: Current vision systems are trained on huge datasets, and these datasets come with costs: curation is expensive, they inherit human biases, and there are concerns over privacy and usage rights. To counter these costs, interest has surged in learning from cheaper data sources, such as unlabeled images. In this paper we go a step further and ask if we can do away with real image datasets entirely, in…

    Submitted 28 April, 2022; v1 submitted 10 June, 2021; originally announced June 2021.

  41. arXiv:2106.05258  [pdf, other]

    cs.CV

    Generative Models as a Data Source for Multiview Representation Learning

    Authors: Ali Jahanian, Xavier Puig, Yonglong Tian, Phillip Isola

    Abstract: Generative models are now capable of producing highly realistic images that look nearly indistinguishable from the data on which they are trained. This raises the question: if we have good enough generative models, do we still need datasets? We investigate this question in the setting of learning general-purpose visual representations from a black-box generative model rather than directly from dat…

    Submitted 15 March, 2022; v1 submitted 9 June, 2021; originally announced June 2021.

  42. arXiv:2105.01060  [pdf, other]

    cs.CV cs.AI cs.LG cs.RO

    Curious Representation Learning for Embodied Intelligence

    Authors: Yilun Du, Chuang Gan, Phillip Isola

    Abstract: Self-supervised representation learning has achieved remarkable success in recent years. By subverting the need for supervised labels, such approaches are able to utilize the numerous unlabeled images that exist on the Internet and in photographic datasets. Yet to build truly intelligent agents, we must construct representation learning algorithms that can learn not only from datasets but also lea…

    Submitted 30 August, 2021; v1 submitted 3 May, 2021; originally announced May 2021.

    Comments: To appear at ICCV 2021. Code is available at https://yilundu.github.io/crl

  43. arXiv:2104.14551  [pdf, other]

    cs.CV cs.LG

    Ensembling with Deep Generative Views

    Authors: Lucy Chai, Jun-Yan Zhu, Eli Shechtman, Phillip Isola, Richard Zhang

    Abstract: Recent generative models can synthesize "views" of artificial images that mimic real-world variations, such as changes in color or pose, simply by learning from unlabeled image collections. Here, we investigate whether such views can be applied to real images to benefit downstream analysis tasks such as image classification. Using a pretrained generator, we first find the latent code corresponding…

    Submitted 29 April, 2021; originally announced April 2021.

    Comments: CVPR 2021 camera ready version; code available at https://github.com/chail/gan-ensembling

  44. arXiv:2104.13369  [pdf, other]

    cs.CV cs.LG cs.NE eess.IV stat.ML

    Explaining in Style: Training a GAN to explain a classifier in StyleSpace

    Authors: Oran Lang, Yossi Gandelsman, Michal Yarom, Yoav Wald, Gal Elidan, Avinatan Hassidim, William T. Freeman, Phillip Isola, Amir Globerson, Michal Irani, Inbar Mosseri

    Abstract: Image classification models can depend on multiple different semantic attributes of the image. An explanation of the decision of the classifier needs to both discover and visualize these properties. Here we present StylEx, a method for doing this, by training a generative model to specifically explain multiple attributes that underlie classifier decisions. A natural source for such attributes is t…

    Submitted 1 September, 2021; v1 submitted 27 April, 2021; originally announced April 2021.

    Comments: Accepted to ICCV 2021. Project page: https://explaining-in-style.github.io/, Code: https://github.com/google/explaining-in-style

  45. arXiv:2103.10427  [pdf, other]

    cs.LG cs.CV

    The Low-Rank Simplicity Bias in Deep Networks

    Authors: Minyoung Huh, Hossein Mobahi, Richard Zhang, Brian Cheung, Pulkit Agrawal, Phillip Isola

    Abstract: Modern deep neural networks are highly over-parameterized compared to the data on which they are trained, yet they often generalize remarkably well. A flurry of recent work has asked: why do deep networks not overfit to their training data? In this work, we make a series of empirical observations that investigate and extend the hypothesis that deeper networks are inductively biased to find solutio…

    Submitted 23 March, 2023; v1 submitted 18 March, 2021; originally announced March 2021.

  46. arXiv:2103.10426  [pdf, other]

    cs.CV cs.LG

    Using latent space regression to analyze and leverage compositionality in GANs

    Authors: Lucy Chai, Jonas Wulff, Phillip Isola

    Abstract: In recent years, Generative Adversarial Networks have become ubiquitous in both research and public perception, but how GANs convert an unstructured latent code to a high quality output is still an open question. In this work, we investigate regression into the latent space as a probe to understand the compositional properties of GANs. We find that combining the regressor and a pretrained generato…

    Submitted 3 June, 2021; v1 submitted 18 March, 2021; originally announced March 2021.

    Comments: Update to ICLR 2021 camera ready version

  47. arXiv:2012.05877  [pdf, other]

    cs.CV cs.RO

    INeRF: Inverting Neural Radiance Fields for Pose Estimation

    Authors: Lin Yen-Chen, Pete Florence, Jonathan T. Barron, Alberto Rodriguez, Phillip Isola, Tsung-Yi Lin

    Abstract: We present iNeRF, a framework that performs mesh-free pose estimation by "inverting" a Neural Radiance Field (NeRF). NeRFs have been shown to be remarkably effective for the task of view synthesis - synthesizing photorealistic novel views of real-world scenes or objects. In this work, we investigate whether we can apply analysis-by-synthesis via NeRF for mesh-free, RGB-only 6DoF pose estimation - g…

    Submitted 10 August, 2021; v1 submitted 10 December, 2020; originally announced December 2020.

    Comments: IROS 2021, Website: http://yenchenlin.me/inerf/

  48. arXiv:2008.10588  [pdf, other]

    cs.CV

    What makes fake images detectable? Understanding properties that generalize

    Authors: Lucy Chai, David Bau, Ser-Nam Lim, Phillip Isola

    Abstract: The quality of image generation and manipulation is reaching impressive levels, making it increasingly difficult for a human to distinguish between what is real and what is fake. However, deep networks can still pick up on the subtle artifacts in these doctored images. We seek to understand what properties of fake images make them detectable and identify what generalizes across different model arc…

    Submitted 24 August, 2020; originally announced August 2020.

  49. arXiv:2007.13729  [pdf, other]

    cs.CV cs.AI cs.LG cs.RO cs.SD eess.AS

    Noisy Agents: Self-supervised Exploration by Predicting Auditory Events

    Authors: Chuang Gan, Xiaoyu Chen, Phillip Isola, Antonio Torralba, Joshua B. Tenenbaum

    Abstract: Humans integrate multiple sensory modalities (e.g. visual and audio) to build a causal understanding of the physical world. In this work, we propose a novel type of intrinsic motivation for Reinforcement Learning (RL) that encourages the agent to understand the causal effect of its actions through auditory event prediction. First, we allow the agent to collect a small amount of acoustic data and u…

    Submitted 27 July, 2020; originally announced July 2020.

    Comments: Project page: http://noisy-agent.csail.mit.edu

  50. arXiv:2005.10243  [pdf, other]

    cs.CV cs.LG

    What Makes for Good Views for Contrastive Learning?

    Authors: Yonglong Tian, Chen Sun, Ben Poole, Dilip Krishnan, Cordelia Schmid, Phillip Isola

    Abstract: Contrastive learning between multiple views of the data has recently achieved state of the art performance in the field of self-supervised representation learning. Despite its success, the influence of different view choices has been less studied. In this paper, we use theoretical and empirical analysis to better understand the importance of view selection, and argue that we should reduce the mutu…

    Submitted 18 December, 2020; v1 submitted 20 May, 2020; originally announced May 2020.

    Comments: NeurIPS 2020. Project page: https://hobbitlong.github.io/InfoMin/