Showing 151–200 of 284 results for author: Darrell, T

  1. arXiv:1911.06258  [pdf, other]

    cs.CV cs.CL

    Iterative Answer Prediction with Pointer-Augmented Multimodal Transformers for TextVQA

    Authors: Ronghang Hu, Amanpreet Singh, Trevor Darrell, Marcus Rohrbach

    Abstract: Many visual scenes contain text that carries crucial information, and it is thus essential to understand text in images for downstream reasoning tasks. For example, a deep water label on a warning sign warns people about the danger in the scene. Recent work has explored the TextVQA task that requires reading and understanding text in images to answer a question. However, existing approaches for Te…

    Submitted 24 March, 2020; v1 submitted 14 November, 2019; originally announced November 2019.

    Comments: CVPR 2020

  2. arXiv:1910.14033  [pdf, other]

    cs.LG cs.AI cs.RO stat.ML

    Plan Arithmetic: Compositional Plan Vectors for Multi-Task Control

    Authors: Coline Devin, Daniel Geng, Pieter Abbeel, Trevor Darrell, Sergey Levine

    Abstract: Autonomous agents situated in real-world environments must be able to master large repertoires of skills. While a single short skill can be learned quickly, it would be impractical to learn every task independently. Instead, the agent should share knowledge across behaviors such that each task can be learned efficiently, and such that the resulting model can generalize to new tasks, especially one…

    Submitted 1 August, 2020; v1 submitted 30 October, 2019; originally announced October 2019.

    Comments: In NeurIPS 2019

  3. arXiv:1910.09191  [pdf, other]

    cs.LG cs.AI stat.ML

    Regularization Matters in Policy Optimization

    Authors: Zhuang Liu, Xuanlin Li, Bingyi Kang, Trevor Darrell

    Abstract: Deep Reinforcement Learning (Deep RL) has been receiving increasingly more attention thanks to its encouraging performance on a variety of control tasks. Yet, conventional regularization techniques in training neural networks (e.g., $L_2$ regularization, dropout) have been largely ignored in RL methods, possibly because agents are typically trained and evaluated in the same environment, and becaus…

    Submitted 28 November, 2021; v1 submitted 21 October, 2019; originally announced October 2019.

    Comments: Published at ICLR 2021; please cite this paper's ICLR 2021 version at https://github.com/xuanlinli17/iclr2021_rlreg#citation or the arXiv version from "Export Bibtex Citation" on the right, instead of the "2019 OpenReview" version in Google Scholar. Thanks!

  4. arXiv:1910.09185  [pdf, other]

    cs.CV cs.LG

    Exploring Simple and Transferable Recognition-Aware Image Processing

    Authors: Zhuang Liu, Hung-Ju Wang, Tinghui Zhou, Zhiqiang Shen, Bingyi Kang, Evan Shelhamer, Trevor Darrell

    Abstract: Recent progress in image recognition has stimulated the deployment of vision systems at an unprecedented scale. As a result, visual data are now often consumed not only by humans but also by machines. Existing image processing methods only optimize for better human perception, yet the resulting images may not be accurately recognized by machines. This can be undesirable, e.g., the images can be im…

    Submitted 10 September, 2022; v1 submitted 21 October, 2019; originally announced October 2019.

    Comments: IEEE Transactions on Pattern Analysis and Machine Intelligence (TPAMI)

  5. arXiv:1910.08143  [pdf, other]

    cs.LG cs.AI stat.ML

    Zero-shot Policy Learning with Spatial Temporal RewardDecomposition on Contingency-aware Observation

    Authors: Huazhe Xu, Boyuan Chen, Yang Gao, Trevor Darrell

    Abstract: It is a long-standing challenge to enable an intelligent agent to learn in one environment and generalize to an unseen environment without further data collection and finetuning. In this paper, we consider a zero-shot generalization problem setup that complies with biological intelligent agents' learning and generalization processes. The agent is first presented with previous experiences in the tr…

    Submitted 15 March, 2021; v1 submitted 17 October, 2019; originally announced October 2019.

  6. arXiv:1909.11825  [pdf, other]

    cs.LG cs.CV stat.ML

    Unsupervised Domain Adaptation through Self-Supervision

    Authors: Yu Sun, Eric Tzeng, Trevor Darrell, Alexei A. Efros

    Abstract: This paper addresses unsupervised domain adaptation, the setting where labeled training data is available on a source domain, but the goal is to have good performance on a target domain with only unlabeled data. Like much of previous work, we seek to align the learned representations of the source and target domains while preserving discriminability. The way we accomplish alignment is by learning…

    Submitted 29 September, 2019; v1 submitted 25 September, 2019; originally announced September 2019.

  7. arXiv:1908.03182  [pdf, other]

    cs.CV cs.LG

    Dynamic Scale Inference by Entropy Minimization

    Authors: Dequan Wang, Evan Shelhamer, Bruno Olshausen, Trevor Darrell

    Abstract: Given the variety of the visual world there is not one true scale for recognition: objects may appear at drastically different sizes across the visual field. Rather than enumerate variations across filter channels or pyramid levels, dynamic models locally predict scale and adapt receptive fields accordingly. The degree of variation and diversity of inputs makes this a difficult task. Existing meth…

    Submitted 8 August, 2019; originally announced August 2019.

  8. arXiv:1906.04854  [pdf, other]

    cs.CV cs.LG

    Task-Aware Feature Generation for Zero-Shot Compositional Learning

    Authors: Xin Wang, Fisher Yu, Trevor Darrell, Joseph E. Gonzalez

    Abstract: Visual concepts (e.g., red apple, big elephant) are often semantically compositional and each element of the compositions can be reused to construct novel concepts (e.g., red elephant). Compositional feature synthesis, which generates image feature distributions exploiting the semantic compositionality, is a promising approach to sample-efficient model generalization. In this work, we propose a ta…

    Submitted 23 March, 2020; v1 submitted 11 June, 2019; originally announced June 2019.

    Comments: 17 pages, 9 figures; substantial content updates with additional experiments

  9. arXiv:1906.02425  [pdf, other]

    cs.LG cs.AI cs.CV stat.ML

    Uncertainty-guided Continual Learning with Bayesian Neural Networks

    Authors: Sayna Ebrahimi, Mohamed Elhoseiny, Trevor Darrell, Marcus Rohrbach

    Abstract: Continual learning aims to learn new tasks without forgetting previously learned ones. This is especially challenging when one cannot access data from previous tasks and when the model has a fixed capacity. Current regularization-based continual learning algorithms need an external representation and extra computation to measure the parameters' importance. In contrast, we propose Uncertai…

    Submitted 19 February, 2020; v1 submitted 6 June, 2019; originally announced June 2019.

    Comments: Accepted at ICLR 2020

  10. arXiv:1906.00347  [pdf, other]

    cs.CL

    Are You Looking? Grounding to Multiple Modalities in Vision-and-Language Navigation

    Authors: Ronghang Hu, Daniel Fried, Anna Rohrbach, Dan Klein, Trevor Darrell, Kate Saenko

    Abstract: Vision-and-Language Navigation (VLN) requires grounding instructions, such as "turn right and stop at the door", to routes in a visual environment. The actual grounding can connect language to the environment through multiple modalities, e.g. "stop at the door" might ground into visual objects, while "turn right" might rely only on the geometric structure of a route. We investigate where the natur…

    Submitted 9 June, 2019; v1 submitted 2 June, 2019; originally announced June 2019.

    Comments: ACL 2019

  11. arXiv:1905.06937  [pdf, other]

    cs.CV cs.RO

    Monocular Plan View Networks for Autonomous Driving

    Authors: Dequan Wang, Coline Devin, Qi-Zhi Cai, Philipp Krähenbühl, Trevor Darrell

    Abstract: Convolutions on monocular dash cam videos capture spatial invariances in the image plane but do not explicitly reason about distances and depth. We propose a simple transformation of observations into a bird's eye view, also known as plan view, for end-to-end control. We detect vehicles and pedestrians in the first person view and project them into an overhead plan view. This representation provid…

    Submitted 16 May, 2019; originally announced May 2019.

    Comments: 8 pages, 9 figures

  12. arXiv:1905.04405  [pdf, other]

    cs.CV

    Language-Conditioned Graph Networks for Relational Reasoning

    Authors: Ronghang Hu, Anna Rohrbach, Trevor Darrell, Kate Saenko

    Abstract: Solving grounded language tasks often requires reasoning about relationships between objects in the context of a given task. For example, to answer the question "What color is the mug on the plate?" we must check the color of the specific mug that satisfies the "on" relationship with respect to the plate. Recent work has proposed various methods capable of complex relational reasoning. However, mo…

    Submitted 16 August, 2019; v1 submitted 10 May, 2019; originally announced May 2019.

  13. arXiv:1905.03706  [pdf, other]

    cs.CV cs.AI

    Accurate Visual Localization for Automotive Applications

    Authors: Eli Brosh, Matan Friedmann, Ilan Kadar, Lev Yitzhak Lavy, Elad Levi, Shmuel Rippa, Yair Lempert, Bruno Fernandez-Ruiz, Roei Herzig, Trevor Darrell

    Abstract: Accurate vehicle localization is a crucial step towards building effective Vehicle-to-Vehicle networks and automotive applications. Yet standard grade GPS data, such as that provided by mobile phones, is often noisy and exhibits significant localization errors in many urban areas. Approaches for accurate localization from imagery often rely on structure-based techniques, and thus are limited in sc…

    Submitted 1 May, 2019; originally announced May 2019.

  14. arXiv:1904.11487  [pdf, other]

    cs.CV

    Blurring the Line Between Structure and Learning to Optimize and Adapt Receptive Fields

    Authors: Evan Shelhamer, Dequan Wang, Trevor Darrell

    Abstract: The visual world is vast and varied, but its variations divide into structured and unstructured factors. We compose free-form filters and structured Gaussian filters, optimized end-to-end, to factorize deep representations and learn both local features and their degree of locality. Our semi-structured composition is strictly more expressive than free-form filtering, and changes in its structured p…

    Submitted 25 April, 2019; originally announced April 2019.

  15. arXiv:1904.06487  [pdf, other]

    cs.CV

    Semi-supervised Domain Adaptation via Minimax Entropy

    Authors: Kuniaki Saito, Donghyun Kim, Stan Sclaroff, Trevor Darrell, Kate Saenko

    Abstract: Contemporary domain adaptation methods are very effective at aligning feature distributions of source and target domains without any target supervision. However, we show that these techniques perform poorly when even a few labeled examples are available in the target. To address this semi-supervised domain adaptation (SSDA) setting, we propose a novel Minimax Entropy (MME) approach that adversaria…

    Submitted 14 September, 2019; v1 submitted 13 April, 2019; originally announced April 2019.

    Comments: Accepted to ICCV 2019. ICCV paper version

  16. arXiv:1904.05967  [pdf, other]

    cs.CV cs.AI

    TAFE-Net: Task-Aware Feature Embeddings for Low Shot Learning

    Authors: Xin Wang, Fisher Yu, Ruth Wang, Trevor Darrell, Joseph E. Gonzalez

    Abstract: Learning good feature embeddings for images often requires substantial training data. As a consequence, in settings where training data is limited (e.g., few-shot and zero-shot learning), we are typically forced to use a generic feature embedding across various tasks. Ideally, we want to construct feature embeddings that are tuned for the given task. In this work, we propose Task-Aware Feature Emb…

    Submitted 11 April, 2019; originally announced April 2019.

    Comments: Accepted at CVPR 2019

  17. arXiv:1904.00370  [pdf, other]

    cs.LG cs.CV stat.ML

    Variational Adversarial Active Learning

    Authors: Samarth Sinha, Sayna Ebrahimi, Trevor Darrell

    Abstract: Active learning aims to develop label-efficient algorithms by sampling the most representative queries to be labeled by an oracle. We describe a pool-based semi-supervised active learning algorithm that implicitly learns this sampling mechanism in an adversarial manner. Unlike conventional active learning algorithms, our approach is task agnostic, i.e., it does not depend on the performance of the…

    Submitted 28 October, 2019; v1 submitted 31 March, 2019; originally announced April 2019.

    Comments: First two authors contributed equally, listed alphabetically. Accepted as Oral at ICCV 2019

  18. arXiv:1902.05546  [pdf, other]

    cs.LG cs.AI cs.CV cs.NE cs.RO stat.ML

    Learning to Control Self-Assembling Morphologies: A Study of Generalization via Modularity

    Authors: Deepak Pathak, Chris Lu, Trevor Darrell, Phillip Isola, Alexei A. Efros

    Abstract: Contemporary sensorimotor learning approaches typically start with an existing complex agent (e.g., a robotic arm), which they learn to control. In contrast, this paper investigates a modular co-evolution strategy: a collection of primitive agents learns to dynamically self-assemble into composite bodies while also learning to coordinate their behavior to control these bodies. Each primitive agent…

    Submitted 21 November, 2019; v1 submitted 14 February, 2019; originally announced February 2019.

    Comments: NeurIPS 2019 (Spotlight). Videos at https://pathak22.github.io/modular-assemblies/

  19. arXiv:1901.02527  [pdf, other]

    cs.CV cs.AI

    Robust Change Captioning

    Authors: Dong Huk Park, Trevor Darrell, Anna Rohrbach

    Abstract: Describing what has changed in a scene can be useful to a user, but only if generated text focuses on what is semantically relevant. It is thus important to distinguish distractors (e.g. a viewpoint change) from relevant changes (e.g. an object has moved). We present a novel Dual Dynamic Attention Model (DUDA) to perform robust Change Captioning. Our model learns to distinguish distractors from se…

    Submitted 16 April, 2019; v1 submitted 8 January, 2019; originally announced January 2019.

  20. arXiv:1812.10000  [pdf, other]

    cs.CV

    Similarity R-C3D for Few-shot Temporal Activity Detection

    Authors: Huijuan Xu, Bingyi Kang, Ximeng Sun, Jiashi Feng, Kate Saenko, Trevor Darrell

    Abstract: Many activities of interest are rare events, with only a few labeled examples available. Therefore models for temporal activity detection which are able to learn from a few examples are desirable. In this paper, we present a conceptually simple and general yet novel framework for few-shot temporal activity detection which detects the start and end time of the few-shot input activities in an untrim…

    Submitted 24 December, 2018; originally announced December 2018.

  21. arXiv:1812.06264  [pdf, other]

    cs.CV

    Hierarchical Discrete Distribution Decomposition for Match Density Estimation

    Authors: Zhichao Yin, Trevor Darrell, Fisher Yu

    Abstract: Explicit representations of the global match distributions of pixel-wise correspondences between pairs of images are desirable for uncertainty estimation and downstream applications. However, the computation of the match density for each pixel may be prohibitively expensive due to the large number of candidates. In this paper, we propose Hierarchical Discrete Distribution Decomposition (HD^3), a f…

    Submitted 16 April, 2019; v1 submitted 15 December, 2018; originally announced December 2018.

    Comments: To appear at CVPR 2019

  22. arXiv:1812.05634  [pdf, other]

    cs.CV cs.CL

    Adversarial Inference for Multi-Sentence Video Description

    Authors: Jae Sung Park, Marcus Rohrbach, Trevor Darrell, Anna Rohrbach

    Abstract: While significant progress has been made in the image captioning task, video description is still in its infancy due to the complex nature of video data. Generating multi-sentence descriptions for long videos is even more challenging. Among the main issues are the fluency and coherence of the generated descriptions, and their relevance to the video. Recently, reinforcement and adversarial learning…

    Submitted 15 April, 2019; v1 submitted 13 December, 2018; originally announced December 2018.

    Comments: Accepted to Computer Vision and Pattern Recognition (CVPR) 2019

  23. arXiv:1812.01866  [pdf, other]

    cs.CV

    Few-shot Object Detection via Feature Reweighting

    Authors: Bingyi Kang, Zhuang Liu, Xin Wang, Fisher Yu, Jiashi Feng, Trevor Darrell

    Abstract: Conventional training of a deep CNN based object detector demands a large number of bounding box annotations, which may be unavailable for rare categories. In this work we develop a few-shot object detector that can learn to detect novel objects from only a few annotated examples. Our proposed model leverages fully labeled base classes and quickly adapts to novel classes, using a meta feature lear…

    Submitted 21 October, 2019; v1 submitted 5 December, 2018; originally announced December 2018.

    Comments: ICCV 2019

  24. arXiv:1812.01784  [pdf, other]

    cs.CV cs.AI cs.LG

    Generalized Zero- and Few-Shot Learning via Aligned Variational Autoencoders

    Authors: Edgar Schönfeld, Sayna Ebrahimi, Samarth Sinha, Trevor Darrell, Zeynep Akata

    Abstract: Many approaches in generalized zero-shot learning rely on cross-modal mapping between the image feature space and the class embedding space. As labeled images are expensive, one direction is to augment the dataset by generating either images or image features. However, the former misses fine-grained details and the latter requires learning a mapping associated with class embeddings. In this work,…

    Submitted 5 April, 2019; v1 submitted 4 December, 2018; originally announced December 2018.

    Comments: Accepted at CVPR 2019

  25. arXiv:1812.01233  [pdf, other]

    cs.CV

    Spatio-Temporal Action Graph Networks

    Authors: Roei Herzig, Elad Levi, Huijuan Xu, Hang Gao, Eli Brosh, Xiaolong Wang, Amir Globerson, Trevor Darrell

    Abstract: Events defined by the interaction of objects in a scene are often of critical importance; yet important events may have insufficient labeled examples to train a conventional deep model to generalize to future object appearance. Activity recognition models that represent object interactions explicitly have the potential to learn in a more efficient manner than those that represent scenes with globa…

    Submitted 29 September, 2019; v1 submitted 4 December, 2018; originally announced December 2018.

    Comments: IEEE/CVF International Conference on Computer Vision Workshop (ICCVW), 2019

  26. arXiv:1812.00929  [pdf, other]

    cs.CV

    SPLAT: Semantic Pixel-Level Adaptation Transforms for Detection

    Authors: Eric Tzeng, Kaylee Burns, Kate Saenko, Trevor Darrell

    Abstract: Domain adaptation of visual detectors is a critical challenge, yet existing methods have overlooked pixel appearance transformations, focusing instead on bootstrapping and/or domain confusion losses. We propose a Semantic Pixel-Level Adaptation Transform (SPLAT) approach to detector adaptation that efficiently generates cross-domain image pairs. Our model uses aligned-pair and/or pseudo-label loss…

    Submitted 3 December, 2018; originally announced December 2018.

  27. arXiv:1812.00452  [pdf, other]

    cs.CV

    Disentangling Propagation and Generation for Video Prediction

    Authors: Hang Gao, Huazhe Xu, Qi-Zhi Cai, Ruth Wang, Fisher Yu, Trevor Darrell

    Abstract: A dynamic scene has two types of elements: those that move fluidly and can be predicted from previous frames, and those which are disoccluded (exposed) and cannot be extrapolated. Prior approaches to video prediction typically learn either to warp or to hallucinate future pixels, but not both. In this paper, we describe a computational model for high-fidelity video prediction which disentangles mo…

    Submitted 5 August, 2019; v1 submitted 2 December, 2018; originally announced December 2018.

    Comments: ICCV 2019

  28. arXiv:1811.10742  [pdf, other]

    cs.CV

    Joint Monocular 3D Vehicle Detection and Tracking

    Authors: Hou-Ning Hu, Qi-Zhi Cai, Dequan Wang, Ji Lin, Min Sun, Philipp Krähenbühl, Trevor Darrell, Fisher Yu

    Abstract: Vehicle 3D extents and trajectories are critical cues for predicting the future location of vehicles and planning future agent ego-motion based on those predictions. In this paper, we propose a novel online framework for 3D vehicle detection and tracking from monocular videos. The framework can not only associate detections of vehicles in motion over time, but also estimate their complete 3D bound…

    Submitted 12 September, 2019; v1 submitted 26 November, 2018; originally announced November 2018.

    Comments: 18 pages, 12 figures. Add supplementary material. Accepted by ICCV 2019. Website: https://eborboihuc.github.io/Mono-3DT Code: https://github.com/ucbdrive/3d-vehicle-tracking Video: https://youtu.be/EJAtOCKI31g

  29. arXiv:1811.05432  [pdf, other]

    cs.AI cs.CV cs.RO

    Deep Object-Centric Policies for Autonomous Driving

    Authors: Dequan Wang, Coline Devin, Qi-Zhi Cai, Fisher Yu, Trevor Darrell

    Abstract: While learning visuomotor skills in an end-to-end manner is appealing, deep neural networks are often uninterpretable and fail in surprising ways. For robotics tasks, such as autonomous driving, models that explicitly represent objects may be more robust to new scenes and provide intuitive visualizations. We describe a taxonomy of "object-centric" models which leverage both object instances and en…

    Submitted 1 March, 2019; v1 submitted 13 November, 2018; originally announced November 2018.

    Comments: Accepted at ICRA 2019

  30. arXiv:1811.03555  [pdf, other]

    cs.AI

    Modular Architecture for StarCraft II with Deep Reinforcement Learning

    Authors: Dennis Lee, Haoran Tang, Jeffrey O Zhang, Huazhe Xu, Trevor Darrell, Pieter Abbeel

    Abstract: We present a novel modular architecture for StarCraft II AI. The architecture splits responsibilities between multiple modules that each control one aspect of the game, such as build-order selection or tactics. A centralized scheduler reviews macros suggested by all modules and decides their order of execution. An updater keeps track of environment changes and instantiates macros into series of ex…

    Submitted 8 November, 2018; originally announced November 2018.

    Comments: Accepted to The 14th AAAI Conference on Artificial Intelligence and Interactive Digital Entertainment (AIIDE'18)

  31. arXiv:1810.06758  [pdf, other]

    stat.ML cs.LG

    Discriminator Rejection Sampling

    Authors: Samaneh Azadi, Catherine Olsson, Trevor Darrell, Ian Goodfellow, Augustus Odena

    Abstract: We propose a rejection sampling scheme using the discriminator of a GAN to approximately correct errors in the GAN generator distribution. We show that under quite strict assumptions, this will allow us to recover the data distribution exactly. We then examine where those strict assumptions break down and design a practical algorithm - called Discriminator Rejection Sampling (DRS) - that can be us…

    Submitted 26 February, 2019; v1 submitted 15 October, 2018; originally announced October 2018.

    Comments: Published as a conference paper at ICLR 2019

  32. arXiv:1810.05270  [pdf, other]

    cs.LG cs.CV stat.ML

    Rethinking the Value of Network Pruning

    Authors: Zhuang Liu, Mingjie Sun, Tinghui Zhou, Gao Huang, Trevor Darrell

    Abstract: Network pruning is widely used for reducing the heavy inference cost of deep models in low-resource settings. A typical pruning algorithm is a three-stage pipeline, i.e., training (a large model), pruning and fine-tuning. During pruning, according to a certain criterion, redundant weights are pruned and important weights are kept to best preserve the accuracy. In this work, we make several surpris…

    Submitted 5 March, 2019; v1 submitted 11 October, 2018; originally announced October 2018.

    Comments: ICLR 2019. Significant revisions from the previous version

  33. arXiv:1809.02156  [pdf, other]

    cs.CL cs.CV

    Object Hallucination in Image Captioning

    Authors: Anna Rohrbach, Lisa Anne Hendricks, Kaylee Burns, Trevor Darrell, Kate Saenko

    Abstract: Despite continuously improving performance, contemporary image captioning models are prone to "hallucinating" objects that are not actually in a scene. One problem is that standard metrics only measure similarity to ground truth captions and may not fully capture image relevance. In this work, we propose a new image relevance metric to evaluate current models with veridical visual labels and asses…

    Submitted 29 March, 2019; v1 submitted 6 September, 2018; originally announced September 2018.

    Comments: Rohrbach and Hendricks contributed equally; accepted to EMNLP 2018

  34. arXiv:1809.01337  [pdf, other]

    cs.CV cs.CL

    Localizing Moments in Video with Temporal Language

    Authors: Lisa Anne Hendricks, Oliver Wang, Eli Shechtman, Josef Sivic, Trevor Darrell, Bryan Russell

    Abstract: Localizing moments in a longer video via natural language queries is a new, challenging task at the intersection of language and video understanding. Though moment localization with natural language is similar to other language and vision tasks like natural language object retrieval in images, moment localization offers an interesting opportunity to model temporal dependencies and reasoning in tex…

    Submitted 5 September, 2018; originally announced September 2018.

    Comments: EMNLP 2018

  35. arXiv:1808.04355  [pdf, other]

    cs.LG cs.AI cs.CV cs.RO stat.ML

    Large-Scale Study of Curiosity-Driven Learning

    Authors: Yuri Burda, Harri Edwards, Deepak Pathak, Amos Storkey, Trevor Darrell, Alexei A. Efros

    Abstract: Reinforcement learning algorithms rely on carefully engineering environment rewards that are extrinsic to the agent. However, annotating each environment with hand-designed, dense rewards is not scalable, motivating the need for developing reward functions that are intrinsic to the agent. Curiosity is a type of intrinsic reward function which uses prediction error as reward signal. In this paper:…

    Submitted 13 August, 2018; originally announced August 2018.

    Comments: First three authors contributed equally and ordered alphabetically. Website at https://pathak22.github.io/large-scale-curiosity/

  36. arXiv:1807.11546  [pdf, other]

    cs.CV

    Textual Explanations for Self-Driving Vehicles

    Authors: Jinkyu Kim, Anna Rohrbach, Trevor Darrell, John Canny, Zeynep Akata

    Abstract: Deep neural perception and control networks have become key components of self-driving vehicles. User acceptance is likely to benefit from easy-to-interpret textual explanations which allow end-users to understand what triggered a particular behavior. Explanations may be triggered by the neural controller, namely introspective explanations, or informed by the neural controller's output, namely rat…

    Submitted 30 July, 2018; originally announced July 2018.

    Comments: Accepted to ECCV 2018

    Journal ref: European Conference on Computer Vision (ECCV), 2018

  37. arXiv:1807.09685  [pdf, other]

    cs.CV

    Grounding Visual Explanations

    Authors: Lisa Anne Hendricks, Ronghang Hu, Trevor Darrell, Zeynep Akata

    Abstract: Existing visual explanation generating agents learn to fluently justify a class prediction. However, they may mention visual attributes which reflect a strong class prior, although the evidence may not actually be in the image. This is particularly concerning as ultimately such agents fail in building trust with human users. To overcome this limitation, we propose a phrase-critic model to refine g…

    Submitted 2 August, 2018; v1 submitted 25 July, 2018; originally announced July 2018.

    Comments: Accepted to ECCV 2018

    Journal ref: European Conference on Computer Vision (ECCV), 2018

  38. arXiv:1807.08556  [pdf, other]

    cs.CV

    Explainable Neural Computation via Stack Neural Module Networks

    Authors: Ronghang Hu, Jacob Andreas, Trevor Darrell, Kate Saenko

    Abstract: In complex inferential tasks like question answering, machine learning models must confront two challenges: the need to implement a compositional reasoning process, and, in many applications, the need for this reasoning process to be interpretable to assist users in both development and prediction. Existing models designed to produce interpretable traces of their decision-making process typically…

    Submitted 6 March, 2019; v1 submitted 23 July, 2018; originally announced July 2018.

    Comments: ECCV 2018

  39. arXiv:1807.07560  [pdf, other]

    cs.CV cs.AI cs.LG stat.ML

    Compositional GAN: Learning Image-Conditional Binary Composition

    Authors: Samaneh Azadi, Deepak Pathak, Sayna Ebrahimi, Trevor Darrell

    Abstract: Generative Adversarial Networks (GANs) can produce images of remarkable complexity and realism but are generally structured to sample from a single latent source ignoring the explicit spatial interaction between multiple entities that could be present in a scene. Capturing such complex interactions between different objects in the world, including their relative scaling, spatial layout, occlusion,…

    Submitted 28 March, 2019; v1 submitted 19 July, 2018; originally announced July 2018.

  40. arXiv:1807.03858  [pdf, other]

    cs.LG cs.AI stat.ML

    Algorithmic Framework for Model-based Deep Reinforcement Learning with Theoretical Guarantees

    Authors: Yuping Luo, Huazhe Xu, Yuanzhi Li, Yuandong Tian, Trevor Darrell, Tengyu Ma

    Abstract: Model-based reinforcement learning (RL) is considered to be a promising approach to reduce the sample complexity that hinders model-free RL. However, the theoretical understanding of such methods has been rather limited. This paper introduces a novel algorithmic framework for designing and analyzing model-based RL algorithms with theoretical guarantees. We design a meta-algorithm with a theoretica…

    Submitted 15 February, 2021; v1 submitted 10 July, 2018; originally announced July 2018.

    Comments: Added important notes that the conditions of Theorem 3.1 cannot simultaneously hold for most models

  41. arXiv:1807.00517  [pdf, other]

    cs.CV

    Women also Snowboard: Overcoming Bias in Captioning Models (Extended Abstract)

    Authors: Lisa Anne Hendricks, Kaylee Burns, Kate Saenko, Trevor Darrell, Anna Rohrbach

    Abstract: Most machine learning methods are known to capture and exploit biases of the training data. While some biases are beneficial for learning, others are harmful. Specifically, image captioning models tend to exaggerate biases present in training data. This can lead to incorrect captions in domains where unbiased captions are desired, or required, due to over reliance on the learned prior and image co…

    Submitted 2 July, 2018; originally announced July 2018.

    Comments: Burns and Hendricks contributed equally. 2018 ICML Workshop on Fairness, Accountability, and Transparency in Machine Learning (FAT/ML 2018)

  42. arXiv:1806.09809  [pdf, other]

    cs.CV

    Generating Counterfactual Explanations with Natural Language

    Authors: Lisa Anne Hendricks, Ronghang Hu, Trevor Darrell, Zeynep Akata

    Abstract: Natural language explanations of deep neural network decisions provide an intuitive way for an AI agent to articulate a reasoning process. Current textual explanations learn to discuss class discriminative features in an image. However, it is also helpful to understand which attributes might change a classification decision if present in an image (e.g., "This is not a Scarlet Tanager because it doe…

    Submitted 26 June, 2018; originally announced June 2018.

    Comments: presented at 2018 ICML Workshop on Human Interpretability in Machine Learning (WHI 2018), Stockholm, Sweden

  43. arXiv:1806.08354  [pdf, other]

    cs.CV cs.AI cs.LG cs.RO stat.ML

    Learning Instance Segmentation by Interaction

    Authors: Deepak Pathak, Yide Shentu, Dian Chen, Pulkit Agrawal, Trevor Darrell, Sergey Levine, Jitendra Malik

    Abstract: We present an approach for building an active agent that learns to segment its visual observations into individual objects by interacting with its environment in a completely self-supervised manner. The agent uses its current segmentation model to infer pixels that constitute objects and refines the segmentation model by interacting with these pixels. The model learned from over 50K interactions g…

    Submitted 21 June, 2018; originally announced June 2018.

    Comments: Website at https://pathak22.github.io/seg-by-interaction/

  44. arXiv:1806.07373  [pdf, other]

    cs.CV cs.LG stat.ML

    Few-Shot Segmentation Propagation with Guided Networks

    Authors: Kate Rakelly, Evan Shelhamer, Trevor Darrell, Alexei A. Efros, Sergey Levine

    Abstract: Learning-based methods for visual segmentation have made progress on particular types of segmentation tasks, but are limited by the necessary supervision, the narrow definitions of fixed tasks, and the lack of control during inference for correcting errors. To remedy the rigidity and annotation burden of standard approaches, we address the problem of few-shot segmentation: given few image and few…

    Submitted 25 May, 2018; originally announced June 2018.

  45. arXiv:1806.02724  [pdf, other]

    cs.CV cs.CL

    Speaker-Follower Models for Vision-and-Language Navigation

    Authors: Daniel Fried, Ronghang Hu, Volkan Cirik, Anna Rohrbach, Jacob Andreas, Louis-Philippe Morency, Taylor Berg-Kirkpatrick, Kate Saenko, Dan Klein, Trevor Darrell

    Abstract: Navigation guided by natural language instructions presents a challenging reasoning problem for instruction followers. Natural language instructions typically identify only a few high-level decisions and landmarks rather than complete low-level motor behaviors; much of the missing information must be inferred based on perceptual context. In machine learning settings, this is doubly challenging: it…

    Submitted 26 October, 2018; v1 submitted 7 June, 2018; originally announced June 2018.

    Comments: NIPS 2018

  46. arXiv:1806.01531  [pdf, other]

    cs.CV

    Deep Mixture of Experts via Shallow Embedding

    Authors: Xin Wang, Fisher Yu, Lisa Dunlap, Yi-An Ma, Ruth Wang, Azalia Mirhoseini, Trevor Darrell, Joseph E. Gonzalez

    Abstract: Larger networks generally have greater representational power at the cost of increased computational complexity. Sparsifying such networks has been an active area of research but has been generally limited to static regularization or dynamic approaches using reinforcement learning. We explore a mixture of experts (MoE) approach to deep dynamic routing, which activates certain experts in the networ…

    Submitted 11 April, 2019; v1 submitted 5 June, 2018; originally announced June 2018.

  47. arXiv:1805.04687  [pdf, other]

    cs.CV

    BDD100K: A Diverse Driving Dataset for Heterogeneous Multitask Learning

    Authors: Fisher Yu, Haofeng Chen, Xin Wang, Wenqi Xian, Yingying Chen, Fangchen Liu, Vashisht Madhavan, Trevor Darrell

    Abstract: Datasets drive vision progress, yet existing driving datasets are impoverished in terms of visual content and supported tasks to study multitask learning for autonomous driving. Researchers are usually constrained to study a small set of problems on one dataset, while real-world computer vision applications require performing tasks of various complexities. We construct BDD100K, the largest driving…

    Submitted 8 April, 2020; v1 submitted 12 May, 2018; originally announced May 2018.

    Comments: Published at IEEE Conference on Computer Vision and Pattern Recognition (CVPR) 2020

  48. arXiv:1804.08606  [pdf, other]

    cs.LG cs.AI cs.CV cs.RO stat.ML

    Zero-Shot Visual Imitation

    Authors: Deepak Pathak, Parsa Mahmoudieh, Guanghao Luo, Pulkit Agrawal, Dian Chen, Yide Shentu, Evan Shelhamer, Jitendra Malik, Alexei A. Efros, Trevor Darrell

    Abstract: The current dominant paradigm for imitation learning relies on strong supervision of expert actions to learn both 'what' and 'how' to imitate. We pursue an alternative paradigm wherein an agent first explores the world without any expert supervision and then distills its experience into a goal-conditioned skill policy with a novel forward consistency loss. In our framework, the role of the expert…

    Submitted 23 April, 2018; originally announced April 2018.

    Comments: Oral presentation at ICLR 2018. Website at https://pathak22.github.io/zeroshot-imitation/

  49. arXiv:1803.09797  [pdf, other]

    cs.CV

    Women also Snowboard: Overcoming Bias in Captioning Models

    Authors: Kaylee Burns, Lisa Anne Hendricks, Kate Saenko, Trevor Darrell, Anna Rohrbach

    Abstract: Most machine learning methods are known to capture and exploit biases of the training data. While some biases are beneficial for learning, others are harmful. Specifically, image captioning models tend to exaggerate biases present in training data (e.g., if a word is present in 60% of training sentences, it might be predicted in 70% of sentences at test time). This can lead to incorrect captions i…

    Submitted 13 March, 2019; v1 submitted 26 March, 2018; originally announced March 2018.

    Comments: 22 pages, 6 figures, Burns and Hendricks contributed equally

  50. arXiv:1802.08129  [pdf, other]

    cs.AI cs.CL cs.CV

    Multimodal Explanations: Justifying Decisions and Pointing to the Evidence

    Authors: Dong Huk Park, Lisa Anne Hendricks, Zeynep Akata, Anna Rohrbach, Bernt Schiele, Trevor Darrell, Marcus Rohrbach

    Abstract: Deep models that are both effective and explainable are desirable in many settings; prior explainable models have been unimodal, offering either image-based visualization of attention weights or text-based generation of post-hoc justifications. We propose a multimodal approach to explanation, and argue that the two modalities provide complementary explanatory strengths. We collect two new datasets…

    Submitted 15 February, 2018; originally announced February 2018.

    Comments: arXiv admin note: text overlap with arXiv:1612.04757