Showing 251–284 of 284 results for author: Darrell, T

  1. arXiv:1510.02949  [pdf, other]

    cs.CV

    Spatial Semantic Regularisation for Large Scale Object Detection

    Authors: Damian Mrowca, Marcus Rohrbach, Judy Hoffman, Ronghang Hu, Kate Saenko, Trevor Darrell

    Abstract: Large scale object detection with thousands of classes introduces the problem of many contradicting false positive detections, which have to be suppressed. Class-independent non-maximum suppression has traditionally been used for this step, but it does not scale well as the number of classes grows. Traditional non-maximum suppression does not consider label- and instance-level relationships nor do…

    Submitted 10 October, 2015; originally announced October 2015.

    Comments: accepted at ICCV 2015

  2. arXiv:1510.02192  [pdf, other]

    cs.CV

    Simultaneous Deep Transfer Across Domains and Tasks

    Authors: Eric Tzeng, Judy Hoffman, Trevor Darrell, Kate Saenko

    Abstract: Recent reports suggest that a generic supervised deep CNN model trained on a large-scale dataset reduces, but does not remove, dataset bias. Fine-tuning deep models in a new domain can require a significant amount of labeled data, which for many applications is simply not available. We propose a new CNN architecture to exploit unlabeled and sparsely labeled target domain data. Our approach simulta…

    Submitted 7 October, 2015; originally announced October 2015.

  3. arXiv:1509.06113  [pdf, other]

    cs.LG cs.CV cs.RO

    Deep Spatial Autoencoders for Visuomotor Learning

    Authors: Chelsea Finn, Xin Yu Tan, Yan Duan, Trevor Darrell, Sergey Levine, Pieter Abbeel

    Abstract: Reinforcement learning provides a powerful and flexible framework for automated acquisition of robotic motion skills. However, applying reinforcement learning requires a sufficiently detailed representation of the state, including the configuration of task-relevant objects. We present an approach that automates state-space construction by learning a state representation directly from camera images…

    Submitted 1 March, 2016; v1 submitted 21 September, 2015; originally announced September 2015.

    Comments: Published in the International Conference on Robotics and Automation (ICRA)

  4. arXiv:1506.03648  [pdf, other]

    cs.CV cs.LG

    Constrained Convolutional Neural Networks for Weakly Supervised Segmentation

    Authors: Deepak Pathak, Philipp Krähenbühl, Trevor Darrell

    Abstract: We present an approach to learn a dense pixel-wise labeling from image-level tags. Each image-level tag imposes constraints on the output labeling of a Convolutional Neural Network (CNN) classifier. We propose Constrained CNN (CCNN), a method which uses a novel loss function to optimize for any set of linear constraints on the output space (i.e. predicted label distribution) of a CNN. Our loss for…

    Submitted 18 October, 2015; v1 submitted 11 June, 2015; originally announced June 2015.

    Comments: 12 pages, ICCV 2015

  5. arXiv:1505.00487  [pdf, other]

    cs.CV

    Sequence to Sequence -- Video to Text

    Authors: Subhashini Venugopalan, Marcus Rohrbach, Jeff Donahue, Raymond Mooney, Trevor Darrell, Kate Saenko

    Abstract: Real-world videos often have complex dynamics; and methods for generating open-domain video descriptions should be sensitive to temporal structure and allow both input (sequence of frames) and output (sequence of words) of variable length. To approach this problem, we propose a novel end-to-end sequence-to-sequence model to generate captions for videos. For this we exploit recurrent neural network…

    Submitted 19 October, 2015; v1 submitted 3 May, 2015; originally announced May 2015.

    Comments: ICCV 2015 camera-ready. Includes code, project page and LSMDC challenge results

  6. arXiv:1504.00702  [pdf, other]

    cs.LG cs.CV cs.RO

    End-to-End Training of Deep Visuomotor Policies

    Authors: Sergey Levine, Chelsea Finn, Trevor Darrell, Pieter Abbeel

    Abstract: Policy search methods can allow robots to learn control policies for a wide range of tasks, but practical applications of policy search often require hand-engineered components for perception, state estimation, and low-level control. In this paper, we aim to answer the following question: does training the perception and control systems jointly end-to-end provide better performance than training e…

    Submitted 18 April, 2016; v1 submitted 2 April, 2015; originally announced April 2015.

    Comments: updating with revisions for JMLR final version

  7. arXiv:1412.7155  [pdf, other]

    cs.CV cs.LG cs.NE

    Learning Compact Convolutional Neural Networks with Nested Dropout

    Authors: Chelsea Finn, Lisa Anne Hendricks, Trevor Darrell

    Abstract: Recently, nested dropout was proposed as a method for ordering representation units in autoencoders by their information content, without diminishing reconstruction cost. However, it has only been applied to training fully-connected autoencoders in an unsupervised setting. We explore the impact of nested dropout on the convolutional layers in a CNN trained by backpropagation, investigating whether…

    Submitted 10 April, 2015; v1 submitted 22 December, 2014; originally announced December 2014.

    Comments: 4 pages, 2 figures. Accepted as a workshop contribution at ICLR 2015

  8. arXiv:1412.7144  [pdf, other]

    cs.CV cs.LG cs.NE

    Fully Convolutional Multi-Class Multiple Instance Learning

    Authors: Deepak Pathak, Evan Shelhamer, Jonathan Long, Trevor Darrell

    Abstract: Multiple instance learning (MIL) can reduce the need for costly annotation in tasks such as semantic segmentation by weakening the required degree of supervision. We propose a novel MIL formulation of multi-class semantic segmentation learning by a fully convolutional network. In this setting, we seek to learn a semantic segmentation model from just weak image-level labels. The model is trained en…

    Submitted 15 April, 2015; v1 submitted 22 December, 2014; originally announced December 2014.

    Comments: in ICLR 2015

  9. arXiv:1412.3474  [pdf, other]

    cs.CV

    Deep Domain Confusion: Maximizing for Domain Invariance

    Authors: Eric Tzeng, Judy Hoffman, Ning Zhang, Kate Saenko, Trevor Darrell

    Abstract: Recent reports suggest that a generic supervised deep CNN model trained on a large-scale dataset reduces, but does not remove, dataset bias on a standard benchmark. Fine-tuning deep models in a new domain can require a significant amount of data, which for many applications is simply not available. We propose a new CNN architecture which introduces an adaptation layer and an additional domain conf…

    Submitted 10 December, 2014; originally announced December 2014.

  10. arXiv:1412.1135  [pdf, other]

    cs.CV

    Detector Discovery in the Wild: Joint Multiple Instance and Representation Learning

    Authors: Judy Hoffman, Deepak Pathak, Trevor Darrell, Kate Saenko

    Abstract: We develop methods for detector learning which exploit joint training over both weak and strong labels and which transfer learned perceptual representations from strongly-labeled auxiliary tasks. Previous methods for weak-label learning often learn detector models independently using latent variable optimization, but fail to share deep representation knowledge across classes and usually require st…

    Submitted 2 December, 2014; originally announced December 2014.

    Journal ref: Computer Vision and Pattern Recognition (CVPR) 2015

  11. arXiv:1411.4389  [pdf, other]

    cs.CV

    Long-term Recurrent Convolutional Networks for Visual Recognition and Description

    Authors: Jeff Donahue, Lisa Anne Hendricks, Marcus Rohrbach, Subhashini Venugopalan, Sergio Guadarrama, Kate Saenko, Trevor Darrell

    Abstract: Models based on deep convolutional networks have dominated recent image interpretation tasks; we investigate whether models which are also recurrent, or "temporally deep", are effective for tasks involving sequences, visual and otherwise. We develop a novel recurrent convolutional architecture suitable for large-scale visual learning which is end-to-end trainable, and demonstrate the value of thes…

    Submitted 31 May, 2016; v1 submitted 17 November, 2014; originally announced November 2014.

    Comments: Originally presented at CVPR 2015 (oral). Updated version (accepted as a TPAMI journal article) includes additional results

  12. arXiv:1411.4038  [pdf, other]

    cs.CV

    Fully Convolutional Networks for Semantic Segmentation

    Authors: Jonathan Long, Evan Shelhamer, Trevor Darrell

    Abstract: Convolutional networks are powerful visual models that yield hierarchies of features. We show that convolutional networks by themselves, trained end-to-end, pixels-to-pixels, exceed the state-of-the-art in semantic segmentation. Our key insight is to build "fully convolutional" networks that take input of arbitrary size and produce correspondingly-sized output with efficient inference and learning…

    Submitted 8 March, 2015; v1 submitted 14 November, 2014; originally announced November 2014.

    Comments: to appear in CVPR (2015)

  13. arXiv:1411.1091  [pdf, other]

    cs.CV cs.LG cs.NE

    Do Convnets Learn Correspondence?

    Authors: Jonathan Long, Ning Zhang, Trevor Darrell

    Abstract: Convolutional neural nets (convnets) trained from massive labeled datasets have substantially improved the state-of-the-art in image classification and object detection. However, visual understanding requires establishing correspondence on a finer level than object category. Given their large pooling regions and training from whole-image labels, it is not clear that convnets derive their success f…

    Submitted 4 November, 2014; originally announced November 2014.

  14. arXiv:1410.8586  [pdf, other]

    cs.CV cs.LG cs.MM cs.NE

    DeepSentiBank: Visual Sentiment Concept Classification with Deep Convolutional Neural Networks

    Authors: Tao Chen, Damian Borth, Trevor Darrell, Shih-Fu Chang

    Abstract: This paper introduces a visual sentiment concept classification method based on deep convolutional neural networks (CNNs). The visual sentiment concepts are adjective noun pairs (ANPs) automatically discovered from the tags of web photos, and can be utilized as effective statistical cues for detecting emotions depicted in the images. Nearly one million Flickr images tagged with these ANPs are down…

    Submitted 30 October, 2014; originally announced October 2014.

    Comments: 7 pages, 4 figures

    ACM Class: H.3.3

  15. arXiv:1409.5403  [pdf, other]

    cs.CV

    Deformable Part Models are Convolutional Neural Networks

    Authors: Ross Girshick, Forrest Iandola, Trevor Darrell, Jitendra Malik

    Abstract: Deformable part models (DPMs) and convolutional neural networks (CNNs) are two widely used tools for visual recognition. They are typically viewed as distinct approaches: DPMs are graphical models (Markov random fields), while CNNs are "black-box" non-linear classifiers. In this paper, we show that a DPM can be formulated as a CNN, thus providing a novel synthesis of the two ideas. Our constructio…

    Submitted 1 October, 2014; v1 submitted 18 September, 2014; originally announced September 2014.

  16. arXiv:1408.5093  [pdf, other]

    cs.CV cs.LG cs.NE

    Caffe: Convolutional Architecture for Fast Feature Embedding

    Authors: Yangqing Jia, Evan Shelhamer, Jeff Donahue, Sergey Karayev, Jonathan Long, Ross Girshick, Sergio Guadarrama, Trevor Darrell

    Abstract: Caffe provides multimedia scientists and practitioners with a clean and modifiable framework for state-of-the-art deep learning algorithms and a collection of reference models. The framework is a BSD-licensed C++ library with Python and MATLAB bindings for training and deploying general-purpose convolutional neural networks and other deep models efficiently on commodity architectures. Caffe fits i…

    Submitted 20 June, 2014; originally announced August 2014.

    Comments: Tech report for the Caffe software at http://github.com/BVLC/Caffe/

  17. arXiv:1407.5035  [pdf, other]

    cs.CV

    LSDA: Large Scale Detection Through Adaptation

    Authors: Judy Hoffman, Sergio Guadarrama, Eric Tzeng, Ronghang Hu, Jeff Donahue, Ross Girshick, Trevor Darrell, Kate Saenko

    Abstract: A major challenge in scaling object detection is the difficulty of obtaining labeled images for large numbers of categories. Recently, deep convolutional neural networks (CNNs) have emerged as clear winners on object classification benchmarks, in part due to training with 1.2M+ labeled classification images. Unfortunately, only a small fraction of those labels are available for the detection task.…

    Submitted 31 October, 2014; v1 submitted 18 July, 2014; originally announced July 2014.

    Journal ref: Neural Information Processing Systems (NIPS) 2014

  18. arXiv:1407.3867  [pdf, other]

    cs.CV

    Part-based R-CNNs for Fine-grained Category Detection

    Authors: Ning Zhang, Jeff Donahue, Ross Girshick, Trevor Darrell

    Abstract: Semantic part localization can facilitate fine-grained categorization by explicitly isolating subtle appearance differences associated with specific object parts. Methods for pose-normalized representations have been proposed, but generally presume bounding box annotations at test time due to the difficulty of object detection. We propose a model for fine-grained categorization that overcomes thes…

    Submitted 14 July, 2014; originally announced July 2014.

    Comments: 16 pages. To appear at European Conference on Computer Vision (ECCV), 2014

  19. arXiv:1406.6507  [pdf, other]

    cs.CV cs.LG

    Weakly-supervised Discovery of Visual Pattern Configurations

    Authors: Hyun Oh Song, Yong Jae Lee, Stefanie Jegelka, Trevor Darrell

    Abstract: The increasing prominence of weakly labeled data nurtures a growing demand for object detection methods that can cope with minimal supervision. We propose an approach that automatically identifies discriminative configurations of visual patterns that are characteristic of a given object class. We formulate the problem as a constrained submodular optimization problem and demonstrate the benefits of…

    Submitted 25 June, 2014; originally announced June 2014.

  20. arXiv:1405.7102  [pdf, other]

    cs.MM cs.CV

    Detection Bank: An Object Detection Based Video Representation for Multimedia Event Recognition

    Authors: Tim Althoff, Hyun Oh Song, Trevor Darrell

    Abstract: While low-level image features have proven to be effective representations for visual recognition tasks such as object recognition and scene classification, they are inadequate to capture complex semantic meaning required to solve high-level visual tasks such as multimedia event detection and recognition. Recognition or retrieval of events and activities can be improved if specific discriminative…

    Submitted 14 June, 2014; v1 submitted 27 May, 2014; originally announced May 2014.

    Comments: ACM Multimedia 2012

  21. arXiv:1404.1869  [pdf, other]

    cs.CV

    DenseNet: Implementing Efficient ConvNet Descriptor Pyramids

    Authors: Forrest Iandola, Matt Moskewicz, Sergey Karayev, Ross Girshick, Trevor Darrell, Kurt Keutzer

    Abstract: Convolutional Neural Networks (CNNs) can provide accurate object classification. They can be extended to perform object detection by iterating over dense or selected proposed object regions. However, the runtime of such detectors scales as the total number and/or area of regions to examine per image, and training such detectors may be prohibitively slow. However, for some CNN classifier topologies…

    Submitted 7 April, 2014; originally announced April 2014.

  22. arXiv:1403.1024  [pdf, other]

    cs.CV cs.LG

    On learning to localize objects with minimal supervision

    Authors: Hyun Oh Song, Ross Girshick, Stefanie Jegelka, Julien Mairal, Zaid Harchaoui, Trevor Darrell

    Abstract: Learning to localize objects with minimal supervision is an important problem in computer vision, since large fully annotated datasets are extremely costly to obtain. In this paper, we propose a new method that achieves this goal with only image-level labels of whether the objects are present or not. Our approach combines a discriminative submodular cover problem for automatically discovering a se…

    Submitted 15 May, 2014; v1 submitted 5 March, 2014; originally announced March 2014.

  23. arXiv:1312.6204  [pdf, other]

    cs.CV cs.LG cs.NE

    One-Shot Adaptation of Supervised Deep Convolutional Models

    Authors: Judy Hoffman, Eric Tzeng, Jeff Donahue, Yangqing Jia, Kate Saenko, Trevor Darrell

    Abstract: Dataset bias remains a significant barrier towards solving real world computer vision tasks. Though deep convolutional networks have proven to be a competitive approach for image classification, a question remains: have these models solved the dataset bias problem? In general, training or fine-tuning a state-of-the-art deep model on a new domain requires a significant amount of data, which fo…

    Submitted 17 February, 2014; v1 submitted 20 December, 2013; originally announced December 2013.

    Journal ref: ICLR Workshop 2014

  24. Modeling Radiometric Uncertainty for Vision with Tone-mapped Color Images

    Authors: Ayan Chakrabarti, Ying Xiong, Baochen Sun, Trevor Darrell, Daniel Scharstein, Todd Zickler, Kate Saenko

    Abstract: To produce images that are suitable for display, tone-mapping is widely used in digital cameras to map linear color measurements into narrow gamuts with limited dynamic range. This introduces non-linear distortion that must be undone, through a radiometric calibration process, before computer vision systems can analyze such photographs radiometrically. This paper considers the inherent uncertainty…

    Submitted 9 April, 2014; v1 submitted 27 November, 2013; originally announced November 2013.

    Journal ref: IEEE Trans. PAMI 36 (2014) 2185-2198

  25. arXiv:1311.5591  [pdf, other]

    cs.CV

    PANDA: Pose Aligned Networks for Deep Attribute Modeling

    Authors: Ning Zhang, Manohar Paluri, Marc'Aurelio Ranzato, Trevor Darrell, Lubomir Bourdev

    Abstract: We propose a method for inferring human attributes (such as gender, hair style, clothes style, expression, action) from images of people under large variation of viewpoint, pose, appearance, articulation and occlusion. Convolutional Neural Nets (CNN) have been shown to perform very well on large scale object recognition problems. In the context of attribute classification, however, the signal is o…

    Submitted 5 May, 2014; v1 submitted 21 November, 2013; originally announced November 2013.

    Comments: 8 pages

  26. Recognizing Image Style

    Authors: Sergey Karayev, Matthew Trentacoste, Helen Han, Aseem Agarwala, Trevor Darrell, Aaron Hertzmann, Holger Winnemoeller

    Abstract: The style of an image plays a significant role in how it is viewed, but style has received little attention in computer vision research. We describe an approach to predicting style of images, and perform a thorough evaluation of different image features for these tasks. We find that features learned in a multi-layer network generally perform best -- even when trained with object class (not style)…

    Submitted 23 July, 2014; v1 submitted 14 November, 2013; originally announced November 2013.

    Journal ref: Proc. British Machine Vision Conference (BMVC) 2014

  27. arXiv:1311.2524  [pdf, other]

    cs.CV

    Rich feature hierarchies for accurate object detection and semantic segmentation

    Authors: Ross Girshick, Jeff Donahue, Trevor Darrell, Jitendra Malik

    Abstract: Object detection performance, as measured on the canonical PASCAL VOC dataset, has plateaued in the last few years. The best-performing methods are complex ensemble systems that typically combine multiple low-level image features with high-level context. In this paper, we propose a simple and scalable detection algorithm that improves mean average precision (mAP) by more than 30% relative to the p…

    Submitted 22 October, 2014; v1 submitted 11 November, 2013; originally announced November 2013.

    Comments: Extended version of our CVPR 2014 paper; latest update (v5) includes results using deeper networks (see Appendix G. Changelog)

  28. arXiv:1310.1531  [pdf, other]

    cs.CV

    DeCAF: A Deep Convolutional Activation Feature for Generic Visual Recognition

    Authors: Jeff Donahue, Yangqing Jia, Oriol Vinyals, Judy Hoffman, Ning Zhang, Eric Tzeng, Trevor Darrell

    Abstract: We evaluate whether features extracted from the activation of a deep convolutional network trained in a fully supervised fashion on a large, fixed set of object recognition tasks can be re-purposed to novel generic tasks. Our generic tasks may differ significantly from the originally trained tasks and there may be insufficient labeled or unlabeled data to conventionally train or adapt a deep archi…

    Submitted 5 October, 2013; originally announced October 2013.

  29. arXiv:1308.4200  [pdf, other]

    cs.CV cs.LG stat.ML

    Towards Adapting ImageNet to Reality: Scalable Domain Adaptation with Implicit Low-rank Transformations

    Authors: Erik Rodner, Judy Hoffman, Jeff Donahue, Trevor Darrell, Kate Saenko

    Abstract: Images seen during test time are often not from the same distribution as images used for learning. This problem, known as domain shift, occurs when training classifiers from object-centric internet image databases and trying to apply them directly to scene understanding tasks. The consequence is often severe performance degradation and is one of the major barriers for the application of classifier…

    Submitted 19 August, 2013; originally announced August 2013.

  30. arXiv:1302.5056  [pdf, other]

    cs.CV cs.LG

    Pooling-Invariant Image Feature Learning

    Authors: Yangqing Jia, Oriol Vinyals, Trevor Darrell

    Abstract: Unsupervised dictionary learning has been a key component in state-of-the-art computer vision recognition architectures. While highly effective methods exist for patch-based dictionary learning, these methods may learn redundant features after the pooling stage in a given early vision architecture. In this paper, we offer a novel dictionary learning scheme to efficiently take into account the inva…

    Submitted 15 January, 2013; originally announced February 2013.

  31. arXiv:1301.5348  [pdf, other]

    cs.LG cs.CV

    Why Size Matters: Feature Coding as Nystrom Sampling

    Authors: Oriol Vinyals, Yangqing Jia, Trevor Darrell

    Abstract: Recently, the computer vision and machine learning community has been in favor of feature extraction pipelines that rely on a coding step followed by a linear classifier, due to their overall simplicity, well understood properties of linear classifiers, and their computational efficiency. In this paper we propose a novel view of this pipeline based on kernel methods and Nystrom sampling. In partic…

    Submitted 15 April, 2013; v1 submitted 15 January, 2013; originally announced January 2013.

  32. arXiv:1301.3224  [pdf, other]

    cs.LG

    Efficient Learning of Domain-invariant Image Representations

    Authors: Judy Hoffman, Erik Rodner, Jeff Donahue, Trevor Darrell, Kate Saenko

    Abstract: We present an algorithm that learns representations which explicitly compensate for domain mismatch and which can be efficiently realized as linear classifiers. Specifically, we form a linear transformation that maps features from the target (test) domain to the source (training) domain as part of training the classifier. We optimize both the transformation and classifier parameters jointly, and i…

    Submitted 8 April, 2013; v1 submitted 14 January, 2013; originally announced January 2013.

    Journal ref: ICLR 2013

  33. arXiv:1210.4920  [pdf]

    cs.LG cs.IR stat.ML

    Factorized Multi-Modal Topic Model

    Authors: Seppo Virtanen, Yangqing Jia, Arto Klami, Trevor Darrell

    Abstract: Multi-modal data collections, such as corpora of paired images and text snippets, require analysis methods beyond single-view component and topic models. For continuous observations the current dominant approach is based on extensions of canonical correlation analysis, factorizing the variation into components shared by the different modalities and those private to each of them. For count data, mu…

    Submitted 16 October, 2012; originally announced October 2012.

    Comments: Appears in Proceedings of the Twenty-Eighth Conference on Uncertainty in Artificial Intelligence (UAI2012)

    Report number: UAI-P-2012-PG-843-851

  34. arXiv:1206.3242  [pdf]

    cs.LG stat.ML

    Multi-View Learning in the Presence of View Disagreement

    Authors: C. Christoudias, Raquel Urtasun, Trevor Darrell

    Abstract: Traditional multi-view learning approaches suffer in the presence of view disagreement, i.e., when samples in each view do not belong to the same class due to view corruption, occlusion or other noise processes. In this paper we present a multi-view learning approach that uses a conditional entropy criterion to detect view disagreement. Once detected, samples with view disagreement are filtered and…

    Submitted 13 June, 2012; originally announced June 2012.

    Comments: Appears in Proceedings of the Twenty-Fourth Conference on Uncertainty in Artificial Intelligence (UAI2008)

    Report number: UAI-P-2008-PG-88-96