
Showing 51–73 of 73 results for author: Girshick, R

  1. arXiv:1511.06335  [pdf, other]

    cs.LG cs.CV

    Unsupervised Deep Embedding for Clustering Analysis

    Authors: Junyuan Xie, Ross Girshick, Ali Farhadi

    Abstract: Clustering is central to many data-driven application domains and has been studied extensively in terms of distance functions and grouping algorithms. Relatively little work has focused on learning representations for clustering. In this paper, we propose Deep Embedded Clustering (DEC), a method that simultaneously learns feature representations and cluster assignments using deep neural networks.…

    Submitted 24 May, 2016; v1 submitted 19 November, 2015; originally announced November 2015.

    Comments: icml2016

  2. arXiv:1511.06068  [pdf, other]

    cs.LG stat.ML

    Reducing Overfitting in Deep Networks by Decorrelating Representations

    Authors: Michael Cogswell, Faruk Ahmed, Ross Girshick, Larry Zitnick, Dhruv Batra

    Abstract: One major challenge in training Deep Neural Networks is preventing overfitting. Many techniques such as data augmentation and novel regularizers such as Dropout have been proposed to prevent overfitting without requiring a massive amount of training data. In this work, we propose a new regularizer called DeCov which leads to significantly reduced overfitting (as indicated by the difference between…

    Submitted 10 June, 2016; v1 submitted 19 November, 2015; originally announced November 2015.

    Comments: 12 pages, 5 figures, 5 tables, Accepted to ICLR 2016, (v4 adds acknowledgements)

  3. arXiv:1506.02640  [pdf, other]

    cs.CV

    You Only Look Once: Unified, Real-Time Object Detection

    Authors: Joseph Redmon, Santosh Divvala, Ross Girshick, Ali Farhadi

    Abstract: We present YOLO, a new approach to object detection. Prior work on object detection repurposes classifiers to perform detection. Instead, we frame object detection as a regression problem to spatially separated bounding boxes and associated class probabilities. A single neural network predicts bounding boxes and class probabilities directly from full images in one evaluation. Since the whole detec…

    Submitted 9 May, 2016; v1 submitted 8 June, 2015; originally announced June 2015.

  4. arXiv:1506.01497  [pdf, other]

    cs.CV

    Faster R-CNN: Towards Real-Time Object Detection with Region Proposal Networks

    Authors: Shaoqing Ren, Kaiming He, Ross Girshick, Jian Sun

    Abstract: State-of-the-art object detection networks depend on region proposal algorithms to hypothesize object locations. Advances like SPPnet and Fast R-CNN have reduced the running time of these detection networks, exposing region proposal computation as a bottleneck. In this work, we introduce a Region Proposal Network (RPN) that shares full-image convolutional features with the detection network, thus…

    Submitted 6 January, 2016; v1 submitted 4 June, 2015; originally announced June 2015.

    Comments: Extended tech report

  5. arXiv:1505.04467  [pdf, other]

    cs.CV

    Exploring Nearest Neighbor Approaches for Image Captioning

    Authors: Jacob Devlin, Saurabh Gupta, Ross Girshick, Margaret Mitchell, C. Lawrence Zitnick

    Abstract: We explore a variety of nearest neighbor baseline approaches for image captioning. These approaches find a set of nearest neighbor images in the training set from which a caption may be borrowed for the query image. We select a caption for the query image by finding the caption that best represents the "consensus" of the set of candidate captions gathered from the nearest neighbor images. When mea…

    Submitted 17 May, 2015; originally announced May 2015.

  6. arXiv:1505.01197  [pdf, other]

    cs.CV

    Contextual Action Recognition with R*CNN

    Authors: Georgia Gkioxari, Ross Girshick, Jitendra Malik

    Abstract: There are multiple cues in an image which reveal what action a person is performing. For example, a jogger has a pose that is characteristic for jogging, but the scene (e.g. road, trail) and the presence of other joggers can be an additional source of information. In this work, we exploit the simple observation that actions are accompanied by contextual cues to build a strong action recognition sy…

    Submitted 24 March, 2016; v1 submitted 5 May, 2015; originally announced May 2015.

  7. arXiv:1504.08083  [pdf, other]

    cs.CV

    Fast R-CNN

    Authors: Ross Girshick

    Abstract: This paper proposes a Fast Region-based Convolutional Network method (Fast R-CNN) for object detection. Fast R-CNN builds on previous work to efficiently classify object proposals using deep convolutional networks. Compared to previous work, Fast R-CNN employs several innovations to improve training and testing speed while also increasing detection accuracy. Fast R-CNN trains the very deep VGG16 n…

    Submitted 27 September, 2015; v1 submitted 30 April, 2015; originally announced April 2015.

    Comments: To appear in ICCV 2015

  8. arXiv:1504.06066  [pdf, other]

    cs.CV

    Object Detection Networks on Convolutional Feature Maps

    Authors: Shaoqing Ren, Kaiming He, Ross Girshick, Xiangyu Zhang, Jian Sun

    Abstract: Most object detectors contain two important components: a feature extractor and an object classifier. The feature extractor has rapidly evolved with significant research efforts leading to better deep convolutional architectures. The object classifier, however, has not received much attention and many recent systems (like SPPnet and Fast/Faster R-CNN) use simple multi-layer perceptrons. This paper…

    Submitted 17 August, 2016; v1 submitted 23 April, 2015; originally announced April 2015.

    Comments: To appear in TPAMI; substantial re-writing over the original post at arXiv of April 2015. COCO competition results included

  9. arXiv:1502.04652  [pdf, other]

    cs.CV

    Inferring 3D Object Pose in RGB-D Images

    Authors: Saurabh Gupta, Pablo Arbeláez, Ross Girshick, Jitendra Malik

    Abstract: The goal of this work is to replace objects in an RGB-D scene with corresponding 3D models from a library. We approach this problem by first detecting and segmenting object instances in the scene using the approach from Gupta et al. [13]. We use a convolutional neural network (CNN) to predict the pose of the object. This CNN is trained using pixel normals in images containing rendered synthetic ob…

    Submitted 16 February, 2015; originally announced February 2015.

    Comments: 13 pages, 8 figures, 4 tables

  10. arXiv:1412.2604  [pdf, other]

    cs.CV

    Actions and Attributes from Wholes and Parts

    Authors: Georgia Gkioxari, Ross Girshick, Jitendra Malik

    Abstract: We investigate the importance of parts for the tasks of action and attribute classification. We develop a part-based approach by leveraging convolutional network features inspired by recent advances in computer vision. Our part detectors are a deep version of poselets and capture parts of the human body under a distinct set of poses. For the tasks of action and attribute classification, we train h…

    Submitted 5 May, 2015; v1 submitted 8 December, 2014; originally announced December 2014.

  11. arXiv:1411.5752  [pdf, other]

    cs.CV

    Hypercolumns for Object Segmentation and Fine-grained Localization

    Authors: Bharath Hariharan, Pablo Arbeláez, Ross Girshick, Jitendra Malik

    Abstract: Recognition algorithms based on convolutional networks (CNNs) typically use the output of the last layer as feature representation. However, the information in this layer may be too coarse to allow precise localization. On the contrary, earlier layers may be precise in localization but will not capture semantics. To get the best of both worlds, we define the hypercolumn at a pixel as the vector of…

    Submitted 25 April, 2015; v1 submitted 20 November, 2014; originally announced November 2014.

    Comments: CVPR Camera ready

  12. arXiv:1409.5403  [pdf, other]

    cs.CV

    Deformable Part Models are Convolutional Neural Networks

    Authors: Ross Girshick, Forrest Iandola, Trevor Darrell, Jitendra Malik

    Abstract: Deformable part models (DPMs) and convolutional neural networks (CNNs) are two widely used tools for visual recognition. They are typically viewed as distinct approaches: DPMs are graphical models (Markov random fields), while CNNs are "black-box" non-linear classifiers. In this paper, we show that a DPM can be formulated as a CNN, thus providing a novel synthesis of the two ideas. Our constructio…

    Submitted 1 October, 2014; v1 submitted 18 September, 2014; originally announced September 2014.

  13. arXiv:1408.5093  [pdf, other]

    cs.CV cs.LG cs.NE

    Caffe: Convolutional Architecture for Fast Feature Embedding

    Authors: Yangqing Jia, Evan Shelhamer, Jeff Donahue, Sergey Karayev, Jonathan Long, Ross Girshick, Sergio Guadarrama, Trevor Darrell

    Abstract: Caffe provides multimedia scientists and practitioners with a clean and modifiable framework for state-of-the-art deep learning algorithms and a collection of reference models. The framework is a BSD-licensed C++ library with Python and MATLAB bindings for training and deploying general-purpose convolutional neural networks and other deep models efficiently on commodity architectures. Caffe fits i…

    Submitted 20 June, 2014; originally announced August 2014.

    Comments: Tech report for the Caffe software at http://github.com/BVLC/Caffe/

  14. arXiv:1407.5736  [pdf, other]

    cs.CV cs.RO

    Learning Rich Features from RGB-D Images for Object Detection and Segmentation

    Authors: Saurabh Gupta, Ross Girshick, Pablo Arbeláez, Jitendra Malik

    Abstract: In this paper we study the problem of object detection for RGB-D images using semantically rich image and depth features. We propose a new geocentric embedding for depth images that encodes height above ground and angle with gravity for each pixel in addition to the horizontal disparity. We demonstrate that this geocentric embedding works better than using raw depth images for learning feature rep…

    Submitted 22 July, 2014; originally announced July 2014.

    Comments: To appear in the European Conference on Computer Vision (ECCV), 2014

  15. arXiv:1407.5035  [pdf, other]

    cs.CV

    LSDA: Large Scale Detection Through Adaptation

    Authors: Judy Hoffman, Sergio Guadarrama, Eric Tzeng, Ronghang Hu, Jeff Donahue, Ross Girshick, Trevor Darrell, Kate Saenko

    Abstract: A major challenge in scaling object detection is the difficulty of obtaining labeled images for large numbers of categories. Recently, deep convolutional neural networks (CNNs) have emerged as clear winners on object classification benchmarks, in part due to training with 1.2M+ labeled classification images. Unfortunately, only a small fraction of those labels are available for the detection task.…

    Submitted 31 October, 2014; v1 submitted 18 July, 2014; originally announced July 2014.

    Journal ref: Neural Information Processing Systems (NIPS) 2014

  16. arXiv:1407.3867  [pdf, other]

    cs.CV

    Part-based R-CNNs for Fine-grained Category Detection

    Authors: Ning Zhang, Jeff Donahue, Ross Girshick, Trevor Darrell

    Abstract: Semantic part localization can facilitate fine-grained categorization by explicitly isolating subtle appearance differences associated with specific object parts. Methods for pose-normalized representations have been proposed, but generally presume bounding box annotations at test time due to the difficulty of object detection. We propose a model for fine-grained categorization that overcomes thes…

    Submitted 14 July, 2014; originally announced July 2014.

    Comments: 16 pages. To appear at European Conference on Computer Vision (ECCV), 2014

  17. arXiv:1407.1808  [pdf, other]

    cs.CV

    Simultaneous Detection and Segmentation

    Authors: Bharath Hariharan, Pablo Arbeláez, Ross Girshick, Jitendra Malik

    Abstract: We aim to detect all instances of a category in an image and, for each instance, mark the pixels that belong to it. We call this task Simultaneous Detection and Segmentation (SDS). Unlike classical bounding box detection, SDS requires a segmentation and not just a box. Unlike classical semantic segmentation, we require individual object instances. We build on recent work that uses convolutional ne…

    Submitted 7 July, 2014; originally announced July 2014.

    Comments: To appear in the European Conference on Computer Vision (ECCV), 2014

  18. arXiv:1407.1610  [pdf, other]

    cs.CV cs.NE

    Analyzing the Performance of Multilayer Neural Networks for Object Recognition

    Authors: Pulkit Agrawal, Ross Girshick, Jitendra Malik

    Abstract: In the last two years, convolutional neural networks (CNNs) have achieved an impressive suite of results on standard recognition datasets and tasks. CNN-based features seem poised to quickly replace engineered representations, such as SIFT and HOG. However, compared to SIFT and HOG, we understand much less about the nature of the features learned by large CNNs. In this paper, we experimentally pro…

    Submitted 22 September, 2014; v1 submitted 7 July, 2014; originally announced July 2014.

    Comments: Published in European Conference on Computer Vision 2014 (ECCV-2014)

  19. arXiv:1406.5212  [pdf, other]

    cs.CV

    R-CNNs for Pose Estimation and Action Detection

    Authors: Georgia Gkioxari, Bharath Hariharan, Ross Girshick, Jitendra Malik

    Abstract: We present convolutional neural networks for the tasks of keypoint (pose) prediction and action classification of people in unconstrained images. Our approach involves training an R-CNN detector with loss functions depending on the task being tackled. We evaluate our method on the challenging PASCAL VOC dataset and compare it to previous leading approaches. Our method gives state-of-the-art result…

    Submitted 19 June, 2014; originally announced June 2014.

  20. arXiv:1405.0312  [pdf, other]

    cs.CV

    Microsoft COCO: Common Objects in Context

    Authors: Tsung-Yi Lin, Michael Maire, Serge Belongie, Lubomir Bourdev, Ross Girshick, James Hays, Pietro Perona, Deva Ramanan, C. Lawrence Zitnick, Piotr Dollár

    Abstract: We present a new dataset with the goal of advancing the state-of-the-art in object recognition by placing the question of object recognition in the context of the broader question of scene understanding. This is achieved by gathering images of complex everyday scenes containing common objects in their natural context. Objects are labeled using per-instance segmentations to aid in precise object lo…

    Submitted 20 February, 2015; v1 submitted 1 May, 2014; originally announced May 2014.

    Comments: 1) updated annotation pipeline description and figures; 2) added new section describing datasets splits; 3) updated author list

  21. arXiv:1404.1869  [pdf, other]

    cs.CV

    DenseNet: Implementing Efficient ConvNet Descriptor Pyramids

    Authors: Forrest Iandola, Matt Moskewicz, Sergey Karayev, Ross Girshick, Trevor Darrell, Kurt Keutzer

    Abstract: Convolutional Neural Networks (CNNs) can provide accurate object classification. They can be extended to perform object detection by iterating over dense or selected proposed object regions. However, the runtime of such detectors scales as the total number and/or area of regions to examine per image, and training such detectors may be prohibitively slow. However, for some CNN classifier topologies…

    Submitted 7 April, 2014; originally announced April 2014.

  22. arXiv:1403.1024  [pdf, other]

    cs.CV cs.LG

    On learning to localize objects with minimal supervision

    Authors: Hyun Oh Song, Ross Girshick, Stefanie Jegelka, Julien Mairal, Zaid Harchaoui, Trevor Darrell

    Abstract: Learning to localize objects with minimal supervision is an important problem in computer vision, since large fully annotated datasets are extremely costly to obtain. In this paper, we propose a new method that achieves this goal with only image-level labels of whether the objects are present or not. Our approach combines a discriminative submodular cover problem for automatically discovering a se…

    Submitted 15 May, 2014; v1 submitted 5 March, 2014; originally announced March 2014.

  23. arXiv:1311.2524  [pdf, other]

    cs.CV

    Rich feature hierarchies for accurate object detection and semantic segmentation

    Authors: Ross Girshick, Jeff Donahue, Trevor Darrell, Jitendra Malik

    Abstract: Object detection performance, as measured on the canonical PASCAL VOC dataset, has plateaued in the last few years. The best-performing methods are complex ensemble systems that typically combine multiple low-level image features with high-level context. In this paper, we propose a simple and scalable detection algorithm that improves mean average precision (mAP) by more than 30% relative to the p…

    Submitted 22 October, 2014; v1 submitted 11 November, 2013; originally announced November 2013.

    Comments: Extended version of our CVPR 2014 paper; latest update (v5) includes results using deeper networks (see Appendix G. Changelog)