Showing 1–39 of 39 results for author: Dollár, P

  1. arXiv:2304.02643  [pdf, other]

    cs.CV cs.AI cs.LG

    Segment Anything

    Authors: Alexander Kirillov, Eric Mintun, Nikhila Ravi, Hanzi Mao, Chloe Rolland, Laura Gustafson, Tete Xiao, Spencer Whitehead, Alexander C. Berg, Wan-Yen Lo, Piotr Dollár, Ross Girshick

    Abstract: We introduce the Segment Anything (SA) project: a new task, model, and dataset for image segmentation. Using our efficient model in a data collection loop, we built the largest segmentation dataset to date (by far), with over 1 billion masks on 11M licensed and privacy respecting images. The model is designed and trained to be promptable, so it can transfer zero-shot to new image distributions and…

    Submitted 5 April, 2023; originally announced April 2023.

    Comments: Project web-page: https://segment-anything.com

  2. arXiv:2303.13496  [pdf, other]

    cs.CV cs.AI cs.LG

    The effectiveness of MAE pre-pretraining for billion-scale pretraining

    Authors: Mannat Singh, Quentin Duval, Kalyan Vasudev Alwala, Haoqi Fan, Vaibhav Aggarwal, Aaron Adcock, Armand Joulin, Piotr Dollár, Christoph Feichtenhofer, Ross Girshick, Rohit Girdhar, Ishan Misra

    Abstract: This paper revisits the standard pretrain-then-finetune paradigm used in computer vision for visual recognition tasks. Typically, state-of-the-art foundation models are pretrained using large scale (weakly) supervised datasets with billions of images. We introduce an additional pre-pretraining stage that is simple and uses the self-supervised MAE technique to initialize the model. While MAE has on…

    Submitted 24 January, 2024; v1 submitted 23 March, 2023; originally announced March 2023.

    Comments: ICCV 2023. Models available at https://github.com/facebookresearch/maws/

  3. arXiv:2201.08371  [pdf, other]

    cs.CV

    Revisiting Weakly Supervised Pre-Training of Visual Perception Models

    Authors: Mannat Singh, Laura Gustafson, Aaron Adcock, Vinicius de Freitas Reis, Bugra Gedik, Raj Prateek Kosaraju, Dhruv Mahajan, Ross Girshick, Piotr Dollár, Laurens van der Maaten

    Abstract: Model pre-training is a cornerstone of modern visual recognition systems. Although fully supervised pre-training on datasets like ImageNet is still the de-facto standard, recent studies suggest that large-scale weakly supervised pre-training can outperform fully supervised approaches. This paper revisits weakly-supervised pre-training of models using hashtag supervision with modern versions of res…

    Submitted 2 April, 2022; v1 submitted 20 January, 2022; originally announced January 2022.

    Comments: CVPR 2022

  4. arXiv:2111.11429  [pdf, other]

    cs.CV

    Benchmarking Detection Transfer Learning with Vision Transformers

    Authors: Yanghao Li, Saining Xie, Xinlei Chen, Piotr Dollar, Kaiming He, Ross Girshick

    Abstract: Object detection is a central downstream task used to test if pre-trained network parameters confer benefits, such as improved accuracy or training speed. The complexity of object detection methods can make this benchmarking non-trivial when new architectures, such as Vision Transformer (ViT) models, arrive. These difficulties (e.g., architectural incompatibility, slow training, high memory consum…

    Submitted 22 November, 2021; originally announced November 2021.

  5. arXiv:2111.06377  [pdf, other]

    cs.CV

    Masked Autoencoders Are Scalable Vision Learners

    Authors: Kaiming He, Xinlei Chen, Saining Xie, Yanghao Li, Piotr Dollár, Ross Girshick

    Abstract: This paper shows that masked autoencoders (MAE) are scalable self-supervised learners for computer vision. Our MAE approach is simple: we mask random patches of the input image and reconstruct the missing pixels. It is based on two core designs. First, we develop an asymmetric encoder-decoder architecture, with an encoder that operates only on the visible subset of patches (without mask tokens), a…

    Submitted 19 December, 2021; v1 submitted 11 November, 2021; originally announced November 2021.

    Comments: Tech report. arXiv v2: add more transfer learning results; v3: add robustness evaluation

  6. arXiv:2106.14881  [pdf, other]

    cs.CV

    Early Convolutions Help Transformers See Better

    Authors: Tete Xiao, Mannat Singh, Eric Mintun, Trevor Darrell, Piotr Dollár, Ross Girshick

    Abstract: Vision transformer (ViT) models exhibit substandard optimizability. In particular, they are sensitive to the choice of optimizer (AdamW vs. SGD), optimizer hyperparameters, and training schedule length. In comparison, modern convolutional neural networks are easier to optimize. Why is this the case? In this work, we conjecture that the issue lies with the patchify stem of ViT models, which is impl…

    Submitted 25 October, 2021; v1 submitted 28 June, 2021; originally announced June 2021.

    Comments: NeurIPS 2021

  7. arXiv:2103.16562  [pdf, other]

    cs.CV

    Boundary IoU: Improving Object-Centric Image Segmentation Evaluation

    Authors: Bowen Cheng, Ross Girshick, Piotr Dollár, Alexander C. Berg, Alexander Kirillov

    Abstract: We present Boundary IoU (Intersection-over-Union), a new segmentation evaluation measure focused on boundary quality. We perform an extensive analysis across different error types and object sizes and show that Boundary IoU is significantly more sensitive than the standard Mask IoU measure to boundary errors for large objects and does not over-penalize errors on smaller objects. The new quality me…

    Submitted 30 March, 2021; originally announced March 2021.

    Comments: CVPR 2021, project page: https://bowenc0221.github.io/boundary-iou

  8. arXiv:2103.06877  [pdf, other]

    cs.CV cs.LG

    Fast and Accurate Model Scaling

    Authors: Piotr Dollár, Mannat Singh, Ross Girshick

    Abstract: In this work we analyze strategies for convolutional neural network scaling; that is, the process of scaling a base convolutional network to endow it with greater computational complexity and consequently representational power. Example scaling strategies may include increasing model width, depth, resolution, etc. While various scaling strategies exist, their tradeoffs are not fully understood. Ex…

    Submitted 11 March, 2021; originally announced March 2021.

    Comments: CVPR 2021

  9. arXiv:2102.01066  [pdf, other]

    cs.CV

    Evaluating Large-Vocabulary Object Detectors: The Devil is in the Details

    Authors: Achal Dave, Piotr Dollár, Deva Ramanan, Alexander Kirillov, Ross Girshick

    Abstract: By design, average precision (AP) for object detection aims to treat all classes independently: AP is computed independently per category and averaged. On one hand, this is desirable as it treats all classes equally. On the other hand, it ignores cross-category confidence calibration, a key property in real-world use cases. Unfortunately, under important conditions (i.e., large vocabulary, high in…

    Submitted 15 March, 2022; v1 submitted 1 February, 2021; originally announced February 2021.

  10. arXiv:2003.13678  [pdf, other]

    cs.CV cs.LG

    Designing Network Design Spaces

    Authors: Ilija Radosavovic, Raj Prateek Kosaraju, Ross Girshick, Kaiming He, Piotr Dollár

    Abstract: In this work, we present a new network design paradigm. Our goal is to help advance the understanding of network design and discover design principles that generalize across settings. Instead of focusing on designing individual network instances, we design network design spaces that parametrize populations of networks. The overall process is analogous to classic manual design of networks, but elev…

    Submitted 30 March, 2020; originally announced March 2020.

    Comments: CVPR 2020

  11. arXiv:2003.12056  [pdf, other]

    cs.CV cs.LG

    Are Labels Necessary for Neural Architecture Search?

    Authors: Chenxi Liu, Piotr Dollár, Kaiming He, Ross Girshick, Alan Yuille, Saining Xie

    Abstract: Existing neural network architectures in computer vision -- whether designed by humans or by machines -- were typically found using both images and their associated labels. In this paper, we ask the question: can we find high-quality neural architectures using only images, but no human-annotated labels? To answer this question, we first define a new setup called Unsupervised Neural Architecture Se…

    Submitted 3 August, 2020; v1 submitted 26 March, 2020; originally announced March 2020.

    Comments: To appear in ECCV 2020 as spotlight. Code release: https://github.com/facebookresearch/unnas

  12. arXiv:1908.03195  [pdf, other]

    cs.CV

    LVIS: A Dataset for Large Vocabulary Instance Segmentation

    Authors: Agrim Gupta, Piotr Dollár, Ross Girshick

    Abstract: Progress on object detection is enabled by datasets that focus the research community's attention on open challenges. This process led us from simple images to complex scenes and from bounding boxes to segmentation masks. In this work, we introduce LVIS (pronounced `el-vis'): a new dataset for Large Vocabulary Instance Segmentation. We plan to collect ~2 million high-quality instance segmentation…

    Submitted 15 September, 2019; v1 submitted 8 August, 2019; originally announced August 2019.

    Comments: Extension of the CVPR'19 paper describing release v0.5, the LVIS Challenge, and baseline results

  13. arXiv:1905.13214  [pdf, other]

    cs.CV cs.LG

    On Network Design Spaces for Visual Recognition

    Authors: Ilija Radosavovic, Justin Johnson, Saining Xie, Wan-Yen Lo, Piotr Dollár

    Abstract: Over the past several years progress in designing better neural network architectures for visual recognition has been substantial. To help sustain this rate of progress, in this work we propose to reexamine the methodology for comparing network architectures. In particular, we introduce a new comparison paradigm of distribution estimates, in which network design spaces are compared by applying sta…

    Submitted 30 May, 2019; originally announced May 2019.

    Comments: tech report

  14. arXiv:1903.12174  [pdf, other]

    cs.CV

    TensorMask: A Foundation for Dense Object Segmentation

    Authors: Xinlei Chen, Ross Girshick, Kaiming He, Piotr Dollár

    Abstract: Sliding-window object detectors that generate bounding-box object predictions over a dense, regular grid have advanced rapidly and proven popular. In contrast, modern instance segmentation approaches are dominated by methods that first detect object bounding boxes, and then crop and segment these regions, as popularized by Mask R-CNN. In this work, we investigate the paradigm of dense sliding-wind…

    Submitted 27 August, 2019; v1 submitted 28 March, 2019; originally announced March 2019.

    Comments: accepted to ICCV

  15. arXiv:1901.02446  [pdf, other]

    cs.CV

    Panoptic Feature Pyramid Networks

    Authors: Alexander Kirillov, Ross Girshick, Kaiming He, Piotr Dollár

    Abstract: The recently introduced panoptic segmentation task has renewed our community's interest in unifying the tasks of instance segmentation (for thing classes) and semantic segmentation (for stuff classes). However, current state-of-the-art methods for this joint task use separate and dissimilar networks for instance and semantic segmentation, without performing any shared computation. In this work, we…

    Submitted 10 April, 2019; v1 submitted 8 January, 2019; originally announced January 2019.

    Comments: accepted to CVPR 2019

  16. arXiv:1811.08883  [pdf, other]

    cs.CV

    Rethinking ImageNet Pre-training

    Authors: Kaiming He, Ross Girshick, Piotr Dollár

    Abstract: We report competitive results on object detection and instance segmentation on the COCO dataset using standard models trained from random initialization. The results are no worse than their ImageNet pre-training counterparts even when using the hyper-parameters of the baseline system (Mask R-CNN) that were optimized for fine-tuning pre-trained models, with the sole exception of increasing the numb…

    Submitted 21 November, 2018; originally announced November 2018.

    Comments: Technical report

  17. arXiv:1801.00868  [pdf, other]

    cs.CV

    Panoptic Segmentation

    Authors: Alexander Kirillov, Kaiming He, Ross Girshick, Carsten Rother, Piotr Dollár

    Abstract: We propose and study a task we name panoptic segmentation (PS). Panoptic segmentation unifies the typically distinct tasks of semantic segmentation (assign a class label to each pixel) and instance segmentation (detect and segment each object instance). The proposed task requires generating a coherent scene segmentation that is rich and complete, an important step toward real-world vision systems.…

    Submitted 10 April, 2019; v1 submitted 2 January, 2018; originally announced January 2018.

    Comments: accepted to CVPR 2019

  18. arXiv:1712.04440  [pdf, other]

    cs.CV

    Data Distillation: Towards Omni-Supervised Learning

    Authors: Ilija Radosavovic, Piotr Dollár, Ross Girshick, Georgia Gkioxari, Kaiming He

    Abstract: We investigate omni-supervised learning, a special regime of semi-supervised learning in which the learner exploits all available labeled data plus internet-scale sources of unlabeled data. Omni-supervised learning is lower-bounded by performance on existing labeled datasets, offering the potential to surpass state-of-the-art fully supervised methods. To exploit the omni-supervised setting, we pro…

    Submitted 12 December, 2017; originally announced December 2017.

    Comments: tech report

  19. arXiv:1711.10370  [pdf, other]

    cs.CV

    Learning to Segment Every Thing

    Authors: Ronghang Hu, Piotr Dollár, Kaiming He, Trevor Darrell, Ross Girshick

    Abstract: Most methods for object instance segmentation require all training examples to be labeled with segmentation masks. This requirement makes it expensive to annotate new categories and has restricted instance segmentation models to ~100 well-annotated classes. The goal of this paper is to propose a new partially supervised training paradigm, together with a novel weight transfer function, that enable…

    Submitted 27 March, 2018; v1 submitted 28 November, 2017; originally announced November 2017.

  20. arXiv:1708.02002  [pdf, other]

    cs.CV

    Focal Loss for Dense Object Detection

    Authors: Tsung-Yi Lin, Priya Goyal, Ross Girshick, Kaiming He, Piotr Dollár

    Abstract: The highest accuracy object detectors to date are based on a two-stage approach popularized by R-CNN, where a classifier is applied to a sparse set of candidate object locations. In contrast, one-stage detectors that are applied over a regular, dense sampling of possible object locations have the potential to be faster and simpler, but have trailed the accuracy of two-stage detectors thus far. In…

    Submitted 7 February, 2018; v1 submitted 7 August, 2017; originally announced August 2017.

  21. arXiv:1706.02677  [pdf, other]

    cs.CV cs.DC cs.LG

    Accurate, Large Minibatch SGD: Training ImageNet in 1 Hour

    Authors: Priya Goyal, Piotr Dollár, Ross Girshick, Pieter Noordhuis, Lukasz Wesolowski, Aapo Kyrola, Andrew Tulloch, Yangqing Jia, Kaiming He

    Abstract: Deep learning thrives with large neural networks and large datasets. However, larger networks and larger datasets result in longer training times that impede research and development progress. Distributed synchronous SGD offers a potential solution to this problem by dividing SGD minibatches over a pool of parallel workers. Yet to make this scheme efficient, the per-worker workload must be large,…

    Submitted 30 April, 2018; v1 submitted 8 June, 2017; originally announced June 2017.

    Comments: Tech report (v2: correct typos)

  22. arXiv:1704.07333  [pdf, other]

    cs.CV

    Detecting and Recognizing Human-Object Interactions

    Authors: Georgia Gkioxari, Ross Girshick, Piotr Dollár, Kaiming He

    Abstract: To understand the visual world, a machine must not only recognize individual object instances but also how they interact. Humans are often at the center of such interactions and detecting human-object interactions is an important practical and scientific problem. In this paper, we address the task of detecting <human, verb, object> triplets in challenging everyday photos. We propose a novel model…

    Submitted 26 March, 2018; v1 submitted 24 April, 2017; originally announced April 2017.

  23. arXiv:1703.06870  [pdf, other]

    cs.CV

    Mask R-CNN

    Authors: Kaiming He, Georgia Gkioxari, Piotr Dollár, Ross Girshick

    Abstract: We present a conceptually simple, flexible, and general framework for object instance segmentation. Our approach efficiently detects objects in an image while simultaneously generating a high-quality segmentation mask for each instance. The method, called Mask R-CNN, extends Faster R-CNN by adding a branch for predicting an object mask in parallel with the existing branch for bounding box recognit…

    Submitted 24 January, 2018; v1 submitted 20 March, 2017; originally announced March 2017.

    Comments: open source; appendix on more results

  24. arXiv:1612.06370  [pdf, other]

    cs.CV cs.AI cs.LG cs.NE stat.ML

    Learning Features by Watching Objects Move

    Authors: Deepak Pathak, Ross Girshick, Piotr Dollár, Trevor Darrell, Bharath Hariharan

    Abstract: This paper presents a novel yet intuitive approach to unsupervised feature learning. Inspired by the human visual system, we explore whether low-level motion-based grouping cues can be used to learn an effective visual representation. Specifically, we use unsupervised motion-based segmentation on videos to obtain segments, which we use as 'pseudo ground truth' to train a convolutional network to s…

    Submitted 12 April, 2017; v1 submitted 19 December, 2016; originally announced December 2016.

    Comments: CVPR 2017

  25. arXiv:1612.03144  [pdf, other]

    cs.CV

    Feature Pyramid Networks for Object Detection

    Authors: Tsung-Yi Lin, Piotr Dollár, Ross Girshick, Kaiming He, Bharath Hariharan, Serge Belongie

    Abstract: Feature pyramids are a basic component in recognition systems for detecting objects at different scales. But recent deep learning object detectors have avoided pyramid representations, in part because they are compute and memory intensive. In this paper, we exploit the inherent multi-scale, pyramidal hierarchy of deep convolutional networks to construct feature pyramids with marginal extra cost. A…

    Submitted 19 April, 2017; v1 submitted 9 December, 2016; originally announced December 2016.

  26. arXiv:1611.05431  [pdf, other]

    cs.CV

    Aggregated Residual Transformations for Deep Neural Networks

    Authors: Saining Xie, Ross Girshick, Piotr Dollár, Zhuowen Tu, Kaiming He

    Abstract: We present a simple, highly modularized network architecture for image classification. Our network is constructed by repeating a building block that aggregates a set of transformations with the same topology. Our simple design results in a homogeneous, multi-branch architecture that has only a few hyper-parameters to set. This strategy exposes a new dimension, which we call "cardinality" (the size…

    Submitted 10 April, 2017; v1 submitted 16 November, 2016; originally announced November 2016.

    Comments: Accepted to CVPR 2017. Code and models: https://github.com/facebookresearch/ResNeXt

  27. arXiv:1604.02135  [pdf, other]

    cs.CV

    A MultiPath Network for Object Detection

    Authors: Sergey Zagoruyko, Adam Lerer, Tsung-Yi Lin, Pedro O. Pinheiro, Sam Gross, Soumith Chintala, Piotr Dollár

    Abstract: The recent COCO object detection dataset presents several new challenges for object detection. In particular, it contains objects at a broad range of scales, less prototypical images, and requires more precise localization. To address these challenges, we test three modifications to the standard Fast R-CNN object detector: (1) skip connections that give the detector access to features at multiple…

    Submitted 8 August, 2016; v1 submitted 7 April, 2016; originally announced April 2016.

  28. arXiv:1603.08695  [pdf, other]

    cs.CV

    Learning to Refine Object Segments

    Authors: Pedro O. Pinheiro, Tsung-Yi Lin, Ronan Collobert, Piotr Dollàr

    Abstract: Object segmentation requires both object-level information and low-level pixel data. This presents a challenge for feedforward networks: lower layers in convolutional nets capture rich spatial information, while upper layers encode object-level knowledge but are invariant to factors such as pose and appearance. In this work we propose to augment feedforward nets for object segmentation with a nove…

    Submitted 26 July, 2016; v1 submitted 29 March, 2016; originally announced March 2016.

    Comments: extended version of ECCV camera-ready (figures 6-9 only in arXiv)

  29. arXiv:1511.05939  [pdf, other]

    stat.ML cs.LG

    Metric Learning with Adaptive Density Discrimination

    Authors: Oren Rippel, Manohar Paluri, Piotr Dollar, Lubomir Bourdev

    Abstract: Distance metric learning (DML) approaches learn a transformation to a representation space where distance is in correspondence with a predefined notion of similarity. While such models offer a number of compelling benefits, it has been difficult for these to compete with modern classification algorithms in performance and even in feature extraction. In this work, we propose a novel approach expl…

    Submitted 1 March, 2016; v1 submitted 18 November, 2015; originally announced November 2015.

    Comments: ICLR 2016

  30. arXiv:1511.04166  [pdf, other]

    cs.CV

    Unsupervised Learning of Edges

    Authors: Yin Li, Manohar Paluri, James M. Rehg, Piotr Dollár

    Abstract: Data-driven approaches for edge detection have proven effective and achieve top results on modern benchmarks. However, all current data-driven edge detectors require manual supervision for training in the form of hand-labeled region segments or object boundaries. Specifically, human annotators mark semantically meaningful edges which are subsequently used for training. Is this form of strong, high…

    Submitted 10 April, 2016; v1 submitted 13 November, 2015; originally announced November 2015.

    Comments: Camera ready version for CVPR 2016

  31. arXiv:1509.01329  [pdf, other]

    cs.CV

    Semantic Amodal Segmentation

    Authors: Yan Zhu, Yuandong Tian, Dimitris Mexatas, Piotr Dollár

    Abstract: Common visual recognition tasks such as classification, object detection, and semantic segmentation are rapidly reaching maturity, and given the recent rate of progress, it is not unreasonable to conjecture that techniques for many of these problems will approach human levels of performance in the next few years. In this paper we look to the future: what is the next frontier in visual recognition?…

    Submitted 14 December, 2016; v1 submitted 3 September, 2015; originally announced September 2015.

    Comments: major update including new COCO data, metrics, and baselines

  32. arXiv:1506.06204  [pdf, other]

    cs.CV

    Learning to Segment Object Candidates

    Authors: Pedro O. Pinheiro, Ronan Collobert, Piotr Dollar

    Abstract: Recent object detection systems rely on two critical steps: (1) a set of object proposals is predicted as efficiently as possible, and (2) this set of candidate proposals is then passed to an object classifier. Such approaches have been shown they can be fast, while achieving the state of the art in detection performance. In this paper, we propose a new way to generate object proposals, introducin…

    Submitted 1 September, 2015; v1 submitted 20 June, 2015; originally announced June 2015.

  33. arXiv:1504.00325  [pdf, other]

    cs.CV cs.CL

    Microsoft COCO Captions: Data Collection and Evaluation Server

    Authors: Xinlei Chen, Hao Fang, Tsung-Yi Lin, Ramakrishna Vedantam, Saurabh Gupta, Piotr Dollar, C. Lawrence Zitnick

    Abstract: In this paper we describe the Microsoft COCO Caption dataset and evaluation server. When completed, the dataset will contain over one and a half million captions describing over 330,000 images. For the training and validation images, five independent human generated captions will be provided. To ensure consistency in evaluation of automatic caption generation algorithms, an evaluation server is us…

    Submitted 3 April, 2015; v1 submitted 1 April, 2015; originally announced April 2015.

    Comments: arXiv admin note: text overlap with arXiv:1411.4952

  34. What makes for effective detection proposals?

    Authors: Jan Hosang, Rodrigo Benenson, Piotr Dollár, Bernt Schiele

    Abstract: Current top performing object detectors employ detection proposals to guide the search for objects, thereby avoiding exhaustive sliding window search across images. Despite the popularity and widespread use of detection proposals, it is unclear which trade-offs are made when using them during object detection. We provide an in-depth analysis of twelve proposal methods along with four baselines reg…

    Submitted 1 August, 2015; v1 submitted 17 February, 2015; originally announced February 2015.

    Comments: TPAMI final version, duplicate proposals removed in experiments

  35. arXiv:1411.4952  [pdf, other]

    cs.CV cs.CL

    From Captions to Visual Concepts and Back

    Authors: Hao Fang, Saurabh Gupta, Forrest Iandola, Rupesh Srivastava, Li Deng, Piotr Dollár, Jianfeng Gao, Xiaodong He, Margaret Mitchell, John C. Platt, C. Lawrence Zitnick, Geoffrey Zweig

    Abstract: This paper presents a novel approach for automatically generating image descriptions: visual detectors, language models, and multimodal similarity models learnt directly from a dataset of image captions. We use multiple instance learning to train visual detectors for words that commonly occur in captions, including many different parts of speech such as nouns, verbs, and adjectives. The word det…

    Submitted 14 April, 2015; v1 submitted 18 November, 2014; originally announced November 2014.

    Comments: version corresponding to CVPR15 paper

  36. arXiv:1406.5549  [pdf, other]

    cs.CV

    Fast Edge Detection Using Structured Forests

    Authors: Piotr Dollár, C. Lawrence Zitnick

    Abstract: Edge detection is a critical component of many vision systems, including object detectors and image segmentation algorithms. Patches of edges exhibit well-known forms of local structure, such as straight lines or T-junctions. In this paper we take advantage of the structure present in local image patches to learn both an accurate and computationally efficient edge detector. We formulate the proble…

    Submitted 24 November, 2014; v1 submitted 20 June, 2014; originally announced June 2014.

    Comments: update corresponding to acceptance to PAMI

  37. arXiv:1406.1134  [pdf, other]

    cs.CV

    Local Decorrelation For Improved Detection

    Authors: Woonhyun Nam, Piotr Dollár, Joon Hee Han

    Abstract: Even with the advent of more sophisticated, data-hungry methods, boosted decision trees remain extraordinarily successful for fast rigid object detection, achieving top accuracy on numerous datasets. While effective, most boosted detectors use decision trees with orthogonal (single feature) splits, and the topology of the resulting decision boundary may not be well matched to the natural topology…

    Submitted 3 November, 2014; v1 submitted 4 June, 2014; originally announced June 2014.

    Comments: To appear in Neural Information Processing Systems (NIPS), 2014

  38. arXiv:1405.6804  [pdf, ps, other]

    stat.ML cs.LG

    Layered Logic Classifiers: Exploring the `And' and `Or' Relations

    Authors: Zhuowen Tu, Piotr Dollar, Yingnian Wu

    Abstract: Designing effective and efficient classifier for pattern analysis is a key problem in machine learning and computer vision. Many the solutions to the problem require to perform logic operations such as `and', `or', and `not'. Classification and regression tree (CART) include these operations explicitly. Other methods such as neural networks, SVM, and boosting learn/compute a weighted sum on featur…

    Submitted 27 May, 2014; v1 submitted 27 May, 2014; originally announced May 2014.

  39. arXiv:1405.0312  [pdf, other]

    cs.CV

    Microsoft COCO: Common Objects in Context

    Authors: Tsung-Yi Lin, Michael Maire, Serge Belongie, Lubomir Bourdev, Ross Girshick, James Hays, Pietro Perona, Deva Ramanan, C. Lawrence Zitnick, Piotr Dollár

    Abstract: We present a new dataset with the goal of advancing the state-of-the-art in object recognition by placing the question of object recognition in the context of the broader question of scene understanding. This is achieved by gathering images of complex everyday scenes containing common objects in their natural context. Objects are labeled using per-instance segmentations to aid in precise object lo…

    Submitted 20 February, 2015; v1 submitted 1 May, 2014; originally announced May 2014.

    Comments: 1) updated annotation pipeline description and figures; 2) added new section describing datasets splits; 3) updated author list