-
GeoStyle: Discovering Fashion Trends and Events
Authors:
Utkarsh Mall,
Kevin Matzen,
Bharath Hariharan,
Noah Snavely,
Kavita Bala
Abstract:
Understanding fashion styles and trends is of great potential interest to retailers and consumers alike. The photos people upload to social media are a historical and public data source of how people dress across the world and at different times. While we now have tools to automatically recognize the clothing and style attributes of what people are wearing in these photographs, we lack the ability to analyze spatial and temporal trends in these attributes or make predictions about the future. In this paper, we address this need by providing an automatic framework that analyzes large corpora of street imagery to (a) discover and forecast long-term trends of various fashion attributes as well as automatically discovered styles, and (b) identify spatio-temporally localized events that affect what people wear. We show that our framework makes long-term trend forecasts that are >20% more accurate than the prior art, and identifies hundreds of socially meaningful events that impact fashion across the globe.
Submitted 29 August, 2019;
originally announced August 2019.
-
Modeling of rigidity dependent CORSIKA simulations for GRAPES-3
Authors:
B. Hariharan,
S. R. Dugad,
S. K. Gupta,
Y. Hayashi,
S. S. R. Inbanathan,
P. Jagadeesan,
A. Jain,
S. Kawakami,
P. K. Mohanty,
B. S. Rao
Abstract:
The GRAPES-3 muon telescope located in Ooty, India records 4x10^9 muons daily. These muons are produced by interactions of primary cosmic rays (PCRs) in the atmosphere. The high muon statistics enable GRAPES-3 to make precise measurements of various Sun-induced phenomena, including coronal mass ejections (CMEs), Forbush decreases and geomagnetic storms (GMS), as well as of atmospheric acceleration during the overhead passage of thunderclouds. However, understanding and interpreting the observed data requires Monte Carlo (MC) simulation of PCRs and of the subsequent development of showers in the atmosphere. CORSIKA is a standard MC simulation code widely used for this purpose. However, these simulations are time-consuming, as a large number of interactions and decays must be taken into account at various stages of shower development, from the top of the atmosphere down to ground level. Computing resources therefore become an important consideration, particularly when billions of PCRs need to be simulated to match the high statistical accuracy of the data. During the GRAPES-3 simulations, it was observed that over 60% of simulated events do not actually reach the Earth's atmosphere. The geomagnetic field (GMF) imposes a threshold on PCRs called the cutoff rigidity Rc, a direction-dependent parameter below which PCRs cannot reach the Earth's atmosphere. However, CORSIKA has no provision for setting a direction-dependent threshold. We have devised an efficient method that takes this Rc dependence into account. A reduction by a factor of ~3 in simulation time and ~2 in output data size was achieved for GRAPES-3 simulations. The method has been incorporated into CORSIKA from version v75600 onwards. Its detailed implementation and potential benefits are discussed in this work.
Submitted 16 August, 2019;
originally announced August 2019.
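The abstract states the idea (skip primaries whose rigidity falls below the direction-dependent cutoff Rc) but not how it is wired into CORSIKA. The following is a minimal standalone sketch of the filtering step. It assumes rigidity R = pc/(Ze) and a hypothetical cutoff function `toy_cutoff_gv`, whose 17 GV baseline and angular term are illustrative placeholders rather than the GRAPES-3 cutoff map. No CORSIKA option names are implied.

```python
import math
from dataclasses import dataclass


@dataclass
class Primary:
    momentum_gev: float  # total momentum p in GeV/c
    charge: int          # charge number Z (1 for protons, 2 for helium, ...)
    zenith_deg: float    # arrival direction
    azimuth_deg: float


def rigidity_gv(p: Primary) -> float:
    # R = pc / (Ze); with p in GeV/c and charge in units of e, R is in GV.
    return p.momentum_gev / p.charge


def toy_cutoff_gv(zenith_deg: float, azimuth_deg: float) -> float:
    # HYPOTHETICAL direction-dependent cutoff. A real run would use a cutoff
    # table computed by tracing particles through a geomagnetic field model.
    return 17.0 + 3.0 * math.sin(math.radians(zenith_deg)) * math.cos(math.radians(azimuth_deg))


def should_simulate(p: Primary, cutoff_fn=toy_cutoff_gv) -> bool:
    # Reject primaries the geomagnetic field would turn away before they
    # reach the atmosphere, instead of simulating them and discarding later.
    return rigidity_gv(p) >= cutoff_fn(p.zenith_deg, p.azimuth_deg)
```

The time saving comes from where the check is applied. Rejected primaries never enter the shower simulation, so the events that would not reach the atmosphere cost essentially no computation.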
-
PointFlow: 3D Point Cloud Generation with Continuous Normalizing Flows
Authors:
Guandao Yang,
Xun Huang,
Zekun Hao,
Ming-Yu Liu,
Serge Belongie,
Bharath Hariharan
Abstract:
As 3D point clouds become the representation of choice for multiple vision and graphics applications, the ability to synthesize or reconstruct high-resolution, high-fidelity point clouds becomes crucial. Despite the recent success of deep learning models in discriminative tasks on point clouds, generating point clouds remains challenging. This paper proposes a principled probabilistic framework to generate 3D point clouds by modeling them as a distribution of distributions. Specifically, we learn a two-level hierarchy of distributions where the first level is the distribution of shapes and the second level is the distribution of points given a shape. This formulation allows us to both sample shapes and sample an arbitrary number of points from a shape. Our generative model, named PointFlow, learns each level of the distribution with a continuous normalizing flow. The invertibility of normalizing flows enables the computation of the likelihood during training and allows us to train our model in the variational inference framework. Empirically, we demonstrate that PointFlow achieves state-of-the-art performance in point cloud generation. We additionally show that our model can faithfully reconstruct point clouds and learn useful representations in an unsupervised manner. The code will be available at https://github.com/stevenygd/PointFlow.
Submitted 2 September, 2019; v1 submitted 28 June, 2019;
originally announced June 2019.
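A toy model can illustrate two ideas from this abstract: the two-level structure (sample a shape, then any number of points given that shape) and the way invertibility makes likelihoods computable. This sketch replaces each continuous normalizing flow with a hypothetical invertible affine map conditioned on a shape latent `z`. It is not PointFlow's architecture, and nothing is trained.

```python
import math
import random

LOG_2PI = math.log(2 * math.pi)


def std_normal_logpdf(v):
    # log N(v; 0, I) for a list of floats
    return -0.5 * sum(x * x for x in v) - 0.5 * len(v) * LOG_2PI


def sample_shape(rng):
    # Level 1 (hypothetical): shape latent z = (log_scale, cx, cy, cz).
    return [rng.gauss(0.0, 0.1)] + [rng.gauss(0.0, 1.0) for _ in range(3)]


def point_flow(y, z):
    # Level 2: invertible map from base noise y to a 3-D point x, conditioned on z.
    s, c = z[0], z[1:]
    return [math.exp(s) * yi + ci for yi, ci in zip(y, c)]


def point_log_likelihood(x, z):
    # Change of variables: invert the map, then subtract log|det dx/dy| = 3 * s.
    s, c = z[0], z[1:]
    y = [(xi - ci) * math.exp(-s) for xi, ci in zip(x, c)]
    return std_normal_logpdf(y) - 3 * s


def sample_point_cloud(n_points, rng):
    # Sample one shape, then an arbitrary number of points from that shape.
    z = sample_shape(rng)
    points = [point_flow([rng.gauss(0.0, 1.0) for _ in range(3)], z) for _ in range(n_points)]
    return z, points
```

Because the map is invertible with a known Jacobian, the exact log-likelihood of any point is available. In a real continuous flow, the log-determinant is obtained by integrating the trace of the vector field's Jacobian rather than read off in closed form.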
-
Boosting Supervision with Self-Supervision for Few-shot Learning
Authors:
Jong-Chyi Su,
Subhransu Maji,
Bharath Hariharan
Abstract:
We present a technique to improve the transferability of deep representations learned on small labeled datasets by introducing self-supervised tasks as auxiliary loss functions. While recent approaches for self-supervised learning have shown the benefits of training on large unlabeled datasets, we find improvements in generalization even on small datasets and when combined with strong supervision. Learning representations with self-supervised losses reduces the relative error rate of a state-of-the-art meta-learner by 5-25% on several few-shot learning benchmarks, as well as off-the-shelf deep networks on standard classification tasks when training from scratch. We find the benefits of self-supervision increase with the difficulty of the task. Our approach utilizes the images within the dataset to construct self-supervised losses and hence is an effective way of learning transferable representations without relying on any external training data.
Submitted 17 June, 2019;
originally announced June 2019.
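The abstract does not name the self-supervised tasks. The sketch below uses rotation prediction as an assumed example: labels are generated by rotating each image by 0, 90, 180 or 270 degrees. Its cross-entropy is added to the supervised loss with a hypothetical weight `weight`. The logits would come from a shared network with two heads, which is not shown.

```python
import math


def rot90(img):
    # Rotate a 2-D grid (list of rows) 90 degrees counter-clockwise.
    return [list(row) for row in zip(*img)][::-1]


def rotation_task(images):
    # Self-supervised examples: label k means the image was rotated k * 90 degrees.
    examples = []
    for img in images:
        rotated = img
        for k in range(4):
            examples.append((rotated, k))
            rotated = rot90(rotated)
    return examples


def cross_entropy(logits, label):
    # Numerically stable softmax cross-entropy for a single example.
    m = max(logits)
    log_z = m + math.log(sum(math.exp(v - m) for v in logits))
    return log_z - logits[label]


def total_loss(sup_logits, sup_label, ssl_logits, ssl_labels, weight=1.0):
    # Supervised loss plus a weighted auxiliary self-supervised loss.
    sup = cross_entropy(sup_logits, sup_label)
    ssl = sum(cross_entropy(l, y) for l, y in zip(ssl_logits, ssl_labels)) / len(ssl_labels)
    return sup + weight * ssl
```

The rotation labels come from the training images themselves, so this auxiliary loss needs no external data. That matches the abstract's point.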
-
Pseudo-LiDAR++: Accurate Depth for 3D Object Detection in Autonomous Driving
Authors:
Yurong You,
Yan Wang,
Wei-Lun Chao,
Divyansh Garg,
Geoff Pleiss,
Bharath Hariharan,
Mark Campbell,
Kilian Q. Weinberger
Abstract:
Detecting objects such as cars and pedestrians in 3D plays an indispensable role in autonomous driving. Existing approaches largely rely on expensive LiDAR sensors for accurate depth information. While pseudo-LiDAR has recently been introduced as a promising, much lower-cost alternative based solely on stereo images, there is still a notable performance gap. In this paper we provide substantial advances to the pseudo-LiDAR framework through improvements in stereo depth estimation. Concretely, we adapt the stereo network architecture and loss function to be more aligned with accurate depth estimation of faraway objects --- currently the primary weakness of pseudo-LiDAR. Further, we explore the idea of leveraging cheaper but extremely sparse LiDAR sensors, which alone provide insufficient information for 3D detection, to de-bias our depth estimation. We propose a depth-propagation algorithm, guided by the initial depth estimates, to diffuse these few exact measurements across the entire depth map. We show on the KITTI object detection benchmark that our combined approach yields substantial improvements in depth estimation and stereo-based 3D object detection --- outperforming the previous state-of-the-art detection accuracy for faraway objects by 40%. Our code is available at https://github.com/mileyan/Pseudo_Lidar_V2.
Submitted 15 February, 2020; v1 submitted 14 June, 2019;
originally announced June 2019.
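Rectified-stereo geometry, Z = f * b / d, shows why faraway objects are the weak spot: a fixed disparity error produces a depth error that grows with Z squared. The sketch below assumes a focal length of about 721 px and a baseline of about 0.54 m (roughly KITTI's stereo rig). It illustrates the motivation only, not the paper's network or loss.

```python
def depth_from_disparity(disparity_px, focal_px, baseline_m):
    # Rectified stereo: Z = f * b / d
    return focal_px * baseline_m / disparity_px


def depth_error(depth_m, disparity_error_px, focal_px, baseline_m):
    # First-order error propagation: |dZ/dd| = f*b/d^2 = Z^2 / (f*b),
    # so the same disparity error costs quadratically more depth at range.
    return depth_m ** 2 / (focal_px * baseline_m) * disparity_error_px


FOCAL_PX, BASELINE_M = 721.0, 0.54  # assumed, approximately KITTI
near = depth_error(10.0, 1.0, FOCAL_PX, BASELINE_M)  # about 0.26 m
far = depth_error(40.0, 1.0, FOCAL_PX, BASELINE_M)   # about 4.1 m
```

At 40 m, the same 1 px error produces 16 times the depth error it produces at 10 m. A loss measured in disparity therefore treats distant objects far more leniently than a loss measured in depth.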
-
Few-Shot Learning with Localization in Realistic Settings
Authors:
Davis Wertheimer,
Bharath Hariharan
Abstract:
Traditional recognition methods typically require large, artificially-balanced training classes, while few-shot learning methods are tested on artificially small ones. In contrast to both extremes, real world recognition problems exhibit heavy-tailed class distributions, with cluttered scenes and a mix of coarse and fine-grained class distinctions. We show that prior methods designed for few-shot learning do not work out of the box in these challenging conditions, based on a new "meta-iNat" benchmark. We introduce three parameter-free improvements: (a) better training procedures based on adapting cross-validation to meta-learning, (b) novel architectures that localize objects using limited bounding box annotations before classification, and (c) simple parameter-free expansions of the feature space based on bilinear pooling. Together, these improvements double the accuracy of state-of-the-art models on meta-iNat while generalizing to prior benchmarks, complex neural architectures, and settings with substantial domain shift.
Submitted 1 July, 2019; v1 submitted 9 April, 2019;
originally announced April 2019.
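Bilinear pooling expands a C-dimensional feature map into a C x C descriptor by averaging outer products over spatial locations, without adding any learned parameters. Below is a plain-Python sketch of the standard (second-order) form. The abstract does not give the paper's exact variant or any normalization it applies.

```python
def bilinear_pool(features):
    """Average outer product of per-location feature vectors.

    features: list of C-dimensional vectors (one per spatial location).
    Returns a flattened C*C descriptor; no learned parameters are involved.
    """
    c = len(features[0])
    pooled = [0.0] * (c * c)
    for f in features:
        for i in range(c):
            for j in range(c):
                pooled[i * c + j] += f[i] * f[j]
    return [v / len(features) for v in pooled]
```

For example, two locations with features (1, 2) and (3, 4) give the descriptor (5, 7, 7, 10). The descriptor is symmetric because it is an average of outer products.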
-
Measurement of the Electrical Properties of a Thundercloud Through Muon Imaging by the GRAPES-3 Experiment
Authors:
B. Hariharan,
A. Chandra,
S. R. Dugad,
S. K. Gupta,
P. Jagadeesan,
A. Jain,
P. K. Mohanty,
S. D. Morris,
P. K. Nayak,
P. S. Rakshe,
K. Ramesh,
B. S. Rao,
L. V. Reddy,
M. Zuberi,
Y. Hayashi,
S. Kawakami,
S. Ahmad,
H. Kojima,
A. Oshima,
S. Shibata,
Y. Muraki,
K. Tanaka
Abstract:
The GRAPES-3 muon telescope located in Ooty, India records rapid ($\sim$10 min) variations in the muon intensity during major thunderstorms. Out of a total of 184 thunderstorms recorded during the interval April 2011-December 2014, the one on 1 December 2014 produced a massive potential of 1.3 GV. The electric field measured by four well-separated (up to 6 km) monitors on the ground was used to help estimate some of the properties of this thundercloud, including its altitude and area, which were found to be 11.4 km above mean sea level (amsl) and $\geq$380 km$^2$, respectively. A charging time of 6 min to reach 1.3 GV implied the delivery of a power of $\geq$2 GW by this thundercloud, which was moving at a speed of $\sim$60 km h$^{-1}$. This work possibly provides the first direct evidence for the generation of GV potentials in thunderclouds, which could also explain the production of the highest-energy (100 MeV) $\gamma$-rays in terrestrial $\gamma$-ray flashes.
Submitted 23 March, 2019;
originally announced March 2019.
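If the quoted power is taken as an average over the 6 min charging interval, the figures in the abstract imply a lower bound on the energy delivered:

```latex
E \;\geq\; P\,t = \left(2\times10^{9}\,\mathrm{W}\right)\left(6\times60\,\mathrm{s}\right) = 7.2\times10^{11}\,\mathrm{J} = 200\,\mathrm{MWh}
```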
-
Pseudo-LiDAR from Visual Depth Estimation: Bridging the Gap in 3D Object Detection for Autonomous Driving
Authors:
Yan Wang,
Wei-Lun Chao,
Divyansh Garg,
Bharath Hariharan,
Mark Campbell,
Kilian Q. Weinberger
Abstract:
3D object detection is an essential task in autonomous driving. Recent techniques excel with highly accurate detection rates, provided the 3D input data is obtained from precise but expensive LiDAR technology. Approaches based on cheaper monocular or stereo imagery data have, until now, resulted in drastically lower accuracies --- a gap that is commonly attributed to poor image-based depth estimation. However, in this paper we argue that it is not the quality of the data but its representation that accounts for the majority of the difference. Taking the inner workings of convolutional neural networks into consideration, we propose to convert image-based depth maps to pseudo-LiDAR representations --- essentially mimicking the LiDAR signal. With this representation we can apply different existing LiDAR-based detection algorithms. On the popular KITTI benchmark, our approach achieves impressive improvements over the existing state-of-the-art in image-based performance --- raising the detection accuracy of objects within the 30m range from the previous state-of-the-art of 22% to an unprecedented 74%. At the time of submission our algorithm holds the highest entry on the KITTI 3D object detection leaderboard for stereo-image-based approaches. Our code is publicly available at https://github.com/mileyan/pseudo_lidar.
Submitted 22 February, 2020; v1 submitted 18 December, 2018;
originally announced December 2018.
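The conversion from an image-based depth map to a pseudo-LiDAR point cloud can be sketched with standard pinhole back-projection: x = (u - c_u) z / f_u and y = (v - c_v) z / f_v. The intrinsics `fu`, `fv`, `cu`, `cv` are assumed inputs. The resulting points are in the camera frame (x right, y down, z forward). A real pipeline would also apply the camera-to-LiDAR extrinsic transform before running a LiDAR-based detector.

```python
def depth_to_pseudo_lidar(depth, fu, fv, cu, cv):
    """Back-project a depth map (list of rows, metres) into 3-D points.

    Pixels with non-positive depth are treated as invalid and skipped.
    Returns (x, y, z) tuples in the camera frame.
    """
    points = []
    for v, row in enumerate(depth):
        for u, z in enumerate(row):
            if z <= 0:
                continue
            x = (u - cu) * z / fu
            y = (v - cv) * z / fv
            points.append((x, y, z))
    return points
```

Once depth is expressed as an unordered 3-D point set, LiDAR-based detectors can consume it unchanged. This is the representational change the abstract argues matters more than depth quality alone.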
-
A Deep-Learning-Based Fashion Attributes Detection Model
Authors:
Menglin Jia,
Yichen Zhou,
Mengyun Shi,
Bharath Hariharan
Abstract:
Analyzing fashion attributes is essential in the fashion design process. Current fashion forecasting firms, such as WGSN, utilize information from all around the world (from fashion shows, visual merchandising, blogs, etc.). They gather information through experience, observation, media scans, interviews, and exposure to new things. This process of analyzing information is called abstracting, which recognizes similarities or differences across all the garments and collections. In fact, such abstraction ability is useful in many fashion careers with different purposes. Fashion forecasters abstract across design collections and across time to identify fashion changes and directions; designers, product developers and buyers abstract across a group of garments and collections to develop cohesive and visually appealing lines; sales and marketing executives abstract across product lines each season to recognize selling points; fashion journalists and bloggers abstract across runway photos to recognize symbolic core concepts that can be translated into editorial features. Fashion attribute analysis for such fashion insiders requires much more detailed and in-depth attribute annotation than analysis for consumers, and requires inference on multiple domains. In this project, we propose a data-driven approach for recognizing fashion attributes. Specifically, a modified version of the Faster R-CNN model is trained on images from a large-scale localization dataset with 594 fine-grained attributes under different scenarios, for example in online stores and street snapshots. This model will then be used to detect garment items and classify clothing attributes for runway photos and fashion illustrations.
Submitted 23 October, 2018;
originally announced October 2018.
-
Deep Fundamental Matrix Estimation without Correspondences
Authors:
Omid Poursaeed,
Guandao Yang,
Aditya Prakash,
Qiuren Fang,
Hanqing Jiang,
Bharath Hariharan,
Serge Belongie
Abstract:
Estimating fundamental matrices is a classic problem in computer vision. Traditional methods rely heavily on the correctness of estimated key-point correspondences, which can be noisy and unreliable. As a result, it is difficult for these methods to handle image pairs with large occlusion or significantly different camera poses. In this paper, we propose novel neural network architectures to estimate fundamental matrices in an end-to-end manner without relying on point correspondences. New modules and layers are introduced in order to preserve mathematical properties of the fundamental matrix as a homogeneous rank-2 matrix with seven degrees of freedom. We analyze performance of the proposed models using various metrics on the KITTI dataset, and show that they achieve competitive performance with traditional methods without the need for extracting correspondences.
Submitted 2 October, 2018;
originally announced October 2018.
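The abstract lists three constraints on the output: homogeneous, rank 2, and seven degrees of freedom. One way to hard-wire them into a network output is to make the third column a linear combination of the first two and then normalize the scale. The sketch below is a hypothetical parametrization of that kind. The abstract does not specify the layers the paper actually uses.

```python
import math


def det3(m):
    # Determinant of a 3x3 matrix by cofactor expansion along the first row.
    return (m[0][0] * (m[1][1] * m[2][2] - m[1][2] * m[2][1])
            - m[0][1] * (m[1][0] * m[2][2] - m[1][2] * m[2][0])
            + m[0][2] * (m[1][0] * m[2][1] - m[1][1] * m[2][0]))


def rank2_fundamental(params):
    # Hypothetical parametrization: 8 raw outputs -> rank-2, unit-norm 3x3 matrix.
    a, b, c, d, e, f, alpha, beta = params
    m = [[a, b, alpha * a + beta * b],
         [c, d, alpha * c + beta * d],
         [e, f, alpha * e + beta * f]]
    # Dividing by the Frobenius norm removes the arbitrary overall scale.
    norm = math.sqrt(sum(x * x for row in m for x in row))
    return [[x / norm for x in row] for row in m]
```

Eight raw numbers minus one for the arbitrary scale leaves seven degrees of freedom. The determinant vanishes by construction, so the rank-2 property never has to be learned.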
-
Resource Aware Person Re-identification across Multiple Resolutions
Authors:
Yan Wang,
Lequn Wang,
Yurong You,
Xu Zou,
Vincent Chen,
Serena Li,
Gao Huang,
Bharath Hariharan,
Kilian Q. Weinberger
Abstract:
Not all people are equally easy to identify: color statistics might be enough for some cases while others might require careful reasoning about high- and low-level details. However, prevailing person re-identification (re-ID) methods use one-size-fits-all high-level embeddings from deep convolutional networks for all cases. This might limit their accuracy on difficult examples or make them needlessly expensive for the easy ones. To remedy this, we present a new person re-ID model that combines effective embeddings built on multiple convolutional network layers, trained with deep supervision. On traditional re-ID benchmarks, our method improves substantially over the previous state-of-the-art results on all five datasets that we evaluate on. We then propose two new formulations of the person re-ID problem under resource constraints, and show how our model can be used to effectively trade off accuracy and computation in the presence of resource constraints. Code and pre-trained models are available at https://github.com/mileyan/DARENet.
Submitted 1 October, 2018; v1 submitted 22 May, 2018;
originally announced May 2018.
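The accuracy/computation trade-off can be pictured as an early-exit cascade. Matching starts with a cheap, shallow embedding, and deeper embeddings are computed only when the top two gallery distances are too close to call. This is an assumed illustration: the margin rule, the function `cascade_match`, and the per-stage inputs are hypothetical and are not the formulations proposed in the paper.

```python
import math


def l2(a, b):
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def cascade_match(query_embed_fns, gallery_stages, margin):
    """Return (best gallery index, stage used).

    query_embed_fns[s]() computes the query embedding at stage s; deeper
    stages are assumed to be more expensive, so they are called only if needed.
    gallery_stages[s][g] is the precomputed stage-s embedding of gallery item g
    (at least two gallery items are required).
    """
    last = len(query_embed_fns) - 1
    for s, embed in enumerate(query_embed_fns):
        q = embed()
        ranked = sorted((l2(q, g), idx) for idx, g in enumerate(gallery_stages[s]))
        (d1, best), (d2, _) = ranked[0], ranked[1]
        # Stop early when the best match is clearly separated from the runner-up.
        if d2 - d1 >= margin or s == last:
            return best, s
```

Easy queries exit after the first stage, while ambiguous ones pay for deeper features. Raising `margin` shifts work toward the expensive stages in exchange for accuracy.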
-
Was the cosmic ray burst detected by the GRAPES-3 on 22 June 2015 caused by transient weakening of geomagnetic field or by an interplanetary anisotropy?
Authors:
P. K. Mohanty,
K. P. Arunbabu,
T. Aziz,
S. R. Dugad,
S. K. Gupta,
B. Hariharan,
P. Jagadeesan,
A. Jain,
S. D. Morris,
P. K. Nayak,
P. S. Rakshe,
K. Ramesh,
B. S. Rao,
M. Zuberi,
Y. Hayashi,
S. Kawakami,
P. Subramanian,
S. Raha,
S. Ahmad,
A. Oshima,
S. Shibata,
H. Kojima
Abstract:
The GRAPES-3 muon telescope in Ooty, India had claimed that a 2 hour (h) high-energy ($\sim$20 GeV) burst of galactic cosmic rays (GCRs), detected through a $>$50$\sigma$ surge in GeV muons, was caused by reconnection of the interplanetary magnetic field (IMF) in the magnetosphere, which led to a transient weakening of Earth's magnetic shield. This burst had occurred during a G4-class geomagnetic storm (storm) with a delay of $\frac{1}{2}$ h relative to the coronal mass ejection (CME) of 22 June 2015 (Mohanty et al., 2016). However, a group recently interpreted the occurrence of the same burst in a subset of 31 neutron monitors (NMs) as the result of an anisotropy in interplanetary space (Evenson et al., 2017), in contrast to the claim in (Mohanty et al., 2016). A new analysis of the GRAPES-3 data with a fine 10.6$^{\circ}$ angular segmentation shows the interplanetary-anisotropy interpretation to be incorrect, and offers a possible explanation of the NM observations. The observed 28 minute (min) delay of the burst relative to the CME can be explained by the movement of the reconnection front from the bow shock to the surface of Earth at an average speed of 35 km/s, much lower than the CME speed of 700 km/s. This measurement may provide a more accurate estimate of the start of the storm.
Submitted 28 March, 2018;
originally announced March 2018.
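Here is a back-of-the-envelope check of the quoted numbers. It assumes the entire 28 min delay is travel time of the reconnection front at its 35 km/s average speed.

```python
EARTH_RADIUS_KM = 6371.0  # mean Earth radius


def travel_distance_km(delay_min, speed_km_s):
    # Distance covered at a constant average speed over the delay.
    return delay_min * 60.0 * speed_km_s


distance_km = travel_distance_km(28, 35.0)  # 58,800 km
earth_radii = distance_km / EARTH_RADIUS_KM  # about 9.2 Earth radii
cme_time_s = distance_km / 700.0             # 84 s at the quoted CME speed
```

Under this assumption, the front covers about 9.2 Earth radii. At the CME speed, the same distance would take under a minand a half, which is why the slower front accounts for the delay.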
-
Low-Shot Learning from Imaginary Data
Authors:
Yu-Xiong Wang,
Ross Girshick,
Martial Hebert,
Bharath Hariharan
Abstract:
Humans can quickly learn new visual concepts, perhaps because they can easily visualize or imagine what novel objects look like from different views. Incorporating this ability to hallucinate novel instances of new concepts might help machine vision systems perform better at low-shot learning, i.e., learning concepts from few examples. We present a novel approach to low-shot learning that uses this idea. Our approach builds on recent progress in meta-learning ("learning to learn") by combining a meta-learner with a "hallucinator" that produces additional training examples, and optimizing both models jointly. Our hallucinator can be incorporated into a variety of meta-learners and provides significant gains: up to a 6 point boost in classification accuracy when only a single training example is available, yielding state-of-the-art performance on the challenging ImageNet low-shot classification benchmark.
Submitted 2 April, 2018; v1 submitted 16 January, 2018;
originally announced January 2018.
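The hallucination idea can be sketched by augmenting each class's few examples with generated ones before a prototype-based classifier (in the style of prototypical networks) averages them. Here Gaussian noise around a seed example is a hypothetical stand-in for the learned hallucinator. The joint meta-training described in the abstract is not shown.

```python
import random


def hallucinate(seed, n, rng, scale=0.1):
    # Hypothetical stand-in for a learned hallucinator G(seed, noise).
    return [[x + scale * rng.gauss(0.0, 1.0) for x in seed] for _ in range(n)]


def prototypes(support, n_halluc, rng):
    # support: {label: [feature vectors]} -> {label: mean of real + hallucinated}.
    protos = {}
    for label, examples in support.items():
        augmented = list(examples)
        for ex in examples:
            augmented.extend(hallucinate(ex, n_halluc, rng))
        dim = len(augmented[0])
        protos[label] = [sum(e[i] for e in augmented) / len(augmented) for i in range(dim)]
    return protos


def classify(x, protos):
    # Nearest prototype under squared Euclidean distance.
    return min(protos, key=lambda lbl: sum((a - b) ** 2 for a, b in zip(x, protos[lbl])))
```

With random noise, the hallucinated examples barely move the prototype. The benefit reported in the abstract comes from training the hallucinator jointly with the meta-learner, so that the generated examples are the ones most useful to the classifier.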
-
Low-shot learning with large-scale diffusion
Authors:
Matthijs Douze,
Arthur Szlam,
Bharath Hariharan,
Hervé Jégou
Abstract:
This paper considers the problem of inferring image labels from images when only a few annotated examples are available at training time. This setup is often referred to as low-shot learning, where a standard approach is to re-train the last few layers of a convolutional neural network learned on separate classes for which training examples are abundant. We consider a semi-supervised setting based on a large collection of images to support label propagation. This is possible by leveraging the recent advances in large-scale similarity graph construction.
We show that despite its conceptual simplicity, scaling label propagation up to hundreds of millions of images leads to state-of-the-art accuracy in the low-shot learning regime.
Submitted 15 June, 2018; v1 submitted 7 June, 2017;
originally announced June 2017.
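Label propagation over a similarity graph can be sketched as an iterative diffusion. Unlabeled nodes repeatedly average their neighbors' class scores, while labeled seeds are pulled back toward their labels. This toy uses a random-walk-normalized update on a hand-built graph. It does not reproduce the paper's large-scale graph construction or exact update, and `alpha` and `iters` are assumed values.

```python
def label_propagation(neighbors, seeds, n_classes, alpha=0.8, iters=50):
    """Diffuse seed labels over a graph.

    neighbors: {node: [adjacent nodes]} for an undirected similarity graph.
    seeds: {node: class index} for the few labeled nodes.
    Returns {node: predicted class index}.
    """
    nodes = list(neighbors)
    y0 = {n: [1.0 if seeds.get(n) == c else 0.0 for c in range(n_classes)] for n in nodes}
    y = {n: list(v) for n, v in y0.items()}
    for _ in range(iters):
        new_y = {}
        for n in nodes:
            nbrs = neighbors[n]
            if nbrs:
                avg = [sum(y[m][c] for m in nbrs) / len(nbrs) for c in range(n_classes)]
            else:
                avg = [0.0] * n_classes
            # Mix the neighbour average with the node's own seed vector.
            new_y[n] = [alpha * a + (1 - alpha) * s for a, s in zip(avg, y0[n])]
        y = new_y
    return {n: max(range(n_classes), key=lambda c: y[n][c]) for n in nodes}
```

Take a chain 0-1-2-3 with node 0 seeded as class 0 and node 3 as class 1. Each interior node then takes the label of its nearer seed. At scale, the same update runs over a k-nearest-neighbor graph built from image descriptors.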
-
Inferring and Executing Programs for Visual Reasoning
Authors:
Justin Johnson,
Bharath Hariharan,
Laurens van der Maaten,
Judy Hoffman,
Li Fei-Fei,
C. Lawrence Zitnick,
Ross Girshick
Abstract:
Existing methods for visual reasoning attempt to directly map inputs to outputs using black-box architectures without explicitly modeling the underlying reasoning processes. As a result, these black-box models often learn to exploit biases in the data rather than learning to perform visual reasoning. Inspired by module networks, this paper proposes a model for visual reasoning that consists of a program generator that constructs an explicit representation of the reasoning process to be performed, and an execution engine that executes the resulting program to produce an answer. Both the program generator and the execution engine are implemented by neural networks, and are trained using a combination of backpropagation and REINFORCE. Using the CLEVR benchmark for visual reasoning, we show that our model significantly outperforms strong baselines and generalizes better in a variety of settings.
Submitted 10 May, 2017;
originally announced May 2017.
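A symbolic executor running a linear program over a toy scene can illustrate the split between a program generator and an execution engine. In the paper, both components are neural networks. Here the modules are hand-written Python functions whose names are modeled loosely on CLEVR-style functional programs, and the program is given rather than generated.

```python
SCENE = [
    {"color": "red", "shape": "cube"},
    {"color": "red", "shape": "sphere"},
    {"color": "blue", "shape": "cube"},
]

# Each module maps (previous output, scene, argument) to a new output.
MODULES = {
    "scene": lambda objs, scene, arg: list(scene),
    "filter_color": lambda objs, scene, arg: [o for o in objs if o["color"] == arg],
    "filter_shape": lambda objs, scene, arg: [o for o in objs if o["shape"] == arg],
    "count": lambda objs, scene, arg: len(objs),
}


def execute(program, scene):
    # Run a linear chain of (module name, argument) steps in order.
    value = None
    for name, arg in program:
        value = MODULES[name](value, scene, arg)
    return value


# "How many red cubes are there?"
program = [("scene", None), ("filter_color", "red"), ("filter_shape", "cube"), ("count", None)]
```

Because the program is explicit, the sequence of steps that produced an answer can be inspected. This is the property the abstract contrasts with black-box models. Real CLEVR programs can also branch into trees, which this linear executor does not handle.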
-
CLEVR: A Diagnostic Dataset for Compositional Language and Elementary Visual Reasoning
Authors:
Justin Johnson,
Bharath Hariharan,
Laurens van der Maaten,
Li Fei-Fei,
C. Lawrence Zitnick,
Ross Girshick
Abstract:
When building artificial intelligence systems that can reason and answer questions about visual data, we need diagnostic tests to analyze our progress and discover shortcomings. Existing benchmarks for visual question answering can help, but have strong biases that models can exploit to correctly answer questions without reasoning. They also conflate multiple sources of error, making it hard to pinpoint model weaknesses. We present a diagnostic dataset that tests a range of visual reasoning abilities. It contains minimal biases and has detailed annotations describing the kind of reasoning each question requires. We use this dataset to analyze a variety of modern visual reasoning systems, providing novel insights into their abilities and limitations.
Submitted 20 December, 2016;
originally announced December 2016.
-
Learning Features by Watching Objects Move
Authors:
Deepak Pathak,
Ross Girshick,
Piotr Dollár,
Trevor Darrell,
Bharath Hariharan
Abstract:
This paper presents a novel yet intuitive approach to unsupervised feature learning. Inspired by the human visual system, we explore whether low-level motion-based grouping cues can be used to learn an effective visual representation. Specifically, we use unsupervised motion-based segmentation on videos to obtain segments, which we use as 'pseudo ground truth' to train a convolutional network to segment objects from a single frame. Given the extensive evidence that motion plays a key role in the development of the human visual system, we hope that this straightforward approach to unsupervised learning will be more effective than cleverly designed 'pretext' tasks studied in the literature. Indeed, our extensive experiments show that this is the case. When used for transfer learning on object detection, our representation significantly outperforms previous unsupervised approaches across multiple settings, especially when training data for the target task is scarce.
Submitted 12 April, 2017; v1 submitted 19 December, 2016;
originally announced December 2016.
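The pseudo-ground-truth pipeline can be sketched in two steps: derive a foreground mask from motion, then score a single-frame segmentation prediction against it with per-pixel binary cross-entropy. Thresholding optical-flow magnitude is a crude, hypothetical stand-in for the unsupervised motion segmentation used in the paper. The flow field and the predicted probabilities are assumed inputs.

```python
import math


def motion_pseudo_mask(flow, threshold):
    # flow[i][j] = (dx, dy); pixels moving faster than threshold count as object.
    return [[1 if math.hypot(dx, dy) > threshold else 0 for dx, dy in row] for row in flow]


def pixel_bce(pred_probs, mask, eps=1e-7):
    # Mean binary cross-entropy between predicted foreground probabilities
    # (from a single still frame) and the motion-derived pseudo mask.
    total, n = 0.0, 0
    for prow, mrow in zip(pred_probs, mask):
        for p, m in zip(prow, mrow):
            p = min(max(p, eps), 1 - eps)
            total += -(m * math.log(p) + (1 - m) * math.log(1 - p))
            n += 1
    return total / n
```

Motion is used only to create targets. The network itself sees a single frame, so at test time it must recognize objects from appearance alone.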
-
Feature Pyramid Networks for Object Detection
Authors:
Tsung-Yi Lin,
Piotr Dollár,
Ross Girshick,
Kaiming He,
Bharath Hariharan,
Serge Belongie
Abstract:
Feature pyramids are a basic component in recognition systems for detecting objects at different scales. But recent deep learning object detectors have avoided pyramid representations, in part because they are compute and memory intensive. In this paper, we exploit the inherent multi-scale, pyramidal hierarchy of deep convolutional networks to construct feature pyramids with marginal extra cost. A top-down architecture with lateral connections is developed for building high-level semantic feature maps at all scales. This architecture, called a Feature Pyramid Network (FPN), shows significant improvement as a generic feature extractor in several applications. Using FPN in a basic Faster R-CNN system, our method achieves state-of-the-art single-model results on the COCO detection benchmark without bells and whistles, surpassing all existing single-model entries including those from the COCO 2016 challenge winners. In addition, our method can run at 5 FPS on a GPU and thus is a practical and accurate solution to multi-scale object detection. Code will be made publicly available.
Submitted 19 April, 2017; v1 submitted 9 December, 2016;
originally announced December 2016.
-
Low-shot Visual Recognition by Shrinking and Hallucinating Features
Authors:
Bharath Hariharan,
Ross Girshick
Abstract:
Low-shot visual learning---the ability to recognize novel object categories from very few examples---is a hallmark of human visual intelligence. Existing machine learning approaches fail to generalize in the same way. To make progress on this foundational problem, we present a low-shot learning benchmark on complex images that mimics challenges faced by recognition systems in the wild. We then propose a) representation regularization techniques, and b) techniques to hallucinate additional training examples for data-starved classes. Together, our methods improve the effectiveness of convolutional networks in low-shot learning, improving the one-shot accuracy on novel classes by 2.3x on the challenging ImageNet dataset.
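The idea of hallucinating extra training examples for a data-starved class can be illustrated with a fixed analogy rule: take the offset between two feature vectors of a data-rich base class and apply it to the novel-class seed. This is a hypothetical simplification, assumed for illustration only; the abstract describes learned hallucination techniques, which this linear rule does not reproduce. Feature vectors are plain lists and all names are illustrative.

```python
def hallucinate(seed, base_a, base_b):
    """Apply the offset between two base-class features (base_b - base_a)
    to a novel-class seed feature. A fixed linear rule, not a learned generator."""
    return [s + (b - a) for s, a, b in zip(seed, base_a, base_b)]

def augment_novel_class(seeds, base_pairs, per_seed):
    """Return the real seed features plus up to `per_seed` hallucinated
    features per seed, one for each of the first `per_seed` base pairs."""
    augmented = [list(s) for s in seeds]
    for seed in seeds:
        for base_a, base_b in base_pairs[:per_seed]:
            augmented.append(hallucinate(seed, base_a, base_b))
    return augmented

one_shot = [[0.0, 1.0]]                      # a single novel-class example
pairs = [([0.0, 0.0], [1.0, 0.0]),           # hypothetical base-class variations
         ([0.0, 0.0], [0.0, 0.5])]
augmented = augment_novel_class(one_shot, pairs, per_seed=2)
print(augmented)  # [[0.0, 1.0], [1.0, 1.0], [0.0, 1.5]]
```

A classifier for the novel class would then be trained on `augmented` rather than on the single real example.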
Submitted 4 November, 2017; v1 submitted 9 June, 2016;
originally announced June 2016.
-
Iterative Instance Segmentation
Authors:
Ke Li,
Bharath Hariharan,
Jitendra Malik
Abstract:
Existing methods for pixel-wise labelling tasks generally disregard the underlying structure of labellings, often leading to predictions that are visually implausible. While incorporating structure into the model should improve prediction quality, doing so is challenging - manually specifying the form of structural constraints may be impractical and inference often becomes intractable even if structural constraints are given. We sidestep this problem by reducing structured prediction to a sequence of unconstrained prediction problems and demonstrate that this approach is capable of automatically discovering priors on shape, contiguity of region predictions and smoothness of region contours from data without any a priori specification. On the instance segmentation task, this method outperforms the state-of-the-art, achieving a mean $\mathrm{AP}^{r}$ of 63.6% at 50% overlap and 43.3% at 70% overlap.
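Reducing structured prediction to a sequence of unconstrained predictions means repeatedly re-predicting every pixel independently, each time conditioned on the previous full mask. The sketch below shows that loop. The refinement function is a hypothetical hand-written rule (own score plus neighbourhood support) standing in for the learned network; it is included only to show how conditioning on the previous output can encode a contiguity preference without explicit structural constraints.

```python
def refine_step(unary, mask):
    """One hypothetical refinement step: a pixel is foreground when its own
    score plus the fraction of its 4-neighbours currently marked foreground
    exceeds 1. The learned network in the abstract is replaced by this rule."""
    h, w = len(mask), len(mask[0])
    new_mask = []
    for y in range(h):
        row = []
        for x in range(w):
            nbrs = [mask[j][i] for j, i in ((y - 1, x), (y + 1, x), (y, x - 1), (y, x + 1))
                    if 0 <= j < h and 0 <= i < w]
            support = sum(nbrs) / len(nbrs) if nbrs else 0.0
            row.append(1 if unary[y][x] + support > 1.0 else 0)
        new_mask.append(row)
    return new_mask

def iterative_predict(unary, steps=3):
    """Start from an independent per-pixel prediction, then repeatedly
    re-predict each pixel conditioned on the previous full mask."""
    mask = [[1 if u > 0.5 else 0 for u in row] for row in unary]
    for _ in range(steps):
        mask = refine_step(unary, mask)
    return mask

# A confident object with one weak pixel in the middle: the hole gets filled.
scores = [[0.9, 0.9, 0.9], [0.9, 0.3, 0.9], [0.9, 0.9, 0.9]]
print(iterative_predict(scores))  # [[1, 1, 1], [1, 1, 1], [1, 1, 1]]
```

Each step is itself an ordinary per-pixel prediction, so no structured inference is needed at any point.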
Submitted 10 June, 2016; v1 submitted 26 November, 2015;
originally announced November 2015.
-
Exploring Person Context and Local Scene Context for Object Detection
Authors:
Saurabh Gupta,
Bharath Hariharan,
Jitendra Malik
Abstract:
In this paper we explore two ways of using context for object detection. The first model focusses on people and the objects they commonly interact with, such as fashion and sports accessories. The second model considers more general object detection and uses the spatial relationships between objects and between objects and scenes. Our models are able to capture precise spatial relationships between the context and the object of interest, and make effective use of the appearance of the contextual region. On the newly released COCO dataset, our models provide relative improvements of up to 5% over CNN-based state-of-the-art detectors, with the gains concentrated on hard cases such as small objects (10% relative improvement).
Submitted 25 November, 2015;
originally announced November 2015.
-
DeepBox: Learning Objectness with Convolutional Networks
Authors:
Weicheng Kuo,
Bharath Hariharan,
Jitendra Malik
Abstract:
Existing object proposal approaches use primarily bottom-up cues to rank proposals, while we believe that objectness is in fact a high level construct. We argue for a data-driven, semantic approach for ranking object proposals. Our framework, which we call DeepBox, uses convolutional neural networks (CNNs) to rerank proposals from a bottom-up method. We use a novel four-layer CNN architecture that is as good as much larger networks on the task of evaluating objectness while being much faster. We show that DeepBox significantly improves over the bottom-up ranking, achieving the same recall with 500 proposals as achieved by bottom-up methods with 2000. This improvement generalizes to categories the CNN has never seen before and leads to a 4.5-point gain in detection mAP. Our implementation achieves this performance while running at 260 ms per image.
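The claim that reranking lets fewer proposals reach the same recall can be made concrete with a recall-at-k measurement. The sketch below reranks bottom-up proposals by an objectness score and counts ground-truth boxes covered by the top k. The scores are a hypothetical lookup table standing in for DeepBox's CNN, boxes use continuous (x1, y1, x2, y2) coordinates, and the IoU threshold of 0.5 is an assumption for illustration.

```python
def box_iou(a, b):
    """Intersection-over-union of two boxes given as (x1, y1, x2, y2)."""
    ix1, iy1 = max(a[0], b[0]), max(a[1], b[1])
    ix2, iy2 = min(a[2], b[2]), min(a[3], b[3])
    inter = max(0.0, ix2 - ix1) * max(0.0, iy2 - iy1)
    def area(r):
        return (r[2] - r[0]) * (r[3] - r[1])
    union = area(a) + area(b) - inter
    return inter / union if union > 0 else 0.0

def rerank(proposals, objectness):
    """Re-order bottom-up proposals by an objectness score, highest first."""
    return sorted(proposals, key=objectness, reverse=True)

def recall_at_k(ranked, gt_boxes, k, thresh=0.5):
    """Fraction of ground-truth boxes covered (IoU >= thresh) by the top-k proposals."""
    top = ranked[:k]
    hits = sum(1 for g in gt_boxes if any(box_iou(p, g) >= thresh for p in top))
    return hits / len(gt_boxes)

gt = [(0, 0, 10, 10)]
bottom_up = [(50, 50, 60, 60), (1, 0, 10, 10)]        # good box ranked second
scores = {(50, 50, 60, 60): 0.1, (1, 0, 10, 10): 0.9}  # hypothetical CNN scores
print(recall_at_k(bottom_up, gt, k=1))                       # 0.0
print(recall_at_k(rerank(bottom_up, scores.get), gt, k=1))   # 1.0
```

Reranking changes only the order of the proposals, not the proposals themselves, which is why recall at a fixed large k is unchanged while recall at small k improves.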
Submitted 26 September, 2015; v1 submitted 8 May, 2015;
originally announced May 2015.
-
Hypercolumns for Object Segmentation and Fine-grained Localization
Authors:
Bharath Hariharan,
Pablo Arbeláez,
Ross Girshick,
Jitendra Malik
Abstract:
Recognition algorithms based on convolutional networks (CNNs) typically use the output of the last layer as a feature representation. However, the information in this layer may be too coarse to allow precise localization. Conversely, earlier layers may be precise in localization but will not capture semantics. To get the best of both worlds, we define the hypercolumn at a pixel as the vector of activations of all CNN units above that pixel. Using hypercolumns as pixel descriptors, we show results on three fine-grained localization tasks: simultaneous detection and segmentation [22], where we improve the state of the art from 49.7 [22] mean $\mathrm{AP}^{r}$ to 60.0; keypoint localization, where we get a 3.3 point boost over [20]; and part labeling, where we show a 6.6 point gain over a strong baseline.
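The hypercolumn definition can be sketched directly: for an image pixel, collect the activation of every channel of every layer at the location that pixel maps to. In this sketch each layer is a hypothetical `(stride, channels)` pair of plain lists, and coarse maps are read with nearest-neighbour lookup (integer division by the stride) for brevity; the interpolation used in the paper is not reproduced.

```python
def hypercolumn(layers, y, x):
    """Stack the activations of every channel of every layer at image pixel (y, x).

    layers: list of (stride, channels); each channel is a 2-D map (list of rows)
    whose resolution is the image resolution divided by `stride`.
    Coarse maps are sampled with nearest-neighbour lookup (integer division)."""
    vector = []
    for stride, channels in layers:
        for channel in channels:
            vector.append(channel[y // stride][x // stride])
    return vector

fine = (1, [[[r * 4 + c for c in range(4)] for r in range(4)]])  # one 4x4 channel
coarse = (2, [[[1, 2], [3, 4]], [[10, 20], [30, 40]]])            # two 2x2 channels
print(hypercolumn([fine, coarse], 3, 2))  # [14, 4, 40]
```

The descriptor length equals the total channel count across layers, so fine layers contribute precise location while coarse layers contribute semantics.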
Submitted 25 April, 2015; v1 submitted 20 November, 2014;
originally announced November 2014.
-
Simultaneous Detection and Segmentation
Authors:
Bharath Hariharan,
Pablo Arbeláez,
Ross Girshick,
Jitendra Malik
Abstract:
We aim to detect all instances of a category in an image and, for each instance, mark the pixels that belong to it. We call this task Simultaneous Detection and Segmentation (SDS). Unlike classical bounding box detection, SDS requires a segmentation and not just a box. Unlike classical semantic segmentation, we require individual object instances. We build on recent work that uses convolutional neural networks to classify category-independent region proposals (R-CNN [16]), introducing a novel architecture tailored for SDS. We then use category-specific, top-down figure-ground predictions to refine our bottom-up proposals. We show a 7 point boost (16% relative) over our baselines on SDS, a 5 point boost (10% relative) over state-of-the-art on semantic segmentation, and state-of-the-art performance in object detection. Finally, we provide diagnostic tools that unpack performance and provide directions for future work.
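Why SDS "requires a segmentation and not just a box" can be shown with region IoU, the mask-level overlap measure behind the mean $\mathrm{AP}^{r}$ figures quoted for SDS in this listing. In the sketch below, two masks have identical bounding boxes yet share no pixels, so a box-level criterion would call them a perfect match while the region criterion scores zero. Masks are plain lists of 0/1 rows; function names are illustrative.

```python
def mask_iou(a, b):
    """Region IoU between two equally sized binary masks (lists of 0/1 rows)."""
    pairs = [(p, q) for row_a, row_b in zip(a, b) for p, q in zip(row_a, row_b)]
    inter = sum(1 for p, q in pairs if p and q)
    union = sum(1 for p, q in pairs if p or q)
    return inter / union if union else 0.0

def bounding_box(mask):
    """Tight (row_min, col_min, row_max, col_max) box around the foreground pixels."""
    cells = [(r, c) for r, row in enumerate(mask) for c, v in enumerate(row) if v]
    rows = [r for r, _ in cells]
    cols = [c for _, c in cells]
    return min(rows), min(cols), max(rows), max(cols)

diag = [[1, 0], [0, 1]]
anti = [[0, 1], [1, 0]]
print(bounding_box(diag) == bounding_box(anti))  # True: identical boxes
print(mask_iou(diag, anti))                      # 0.0: no shared pixels
```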
Submitted 7 July, 2014;
originally announced July 2014.
-
R-CNNs for Pose Estimation and Action Detection
Authors:
Georgia Gkioxari,
Bharath Hariharan,
Ross Girshick,
Jitendra Malik
Abstract:
We present convolutional neural networks for the tasks of keypoint (pose) prediction and action classification of people in unconstrained images. Our approach involves training an R-CNN detector with loss functions depending on the task being tackled. We evaluate our method on the challenging PASCAL VOC dataset and compare it to previous leading approaches. Our method gives state-of-the-art results for keypoint and action prediction. Additionally, we introduce a new dataset for action detection, the task of simultaneously localizing people and classifying their actions, and present results using our approach.
Submitted 19 June, 2014;
originally announced June 2014.
-
Modeling gamma ray production from proton-proton interactions in high-energy astrophysical environments
Authors:
Dimitra Atri,
B. Hariharan
Abstract:
Gamma rays are the best probes to study high-energy particle interactions occurring in astrophysical environments. Space-based instruments such as the Fermi Large Area Telescope (Fermi LAT) and ground-based experiments such as VERITAS, H.E.S.S. and MAGIC have provided us with valuable data on various production mechanisms of gamma rays within our Galaxy and beyond. Depending on astronomical conditions, gamma rays can be produced either by hadronic or leptonic interactions. In this paper, we probe the production of gamma rays by the hadronic channel, where gamma rays are primarily produced by the decay of secondary neutral pions and $\eta$ mesons from proton-proton interactions over a wide energy range. We use state-of-the-art high-energy hadronic interaction models, calibrated with the new LHC results and widely used in ground-based ultra-high energy air shower experiments. We also compare the SIBYLL 2.1, QGSJET-II-04 and EPOS LHC models and provide lookup tables which can be used by researchers to model gamma ray production from the hadronic channel and ultimately extract the underlying proton spectrum from gamma ray observations.
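As a kinematics reminder for the dominant step in the hadronic channel (standard two-body decay, not the paper's interaction models or lookup tables): a neutral pion of total energy $E$ and momentum $p$ decays into two photons, each carrying $m_{\pi^0}/2$ in the pion rest frame, so in the lab frame each photon energy lies between $(E - p)/2$ and $(E + p)/2$. The sketch computes those limits in natural units ($c = 1$), using the neutral pion mass of 134.9768 MeV.

```python
import math

M_PI0 = 134.9768  # neutral pion rest mass in MeV

def pi0_photon_energy_range(e_pion):
    """Minimum and maximum lab-frame energy (MeV) of either photon from
    pi0 -> 2 gamma, for a neutral pion of total lab energy e_pion (MeV)."""
    if e_pion < M_PI0:
        raise ValueError("total energy cannot be below the rest mass")
    p = math.sqrt(e_pion ** 2 - M_PI0 ** 2)  # pion momentum, natural units (c = 1)
    return (e_pion - p) / 2.0, (e_pion + p) / 2.0

print(pi0_photon_energy_range(M_PI0))  # (67.4884, 67.4884): pion at rest
lo, hi = pi0_photon_energy_range(1000.0)
```

For an isotropic decay in the rest frame, the photon spectrum is flat between these two limits; folding it over the pion spectrum produced in proton-proton collisions gives the hadronic gamma ray spectrum.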
Submitted 9 September, 2013;
originally announced September 2013.
-
Galactic cosmic ray induced radiation dose on terrestrial exoplanets
Authors:
Dimitra Atri,
B. Hariharan,
Jean-Mathias Griessmeier
Abstract:
This past decade has seen tremendous advancements in the study of extrasolar planets. Observations are now made with increasing sophistication from both ground and space-based instruments, and exoplanets are characterized with increasing precision. There is a class of particularly interesting exoplanets, falling in the habitable zone, which is defined as the area around a star where the planet is capable of supporting liquid water on its surface. Theoretical calculations also suggest that close-in exoplanets are more likely to have weaker planetary magnetic fields, especially in the case of super-Earths. Such exoplanets are subjected to a high flux of Galactic Cosmic Rays (GCRs) due to their weak magnetic moments. GCRs are energetic particles of astrophysical origin, which strike the planetary atmosphere and produce secondary particles, including muons, which are highly penetrating. Some of these particles reach the planetary surface and contribute to the radiation dose. Along with the magnetic field, another factor governing the radiation dose is the depth of the planetary atmosphere: the greater the atmospheric depth, the lower the flux of secondary particles at the surface. If the secondary particles are energetic enough, and their flux is sufficiently high, the radiation from muons can also impact sub-surface regions, as in the case of Mars. If the radiation dose is too high, the chances of sustaining a long-term biosphere on the planet are very low. We explore the dependence of the GCR-induced radiation dose on the strength of the planetary magnetic field and its atmospheric depth, finding that the latter is the decisive factor for the protection of a planetary biosphere.
Submitted 16 September, 2013; v1 submitted 17 July, 2013;
originally announced July 2013.