
arXiv:1908.11412

GeoStyle: Discovering Fashion Trends and Events

Authors: Utkarsh Mall, Kevin Matzen, Bharath Hariharan, Noah Snavely, Kavita Bala

Abstract: Understanding fashion styles and trends is of great potential interest to retailers and consumers alike. The photos people upload to social media are a historical and public data source of how people dress across the world and at different times. While we now have tools to automatically recognize the clothing and style attributes of what people are wearing in these photographs, we lack the ability to analyze spatial and temporal trends in these attributes or make predictions about the future. In this paper, we address this need by providing an automatic framework that analyzes large corpora of street imagery to (a) discover and forecast long-term trends of various fashion attributes as well as automatically discovered styles, and (b) identify spatio-temporally localized events that affect what people wear. We show that our framework makes long-term trend forecasts that are >20% more accurate than the prior art, and identifies hundreds of socially meaningful events that impact fashion across the globe.

Submitted 29 August, 2019; originally announced August 2019.

Comments: Accepted in ICCV 2019
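As a rough illustration of long-term trend forecasting, the sketch below fits a linear trend plus one annual sinusoid to a monthly attribute-frequency series by least squares and extrapolates it. This trend-plus-seasonality model is an assumption chosen as one plausible baseline, not necessarily the GeoStyle model; the function names and the synthetic series are hypothetical.

```python
import numpy as np

def design_matrix(t, period=12.0):
    """Columns: intercept, linear trend, and one annual sine/cosine pair."""
    w = 2.0 * np.pi * t / period
    return np.column_stack([np.ones_like(t), t, np.sin(w), np.cos(w)])

def fit_and_forecast(series, horizon, period=12.0):
    """Least-squares fit on months 0..len(series)-1, then extrapolate `horizon` months."""
    t = np.arange(len(series), dtype=float)
    coef, *_ = np.linalg.lstsq(design_matrix(t, period), series, rcond=None)
    t_future = np.arange(len(series), len(series) + horizon, dtype=float)
    return design_matrix(t_future, period) @ coef, coef

# Hypothetical monthly fraction of photos showing one attribute over three years.
months = np.arange(36, dtype=float)
observed = 0.10 + 0.002 * months + 0.03 * np.sin(2 * np.pi * months / 12)
forecast, coef = fit_and_forecast(observed, horizon=12)
print(np.round(coef, 4), np.round(forecast[:3], 4))
```

Because the toy series is generated from the same model family, the fit recovers its coefficients exactly; on real data the seasonal and trend terms would only approximate the signal.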

arXiv:1908.05948

doi 10.1007/s10686-019-09640-0

Modeling of rigidity dependent CORSIKA simulations for GRAPES-3

Authors: B. Hariharan, S. R. Dugad, S. K. Gupta, Y. Hayashi, S. S. R. Inbanathan, P. Jagadeesan, A. Jain, S. Kawakami, P. K. Mohanty, B. S. Rao

Abstract: The GRAPES-3 muon telescope located in Ooty, India records 4x10^9 muons daily. These muons are produced by interactions of primary cosmic rays (PCRs) in the atmosphere. The high muon statistics enable GRAPES-3 to make precise measurements of various Sun-induced phenomena, including coronal mass ejections (CMEs), Forbush decreases, geomagnetic storms (GMS), and atmospheric acceleration during the overhead passage of thunderclouds. However, understanding and interpreting the observed data requires Monte Carlo (MC) simulation of PCRs and of the subsequent development of showers in the atmosphere. CORSIKA is a standard MC simulation code widely used for this purpose. However, these simulations are time-consuming, as a large number of interactions and decays must be taken into account at various stages of shower development, from the top of the atmosphere down to ground level. Computing resources therefore become an important consideration, particularly when billions of PCRs must be simulated to match the high statistical accuracy of the data. During the GRAPES-3 simulations, it was observed that over 60% of simulated events never reach the Earth's atmosphere. The geomagnetic field (GMF) imposes a threshold on PCRs called the cutoff rigidity Rc, a direction-dependent parameter below which PCRs cannot reach the Earth's atmosphere. However, CORSIKA has no provision for setting a direction-dependent threshold. We have devised an efficient method that takes this Rc dependence into account. A reduction by a factor of ~3 in simulation time and ~2 in output data size was achieved for GRAPES-3 simulations. This method has been incorporated into CORSIKA from version v75600 onwards. Its detailed implementation, along with the potential benefits, is discussed in this work.

Submitted 16 August, 2019; originally announced August 2019.

Comments: Exp Astron (2019)
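The idea of a direction-dependent rigidity cut can be sketched as follows. Rigidity is momentum per unit charge, R = pc/(Ze); for p in GeV/c this is simply R[GV] = p/Z. A primary is kept only if its rigidity is at least the cutoff Rc for its arrival direction. The cutoff table, its 10-degree binning, and the function names are hypothetical placeholders, not the GRAPES-3 values or the CORSIKA implementation.

```python
import numpy as np

# Hypothetical cutoff-rigidity table Rc[zenith_bin, azimuth_bin] in GV, 10-degree bins.
# Real values would come from tracing particle trajectories through the geomagnetic field.
ZENITH_BIN_DEG = 10.0
AZIMUTH_BIN_DEG = 10.0

def rigidity_gv(momentum_gev, charge_z):
    """Rigidity R = pc / (Ze); with p in GeV/c this is R[GV] = p / Z."""
    return momentum_gev / charge_z

def cutoff_gv(rc_table, zenith_deg, azimuth_deg):
    """Look up the cutoff rigidity for each arrival direction (lower bin edge)."""
    i = np.minimum((zenith_deg // ZENITH_BIN_DEG).astype(int), rc_table.shape[0] - 1)
    j = (azimuth_deg % 360.0 // AZIMUTH_BIN_DEG).astype(int) % rc_table.shape[1]
    return rc_table[i, j]

def accept_primaries(rc_table, momentum_gev, charge_z, zenith_deg, azimuth_deg):
    """Boolean mask: True for primaries able to pass the geomagnetic field."""
    r = rigidity_gv(momentum_gev, charge_z)
    return r >= cutoff_gv(rc_table, zenith_deg, azimuth_deg)

# Toy usage: only accepted primaries would be handed to the (expensive) shower simulation.
rng = np.random.default_rng(0)
rc_table = rng.uniform(10.0, 40.0, size=(9, 36))   # hypothetical 0-90 deg x 0-360 deg grid
n = 100_000
momentum = rng.uniform(1.0, 100.0, n)               # GeV/c, toy flat spectrum
charge = np.ones(n)                                  # protons
zenith = rng.uniform(0.0, 60.0, n)
azimuth = rng.uniform(0.0, 360.0, n)
keep = accept_primaries(rc_table, momentum, charge, zenith, azimuth)
print(f"simulating {keep.mean():.0%} of sampled primaries")
```

Rejecting a primary before its cascade is tracked is where the saving comes from: no shower is ever simulated for a particle that could not have reached the atmosphere.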

arXiv:1906.12320

PointFlow: 3D Point Cloud Generation with Continuous Normalizing Flows

Authors: Guandao Yang, Xun Huang, Zekun Hao, Ming-Yu Liu, Serge Belongie, Bharath Hariharan

Abstract: As 3D point clouds become the representation of choice for multiple vision and graphics applications, the ability to synthesize or reconstruct high-resolution, high-fidelity point clouds becomes crucial. Despite the recent success of deep learning models in discriminative tasks of point clouds, generating point clouds remains challenging. This paper proposes a principled probabilistic framework to generate 3D point clouds by modeling them as a distribution of distributions. Specifically, we learn a two-level hierarchy of distributions where the first level is the distribution of shapes and the second level is the distribution of points given a shape. This formulation allows us to both sample shapes and sample an arbitrary number of points from a shape. Our generative model, named PointFlow, learns each level of the distribution with a continuous normalizing flow. The invertibility of normalizing flows enables the computation of the likelihood during training and allows us to train our model in the variational inference framework. Empirically, we demonstrate that PointFlow achieves state-of-the-art performance in point cloud generation. We additionally show that our model can faithfully reconstruct point clouds and learn useful representations in an unsupervised manner. The code will be available at https://github.com/stevenygd/PointFlow.

Submitted 2 September, 2019; v1 submitted 28 June, 2019; originally announced June 2019.

Comments: Published in ICCV 2019
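The two-level "distribution of distributions" can be illustrated with a much simpler invertible map. In the sketch below a shape latent z is drawn first, then any number of points are drawn from a point distribution conditioned on z, and exact log-likelihoods follow from the change-of-variables formula. A conditional affine flow stands in for PointFlow's continuous normalizing flow, and `shape_params` is a hypothetical hand-written conditioning function rather than a learned network.

```python
import numpy as np

def shape_params(z):
    """Hypothetical map from a shape latent z (shape [2]) to per-axis
    log-scales and shifts of a 3D affine flow."""
    log_scale = np.array([0.5 * z[0], 0.5 * z[1], -0.2])
    shift = np.array([z[0], -z[1], 0.0])
    return log_scale, shift

def sample_points(z, n, rng):
    """Second level: draw n points x = exp(s) * u + t with u ~ N(0, I)."""
    log_scale, shift = shape_params(z)
    u = rng.standard_normal((n, 3))
    return np.exp(log_scale) * u + shift

def point_log_prob(x, z):
    """Exact log p(x | z): invert the flow to get u, evaluate the Gaussian base
    density, and subtract the log-determinant of the Jacobian (sum of log-scales)."""
    log_scale, shift = shape_params(z)
    u = (x - shift) * np.exp(-log_scale)
    log_base = -0.5 * np.sum(u**2, axis=-1) - 1.5 * np.log(2 * np.pi)
    return log_base - np.sum(log_scale)

rng = np.random.default_rng(0)
z = rng.standard_normal(2)                # first level: sample a shape latent
cloud = sample_points(z, 2048, rng)       # second level: as many points as desired
print(cloud.shape, point_log_prob(cloud, z).mean())
```

The same invertibility that gives exact likelihoods here is what the abstract relies on for training; a continuous flow replaces the fixed affine map with an ODE whose dynamics are learned.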

arXiv:1906.07079

Boosting Supervision with Self-Supervision for Few-shot Learning

Authors: Jong-Chyi Su, Subhransu Maji, Bharath Hariharan

Abstract: We present a technique to improve the transferability of deep representations learned on small labeled datasets by introducing self-supervised tasks as auxiliary loss functions. While recent approaches for self-supervised learning have shown the benefits of training on large unlabeled datasets, we find improvements in generalization even on small datasets and when combined with strong supervision. Learning representations with self-supervised losses reduces the relative error rate of a state-of-the-art meta-learner by 5-25% on several few-shot learning benchmarks, as well as that of off-the-shelf deep networks on standard classification tasks when training from scratch. We find the benefits of self-supervision increase with the difficulty of the task. Our approach utilizes the images within the dataset to construct self-supervised losses and hence is an effective way of learning transferable representations without relying on any external training data.

Submitted 17 June, 2019; originally announced June 2019.
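The core idea, an auxiliary self-supervised loss added to the supervised one, can be sketched as a weighted sum. Rotation prediction (rotate each image by 0, 90, 180 or 270 degrees and predict which) is used here as one example of a self-supervised task built from the dataset's own images; the weight `lam`, the helper names, and the random stand-in logits are illustrative assumptions.

```python
import numpy as np

def cross_entropy(logits, labels):
    """Mean softmax cross-entropy; logits [N, C], integer labels [N]."""
    shifted = logits - logits.max(axis=1, keepdims=True)
    log_probs = shifted - np.log(np.exp(shifted).sum(axis=1, keepdims=True))
    return -log_probs[np.arange(len(labels)), labels].mean()

def make_rotation_batch(images):
    """Self-supervised targets from the images themselves: each image in
    [N, H, W] is rotated by k * 90 degrees and labeled with k."""
    rotated = [np.rot90(images, k, axes=(1, 2)) for k in range(4)]
    labels = np.repeat(np.arange(4), len(images))
    return np.concatenate(rotated), labels

def total_loss(class_logits, class_labels, rot_logits, rot_labels, lam=1.0):
    """Supervised loss plus lam times the auxiliary self-supervised loss."""
    return cross_entropy(class_logits, class_labels) + lam * cross_entropy(rot_logits, rot_labels)

rng = np.random.default_rng(0)
images = rng.standard_normal((8, 32, 32))
rot_images, rot_labels = make_rotation_batch(images)   # 32 rotated copies
class_logits = rng.standard_normal((8, 5))              # stand-ins for network outputs
rot_logits = rng.standard_normal((32, 4))
print(total_loss(class_logits, rng.integers(0, 5, 8), rot_logits, rot_labels, lam=0.5))
```

In a real model both heads would share one backbone, so gradients from the rotation task shape the same representation used for classification.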

arXiv:1906.06310

Pseudo-LiDAR++: Accurate Depth for 3D Object Detection in Autonomous Driving

Authors: Yurong You, Yan Wang, Wei-Lun Chao, Divyansh Garg, Geoff Pleiss, Bharath Hariharan, Mark Campbell, Kilian Q. Weinberger

Abstract: Detecting objects such as cars and pedestrians in 3D plays an indispensable role in autonomous driving. Existing approaches largely rely on expensive LiDAR sensors for accurate depth information. While pseudo-LiDAR has recently been introduced as a promising, much lower-cost alternative based solely on stereo images, there is still a notable performance gap. In this paper we provide substantial advances to the pseudo-LiDAR framework through improvements in stereo depth estimation. Concretely, we adapt the stereo network architecture and loss function to be more aligned with accurate depth estimation of faraway objects --- currently the primary weakness of pseudo-LiDAR. Further, we explore the idea of leveraging cheaper but extremely sparse LiDAR sensors, which alone provide insufficient information for 3D detection, to de-bias our depth estimation. We propose a depth-propagation algorithm, guided by the initial depth estimates, to diffuse these few exact measurements across the entire depth map. We show on the KITTI object detection benchmark that our combined approach yields substantial improvements in depth estimation and stereo-based 3D object detection --- outperforming the previous state-of-the-art detection accuracy for faraway objects by 40%. Our code is available at https://github.com/mileyan/Pseudo_Lidar_V2.

Submitted 15 February, 2020; v1 submitted 14 June, 2019; originally announced June 2019.

Comments: Accepted to International Conference on Learning Representations (ICLR) 2020
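Why faraway objects are the weak point of stereo can be seen from the disparity-to-depth relation z = f·b/d: a fixed disparity error δd produces a depth error of roughly z²·δd/(f·b), which grows quadratically with distance. The focal length and baseline below are approximate KITTI-like values assumed for illustration, and the one-pixel disparity error is arbitrary.

```python
import numpy as np

FOCAL_PX = 721.0     # assumed focal length in pixels (approximately KITTI)
BASELINE_M = 0.54    # assumed stereo baseline in metres (approximately KITTI)

def depth_from_disparity(disparity_px):
    """Pinhole stereo: z = f * b / d."""
    return FOCAL_PX * BASELINE_M / disparity_px

def depth_error(depth_m, disparity_err_px=1.0):
    """Exact depth change when the true disparity is underestimated by disparity_err_px."""
    d = FOCAL_PX * BASELINE_M / depth_m
    return depth_from_disparity(d - disparity_err_px) - depth_m

for z in (10.0, 20.0, 40.0, 60.0):
    first_order = z**2 * 1.0 / (FOCAL_PX * BASELINE_M)
    print(f"z = {z:4.0f} m: exact error {depth_error(z):5.2f} m, first-order {first_order:5.2f} m")
```

Under these assumptions a one-pixel mistake costs about 0.26 m at 10 m but about 11 m at 60 m, which is one way to see why a loss aligned with depth, rather than disparity, matters most for faraway objects.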

arXiv:1904.08502

Few-Shot Learning with Localization in Realistic Settings

Authors: Davis Wertheimer, Bharath Hariharan

Abstract: Traditional recognition methods typically require large, artificially balanced training classes, while few-shot learning methods are tested on artificially small ones. In contrast to both extremes, real-world recognition problems exhibit heavy-tailed class distributions, with cluttered scenes and a mix of coarse and fine-grained class distinctions. We show that prior methods designed for few-shot learning do not work out of the box in these challenging conditions, based on a new "meta-iNat" benchmark. We introduce three parameter-free improvements: (a) better training procedures based on adapting cross-validation to meta-learning, (b) novel architectures that localize objects using limited bounding box annotations before classification, and (c) simple parameter-free expansions of the feature space based on bilinear pooling. Together, these improvements double the accuracy of state-of-the-art models on meta-iNat while generalizing to prior benchmarks, complex neural architectures, and settings with substantial domain shift.

Submitted 1 July, 2019; v1 submitted 9 April, 2019; originally announced April 2019.

Comments: Appearing in CVPR 2019; added references in covariance pooling sections, added link to code in supplementary
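Bilinear (second-order) pooling, the parameter-free feature-space expansion in item (c), can be sketched as averaging the outer product of each spatial location's feature vector with itself. The signed square root and L2 normalization follow common practice for bilinear features and are assumptions here, as are the tensor shapes.

```python
import numpy as np

def bilinear_pool(feature_map, eps=1e-12):
    """feature_map: [C, H, W] convolutional features.
    Returns a C*C vector: the mean over locations of x x^T, followed by a
    signed square root and L2 normalization."""
    c, h, w = feature_map.shape
    x = feature_map.reshape(c, h * w)          # one column per spatial location
    gram = x @ x.T / (h * w)                   # [C, C] average outer product
    v = gram.reshape(-1)
    v = np.sign(v) * np.sqrt(np.abs(v))        # signed square root
    return v / (np.linalg.norm(v) + eps)       # L2 normalization

rng = np.random.default_rng(0)
fmap = rng.standard_normal((64, 7, 7))
desc = bilinear_pool(fmap)
print(desc.shape)
```

The descriptor grows from C to C² dimensions (64 to 4096 here) without adding any learned weights, which is what makes this expansion parameter-free.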

arXiv:1903.09801

doi 10.1103/PhysRevLett.122.105101

Measurement of the Electrical Properties of a Thundercloud Through Muon Imaging by the GRAPES-3 Experiment

Authors: B. Hariharan, A. Chandra, S. R. Dugad, S. K. Gupta, P. Jagadeesan, A. Jain, P. K. Mohanty, S. D. Morris, P. K. Nayak, P. S. Rakshe, K. Ramesh, B. S. Rao, L. V. Reddy, M. Zuberi, Y. Hayashi, S. Kawakami, S. Ahmad, H. Kojima, A. Oshima, S. Shibata, Y. Muraki, K. Tanaka

Abstract: The GRAPES-3 muon telescope located in Ooty, India records rapid ($\sim$10 min) variations in the muon intensity during major thunderstorms. Out of a total of 184 thunderstorms recorded during the interval April 2011-December 2014, the one on 1 December 2014 produced a massive potential of 1.3 GV. The electric field measured by four well-separated (up to 6 km) monitors on the ground was used to help estimate some of the properties of this thundercloud, including its altitude and area, which were found to be 11.4 km above mean sea level (amsl) and $\geq$380 km$^2$, respectively. A charging time of 6 min to reach 1.3 GV implied the delivery of a power of $\geq$2 GW by this thundercloud, which was moving at a speed of $\sim$60 km h$^{-1}$. This work possibly provides the first direct evidence for the generation of GV potentials in thunderclouds, which could also explain the production of the highest-energy (100 MeV) $γ$-rays in terrestrial $γ$-ray flashes.

Submitted 23 March, 2019; originally announced March 2019.

Comments: Received 6 January 2019, Revised 21 January 2019, Published 15 March 2019

Journal ref: Phys. Rev. Lett. 122, 105101 (2019)

arXiv:1812.07179

Pseudo-LiDAR from Visual Depth Estimation: Bridging the Gap in 3D Object Detection for Autonomous Driving

Authors: Yan Wang, Wei-Lun Chao, Divyansh Garg, Bharath Hariharan, Mark Campbell, Kilian Q. Weinberger

Abstract: 3D object detection is an essential task in autonomous driving. Recent techniques excel with highly accurate detection rates, provided the 3D input data is obtained from precise but expensive LiDAR technology. Approaches based on cheaper monocular or stereo imagery data have, until now, resulted in drastically lower accuracies --- a gap that is commonly attributed to poor image-based depth estimation. However, in this paper we argue that it is not the quality of the data but its representation that accounts for the majority of the difference. Taking the inner workings of convolutional neural networks into consideration, we propose to convert image-based depth maps to pseudo-LiDAR representations --- essentially mimicking the LiDAR signal. With this representation we can apply different existing LiDAR-based detection algorithms. On the popular KITTI benchmark, our approach achieves impressive improvements over the existing state-of-the-art in image-based performance --- raising the detection accuracy of objects within the 30m range from the previous state-of-the-art of 22% to an unprecedented 74%. At the time of submission our algorithm holds the highest entry on the KITTI 3D object detection leaderboard for stereo-image-based approaches. Our code is publicly available at https://github.com/mileyan/pseudo_lidar.

Submitted 22 February, 2020; v1 submitted 18 December, 2018; originally announced December 2018.

Comments: Accepted by CVPR 2019
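Converting an image-based depth map into a pseudo-LiDAR point cloud amounts to back-projecting every pixel through the pinhole camera model: for pixel (u, v) with depth z, x = (u - c_u)·z/f_u and y = (v - c_v)·z/f_v. The sketch below assumes known intrinsics and returns points in the camera frame; the 80 m range cut is an assumption, and transforming points into the LiDAR frame expected by a detector (which needs extrinsic calibration) is omitted, so the authors' code may use different conventions.

```python
import numpy as np

def depth_to_pseudo_lidar(depth, f_u, f_v, c_u, c_v, max_depth=80.0):
    """Back-project a depth map [H, W] (metres) into an [N, 3] point cloud
    in the camera frame (x right, y down, z forward) using a pinhole model."""
    h, w = depth.shape
    v, u = np.meshgrid(np.arange(h), np.arange(w), indexing="ij")  # pixel rows, columns
    z = depth
    x = (u - c_u) * z / f_u
    y = (v - c_v) * z / f_v
    points = np.stack([x, y, z], axis=-1).reshape(-1, 3)
    valid = (points[:, 2] > 0) & (points[:, 2] < max_depth)      # drop invalid or far pixels
    return points[valid]

# Toy usage: a flat wall 10 m in front of a tiny 4x6 camera.
depth = np.full((4, 6), 10.0)
cloud = depth_to_pseudo_lidar(depth, f_u=700.0, f_v=700.0, c_u=3.0, c_v=2.0)
print(cloud.shape)
```

Once depth is in this form, any LiDAR-based detector can consume it unchanged, which is the representational point the abstract makes.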

arXiv:1810.10148

A Deep-Learning-Based Fashion Attributes Detection Model

Authors: Menglin Jia, Yichen Zhou, Mengyun Shi, Bharath Hariharan

Abstract: Analyzing fashion attributes is essential in the fashion design process. Current fashion forecasting firms, such as WGSN, utilize information from all around the world (from fashion shows, visual merchandising, blogs, etc.). They gather information through experience, observation, media scans, interviews, and exposure to new things. Such an information-analyzing process is called abstracting, which recognizes similarities or differences across all the garments and collections. In fact, such abstraction ability is useful in many fashion careers with different purposes. Fashion forecasters abstract across design collections and across time to identify fashion changes and directions; designers, product developers and buyers abstract across a group of garments and collections to develop cohesive and visually appealing lines; sales and marketing executives abstract across each season's product line to recognize selling points; fashion journalists and bloggers abstract across runway photos to recognize symbolic core concepts that can be translated into editorial features. Fashion attribute analysis for such fashion insiders requires much more detailed and in-depth attribute annotation than that for consumers, and requires inference on multiple domains. In this project, we propose a data-driven approach for recognizing fashion attributes. Specifically, a modified version of the Faster R-CNN model is trained on images from a large-scale localization dataset with 594 fine-grained attributes under different scenarios, for example in online stores and street snapshots. This model will then be used to detect garment items and classify clothing attributes for runway photos and fashion illustrations.

Submitted 23 October, 2018; originally announced October 2018.

arXiv:1810.01575

Deep Fundamental Matrix Estimation without Correspondences

Authors: Omid Poursaeed, Guandao Yang, Aditya Prakash, Qiuren Fang, Hanqing Jiang, Bharath Hariharan, Serge Belongie

Abstract: Estimating fundamental matrices is a classic problem in computer vision. Traditional methods rely heavily on the correctness of estimated key-point correspondences, which can be noisy and unreliable. As a result, it is difficult for these methods to handle image pairs with large occlusion or significantly different camera poses. In this paper, we propose novel neural network architectures to estimate fundamental matrices in an end-to-end manner without relying on point correspondences. New modules and layers are introduced in order to preserve mathematical properties of the fundamental matrix as a homogeneous rank-2 matrix with seven degrees of freedom. We analyze performance of the proposed models using various metrics on the KITTI dataset, and show that they achieve competitive performance with traditional methods without the need for extracting correspondences.

Submitted 2 October, 2018; originally announced October 2018.

Comments: ECCV 2018, Geometry Meets Deep Learning Workshop
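One standard way to make an arbitrary 3x3 output satisfy the fundamental-matrix constraints is to project it onto the nearest rank-2 matrix (zero its smallest singular value) and remove the scale ambiguity by normalizing. This is a generic sketch of those constraints, not necessarily the modules proposed in the paper; `epipolar_residual` is a hypothetical helper for the algebraic error x2^T F x1.

```python
import numpy as np

def to_fundamental(m):
    """Project an arbitrary 3x3 matrix onto a valid fundamental matrix:
    enforce rank 2 via SVD and fix the scale (Frobenius norm 1)."""
    u, s, vt = np.linalg.svd(m)
    s[2] = 0.0                        # rank-2 constraint, i.e. det(F) = 0
    f = u @ np.diag(s) @ vt
    return f / np.linalg.norm(f)      # homogeneous: only the direction of F matters

def epipolar_residual(f, x1, x2):
    """Algebraic residuals x2^T F x1 for homogeneous image points [N, 3]."""
    return np.einsum("ni,ij,nj->n", x2, f, x1)

rng = np.random.default_rng(0)
F = to_fundamental(rng.standard_normal((3, 3)))
print(np.linalg.svd(F, compute_uv=False), np.linalg.norm(F))
```

Counting constraints gives the seven degrees of freedom in the abstract: 9 entries, minus 1 for the overall scale, minus 1 for det(F) = 0.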

arXiv:1805.08805

Resource Aware Person Re-identification across Multiple Resolutions

Authors: Yan Wang, Lequn Wang, Yurong You, Xu Zou, Vincent Chen, Serena Li, Gao Huang, Bharath Hariharan, Kilian Q. Weinberger

Abstract: Not all people are equally easy to identify: color statistics might be enough for some cases while others might require careful reasoning about high- and low-level details. However, prevailing person re-identification (re-ID) methods use one-size-fits-all high-level embeddings from deep convolutional networks for all cases. This might limit their accuracy on difficult examples or make them needlessly expensive for the easy ones. To remedy this, we present a new person re-ID model that combines effective embeddings built on multiple convolutional network layers, trained with deep supervision. On traditional re-ID benchmarks, our method improves substantially over the previous state-of-the-art results on all five datasets that we evaluate on. We then propose two new formulations of the person re-ID problem under resource constraints, and show how our model can be used to effectively trade off accuracy and computation in the presence of resource constraints. Code and pre-trained models are available at https://github.com/mileyan/DARENet.

Submitted 1 October, 2018; v1 submitted 22 May, 2018; originally announced May 2018.

Comments: 8 pages, 8 figures, CVPR 2018

arXiv:1803.10499

doi 10.1103/PhysRevD.97.082001

Was the cosmic ray burst detected by the GRAPES-3 on 22 June 2015 caused by transient weakening of geomagnetic field or by an interplanetary anisotropy?

Authors: P. K. Mohanty, K. P. Arunbabu, T. Aziz, S. R. Dugad, S. K. Gupta, B. Hariharan, P. Jagadeesan, A. Jain, S. D. Morris, P. K. Nayak, P. S. Rakshe, K. Ramesh, B. S. Rao, M. Zuberi, Y. Hayashi, S. Kawakami, P. Subramanian, S. Raha, S. Ahmad, A. Oshima, S. Shibata, H. Kojima

Abstract: The GRAPES-3 muon telescope in Ooty, India had claimed that a 2 hour (h) high-energy ($\sim$20 GeV) burst of galactic cosmic rays (GCRs), detected through a $>$50$σ$ surge in GeV muons, was caused by reconnection of the interplanetary magnetic field (IMF) in the magnetosphere that led to a transient weakening of Earth's magnetic shield. This burst had occurred during a G4-class geomagnetic storm (storm) with a delay of $\frac{1}{2}$ h relative to the coronal mass ejection (CME) of 22 June 2015 (Mohanty et al., 2016). However, a group recently interpreted the occurrence of the same burst in a subset of 31 neutron monitors (NMs) as the result of an anisotropy in interplanetary space (Evenson et al., 2017), in contrast to the claim in Mohanty et al. (2016). A new analysis of the GRAPES-3 data with a fine 10.6$^{\circ}$ angular segmentation shows the speculation of interplanetary anisotropy to be incorrect, and offers a possible explanation of the NM observations. The observed 28 minute (min) delay of the burst relative to the CME can be explained by the movement of the reconnection front from the bow shock to the surface of Earth at an average speed of 35 km/s, much lower than the CME speed of 700 km/s. This measurement may provide a more accurate estimate of the start of the storm.

Submitted 28 March, 2018; originally announced March 2018.

Comments: Accepted for Publication in Physical Review D
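The delay and speeds quoted in the abstract can be cross-checked with simple arithmetic: a front moving at an average 35 km/s for 28 min covers 58,800 km, about 9.2 Earth radii, whereas material at the CME speed of 700 km/s would cover the same distance in 84 s. The Earth-radius value is the only number not taken from the abstract.

```python
EARTH_RADIUS_KM = 6371.0      # mean Earth radius (not from the abstract)

delay_s = 28 * 60             # observed burst delay relative to the CME
front_speed_kms = 35.0        # average speed of the reconnection front
cme_speed_kms = 700.0         # CME speed

distance_km = front_speed_kms * delay_s
print(f"distance travelled: {distance_km:,.0f} km = {distance_km / EARTH_RADIUS_KM:.1f} R_E")
print(f"time at CME speed:  {distance_km / cme_speed_kms:.0f} s")
```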

arXiv:1801.05401

Low-Shot Learning from Imaginary Data

Authors: Yu-Xiong Wang, Ross Girshick, Martial Hebert, Bharath Hariharan

Abstract: Humans can quickly learn new visual concepts, perhaps because they can easily visualize or imagine what novel objects look like from different views. Incorporating this ability to hallucinate novel instances of new concepts might help machine vision systems perform better low-shot learning, i.e., learning concepts from few examples. We present a novel approach to low-shot learning that uses this idea. Our approach builds on recent progress in meta-learning ("learning to learn") by combining a meta-learner with a "hallucinator" that produces additional training examples, and optimizing both models jointly. Our hallucinator can be incorporated into a variety of meta-learners and provides significant gains: up to a 6 point boost in classification accuracy when only a single training example is available, yielding state-of-the-art performance on the challenging ImageNet low-shot classification benchmark.

Submitted 2 April, 2018; v1 submitted 16 January, 2018; originally announced January 2018.

Comments: CVPR 2018 camera-ready version
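The data flow of a hallucinator feeding a meta-learner can be sketched as: generate extra feature vectors from each seed example plus noise, add them to the support set, then classify. A prototype (class-mean) classifier stands in for the meta-learner, and the `Hallucinator` class is a hypothetical one-hidden-layer generator with random, untrained weights; the joint optimization described in the abstract is not shown.

```python
import numpy as np

class Hallucinator:
    """Hypothetical generator G(x, z): maps a seed feature plus noise to a new
    feature for the same class. Weights are random here (no training)."""
    def __init__(self, dim, noise_dim, hidden, rng):
        self.noise_dim = noise_dim
        self.w1 = rng.standard_normal((dim + noise_dim, hidden)) * 0.1
        self.w2 = rng.standard_normal((hidden, dim)) * 0.1
        self.rng = rng

    def __call__(self, seeds, n_per_seed):
        x = np.repeat(seeds, n_per_seed, axis=0)                 # each seed repeated consecutively
        z = self.rng.standard_normal((len(x), self.noise_dim))
        h = np.maximum(0.0, np.concatenate([x, z], axis=1) @ self.w1)   # ReLU hidden layer
        return x + h @ self.w2                                   # residual: stay near the seed

def prototype_classify(support, support_labels, queries, n_classes):
    """Meta-learner stand-in: assign each query to the nearest class mean."""
    protos = np.stack([support[support_labels == c].mean(axis=0) for c in range(n_classes)])
    dists = ((queries[:, None, :] - protos[None, :, :]) ** 2).sum(axis=-1)
    return dists.argmin(axis=1)

rng = np.random.default_rng(0)
dim, n_classes, n_hal = 16, 5, 4
seeds = rng.standard_normal((n_classes, dim))       # one-shot: one real example per class
labels = np.arange(n_classes)
hal = Hallucinator(dim, noise_dim=8, hidden=32, rng=rng)
fake = hal(seeds, n_hal)                            # n_hal hallucinated examples per class
support = np.concatenate([seeds, fake])
support_labels = np.concatenate([labels, np.repeat(labels, n_hal)])
print(prototype_classify(support, support_labels, seeds, n_classes))
```

Training the generator through the classifier's loss is what would make hallucinated examples useful rather than merely plausible.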

arXiv:1706.02332

Low-shot learning with large-scale diffusion

Authors: Matthijs Douze, Arthur Szlam, Bharath Hariharan, Hervé Jégou

Abstract: This paper considers the problem of inferring image labels from images when only a few annotated examples are available at training time. This setup is often referred to as low-shot learning, where a standard approach is to re-train the last few layers of a convolutional neural network learned on separate classes for which training examples are abundant. We consider a semi-supervised setting based on a large collection of images to support label propagation. This is possible by leveraging the recent advances in large-scale similarity graph construction. We show that despite its conceptual simplicity, scaling label propagation up to hundreds of millions of images leads to state-of-the-art accuracy in the low-shot learning regime.

Submitted 15 June, 2018; v1 submitted 7 June, 2017; originally announced June 2017.
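Label propagation over a similarity graph can be sketched at toy scale with the classic iterative update F ← αSF + (1 - α)Y on a symmetrically normalized k-nearest-neighbour graph. The Gaussian edge weights, α, and the iteration count are assumptions, and this is the textbook variant rather than necessarily the paper's; at large scale the dense distance matrix used here would have to be replaced by an approximate nearest-neighbour graph and sparse matrices.

```python
import numpy as np

def knn_graph(x, k):
    """Symmetric kNN affinity matrix with Gaussian-of-distance weights (dense, toy scale)."""
    d2 = ((x[:, None, :] - x[None, :, :]) ** 2).sum(-1)
    np.fill_diagonal(d2, np.inf)                      # no self-loops
    idx = np.argsort(d2, axis=1)[:, :k]
    rows = np.arange(len(x))[:, None]
    w = np.zeros_like(d2)
    w[rows, idx] = np.exp(-d2[rows, idx] / np.median(d2[rows, idx]))
    return np.maximum(w, w.T)                         # symmetrize

def label_propagation(w, y, alpha=0.9, iters=50):
    """Iterate F <- alpha * S F + (1 - alpha) * Y with S = D^-1/2 W D^-1/2.
    y: [N, C] with one-hot rows for labeled points and zero rows otherwise."""
    deg = w.sum(axis=1)
    d_inv_sqrt = 1.0 / np.sqrt(np.maximum(deg, 1e-12))
    s = d_inv_sqrt[:, None] * w * d_inv_sqrt[None, :]
    f = y.astype(float).copy()
    for _ in range(iters):
        f = alpha * s @ f + (1 - alpha) * y
    return f.argmax(axis=1)

# Toy usage: two clusters with a single labeled example each ("low-shot").
rng = np.random.default_rng(0)
x = np.concatenate([rng.normal(0.0, 0.5, (50, 2)), rng.normal(5.0, 0.5, (50, 2))])
y = np.zeros((100, 2))
y[0, 0] = 1.0
y[50, 1] = 1.0
pred = label_propagation(knn_graph(x, k=10), y)
print((pred[:50] == 0).mean(), (pred[50:] == 1).mean())
```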

arXiv:1705.03633

Inferring and Executing Programs for Visual Reasoning

Authors: Justin Johnson, Bharath Hariharan, Laurens van der Maaten, Judy Hoffman, Li Fei-Fei, C. Lawrence Zitnick, Ross Girshick

Abstract: Existing methods for visual reasoning attempt to directly map inputs to outputs using black-box architectures without explicitly modeling the underlying reasoning processes. As a result, these black-box models often learn to exploit biases in the data rather than learning to perform visual reasoning. Inspired by module networks, this paper proposes a model for visual reasoning that consists of a program generator that constructs an explicit representation of the reasoning process to be performed, and an execution engine that executes the resulting program to produce an answer. Both the program generator and the execution engine are implemented by neural networks, and are trained using a combination of backpropagation and REINFORCE. Using the CLEVR benchmark for visual reasoning, we show that our model significantly outperforms strong baselines and generalizes better in a variety of settings.

Submitted 10 May, 2017; originally announced May 2017.
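The split between a program generator and an execution engine can be illustrated with a hand-written program run over a symbolic scene. In the paper both components are neural networks and the engine operates on image features; here a tiny symbolic interpreter stands in for the execution engine, the module names (`filter_color`, `filter_shape`, `count`, `exist`) are hypothetical and only loosely CLEVR-like, and the program is written by hand instead of being generated from the question.

```python
from typing import Any, Callable, Dict, List, Optional, Tuple

Scene = List[Dict[str, str]]
Program = List[Tuple[str, Optional[str]]]

# Hypothetical module inventory: each module maps (input value, argument) to an output value.
MODULES: Dict[str, Callable[[Any, Optional[str]], Any]] = {
    "scene": lambda scene, arg: scene,
    "filter_color": lambda objs, arg: [o for o in objs if o["color"] == arg],
    "filter_shape": lambda objs, arg: [o for o in objs if o["shape"] == arg],
    "count": lambda objs, arg: len(objs),
    "exist": lambda objs, arg: len(objs) > 0,
}

def execute(program: Program, scene: Scene) -> Any:
    """Run a chain-structured program, a list of (module, argument) pairs, left to right."""
    value: Any = scene
    for name, arg in program:
        value = MODULES[name](value, arg)
    return value

scene = [
    {"color": "red", "shape": "cube"},
    {"color": "red", "shape": "sphere"},
    {"color": "blue", "shape": "cube"},
]
# "How many red cubes are there?" as a program a generator might emit.
program = [("scene", None), ("filter_color", "red"), ("filter_shape", "cube"), ("count", None)]
print(execute(program, scene))
```

A chain is the simplest case; programs for comparison questions need modules with two inputs, which turns the chain into a tree.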

arXiv:1612.06890

CLEVR: A Diagnostic Dataset for Compositional Language and Elementary Visual Reasoning

Authors: Justin Johnson, Bharath Hariharan, Laurens van der Maaten, Li Fei-Fei, C. Lawrence Zitnick, Ross Girshick

Abstract: When building artificial intelligence systems that can reason and answer questions about visual data, we need diagnostic tests to analyze our progress and discover shortcomings. Existing benchmarks for visual question answering can help, but have strong biases that models can exploit to correctly answer questions without reasoning. They also conflate multiple sources of error, making it hard to pinpoint model weaknesses. We present a diagnostic dataset that tests a range of visual reasoning abilities. It contains minimal biases and has detailed annotations describing the kind of reasoning each question requires. We use this dataset to analyze a variety of modern visual reasoning systems, providing novel insights into their abilities and limitations.

Submitted 20 December, 2016; originally announced December 2016.

arXiv:1612.06370

Learning Features by Watching Objects Move

Authors: Deepak Pathak, Ross Girshick, Piotr Dollár, Trevor Darrell, Bharath Hariharan

Abstract: This paper presents a novel yet intuitive approach to unsupervised feature learning. Inspired by the human visual system, we explore whether low-level motion-based grouping cues can be used to learn an effective visual representation. Specifically, we use unsupervised motion-based segmentation on videos to obtain segments, which we use as 'pseudo ground truth' to train a convolutional network to segment objects from a single frame. Given the extensive evidence that motion plays a key role in the development of the human visual system, we hope that this straightforward approach to unsupervised learning will be more effective than cleverly designed 'pretext' tasks studied in the literature. Indeed, our extensive experiments show that this is the case. When used for transfer learning on object detection, our representation significantly outperforms previous unsupervised approaches across multiple settings, especially when training data for the target task is scarce.

Submitted 12 April, 2017; v1 submitted 19 December, 2016; originally announced December 2016.

Comments: CVPR 2017
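The pseudo-ground-truth idea can be illustrated with a minimal sketch. It assumes a dense optical-flow field is already available and substitutes a simple rule (subtract the median flow as camera motion, then threshold the residual magnitude) for the unsupervised motion segmentation used in the paper; `motion_pseudo_mask` and its `threshold` are hypothetical names. The resulting masks would then serve as targets for a single-frame segmentation network, which is not shown.

```python
import math
from statistics import median


def motion_pseudo_mask(flow, threshold=1.0):
    """Binary 'pseudo ground truth' mask from a dense optical-flow field.

    flow: H x W grid (list of lists) of (dx, dy) displacement tuples.
    threshold: hypothetical cut-off on residual motion magnitude (pixels).
    """
    # Treat the median displacement as global (camera) motion and remove it.
    med_dx = median(dx for row in flow for dx, _ in row)
    med_dy = median(dy for row in flow for _, dy in row)
    # Pixels that still move noticeably are labelled foreground (1).
    return [
        [1 if math.hypot(dx - med_dx, dy - med_dy) > threshold else 0
         for dx, dy in row]
        for row in flow
    ]


# A 3x3 frame where the background drifts right and one pixel moves faster.
flow = [[(0.5, 0.0)] * 3,
        [(0.5, 0.0), (2.0, 0.0), (0.5, 0.0)],
        [(0.5, 0.0)] * 3]
print(motion_pseudo_mask(flow))  # [[0, 0, 0], [0, 1, 0], [0, 0, 0]]
```

Removing the median first means a uniformly panning camera does not mark the whole frame as foreground.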

arXiv:1612.03144 [pdf, other]

Feature Pyramid Networks for Object Detection

Authors: Tsung-Yi Lin, Piotr Dollár, Ross Girshick, Kaiming He, Bharath Hariharan, Serge Belongie

Abstract: Feature pyramids are a basic component in recognition systems for detecting objects at different scales. But recent deep learning object detectors have avoided pyramid representations, in part because they are compute and memory intensive. In this paper, we exploit the inherent multi-scale, pyramidal hierarchy of deep convolutional networks to construct feature pyramids with marginal extra cost. A top-down architecture with lateral connections is developed for building high-level semantic feature maps at all scales. This architecture, called a Feature Pyramid Network (FPN), shows significant improvement as a generic feature extractor in several applications. Using FPN in a basic Faster R-CNN system, our method achieves state-of-the-art single-model results on the COCO detection benchmark without bells and whistles, surpassing all existing single-model entries including those from the COCO 2016 challenge winners. In addition, our method can run at 5 FPS on a GPU and thus is a practical and accurate solution to multi-scale object detection. Code will be made publicly available.

Submitted 19 April, 2017; v1 submitted 9 December, 2016; originally announced December 2016.
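A minimal sketch of the top-down pathway with lateral connections, assuming single-channel maps stored as nested lists, nearest-neighbour 2x upsampling, element-wise addition, and an identity function standing in for the lateral 1x1 convolution. The helper names are hypothetical, and the smoothing convolution a full FPN applies after each merge is omitted.

```python
def upsample2x(fmap):
    """Nearest-neighbour 2x upsampling of a single-channel H x W map."""
    out = []
    for row in fmap:
        wide = [v for v in row for _ in range(2)]
        out.append(wide)
        out.append(list(wide))
    return out


def add_maps(a, b):
    """Element-wise sum of two equally sized maps."""
    return [[x + y for x, y in zip(ra, rb)] for ra, rb in zip(a, b)]


def top_down_pyramid(bottom_up, lateral=lambda fmap: fmap):
    """Merge bottom-up maps (finest first, each level half the size of the
    previous one) into pyramid maps, returned finest first.

    lateral: stand-in for the per-level 1x1 convolution; identity by default.
    """
    merged = [lateral(bottom_up[-1])]  # start from the coarsest level
    for fmap in reversed(bottom_up[:-1]):
        # Upsample the coarser merged map and add the lateral connection.
        merged.append(add_maps(upsample2x(merged[-1]), lateral(fmap)))
    return merged[::-1]


c2 = [[1] * 4 for _ in range(4)]
c3 = [[1, 2], [3, 4]]
p2, p3 = top_down_pyramid([c2, c3])
print(p2)  # [[2, 2, 3, 3], [2, 2, 3, 3], [4, 4, 5, 5], [4, 4, 5, 5]]
```

The coarsest level passes through unchanged, while each finer level inherits the coarser level's values through the upsampled path.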

arXiv:1606.02819 [pdf, other]

Low-shot Visual Recognition by Shrinking and Hallucinating Features

Authors: Bharath Hariharan, Ross Girshick

Abstract: Low-shot visual learning---the ability to recognize novel object categories from very few examples---is a hallmark of human visual intelligence. Existing machine learning approaches fail to generalize in the same way. To make progress on this foundational problem, we present a low-shot learning benchmark on complex images that mimics challenges faced by recognition systems in the wild. We then propose a) representation regularization techniques, and b) techniques to hallucinate additional training examples for data-starved classes. Together, our methods improve the effectiveness of convolutional networks in low-shot learning, improving the one-shot accuracy on novel classes by 2.3x on the challenging ImageNet dataset.

Submitted 4 November, 2017; v1 submitted 9 June, 2016; originally announced June 2016.

Comments: ICCV 2017 spotlight
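The hallucination idea can be sketched as feature-space analogy, assuming features are plain vectors. The paper learns a generator for this; the fixed vector arithmetic below (a seed example plus a within-class difference taken from a base class) is a simplified stand-in, and `hallucinate` is a hypothetical name.

```python
def hallucinate(seed, base_pairs):
    """Generate extra feature vectors for a novel class by analogy.

    seed: one real feature vector of the novel class.
    base_pairs: (a, b) pairs of feature vectors drawn from the same base class;
    the within-class change b - a is transferred onto the seed.
    """
    return [[s + (bi - ai) for s, ai, bi in zip(seed, a, b)]
            for a, b in base_pairs]


novel = [1, 1]
pairs = [([0, 0], [1, 0]), ([2, 2], [2, 3])]
print(hallucinate(novel, pairs))  # [[2, 1], [1, 2]]
```

Each base pair yields one synthetic example, so a single real image of a novel class can be expanded into as many training features as there are pairs.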

arXiv:1511.08498 [pdf, other]

Iterative Instance Segmentation

Authors: Ke Li, Bharath Hariharan, Jitendra Malik

Abstract: Existing methods for pixel-wise labelling tasks generally disregard the underlying structure of labellings, often leading to predictions that are visually implausible. While incorporating structure into the model should improve prediction quality, doing so is challenging: manually specifying the form of structural constraints may be impractical, and inference often becomes intractable even if structural constraints are given. We sidestep this problem by reducing structured prediction to a sequence of unconstrained prediction problems and demonstrate that this approach is capable of automatically discovering priors on shape, contiguity of region predictions and smoothness of region contours from data without any a priori specification. On the instance segmentation task, this method outperforms the state of the art, achieving a mean $\mathrm{AP}^{r}$ of 63.6% at 50% overlap and 43.3% at 70% overlap.

Submitted 10 June, 2016; v1 submitted 26 November, 2015; originally announced November 2015.

Comments: 13 pages, 10 figures; IEEE Conference on Computer Vision and Pattern Recognition (CVPR), 2016
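A minimal sketch of reducing structured prediction to a sequence of unconstrained predictions, where each step receives the input and the previous estimate. In the paper the step is a learned network; here `fill_gaps_step` is a hypothetical hand-written rule on a 1-D binary mask, used only to show how repeated steps can impose a contiguity prior.

```python
def iterative_predict(x, step, init, steps=3):
    """Apply an unconstrained predictor repeatedly; each call sees the input
    x and the previous estimate y."""
    y = init
    for _ in range(steps):
        y = step(x, y)
    return y


def fill_gaps_step(evidence, prev):
    """Hypothetical hand-written refinement step for a 1-D binary mask.

    A pixel is on if the evidence marks it, or if both of its neighbours were
    on in the previous estimate (a crude contiguity prior).
    """
    n = len(prev)
    out = []
    for i in range(n):
        left = prev[i - 1] if i > 0 else 0
        right = prev[i + 1] if i < n - 1 else 0
        out.append(1 if evidence[i] or (left and right) else 0)
    return out


evidence = [0, 1, 0, 1, 0]
print(iterative_predict(evidence, fill_gaps_step, init=evidence))  # [0, 1, 1, 1, 0]
```

No step enforces contiguity as an explicit constraint; the hole is filled only because each step conditions on the previous output.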

arXiv:1511.08177 [pdf, other]

Exploring Person Context and Local Scene Context for Object Detection

Authors: Saurabh Gupta, Bharath Hariharan, Jitendra Malik

Abstract: In this paper we explore two ways of using context for object detection. The first model focusses on people and the objects they commonly interact with, such as fashion and sports accessories. The second model considers more general object detection and uses the spatial relationships between objects and between objects and scenes. Our models are able to capture precise spatial relationships between the context and the object of interest, and make effective use of the appearance of the contextual region. On the newly released COCO dataset, our models provide relative improvements of up to 5% over CNN-based state-of-the-art detectors, with the gains concentrated on hard cases such as small objects (10% relative improvement).

Submitted 25 November, 2015; originally announced November 2015.

arXiv:1505.02146 [pdf, other]

DeepBox: Learning Objectness with Convolutional Networks

Authors: Weicheng Kuo, Bharath Hariharan, Jitendra Malik

Abstract: Existing object proposal approaches use primarily bottom-up cues to rank proposals, while we believe that objectness is in fact a high-level construct. We argue for a data-driven, semantic approach for ranking object proposals. Our framework, which we call DeepBox, uses convolutional neural networks (CNNs) to rerank proposals from a bottom-up method. We use a novel four-layer CNN architecture that is as good as much larger networks on the task of evaluating objectness while being much faster. We show that DeepBox significantly improves over the bottom-up ranking, achieving the same recall with 500 proposals as achieved by bottom-up methods with 2000. This improvement generalizes to categories the CNN has never seen before and leads to a 4.5-point gain in detection mAP. Our implementation achieves this performance while running at 260 ms per image.

Submitted 26 September, 2015; v1 submitted 8 May, 2015; originally announced May 2015.

Comments: ICCV 2015 Camera-ready version
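The reranking and recall comparison can be sketched as follows, assuming boxes as `(x1, y1, x2, y2)` tuples and a 0.5 IoU threshold for counting a ground-truth box as recalled. The `objectness` callable is a hypothetical stand-in for DeepBox's CNN score; a dictionary of illustrative scores plays that role in the usage lines.

```python
def box_iou(a, b):
    """Intersection over union of two boxes given as (x1, y1, x2, y2)."""
    iw = max(0, min(a[2], b[2]) - max(a[0], b[0]))
    ih = max(0, min(a[3], b[3]) - max(a[1], b[1]))
    inter = iw * ih
    union = ((a[2] - a[0]) * (a[3] - a[1])
             + (b[2] - b[0]) * (b[3] - b[1]) - inter)
    return inter / union if union > 0 else 0.0


def rerank(proposals, objectness):
    """Sort proposals by a (hypothetical) objectness score, highest first."""
    return sorted(proposals, key=objectness, reverse=True)


def recall_at_k(proposals, gt_boxes, k, thresh=0.5):
    """Fraction of ground-truth boxes covered by one of the top-k proposals."""
    top = proposals[:k]
    hits = sum(any(box_iou(p, g) >= thresh for p in top) for g in gt_boxes)
    return hits / len(gt_boxes) if gt_boxes else 0.0


gt = [(0, 0, 10, 10)]
bottom_up = [(50, 50, 60, 60), (1, 0, 10, 10)]          # bottom-up ranking
scores = {(50, 50, 60, 60): 0.1, (1, 0, 10, 10): 0.9}  # stand-in CNN scores
print(recall_at_k(bottom_up, gt, k=1))                      # 0.0
print(recall_at_k(rerank(bottom_up, scores.get), gt, k=1))  # 1.0
```

Reranking leaves the proposal set unchanged; it only moves good boxes earlier, which is why the same recall can be reached with a smaller k.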

arXiv:1411.5752 [pdf, other]

Hypercolumns for Object Segmentation and Fine-grained Localization

Authors: Bharath Hariharan, Pablo Arbeláez, Ross Girshick, Jitendra Malik

Abstract: Recognition algorithms based on convolutional networks (CNNs) typically use the output of the last layer as the feature representation. However, the information in this layer may be too coarse to allow precise localization. Conversely, earlier layers may be precise in localization but will not capture semantics. To get the best of both worlds, we define the hypercolumn at a pixel as the vector of activations of all CNN units above that pixel. Using hypercolumns as pixel descriptors, we show results on three fine-grained localization tasks: simultaneous detection and segmentation [22], where we improve the state of the art from 49.7 mean $\mathrm{AP}^{r}$ [22] to 60.0; keypoint localization, where we get a 3.3-point boost over [20]; and part labeling, where we show a 6.6-point gain over a strong baseline.

Submitted 25 April, 2015; v1 submitted 20 November, 2014; originally announced November 2014.

Comments: CVPR Camera ready
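A sketch of extracting a hypercolumn, assuming each layer's feature map is a list of channels stored as nested lists and that coarser maps cover the image uniformly. It uses nearest-neighbour lookup to find the unit above a pixel; the paper instead upsamples feature maps with interpolation, so this is a simplification.

```python
def hypercolumn(layers, i, j, height, width):
    """Concatenate the activations of every layer's units above pixel (i, j).

    layers: list of feature maps; each map is a list of channels and each
    channel is an H_l x W_l list of lists (H_l <= height, W_l <= width).
    """
    vec = []
    for channels in layers:
        h, w = len(channels[0]), len(channels[0][0])
        # Nearest-neighbour lookup of the unit covering pixel (i, j).
        li, lj = i * h // height, j * w // width
        vec.extend(ch[li][lj] for ch in channels)
    return vec


fine = [[[r * 4 + c for c in range(4)] for r in range(4)]]  # 1 channel, 4x4
coarse = [[[1, 2], [3, 4]], [[10, 20], [30, 40]]]           # 2 channels, 2x2
print(hypercolumn([fine, coarse], 3, 1, height=4, width=4))  # [13, 3, 30]
```

The descriptor length equals the total channel count across layers, so fine layers contribute localization detail and coarse layers contribute semantics.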

arXiv:1407.1808 [pdf, other]

Simultaneous Detection and Segmentation

Authors: Bharath Hariharan, Pablo Arbeláez, Ross Girshick, Jitendra Malik

Abstract: We aim to detect all instances of a category in an image and, for each instance, mark the pixels that belong to it. We call this task Simultaneous Detection and Segmentation (SDS). Unlike classical bounding box detection, SDS requires a segmentation and not just a box. Unlike classical semantic segmentation, we require individual object instances. We build on recent work that uses convolutional neural networks to classify category-independent region proposals (R-CNN [16]), introducing a novel architecture tailored for SDS. We then use category-specific, top-down figure-ground predictions to refine our bottom-up proposals. We show a 7 point boost (16% relative) over our baselines on SDS, a 5 point boost (10% relative) over the state of the art on semantic segmentation, and state-of-the-art performance in object detection. Finally, we provide diagnostic tools that unpack performance and provide directions for future work.

Submitted 7 July, 2014; originally announced July 2014.

Comments: To appear in the European Conference on Computer Vision (ECCV), 2014
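Region-based evaluation of SDS ($\mathrm{AP}^{r}$) scores a detection by how well its mask overlaps a ground-truth mask rather than by box overlap. A minimal sketch of that overlap test, assuming equally sized binary masks as nested lists; the 0.5 threshold is an assumed default, and the full AP computation (ranking detections and matching them to ground truth) is omitted.

```python
def mask_iou(a, b):
    """Region overlap (IoU) between two equally sized binary masks."""
    pairs = [(x, y) for ra, rb in zip(a, b) for x, y in zip(ra, rb)]
    inter = sum(x & y for x, y in pairs)
    union = sum(x | y for x, y in pairs)
    return inter / union if union else 0.0


def is_true_positive(pred_mask, gt_mask, thresh=0.5):
    """A detection counts as correct only if its region overlap clears thresh."""
    return mask_iou(pred_mask, gt_mask) >= thresh


pred = [[1, 1], [0, 0]]
gt = [[1, 0], [1, 0]]
print(mask_iou(pred, gt))          # 0.3333333333333333
print(is_true_positive(pred, gt))  # False
```

Two masks with nearly identical bounding boxes can still fail this test if their pixels disagree, which is what makes the region metric stricter than box-based AP.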

arXiv:1406.5212 [pdf, other]

R-CNNs for Pose Estimation and Action Detection

Authors: Georgia Gkioxari, Bharath Hariharan, Ross Girshick, Jitendra Malik

Abstract: We present convolutional neural networks for the tasks of keypoint (pose) prediction and action classification of people in unconstrained images. Our approach involves training an R-CNN detector with loss functions depending on the task being tackled. We evaluate our method on the challenging PASCAL VOC dataset and compare it to previous leading approaches. Our method gives state-of-the-art results for keypoint and action prediction. Additionally, we introduce a new dataset for action detection, the task of simultaneously localizing people and classifying their actions, and present results using our approach.

Submitted 19 June, 2014; originally announced June 2014.

arXiv:1309.2360 [pdf, other]

Modeling gamma ray production from proton-proton interactions in high-energy astrophysical environments

Authors: Dimitra Atri, B. Hariharan

Abstract: Gamma rays are the best probes to study high-energy particle interactions occurring in astrophysical environments. Space-based instruments such as the Fermi Large Area Telescope (Fermi LAT) and ground-based experiments such as VERITAS, H.E.S.S. and MAGIC have provided us with valuable data on various production mechanisms of gamma rays within our Galaxy and beyond. Depending on astronomical conditions, gamma rays can be produced either by hadronic or leptonic interactions. In this paper, we probe the production of gamma rays by the hadronic channel, where gamma rays are primarily produced by the decay of secondary neutral pions and $\eta$ mesons from proton-proton interactions over a wide energy range. We use state-of-the-art high-energy hadronic interaction models, calibrated with the new LHC results and widely used in ground-based ultra-high-energy air shower experiments. We also compare the SIBYLL 2.1, QGSJET-II-04 and EPOS LHC models and provide lookup tables which can be used by researchers to model gamma ray production from the hadronic channel and ultimately extract the underlying proton spectrum from gamma ray observations.

Submitted 9 September, 2013; originally announced September 2013.
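The decay kinematics behind the hadronic channel can be checked with a short calculation. In the rest frame of a $\pi^0$, each of the two photons carries $m_{\pi^0}/2 \approx 67.5$ MeV (with $c = 1$); for a pion of total energy $E$ and speed $\beta = \sqrt{1 - (m/E)^2}$, photon energies span $E(1-\beta)/2 \le E_\gamma \le E(1+\beta)/2$. The sketch below evaluates these endpoints, and the same formula applies to $\eta \to \gamma\gamma$. It covers only decay kinematics, not the interaction models or lookup tables described in the abstract; the mass constants are approximate PDG values and `photon_energy_range` is a hypothetical name.

```python
import math

M_PI0 = 134.9768  # neutral pion mass, MeV (approximate PDG value)
M_ETA = 547.862   # eta meson mass, MeV (approximate PDG value)


def photon_energy_range(e_meson, mass=M_PI0):
    """Min and max photon energy (MeV) in a two-photon decay of a meson
    with total lab-frame energy e_meson (MeV), using c = 1."""
    if e_meson < mass:
        raise ValueError("total energy cannot be below the rest mass")
    beta = math.sqrt(1.0 - (mass / e_meson) ** 2)
    return 0.5 * e_meson * (1.0 - beta), 0.5 * e_meson * (1.0 + beta)


lo, hi = photon_energy_range(1000.0)
# The product of the endpoints is always (mass / 2) ** 2.
print(math.isclose(lo * hi, (M_PI0 / 2) ** 2))  # True
```

Because $E_{\min} E_{\max} = (m/2)^2$ for every pion energy, the photon spectrum from $\pi^0$ decay is symmetric about $m_{\pi^0}/2$ on a logarithmic energy axis.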

arXiv:1307.4704 [pdf]

doi 10.1089/ast.2013.1052

Galactic cosmic ray induced radiation dose on terrestrial exoplanets

Authors: Dimitra Atri, B. Hariharan, Jean-Mathias Griessmeier

Abstract: This past decade has seen tremendous advancements in the study of extrasolar planets. Observations are now made with increasing sophistication from both ground- and space-based instruments, and exoplanets are characterized with increasing precision. There is a class of particularly interesting exoplanets, falling in the habitable zone, which is defined as the area around a star where the planet is capable of supporting liquid water on its surface. Theoretical calculations also suggest that close-in exoplanets are more likely to have weaker planetary magnetic fields, especially in the case of super-Earths. Such exoplanets are subjected to a high flux of Galactic Cosmic Rays (GCRs) due to their weak magnetic moments. GCRs are energetic particles of astrophysical origin, which strike the planetary atmosphere and produce secondary particles, including muons, which are highly penetrating. Some of these particles reach the planetary surface and contribute to the radiation dose. Along with the magnetic field, another factor governing the radiation dose is the depth of the planetary atmosphere. The greater the depth of the planetary atmosphere, the lower the flux of secondary particles reaching the surface. If the secondary particles are energetic enough, and their flux is sufficiently high, the radiation from muons can also impact sub-surface regions, as in the case of Mars. If the radiation dose is too high, the chances of sustaining a long-term biosphere on the planet are very low.
We explore the dependence of the GCR-induced radiation dose on the strength of the planetary magnetic field and its atmospheric depth, finding that the latter is the decisive factor for the protection of a planetary biosphere.

Submitted 16 September, 2013; v1 submitted 17 July, 2013; originally announced July 2013.

Comments: Accepted for publication in Astrobiology

Journal ref: Astrobiology, October 2013, 13(10): 910-919

Showing 51–77 of 77 results for author: Hariharan, B