Showing 101–131 of 131 results for author: Ramanan, D

  1. arXiv:1812.02699

    cs.CV

    Online Model Distillation for Efficient Video Inference

    Authors: Ravi Teja Mullapudi, Steven Chen, Keyi Zhang, Deva Ramanan, Kayvon Fatahalian

    Abstract: High-quality computer vision models typically address the problem of understanding the general distribution of real-world images. However, most cameras observe only a very small fraction of this distribution. This offers the possibility of achieving more efficient inference by specializing compact, low-cost models to the specific distribution of frames observed by a single camera. In this paper, w…

    Submitted 27 January, 2020; v1 submitted 6 December, 2018; originally announced December 2018.

    Journal ref: ICCV 2019

  2. arXiv:1808.05174

    cs.CV cs.GR cs.LG

    Recycle-GAN: Unsupervised Video Retargeting

    Authors: Aayush Bansal, Shugao Ma, Deva Ramanan, Yaser Sheikh

    Abstract: We introduce a data-driven approach for unsupervised video retargeting that translates content from one domain to another while preserving the style native to a domain, i.e., if contents of John Oliver's speech were to be transferred to Stephen Colbert, then the generated content/speech should be in Stephen Colbert's style. Our approach combines both spatial and temporal information along with adv…

    Submitted 15 August, 2018; originally announced August 2018.

    Comments: ECCV 2018; Please refer to project webpage for videos - http://www.cs.cmu.edu/~aayushb/Recycle-GAN

  3. arXiv:1807.00493

    cs.CV

    Active Testing: An Efficient and Robust Framework for Estimating Accuracy

    Authors: Phuc Nguyen, Deva Ramanan, Charless Fowlkes

    Abstract: Much recent work on visual recognition aims to scale up learning to massive, noisily-annotated datasets. We address the problem of scaling up the evaluation of such models to large-scale datasets with noisy labels. Current protocols for doing so require a human user to either vet (re-annotate) a small fraction of the test set and ignore the rest, or else correct errors in annotation as they are f…

    Submitted 2 July, 2018; originally announced July 2018.

    Comments: accepted to ICML 2018

  4. Cross-Domain Image Matching with Deep Feature Maps

    Authors: Bailey Kong, James Supancic, Deva Ramanan, Charless C. Fowlkes

    Abstract: We investigate the problem of automatically determining what type of shoe left an impression found at a crime scene. This recognition problem is made difficult by the variability in types of crime scene evidence (ranging from traces of dust or oil on hard surfaces to impressions made in soil) and the lack of comprehensive databases of shoe outsole tread patterns. We find that mid-level features ex…

    Submitted 1 October, 2018; v1 submitted 6 April, 2018; originally announced April 2018.

  5. arXiv:1802.07427

    cs.LG

    Active Learning with Partial Feedback

    Authors: Peiyun Hu, Zachary C. Lipton, Anima Anandkumar, Deva Ramanan

    Abstract: While many active learning papers assume that the learner can simply ask for a label and receive it, real annotation often presents a mismatch between the form of a label (say, one among many classes), and the form of an annotation (typically yes/no binary feedback). To annotate examples corpora for multiclass classification, we might need to ask multiple yes/no questions, exploiting a label hiera…

    Submitted 8 July, 2019; v1 submitted 21 February, 2018; originally announced February 2018.

    Comments: ICLR 2019

  6. arXiv:1802.01777

    cs.CV

    Brute-Force Facial Landmark Analysis With A 140,000-Way Classifier

    Authors: Mengtian Li, Laszlo Jeni, Deva Ramanan

    Abstract: We propose a simple approach to visual alignment, focusing on the illustrative task of facial landmark estimation. While most prior work treats this as a regression problem, we instead formulate it as a discrete $K$-way classification task, where a classifier is trained to return one of $K$ discrete alignments. One crucial benefit of a classifier is the ability to report back a (softmax) distribut…

    Submitted 14 February, 2018; v1 submitted 5 February, 2018; originally announced February 2018.

    Comments: In AAAI 2018; code can be found at https://github.com/mtli/BFFL

  7. arXiv:1711.10683

    cs.CV

    Patch Correspondences for Interpreting Pixel-level CNNs

    Authors: Victor Fragoso, Chunhui Liu, Aayush Bansal, Deva Ramanan

    Abstract: We present compositional nearest neighbors (CompNN), a simple approach to visually interpreting distributed representations learned by a convolutional neural network (CNN) for pixel-level tasks (e.g., image synthesis and segmentation). It does so by reconstructing both a CNN's input and output image by copy-pasting corresponding patches from the training set with similar feature embeddings. To do…

    Submitted 3 September, 2018; v1 submitted 29 November, 2017; originally announced November 2017.

  8. arXiv:1711.01467

    cs.CV

    Attentional Pooling for Action Recognition

    Authors: Rohit Girdhar, Deva Ramanan

    Abstract: We introduce a simple yet surprisingly powerful model to incorporate attention in action recognition and human object interaction tasks. Our proposed attention module can be trained with or without extra supervision, and gives a sizable boost in accuracy while keeping the network size and computational cost nearly the same. It leads to significant improvements over state of the art base architectu…

    Submitted 29 December, 2017; v1 submitted 4 November, 2017; originally announced November 2017.

    Comments: In NIPS 2017. Project page: https://rohitgirdhar.github.io/AttentionalPoolingAction/

  9. arXiv:1708.05349

    cs.CV cs.GR cs.LG

    PixelNN: Example-based Image Synthesis

    Authors: Aayush Bansal, Yaser Sheikh, Deva Ramanan

    Abstract: We present a simple nearest-neighbor (NN) approach that synthesizes high-frequency photorealistic images from an "incomplete" signal such as a low-resolution image, a surface normal map, or edges. Current state-of-the-art deep generative models designed for such conditional image synthesis lack two important things: (1) they are unable to generate a large set of diverse outputs, due to the mode co…

    Submitted 17 August, 2017; originally announced August 2017.

    Comments: Project Page: http://www.cs.cmu.edu/~aayushb/pixelNN/

  10. arXiv:1708.02973

    cs.CV

    Learning Policies for Adaptive Tracking with Deep Feature Cascades

    Authors: Chen Huang, Simon Lucey, Deva Ramanan

    Abstract: Visual object tracking is a fundamental and time-critical vision task. Recent years have seen many shallow tracking methods based on real-time pixel-based correlation filters, as well as deep methods that have top performance but need a high-end GPU. In this paper, we learn to improve the speed of deep trackers without losing accuracy. Our fundamental insight is to take an adaptive approach, where…

    Submitted 13 September, 2017; v1 submitted 9 August, 2017; originally announced August 2017.

    Comments: ICCV 2017 Spotlight, with Supplementary Material

  11. Unconstrained Face Detection and Open-Set Face Recognition Challenge

    Authors: Manuel Günther, Peiyun Hu, Christian Herrmann, Chi Ho Chan, Min Jiang, Shufan Yang, Akshay Raj Dhamija, Deva Ramanan, Jürgen Beyerer, Josef Kittler, Mohamad Al Jazaery, Mohammad Iqbal Nouyed, Guodong Guo, Cezary Stankiewicz, Terrance E. Boult

    Abstract: Face detection and recognition benchmarks have shifted toward more difficult environments. The challenge presented in this paper addresses the next step in the direction of automatic detection and identification of people from outdoor surveillance cameras. While face detection has shown remarkable success in images collected from the web, surveillance cameras include more diverse occlusions, poses…

    Submitted 25 September, 2018; v1 submitted 7 August, 2017; originally announced August 2017.

    Comments: This is an ERRATA version of the paper originally presented at the International Joint Conference on Biometrics. Due to a bug in our evaluation code, the results of the participants changed. The final conclusion, however, is still the same

  12. arXiv:1707.07169

    cs.CV

    Comparing Apples and Oranges: Off-Road Pedestrian Detection on the NREC Agricultural Person-Detection Dataset

    Authors: Zachary Pezzementi, Trenton Tabor, Peiyun Hu, Jonathan K. Chang, Deva Ramanan, Carl Wellington, Benzun P. Wisely Babu, Herman Herman

    Abstract: Person detection from vehicles has made rapid progress recently with the advent of multiple high-quality datasets of urban and highway driving, yet no large-scale benchmark is available for the same problem in off-road or agricultural environments. Here we present the NREC Agricultural Person-Detection Dataset to spur research in these environments. It consists of labeled stereo video of people in…

    Submitted 26 October, 2017; v1 submitted 22 July, 2017; originally announced July 2017.

    Comments: Accepted to Journal of Field Robotics

  13. arXiv:1707.04991

    cs.CV

    Tracking as Online Decision-Making: Learning a Policy from Streaming Videos with Reinforcement Learning

    Authors: James Steven Supancic III, Deva Ramanan

    Abstract: We formulate tracking as an online decision-making process, where a tracking agent must follow an object despite ambiguous image frames and a limited computational budget. Crucially, the agent must decide where to look in the upcoming frames, when to reinitialize because it believes the target has been lost, and when to update its appearance model for the tracked object. Such decisions are typical…

    Submitted 16 July, 2017; originally announced July 2017.

  14. Predictive-Corrective Networks for Action Detection

    Authors: Achal Dave, Olga Russakovsky, Deva Ramanan

    Abstract: While deep feature learning has revolutionized techniques for static-image understanding, the same does not quite hold for video processing. Architectures and optimization techniques used for video are largely based off those for static images, potentially underutilizing rich video information. In this work, we rethink both the underlying network architecture and the stochastic learning paradigm f…

    Submitted 12 December, 2017; v1 submitted 12 April, 2017; originally announced April 2017.

    Comments: Accepted to CVPR 2017. [v2]: Updated Multi-LSTM mAP on MultiTHUMOS (should be 29.7, was initially reported as 29.6). [Project URL]: http://www.achaldave.com/projects/predictive-corrective/

  15. arXiv:1704.02895

    cs.CV

    ActionVLAD: Learning spatio-temporal aggregation for action classification

    Authors: Rohit Girdhar, Deva Ramanan, Abhinav Gupta, Josef Sivic, Bryan Russell

    Abstract: In this work, we introduce a new video representation for action classification that aggregates local convolutional features across the entire spatio-temporal extent of the video. We do so by integrating state-of-the-art two-stream networks with learnable spatio-temporal feature aggregation. The resulting architecture is end-to-end trainable for whole-video classification. We investigate different…

    Submitted 10 April, 2017; originally announced April 2017.

    Comments: Accepted to CVPR 2017. Project page: https://rohitgirdhar.github.io/ActionVLAD/

  16. arXiv:1703.06283

    cs.CV cs.AI

    Expecting the Unexpected: Training Detectors for Unusual Pedestrians with Adversarial Imposters

    Authors: Shiyu Huang, Deva Ramanan

    Abstract: As autonomous vehicles become an every-day reality, high-accuracy pedestrian detection is of paramount practical importance. Pedestrian detection is a highly researched topic with mature methods, but most datasets focus on common scenes of people engaged in typical walking poses on sidewalks. But performance is most crucial for dangerous scenarios, such as children playing in the street or people…

    Submitted 10 April, 2017; v1 submitted 18 March, 2017; originally announced March 2017.

    Comments: To appear in CVPR 2017

  17. arXiv:1703.05884

    cs.CV

    Need for Speed: A Benchmark for Higher Frame Rate Object Tracking

    Authors: Hamed Kiani Galoogahi, Ashton Fagg, Chen Huang, Deva Ramanan, Simon Lucey

    Abstract: In this paper, we propose the first higher frame rate video dataset (called Need for Speed - NfS) and benchmark for visual object tracking. The dataset consists of 100 videos (380K frames) captured with now commonly available higher frame rate (240 FPS) cameras from real world scenarios. All frames are annotated with axis aligned bounding boxes and all sequences are manually labelled with nine vis…

    Submitted 21 March, 2017; v1 submitted 17 March, 2017; originally announced March 2017.

  18. arXiv:1702.06506

    cs.CV cs.LG cs.RO

    PixelNet: Representation of the pixels, by the pixels, and for the pixels

    Authors: Aayush Bansal, Xinlei Chen, Bryan Russell, Abhinav Gupta, Deva Ramanan

    Abstract: We explore design principles for general pixel-level prediction problems, from low-level edge detection to mid-level surface normal estimation to high-level semantic segmentation. Convolutional predictors, such as the fully-convolutional network (FCN), have achieved remarkable success by exploiting the spatial redundancy of neighboring pixels through convolutional processing. Though computationall…

    Submitted 21 February, 2017; originally announced February 2017.

    Comments: Project Page: http://www.cs.cmu.edu/~aayushb/pixelNet/. arXiv admin note: substantial text overlap with arXiv:1609.06694

  19. arXiv:1612.06524

    cs.CV

    3D Human Pose Estimation = 2D Pose Estimation + Matching

    Authors: Ching-Hang Chen, Deva Ramanan

    Abstract: We explore 3D human pose estimation from a single RGB image. While many approaches try to directly predict 3D pose from image measurements, we explore a simple architecture that reasons through intermediate 2D pose predictions. Our approach is based on two key observations (1) Deep neural nets have revolutionized 2D pose estimation, producing accurate 2D predictions even for poses with self occlus…

    Submitted 11 April, 2017; v1 submitted 20 December, 2016; originally announced December 2016.

    Comments: Demo code: https://github.com/flyawaychase/3DHumanPose

  20. arXiv:1612.04901

    cs.CV

    Tinkering Under the Hood: Interactive Zero-Shot Learning with Net Surgery

    Authors: Vivek Krishnan, Deva Ramanan

    Abstract: We consider the task of visual net surgery, in which a CNN can be reconfigured without extra data to recognize novel concepts that may be omitted from the training set. While most prior work makes use of linguistic cues for such "zero-shot" learning, we do so by using a pictorial language representation of the training set, implicitly learned by a CNN, to generalize to new classes. To this end, we…

    Submitted 14 December, 2016; originally announced December 2016.

  21. arXiv:1612.04402

    cs.CV

    Finding Tiny Faces

    Authors: Peiyun Hu, Deva Ramanan

    Abstract: Though tremendous strides have been made in object recognition, one of the remaining open challenges is detecting small objects. We explore three aspects of the problem in the context of finding small faces: the role of scale invariance, image resolution, and contextual reasoning. While most recognition approaches aim to be scale-invariant, the cues for recognizing a 3px tall face are fundamentall…

    Submitted 15 April, 2017; v1 submitted 13 December, 2016; originally announced December 2016.

    Comments: CVPR 2017

  22. arXiv:1609.06694

    cs.CV cs.LG

    PixelNet: Towards a General Pixel-level Architecture

    Authors: Aayush Bansal, Xinlei Chen, Bryan Russell, Abhinav Gupta, Deva Ramanan

    Abstract: We explore architectures for general pixel-level prediction problems, from low-level edge detection to mid-level surface normal estimation to high-level semantic segmentation. Convolutional predictors, such as the fully-convolutional network (FCN), have achieved remarkable success by exploiting the spatial redundancy of neighboring pixels through convolutional processing. Though computationally ef…

    Submitted 21 September, 2016; originally announced September 2016.

  23. arXiv:1603.09439

    cs.CV

    The Open World of Micro-Videos

    Authors: Phuc Xuan Nguyen, Gregory Rogez, Charless Fowlkes, Deva Ramanan

    Abstract: Micro-videos are six-second videos popular on social media networks with several unique properties. Firstly, because of the authoring process, they contain significantly more diversity and narrative structure than existing collections of video "snippets". Secondly, because they are often captured by hand-held mobile cameras, they contain specialized viewpoints including third-person, egocentric, a…

    Submitted 31 March, 2016; v1 submitted 30 March, 2016; originally announced March 2016.

  24. arXiv:1507.05699

    cs.CV

    Bottom-Up and Top-Down Reasoning with Hierarchical Rectified Gaussians

    Authors: Peiyun Hu, Deva Ramanan

    Abstract: Convolutional neural nets (CNNs) have demonstrated remarkable performance in recent history. Such approaches tend to work in a unidirectional bottom-up feed-forward fashion. However, practical experience and biological evidence tell us that feedback plays a crucial role, particularly for detailed spatial understanding tasks. This work explores bidirectional architectures that also reason with top…

    Submitted 4 May, 2016; v1 submitted 21 July, 2015; originally announced July 2015.

    Comments: To appear in CVPR 2016

  25. arXiv:1505.05232

    cs.CV

    Multi-scale recognition with DAG-CNNs

    Authors: Songfan Yang, Deva Ramanan

    Abstract: We explore multi-scale convolutional neural nets (CNNs) for image classification. Contemporary approaches extract features from a single output layer. By extracting features from multiple layers, one can simultaneously reason about high, mid, and low-level features during classification. The resulting multi-scale architecture can itself be seen as a feed-forward model that is structured as a direc…

    Submitted 19 May, 2015; originally announced May 2015.

  26. arXiv:1504.06378

    cs.CV

    Depth-based hand pose estimation: methods, data, and challenges

    Authors: James Steven Supancic III, Gregory Rogez, Yi Yang, Jamie Shotton, Deva Ramanan

    Abstract: Hand pose estimation has matured rapidly in recent years. The introduction of commodity depth sensors and a multitude of practical applications have spurred new advances. We provide an extensive analysis of the state-of-the-art, focusing on hand pose estimation from a single depth frame. To do so, we have implemented a considerable number of systems, and will release all software and evaluation co…

    Submitted 6 May, 2015; v1 submitted 23 April, 2015; originally announced April 2015.

  27. Do We Need More Training Data?

    Authors: Xiangxin Zhu, Carl Vondrick, Charless Fowlkes, Deva Ramanan

    Abstract: Datasets for training object recognition systems are steadily increasing in size. This paper investigates the question of whether existing detectors will continue to improve as data grows, or saturate in performance due to limited model complexity and the Bayes risk associated with the feature spaces in which they operate. We focus on the popular paradigm of discriminatively trained templates defi…

    Submitted 4 March, 2015; originally announced March 2015.

  28. arXiv:1412.0065

    cs.CV

    3D Hand Pose Detection in Egocentric RGB-D Images

    Authors: Gregory Rogez, James S. Supancic III, Maryam Khademi, Jose Maria Martinez Montiel, Deva Ramanan

    Abstract: We focus on the task of everyday hand pose estimation from egocentric viewpoints. For this task, we show that depth sensors are particularly informative for extracting near-field interactions of the camera wearer with his/her environment. Despite the recent advances in full-body pose estimation using Kinect-like sensors, reliable monocular hand pose estimation in RGB-D images is still an unsolved…

    Submitted 28 November, 2014; originally announced December 2014.

    Comments: 14 pages, 15 figures, extended version of the corresponding ECCV workshop paper, submitted to International Journal of Computer Vision

  29. arXiv:1412.0060

    cs.CV

    Egocentric Pose Recognition in Four Lines of Code

    Authors: Gregory Rogez, James S. Supancic III, Deva Ramanan

    Abstract: We tackle the problem of estimating the 3D pose of an individual's upper limbs (arms+hands) from a chest mounted depth-camera. Importantly, we consider pose estimation during everyday interactions with objects. Past work shows that strong pose+viewpoint priors and depth-based features are crucial for robust performance. In egocentric views, hands and arms are observable within a well defined volum…

    Submitted 28 November, 2014; originally announced December 2014.

    Comments: 9 pages, 10 figures

  30. arXiv:1405.0312

    cs.CV

    Microsoft COCO: Common Objects in Context

    Authors: Tsung-Yi Lin, Michael Maire, Serge Belongie, Lubomir Bourdev, Ross Girshick, James Hays, Pietro Perona, Deva Ramanan, C. Lawrence Zitnick, Piotr Dollár

    Abstract: We present a new dataset with the goal of advancing the state-of-the-art in object recognition by placing the question of object recognition in the context of the broader question of scene understanding. This is achieved by gathering images of complex everyday scenes containing common objects in their natural context. Objects are labeled using per-instance segmentations to aid in precise object lo…

    Submitted 20 February, 2015; v1 submitted 1 May, 2014; originally announced May 2014.

    Comments: 1) updated annotation pipeline description and figures; 2) added new section describing datasets splits; 3) updated author list

  31. arXiv:1312.1743

    cs.LG cs.CV

    Dual coordinate solvers for large-scale structural SVMs

    Authors: Deva Ramanan

    Abstract: This manuscript describes a method for training linear SVMs (including binary SVMs, SVM regression, and structural SVMs) from large, out-of-core training datasets. Current strategies for large-scale learning fall into one of two camps: batch algorithms, which solve the learning problem given a finite dataset, and online algorithms, which can process out-of-core datasets. The former typically requir…

    Submitted 13 June, 2014; v1 submitted 5 December, 2013; originally announced December 2013.