Showing 1–50 of 68 results for author: Hariharan, B

  1. arXiv:2406.11819  [pdf, other]

    cs.CV

    MegaScenes: Scene-Level View Synthesis at Scale

    Authors: Joseph Tung, Gene Chou, Ruojin Cai, Guandao Yang, Kai Zhang, Gordon Wetzstein, Bharath Hariharan, Noah Snavely

    Abstract: Scene-level novel view synthesis (NVS) is fundamental to many vision and graphics applications. Recently, pose-conditioned diffusion models have led to significant progress by extracting 3D information from 2D foundation models, but these methods are limited by the lack of scene-level training data. Common dataset choices either consist of isolated objects (Objaverse), or of object-centric scenes…

    Submitted 17 June, 2024; originally announced June 2024.

    Comments: Our project page is at https://megascenes.github.io

  2. arXiv:2405.16034  [pdf, other]

    cs.CV

    DiffuBox: Refining 3D Object Detection with Point Diffusion

    Authors: Xiangyu Chen, Zhenzhen Liu, Katie Z Luo, Siddhartha Datta, Adhitya Polavaram, Yan Wang, Yurong You, Boyi Li, Marco Pavone, Wei-Lun Chao, Mark Campbell, Bharath Hariharan, Kilian Q. Weinberger

    Abstract: Ensuring robust 3D object detection and localization is crucial for many applications in robotics and autonomous driving. Recent models, however, face difficulties in maintaining high performance when applied to domains with differing sensor setups or geographic locations, often resulting in poor localization accuracy due to domain shift. To overcome this challenge, we introduce a novel diffusion-…

    Submitted 24 May, 2024; originally announced May 2024.

  3. arXiv:2405.14841  [pdf, other]

    cs.CV

    Learning to Detect and Segment Mobile Objects from Unlabeled Videos

    Authors: Yihong Sun, Bharath Hariharan

    Abstract: Embodied agents must detect and localize objects of interest, e.g. traffic participants for self-driving cars. Supervision in the form of bounding boxes for this task is extremely expensive. As such, prior work has looked at unsupervised object segmentation, but in the absence of annotated boxes, it is unclear how pixels must be grouped into objects and which objects are of interest. This results…

    Submitted 23 May, 2024; originally announced May 2024.

  4. arXiv:2404.05139  [pdf, other]

    cs.CV cs.RO

    Better Monocular 3D Detectors with LiDAR from the Past

    Authors: Yurong You, Cheng Perng Phoo, Carlos Andres Diaz-Ruiz, Katie Z Luo, Wei-Lun Chao, Mark Campbell, Bharath Hariharan, Kilian Q Weinberger

    Abstract: Accurate 3D object detection is crucial to autonomous driving. Though LiDAR-based detectors have achieved impressive performance, the high cost of LiDAR sensors precludes their widespread adoption in affordable vehicles. Camera-based detectors are cheaper alternatives but often suffer inferior performance compared to their LiDAR-based counterparts due to inherent depth ambiguities in images. In th…

    Submitted 9 April, 2024; v1 submitted 7 April, 2024; originally announced April 2024.

    Comments: Accepted by ICRA 2024. The code can be found at https://github.com/YurongYou/AsyncDepth

  5. arXiv:2312.06960  [pdf, other]

    cs.CV cs.LG

    Remote Sensing Vision-Language Foundation Models without Annotations via Ground Remote Alignment

    Authors: Utkarsh Mall, Cheng Perng Phoo, Meilin Kelsey Liu, Carl Vondrick, Bharath Hariharan, Kavita Bala

    Abstract: We introduce a method to train vision-language models for remote-sensing images without using any textual annotations. Our key insight is to use co-located internet imagery taken on the ground as an intermediary for connecting remote-sensing images and language. Specifically, we train an image encoder for remote sensing images to align with the image encoder of CLIP using a large amount of paired…

    Submitted 11 December, 2023; originally announced December 2023.

  6. arXiv:2312.05984  [pdf, other]

    cs.CV cs.AI cs.GR cs.LG

    Accurate Differential Operators for Hybrid Neural Fields

    Authors: Aditya Chetan, Guandao Yang, Zichen Wang, Steve Marschner, Bharath Hariharan

    Abstract: Neural fields have become widely used in various fields, from shape representation to neural rendering, and for solving partial differential equations (PDEs). With the advent of hybrid neural field representations like Instant NGP that leverage small MLPs and explicit representations, these models train quickly and can fit large scenes. Yet in many applications like rendering and simulation, hybri…

    Submitted 10 December, 2023; originally announced December 2023.

  7. arXiv:2310.19080  [pdf, other]

    cs.CV

    Reward Finetuning for Faster and More Accurate Unsupervised Object Discovery

    Authors: Katie Z Luo, Zhenzhen Liu, Xiangyu Chen, Yurong You, Sagie Benaim, Cheng Perng Phoo, Mark Campbell, Wen Sun, Bharath Hariharan, Kilian Q. Weinberger

    Abstract: Recent advances in machine learning have shown that Reinforcement Learning from Human Feedback (RLHF) can improve machine learning models and align them with human preferences. Although very successful for Large Language Models (LLMs), these advancements have not had a comparable impact in research for autonomous vehicles -- where alignment with human expectations can be imperative. In this paper,…

    Submitted 5 November, 2023; v1 submitted 29 October, 2023; originally announced October 2023.

  8. arXiv:2310.18887  [pdf, other]

    cs.CV

    Dynamo-Depth: Fixing Unsupervised Depth Estimation for Dynamical Scenes

    Authors: Yihong Sun, Bharath Hariharan

    Abstract: Unsupervised monocular depth estimation techniques have demonstrated encouraging results but typically assume that the scene is static. These techniques suffer when trained on dynamical scenes, where apparent object motion can equally be explained by hypothesizing the object's independent motion, or by altering its depth. This ambiguity causes depth estimators to predict erroneous depth for moving…

    Submitted 28 October, 2023; originally announced October 2023.

    Comments: NeurIPS 2023

  9. arXiv:2310.14592  [pdf, other]

    cs.CV cs.LG

    Pre-Training LiDAR-Based 3D Object Detectors Through Colorization

    Authors: Tai-Yu Pan, Chenyang Ma, Tianle Chen, Cheng Perng Phoo, Katie Z Luo, Yurong You, Mark Campbell, Kilian Q. Weinberger, Bharath Hariharan, Wei-Lun Chao

    Abstract: Accurate 3D object detection and understanding for self-driving cars heavily relies on LiDAR point clouds, necessitating large amounts of labeled data to train. In this work, we introduce an innovative pre-training approach, Grounded Point Colorization (GPC), to bridge the gap between data and labels by teaching the model to colorize LiDAR point clouds, equipping it with valuable semantic cues. To…

    Submitted 25 February, 2024; v1 submitted 23 October, 2023; originally announced October 2023.

    Comments: Accepted to ICLR 2024

  10. arXiv:2309.16668  [pdf, other]

    cs.CV cs.AI cs.GR cs.LG

    RealFill: Reference-Driven Generation for Authentic Image Completion

    Authors: Luming Tang, Nataniel Ruiz, Qinghao Chu, Yuanzhen Li, Aleksander Holynski, David E. Jacobs, Bharath Hariharan, Yael Pritch, Neal Wadhwa, Kfir Aberman, Michael Rubinstein

    Abstract: Recent advances in generative imagery have brought forth outpainting and inpainting models that can produce high-quality, plausible image content in unknown regions. However, the content these models hallucinate is necessarily inauthentic, since they are unaware of the true scene. In this work, we propose RealFill, a novel generative approach for image completion that fills in missing regions of a…

    Submitted 14 May, 2024; v1 submitted 28 September, 2023; originally announced September 2023.

    Comments: SIGGRAPH 2024 (Journal Track). Project page: https://realfill.github.io

  11. arXiv:2309.12140  [pdf, other]

    cs.CV cs.AI cs.LG

    Unsupervised Domain Adaptation for Self-Driving from Past Traversal Features

    Authors: Travis Zhang, Katie Luo, Cheng Perng Phoo, Yurong You, Wei-Lun Chao, Bharath Hariharan, Mark Campbell, Kilian Q. Weinberger

    Abstract: The rapid development of 3D object detection systems for self-driving cars has significantly improved accuracy. However, these systems struggle to generalize across diverse driving environments, which can lead to safety-critical failures in detecting traffic participants. To address this, we propose a method that utilizes unlabeled repeated traversals of multiple locations to adapt object detector…

    Submitted 21 September, 2023; originally announced September 2023.

  12. arXiv:2309.02420  [pdf, other]

    cs.CV

    Doppelgangers: Learning to Disambiguate Images of Similar Structures

    Authors: Ruojin Cai, Joseph Tung, Qianqian Wang, Hadar Averbuch-Elor, Bharath Hariharan, Noah Snavely

    Abstract: We consider the visual disambiguation task of determining whether a pair of visually similar images depict the same or distinct 3D surfaces (e.g., the same or opposite sides of a symmetric building). Illusory image matches, where two images observe distinct but visually similar 3D surfaces, can be challenging for humans to differentiate, and can also lead 3D reconstruction algorithms to produce er…

    Submitted 5 September, 2023; originally announced September 2023.

    Comments: Published in ICCV 2023 (Oral); Project page: http://doppelgangers-3d.github.io/

  13. arXiv:2306.05422  [pdf, other]

    cs.CV

    Tracking Everything Everywhere All at Once

    Authors: Qianqian Wang, Yen-Yu Chang, Ruojin Cai, Zhengqi Li, Bharath Hariharan, Aleksander Holynski, Noah Snavely

    Abstract: We present a new test-time optimization method for estimating dense and long-range motion from a video sequence. Prior optical flow or particle video tracking algorithms typically operate within limited temporal windows, struggling to track through occlusions and maintain global consistency of estimated motion trajectories. We propose a complete and globally consistent motion representation, dubbe…

    Submitted 12 September, 2023; v1 submitted 8 June, 2023; originally announced June 2023.

    Comments: ICCV 2023

  14. arXiv:2306.03881  [pdf, other]

    cs.CV

    Emergent Correspondence from Image Diffusion

    Authors: Luming Tang, Menglin Jia, Qianqian Wang, Cheng Perng Phoo, Bharath Hariharan

    Abstract: Finding correspondences between images is a fundamental problem in computer vision. In this paper, we show that correspondence emerges in image diffusion models without any explicit supervision. We propose a simple strategy to extract this implicit knowledge out of diffusion networks as image features, namely DIffusion FeaTures (DIFT), and use them to establish correspondences between real images.…

    Submitted 6 December, 2023; v1 submitted 6 June, 2023; originally announced June 2023.

    Comments: NeurIPS 2023. Project page: https://diffusionfeatures.github.io

  15. arXiv:2304.12314  [pdf, other]

    cs.CV cs.AI cs.LG

    Distilling from Similar Tasks for Transfer Learning on a Budget

    Authors: Kenneth Borup, Cheng Perng Phoo, Bharath Hariharan

    Abstract: We address the challenge of getting efficient yet accurate recognition systems with limited labels. While recognition models improve with model size and amount of data, many specialized applications of computer vision have severe resource constraints both during training and inference. Transfer learning is an effective solution for training with few labels, however often at the expense of a comput…

    Submitted 24 April, 2023; originally announced April 2023.

    Comments: 11 pages

  16. arXiv:2303.15286  [pdf, other]

    cs.CV cs.LG

    Unsupervised Adaptation from Repeated Traversals for Autonomous Driving

    Authors: Yurong You, Cheng Perng Phoo, Katie Z Luo, Travis Zhang, Wei-Lun Chao, Bharath Hariharan, Mark Campbell, Kilian Q. Weinberger

    Abstract: For a self-driving car to operate reliably, its perceptual system must generalize to the end-user's environment -- ideally without additional annotation efforts. One potential solution is to leverage unlabeled data (e.g., unlabeled LiDAR point clouds) collected from the end-users' environments (i.e. target domain) to adapt the system to the difference between training and testing environments. Whi…

    Submitted 27 March, 2023; originally announced March 2023.

    Comments: Accepted by NeurIPS 2022. Code is available at https://github.com/YurongYou/Rote-DA

  17. arXiv:2302.04862  [pdf, other]

    cs.CV cs.LG

    Polynomial Neural Fields for Subband Decomposition and Manipulation

    Authors: Guandao Yang, Sagie Benaim, Varun Jampani, Kyle Genova, Jonathan T. Barron, Thomas Funkhouser, Bharath Hariharan, Serge Belongie

    Abstract: Neural fields have emerged as a new paradigm for representing signals, thanks to their ability to do it compactly while being easy to optimize. In most applications, however, neural fields are treated like black boxes, which precludes many signal manipulation tasks. In this paper, we propose a new class of neural fields called polynomial neural fields (PNFs). The key advantage of a PNF is that it…

    Submitted 9 February, 2023; originally announced February 2023.

    Comments: Accepted to NeurIPS 2022

  18. arXiv:2209.11673  [pdf, other]

    cs.CV cs.RO

    Image-to-Image Translation for Autonomous Driving from Coarsely-Aligned Image Pairs

    Authors: Youya Xia, Josephine Monica, Wei-Lun Chao, Bharath Hariharan, Kilian Q Weinberger, Mark Campbell

    Abstract: A self-driving car must be able to reliably handle adverse weather conditions (e.g., snowy) to operate safely. In this paper, we investigate the idea of turning sensor inputs (i.e., images) captured in an adverse condition into a benign one (i.e., sunny), upon which the downstream tasks (e.g., semantic segmentation) can attain high accuracy. Prior work primarily formulates this as an unpaired imag…

    Submitted 23 September, 2022; originally announced September 2022.

    Comments: Submitted to the International Conference on Robotics and Automation (ICRA) 2023

  19. arXiv:2208.01166  [pdf, other]

    cs.CV

    Ithaca365: Dataset and Driving Perception under Repeated and Challenging Weather Conditions

    Authors: Carlos A. Diaz-Ruiz, Youya Xia, Yurong You, Jose Nino, Junan Chen, Josephine Monica, Xiangyu Chen, Katie Luo, Yan Wang, Marc Emond, Wei-Lun Chao, Bharath Hariharan, Kilian Q. Weinberger, Mark Campbell

    Abstract: Advances in perception for self-driving cars have accelerated in recent years due to the availability of large-scale datasets, typically collected at specific locations and under nice weather conditions. Yet, to achieve the high safety requirement, these perceptual systems must operate robustly under a wide variety of weather conditions including snow and rain. In this paper, we present a new data…

    Submitted 1 August, 2022; originally announced August 2022.

    Comments: Accepted by CVPR 2022

  20. arXiv:2207.03398  [pdf, other]

    cs.CV cs.AI cs.LG

    Diagnosing and Remedying Shot Sensitivity with Cosine Few-Shot Learners

    Authors: Davis Wertheimer, Luming Tang, Bharath Hariharan

    Abstract: Few-shot recognition involves training an image classifier to distinguish novel concepts at test time using few examples (shot). Existing approaches generally assume that the shot number at test time is known in advance. This is not realistic, and the performance of a popular and foundational method has been shown to suffer when train and test shots do not match. We conduct a systematic empirical…

    Submitted 7 July, 2022; originally announced July 2022.

  21. arXiv:2204.07030  [pdf, other]

    cs.CV cs.LG

    Activation Regression for Continuous Domain Generalization with Applications to Crop Classification

    Authors: Samar Khanna, Bram Wallace, Kavita Bala, Bharath Hariharan

    Abstract: Geographic variance in satellite imagery impacts the ability of machine learning models to generalise to new regions. In this paper, we model geographic generalisation in medium resolution Landsat-8 satellite imagery as a continuous domain adaptation problem, demonstrating how models generalise better with appropriate domain knowledge. We develop a dataset spatially distributed across the entire c…

    Submitted 14 April, 2022; originally announced April 2022.

  22. arXiv:2203.15882  [pdf, other]

    cs.CV

    Learning to Detect Mobile Objects from LiDAR Scans Without Labels

    Authors: Yurong You, Katie Z Luo, Cheng Perng Phoo, Wei-Lun Chao, Wen Sun, Bharath Hariharan, Mark Campbell, Kilian Q. Weinberger

    Abstract: Current 3D object detectors for autonomous driving are almost entirely trained on human-annotated data. Although of high quality, the generation of such data is laborious and costly, restricting them to a few specific locations and object types. This paper proposes an alternative approach entirely based on unlabeled data, which can be collected cheaply and in abundance almost everywhere on earth.…

    Submitted 29 March, 2022; originally announced March 2022.

    Comments: Accepted by CVPR 2022. Code is available at https://github.com/YurongYou/MODEST

  23. arXiv:2203.12119  [pdf, other]

    cs.CV

    Visual Prompt Tuning

    Authors: Menglin Jia, Luming Tang, Bor-Chun Chen, Claire Cardie, Serge Belongie, Bharath Hariharan, Ser-Nam Lim

    Abstract: The current modus operandi in adapting pre-trained models involves updating all the backbone parameters, ie, full fine-tuning. This paper introduces Visual Prompt Tuning (VPT) as an efficient and effective alternative to full fine-tuning for large-scale Transformer models in vision. Taking inspiration from recent advances in efficiently tuning large language models, VPT introduces only a small amo…

    Submitted 20 July, 2022; v1 submitted 22 March, 2022; originally announced March 2022.

    Comments: ECCV2022

  24. arXiv:2203.11405  [pdf, other]

    cs.CV

    Hindsight is 20/20: Leveraging Past Traversals to Aid 3D Perception

    Authors: Yurong You, Katie Z Luo, Xiangyu Chen, Junan Chen, Wei-Lun Chao, Wen Sun, Bharath Hariharan, Mark Campbell, Kilian Q. Weinberger

    Abstract: Self-driving cars must detect vehicles, pedestrians, and other traffic participants accurately to operate safely. Small, far-away, or highly occluded objects are particularly challenging because there is limited information in the LiDAR point clouds for detecting them. To address this challenge, we leverage valuable information from the past: in particular, data collected in past traversals of the…

    Submitted 21 March, 2022; originally announced March 2022.

    Comments: Accepted by ICLR 2022. Code is available at https://github.com/YurongYou/Hindsight

  25. arXiv:2203.08414  [pdf, other]

    cs.CV cs.AI cs.LG stat.ML

    Unsupervised Semantic Segmentation by Distilling Feature Correspondences

    Authors: Mark Hamilton, Zhoutong Zhang, Bharath Hariharan, Noah Snavely, William T. Freeman

    Abstract: Unsupervised semantic segmentation aims to discover and localize semantically meaningful categories within image corpora without any form of annotation. To solve this task, algorithms must produce features for every pixel that are both semantically meaningful and compact enough to form distinct clusters. Unlike previous works which achieve this with a single end-to-end framework, we propose to sep…

    Submitted 16 March, 2022; originally announced March 2022.

  26. arXiv:2202.13237  [pdf, other]

    cs.CV cs.MA cs.RO

    Orientation-Discriminative Feature Representation for Decentralized Pedestrian Tracking

    Authors: Vikram Shree, Carlos Diaz-Ruiz, Chang Liu, Bharath Hariharan, Mark Campbell

    Abstract: This paper focuses on the problem of decentralized pedestrian tracking using a sensor network. Traditional works on pedestrian tracking usually use a centralized framework, which becomes less practical for robotic applications due to limited communication bandwidth. Our paper proposes a communication-efficient, orientation-discriminative feature representation to characterize pedestrian appearance…

    Submitted 26 February, 2022; originally announced February 2022.

    Comments: 8 pages, 4 figures, submitted to IEEE/RSJ International Conference on Intelligent Robots and Systems

  27. arXiv:2202.00659  [pdf, other]

    cs.CV cs.GR

    Stay Positive: Non-Negative Image Synthesis for Augmented Reality

    Authors: Katie Luo, Guandao Yang, Wenqi Xian, Harald Haraldsson, Bharath Hariharan, Serge Belongie

    Abstract: In applications such as optical see-through and projector augmented reality, producing images amounts to solving non-negative image generation, where one can only add light to an existing image. Most image generation methods, however, are ill-suited to this problem setting, as they make the assumption that one can assign arbitrary color to each pixel. In fact, naive application of existing methods…

    Submitted 1 February, 2022; originally announced February 2022.

    Journal ref: Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR), 2021, pp. 10050-10060

  28. arXiv:2108.10967  [pdf, other]

    cs.CV

    Field-Guide-Inspired Zero-Shot Learning

    Authors: Utkarsh Mall, Bharath Hariharan, Kavita Bala

    Abstract: Modern recognition systems require large amounts of supervision to achieve accuracy. Adapting to new domains requires significant data from experts, which is onerous and can become too expensive. Zero-shot learning requires an annotated set of attributes for a novel category. Annotating the full set of attributes for a novel category proves to be a tedious and expensive task in deployment. This is…

    Submitted 24 August, 2021; originally announced August 2021.

    Comments: Accepted to ICCV 2021

  29. arXiv:2104.13530  [pdf, other]

    cs.CV

    Extreme Rotation Estimation using Dense Correlation Volumes

    Authors: Ruojin Cai, Bharath Hariharan, Noah Snavely, Hadar Averbuch-Elor

    Abstract: We present a technique for estimating the relative 3D rotation of an RGB image pair in an extreme setting, where the images have little or no overlap. We observe that, even when images do not overlap, there may be rich hidden cues as to their geometric relationship, such as light source directions, vanishing points, and symmetries present in the scene. We propose a network design that can automati…

    Submitted 19 July, 2021; v1 submitted 27 April, 2021; originally announced April 2021.

    Comments: Published in CVPR 2021; Project page: https://ruojincai.github.io/ExtremeRotation/

  30. arXiv:2103.17070  [pdf, other]

    cs.CV

    PiCIE: Unsupervised Semantic Segmentation using Invariance and Equivariance in Clustering

    Authors: Jang Hyun Cho, Utkarsh Mall, Kavita Bala, Bharath Hariharan

    Abstract: We present a new framework for semantic segmentation without annotations via clustering. Off-the-shelf clustering methods are limited to curated, single-label, and object-centric images yet real-world data are dominantly uncurated, multi-label, and scene-centric. We extend clustering from images to pixels and assign separate cluster membership to different instances within each image. However, sol…

    Submitted 29 March, 2021; originally announced March 2021.

    Comments: CVPR 2021

  31. arXiv:2103.14198  [pdf, other]

    cs.CV

    Exploiting Playbacks in Unsupervised Domain Adaptation for 3D Object Detection

    Authors: Yurong You, Carlos Andres Diaz-Ruiz, Yan Wang, Wei-Lun Chao, Bharath Hariharan, Mark Campbell, Kilian Q Weinberger

    Abstract: Self-driving cars must detect other vehicles and pedestrians in 3D to plan safe routes and avoid collisions. State-of-the-art 3D object detectors, based on deep learning, have shown promising accuracy but are prone to over-fit to domain idiosyncrasies, making them fail in new environments -- a serious problem if autonomous vehicles are meant to operate freely. In this paper, we propose a novel lea…

    Submitted 10 July, 2022; v1 submitted 25 March, 2021; originally announced March 2021.

    Comments: Accepted by ICRA 2022

  32. arXiv:2012.01506  [pdf, other]

    cs.CV cs.LG

    Few-Shot Classification with Feature Map Reconstruction Networks

    Authors: Davis Wertheimer, Luming Tang, Bharath Hariharan

    Abstract: In this paper we reformulate few-shot classification as a reconstruction problem in latent space. The ability of the network to reconstruct a query feature map from support features of a given class predicts membership of the query in that class. We introduce a novel mechanism for few-shot classification by regressing directly from support features to query features in closed form, without introdu…

    Submitted 27 April, 2021; v1 submitted 2 December, 2020; originally announced December 2020.

    Comments: Accepted to CVPR 2021. Updated to match most recent version. Code is available at https://github.com/Tsingularity/FRN

  33. arXiv:2011.13026  [pdf, other]

    cs.CV cs.LG

    Augmentation-Interpolative AutoEncoders for Unsupervised Few-Shot Image Generation

    Authors: Davis Wertheimer, Omid Poursaeed, Bharath Hariharan

    Abstract: We aim to build image generation models that generalize to new domains from few examples. To this end, we first investigate the generalization properties of classic image generators, and discover that autoencoders generalize extremely well to new domains, even when trained on highly constrained data. We leverage this insight to produce a robust, unsupervised few-shot image generation algorithm, an…

    Submitted 25 November, 2020; originally announced November 2020.

  34. arXiv:2010.07734  [pdf, other]

    cs.CV cs.AI cs.LG

    Self-training for Few-shot Transfer Across Extreme Task Differences

    Authors: Cheng Perng Phoo, Bharath Hariharan

    Abstract: Most few-shot learning techniques are pre-trained on a large, labeled "base dataset". In problem domains where such large labeled datasets are not available for pre-training (e.g., X-ray, satellite images), one must resort to pre-training in a different "source" problem domain (e.g., ImageNet), which can be very different from the desired target task. Traditional few-shot and transfer learning tec…

    Submitted 17 March, 2021; v1 submitted 15 October, 2020; originally announced October 2020.

    Comments: Published as a conference paper at ICLR 2021(oral)

  35. arXiv:2008.06520  [pdf, other]

    cs.CV cs.LG

    Learning Gradient Fields for Shape Generation

    Authors: Ruojin Cai, Guandao Yang, Hadar Averbuch-Elor, Zekun Hao, Serge Belongie, Noah Snavely, Bharath Hariharan

    Abstract: In this work, we propose a novel technique to generate shapes from point cloud data. A point cloud can be viewed as samples from a distribution of 3D points whose density is concentrated near the surface of the shape. Point cloud generation thus amounts to moving randomly sampled points to high-density areas. We generate point clouds by performing stochastic gradient ascent on an unnormalized prob…

    Submitted 18 August, 2020; v1 submitted 14 August, 2020; originally announced August 2020.

    Comments: Published in ECCV 2020 (Spotlight); Project page: https://www.cs.cornell.edu/~ruojin/ShapeGF/

  36. arXiv:2007.03085  [pdf, other]

    cs.CV cs.LG

    Wasserstein Distances for Stereo Disparity Estimation

    Authors: Divyansh Garg, Yan Wang, Bharath Hariharan, Mark Campbell, Kilian Q. Weinberger, Wei-Lun Chao

    Abstract: Existing approaches to depth or disparity estimation output a distribution over a set of pre-defined discrete values. This leads to inaccurate results when the true depth or disparity does not match any of these values. The fact that this distribution is usually learned indirectly through a regression loss causes further problems in ambiguous regions around object boundaries. We address these issu…

    Submitted 29 March, 2021; v1 submitted 6 July, 2020; originally announced July 2020.

    Comments: Accepted to NeurIPS 2020 (spotlight)

  37. arXiv:2005.08139  [pdf, other]

    cs.CV

    Train in Germany, Test in The USA: Making 3D Object Detectors Generalize

    Authors: Yan Wang, Xiangyu Chen, Yurong You, Li Erran, Bharath Hariharan, Mark Campbell, Kilian Q. Weinberger, Wei-Lun Chao

    Abstract: In the domain of autonomous driving, deep learning has substantially improved the 3D object detection accuracy for LiDAR and stereo camera data alike. While deep networks are great at generalization, they are also notorious to over-fit to all kinds of spurious artifacts, such as brightness, car sizes and models, that may appear consistently throughout the data. In fact, most datasets for autonomou…

    Submitted 16 May, 2020; originally announced May 2020.

    Comments: Accepted to 2020 Conference on Computer Vision and Pattern Recognition (CVPR 2020)

  38. arXiv:2004.13324  [pdf, other]

    cs.CV

    Learning Feature Descriptors using Camera Pose Supervision

    Authors: Qianqian Wang, Xiaowei Zhou, Bharath Hariharan, Noah Snavely

    Abstract: Recent research on learned visual descriptors has shown promising improvements in correspondence estimation, a key component of many 3D vision tasks. However, existing descriptor learning frameworks typically require ground-truth correspondences between feature points for training, which are challenging to acquire at scale. In this paper we propose a novel weakly-supervised framework that can lear…

    Submitted 29 January, 2024; v1 submitted 28 April, 2020; originally announced April 2020.

    Comments: ECCV 2020 (oral)

  39. arXiv:2004.12276  [pdf, other]

    cs.CV cs.LG eess.IV

    Fashionpedia: Ontology, Segmentation, and an Attribute Localization Dataset

    Authors: Menglin Jia, Mengyun Shi, Mikhail Sirotenko, Yin Cui, Claire Cardie, Bharath Hariharan, Hartwig Adam, Serge Belongie

    Abstract: In this work we explore the task of instance segmentation with attribute localization, which unifies instance segmentation (detect and segment each object instance) and fine-grained visual attribute categorization (recognize one or multiple attributes). The proposed task requires both localizing an object and describing its properties. To illustrate the various aspects of this task, we focus on th…

    Submitted 18 July, 2020; v1 submitted 25 April, 2020; originally announced April 2020.

    Comments: eccv2020

  40. arXiv:2004.11992  [pdf, other]

    cs.CV cs.LG stat.ML

    Extending and Analyzing Self-Supervised Learning Across Domains

    Authors: Bram Wallace, Bharath Hariharan

    Abstract: Self-supervised representation learning has achieved impressive results in recent years, with experiments primarily coming on ImageNet or other similarly large internet imagery datasets. There has been little to no work with these methods on other smaller domains, such as satellite, textural, or biological imagery. We experiment with several popular methods on an unprecedented variety of domains.…

    Submitted 17 August, 2020; v1 submitted 24 April, 2020; originally announced April 2020.

  41. arXiv:2004.03080  [pdf, other]

    cs.CV eess.IV

    End-to-End Pseudo-LiDAR for Image-Based 3D Object Detection

    Authors: Rui Qian, Divyansh Garg, Yan Wang, Yurong You, Serge Belongie, Bharath Hariharan, Mark Campbell, Kilian Q. Weinberger, Wei-Lun Chao

    Abstract: Reliable and accurate 3D object detection is a necessity for safe autonomous driving. Although LiDAR sensors can provide accurate 3D point cloud estimates of the environment, they are also prohibitively expensive for many settings. Recently, the introduction of pseudo-LiDAR (PL) has led to a drastic reduction in the accuracy gap between methods based on LiDAR sensors and those based on cheap stere…

    Submitted 14 May, 2020; v1 submitted 6 April, 2020; originally announced April 2020.

    Comments: Accepted to 2020 Conference on Computer Vision and Pattern Recognition (CVPR 2020)

  42. arXiv:2004.00705  [pdf, other]

    cs.CV

    Revisiting Pose-Normalization for Fine-Grained Few-Shot Recognition

    Authors: Luming Tang, Davis Wertheimer, Bharath Hariharan

    Abstract: Few-shot, fine-grained classification requires a model to learn subtle, fine-grained distinctions between different classes (e.g., birds) based on a few images alone. This requires a remarkable degree of invariance to pose, articulation and background. A solution is to use pose-normalized representations: first localize semantic parts in each image, and then describe images by characterizing the a…

    Submitted 1 April, 2020; originally announced April 2020.

    Comments: To appear in CVPR 2020

  43. arXiv:1910.13955  [pdf, other]

    eess.IV cs.CV cs.RO

    LDLS: 3-D Object Segmentation Through Label Diffusion From 2-D Images

    Authors: Brian H. Wang, Wei-Lun Chao, Yan Wang, Bharath Hariharan, Kilian Q. Weinberger, Mark Campbell

    Abstract: Object segmentation in three-dimensional (3-D) point clouds is a critical task for robots capable of 3-D perception. Despite the impressive performance of deep learning-based approaches on object segmentation in 2-D images, deep learning has not been applied nearly as successfully for 3-D point cloud segmentation. Deep networks generally require large amounts of labeled training data, which are re…

    Submitted 30 October, 2019; originally announced October 2019.

    Comments: Accepted for publication in IEEE Robotics and Automation Letters with presentation at IROS 2019

  44. arXiv:1910.03560  [pdf, other]

    cs.CV cs.LG

    When Does Self-supervision Improve Few-shot Learning?

    Authors: Jong-Chyi Su, Subhransu Maji, Bharath Hariharan

    Abstract: We investigate the role of self-supervised learning (SSL) in the context of few-shot learning. Although recent research has shown the benefits of SSL on large unlabeled datasets, its utility on small datasets is relatively unexplored. We find that SSL reduces the relative error rate of few-shot meta-learners by 4%-27%, even when the datasets are small and only utilizing images within the datasets.…

    Submitted 30 July, 2020; v1 submitted 8 October, 2019; originally announced October 2019.

    Comments: ECCV 2020 camera ready. This is an updated version of "Boosting Supervision with Self-Supervision for Few-shot Learning" arXiv:1906.07079

  45. arXiv:1910.01348  [pdf, other]

    cs.LG cs.CV

    On the Efficacy of Knowledge Distillation

    Authors: Jang Hyun Cho, Bharath Hariharan

    Abstract: In this paper, we present a thorough evaluation of the efficacy of knowledge distillation and its dependence on student and teacher architectures. Starting with the observation that more accurate teachers often don't make good teachers, we attempt to tease apart the factors that affect knowledge distillation performance. We find crucially that larger models do not often make better teachers. We sh…

    Submitted 3 October, 2019; originally announced October 2019.

    Comments: 13 pages, including Appendix

    Journal ref: ICCV 2019

  46. arXiv:1909.01205  [pdf, other]

    cs.CV cs.LG stat.ML

    Few-Shot Generalization for Single-Image 3D Reconstruction via Priors

    Authors: Bram Wallace, Bharath Hariharan

    Abstract: Recent work on single-view 3D reconstruction shows impressive results, but has been restricted to a few fixed categories where extensive training data is available. The problem of generalizing these models to new classes with limited training data is largely open. To address this problem, we present a new model architecture that reframes single-view 3D reconstruction as learnt, category agnostic r…

    Submitted 3 September, 2019; originally announced September 2019.

    Comments: To appear in ICCV 2019

  47. arXiv:1908.11412  [pdf, other]

    cs.CV

    GeoStyle: Discovering Fashion Trends and Events

    Authors: Utkarsh Mall, Kevin Matzen, Bharath Hariharan, Noah Snavely, Kavita Bala

    Abstract: Understanding fashion styles and trends is of great potential interest to retailers and consumers alike. The photos people upload to social media are a historical and public data source of how people dress across the world and at different times. While we now have tools to automatically recognize the clothing and style attributes of what people are wearing in these photographs, we lack the ability…

    Submitted 29 August, 2019; originally announced August 2019.

    Comments: Accepted in ICCV 2019

  48. arXiv:1906.12320  [pdf, other]

    cs.CV cs.LG

    PointFlow: 3D Point Cloud Generation with Continuous Normalizing Flows

    Authors: Guandao Yang, Xun Huang, Zekun Hao, Ming-Yu Liu, Serge Belongie, Bharath Hariharan

    Abstract: As 3D point clouds become the representation of choice for multiple vision and graphics applications, the ability to synthesize or reconstruct high-resolution, high-fidelity point clouds becomes crucial. Despite the recent success of deep learning models in discriminative tasks of point clouds, generating point clouds remains challenging. This paper proposes a principled probabilistic framework to…

    Submitted 2 September, 2019; v1 submitted 28 June, 2019; originally announced June 2019.

    Comments: Published in ICCV 2019

  49. arXiv:1906.07079  [pdf, other]

    cs.CV cs.LG stat.ML

    Boosting Supervision with Self-Supervision for Few-shot Learning

    Authors: Jong-Chyi Su, Subhransu Maji, Bharath Hariharan

    Abstract: We present a technique to improve the transferability of deep representations learned on small labeled datasets by introducing self-supervised tasks as auxiliary loss functions. While recent approaches for self-supervised learning have shown the benefits of training on large unlabeled datasets, we find improvements in generalization even on small datasets and when combined with strong supervision.…

    Submitted 17 June, 2019; originally announced June 2019.

  50. arXiv:1906.06310  [pdf, other]

    cs.CV

    Pseudo-LiDAR++: Accurate Depth for 3D Object Detection in Autonomous Driving

    Authors: Yurong You, Yan Wang, Wei-Lun Chao, Divyansh Garg, Geoff Pleiss, Bharath Hariharan, Mark Campbell, Kilian Q. Weinberger

    Abstract: Detecting objects such as cars and pedestrians in 3D plays an indispensable role in autonomous driving. Existing approaches largely rely on expensive LiDAR sensors for accurate depth information. While recently pseudo-LiDAR has been introduced as a promising alternative, at a much lower cost based solely on stereo images, there is still a notable performance gap. In this paper we provide substanti…

    Submitted 15 February, 2020; v1 submitted 14 June, 2019; originally announced June 2019.

    Comments: Accepted to International Conference on Learning Representations (ICLR) 2020