Showing 1–50 of 100 results for author: Chandraker, M

  1. arXiv:2405.13779  [pdf, other]

    cs.CV cs.AI cs.LG

    Robust Disaster Assessment from Aerial Imagery Using Text-to-Image Synthetic Data

    Authors: Tarun Kalluri, Jihyeon Lee, Kihyuk Sohn, Sahil Singla, Manmohan Chandraker, Joseph Xu, Jeremiah Liu

    Abstract: We present a simple and efficient method to leverage emerging text-to-image generative models in creating large-scale synthetic supervision for the task of damage assessment from aerial images. While significant recent advances have resulted in improved techniques for damage assessment using aerial or satellite imagery, they still suffer from poor robustness to domains where manual labeled data is…

    Submitted 22 May, 2024; originally announced May 2024.

  2. arXiv:2405.06063  [pdf, other]

    cs.LG

    A Minimalist Prompt for Zero-Shot Policy Learning

    Authors: Meng Song, Xuezhi Wang, Tanay Biradar, Yao Qin, Manmohan Chandraker

    Abstract: Transformer-based methods have exhibited significant generalization ability when prompted with target-domain demonstrations or example solutions during inference. Although demonstrations, as a way of task specification, can capture rich information that may be hard to specify by language, it remains unclear what information is extracted from the demonstrations to help generalization. Moreover, ass…

    Submitted 9 May, 2024; originally announced May 2024.

  3. arXiv:2405.02781  [pdf, other]

    cs.CV

    Instantaneous Perception of Moving Objects in 3D

    Authors: Di Liu, Bingbing Zhuang, Dimitris N. Metaxas, Manmohan Chandraker

    Abstract: The perception of 3D motion of surrounding traffic participants is crucial for driving safety. While existing works primarily focus on general large motions, we contend that the instantaneous detection and quantification of subtle motions is equally important as they indicate the nuances in driving behavior that may be safety critical, such as behaviors near a stop sign or parking positions. We de…

    Submitted 4 May, 2024; originally announced May 2024.

    Comments: CVPR 2024

  4. arXiv:2405.00900  [pdf, other]

    cs.CV

    LidaRF: Delving into Lidar for Neural Radiance Field on Street Scenes

    Authors: Shanlin Sun, Bingbing Zhuang, Ziyu Jiang, Buyu Liu, Xiaohui Xie, Manmohan Chandraker

    Abstract: Photorealistic simulation plays a crucial role in applications such as autonomous driving, where advances in neural radiance fields (NeRFs) may allow better scalability through the automatic creation of digital 3D assets. However, reconstruction quality suffers on street scenes due to largely collinear camera motions and sparser samplings at higher speeds. On the other hand, the application often…

    Submitted 4 May, 2024; v1 submitted 1 May, 2024; originally announced May 2024.

    Comments: CVPR2024 Highlights

  5. arXiv:2404.15244  [pdf, other]

    cs.CV cs.LG

    Efficient Transformer Encoders for Mask2Former-style models

    Authors: Manyi Yao, Abhishek Aich, Yumin Suh, Amit Roy-Chowdhury, Christian Shelton, Manmohan Chandraker

    Abstract: Vision transformer based models bring significant improvements for image segmentation tasks. Although these architectures offer powerful capabilities irrespective of specific segmentation tasks, their use of computational resources can be taxing on deployed devices. One way to overcome this challenge is by adapting the computation level to the specific needs of the input image rather than the curr…

    Submitted 23 April, 2024; originally announced April 2024.

  6. arXiv:2404.14657  [pdf, other]

    cs.CV

    Progressive Token Length Scaling in Transformer Encoders for Efficient Universal Segmentation

    Authors: Abhishek Aich, Yumin Suh, Samuel Schulter, Manmohan Chandraker

    Abstract: A powerful architecture for universal segmentation relies on transformers that encode multi-scale image features and decode object queries into mask predictions. With efficiency being a high priority for scaling such models, we observed that the state-of-the-art method Mask2Former uses ~50% of its compute only on the transformer encoder. This is due to the retention of a full-length token-level re…

    Submitted 22 April, 2024; originally announced April 2024.

  7. arXiv:2404.04627  [pdf, other]

    cs.CV

    Self-Training Large Language Models for Improved Visual Program Synthesis With Visual Reinforcement

    Authors: Zaid Khan, Vijay Kumar BG, Samuel Schulter, Yun Fu, Manmohan Chandraker

    Abstract: Visual program synthesis is a promising approach to exploit the reasoning abilities of large language models for compositional computer vision tasks. Previous work has used few-shot prompting with frozen LLMs to synthesize visual programs. Training an LLM to write better visual programs is an attractive prospect, but it is unclear how to accomplish this. No dataset of visual programs for training…

    Submitted 6 April, 2024; originally announced April 2024.

    Comments: CVPR 2024

  8. arXiv:2403.17373  [pdf, other]

    cs.CV cs.AI cs.LG

    AIDE: An Automatic Data Engine for Object Detection in Autonomous Driving

    Authors: Mingfu Liang, Jong-Chyi Su, Samuel Schulter, Sparsh Garg, Shiyu Zhao, Ying Wu, Manmohan Chandraker

    Abstract: Autonomous vehicle (AV) systems rely on robust perception models as a cornerstone of safety assurance. However, objects encountered on the road exhibit a long-tailed distribution, with rare or unseen categories posing challenges to a deployed perception model. This necessitates an expensive process of continuously curating and annotating data with significant human effort. We propose to leverage r…

    Submitted 26 March, 2024; originally announced March 2024.

    Comments: Accepted by CVPR-2024

  9. arXiv:2403.05535  [pdf, other]

    cs.CV cs.AI cs.CL

    Tell, Don't Show!: Language Guidance Eases Transfer Across Domains in Images and Videos

    Authors: Tarun Kalluri, Bodhisattwa Prasad Majumder, Manmohan Chandraker

    Abstract: We introduce LaGTran, a novel framework that utilizes text supervision to guide robust transfer of discriminative knowledge from labeled source to unlabeled target data with domain gaps. While unsupervised adaptation methods have been established to address this problem, they show limitations in handling challenging domain shifts due to their exclusive operation within the pixel-space. Motivated b…

    Submitted 5 June, 2024; v1 submitted 8 March, 2024; originally announced March 2024.

    Comments: ICML 2024 Camera-Ready. Project Page and Code: https://tarun005.github.io/lagtran/

  10. arXiv:2401.09416  [pdf, other]

    cs.CV cs.GR

    TextureDreamer: Image-guided Texture Synthesis through Geometry-aware Diffusion

    Authors: Yu-Ying Yeh, Jia-Bin Huang, Changil Kim, Lei Xiao, Thu Nguyen-Phuoc, Numair Khan, Cheng Zhang, Manmohan Chandraker, Carl S Marshall, Zhao Dong, Zhengqin Li

    Abstract: We present TextureDreamer, a novel image-guided texture synthesis method to transfer relightable textures from a small number of input images (3 to 5) to target 3D shapes across arbitrary categories. Texture creation is a pivotal challenge in vision and graphics. Industrial companies hire experienced artists to manually craft textures for 3D assets. Classical methods require densely sampled views…

    Submitted 17 January, 2024; originally announced January 2024.

    Comments: Project page: https://texturedreamer.github.io

  11. arXiv:2401.02411  [pdf, other]

    cs.CV cs.AI cs.GR cs.LG

    What You See is What You GAN: Rendering Every Pixel for High-Fidelity Geometry in 3D GANs

    Authors: Alex Trevithick, Matthew Chan, Towaki Takikawa, Umar Iqbal, Shalini De Mello, Manmohan Chandraker, Ravi Ramamoorthi, Koki Nagano

    Abstract: 3D-aware Generative Adversarial Networks (GANs) have shown remarkable progress in learning to generate multi-view-consistent images and 3D geometries of scenes from collections of 2D images via neural volume rendering. Yet, the significant memory and computational costs of dense sampling in volume rendering have forced 3D GANs to adopt patch-based training or employ low-resolution rendering with p…

    Submitted 4 January, 2024; originally announced January 2024.

    Comments: See our project page: https://research.nvidia.com/labs/nxp/wysiwyg/

  12. arXiv:2401.00391  [pdf, other]

    cs.RO cs.AI cs.CV cs.LG

    SAFE-SIM: Safety-Critical Closed-Loop Traffic Simulation with Controllable Adversaries

    Authors: Wei-Jer Chang, Francesco Pittaluga, Masayoshi Tomizuka, Wei Zhan, Manmohan Chandraker

    Abstract: Evaluating the performance of autonomous vehicle planning algorithms necessitates simulating long-tail safety-critical traffic scenarios. However, traditional methods for generating such scenarios often fall short in terms of controllability and realism and neglect the dynamics of agent interactions. To mitigate these limitations, we introduce SAFE-SIM, a novel diffusion-based controllable closed-…

    Submitted 15 June, 2024; v1 submitted 30 December, 2023; originally announced January 2024.

    Comments: Under Review

    ACM Class: I.2.9; I.2.6

  13. arXiv:2401.00125  [pdf, other]

    cs.AI cs.CV

    LLM-Assist: Enhancing Closed-Loop Planning with Language-Based Reasoning

    Authors: S P Sharan, Francesco Pittaluga, Vijay Kumar B G, Manmohan Chandraker

    Abstract: Although planning is a crucial component of the autonomous driving stack, researchers have yet to develop robust planning algorithms that are capable of safely handling the diverse range of possible driving scenarios. Learning-based planners suffer from overfitting and poor long-tail performance. On the other hand, rule-based planners generalize well, but might fail to handle scenarios that requir…

    Submitted 29 December, 2023; originally announced January 2024.

    Comments: 15 pages, 8 figures, 7 tables

  14. arXiv:2401.00094  [pdf, other]

    cs.CV

    Generating Enhanced Negatives for Training Language-Based Object Detectors

    Authors: Shiyu Zhao, Long Zhao, Vijay Kumar B. G, Yumin Suh, Dimitris N. Metaxas, Manmohan Chandraker, Samuel Schulter

    Abstract: The recent progress in language-based open-vocabulary object detection can be largely attributed to finding better ways of leveraging large-scale data with free-form text annotations. Training such models with a discriminative objective function has proven successful, but requires good positive and negative samples. However, the free-form nature and the open vocabulary of object descriptions make…

    Submitted 12 April, 2024; v1 submitted 29 December, 2023; originally announced January 2024.

    Comments: Accepted to CVPR 2024. The supplementary document included

  15. arXiv:2310.17050  [pdf, other]

    cs.CV

    Exploring Question Decomposition for Zero-Shot VQA

    Authors: Zaid Khan, Vijay Kumar BG, Samuel Schulter, Manmohan Chandraker, Yun Fu

    Abstract: Visual question answering (VQA) has traditionally been treated as a single-step task where each question receives the same amount of effort, unlike natural human question-answering strategies. We explore a question decomposition strategy for VQA to overcome this limitation. We probe the ability of recently developed large vision-language models to use human-written decompositions and produce their…

    Submitted 25 October, 2023; originally announced October 2023.

    Comments: NeurIPS 2023 Camera Ready

  16. arXiv:2310.07361  [pdf, other]

    cs.CV

    Domain Generalization Guided by Gradient Signal to Noise Ratio of Parameters

    Authors: Mateusz Michalkiewicz, Masoud Faraki, Xiang Yu, Manmohan Chandraker, Mahsa Baktashmotlagh

    Abstract: Overfitting to the source domain is a common issue in gradient-based training of deep neural networks. To compensate for the over-parameterized models, numerous regularization techniques have been introduced such as those based on dropout. While these methods achieve significant improvements on classical benchmarks such as ImageNet, their performance diminishes with the introduction of domain shif…

    Submitted 11 October, 2023; originally announced October 2023.

    Comments: Paper was accepted to ICCV 2023

  17. arXiv:2308.11744  [pdf, other]

    cs.CV

    Efficient Controllable Multi-Task Architectures

    Authors: Abhishek Aich, Samuel Schulter, Amit K. Roy-Chowdhury, Manmohan Chandraker, Yumin Suh

    Abstract: We aim to train a multi-task model such that users can adjust the desired compute budget and relative importance of task performances after deployment, without retraining. This enables optimizing performance for dynamically varying user needs, without heavy computational overhead to train and save models for various scenarios. To this end, we propose a multi-task model consisting of a shared encod…

    Submitted 22 August, 2023; originally announced August 2023.

    Comments: ICCV 2023

  18. arXiv:2308.09865  [pdf, other]

    cs.CV cs.GR

    A Theory of Topological Derivatives for Inverse Rendering of Geometry

    Authors: Ishit Mehta, Manmohan Chandraker, Ravi Ramamoorthi

    Abstract: We introduce a theoretical framework for differentiable surface evolution that allows discrete topology changes through the use of topological derivatives for variational optimization of image functionals. While prior methods for inverse rendering of geometry rely on silhouette gradients for topology changes, such signals are sparse. In contrast, our theory derives topological derivatives that rel…

    Submitted 18 August, 2023; originally announced August 2023.

    Comments: ICCV 23; Project Page at https://ishit.github.io/td/

  19. arXiv:2308.06412  [pdf, other]

    cs.CV

    Taming Self-Training for Open-Vocabulary Object Detection

    Authors: Shiyu Zhao, Samuel Schulter, Long Zhao, Zhixing Zhang, Vijay Kumar B. G, Yumin Suh, Manmohan Chandraker, Dimitris N. Metaxas

    Abstract: Recent studies have shown promising performance in open-vocabulary object detection (OVD) by utilizing pseudo labels (PLs) from pretrained vision and language models (VLMs). However, teacher-student self-training, a powerful and widely used paradigm to leverage PLs, is rarely explored for OVD. This work identifies two challenges of using self-training in OVD: noisy PLs from VLMs and frequent distr…

    Submitted 12 April, 2024; v1 submitted 11 August, 2023; originally announced August 2023.

    Comments: Accepted to CVPR 2024. The supplementary document included

  20. arXiv:2306.03932  [pdf, other]

    cs.CV

    Q: How to Specialize Large Vision-Language Models to Data-Scarce VQA Tasks? A: Self-Train on Unlabeled Images!

    Authors: Zaid Khan, Vijay Kumar BG, Samuel Schulter, Xiang Yu, Yun Fu, Manmohan Chandraker

    Abstract: Finetuning a large vision language model (VLM) on a target dataset after large scale pretraining is a dominant paradigm in visual question answering (VQA). Datasets for specialized tasks such as knowledge-based VQA or VQA in non natural-image domains are orders of magnitude smaller than those for general-purpose VQA. While collecting additional labels for specialized tasks or domains can be challe…

    Submitted 6 June, 2023; originally announced June 2023.

    Comments: CVPR 2023

  21. arXiv:2305.17763  [pdf, other]

    cs.CV

    NeurOCS: Neural NOCS Supervision for Monocular 3D Object Localization

    Authors: Zhixiang Min, Bingbing Zhuang, Samuel Schulter, Buyu Liu, Enrique Dunn, Manmohan Chandraker

    Abstract: Monocular 3D object localization in driving scenes is a crucial task, but challenging due to its ill-posed nature. Estimating 3D coordinates for each pixel on the object surface holds great potential as it provides dense 2D-3D geometric constraints for the underlying PnP problem. However, high-quality ground truth supervision is not available in driving scenes due to sparsity and various artifacts…

    Submitted 28 May, 2023; originally announced May 2023.

    Comments: Paper was accepted to CVPR 2023

  22. arXiv:2305.10675  [pdf, other]

    cs.CV

    Tuned Contrastive Learning

    Authors: Chaitanya Animesh, Manmohan Chandraker

    Abstract: In recent times, contrastive learning based loss functions have become increasingly popular for visual self-supervised representation learning owing to their state-of-the-art (SOTA) performance. Most of the modern contrastive learning methods generalize only to one positive and multiple negatives per anchor. A recent state-of-the-art, supervised contrastive (SupCon) loss, extends self-supervised c…

    Submitted 30 May, 2023; v1 submitted 17 May, 2023; originally announced May 2023.

    Comments: Preprint Version

  23. arXiv:2305.04374  [pdf, other]

    cs.CV

    Spatiotemporally Consistent HDR Indoor Lighting Estimation

    Authors: Zhengqin Li, Li Yu, Mikhail Okunev, Manmohan Chandraker, Zhao Dong

    Abstract: We propose a physically-motivated deep learning framework to solve a general version of the challenging indoor lighting estimation problem. Given a single LDR image with a depth map, our method predicts spatially consistent lighting at any given image position. Particularly, when the input is an LDR video sequence, our framework not only progressively refines the lighting prediction as it sees mor…

    Submitted 7 May, 2023; originally announced May 2023.

  24. arXiv:2305.02310  [pdf, other]

    cs.CV cs.AI cs.GR cs.LG

    Real-Time Radiance Fields for Single-Image Portrait View Synthesis

    Authors: Alex Trevithick, Matthew Chan, Michael Stengel, Eric R. Chan, Chao Liu, Zhiding Yu, Sameh Khamis, Manmohan Chandraker, Ravi Ramamoorthi, Koki Nagano

    Abstract: We present a one-shot method to infer and render a photorealistic 3D representation from a single unposed image (e.g., face portrait) in real-time. Given a single RGB input, our image encoder directly predicts a canonical triplane representation of a neural radiance field for 3D-aware novel view synthesis via volume rendering. Our method is fast (24 fps) on consumer hardware, and produces higher q…

    Submitted 3 May, 2023; originally announced May 2023.

    Comments: Project page: https://research.nvidia.com/labs/nxp/lp3d/

  25. arXiv:2304.05669  [pdf, other]

    cs.CV cs.GR

    Factorized Inverse Path Tracing for Efficient and Accurate Material-Lighting Estimation

    Authors: Liwen Wu, Rui Zhu, Mustafa B. Yaldiz, Yinhao Zhu, Hong Cai, Janarbek Matai, Fatih Porikli, Tzu-Mao Li, Manmohan Chandraker, Ravi Ramamoorthi

    Abstract: Inverse path tracing has recently been applied to joint material and lighting estimation, given geometry and multi-view HDR observations of an indoor scene. However, it has two major limitations: path tracing is expensive to compute, and ambiguities exist between reflection and emission. Our Factorized Inverse Path Tracing (FIPT) addresses these challenges by using a factored light transport formu…

    Submitted 23 August, 2023; v1 submitted 12 April, 2023; originally announced April 2023.

    Comments: Updated experiment results; modified real-world sections

  26. arXiv:2303.15443  [pdf, other]

    cs.CV cs.AI cs.LG

    GeoNet: Benchmarking Unsupervised Adaptation across Geographies

    Authors: Tarun Kalluri, Wangdong Xu, Manmohan Chandraker

    Abstract: In recent years, several efforts have been aimed at improving the robustness of vision models to domains and environments unseen during training. An important practical problem pertains to models deployed in a new geography that is under-represented in the training dataset, posing a direct challenge to fair and inclusive computer vision. In this paper, we study the problem of geographic robustness…

    Submitted 27 March, 2023; originally announced March 2023.

    Comments: CVPR 2023 Camera Ready. Project Page: https://tarun005.github.io/GeoNet

  27. arXiv:2303.05503  [pdf, other]

    cs.CV cs.AI cs.LG

    Open-world Instance Segmentation: Top-down Learning with Bottom-up Supervision

    Authors: Tarun Kalluri, Weiyao Wang, Heng Wang, Manmohan Chandraker, Lorenzo Torresani, Du Tran

    Abstract: Many top-down architectures for instance segmentation achieve significant success when trained and tested on pre-defined closed-world taxonomy. However, when deployed in the open world, they exhibit notable bias towards seen classes and suffer from significant performance drop. In this work, we propose a novel approach for open world instance segmentation called bottom-Up and top-Down Open-world S…

    Submitted 13 May, 2024; v1 submitted 9 March, 2023; originally announced March 2023.

    Comments: L3D-IVU Workshop, CVPR 2024. Project page: https://tarun005.github.io/UDOS

  28. arXiv:2210.15908  [pdf, other]

    cs.CV cs.RO

    Long-HOT: A Modular Hierarchical Approach for Long-Horizon Object Transport

    Authors: Sriram Narayanan, Dinesh Jayaraman, Manmohan Chandraker

    Abstract: We address key challenges in long-horizon embodied exploration and navigation by proposing a new object transport task and a novel modular framework for temporally extended navigation. Our first contribution is the design of a novel Long-HOT environment focused on deep exploration and long-horizon planning where the agent is required to efficiently find and pick up target objects to be carried and…

    Submitted 28 October, 2022; originally announced October 2022.

  29. arXiv:2210.12878  [pdf, other]

    cs.CV

    IDD-3D: Indian Driving Dataset for 3D Unstructured Road Scenes

    Authors: Shubham Dokania, A. H. Abdul Hafez, Anbumani Subramanian, Manmohan Chandraker, C. V. Jawahar

    Abstract: Autonomous driving and assistance systems rely on annotated data from traffic and road scenarios to model and learn the various object relations in complex real-world scenarios. Preparation and training of deploy-able deep learning architectures require the models to be suited to different traffic scenarios and adapt to different situations. Currently, existing datasets, while large-scale, lack su…

    Submitted 23 October, 2022; originally announced October 2022.

    Comments: 10 pages, 8 figures, 5 tables, Accepted in Winter Conference on Applications of Computer Vision (WACV 2023)

  30. arXiv:2208.07943  [pdf, other]

    cs.CV

    TRoVE: Transforming Road Scene Datasets into Photorealistic Virtual Environments

    Authors: Shubham Dokania, Anbumani Subramanian, Manmohan Chandraker, C. V. Jawahar

    Abstract: High-quality structured data with rich annotations are critical components in intelligent vehicle systems dealing with road scenes. However, data curation and annotation require intensive investments and yield low-diversity scenarios. The recently growing interest in synthetic data raises questions about the scope of improvement in such systems and the amount of manual work still required to produ…

    Submitted 16 August, 2022; originally announced August 2022.

    Comments: 18 pages, 5 figures, Accepted in European Conference on Computer Vision (ECCV 2022)

  31. arXiv:2208.02804  [pdf, other]

    cs.CV cs.LG

    Cluster-to-adapt: Few Shot Domain Adaptation for Semantic Segmentation across Disjoint Labels

    Authors: Tarun Kalluri, Manmohan Chandraker

    Abstract: Domain adaptation for semantic segmentation across datasets consisting of the same categories has seen several recent successes. However, a more general scenario is when the source and target datasets correspond to non-overlapping label spaces. For example, categories in segmentation datasets change vastly depending on the type of environment or application, yet share many valuable semantic relati…

    Submitted 4 August, 2022; originally announced August 2022.

    Comments: Accepted to L3D workshop at CVPR 2022

  32. arXiv:2207.13339  [pdf, other]

    cs.CV

    ALBench: A Framework for Evaluating Active Learning in Object Detection

    Authors: Zhanpeng Feng, Shiliang Zhang, Rinyoichi Takezoe, Wenze Hu, Manmohan Chandraker, Li-Jia Li, Vijay K. Narayanan, Xiaoyu Wang

    Abstract: Active learning is an important technology for automated machine learning systems. In contrast to Neural Architecture Search (NAS) which aims at automating neural network architecture design, active learning aims at automating training data selection. It is especially critical for training a long-tailed task, in which positive samples are sparsely distributed. Active learning alleviates the expens…

    Submitted 24 November, 2022; v1 submitted 27 July, 2022; originally announced July 2022.

  33. arXiv:2207.12389  [pdf, other]

    cs.CV cs.AI cs.LG

    MemSAC: Memory Augmented Sample Consistency for Large Scale Unsupervised Domain Adaptation

    Authors: Tarun Kalluri, Astuti Sharma, Manmohan Chandraker

    Abstract: Practical real world datasets with plentiful categories introduce new challenges for unsupervised domain adaptation like small inter-class discriminability, that existing approaches relying on domain invariance alone cannot handle sufficiently well. In this work we propose MemSAC, which exploits sample level similarity across source and target domains to achieve discriminative transfer, along with…

    Submitted 11 October, 2023; v1 submitted 25 July, 2022; originally announced July 2022.

    Comments: Accepted at ECCV 2022. Project Webpage: https://tarun005.github.io/MemSAC/

  34. arXiv:2207.08954  [pdf, other]

    cs.CV

    Exploiting Unlabeled Data with Vision and Language Models for Object Detection

    Authors: Shiyu Zhao, Zhixing Zhang, Samuel Schulter, Long Zhao, Vijay Kumar B. G, Anastasis Stathopoulos, Manmohan Chandraker, Dimitris Metaxas

    Abstract: Building robust and generic object detection frameworks requires scaling to larger label spaces and bigger training datasets. However, it is prohibitively costly to acquire annotations for thousands of categories at a large scale. We propose a novel method that leverages the rich semantics available in recent vision and language models to localize and classify objects in unlabeled images, effectiv…

    Submitted 18 July, 2022; originally announced July 2022.

    Comments: Accepted to ECCV 2022 (with the supplementary document)

  35. arXiv:2207.00757  [pdf, other]

    cs.CV

    PhotoScene: Photorealistic Material and Lighting Transfer for Indoor Scenes

    Authors: Yu-Ying Yeh, Zhengqin Li, Yannick Hold-Geoffroy, Rui Zhu, Zexiang Xu, Miloš Hašan, Kalyan Sunkavalli, Manmohan Chandraker

    Abstract: Most indoor 3D scene reconstruction methods focus on recovering 3D geometry and scene layout. In this work, we go beyond this to propose PhotoScene, a framework that takes input image(s) of a scene along with approximately aligned CAD geometry (either reconstructed automatically or manually specified) and builds a photorealistic digital twin with high-quality materials and similar lighting. We mod…

    Submitted 2 July, 2022; originally announced July 2022.

    Comments: Accepted to CVPR 2022; Code is available at https://github.com/ViLab-UCSD/photoscene

  36. arXiv:2206.12784  [pdf, other]

    cs.RO

    Learning to Rearrange with Physics-Inspired Risk Awareness

    Authors: Meng Song, Yuhan Liu, Zhengqin Li, Manmohan Chandraker

    Abstract: Real-world applications require a robot operating in the physical world with awareness of potential risks besides accomplishing the task. A large part of risky behaviors arises from interacting with objects in ignorance of affordance. To prevent the agent from making unsafe decisions, we propose to train a robotic agent by reinforcement learning to execute tasks with an awareness of physical prope…

    Submitted 26 June, 2022; originally announced June 2022.

    Comments: Accepted to Risk Aware Decision Making Workshop at Robotics, Science and Systems 2022

  37. arXiv:2206.08423  [pdf, other]

    cs.CV

    IRISformer: Dense Vision Transformers for Single-Image Inverse Rendering in Indoor Scenes

    Authors: Rui Zhu, Zhengqin Li, Janarbek Matai, Fatih Porikli, Manmohan Chandraker

    Abstract: Indoor scenes exhibit significant appearance variations due to myriad interactions between arbitrarily diverse object shapes, spatially-changing materials, and complex lighting. Shadows, highlights, and inter-reflections caused by visible and invisible light sources require reasoning about long-range interactions for inverse rendering, which seeks to recover the components of image formation, name…

    Submitted 16 June, 2022; originally announced June 2022.

    Comments: CVPR 22 camera ready version with supplementary

  38. arXiv:2205.09343  [pdf, other]

    cs.CV

    Physically-Based Editing of Indoor Scene Lighting from a Single Image

    Authors: Zhengqin Li, Jia Shi, Sai Bi, Rui Zhu, Kalyan Sunkavalli, Miloš Hašan, Zexiang Xu, Ravi Ramamoorthi, Manmohan Chandraker

    Abstract: We present a method to edit complex indoor lighting from a single image with its predicted depth and light source segmentation masks. This is an extremely challenging problem that requires modeling complex light transport, and disentangling HDR lighting from material and geometry with only a partial LDR observation of the scene. We tackle this problem using two novel components: 1) a holistic scen…

    Submitted 23 July, 2022; v1 submitted 19 May, 2022; originally announced May 2022.

  39. arXiv:2204.07159  [pdf, other]

    cs.CV cs.GR cs.LG

    A Level Set Theory for Neural Implicit Evolution under Explicit Flows

    Authors: Ishit Mehta, Manmohan Chandraker, Ravi Ramamoorthi

    Abstract: Coordinate-based neural networks parameterizing implicit surfaces have emerged as efficient representations of geometry. They effectively act as parametric level sets with the zero-level set defining the surface of interest. We present a framework that allows applying deformation operations defined for triangle meshes onto such implicit surfaces. Several of these operations can be viewed as energy…

    Submitted 21 July, 2022; v1 submitted 14 April, 2022; originally announced April 2022.

    Comments: ECCV 2022 (Oral); Project Page at https://ishit.github.io/nie

  40. arXiv:2203.14949  [pdf, other]

    cs.CV cs.LG

    Controllable Dynamic Multi-Task Architectures

    Authors: Dripta S. Raychaudhuri, Yumin Suh, Samuel Schulter, Xiang Yu, Masoud Faraki, Amit K. Roy-Chowdhury, Manmohan Chandraker

    Abstract: Multi-task learning commonly encounters competition for resources among tasks, specifically when model capacity is limited. This challenge motivates models which allow control over the relative importance of tasks and total compute cost during inference time. In this work, we propose such a controllable multi-task network that dynamically adjusts its architecture and weights to match the desired t…

    Submitted 28 March, 2022; originally announced March 2022.

    Comments: Accepted at CVPR 2022

  41. arXiv:2203.14395  [pdf, other]

    cs.CV

    Single-Stream Multi-Level Alignment for Vision-Language Pretraining

    Authors: Zaid Khan, Vijay Kumar BG, Xiang Yu, Samuel Schulter, Manmohan Chandraker, Yun Fu

    Abstract: Self-supervised vision-language pretraining from pure images and text with a contrastive loss is effective, but ignores fine-grained alignment due to a dual-stream architecture that aligns image and text representations only on a global level. Earlier, supervised, non-contrastive methods were capable of finer-grained alignment, but required dense annotations that were not scalable. We propose a si…

    Submitted 27 July, 2022; v1 submitted 27 March, 2022; originally announced March 2022.

    Comments: ECCV 2022

  42. arXiv:2203.03970  [pdf, other]

    cs.LG cs.CV

    On Generalizing Beyond Domains in Cross-Domain Continual Learning

    Authors: Christian Simon, Masoud Faraki, Yi-Hsuan Tsai, Xiang Yu, Samuel Schulter, Yumin Suh, Mehrtash Harandi, Manmohan Chandraker

    Abstract: Humans have the ability to accumulate knowledge of new tasks in varying conditions, but deep neural networks often suffer from catastrophic forgetting of previously learned knowledge after learning a new task. Many recent methods focus on preventing catastrophic forgetting under the assumption of train and test data following similar distributions. In this work, we consider a more realistic scenar…

    Submitted 8 March, 2022; originally announced March 2022.

    Comments: Accepted to CVPR 2022

  43. arXiv:2202.14030  [pdf, other]

    cs.CV

    Learning Semantic Segmentation from Multiple Datasets with Label Shifts

    Authors: Dongwan Kim, Yi-Hsuan Tsai, Yumin Suh, Masoud Faraki, Sparsh Garg, Manmohan Chandraker, Bohyung Han

    Abstract: With increasing applications of semantic segmentation, numerous datasets have been proposed in the past few years. Yet labeling remains expensive, thus, it is desirable to jointly train models across aggregations of datasets to enhance data volume and diversity. However, label spaces differ across datasets and may even be in conflict with one another. This paper proposes UniSeg, an effective appro…

    Submitted 28 February, 2022; originally announced February 2022.

  44. arXiv:2111.10046  [pdf, other]

    cs.AI cs.LG

    YMIR: A Rapid Data-centric Development Platform for Vision Applications

    Authors: Phoenix X. Huang, Wenze Hu, William Brendel, Manmohan Chandraker, Li-Jia Li, Xiaoyu Wang

    Abstract: This paper introduces an open source platform to support the rapid development of computer vision applications at scale. The platform puts efficient data development at the center of the machine learning development process, integrates active learning methods, data and model version control, and uses concepts such as projects to enable fast iterations of multiple task specific datasets in para…

    Submitted 27 November, 2021; v1 submitted 19 November, 2021; originally announced November 2021.

  45. arXiv:2108.11974  [pdf, other]

    cs.CV

    Learning Cross-modal Contrastive Features for Video Domain Adaptation

    Authors: Donghyun Kim, Yi-Hsuan Tsai, Bingbing Zhuang, Xiang Yu, Stan Sclaroff, Kate Saenko, Manmohan Chandraker

    Abstract: Learning transferable and domain adaptive feature representations from videos is important for video-relevant tasks such as action recognition. Existing video domain adaptation methods mainly rely on adversarial feature alignment, which has been derived from the RGB image space. However, video data is usually associated with multi-modal information, e.g., RGB and optical flow, and thus it remains…

    Submitted 26 August, 2021; originally announced August 2021.

    Comments: Accepted in ICCV'21

  46. arXiv:2104.08278  [pdf, other]

    cs.CV

    Fusing the Old with the New: Learning Relative Camera Pose with Geometry-Guided Uncertainty

    Authors: Bingbing Zhuang, Manmohan Chandraker

    Abstract: Learning methods for relative camera pose estimation have been developed largely in isolation from classical geometric approaches. The question of how to integrate predictions from deep neural networks (DNNs) and solutions from geometric solvers, such as the 5-point algorithm, has as yet remained under-explored. In this paper, we present a novel framework that involves probabilistic fusion between…

    Submitted 16 April, 2021; originally announced April 2021.

    Comments: CVPR 2021, Oral

  47. arXiv:2104.08277  [pdf, other]

    cs.CV

    Divide-and-Conquer for Lane-Aware Diverse Trajectory Prediction

    Authors: Sriram Narayanan, Ramin Moslemi, Francesco Pittaluga, Buyu Liu, Manmohan Chandraker

    Abstract: Trajectory prediction is a safety-critical tool for autonomous vehicles to plan and execute actions. Our work addresses two key challenges in trajectory prediction, learning multimodal outputs, and better predictions by imposing constraints using driving knowledge. Recent methods have achieved strong performances using Multi-Choice Learning objectives like winner-takes-all (WTA) or best-of-many. B…

    Submitted 16 April, 2021; originally announced April 2021.

    Comments: CVPR 21 (Oral)

  48. arXiv:2104.06730  [pdf, other]

    cs.CV

    Weakly But Deeply Supervised Occlusion-Reasoned Parametric Road Layouts

    Authors: Buyu Liu, Bingbing Zhuang, Manmohan Chandraker

    Abstract: We propose an end-to-end network that takes a single perspective RGB image of a complex road scene as input, to produce occlusion-reasoned layouts in perspective space as well as a parametric bird's-eye-view (BEV) space. In contrast to prior works that require dense supervision such as semantic labels in perspective view, our method only requires human annotations for parametric attributes that ar…

    Submitted 13 April, 2022; v1 submitted 14 April, 2021; originally announced April 2021.

    Comments: To appear in CVPR 2022

  49. arXiv:2104.03960  [pdf, other]

    cs.CV cs.GR

    Modulated Periodic Activations for Generalizable Local Functional Representations

    Authors: Ishit Mehta, Michaël Gharbi, Connelly Barnes, Eli Shechtman, Ravi Ramamoorthi, Manmohan Chandraker

    Abstract: Multi-Layer Perceptrons (MLPs) make powerful functional representations for sampling and reconstruction problems involving low-dimensional signals like images, shapes and light fields. Recent works have significantly improved their ability to represent high-frequency content by using periodic activations or positional encodings. This often came at the expense of generalization: modern methods are t…

    Submitted 8 April, 2021; originally announced April 2021.

    Comments: Project Page at https://ishit.github.io/modsine/

  50. arXiv:2104.01286  [pdf, other]

    cs.CV

    Instance Level Affinity-Based Transfer for Unsupervised Domain Adaptation

    Authors: Astuti Sharma, Tarun Kalluri, Manmohan Chandraker

    Abstract: Domain adaptation deals with training models using large scale labeled data from a specific source domain and then adapting the knowledge to certain target domains that have few or no labels. Many prior works learn domain agnostic feature representations for this purpose using a global distribution alignment objective which does not take into account the finer class specific structure in the sourc…

    Submitted 2 April, 2021; originally announced April 2021.

    Comments: CVPR 2021 (Conference on Computer Vision and Pattern Recognition)