Showing 1–12 of 12 results for author: Rambhatla, S S

Searching in archive cs.
  1. arXiv:2406.06908  [pdf, other]

    cs.CV

    UVIS: Unsupervised Video Instance Segmentation

    Authors: Shuaiyi Huang, Saksham Suri, Kamal Gupta, Sai Saketh Rambhatla, Ser-nam Lim, Abhinav Shrivastava

    Abstract: Video instance segmentation requires classifying, segmenting, and tracking every object across video frames. Unlike existing approaches that rely on masks, boxes, or category labels, we propose UVIS, a novel Unsupervised Video Instance Segmentation (UVIS) framework that can perform video instance segmentation without any video annotations or dense label-based pretraining. Our key insight comes fro…

    Submitted 10 June, 2024; originally announced June 2024.

    Comments: CVPR2024 Workshop

  2. arXiv:2402.03290  [pdf, other]

    cs.CV cs.AI cs.LG

    InstanceDiffusion: Instance-level Control for Image Generation

    Authors: Xudong Wang, Trevor Darrell, Sai Saketh Rambhatla, Rohit Girdhar, Ishan Misra

    Abstract: Text-to-image diffusion models produce high quality images but do not offer control over individual instances in the image. We introduce InstanceDiffusion that adds precise instance-level control to text-to-image diffusion models. InstanceDiffusion supports free-form language conditions per instance and allows flexible ways to specify instance locations such as simple single points, scribbles, bou…

    Submitted 5 February, 2024; originally announced February 2024.

    Comments: Preprint; Project page: https://people.eecs.berkeley.edu/~xdwang/projects/InstDiff/

  3. arXiv:2311.10709  [pdf, other]

    cs.CV cs.AI cs.GR cs.LG cs.MM

    Emu Video: Factorizing Text-to-Video Generation by Explicit Image Conditioning

    Authors: Rohit Girdhar, Mannat Singh, Andrew Brown, Quentin Duval, Samaneh Azadi, Sai Saketh Rambhatla, Akbar Shah, Xi Yin, Devi Parikh, Ishan Misra

    Abstract: We present Emu Video, a text-to-video generation model that factorizes the generation into two steps: first generating an image conditioned on the text, and then generating a video conditioned on the text and the generated image. We identify critical design decisions--adjusted noise schedules for diffusion, and multi-stage training--that enable us to directly generate high quality and high resolut…

    Submitted 17 November, 2023; originally announced November 2023.

    Comments: Project page: https://emu-video.metademolab.com

  4. arXiv:2311.10708  [pdf, other]

    cs.CV cs.LG

    SelfEval: Leveraging the discriminative nature of generative models for evaluation

    Authors: Sai Saketh Rambhatla, Ishan Misra

    Abstract: In this work, we show that text-to-image generative models can be 'inverted' to assess their own text-image understanding capabilities in a completely automated manner. Our method, called SelfEval, uses the generative model to compute the likelihood of real images given text prompts, making the generative model directly applicable to discriminative tasks. Using SelfEval, we repurpose standard…

    Submitted 17 November, 2023; originally announced November 2023.

  5. arXiv:2304.05387  [pdf, other]

    cs.CV

    MOST: Multiple Object localization with Self-supervised Transformers for object discovery

    Authors: Sai Saketh Rambhatla, Ishan Misra, Rama Chellappa, Abhinav Shrivastava

    Abstract: We tackle the challenging task of unsupervised object localization in this work. Recently, transformers trained with self-supervised learning have been shown to exhibit object localization properties without being trained for this task. In this work, we present Multiple Object localization with Self-supervised Transformers (MOST) that uses features of transformers trained using self-supervised lea…

    Submitted 26 August, 2023; v1 submitted 11 April, 2023; originally announced April 2023.

    Comments: Accepted to ICCV2023 as an Oral. Project webpage: https://rssaketh.github.io/most

  6. arXiv:2201.04620  [pdf, other]

    cs.CV

    SparseDet: Improving Sparsely Annotated Object Detection with Pseudo-positive Mining

    Authors: Saksham Suri, Sai Saketh Rambhatla, Rama Chellappa, Abhinav Shrivastava

    Abstract: Training with sparse annotations is known to reduce the performance of object detectors. Previous methods have focused on proxies for missing ground truth annotations in the form of pseudo-labels for unlabeled boxes. We observe that existing methods suffer at higher levels of sparsity in the data due to noisy pseudo-labels. To prevent this, we propose an end-to-end system that learns to separate t…

    Submitted 26 August, 2023; v1 submitted 12 January, 2022; originally announced January 2022.

    Comments: Accepted at ICCV2023. Project webpage: https://www.cs.umd.edu/~sakshams/SparseDet. The first two authors contributed equally

  7. arXiv:2110.13386  [pdf, other]

    cs.CV cs.AI

    Self-Denoising Neural Networks for Few Shot Learning

    Authors: Steven Schwarcz, Sai Saketh Rambhatla, Rama Chellappa

    Abstract: In this paper, we introduce a new architecture for few shot learning, the task of teaching a neural network from as few as one or five labeled examples. Inspired by the theoretical results of Alain et al. that Denoising Autoencoders refine features to lie closer to the true data manifold, we present a new training scheme that adds noise at multiple stages of an existing neural architecture while s…

    Submitted 25 October, 2021; originally announced October 2021.

  8. arXiv:2107.13600  [pdf, other]

    cs.LG

    To Boost or not to Boost: On the Limits of Boosted Neural Networks

    Authors: Sai Saketh Rambhatla, Michael Jones, Rama Chellappa

    Abstract: Boosting is a method for finding a highly accurate hypothesis by linearly combining many "weak" hypotheses, each of which may be only moderately accurate. Thus, boosting is a method for learning an ensemble of classifiers. While boosting has been shown to be very effective for decision trees, its impact on neural networks has not been extensively studied. We prove one important difference between…

    Submitted 28 July, 2021; originally announced July 2021.

  9. arXiv:2105.01652  [pdf, other]

    cs.CV cs.AI

    The Pursuit of Knowledge: Discovering and Localizing Novel Categories using Dual Memory

    Authors: Sai Saketh Rambhatla, Rama Chellappa, Abhinav Shrivastava

    Abstract: We tackle object category discovery, which is the problem of discovering and localizing novel objects in a large unlabeled dataset. While existing methods show results on datasets with less cluttered scenes and fewer object instances per image, we present our results on the challenging COCO dataset. Moreover, we argue that, rather than discovering new categories from scratch, discovery algorithms…

    Submitted 15 September, 2021; v1 submitted 4 May, 2021; originally announced May 2021.

    Comments: Accepted to ICCV2021

  10. arXiv:2004.04851  [pdf, other]

    cs.CV cs.AI cs.LG cs.RO

    Spatial Priming for Detecting Human-Object Interactions

    Authors: Ankan Bansal, Sai Saketh Rambhatla, Abhinav Shrivastava, Rama Chellappa

    Abstract: The relative spatial layout of a human and an object is an important cue for determining how they interact. However, until now, spatial layout has been used just as side-information for detecting human-object interactions (HOIs). In this paper, we present a method for exploiting this spatial layout information for detecting HOIs in images. The proposed method consists of a layout module which prim…

    Submitted 9 April, 2020; originally announced April 2020.

  11. arXiv:1905.03397  [pdf, ps, other]

    cs.CV

    A Dual-Path Model With Adaptive Attention For Vehicle Re-Identification

    Authors: Pirazh Khorramshahi, Amit Kumar, Neehar Peri, Sai Saketh Rambhatla, Jun-Cheng Chen, Rama Chellappa

    Abstract: In recent years, attention models have been extensively used for person and vehicle re-identification. Most re-identification methods are designed to focus attention on key-point locations. However, depending on the orientation, the contribution of each key-point varies. In this paper, we present a novel dual-path adaptive attention model for vehicle re-identification (AAVER). The global appearanc…

    Submitted 24 September, 2019; v1 submitted 8 May, 2019; originally announced May 2019.

    Comments: This work has been accepted for oral presentation in ICCV 2019

  12. arXiv:1904.03181  [pdf, other]

    cs.CV

    Detecting Human-Object Interactions via Functional Generalization

    Authors: Ankan Bansal, Sai Saketh Rambhatla, Abhinav Shrivastava, Rama Chellappa

    Abstract: We present an approach for detecting human-object interactions (HOIs) in images, based on the idea that humans interact with functionally similar objects in a similar manner. The proposed model is simple and efficiently uses the data, visual features of the human, relative spatial orientation of the human and the object, and the knowledge that functionally similar objects take part in similar inte…

    Submitted 2 September, 2020; v1 submitted 5 April, 2019; originally announced April 2019.

    Comments: AAAI 2020