Showing 1–17 of 17 results for author: Poursaeed, O

Searching in archive cs.
  1. arXiv:2404.07449  [pdf, other]

    cs.CV

    Learning to Localize Objects Improves Spatial Reasoning in Visual-LLMs

    Authors: Kanchana Ranasinghe, Satya Narayan Shukla, Omid Poursaeed, Michael S. Ryoo, Tsung-Yu Lin

    Abstract: Integration of Large Language Models (LLMs) into visual domain tasks, resulting in visual-LLMs (V-LLMs), has enabled exceptional performance in vision-language tasks, particularly for visual question answering (VQA). However, existing V-LLMs (e.g. BLIP-2, LLaVA) demonstrate weak spatial reasoning and localization awareness. Despite generating highly descriptive and elaborate textual answers, these…

    Submitted 10 April, 2024; originally announced April 2024.

  2. arXiv:2312.16339  [pdf, other]

    cs.CV cs.LG

    Universal Pyramid Adversarial Training for Improved ViT Performance

    Authors: Ping-yeh Chiang, Yipin Zhou, Omid Poursaeed, Satya Narayan Shukla, Ashish Shah, Tom Goldstein, Ser-Nam Lim

    Abstract: Recently, Pyramid Adversarial training (Herrmann et al., 2022) has been shown to be very effective for improving clean accuracy and distribution-shift robustness of vision transformers. However, due to the iterative nature of adversarial training, the technique is up to 7 times more expensive than standard training. To make the method more efficient, we propose Universal Pyramid Adversarial traini…

    Submitted 26 December, 2023; originally announced December 2023.

  3. arXiv:2309.11569  [pdf, other]

    cs.CV

    Revisiting Kernel Temporal Segmentation as an Adaptive Tokenizer for Long-form Video Understanding

    Authors: Mohamed Afham, Satya Narayan Shukla, Omid Poursaeed, Pengchuan Zhang, Ashish Shah, Sernam Lim

    Abstract: While most modern video understanding models operate on short-range clips, real-world videos are often several minutes long with semantically consistent segments of variable length. A common approach to process long videos is applying a short-form video model over uniformly sampled clips of fixed temporal length and aggregating the outputs. This approach neglects the underlying nature of long vide…

    Submitted 20 September, 2023; originally announced September 2023.

  4. arXiv:2306.00989  [pdf, other]

    cs.CV cs.LG

    Hiera: A Hierarchical Vision Transformer without the Bells-and-Whistles

    Authors: Chaitanya Ryali, Yuan-Ting Hu, Daniel Bolya, Chen Wei, Haoqi Fan, Po-Yao Huang, Vaibhav Aggarwal, Arkabandhu Chowdhury, Omid Poursaeed, Judy Hoffman, Jitendra Malik, Yanghao Li, Christoph Feichtenhofer

    Abstract: Modern hierarchical vision transformers have added several vision-specific components in the pursuit of supervised classification performance. While these components lead to effective accuracies and attractive FLOP counts, the added complexity actually makes these transformers slower than their vanilla ViT counterparts. In this paper, we argue that this additional bulk is unnecessary. By pretraini…

    Submitted 1 June, 2023; originally announced June 2023.

    Comments: ICML 2023 Oral version. Code+Models: https://github.com/facebookresearch/hiera

  5. arXiv:2212.04994  [pdf, other]

    cs.CV

    Open Vocabulary Semantic Segmentation with Patch Aligned Contrastive Learning

    Authors: Jishnu Mukhoti, Tsung-Yu Lin, Omid Poursaeed, Rui Wang, Ashish Shah, Philip H. S. Torr, Ser-Nam Lim

    Abstract: We introduce Patch Aligned Contrastive Learning (PACL), a modified compatibility function for CLIP's contrastive loss, intending to train an alignment between the patch tokens of the vision encoder and the CLS token of the text encoder. With such an alignment, a model can identify regions of an image corresponding to a given text input, and therefore transfer seamlessly to the task of open vocabul…

    Submitted 9 December, 2022; originally announced December 2022.

  6. arXiv:2211.11077  [pdf, other]

    cs.CV

    Unifying Tracking and Image-Video Object Detection

    Authors: Peirong Liu, Rui Wang, Pengchuan Zhang, Omid Poursaeed, Yipin Zhou, Xuefei Cao, Sreya Dutta Roy, Ashish Shah, Ser-Nam Lim

    Abstract: Object detection (OD) has been one of the most fundamental tasks in computer vision. Recent developments in deep learning have pushed the performance of image OD to new heights by learning-based, data-driven approaches. On the other hand, video OD remains less explored, mostly due to much more expensive data annotation needs. At the same time, multi-object tracking (MOT) which requires reasonin…

    Submitted 19 November, 2023; v1 submitted 20 November, 2022; originally announced November 2022.

  7. arXiv:2109.02765  [pdf, other]

    cs.CV cs.CR cs.LG

    Robustness and Generalization via Generative Adversarial Training

    Authors: Omid Poursaeed, Tianxing Jiang, Harry Yang, Serge Belongie, SerNam Lim

    Abstract: While deep neural networks have achieved remarkable success in various computer vision tasks, they often fail to generalize to new domains and subtle variations of input images. Several defenses have been proposed to improve the robustness against these variations. However, current defenses can only withstand the specific attack used in training, and the models often remain vulnerable to other inp…

    Submitted 6 September, 2021; originally announced September 2021.

    Comments: ICCV 2021. arXiv admin note: substantial text overlap with arXiv:1911.09058

  8. arXiv:2011.13026  [pdf, other]

    cs.CV cs.LG

    Augmentation-Interpolative AutoEncoders for Unsupervised Few-Shot Image Generation

    Authors: Davis Wertheimer, Omid Poursaeed, Bharath Hariharan

    Abstract: We aim to build image generation models that generalize to new domains from few examples. To this end, we first investigate the generalization properties of classic image generators, and discover that autoencoders generalize extremely well to new domains, even when trained on highly constrained data. We leverage this insight to produce a robust, unsupervised few-shot image generation algorithm, an…

    Submitted 25 November, 2020; originally announced November 2020.

  9. arXiv:2008.00305  [pdf, other]

    cs.CV cs.GR cs.LG

    Self-supervised Learning of Point Clouds via Orientation Estimation

    Authors: Omid Poursaeed, Tianxing Jiang, Han Qiao, Nayun Xu, Vladimir G. Kim

    Abstract: Point clouds provide a compact and efficient representation of 3D shapes. While deep neural networks have achieved impressive results on point cloud learning tasks, they require massive amounts of manually labeled data, which can be costly and time-consuming to collect. In this paper, we leverage 3D self-supervision for learning downstream tasks on point clouds with fewer labels. A point cloud can…

    Submitted 17 October, 2020; v1 submitted 1 August, 2020; originally announced August 2020.

    Comments: 3DV 2020

  10. arXiv:2007.10294  [pdf, other]

    cs.CV cs.GR cs.LG

    Coupling Explicit and Implicit Surface Representations for Generative 3D Modeling

    Authors: Omid Poursaeed, Matthew Fisher, Noam Aigerman, Vladimir G. Kim

    Abstract: We propose a novel neural architecture for representing 3D surfaces, which harnesses two complementary shape representations: (i) an explicit representation via an atlas, i.e., embeddings of 2D domains into 3D; (ii) an implicit-function representation, i.e., a scalar function over the 3D volume, with its levels denoting surfaces. We make these two representations synergistic by introducing novel c…

    Submitted 16 October, 2020; v1 submitted 20 July, 2020; originally announced July 2020.

    Comments: ECCV 2020

  11. arXiv:1911.09058  [pdf, other]

    cs.CV cs.CR cs.LG stat.ML

    Fine-grained Synthesis of Unrestricted Adversarial Examples

    Authors: Omid Poursaeed, Tianxing Jiang, Yordanos Goshu, Harry Yang, Serge Belongie, Ser-Nam Lim

    Abstract: We propose a novel approach for generating unrestricted adversarial examples by manipulating fine-grained aspects of image generation. Unlike existing unrestricted attacks that typically hand-craft geometric transformations, we learn stylistic and stochastic modifications leveraging state-of-the-art generative models. This allows us to manipulate an image in a controlled, fine-grained manner witho…

    Submitted 22 October, 2020; v1 submitted 20 November, 2019; originally announced November 2019.

  12. arXiv:1910.02060  [pdf, other]

    cs.CV cs.GR cs.LG

    Neural Puppet: Generative Layered Cartoon Characters

    Authors: Omid Poursaeed, Vladimir G. Kim, Eli Shechtman, Jun Saito, Serge Belongie

    Abstract: We propose a learning based method for generating new animations of a cartoon character given a few example images. Our method is designed to learn from a traditionally animated sequence, where each frame is drawn by an artist, and thus the input images lack any common structure, correspondences, or labels. We express pose changes as a deformation of a layered 2.5D template mesh, and devise a nove…

    Submitted 12 October, 2020; v1 submitted 4 October, 2019; originally announced October 2019.

    Comments: WACV 2020

  13. arXiv:1810.01575  [pdf, other]

    cs.CV cs.CG cs.GR cs.LG stat.ML

    Deep Fundamental Matrix Estimation without Correspondences

    Authors: Omid Poursaeed, Guandao Yang, Aditya Prakash, Qiuren Fang, Hanqing Jiang, Bharath Hariharan, Serge Belongie

    Abstract: Estimating fundamental matrices is a classic problem in computer vision. Traditional methods rely heavily on the correctness of estimated key-point correspondences, which can be noisy and unreliable. As a result, it is difficult for these methods to handle image pairs with large occlusion or significantly different camera poses. In this paper, we propose novel neural network architectures to estim…

    Submitted 2 October, 2018; originally announced October 2018.

    Comments: ECCV 2018, Geometry Meets Deep Learning Workshop

  14. arXiv:1712.02328  [pdf, other]

    cs.CV cs.CR cs.LG cs.NE stat.ML

    Generative Adversarial Perturbations

    Authors: Omid Poursaeed, Isay Katsman, Bicheng Gao, Serge Belongie

    Abstract: In this paper, we propose novel generative models for creating adversarial examples, slightly perturbed images resembling natural images but maliciously crafted to fool pre-trained models. We present trainable deep neural networks for transforming images to adversarial perturbations. Our proposed models can produce image-agnostic and image-dependent perturbations for both targeted and non-targeted…

    Submitted 6 July, 2018; v1 submitted 6 December, 2017; originally announced December 2017.

    Comments: CVPR 2018, camera-ready version

  15. Vision-based Real Estate Price Estimation

    Authors: Omid Poursaeed, Tomas Matera, Serge Belongie

    Abstract: Since the advent of online real estate database companies like Zillow, Trulia and Redfin, the problem of automatic estimation of market values for houses has received considerable attention. Several real estate websites provide such estimates using a proprietary formula. Although these estimates are often close to the actual sale prices, in some cases they are highly inaccurate. One of the key fac…

    Submitted 3 October, 2018; v1 submitted 18 July, 2017; originally announced July 2017.

    Journal ref: Machine Vision and Applications, 29(4), 667-676, 2018

  16. arXiv:1612.04357  [pdf, other]

    cs.CV cs.LG cs.NE stat.ML

    Stacked Generative Adversarial Networks

    Authors: Xun Huang, Yixuan Li, Omid Poursaeed, John Hopcroft, Serge Belongie

    Abstract: In this paper, we propose a novel generative model named Stacked Generative Adversarial Networks (SGAN), which is trained to invert the hierarchical representations of a bottom-up discriminative network. Our model consists of a top-down stack of GANs, each learned to generate lower-level representations conditioned on higher-level representations. A representation discriminator is introduced at ea…

    Submitted 12 April, 2017; v1 submitted 13 December, 2016; originally announced December 2016.

    Comments: CVPR 2017, camera-ready version

  17. arXiv:1604.07124  [pdf, other]

    cs.NI cs.IT

    Analytical Studies of Fragmented-Spectrum Multi-Level OFDM-CDMA Technique in Cognitive Radio Networks

    Authors: Farhad Akhoundi, Saeed Sharifi-Malvajerdi, Omid Poursaeed, Jawad A. Salehi

    Abstract: In this paper, we present a multi-user resource allocation framework using fragmented-spectrum synchronous OFDM-CDMA modulation over a frequency-selective fading channel. In particular, given pre-existing communications in the spectrum where the system is operating, a channel sensing and estimation method is used to obtain information of subcarrier availability. Given this information, some real-v…

    Submitted 25 April, 2016; originally announced April 2016.

    Comments: 6 pages and 3 figures