
Showing 1–15 of 15 results for author: Habibian, A

  1. arXiv:2401.05735  [pdf, other]

    cs.CV cs.LG

    Object-Centric Diffusion for Efficient Video Editing

    Authors: Kumara Kahatapitiya, Adil Karjauv, Davide Abati, Fatih Porikli, Yuki M. Asano, Amirhossein Habibian

    Abstract: Diffusion-based video editing methods have reached impressive quality and can transform the global style, local structure, and attributes of given video inputs, following textual edit prompts. However, such solutions typically incur heavy memory and computational costs to generate temporally-coherent frames, in the form of diffusion inversion and/or cross-frame attention. In this paper, we c…

    Submitted 11 January, 2024; originally announced January 2024.

  2. arXiv:2312.08892  [pdf, other]

    cs.CV

    VaLID: Variable-Length Input Diffusion for Novel View Synthesis

    Authors: Shijie Li, Farhad G. Zanjani, Haitam Ben Yahia, Yuki M. Asano, Juergen Gall, Amirhossein Habibian

    Abstract: Novel View Synthesis (NVS), which tries to produce a realistic image at the target view given source view images and their corresponding poses, is a fundamental problem in 3D Vision. As this task is heavily under-constrained, some recent work, like Zero123, tries to solve this problem with generative modeling, specifically using pre-trained diffusion models. Although this strategy generalizes well…

    Submitted 14 December, 2023; originally announced December 2023.

    Comments: paper and supplementary material

  3. arXiv:2312.08128  [pdf, other]

    cs.CV

    Clockwork Diffusion: Efficient Generation With Model-Step Distillation

    Authors: Amirhossein Habibian, Amir Ghodrati, Noor Fathima, Guillaume Sautiere, Risheek Garrepalli, Fatih Porikli, Jens Petersen

    Abstract: This work aims to improve the efficiency of text-to-image diffusion models. While diffusion models use computationally expensive UNet-based denoising operations in every generation step, we identify that not all operations are equally relevant for the final output quality. In particular, we observe that UNet layers operating on high-res feature maps are relatively sensitive to small perturbations.…

    Submitted 20 February, 2024; v1 submitted 13 December, 2023; originally announced December 2023.

  4. arXiv:2308.09511  [pdf, other]

    cs.CV

    ResQ: Residual Quantization for Video Perception

    Authors: Davide Abati, Haitam Ben Yahia, Markus Nagel, Amirhossein Habibian

    Abstract: This paper accelerates video perception, such as semantic segmentation and human pose estimation, by leveraging cross-frame redundancies. Unlike the existing approaches, which avoid redundant computations by warping the past features using optical-flow or by performing sparse convolutions on frame differences, we approach the problem from a new perspective: low-bit quantization. We observe that resi…

    Submitted 18 August, 2023; originally announced August 2023.

    Comments: ICCV 2023

  5. arXiv:2301.02240  [pdf, other]

    cs.CV

    Skip-Attention: Improving Vision Transformers by Paying Less Attention

    Authors: Shashanka Venkataramanan, Amir Ghodrati, Yuki M. Asano, Fatih Porikli, Amirhossein Habibian

    Abstract: This work aims to improve the efficiency of vision transformers (ViT). While ViTs use computationally expensive self-attention operations in every layer, we identify that these operations are highly correlated across layers -- a key redundancy that causes unnecessary computations. Based on this observation, we propose SkipAt, a method to reuse self-attention computation from preceding layers to ap…

    Submitted 17 January, 2023; v1 submitted 5 January, 2023; originally announced January 2023.

  6. arXiv:2206.08236  [pdf, other]

    cs.CV cs.LG eess.IV

    Simple and Efficient Architectures for Semantic Segmentation

    Authors: Dushyant Mehta, Andrii Skliar, Haitam Ben Yahia, Shubhankar Borse, Fatih Porikli, Amirhossein Habibian, Tijmen Blankevoort

    Abstract: Though the state-of-the-art architectures for semantic segmentation, such as HRNet, demonstrate impressive accuracy, the complexity arising from their salient design choices hinders a range of model acceleration tools, and further they make use of operations that are inefficient on current hardware. This paper demonstrates that a simple encoder-decoder architecture with a ResNet-like backbone and a sm…

    Submitted 16 June, 2022; originally announced June 2022.

    Comments: To be presented at Efficient Deep Learning for Computer Vision Workshop at CVPR 2022

  7. arXiv:2204.02397  [pdf, other]

    cs.CV

    SALISA: Saliency-based Input Sampling for Efficient Video Object Detection

    Authors: Babak Ehteshami Bejnordi, Amirhossein Habibian, Fatih Porikli, Amir Ghodrati

    Abstract: High-resolution images are widely adopted for high-performance object detection in videos. However, processing high-resolution inputs comes with high computation costs, and naive down-sampling of the input to reduce the computation costs quickly degrades the detection performance. In this paper, we propose SALISA, a novel non-uniform SALiency-based Input SAmpling technique for video object detecti…

    Submitted 5 April, 2022; originally announced April 2022.

    Comments: 20 pages, 7 figures

  8. arXiv:2203.09594  [pdf, other]

    cs.CV cs.LG

    Delta Distillation for Efficient Video Processing

    Authors: Amirhossein Habibian, Haitam Ben Yahia, Davide Abati, Efstratios Gavves, Fatih Porikli

    Abstract: This paper aims to accelerate video stream processing, such as object detection and semantic segmentation, by leveraging the temporal redundancies that exist between video frames. Instead of propagating and warping features using motion alignment, such as optical flow, we propose a novel knowledge distillation schema coined as Delta Distillation. In our proposal, the student learns the variations…

    Submitted 17 March, 2022; originally announced March 2022.

  9. arXiv:2203.01978  [pdf, other]

    eess.IV cs.CV cs.LG

    Region-of-Interest Based Neural Video Compression

    Authors: Yura Perugachi-Diaz, Guillaume Sautière, Davide Abati, Yang Yang, Amirhossein Habibian, Taco S Cohen

    Abstract: Humans do not perceive all parts of a scene with the same resolution, but rather focus on a few regions of interest (ROIs). Traditional Object-Based codecs take advantage of this biological intuition, and are capable of non-uniform allocation of bits in favor of salient regions, at the expense of increased distortion in the remaining areas: such a strategy allows a boost in perceptual quality under low…

    Submitted 2 November, 2022; v1 submitted 3 March, 2022; originally announced March 2022.

    Comments: Updated arxiv version to the camera-ready version after acceptance at British Machine Vision Conference (BMVC) 2022

  10. arXiv:2104.13400  [pdf, other]

    cs.CV cs.LG

    FrameExit: Conditional Early Exiting for Efficient Video Recognition

    Authors: Amir Ghodrati, Babak Ehteshami Bejnordi, Amirhossein Habibian

    Abstract: In this paper, we propose a conditional early exiting framework for efficient video recognition. While existing works focus on selecting a subset of salient frames to reduce the computation costs, we propose to use a simple sampling strategy combined with conditional early exiting to enable efficient recognition. Our model automatically learns to process fewer frames for simpler videos and more fr…

    Submitted 27 April, 2021; originally announced April 2021.

    Comments: CVPR 2021 | Oral paper

  11. arXiv:2104.11487  [pdf, other]

    cs.CV cs.LG

    Skip-Convolutions for Efficient Video Processing

    Authors: Amirhossein Habibian, Davide Abati, Taco S. Cohen, Babak Ehteshami Bejnordi

    Abstract: We propose Skip-Convolutions to leverage the large amount of redundancies in video streams and save computations. Each video is represented as a series of changes across frames and network activations, denoted as residuals. We reformulate standard convolution to be efficiently computed on residual frames: each layer is coupled with a binary gate deciding whether a residual is important to the mode…

    Submitted 23 April, 2021; originally announced April 2021.

    Comments: CVPR 2021

  12. arXiv:2004.09508  [pdf, other]

    eess.IV cs.CV cs.LG stat.ML

    Adversarial Distortion for Learned Video Compression

    Authors: Vijay Veerabadran, Reza Pourreza, Amirhossein Habibian, Taco Cohen

    Abstract: In this paper, we present a novel adversarial lossy video compression model. At extremely low bit-rates, standard video coding schemes suffer from unpleasant reconstruction artifacts such as blocking, ringing, etc. Existing learned neural approaches to video compression have achieved reasonable success in reducing the bit-rate for efficient transmission and reduce the impact of artifacts to an exte…

    Submitted 18 June, 2021; v1 submitted 20 April, 2020; originally announced April 2020.

    Comments: CVPR Workshops, 2020

  13. arXiv:1908.05717  [pdf, other]

    eess.IV cs.LG stat.ML

    Video Compression With Rate-Distortion Autoencoders

    Authors: Amirhossein Habibian, Ties van Rozendaal, Jakub M. Tomczak, Taco S. Cohen

    Abstract: In this paper we present a deep generative model for lossy video compression. We employ a model that consists of a 3D autoencoder with a discrete latent space and an autoregressive prior used for entropy coding. Both autoencoder and prior are trained jointly to minimize a rate-distortion loss, which is closely related to the ELBO used in variational autoencoders. Despite its simplicity, we find…

    Submitted 13 November, 2019; v1 submitted 14 August, 2019; originally announced August 2019.

    Comments: Accepted to ICCV 2019

  14. arXiv:1908.00733  [pdf, other]

    cs.LG cs.CV stat.ML

    Learning Variations in Human Motion via Mix-and-Match Perturbation

    Authors: Mohammad Sadegh Aliakbarian, Fatemeh Sadat Saleh, Mathieu Salzmann, Lars Petersson, Stephen Gould, Amirhossein Habibian

    Abstract: Human motion prediction is a stochastic process: Given an observed sequence of poses, multiple future motions are plausible. Existing approaches to modeling this stochasticity typically combine a random noise vector with information about the previous poses. This combination, however, is done in a deterministic manner, which gives the network the flexibility to learn to ignore the random noise. In…

    Submitted 24 February, 2020; v1 submitted 2 August, 2019; originally announced August 2019.

  15. arXiv:1511.02492  [pdf, other]

    cs.CV cs.MM

    VideoStory Embeddings Recognize Events when Examples are Scarce

    Authors: Amirhossein Habibian, Thomas Mensink, Cees G. M. Snoek

    Abstract: This paper aims for event recognition when video examples are scarce or even completely absent. The key in such a challenging setting is a semantic video representation. Rather than building the representation from individual attribute detectors and their annotations, we propose to learn the entire representation from freely available web videos and their descriptions using an embedding between vi…

    Submitted 8 November, 2015; originally announced November 2015.