Showing 1–12 of 12 results for author: Saffar, M

Searching in archive cs.

  1. arXiv:2308.11606  [pdf, other]

    cs.CV cs.CL

    StoryBench: A Multifaceted Benchmark for Continuous Story Visualization

    Authors: Emanuele Bugliarello, Hernan Moraldo, Ruben Villegas, Mohammad Babaeizadeh, Mohammad Taghi Saffar, Han Zhang, Dumitru Erhan, Vittorio Ferrari, Pieter-Jan Kindermans, Paul Voigtlaender

    Abstract: Generating video stories from text prompts is a complex task. In addition to having high visual quality, videos need to realistically adhere to a sequence of text prompts whilst being consistent throughout the frames. Creating a benchmark for video generation requires data annotated over time, which contrasts with the single caption used often in video datasets. To fill this gap, we collect compre…

    Submitted 12 October, 2023; v1 submitted 22 August, 2023; originally announced August 2023.

    Comments: NeurIPS D&B 2023

  2. arXiv:2210.02399  [pdf, other]

    cs.CV cs.AI

    Phenaki: Variable Length Video Generation From Open Domain Textual Description

    Authors: Ruben Villegas, Mohammad Babaeizadeh, Pieter-Jan Kindermans, Hernan Moraldo, Han Zhang, Mohammad Taghi Saffar, Santiago Castro, Julius Kunze, Dumitru Erhan

    Abstract: We present Phenaki, a model capable of realistic video synthesis, given a sequence of textual prompts. Generating videos from text is particularly challenging due to the computational cost, limited quantities of high quality text-video data and variable length of videos. To address these issues, we introduce a new model for learning video representation which compresses the video to a small repres…

    Submitted 5 October, 2022; originally announced October 2022.

  3. arXiv:2205.00949  [pdf, other]

    cs.CV

    Answer-Me: Multi-Task Open-Vocabulary Visual Question Answering

    Authors: AJ Piergiovanni, Wei Li, Weicheng Kuo, Mohammad Saffar, Fred Bertsch, Anelia Angelova

    Abstract: We present Answer-Me, a task-aware multi-task framework which unifies a variety of question answering tasks, such as, visual question answering, visual entailment, visual reasoning. In contrast to previous works using contrastive or generative captioning training, we propose a novel and simple recipe to pre-train a vision-language joint model, which is multi-task as well. The pre-training uses onl…

    Submitted 30 November, 2022; v1 submitted 2 May, 2022; originally announced May 2022.

  4. arXiv:2203.17273  [pdf, other]

    cs.CV

    FindIt: Generalized Localization with Natural Language Queries

    Authors: Weicheng Kuo, Fred Bertsch, Wei Li, AJ Piergiovanni, Mohammad Saffar, Anelia Angelova

    Abstract: We propose FindIt, a simple and versatile framework that unifies a variety of visual grounding and localization tasks including referring expression comprehension, text-based localization, and object detection. Key to our architecture is an efficient multi-scale fusion module that unifies the disparate localization requirements across the tasks. In addition, we discover that a standard object dete…

    Submitted 8 August, 2022; v1 submitted 31 March, 2022; originally announced March 2022.

    Comments: Accepted to ECCV 2022 (European Conference on Computer Vision)

  5. arXiv:2106.13195  [pdf, other]

    cs.CV cs.LG

    FitVid: Overfitting in Pixel-Level Video Prediction

    Authors: Mohammad Babaeizadeh, Mohammad Taghi Saffar, Suraj Nair, Sergey Levine, Chelsea Finn, Dumitru Erhan

    Abstract: An agent that is capable of predicting what happens next can perform a variety of tasks through planning with no additional training. Furthermore, such an agent can internally represent the complex dynamics of the real-world and therefore can acquire a representation useful for a variety of visual perception tasks. This makes predicting the future frames of a video, conditioned on the observed pas…

    Submitted 24 June, 2021; originally announced June 2021.

  6. arXiv:2012.04603  [pdf, other]

    cs.LG

    Models, Pixels, and Rewards: Evaluating Design Trade-offs in Visual Model-Based Reinforcement Learning

    Authors: Mohammad Babaeizadeh, Mohammad Taghi Saffar, Danijar Hafner, Harini Kannan, Chelsea Finn, Sergey Levine, Dumitru Erhan

    Abstract: Model-based reinforcement learning (MBRL) methods have shown strong sample efficiency and performance across a variety of tasks, including when faced with high-dimensional visual observations. These methods learn to predict the environment dynamics and expected reward from interaction and use this predictive model to plan and perform the task. However, MBRL methods vary in their fundamental design…

    Submitted 8 December, 2020; originally announced December 2020.

  7. arXiv:2003.05997  [pdf, other]

    cs.LG eess.AS stat.ML

    Efficient Content-Based Sparse Attention with Routing Transformers

    Authors: Aurko Roy, Mohammad Saffar, Ashish Vaswani, David Grangier

    Abstract: Self-attention has recently been adopted for a wide range of sequence modeling problems. Despite its effectiveness, self-attention suffers from quadratic compute and memory requirements with respect to sequence length. Successful approaches to reduce this complexity focused on attending to local sliding windows or a small set of locations independent of content. Our work proposes to learn dynamic…

    Submitted 24 October, 2020; v1 submitted 12 March, 2020; originally announced March 2020.

    Comments: TACL 2020; pre-MIT Press publication version; v5 has a random attention baseline

  8. arXiv:1906.05156  [pdf]

    cs.CV cs.AI cs.IT cs.LG

    Evaluation of Dataflow through layers of Deep Neural Networks in Classification and Regression Problems

    Authors: Ahmad Kalhor, Mohsen Saffar, Melika Kheirieh, Somayyeh Hoseinipoor, Babak N. Araabi

    Abstract: This paper introduces two straightforward, effective indices to evaluate the input data and the data flowing through layers of a feedforward deep neural network. For classification problems, the separation rate of target labels in the space of dataflow is explained as a key factor indicating the performance of designed layers in improving the generalization of the network. According to the explain…

    Submitted 12 June, 2019; originally announced June 2019.

    MSC Class: 68Txx

  9. arXiv:1806.09986  [pdf]

    cs.CV

    Online Signature Verification using Deep Representation: A new Descriptor

    Authors: Mohammad Hajizadeh Saffar, Mohsen Fayyaz, Mohammad Sabokrou, Mahmood Fathy

    Abstract: This paper presents an accurate method for verifying online signatures. The main difficulties of signature verification come from (1) a lack of training samples and (2) the requirement that methods be invariant to spatial changes. To deal with these difficulties and model the signatures efficiently, we propose a method in which a one-class classifier is built for each user on discriminative features. First, we pre…

    Submitted 23 June, 2018; originally announced June 2018.

    Comments: arXiv admin note: substantial text overlap with arXiv:1505.08153

  10. arXiv:1806.06172  [pdf]

    cs.CV

    Semantic Video Segmentation: A Review on Recent Approaches

    Authors: Mohammad Hajizadeh Saffar, Mohsen Fayyaz, Mohammad Sabokrou, Mahmood Fathy

    Abstract: This paper gives an overview of semantic segmentation, consisting of an explanation of the field, its status and relation to other fundamental vision tasks, different datasets, and common evaluation parameters that have been used by researchers. This survey also includes an overall review of a variety of recent approaches (RDF, MRF, CRF, etc.) and their advantages and challenges and shows the supe…

    Submitted 15 June, 2018; originally announced June 2018.

  11. arXiv:1608.05971  [pdf, other]

    cs.CV

    STFCN: Spatio-Temporal FCN for Semantic Video Segmentation

    Authors: Mohsen Fayyaz, Mohammad Hajizadeh Saffar, Mohammad Sabokrou, Mahmood Fathy, Reinhard Klette, Fay Huang

    Abstract: This paper presents a novel method to involve both spatial and temporal features for semantic video segmentation. Current work on convolutional neural networks (CNNs) has shown that CNNs provide advanced spatial features supporting a very good performance of solutions for both image and video analysis, especially for the semantic segmentation task. We investigate how involving temporal features als…

    Submitted 2 September, 2016; v1 submitted 21 August, 2016; originally announced August 2016.

  12. arXiv:1508.03710  [pdf]

    cs.CV

    A Novel Approach For Finger Vein Verification Based on Self-Taught Learning

    Authors: Mohsen Fayyaz, Masoud PourReza, Mohammad Hajizadeh Saffar, Mohammad Sabokrou, Mahmood Fathy

    Abstract: In this paper, we propose a method for user Finger Vein Authentication (FVA) as a biometric system. Using discriminative features for classifying these finger veins is one of the main factors that differentiates related works. Thus, we propose to learn a set of representative features based on autoencoders. We model the user finger vein using a Gaussian distribution. Experimental results sho…

    Submitted 15 August, 2015; originally announced August 2015.

    Comments: 4 pages, 4 figures, submitted to the Iranian Conference on Machine Vision and Image Processing