
Showing 1–22 of 22 results for author: Sajjadi, M S M

  1. arXiv:2310.06020  [pdf, other]

    cs.CV cs.AI cs.GR cs.LG cs.RO

    DyST: Towards Dynamic Neural Scene Representations on Real-World Videos

    Authors: Maximilian Seitzer, Sjoerd van Steenkiste, Thomas Kipf, Klaus Greff, Mehdi S. M. Sajjadi

    Abstract: Visual understanding of the world goes beyond the semantics and flat structure of individual images. In this work, we aim to capture both the 3D structure and dynamics of real-world scenes from monocular real-world videos. Our Dynamic Scene Transformer (DyST) model leverages recent work in neural scene representation to learn a latent decomposition of monocular real-world videos into scene content…

    Submitted 15 March, 2024; v1 submitted 9 October, 2023; originally announced October 2023.

    Comments: ICLR 2024 spotlight. Project website: https://dyst-paper.github.io/

  2. arXiv:2306.08068  [pdf, other]

    cs.CV cs.AI cs.LG

    DORSal: Diffusion for Object-centric Representations of Scenes et al

    Authors: Allan Jabri, Sjoerd van Steenkiste, Emiel Hoogeboom, Mehdi S. M. Sajjadi, Thomas Kipf

    Abstract: Recent progress in 3D scene understanding enables scalable learning of representations across large datasets of diverse scenes. As a consequence, generalization to unseen scenes and objects, rendering novel views from just a single or a handful of input images, and controllable scene generation that supports editing, is now possible. However, training jointly on a large number of scenes typically…

    Submitted 2 May, 2024; v1 submitted 13 June, 2023; originally announced June 2023.

    Comments: Accepted to ICLR 2024. Project page: https://www.sjoerdvansteenkiste.com/dorsal

  3. arXiv:2305.18890  [pdf, other]

    cs.CV cs.LG

    Sensitivity of Slot-Based Object-Centric Models to their Number of Slots

    Authors: Roland S. Zimmermann, Sjoerd van Steenkiste, Mehdi S. M. Sajjadi, Thomas Kipf, Klaus Greff

    Abstract: Self-supervised methods for learning object-centric representations have recently been applied successfully to various datasets. This progress is largely fueled by slot-based methods, whose ability to cluster visual scenes into meaningful objects holds great promise for compositional generalization and downstream learning. In these methods, the number of slots (clusters) $K$ is typically chosen to…

    Submitted 30 May, 2023; originally announced May 2023.

  4. arXiv:2304.00947  [pdf, other]

    cs.CV cs.AI cs.GR cs.LG cs.RO

    RePAST: Relative Pose Attention Scene Representation Transformer

    Authors: Aleksandr Safin, Daniel Duckworth, Mehdi S. M. Sajjadi

    Abstract: The Scene Representation Transformer (SRT) is a recent method to render novel views at interactive rates. Since SRT uses camera poses with respect to an arbitrarily chosen reference camera, it is not invariant to the order of the input views. As a result, SRT is not directly applicable to large-scale scenes where the reference frame would need to be changed regularly. In this work, we propose Rela…

    Submitted 10 April, 2023; v1 submitted 3 April, 2023; originally announced April 2023.

  5. arXiv:2303.03378  [pdf, other]

    cs.LG cs.AI cs.RO

    PaLM-E: An Embodied Multimodal Language Model

    Authors: Danny Driess, Fei Xia, Mehdi S. M. Sajjadi, Corey Lynch, Aakanksha Chowdhery, Brian Ichter, Ayzaan Wahid, Jonathan Tompson, Quan Vuong, Tianhe Yu, Wenlong Huang, Yevgen Chebotar, Pierre Sermanet, Daniel Duckworth, Sergey Levine, Vincent Vanhoucke, Karol Hausman, Marc Toussaint, Klaus Greff, Andy Zeng, Igor Mordatch, Pete Florence

    Abstract: Large language models excel at a wide range of complex tasks. However, enabling general inference in the real world, e.g., for robotics problems, raises the challenge of grounding. We propose embodied language models to directly incorporate real-world continuous sensor modalities into language models and thereby establish the link between words and percepts. Input to our embodied language model ar…

    Submitted 6 March, 2023; originally announced March 2023.

  6. arXiv:2302.04973  [pdf, other]

    cs.CV cs.AI cs.LG

    Invariant Slot Attention: Object Discovery with Slot-Centric Reference Frames

    Authors: Ondrej Biza, Sjoerd van Steenkiste, Mehdi S. M. Sajjadi, Gamaleldin F. Elsayed, Aravindh Mahendran, Thomas Kipf

    Abstract: Automatically discovering composable abstractions from raw perceptual data is a long-standing challenge in machine learning. Recent slot-based neural networks that learn about objects in a self-supervised manner have made exciting progress in this direction. However, they typically fall short at adequately capturing spatial symmetries present in the visual world, which leads to sample inefficiency…

    Submitted 20 July, 2023; v1 submitted 9 February, 2023; originally announced February 2023.

    Comments: Accepted at ICML 2023. Project page: https://invariantsa.github.io/

  7. arXiv:2211.14306  [pdf, other]

    cs.CV cs.GR cs.LG eess.IV

    RUST: Latent Neural Scene Representations from Unposed Imagery

    Authors: Mehdi S. M. Sajjadi, Aravindh Mahendran, Thomas Kipf, Etienne Pot, Daniel Duckworth, Mario Lucic, Klaus Greff

    Abstract: Inferring the structure of 3D scenes from 2D observations is a fundamental challenge in computer vision. Recently popularized approaches based on neural scene representations have achieved tremendous impact and have been applied across a variety of applications. One of the major remaining challenges in this space is training a single model which can provide latent representations which effectively…

    Submitted 24 March, 2023; v1 submitted 25 November, 2022; originally announced November 2022.

    Comments: CVPR 2023 Highlight. Project website: https://rust-paper.github.io/

  8. arXiv:2206.06922  [pdf, other]

    cs.CV cs.AI cs.LG

    Object Scene Representation Transformer

    Authors: Mehdi S. M. Sajjadi, Daniel Duckworth, Aravindh Mahendran, Sjoerd van Steenkiste, Filip Pavetić, Mario Lučić, Leonidas J. Guibas, Klaus Greff, Thomas Kipf

    Abstract: A compositional understanding of the world in terms of objects and their geometry in 3D space is considered a cornerstone of human cognition. Facilitating the learning of such a representation in neural networks holds promise for substantially improving labeled data efficiency. As a key step in this direction, we make progress on the problem of learning 3D-consistent decompositions of complex scen…

    Submitted 12 October, 2022; v1 submitted 14 June, 2022; originally announced June 2022.

    Comments: Accepted at NeurIPS '22. Project page: https://osrt-paper.github.io/

  9. arXiv:2203.11194  [pdf, other]

    cs.CV cs.AI cs.LG cs.RO

    Test-time Adaptation with Slot-Centric Models

    Authors: Mihir Prabhudesai, Anirudh Goyal, Sujoy Paul, Sjoerd van Steenkiste, Mehdi S. M. Sajjadi, Gaurav Aggarwal, Thomas Kipf, Deepak Pathak, Katerina Fragkiadaki

    Abstract: Current visual detectors, though impressive within their training distribution, often fail to parse out-of-distribution scenes into their constituent entities. Recent test-time adaptation methods use auxiliary self-supervised losses to adapt the network parameters to each test example independently and have shown promising results towards generalization outside the training distribution for the ta…

    Submitted 27 June, 2023; v1 submitted 21 March, 2022; originally announced March 2022.

    Comments: Accepted at ICML 2023. Project website at https://slot-tta.github.io/

  10. arXiv:2203.03570  [pdf, other]

    cs.CV cs.GR cs.LG

    Kubric: A scalable dataset generator

    Authors: Klaus Greff, Francois Belletti, Lucas Beyer, Carl Doersch, Yilun Du, Daniel Duckworth, David J. Fleet, Dan Gnanapragasam, Florian Golemo, Charles Herrmann, Thomas Kipf, Abhijit Kundu, Dmitry Lagun, Issam Laradji, Hsueh-Ti Liu, Henning Meyer, Yishu Miao, Derek Nowrouzezahrai, Cengiz Oztireli, Etienne Pot, Noha Radwan, Daniel Rebain, Sara Sabour, Mehdi S. M. Sajjadi, et al. (10 additional authors not shown)

    Abstract: Data is the driving force of machine learning, with the amount and quality of training data often being more important for the performance of a system than architecture and training details. But collecting, processing and annotating real data at scale is difficult, expensive, and frequently raises additional privacy, fairness and legal concerns. Synthetic data is a powerful tool with the potential…

    Submitted 7 March, 2022; originally announced March 2022.

    Comments: 21 pages, CVPR 2022

  11. arXiv:2112.00724  [pdf, other]

    cs.CV cs.AI cs.GR

    RegNeRF: Regularizing Neural Radiance Fields for View Synthesis from Sparse Inputs

    Authors: Michael Niemeyer, Jonathan T. Barron, Ben Mildenhall, Mehdi S. M. Sajjadi, Andreas Geiger, Noha Radwan

    Abstract: Neural Radiance Fields (NeRF) have emerged as a powerful representation for the task of novel view synthesis due to their simplicity and state-of-the-art performance. Though NeRF can produce photorealistic renderings of unseen viewpoints when many input views are available, its performance drops significantly when this number is reduced. We observe that the majority of artifacts in sparse input sc…

    Submitted 1 December, 2021; originally announced December 2021.

    Comments: Project page available at https://m-niemeyer.github.io/regnerf/index.html

  12. arXiv:2111.13260  [pdf, other]

    cs.CV cs.RO

    NeSF: Neural Semantic Fields for Generalizable Semantic Segmentation of 3D Scenes

    Authors: Suhani Vora, Noha Radwan, Klaus Greff, Henning Meyer, Kyle Genova, Mehdi S. M. Sajjadi, Etienne Pot, Andrea Tagliasacchi, Daniel Duckworth

    Abstract: We present NeSF, a method for producing 3D semantic fields from posed RGB images alone. In place of classical 3D representations, our method builds on recent work in implicit neural scene representations wherein 3D structure is captured by point-wise functions. We leverage this methodology to recover 3D density fields upon which we then train a 3D semantic segmentation model supervised by posed 2D…

    Submitted 2 December, 2021; v1 submitted 25 November, 2021; originally announced November 2021.

    Comments: Project website: https://nesf3d.github.io/. Updated with minor edits to text

  13. arXiv:2111.13152  [pdf, other]

    cs.CV cs.AI cs.GR cs.LG cs.RO

    Scene Representation Transformer: Geometry-Free Novel View Synthesis Through Set-Latent Scene Representations

    Authors: Mehdi S. M. Sajjadi, Henning Meyer, Etienne Pot, Urs Bergmann, Klaus Greff, Noha Radwan, Suhani Vora, Mario Lucic, Daniel Duckworth, Alexey Dosovitskiy, Jakob Uszkoreit, Thomas Funkhouser, Andrea Tagliasacchi

    Abstract: A classical problem in computer vision is to infer a 3D scene representation from few images that can be used to render novel views at interactive rates. Previous work focuses on reconstructing pre-defined 3D representations, e.g. textured meshes, or implicit representations, e.g. radiance fields, and often requires input images with precise camera poses and long processing times for each novel sc…

    Submitted 29 March, 2022; v1 submitted 25 November, 2021; originally announced November 2021.

    Comments: Accepted to CVPR 2022, Project website: https://srt-paper.github.io/

    Journal ref: CVPR 2022

  14. arXiv:2008.02268  [pdf, other]

    cs.CV cs.GR cs.LG

    NeRF in the Wild: Neural Radiance Fields for Unconstrained Photo Collections

    Authors: Ricardo Martin-Brualla, Noha Radwan, Mehdi S. M. Sajjadi, Jonathan T. Barron, Alexey Dosovitskiy, Daniel Duckworth

    Abstract: We present a learning-based method for synthesizing novel views of complex scenes using only unstructured collections of in-the-wild photographs. We build on Neural Radiance Fields (NeRF), which uses the weights of a multilayer perceptron to model the density and color of a scene as a function of 3D coordinates. While NeRF works well on images of static subjects captured under controlled settings,…

    Submitted 6 January, 2021; v1 submitted 5 August, 2020; originally announced August 2020.

    Comments: Project website: https://nerf-w.github.io. Ricardo Martin-Brualla, Noha Radwan, and Mehdi S. M. Sajjadi contributed equally to this work. Updated with results for three additional scenes

  15. arXiv:1903.12436  [pdf, other]

    cs.LG stat.ML

    From Variational to Deterministic Autoencoders

    Authors: Partha Ghosh, Mehdi S. M. Sajjadi, Antonio Vergari, Michael Black, Bernhard Schölkopf

    Abstract: Variational Autoencoders (VAEs) provide a theoretically-backed and popular framework for deep generative models. However, learning a VAE from data poses still unanswered theoretical questions and considerable practical challenges. In this work, we propose an alternative framework for generative modeling that is simpler, easier to train, and deterministic, yet has many of the advantages of VAEs. We…

    Submitted 29 May, 2020; v1 submitted 29 March, 2019; originally announced March 2019.

    Comments: Partha Ghosh and Mehdi S. M. Sajjadi contributed equally to this work

  16. arXiv:1807.07930  [pdf, other]

    cs.CV

    Perceptual Video Super Resolution with Enhanced Temporal Consistency

    Authors: Eduardo Pérez-Pellitero, Mehdi S. M. Sajjadi, Michael Hirsch, Bernhard Schölkopf

    Abstract: With the advent of perceptual loss functions, new possibilities in super-resolution have emerged, and we currently have models that successfully generate near-photorealistic high-resolution images from their low-resolution observations. Up to now, however, such approaches have been exclusively limited to single image super-resolution. The application of perceptual loss functions on video processin…

    Submitted 2 May, 2019; v1 submitted 20 July, 2018; originally announced July 2018.

    Comments: Major revision and improvement of the manuscript: New network architecture, new loss function and extended experiments

  17. arXiv:1806.00035  [pdf, other]

    stat.ML cs.LG

    Assessing Generative Models via Precision and Recall

    Authors: Mehdi S. M. Sajjadi, Olivier Bachem, Mario Lucic, Olivier Bousquet, Sylvain Gelly

    Abstract: Recent advances in generative modeling have led to an increased interest in the study of statistical divergences as means of model comparison. Commonly used evaluation methods, such as the Frechet Inception Distance (FID), correlate well with the perceived quality of samples and are sensitive to mode dropping. However, these metrics are unable to distinguish between different failure cases since t…

    Submitted 28 October, 2018; v1 submitted 31 May, 2018; originally announced June 2018.

    Comments: NIPS 2018

  18. arXiv:1802.04374  [pdf, other]

    stat.ML cs.CR cs.LG

    Tempered Adversarial Networks

    Authors: Mehdi S. M. Sajjadi, Giambattista Parascandolo, Arash Mehrjou, Bernhard Schölkopf

    Abstract: Generative adversarial networks (GANs) have been shown to produce realistic samples from high-dimensional distributions, but training them is considered hard. A possible explanation for training instabilities is the inherent imbalance between the networks: While the discriminator is trained directly on both real and fake samples, the generator only has control over the fake samples it produces sin…

    Submitted 11 July, 2018; v1 submitted 12 February, 2018; originally announced February 2018.

    Comments: accepted to ICML 2018

  19. arXiv:1801.04590  [pdf, other]

    cs.CV cs.AI stat.ML

    Frame-Recurrent Video Super-Resolution

    Authors: Mehdi S. M. Sajjadi, Raviteja Vemulapalli, Matthew Brown

    Abstract: Recent advances in video super-resolution have shown that convolutional neural networks combined with motion compensation are able to merge information from multiple low-resolution (LR) frames to generate high-quality images. Current state-of-the-art methods process a batch of LR frames to generate a single high-resolution (HR) frame and run this scheme in a sliding window fashion over the entire…

    Submitted 25 March, 2018; v1 submitted 14 January, 2018; originally announced January 2018.

    Comments: Accepted at CVPR 2018

  20. arXiv:1612.07919  [pdf, other]

    cs.CV cs.AI stat.ML

    EnhanceNet: Single Image Super-Resolution Through Automated Texture Synthesis

    Authors: Mehdi S. M. Sajjadi, Bernhard Schölkopf, Michael Hirsch

    Abstract: Single image super-resolution is the task of inferring a high-resolution image from a single low-resolution input. Traditionally, the performance of algorithms for this task is measured using pixel-wise reconstruction measures such as peak signal-to-noise ratio (PSNR) which have been shown to correlate poorly with the human perception of image quality. As a result, algorithms minimizing these metr…

    Submitted 30 July, 2017; v1 submitted 23 December, 2016; originally announced December 2016.

    Comments: main paper and supplementary material

  21. Depth Estimation Through a Generative Model of Light Field Synthesis

    Authors: Mehdi S. M. Sajjadi, Rolf Köhler, Bernhard Schölkopf, Michael Hirsch

    Abstract: Light field photography captures rich structural information that may facilitate a number of traditional image processing and computer vision tasks. A crucial ingredient in such endeavors is accurate depth recovery. We present a novel framework that allows the recovery of a high quality continuous depth map from light field data. To this end we propose a generative model of a light field that is f…

    Submitted 6 September, 2016; originally announced September 2016.

    Comments: German Conference on Pattern Recognition (GCPR) 2016

  22. arXiv:1506.00852  [pdf, other]

    cs.LG stat.ML

    Peer Grading in a Course on Algorithms and Data Structures: Machine Learning Algorithms do not Improve over Simple Baselines

    Authors: Mehdi S. M. Sajjadi, Morteza Alamgir, Ulrike von Luxburg

    Abstract: Peer grading is the process of students reviewing each other's work, such as homework submissions, and has lately become a popular mechanism used in massive open online courses (MOOCs). Intrigued by this idea, we used it in a course on algorithms and data structures at the University of Hamburg. Throughout the whole semester, students repeatedly handed in submissions to exercises, which were then…

    Submitted 10 February, 2016; v1 submitted 2 June, 2015; originally announced June 2015.

    Comments: Published at the Third Annual ACM Conference on Learning at Scale (L@S)