Showing 1–10 of 10 results for author: Dosovitskiy, A

Searching in archive stat.
  1. arXiv:2111.12594

    cs.CV cs.LG stat.ML

    Conditional Object-Centric Learning from Video

    Authors: Thomas Kipf, Gamaleldin F. Elsayed, Aravindh Mahendran, Austin Stone, Sara Sabour, Georg Heigold, Rico Jonschkowski, Alexey Dosovitskiy, Klaus Greff

    Abstract: Object-centric representations are a promising path toward more systematic generalization by providing flexible abstractions upon which compositional world models can be built. Recent work on simple 2D and 3D datasets has shown that models with object-centric inductive biases can learn to segment and represent meaningful objects from the statistical structure of the data alone without the need for…

    Submitted 15 March, 2022; v1 submitted 24 November, 2021; originally announced November 2021.

    Comments: Published at ICLR 2022. Project page at https://slot-attention-video.github.io/

  2. arXiv:2108.08810

    cs.CV cs.AI cs.LG stat.ML

    Do Vision Transformers See Like Convolutional Neural Networks?

    Authors: Maithra Raghu, Thomas Unterthiner, Simon Kornblith, Chiyuan Zhang, Alexey Dosovitskiy

    Abstract: Convolutional neural networks (CNNs) have so far been the de-facto model for visual data. Recent work has shown that (Vision) Transformer models (ViT) can achieve comparable or even superior performance on image classification tasks. This raises a central question: how are Vision Transformers solving these tasks? Are they acting like convolutional networks, or learning entirely different visual re…

    Submitted 3 March, 2022; v1 submitted 19 August, 2021; originally announced August 2021.

  3. arXiv:2104.03059

    cs.CV cs.AI cs.LG stat.ML

    Differentiable Patch Selection for Image Recognition

    Authors: Jean-Baptiste Cordonnier, Aravindh Mahendran, Alexey Dosovitskiy, Dirk Weissenborn, Jakob Uszkoreit, Thomas Unterthiner

    Abstract: Neural Networks require large amounts of memory and compute to process high resolution images, even when only a small part of the image is actually informative for the task at hand. We propose a method based on a differentiable Top-K operator to select the most relevant parts of the input to efficiently process high resolution images. Our method may be interfaced with any downstream neural network…

    Submitted 7 April, 2021; originally announced April 2021.

    Comments: Accepted to IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR) 2021. Code available at https://github.com/google-research/google-research/tree/master/ptopk_patch_selection/

  4. arXiv:2006.15055

    cs.LG cs.CV stat.ML

    Object-Centric Learning with Slot Attention

    Authors: Francesco Locatello, Dirk Weissenborn, Thomas Unterthiner, Aravindh Mahendran, Georg Heigold, Jakob Uszkoreit, Alexey Dosovitskiy, Thomas Kipf

    Abstract: Learning object-centric representations of complex scenes is a promising step towards enabling efficient abstract reasoning from low-level perceptual features. Yet, most deep learning approaches learn distributed representations that do not capture the compositional properties of natural scenes. In this paper, we present the Slot Attention module, an architectural component that interfaces with pe…

    Submitted 14 October, 2020; v1 submitted 26 June, 2020; originally announced June 2020.

    Comments: NeurIPS 2020. Code available at https://github.com/google-research/google-research/tree/master/slot_attention

  5. arXiv:1910.04867

    cs.CV cs.LG stat.ML

    A Large-scale Study of Representation Learning with the Visual Task Adaptation Benchmark

    Authors: Xiaohua Zhai, Joan Puigcerver, Alexander Kolesnikov, Pierre Ruyssen, Carlos Riquelme, Mario Lucic, Josip Djolonga, Andre Susano Pinto, Maxim Neumann, Alexey Dosovitskiy, Lucas Beyer, Olivier Bachem, Michael Tschannen, Marcin Michalski, Olivier Bousquet, Sylvain Gelly, Neil Houlsby

    Abstract: Representation learning promises to unlock deep learning for the long tail of vision tasks without expensive labelled datasets. Yet, the absence of a unified evaluation for general visual representations hinders progress. Popular protocols are often too constrained (linear classification), limited in diversity (ImageNet, CIFAR, Pascal-VOC), or only weakly related to representation quality (ELBO, r…

    Submitted 21 February, 2020; v1 submitted 1 October, 2019; originally announced October 2019.

  6. arXiv:1901.08652

    cs.RO cs.LG stat.ML

    Learning agile and dynamic motor skills for legged robots

    Authors: Jemin Hwangbo, Joonho Lee, Alexey Dosovitskiy, Dario Bellicoso, Vassilios Tsounis, Vladlen Koltun, Marco Hutter

    Abstract: Legged robots pose one of the greatest challenges in robotics. Dynamic and agile maneuvers of animals cannot be imitated by existing methods that are crafted by humans. A compelling alternative is reinforcement learning, which requires minimal craftsmanship and promotes the natural evolution of a control policy. However, so far, reinforcement learning research for legged robots is mainly limited t…

    Submitted 24 January, 2019; originally announced January 2019.

    Journal ref: Science Robotics 4.26 (2019): eaau5872

  7. arXiv:1901.03162

    cs.LG cs.AI cs.CV stat.ML

    Motion Perception in Reinforcement Learning with Dynamic Objects

    Authors: Artemij Amiranashvili, Alexey Dosovitskiy, Vladlen Koltun, Thomas Brox

    Abstract: In dynamic environments, learned controllers are supposed to take motion into account when selecting the action to be taken. However, in existing reinforcement learning works motion is rarely treated explicitly; it is rather assumed that the controller learns the necessary motion representation from temporal stacks of frames implicitly. In this paper, we show that for continuous control tasks lear…

    Submitted 1 February, 2019; v1 submitted 10 January, 2019; originally announced January 2019.

  8. arXiv:1806.01175

    cs.LG cs.AI stat.ML

    TD or not TD: Analyzing the Role of Temporal Differencing in Deep Reinforcement Learning

    Authors: Artemij Amiranashvili, Alexey Dosovitskiy, Vladlen Koltun, Thomas Brox

    Abstract: Our understanding of reinforcement learning (RL) has been shaped by theoretical and empirical results that were obtained decades ago using tabular representations and linear function approximators. These results suggest that RL methods that use temporal differencing (TD) are superior to direct Monte Carlo estimation (MC). How do these results hold up in deep RL, which deals with perceptually compl…

    Submitted 4 June, 2018; originally announced June 2018.

  9. What Makes Good Synthetic Training Data for Learning Disparity and Optical Flow Estimation?

    Authors: Nikolaus Mayer, Eddy Ilg, Philipp Fischer, Caner Hazirbas, Daniel Cremers, Alexey Dosovitskiy, Thomas Brox

    Abstract: The finding that very large networks can be trained efficiently and reliably has led to a paradigm shift in computer vision from engineered solutions to learning formulations. As a result, the research challenge shifts from devising algorithms to creating suitable and abundant training data for supervised learning. How to efficiently create such training data? The dominant data acquisition method…

    Submitted 22 March, 2018; v1 submitted 19 January, 2018; originally announced January 2018.

    Comments: added references (UCL dataset); added IJCV copyright information

  10. arXiv:1512.02134

    cs.CV cs.LG stat.ML

    A Large Dataset to Train Convolutional Networks for Disparity, Optical Flow, and Scene Flow Estimation

    Authors: Nikolaus Mayer, Eddy Ilg, Philip Häusser, Philipp Fischer, Daniel Cremers, Alexey Dosovitskiy, Thomas Brox

    Abstract: Recent work has shown that optical flow estimation can be formulated as a supervised learning task and can be successfully solved with convolutional networks. Training of the so-called FlowNet was enabled by a large synthetically generated dataset. The present paper extends the concept of optical flow estimation via convolutional networks to disparity and scene flow estimation. To this end, we pro…

    Submitted 7 December, 2015; originally announced December 2015.

    Comments: Includes supplementary material

    ACM Class: I.2.6; I.2.10; I.4.8