Showing 1–7 of 7 results for author: Dorkenwald, M

  1. arXiv:2402.08657  [pdf, other]

    cs.CV

    PIN: Positional Insert Unlocks Object Localisation Abilities in VLMs

    Authors: Michael Dorkenwald, Nimrod Barazani, Cees G. M. Snoek, Yuki M. Asano

    Abstract: Vision-Language Models (VLMs), such as Flamingo and GPT-4V, have shown immense potential by integrating large language models with vision systems. Nevertheless, these models face challenges in the fundamental computer vision task of object localisation, due to their training on multimodal data containing mostly captions without explicit spatial grounding. While it is possible to construct custom,…

    Submitted 13 February, 2024; originally announced February 2024.

  2. arXiv:2205.11710  [pdf, other]

    cs.CV

    SCVRL: Shuffled Contrastive Video Representation Learning

    Authors: Michael Dorkenwald, Fanyi Xiao, Biagio Brattoli, Joseph Tighe, Davide Modolo

    Abstract: We propose SCVRL, a novel contrastive-based framework for self-supervised learning for videos. Differently from previous contrast learning based methods that mostly focus on learning visual semantics (e.g., CVRL), SCVRL is capable of learning both semantic and motion patterns. For that, we reformulate the popular shuffling pretext task within a modern contrastive learning paradigm. We show that ou…

    Submitted 23 May, 2022; originally announced May 2022.

    Comments: CVPR 2022 - L3DIVU workshop

  3. arXiv:2107.02790  [pdf, other]

    cs.CV

    iPOKE: Poking a Still Image for Controlled Stochastic Video Synthesis

    Authors: Andreas Blattmann, Timo Milbich, Michael Dorkenwald, Björn Ommer

    Abstract: How would a static scene react to a local poke? What are the effects on other parts of an object if you could locally push it? There will be distinctive movement, despite evident variations caused by the stochastic nature of our world. These outcomes are governed by the characteristic kinematics of objects that dictate their overall motion caused by a local interaction. Conversely, the movement of…

    Submitted 6 October, 2021; v1 submitted 6 July, 2021; originally announced July 2021.

    Comments: ICCV 2021, Project page is available at https://bit.ly/3dJN4Lf

  4. arXiv:2106.11303  [pdf, other]

    cs.CV

    Understanding Object Dynamics for Interactive Image-to-Video Synthesis

    Authors: Andreas Blattmann, Timo Milbich, Michael Dorkenwald, Björn Ommer

    Abstract: What would be the effect of locally poking a static scene? We present an approach that learns naturally-looking global articulations caused by a local manipulation at a pixel level. Training requires only videos of moving objects but no information of the underlying manipulation of the physical scene. Our generative model learns to infer natural object dynamics as a response to user interaction an…

    Submitted 21 June, 2021; originally announced June 2021.

    Comments: CVPR 2021, project page available at https://bit.ly/3cxfA2L

  5. arXiv:2105.04551  [pdf, other]

    cs.CV

    Stochastic Image-to-Video Synthesis using cINNs

    Authors: Michael Dorkenwald, Timo Milbich, Andreas Blattmann, Robin Rombach, Konstantinos G. Derpanis, Björn Ommer

    Abstract: Video understanding calls for a model to learn the characteristic interplay between static scene content and its dynamics: Given an image, the model must be able to predict a future progression of the portrayed scene and, conversely, a video should be explained in terms of its static image content and all the remaining characteristics not present in the initial frame. This naturally suggests a bij…

    Submitted 17 June, 2021; v1 submitted 10 May, 2021; originally announced May 2021.

    Comments: Accepted to CVPR 2021

  6. arXiv:2103.04677  [pdf, other]

    cs.CV

    Behavior-Driven Synthesis of Human Dynamics

    Authors: Andreas Blattmann, Timo Milbich, Michael Dorkenwald, Björn Ommer

    Abstract: Generating and representing human behavior are of major importance for various computer vision applications. Commonly, human video synthesis represents behavior as sequences of postures while directly predicting their likely progressions or merely changing the appearance of the depicted persons, thus not being able to exercise control over their actual behavior during the synthesis process. In con…

    Submitted 22 April, 2021; v1 submitted 8 March, 2021; originally announced March 2021.

    Comments: Accepted to CVPR 2021 as Poster

  7. arXiv:2012.09237  [pdf, other]

    cs.CV cs.AI cs.LG

    Unsupervised Behaviour Analysis and Magnification (uBAM) using Deep Learning

    Authors: Biagio Brattoli, Uta Buechler, Michael Dorkenwald, Philipp Reiser, Linard Filli, Fritjof Helmchen, Anna-Sophia Wahl, Bjoern Ommer

    Abstract: Motor behaviour analysis is essential to biomedical research and clinical diagnostics as it provides a non-invasive strategy for identifying motor impairment and its change caused by interventions. State-of-the-art instrumented movement analysis is time- and cost-intensive, since it requires placing physical or virtual markers. Besides the effort required for marking keypoints or annotations neces…

    Submitted 6 April, 2021; v1 submitted 16 December, 2020; originally announced December 2020.

    Comments: Published in Nature Machine Intelligence (2021), https://rdcu.be/ch6pL