Showing 1–5 of 5 results for author: Souček, T

  1. arXiv:2312.07322  [pdf, other]

    cs.CV

    GenHowTo: Learning to Generate Actions and State Transformations from Instructional Videos

    Authors: Tomáš Souček, Dima Damen, Michael Wray, Ivan Laptev, Josef Sivic

    Abstract: We address the task of generating temporally consistent and physically plausible images of actions and object state transformations. Given an input image and a text prompt describing the targeted transformation, our generated images preserve the environment and transform objects in the initial image. Our contributions are threefold. First, we leverage a large body of instructional videos and autom…

    Submitted 2 April, 2024; v1 submitted 12 December, 2023; originally announced December 2023.

    Comments: CVPR 2024

  2. arXiv:2211.13500  [pdf, other]

    cs.CV

    Multi-Task Learning of Object State Changes from Uncurated Videos

    Authors: Tomáš Souček, Jean-Baptiste Alayrac, Antoine Miech, Ivan Laptev, Josef Sivic

    Abstract: We aim to learn to temporally localize object state changes and the corresponding state-modifying actions by observing people interacting with objects in long uncurated web videos. We introduce three principal contributions. First, we explore alternative multi-task network architectures and identify a model that enables efficient joint learning of multiple object states and actions such as pouring…

    Submitted 24 November, 2022; originally announced November 2022.

  3. arXiv:2203.11637  [pdf, other]

    cs.CV

    Look for the Change: Learning Object States and State-Modifying Actions from Untrimmed Web Videos

    Authors: Tomáš Souček, Jean-Baptiste Alayrac, Antoine Miech, Ivan Laptev, Josef Sivic

    Abstract: Human actions often induce changes of object states such as "cutting an apple", "cleaning shoes" or "pouring coffee". In this paper, we seek to temporally localize object states (e.g. "empty" and "full" cup) together with the corresponding state-modifying actions ("pouring coffee") in long uncurated videos with minimal supervision. The contributions of this work are threefold. First, we develop a…

    Submitted 22 March, 2022; originally announced March 2022.

    Comments: To be published in Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition (CVPR), 2022

  4. arXiv:2008.04838  [pdf, other]

    cs.CV

    TransNet V2: An effective deep network architecture for fast shot transition detection

    Authors: Tomáš Souček, Jakub Lokoč

    Abstract: Although automatic shot transition detection approaches are already investigated for more than two decades, an effective universal human-level model was not proposed yet. Even for common shot transitions like hard cuts or simple gradual changes, the potential diversity of analyzed video contents may still lead to both false hits and false dismissals. Recently, deep learning-based approaches signif…

    Submitted 11 August, 2020; originally announced August 2020.

  5. arXiv:1906.03363  [pdf, other]

    cs.CV

    TransNet: A deep network for fast detection of common shot transitions

    Authors: Tomáš Souček, Jaroslav Moravec, Jakub Lokoč

    Abstract: Shot boundary detection (SBD) is an important first step in many video processing applications. This paper presents a simple modular convolutional neural network architecture that achieves state-of-the-art results on the RAI dataset with well above real-time inference speed even on a single mediocre GPU. The network employs dilated convolutions and operates just on small resized frames. The traini…

    Submitted 7 June, 2019; originally announced June 2019.