Showing 1–7 of 7 results for author: Einfalt, M

  1. arXiv:2210.06110 [pdf, other]

    cs.CV

    Uplift and Upsample: Efficient 3D Human Pose Estimation with Uplifting Transformers

    Authors: Moritz Einfalt, Katja Ludwig, Rainer Lienhart

    Abstract: The state-of-the-art for monocular 3D human pose estimation in videos is dominated by the paradigm of 2D-to-3D pose uplifting. While the uplifting methods themselves are rather efficient, the true computational complexity depends on the per-frame 2D pose estimation. In this paper, we present a Transformer-based pose uplifting scheme that can operate on temporally sparse 2D pose sequences but still…

    Submitted 21 October, 2022; v1 submitted 12 October, 2022; originally announced October 2022.

    Comments: Accepted at IEEE/CVF WACV 2023

  2. arXiv:2112.14100 [pdf, other]

    cs.CV

    Extended Self-Critical Pipeline for Transforming Videos to Text (TRECVID-VTT Task 2021) -- Team: MMCUniAugsburg

    Authors: Philipp Harzig, Moritz Einfalt, Katja Ludwig, Rainer Lienhart

    Abstract: The Multimedia and Computer Vision Lab of the University of Augsburg participated in the VTT task only. We use the VATEX and TRECVID-VTT datasets for training our VTT models. We base our model on the Transformer approach for both of our submitted runs. For our second model, we adapt the X-Linear Attention Networks for Image Captioning which does not yield the desired bump in scores. For both model…

    Submitted 28 December, 2021; originally announced December 2021.

    Comments: TRECVID 2021 notebook paper

  3. arXiv:2112.14088 [pdf, other]

    cs.CV

    Synchronized Audio-Visual Frames with Fractional Positional Encoding for Transformers in Video-to-Text Translation

    Authors: Philipp Harzig, Moritz Einfalt, Rainer Lienhart

    Abstract: Video-to-Text (VTT) is the task of automatically generating descriptions for short audio-visual video clips, which can support visually impaired people to understand scenes of a YouTube video for instance. Transformer architectures have shown great performance in both machine translation and image captioning, lacking a straightforward and reproducible application for VTT. However, there is no comp…

    Submitted 28 December, 2021; originally announced December 2021.

  4. arXiv:2010.12317 [pdf, other]

    cs.CV

    Error Bounds of Projection Models in Weakly Supervised 3D Human Pose Estimation

    Authors: Nikolas Klug, Moritz Einfalt, Stephan Brehm, Rainer Lienhart

    Abstract: The current state-of-the-art in monocular 3D human pose estimation is heavily influenced by weakly supervised methods. These allow 2D labels to be used to learn effective 3D human pose recovery either directly from images or via 2D-to-3D pose uplifting. In this paper we present a detailed analysis of the most commonly used simplified projection models, which relate the estimated 3D pose representa…

    Submitted 23 October, 2020; originally announced October 2020.

    Comments: Accepted at 3DV 2020

  5. arXiv:2004.09776 [pdf, other]

    cs.CV

    Decoupling Video and Human Motion: Towards Practical Event Detection in Athlete Recordings

    Authors: Moritz Einfalt, Rainer Lienhart

    Abstract: In this paper we address the problem of motion event detection in athlete recordings from individual sports. In contrast to recent end-to-end approaches, we propose to use 2D human pose sequences as an intermediate representation that decouples human motion from the raw video information. Combined with domain-adapted athlete tracking, we describe two approaches to event detection on pose sequences…

    Submitted 22 April, 2020; v1 submitted 21 April, 2020; originally announced April 2020.

    Comments: Accepted at 2020 IEEE/CVF Conference on Computer Vision and Pattern Recognition Workshop (CVPRW)

  6. Mining Automatically Estimated Poses from Video Recordings of Top Athletes

    Authors: Rainer Lienhart, Moritz Einfalt, Dan Zecha

    Abstract: Human pose detection systems based on state-of-the-art DNNs are on the go to be extended, adapted and re-trained to fit the application domain of specific sports. Therefore, plenty of noisy pose data will soon be available from videos recorded on a regular and frequent basis. This work is among the first to develop mining algorithms that can mine the expected abundance of noisy and annotation-free…

    Submitted 27 April, 2018; v1 submitted 24 April, 2018; originally announced April 2018.

    Comments: Under review for the International Journal of Computer Science in Sport

  7. Activity-conditioned continuous human pose estimation for performance analysis of athletes using the example of swimming

    Authors: Moritz Einfalt, Dan Zecha, Rainer Lienhart

    Abstract: In this paper we consider the problem of human pose estimation in real-world videos of swimmers. Swimming channels allow filming swimmers simultaneously above and below the water surface with a single stationary camera. These recordings can be used to quantitatively assess the athletes' performance. The quantitative evaluation, so far, requires manual annotations of body parts in each video frame.…

    Submitted 2 February, 2018; originally announced February 2018.

    Comments: 10 pages, 9 figures, accepted at WACV 2018