Showing 1–50 of 106 results for author: Gall, J

  1. arXiv:2405.08909 [pdf, other]

    cs.CV

    ADA-Track: End-to-End Multi-Camera 3D Multi-Object Tracking with Alternating Detection and Association

    Authors: Shuxiao Ding, Lukas Schneider, Marius Cordts, Juergen Gall

    Abstract: Many query-based approaches for 3D Multi-Object Tracking (MOT) adopt the tracking-by-attention paradigm, utilizing track queries for identity-consistent detection and object queries for identity-agnostic track spawning. Tracking-by-attention, however, entangles detection and tracking queries in one embedding for both the detection and tracking task, which is sub-optimal. Other approaches resemble…

    Submitted 14 May, 2024; originally announced May 2024.

    Comments: 14 pages, 3 figures, accepted by CVPR 2024

  2. arXiv:2402.18319 [pdf, other]

    cs.RO cs.CV

    A Multimodal Handover Failure Detection Dataset and Baselines

    Authors: Santosh Thoduka, Nico Hochgeschwender, Juergen Gall, Paul G. Plöger

    Abstract: An object handover between a robot and a human is a coordinated action which is prone to failure for reasons such as miscommunication, incorrect actions and unexpected object properties. Existing works on handover failure detection and prevention focus on preventing failures due to object slip or external disturbances. However, there is a lack of datasets and evaluation methods that consider unpre…

    Submitted 28 February, 2024; originally announced February 2024.

    Comments: Accepted at ICRA 2024

  3. arXiv:2312.15289 [pdf, other]

    cs.CV cs.LG eess.IV

    Fréchet Wavelet Distance: A Domain-Agnostic Metric for Image Generation

    Authors: Lokesh Veeramacheneni, Moritz Wolter, Hildegard Kuehne, Juergen Gall

    Abstract: Modern metrics for generative learning like Fréchet Inception Distance (FID) demonstrate impressive performance. However, they suffer from various shortcomings, like a bias towards specific generators and datasets. To address this problem, we propose the Fréchet Wavelet Distance (FWD) as a domain-agnostic metric based on Wavelet Packet Transform ($W_p$). FWD provides a sight across a broad spectru…

    Submitted 10 June, 2024; v1 submitted 23 December, 2023; originally announced December 2023.

  4. arXiv:2312.08892 [pdf, other]

    cs.CV

    VaLID: Variable-Length Input Diffusion for Novel View Synthesis

    Authors: Shijie Li, Farhad G. Zanjani, Haitam Ben Yahia, Yuki M. Asano, Juergen Gall, Amirhossein Habibian

    Abstract: Novel View Synthesis (NVS), which tries to produce a realistic image at the target view given source view images and their corresponding poses, is a fundamental problem in 3D Vision. As this task is heavily under-constrained, some recent work, like Zero123, tries to solve this problem with generative modeling, specifically using pre-trained diffusion models. Although this strategy generalizes well…

    Submitted 14 December, 2023; originally announced December 2023.

    Comments: paper and supplementary material

  5. arXiv:2311.15991 [pdf, other]

    cs.CV

    DiffAnt: Diffusion Models for Action Anticipation

    Authors: Zeyun Zhong, Chengzhi Wu, Manuel Martin, Michael Voit, Juergen Gall, Jürgen Beyerer

    Abstract: Anticipating future actions is inherently uncertain. Given an observed video segment containing ongoing actions, multiple subsequent actions can plausibly follow. This uncertainty becomes even larger when predicting far into the future. However, the majority of existing action anticipation models adhere to a deterministic approach, neglecting to account for future uncertainties. In this work, we r…

    Submitted 27 November, 2023; originally announced November 2023.

  6. arXiv:2309.17257 [pdf, other]

    cs.CV

    A Survey on Deep Learning Techniques for Action Anticipation

    Authors: Zeyun Zhong, Manuel Martin, Michael Voit, Juergen Gall, Jürgen Beyerer

    Abstract: The ability to anticipate possible future human actions is essential for a wide range of applications, including autonomous driving and human-robot interaction. Consequently, numerous methods have been introduced for action anticipation in recent years, with deep learning-based approaches being particularly popular. In this work, we review the recent advances of action anticipation algorithms with…

    Submitted 29 September, 2023; originally announced September 2023.

    Comments: Submitted to TPAMI

  7. arXiv:2309.07849 [pdf, other]

    cs.CV

    TFNet: Exploiting Temporal Cues for Fast and Accurate LiDAR Semantic Segmentation

    Authors: Rong Li, ShiJie Li, Xieyuanli Chen, Teli Ma, Juergen Gall, Junwei Liang

    Abstract: LiDAR semantic segmentation plays a crucial role in enabling autonomous driving and robots to understand their surroundings accurately and robustly. A multitude of methods exist within this domain, including point-based, range-image-based, polar-coordinate-based, and hybrid strategies. Among these, range-image-based techniques have gained widespread adoption in practical applications due to their…

    Submitted 14 April, 2024; v1 submitted 14 September, 2023; originally announced September 2023.

    Comments: accepted by CVPR2024 Workshop on Autonomous Driving

  8. arXiv:2308.11358 [pdf, other]

    cs.CV cs.AI cs.LG

    How Much Temporal Long-Term Context is Needed for Action Segmentation?

    Authors: Emad Bahrami, Gianpiero Francesca, Juergen Gall

    Abstract: Modeling long-term context in videos is crucial for many fine-grained tasks including temporal action segmentation. An interesting question that is still open is how much long-term temporal context is needed for optimal performance. While transformers can model the long-term context of a video, this becomes computationally prohibitive for long videos. Recent works on temporal action segmentation t…

    Submitted 25 September, 2023; v1 submitted 22 August, 2023; originally announced August 2023.

    Comments: ICCV 2023

  9. arXiv:2308.11356 [pdf, other]

    cs.CV cs.AI

    Semantic RGB-D Image Synthesis

    Authors: Shijie Li, Rong Li, Juergen Gall

    Abstract: Collecting diverse sets of training images for RGB-D semantic image segmentation is not always possible. In particular, when robots need to operate in privacy-sensitive areas like homes, the collection is often limited to a small set of locations. As a consequence, the annotated images lack diversity in appearance and approaches for RGB-D semantic image segmentation tend to overfit the training da…

    Submitted 18 September, 2023; v1 submitted 22 August, 2023; originally announced August 2023.

    Comments: ICCV Workshop on Representation Learning with Very Limited Images 2023

  10. arXiv:2308.09717 [pdf, other]

    cs.CV

    Smoothness Similarity Regularization for Few-Shot GAN Adaptation

    Authors: Vadim Sushko, Ruyu Wang, Juergen Gall

    Abstract: The task of few-shot GAN adaptation aims to adapt a pre-trained GAN model to a small dataset with very few training images. While existing methods perform well when the dataset for pre-training is structurally similar to the target dataset, the approaches suffer from training instabilities or memorization issues when the objects in the two domains have a very different structure. To mitigate this…

    Submitted 18 August, 2023; originally announced August 2023.

    Comments: International Conference on Computer Vision (ICCV) 2023

  11. arXiv:2308.06635 [pdf, other]

    cs.CV

    3DMOTFormer: Graph Transformer for Online 3D Multi-Object Tracking

    Authors: Shuxiao Ding, Eike Rehder, Lukas Schneider, Marius Cordts, Juergen Gall

    Abstract: Tracking 3D objects accurately and consistently is crucial for autonomous vehicles, enabling more reliable downstream tasks such as trajectory prediction and motion planning. Based on the substantial progress in object detection in recent years, the tracking-by-detection paradigm has become a popular choice due to its simplicity and efficiency. State-of-the-art 3D multi-object tracking (MOT) appro…

    Submitted 12 August, 2023; originally announced August 2023.

    Comments: 17 pages, 8 figures, accepted by ICCV2023

  12. arXiv:2306.15045 [pdf, other]

    cs.CV

    Action Anticipation with Goal Consistency

    Authors: Olga Zatsarynna, Juergen Gall

    Abstract: In this paper, we address the problem of short-term action anticipation, i.e., we want to predict an upcoming action one second before it happens. We propose to harness high-level intent information to anticipate actions that will take place in the future. To this end, we incorporate an additional goal prediction branch into our model and propose a consistency loss function that encourages the ant…

    Submitted 26 June, 2023; originally announced June 2023.

    Comments: Accepted to ICIP 2023

  13. arXiv:2306.10761 [pdf, other]

    cs.CV cs.RO

    PowerBEV: A Powerful Yet Lightweight Framework for Instance Prediction in Bird's-Eye View

    Authors: Peizheng Li, Shuxiao Ding, Xieyuanli Chen, Niklas Hanselmann, Marius Cordts, Juergen Gall

    Abstract: Accurately perceiving instances and predicting their future motion are key tasks for autonomous vehicles, enabling them to navigate safely in complex urban traffic. While bird's-eye view (BEV) representations are commonplace in perception for autonomous driving, their potential in a motion prediction setting is less explored. Existing approaches for BEV instance prediction from surround cameras re…

    Submitted 19 June, 2023; originally announced June 2023.

    Comments: 12 pages, 8 figures. This paper is accepted by IJCAI2023. Peizheng Li and Shuxiao Ding contributed equally to this work

  14. arXiv:2306.05807 [pdf, other]

    cs.CV

    A Gated Attention Transformer for Multi-Person Pose Tracking

    Authors: Andreas Doering, Juergen Gall

    Abstract: Multi-person pose tracking is an important element for many applications and requires to estimate the human poses of all persons in a video and to track them over time. The association of poses across frames remains an open research problem, in particular for online tracking methods, due to motion blur, crowded scenes and occlusions. To tackle the association challenge, we propose a Gated Attentio…

    Submitted 21 August, 2023; v1 submitted 9 June, 2023; originally announced June 2023.

    Comments: Accepted to ICCVW23

  15. Location-aware Adaptive Normalization: A Deep Learning Approach For Wildfire Danger Forecasting

    Authors: Mohamad Hakam Shams Eddin, Ribana Roscher, Juergen Gall

    Abstract: Climate change is expected to intensify and increase extreme events in the weather cycle. Since this has a significant impact on various sectors of our life, recent works are concerned with identifying and predicting such extreme events from Earth observations. With respect to wildfire danger forecasting, previous deep learning approaches duplicate static variables along the time dimension and neg…

    Submitted 7 April, 2023; v1 submitted 15 December, 2022; originally announced December 2022.

    Journal ref: in IEEE Transactions on Geoscience and Remote Sensing, vol. 61, pp. 1-18, 2023, Art no. 4703018

  16. arXiv:2210.06501 [pdf, other]

    cs.CV

    Robust Action Segmentation from Timestamp Supervision

    Authors: Yaser Souri, Yazan Abu Farha, Emad Bahrami, Gianpiero Francesca, Juergen Gall

    Abstract: Action segmentation is the task of predicting an action label for each frame of an untrimmed video. As obtaining annotations to train an approach for action segmentation in a fully supervised way is expensive, various approaches have been proposed to train action segmentation models using different forms of weak supervision, e.g., action transcripts, action sets, or more recently timestamps. Times…

    Submitted 12 October, 2022; originally announced October 2022.

    Comments: BMVC 2022

  17. arXiv:2210.04085 [pdf, other]

    cs.CV

    Dual Pyramid Generative Adversarial Networks for Semantic Image Synthesis

    Authors: Shijie Li, Ming-Ming Cheng, Juergen Gall

    Abstract: The goal of semantic image synthesis is to generate photo-realistic images from semantic label maps. It is highly relevant for tasks like content generation and image editing. Current state-of-the-art approaches, however, still struggle to generate realistic objects in images at various scales. In particular, small objects tend to fade away and large objects are often generated as collages of patc…

    Submitted 8 October, 2022; originally announced October 2022.

    Comments: BMVC2022

  18. arXiv:2209.12074 [pdf, other]

    cs.CV

    Self-supervised Learning for Unintentional Action Prediction

    Authors: Olga Zatsarynna, Yazan Abu Farha, Juergen Gall

    Abstract: Distinguishing if an action is performed as intended or if an intended action fails is an important skill that not only humans have, but that is also important for intelligent systems that operate in human environments. Recognizing if an action is unintentional or anticipating if an action will fail, however, is not straightforward due to lack of annotated data. While videos of unintentional or fa…

    Submitted 24 September, 2022; originally announced September 2022.

    Comments: Accepted to GCPR 2022

  19. arXiv:2209.07547 [pdf, other]

    cs.CV cs.LG

    One-Shot Synthesis of Images and Segmentation Masks

    Authors: Vadim Sushko, Dan Zhang, Juergen Gall, Anna Khoreva

    Abstract: Joint synthesis of images and segmentation masks with generative adversarial networks (GANs) is promising to reduce the effort needed for collecting image data with pixel-wise annotations. However, to learn high-fidelity image-mask synthesis, existing GAN approaches first need a pre-training phase requiring large amounts of image data, which limits their utilization in restricted image domains. In…

    Submitted 15 September, 2022; originally announced September 2022.

    Comments: Accepted as a conference paper at IEEE Winter Conference on Applications of Computer Vision (WACV) 2023

  20. arXiv:2209.00638 [pdf, other]

    cs.CV

    Unified Fully and Timestamp Supervised Temporal Action Segmentation via Sequence to Sequence Translation

    Authors: Nadine Behrmann, S. Alireza Golestaneh, Zico Kolter, Juergen Gall, Mehdi Noroozi

    Abstract: This paper introduces a unified framework for video action segmentation via sequence to sequence (seq2seq) translation in a fully and timestamp supervised setup. In contrast to current state-of-the-art frame-level prediction methods, we view action segmentation as a seq2seq translation task, i.e., mapping a sequence of video frames to a sequence of action segments. Our proposed method involves a s…

    Submitted 11 October, 2022; v1 submitted 1 September, 2022; originally announced September 2022.

    Comments: ECCV 2022 (Main Conference)

  21. arXiv:2206.08929 [pdf, other]

    cs.CV cs.AI

    TAVA: Template-free Animatable Volumetric Actors

    Authors: Ruilong Li, Julian Tanke, Minh Vo, Michael Zollhofer, Jurgen Gall, Angjoo Kanazawa, Christoph Lassner

    Abstract: Coordinate-based volumetric representations have the potential to generate photo-realistic virtual avatars from images. However, virtual avatars also need to be controllable even to a novel pose that may not have been observed. Traditional techniques, such as LBS, provide such a function; yet it usually requires a hand-designed body template, 3D scan data, and limited appearance models. On the oth…

    Submitted 20 June, 2022; v1 submitted 17 June, 2022; originally announced June 2022.

    Comments: Code: https://github.com/facebookresearch/tava; Project Website: https://www.liruilong.cn/projects/tava/

  22. arXiv:2206.06741 [pdf, other]

    cs.CV

    Recurrent Transformer Variational Autoencoders for Multi-Action Motion Synthesis

    Authors: Rania Briq, Chuhang Zou, Leonid Pishchulin, Chris Broaddus, Juergen Gall

    Abstract: We consider the problem of synthesizing multi-action human motion sequences of arbitrary lengths. Existing approaches have mastered motion sequence generation in single action scenarios, but fail to generalize to multi-action and arbitrary-length sequences. We fill this gap by proposing a novel efficient approach that leverages expressiveness of Recurrent Transformers and generative richness of co…

    Submitted 27 June, 2022; v1 submitted 14 June, 2022; originally announced June 2022.

    Comments: accepted at Transformers for Vision workshop at CVPR 2022

  23. arXiv:2201.11736 [pdf, other]

    cs.CV

    Ranking Info Noise Contrastive Estimation: Boosting Contrastive Learning via Ranked Positives

    Authors: David T. Hoffmann, Nadine Behrmann, Juergen Gall, Thomas Brox, Mehdi Noroozi

    Abstract: This paper introduces Ranking Info Noise Contrastive Estimation (RINCE), a new member in the family of InfoNCE losses that preserves a ranked ordering of positive samples. In contrast to the standard InfoNCE loss, which requires a strict binary separation of the training pairs into similar and dissimilar samples, RINCE can exploit information about a similarity ranking for learning a corresponding…

    Submitted 27 January, 2022; originally announced January 2022.

    Comments: AAAI 2022 (Main Track)

  24. arXiv:2111.15667 [pdf, other]

    cs.CV

    Adaptive Token Sampling For Efficient Vision Transformers

    Authors: Mohsen Fayyaz, Soroush Abbasi Koohpayegani, Farnoush Rezaei Jafari, Sunando Sengupta, Hamid Reza Vaezi Joze, Eric Sommerlade, Hamed Pirsiavash, Juergen Gall

    Abstract: While state-of-the-art vision transformer models achieve promising results in image classification, they are computationally expensive and require many GFLOPs. Although the GFLOPs of a vision transformer can be decreased by reducing the number of tokens in the network, there is no setting that is optimal for all input images. In this work, we therefore introduce a differentiable parameter-free Ada…

    Submitted 26 July, 2022; v1 submitted 30 November, 2021; originally announced November 2021.

    Comments: ECCV 2022

  25. arXiv:2111.08279 [pdf, other]

    cs.CV

    Keypoint Message Passing for Video-based Person Re-Identification

    Authors: Di Chen, Andreas Doering, Shanshan Zhang, Jian Yang, Juergen Gall, Bernt Schiele

    Abstract: Video-based person re-identification (re-ID) is an important technique in visual surveillance systems which aims to match video snippets of people captured by different cameras. Existing methods are mostly based on convolutional neural networks (CNNs), whose building blocks either process local neighbor pixels at a time, or, when 3D convolutions are used to model temporal information, suffer from…

    Submitted 13 December, 2021; v1 submitted 16 November, 2021; originally announced November 2021.

    Comments: To appear in AAAI 2022

  26. arXiv:2110.14392 [pdf, other]

    cs.CV

    TaylorSwiftNet: Taylor Driven Temporal Modeling for Swift Future Frame Prediction

    Authors: Saber Pourheydari, Emad Bahrami, Mohsen Fayyaz, Gianpiero Francesca, Mehdi Noroozi, Juergen Gall

    Abstract: While recurrent neural networks (RNNs) demonstrate outstanding capabilities for future video frame prediction, they model dynamics in a discrete time space, i.e., they predict the frames sequentially with a fixed temporal step. RNNs are therefore prone to accumulate the error as the number of future frames increases. In contrast, partial differential equations (PDEs) model physical phenomena like…

    Submitted 12 October, 2022; v1 submitted 27 October, 2021; originally announced October 2021.

    Comments: BMVC 2022

  27. arXiv:2109.11593 [pdf, other]

    cs.CV

    Long Short View Feature Decomposition via Contrastive Video Representation Learning

    Authors: Nadine Behrmann, Mohsen Fayyaz, Juergen Gall, Mehdi Noroozi

    Abstract: Self-supervised video representation methods typically focus on the representation of temporal attributes in videos. However, the role of stationary versus non-stationary attributes is less explored: Stationary features, which remain similar throughout the video, enable the prediction of video-level action classes. Non-stationary features, which represent temporally varying attributes, are more be…

    Submitted 23 September, 2021; originally announced September 2021.

    Comments: ICCV 2021 (Main Conference)

  28. arXiv:2108.03894 [pdf, other]

    cs.CV cs.LG

    FIFA: Fast Inference Approximation for Action Segmentation

    Authors: Yaser Souri, Yazan Abu Farha, Fabien Despinoy, Gianpiero Francesca, Juergen Gall

    Abstract: We introduce FIFA, a fast approximate inference method for action segmentation and alignment. Unlike previous approaches, FIFA does not rely on expensive dynamic programming for inference. Instead, it uses an approximate differentiable energy function that can be minimized using gradient-descent. FIFA is a general approach that can replace exact inference improving its speed by more than 5 times w…

    Submitted 9 August, 2021; originally announced August 2021.

  29. Using Visual Anomaly Detection for Task Execution Monitoring

    Authors: Santosh Thoduka, Juergen Gall, Paul G. Plöger

    Abstract: Execution monitoring is essential for robots to detect and respond to failures. Since it is impossible to enumerate all failures for a given task, we learn from successful executions of the task to detect visual anomalies during runtime. Our method learns to predict the motions that occur during the nominal execution of a task, including camera and robot body motion. A probabilistic U-Net architec…

    Submitted 29 July, 2021; originally announced July 2021.

    Comments: Accepted for publication at the 2021 IEEE/RSJ International Conference on Intelligent Robots and Systems (IROS)

  30. arXiv:2107.09504 [pdf, other]

    cs.CV

    Multi-Modal Temporal Convolutional Network for Anticipating Actions in Egocentric Videos

    Authors: Olga Zatsarynna, Yazan Abu Farha, Juergen Gall

    Abstract: Anticipating human actions is an important task that needs to be addressed for the development of reliable intelligent agents, such as self-driving cars or robot assistants. While the ability to make future predictions with high accuracy is crucial for designing the anticipation approaches, the speed at which the inference is performed is not less important. Methods that are accurate but not suffi…

    Submitted 18 July, 2021; originally announced July 2021.

    Comments: CVPR Precognition Workshop

  31. arXiv:2107.01869 [pdf, other]

    cs.CV

    Towards Better Adversarial Synthesis of Human Images from Text

    Authors: Rania Briq, Pratika Kochar, Juergen Gall

    Abstract: This paper proposes an approach that generates multiple 3D human meshes from text. The human shapes are represented by 3D meshes based on the SMPL model. The model's performance is evaluated on the COCO dataset, which contains challenging human shapes and intricate interactions between individuals. The model is able to capture the dynamics of the scene and the interactions between individuals base…

    Submitted 5 July, 2021; originally announced July 2021.

  32. Moving Object Segmentation in 3D LiDAR Data: A Learning-based Approach Exploiting Sequential Data

    Authors: Xieyuanli Chen, Shijie Li, Benedikt Mersch, Louis Wiesmann, Jürgen Gall, Jens Behley, Cyrill Stachniss

    Abstract: The ability to detect and segment moving objects in a scene is essential for building consistent maps, making future state predictions, avoiding collisions, and planning. In this paper, we address the problem of moving object segmentation from 3D LiDAR scans. We propose a novel approach that pushes the current state of the art in LiDAR-only moving object segmentation forward to provide relevant in…

    Submitted 13 July, 2021; v1 submitted 19 May, 2021; originally announced May 2021.

    Comments: Accepted by RA-L with IROS 2021

  33. arXiv:2105.05847 [pdf, other]

    cs.CV cs.LG

    Learning to Generate Novel Scene Compositions from Single Images and Videos

    Authors: Vadim Sushko, Juergen Gall, Anna Khoreva

    Abstract: Training GANs in low-data regimes remains a challenge, as overfitting often leads to memorization or training divergence. In this work, we introduce One-Shot GAN that can learn to generate samples from a training set as little as one image or one video. We propose a two-branch discriminator, with content and layout branches designed to judge the internal content separately from the scene layout re…

    Submitted 12 May, 2021; originally announced May 2021.

    Comments: The AI for Content Creation (AICC) workshop at CVPR 2021. The full 8-page version of this submission is available at arXiv:2103.13389

  34. arXiv:2103.13389 [pdf, other]

    cs.CV cs.LG

    Generating Novel Scene Compositions from Single Images and Videos

    Authors: Vadim Sushko, Dan Zhang, Juergen Gall, Anna Khoreva

    Abstract: Given a large dataset for training, generative adversarial networks (GANs) can achieve remarkable performance for the image synthesis task. However, training GANs in extremely low data regimes remains a challenge, as overfitting often occurs, leading to memorization or training divergence. In this work, we introduce SIV-GAN, an unconditional generative model that can generate new scene composition…

    Submitted 13 December, 2023; v1 submitted 24 March, 2021; originally announced March 2021.

    Comments: Accepted for publication in Computer Vision and Image Understanding: https://www.sciencedirect.com/science/article/pii/S1077314223002680. Code repository: https://github.com/boschresearch/one-shot-synthesis

  35. arXiv:2103.06669 [pdf, other]

    cs.CV

    Temporal Action Segmentation from Timestamp Supervision

    Authors: Zhe Li, Yazan Abu Farha, Juergen Gall

    Abstract: Temporal action segmentation approaches have been very successful recently. However, annotating videos with frame-wise labels to train such models is very expensive and time consuming. While weakly supervised methods trained using only ordered action lists require less annotation effort, the performance is still worse than fully supervised approaches. In this paper, we propose to use timestamp sup…

    Submitted 26 March, 2021; v1 submitted 11 March, 2021; originally announced March 2021.

    Comments: CVPR 2021

  36. arXiv:2101.09745 [pdf, other]

    cs.CV

    Iterative Greedy Matching for 3D Human Pose Tracking from Multiple Views

    Authors: Julian Tanke, Juergen Gall

    Abstract: In this work we propose an approach for estimating 3D human poses of multiple people from a set of calibrated cameras. Estimating 3D human poses from multiple views has several compelling properties: human poses are estimated within a global coordinate space and multiple cameras provide an extended field of view which helps in resolving ambiguities, occlusions and motion blur. Our approach builds…

    Submitted 24 January, 2021; originally announced January 2021.

    Comments: German Conference on Pattern Recognition 2019

    Journal ref: GCPR 2019, pages 537--550

  37. arXiv:2101.08581 [pdf, other]

    cs.CV

    Hierarchical Graph-RNNs for Action Detection of Multiple Activities

    Authors: Sovan Biswas, Yaser Souri, Juergen Gall

    Abstract: In this paper, we propose an approach that spatially localizes the activities in a video frame where each person can perform multiple activities at the same time. Our approach takes the temporal scene context as well as the relations of the actions of detected persons into account. While the temporal context is modeled by a temporal recurrent neural network (RNN), the relations of the actions are…

    Submitted 21 January, 2021; originally announced January 2021.

    Comments: Accepted at ICIP 2019

  38. arXiv:2101.08567 [pdf, other]

    cs.CV

    Discovering Multi-Label Actor-Action Association in a Weakly Supervised Setting

    Authors: Sovan Biswas, Juergen Gall

    Abstract: Since collecting and annotating data for spatio-temporal action detection is very expensive, there is a need to learn approaches with less supervision. Weakly supervised approaches do not require any bounding box annotations and can be trained only from labels that indicate whether an action occurs in a video clip. Current approaches, however, cannot handle the case when there are multiple persons…

    Submitted 21 January, 2021; originally announced January 2021.

    Comments: Accepted in ACCV 2020

  39. arXiv:2012.04781 [pdf, other]

    cs.CV cs.LG eess.IV

    You Only Need Adversarial Supervision for Semantic Image Synthesis

    Authors: Vadim Sushko, Edgar Schönfeld, Dan Zhang, Juergen Gall, Bernt Schiele, Anna Khoreva

    Abstract: Despite their recent successes, GAN models for semantic image synthesis still suffer from poor image quality when trained with only adversarial supervision. Historically, additionally employing the VGG-based perceptual loss has helped to overcome this issue, significantly improving the synthesis quality, but at the same time limiting the progress of GAN models for semantic image synthesis. In this…

    Submitted 19 March, 2021; v1 submitted 8 December, 2020; originally announced December 2020.

    Comments: Published at ICLR 2021 (Main Conference). Code repository: https://github.com/boschresearch/OASIS

  40. arXiv:2011.08652 [pdf, other]

    cs.CV

    3D CNNs with Adaptive Temporal Feature Resolutions

    Authors: Mohsen Fayyaz, Emad Bahrami, Ali Diba, Mehdi Noroozi, Ehsan Adeli, Luc Van Gool, Juergen Gall

    Abstract: While state-of-the-art 3D Convolutional Neural Networks (CNN) achieve very good results on action recognition datasets, they are computationally very expensive and require many GFLOPs. While the GFLOPs of a 3D CNN can be decreased by reducing the temporal feature resolution within the network, there is no setting that is optimal for all input clips. In this work, we therefore introduce a different…

    Submitted 11 August, 2021; v1 submitted 17 November, 2020; originally announced November 2020.

    Comments: CVPR 2021

  41. arXiv:2011.06243 [pdf, ps, other]

    cs.CV

    PoseTrackReID: Dataset Description

    Authors: Andreas Doering, Di Chen, Shanshan Zhang, Bernt Schiele, Juergen Gall

    Abstract: Current datasets for video-based person re-identification (re-ID) do not include structural knowledge in form of human pose annotations for the persons of interest. Nonetheless, pose information is very helpful to disentangle useful feature information from background or occlusion noise. Especially real-world scenarios, such as surveillance, contain a lot of occlusions in human crowds or by obstac…

    Submitted 12 November, 2020; originally announced November 2020.

  42. arXiv:2011.06037 [pdf, other]

    cs.CV cs.LG

    Unsupervised Video Representation Learning by Bidirectional Feature Prediction

    Authors: Nadine Behrmann, Juergen Gall, Mehdi Noroozi

    Abstract: This paper introduces a novel method for self-supervised video representation learning via feature prediction. In contrast to the previous methods that focus on future feature prediction, we argue that a supervisory signal arising from unobserved past frames is complementary to one that originates from the future frames. The rationale behind our method is to encourage the network to explore the te…

    Submitted 11 November, 2020; originally announced November 2020.

    Comments: Accepted at WACV 2021

  43. arXiv:2010.07367 [pdf, other]

    cs.CV

    Pose Refinement Graph Convolutional Network for Skeleton-based Action Recognition

    Authors: Shijie Li, Jinhui Yi, Yazan Abu Farha, Juergen Gall

    Abstract: With the advances in capturing 2D or 3D skeleton data, skeleton-based action recognition has received an increasing interest over the last years. As skeleton data is commonly represented by graphs, graph convolutional networks have been proposed for this task. While current graph convolutional networks accurately recognize actions, they are too expensive for robotics applications where limited com…

    Submitted 18 January, 2021; v1 submitted 14 October, 2020; originally announced October 2020.

    Comments: Accepted for publication in IEEE Robotics and Automation Letters (RA-L)

  44. arXiv:2009.01142  [pdf, other]

    cs.CV

    Long-Term Anticipation of Activities with Cycle Consistency

    Authors: Yazan Abu Farha, Qiuhong Ke, Bernt Schiele, Juergen Gall

    Abstract: With the success of deep learning methods in analyzing activities in videos, more attention has recently been focused towards anticipating future activities. However, most of the work on anticipation either analyzes a partially observed activity or predicts the next action class. Recently, new approaches have been proposed to extend the prediction horizon up to several minutes in the future and th…

    Submitted 2 September, 2020; originally announced September 2020.

    Comments: GCPR 2020

  45. arXiv:2008.09162  [pdf, other]

    cs.CV

    Multi-scale Interaction for Real-time LiDAR Data Segmentation on an Embedded Platform

    Authors: Shijie Li, Xieyuanli Chen, Yun Liu, Dengxin Dai, Cyrill Stachniss, Juergen Gall

    Abstract: Real-time semantic segmentation of LiDAR data is crucial for autonomously driving vehicles, which are usually equipped with an embedded platform and have limited computational resources. Approaches that operate directly on the point cloud use complex spatial aggregation operations, which are very expensive and difficult to optimize for embedded platforms. They are therefore not suitable for real-t…

    Submitted 28 November, 2021; v1 submitted 20 August, 2020; originally announced August 2020.

    Journal ref: IEEE Robotics and Automation Letters (RA-L) 2021

  46. arXiv:2008.05023  [pdf, other]

    cs.CV

    Audio- and Gaze-driven Facial Animation of Codec Avatars

    Authors: Alexander Richard, Colin Lea, Shugao Ma, Juergen Gall, Fernando de la Torre, Yaser Sheikh

    Abstract: Codec Avatars are a recent class of learned, photorealistic face models that accurately represent the geometry and texture of a person in 3D (i.e., for virtual reality), and are almost indistinguishable from video. In this paper we describe the first approach to animate these parametric models in real-time which could be deployed on commodity virtual reality hardware using audio and/or eye trackin…

    Submitted 11 August, 2020; originally announced August 2020.

  47. arXiv:2008.03928  [pdf, other]

    cs.CV

    Rethinking 3D LiDAR Point Cloud Segmentation

    Authors: Shijie Li, Yun Liu, Juergen Gall

    Abstract: Many point-based semantic segmentation methods have been designed for indoor scenarios, but they struggle if they are applied to point clouds that are captured by a LiDAR sensor in an outdoor environment. In order to make these methods more efficient and robust such that they can handle LiDAR data, we introduce the general concept of reformulating 3D point-based operations such that they can opera…

    Submitted 2 December, 2021; v1 submitted 10 August, 2020; originally announced August 2020.

    Comments: TNNLS 2021

  48. arXiv:2006.09220  [pdf, other]

    cs.CV

    MS-TCN++: Multi-Stage Temporal Convolutional Network for Action Segmentation

    Authors: Shijie Li, Yazan Abu Farha, Yun Liu, Ming-Ming Cheng, Juergen Gall

    Abstract: With the success of deep learning in classifying short trimmed videos, more attention has been focused on temporally segmenting and classifying activities in long untrimmed videos. State-of-the-art approaches for action segmentation utilize several layers of temporal convolution and temporal pooling. Despite the capabilities of these approaches in capturing temporal dependencies, their predictions…

    Submitted 2 September, 2020; v1 submitted 16 June, 2020; originally announced June 2020.

    Comments: IEEE Transactions on Pattern Analysis and Machine Intelligence. arXiv admin note: substantial text overlap with arXiv:1903.01945

  49. arXiv:2005.09743  [pdf, ps, other]

    cs.CV

    On Evaluating Weakly Supervised Action Segmentation Methods

    Authors: Yaser Souri, Alexander Richard, Luca Minciullo, Juergen Gall

    Abstract: Action segmentation is the task of temporally segmenting every frame of an untrimmed video. Weakly supervised approaches to action segmentation, especially from transcripts have been of considerable interest to the computer vision community. In this work, we focus on two aspects of the use and evaluation of weakly supervised action segmentation approaches that are often overlooked: the performance…

    Submitted 21 October, 2021; v1 submitted 19 May, 2020; originally announced May 2020.

    Comments: Technical Report

  50. arXiv:2005.00340  [pdf, other]

    cs.CV

    Adversarial Synthesis of Human Pose from Text

    Authors: Yifei Zhang, Rania Briq, Julian Tanke, Juergen Gall

    Abstract: This work focuses on synthesizing human poses from human-level text descriptions. We propose a model that is based on a conditional generative adversarial network. It is designed to generate 2D human poses conditioned on human-written text descriptions. The model is trained and evaluated using the COCO dataset, which consists of images capturing complex everyday scenes with various human poses. We…

    Submitted 16 October, 2020; v1 submitted 1 May, 2020; originally announced May 2020.

    Comments: DAGM GCPR 2020