
Showing 1–18 of 18 results for author: Poppe, R

Searching in archive cs.
  1. arXiv:2406.19006  [pdf, other]

    cs.CV

    VideoMambaPro: A Leap Forward for Mamba in Video Understanding

    Authors: Hui Lu, Albert Ali Salah, Ronald Poppe

    Abstract: Video understanding requires the extraction of rich spatio-temporal representations, which transformer models achieve through self-attention. Unfortunately, self-attention poses a computational burden. In NLP, Mamba has surfaced as an efficient alternative for transformers. However, Mamba's successes do not trivially extend to computer vision tasks, including those in video analysis. In this paper…

    Submitted 27 June, 2024; originally announced June 2024.

  2. arXiv:2403.16128  [pdf, other]

    cs.CV

    Enhancing Video Transformers for Action Understanding with VLM-aided Training

    Authors: Hui Lu, Hu Jian, Ronald Poppe, Albert Ali Salah

    Abstract: Owing to their ability to extract relevant spatio-temporal video embeddings, Vision Transformers (ViTs) are currently the best performing models in video action understanding. However, their generalization over domains or datasets is somewhat limited. In contrast, Visual Language Models (VLMs) have demonstrated exceptional generalization performance, but are currently unable to process videos. Con…

    Submitted 24 March, 2024; originally announced March 2024.

  3. arXiv:2403.11818  [pdf, other]

    cs.CV

    TCNet: Continuous Sign Language Recognition from Trajectories and Correlated Regions

    Authors: Hui Lu, Albert Ali Salah, Ronald Poppe

    Abstract: A key challenge in continuous sign language recognition (CSLR) is to efficiently capture long-range spatial interactions over time from the video input. To address this challenge, we propose TCNet, a hybrid network that effectively models spatio-temporal information from Trajectories and Correlated regions. TCNet's trajectory module transforms frames into aligned trajectories composed of continuou…

    Submitted 18 March, 2024; originally announced March 2024.

  4. arXiv:2312.06285  [pdf, other]

    cs.CV cs.AI

    Compensation Sampling for Improved Convergence in Diffusion Models

    Authors: Hui Lu, Albert Ali Salah, Ronald Poppe

    Abstract: Diffusion models achieve remarkable quality in image generation, but at a cost. Iterative denoising requires many time steps to produce high fidelity images. We argue that the denoising process is crucially limited by an accumulation of the reconstruction error due to an initial inaccurate reconstruction of the target data. This leads to lower quality outputs, and slower convergence. To address th…

    Submitted 11 December, 2023; originally announced December 2023.

  5. arXiv:2111.00772  [pdf, other]

    cs.CV

    AdaPool: Exponential Adaptive Pooling for Information-Retaining Downsampling

    Authors: Alexandros Stergiou, Ronald Poppe

    Abstract: Pooling layers are essential building blocks of convolutional neural networks (CNNs), to reduce computational overhead and increase the receptive fields of succeeding convolutional operations. Their goal is to produce downsampled volumes that closely resemble the input volume while, ideally, also being computationally and memory efficient. Meeting both these requirements remains a challenge. To th…

    Submitted 2 December, 2022; v1 submitted 1 November, 2021; originally announced November 2021.

  6. arXiv:2105.05312  [pdf, other]

    cs.CV

    Incremental Few-Shot Instance Segmentation

    Authors: Dan Andrei Ganea, Bas Boom, Ronald Poppe

    Abstract: Few-shot instance segmentation methods are promising when labeled training data for novel classes is scarce. However, current approaches do not facilitate flexible addition of novel classes. They also require that examples of each class are provided at train and test time, which is memory intensive. In this paper, we address these limitations by presenting the first incremental approach to few-sho…

    Submitted 11 May, 2021; originally announced May 2021.

    Comments: Accepted to CVPR 2021

  7. arXiv:2101.00440  [pdf, other]

    cs.CV

    Refining activation downsampling with SoftPool

    Authors: Alexandros Stergiou, Ronald Poppe, Grigorios Kalliatakis

    Abstract: Convolutional Neural Networks (CNNs) use pooling to decrease the size of activation maps. This process is crucial to increase the receptive fields and to reduce computational requirements of subsequent convolutions. An important feature of the pooling operation is the minimization of information loss, with respect to the initial activation maps, without a significant impact on the computation and…

    Submitted 18 March, 2021; v1 submitted 2 January, 2021; originally announced January 2021.

  8. arXiv:2011.03949  [pdf, other]

    cs.CV

    Multi-Temporal Convolutions for Human Action Recognition in Videos

    Authors: Alexandros Stergiou, Ronald Poppe

    Abstract: Effective extraction of temporal patterns is crucial for the recognition of temporally varying actions in video. We argue that the fixed-sized spatio-temporal convolution kernels used in convolutional neural networks (CNNs) can be improved to extract informative motions that are executed at different time scales. To address this challenge, we present a novel spatio-temporal convolution block that…

    Submitted 31 March, 2021; v1 submitted 8 November, 2020; originally announced November 2020.

  9. Learn to cycle: Time-consistent feature discovery for action recognition

    Authors: Alexandros Stergiou, Ronald Poppe

    Abstract: Generalizing over temporal variations is a prerequisite for effective action recognition in videos. Despite significant advances in deep neural networks, it remains a challenge to focus on short-term discriminative motions in relation to the overall performance of an action. We address this challenge by allowing some flexibility in discovering relevant spatio-temporal features. We introduce Squeez…

    Submitted 23 June, 2020; v1 submitted 15 June, 2020; originally announced June 2020.

  10. Learning Class Regularized Features for Action Recognition

    Authors: Alexandros Stergiou, Ronald Poppe, Remco C. Veltkamp

    Abstract: Training Deep Convolutional Neural Networks (CNNs) is based on the notion of using multiple kernels and non-linearities in their subsequent activations to extract useful features. The kernels are used as general feature extractors without specific correspondence to the target class. As a result, the extracted features do not correspond to specific classes. Subtle differences between similar classe…

    Submitted 7 February, 2020; originally announced February 2020.

  11. Spatio-Temporal FAST 3D Convolutions for Human Action Recognition

    Authors: Alexandros Stergiou, Ronald Poppe

    Abstract: Effective processing of video input is essential for the recognition of temporally varying events such as human actions. Motivated by the often distinctive temporal characteristics of actions in either horizontal or vertical direction, we introduce a novel convolution block for CNN architectures with video input. Our proposed Fractioned Adjacent Spatial and Temporal (FAST) 3D convolutions are a na…

    Submitted 22 October, 2019; v1 submitted 30 September, 2019; originally announced September 2019.

  12. Class Feature Pyramids for Video Explanation

    Authors: Alexandros Stergiou, Georgios Kapidis, Grigorios Kalliatakis, Christos Chrysoulas, Ronald Poppe, Remco Veltkamp

    Abstract: Deep convolutional networks are widely used in video action recognition. 3D convolutions are one prominent approach to deal with the additional time dimension. While 3D convolutions typically lead to higher accuracies, the inner workings of the trained models are more difficult to interpret. We focus on creating human-understandable visual explanations that represent the hierarchical parts of spat…

    Submitted 18 September, 2019; originally announced September 2019.

  13. arXiv:1909.06761  [pdf, other]

    cs.CV cs.LG

    Multitask Learning to Improve Egocentric Action Recognition

    Authors: Georgios Kapidis, Ronald Poppe, Elsbeth van Dam, Lucas Noldus, Remco Veltkamp

    Abstract: In this work we employ multitask learning to capitalize on the structure that exists in related supervised tasks to train complex neural networks. It allows training a network for multiple objectives in parallel, in order to improve performance on at least one of them by capitalizing on a shared representation that is developed to accommodate more information than it otherwise would for a single t…

    Submitted 15 September, 2019; originally announced September 2019.

    Comments: 10 pages, 3 figures, accepted at the 5th Egocentric Perception, Interaction and Computing (EPIC) workshop at ICCV 2019, code repository: https://github.com/georkap/hand_track_classification

  14. arXiv:1906.08331  [pdf, other]

    cs.CV eess.IV

    Light Field Saliency Detection with Deep Convolutional Networks

    Authors: Jun Zhang, Yamei Liu, Sheng** Zhang, Ronald Poppe, Meng Wang

    Abstract: Light field imaging presents an attractive alternative to RGB imaging because of the recording of the direction of the incoming light. The detection of salient regions in a light field image benefits from the additional modeling of angular patterns. For RGB imaging, methods using CNNs have achieved excellent results on a range of tasks, including saliency detection. However, it is not trivial to u…

    Submitted 29 October, 2019; v1 submitted 19 June, 2019; originally announced June 2019.

    Comments: 14 pages, 14 figures

  15. arXiv:1905.00742  [pdf, other]

    cs.CV

    Egocentric Hand Track and Object-based Human Action Recognition

    Authors: Georgios Kapidis, Ronald Poppe, Elsbeth van Dam, Lucas P. J. J. Noldus, Remco C. Veltkamp

    Abstract: Egocentric vision is an emerging field of computer vision that is characterized by the acquisition of images and video from the first person perspective. In this paper we address the challenge of egocentric human action recognition by utilizing the presence and position of detected regions of interest in the scene explicitly, without further use of visual features. Initially, we recognize that h…

    Submitted 2 May, 2019; originally announced May 2019.

    Comments: Accepted for publication at UIC 2019: Track 3, 8 pages, 5 figures, Index terms: egocentric action recognition, hand detection, hand tracking, hand identification, sequence classification, code available at: https://github.com/georkap/hand_track_classification

  16. Saliency Tubes: Visual Explanations for Spatio-Temporal Convolutions

    Authors: Alexandros Stergiou, Georgios Kapidis, Grigorios Kalliatakis, Christos Chrysoulas, Remco Veltkamp, Ronald Poppe

    Abstract: Deep learning approaches have been established as the main methodology for video classification and recognition. Recently, 3-dimensional convolutions have been used to achieve state-of-the-art performance in many challenging video datasets. Because of the high level of complexity of these methods, as the convolution operations are also extended to an additional dimension in order to extract features…

    Submitted 12 May, 2019; v1 submitted 4 February, 2019; originally announced February 2019.

    Journal ref: IEEE International Conference on Image Processing (ICIP 2019)

  17. arXiv:1811.01938  [pdf]

    cs.CL cs.HC

    A personal model of trumpery: Deception detection in a real-world high-stakes setting

    Authors: Sophie van der Zee, Ronald Poppe, Alice Havrileck, Aurelien Baillon

    Abstract: Language use reveals information about who we are and how we feel [1-3]. One of the pioneers in text analysis, Walter Weintraub, manually counted which types of words people used in medical interviews and showed that the frequency of first-person singular pronouns (i.e., I, me, my) was a reliable indicator of depression, with depressed people using I more often than people who are not depressed [4]. Sev…

    Submitted 5 November, 2018; originally announced November 2018.

  18. Analyzing Human-Human Interactions: A Survey

    Authors: Alexandros Stergiou, Ronald Poppe

    Abstract: Many videos depict people, and it is their interactions that inform us of their activities, relation to one another and the cultural and social setting. With advances in human action recognition, researchers have begun to address the automated recognition of these human-human interactions from video. The main challenges stem from dealing with the considerable variation in recording setting, the ap…

    Submitted 17 August, 2019; v1 submitted 31 July, 2018; originally announced August 2018.
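The SoftPool entry (result 7) concerns pooling that reduces activation-map size while limiting information loss. Its abstract is truncated here; in the published SoftPool formulation, each activation a_i in a pooling window R is weighted by its softmax, w_i = exp(a_i) / sum_j exp(a_j), and the output is sum_i w_i * a_i. The following is a minimal plain-Python sketch of that idea for a single 2-D activation map; the function name softpool2d, the 2x2 window, stride 2, and the list-of-lists representation are illustrative assumptions, not the authors' implementation.

```python
import math
from typing import List

Matrix = List[List[float]]


def softpool2d(x: Matrix, k: int = 2, stride: int = 2) -> Matrix:
    """Hypothetical helper: softmax-weighted pooling over k x k windows.

    Each output value is sum_i w_i * a_i with w_i = softmax(window)_i,
    so larger activations dominate without discarding the others (unlike max pooling).
    """
    rows, cols = len(x), len(x[0])
    out: Matrix = []
    for r in range(0, rows - k + 1, stride):
        row_out: List[float] = []
        for c in range(0, cols - k + 1, stride):
            window = [x[r + i][c + j] for i in range(k) for j in range(k)]
            # Subtracting the window max keeps exp() stable; softmax weights are unchanged.
            m = max(window)
            exps = [math.exp(a - m) for a in window]
            total = sum(exps)
            row_out.append(sum((e / total) * a for e, a in zip(exps, window)))
        out.append(row_out)
    return out


if __name__ == "__main__":
    fmap = [[float(4 * i + j) for j in range(4)] for i in range(4)]
    print(softpool2d(fmap))  # 2x2 output, each value between its window's mean and max
```

Because the weights grow with the activation, each pooled value lies between the window's average (average pooling) and its maximum (max pooling), which is the trade-off the abstract alludes to.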
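The deception-detection entry (result 17) cites Weintraub's observation that the frequency of first-person singular pronouns (I, me, my) is informative. Weintraub counted these by hand; the sketch below is a hypothetical automated version of such a count. The tokenizer regex, the choice to normalize by total word count, and the function name are assumptions, and contractions such as "I'm" are not matched as "I".

```python
import re
from typing import Iterable

# Pronoun set taken from the abstract's example list.
FIRST_PERSON_SINGULAR = {"i", "me", "my"}


def first_person_singular_rate(text: str, pronouns: Iterable[str] = FIRST_PERSON_SINGULAR) -> float:
    """Hypothetical helper: fraction of word tokens that are first-person singular pronouns."""
    # Lowercase and split into alphabetic tokens (apostrophes kept inside words).
    words = re.findall(r"[a-z']+", text.lower())
    if not words:
        return 0.0
    targets = set(pronouns)
    hits = sum(1 for w in words if w in targets)
    return hits / len(words)


if __name__ == "__main__":
    print(first_person_singular_rate("I think my dog likes me"))  # 3 of 6 tokens
```

Comparing such rates across speakers or transcripts gives a simple word-frequency feature of the kind the abstract describes.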