Showing 1–17 of 17 results for author: Stergiou, A

  1. arXiv:2406.09754  [pdf, other]

    cs.CV

    LAVIB: A Large-scale Video Interpolation Benchmark

    Authors: Alexandros Stergiou

    Abstract: This paper introduces a LArge-scale Video Interpolation Benchmark (LAVIB) for the low-level video task of video frame interpolation (VFI). LAVIB comprises a large collection of high-resolution videos sourced from the web through an automated pipeline with minimal requirements for human verification. Metrics are computed for each video's motion magnitudes, luminance conditions, frame sharpness, and…

    Submitted 14 June, 2024; originally announced June 2024.

    Comments: Website: https://alexandrosstergiou.github.io/datasets/LAVIB/

  2. arXiv:2403.18074  [pdf, other]

    cs.CV eess.IV

    Every Shot Counts: Using Exemplars for Repetition Counting in Videos

    Authors: Saptarshi Sinha, Alexandros Stergiou, Dima Damen

    Abstract: Video repetition counting infers the number of repetitions of recurring actions or motion within a video. We propose an exemplar-based approach that discovers visual correspondence of video exemplars across repetitions within target videos. Our proposed Every Shot Counts (ESCounts) model is an attention-based encoder-decoder that encodes videos of varying lengths alongside exemplars from the same…

    Submitted 26 March, 2024; originally announced March 2024.

    Comments: Project website: https://sinhasaptarshi.github.io/escounts

  3. arXiv:2311.01851  [pdf, other]

    cs.CV

    Holistic Representation Learning for Multitask Trajectory Anomaly Detection

    Authors: Alexandros Stergiou, Brent De Weerdt, Nikos Deligiannis

    Abstract: Video anomaly detection deals with the recognition of abnormal events in videos. Apart from the visual signal, video anomaly detection has also been addressed with the use of skeleton sequences. We propose a holistic representation of skeleton trajectories to learn expected motions across segments at different times. Our approach uses multitask learning to reconstruct any continuous unobserved tem…

    Submitted 3 November, 2023; originally announced November 2023.

    Comments: Accepted at Winter Conference on Applications of Computer Vision (WACV) 2023

  4. arXiv:2303.09941  [pdf, other]

    cs.CV

    Leaping Into Memories: Space-Time Deep Feature Synthesis

    Authors: Alexandros Stergiou, Nikos Deligiannis

    Abstract: The success of deep learning models has led to their adaptation and adoption by prominent video understanding methods. The majority of these approaches encode features in a joint space-time modality for which the inner workings and learned representations are difficult to visually interpret. We propose LEArned Preconscious Synthesis (LEAPS), an architecture-independent method for synthesizing vide…

    Submitted 25 July, 2023; v1 submitted 17 March, 2023; originally announced March 2023.

    Comments: Accepted at IEEE/CVF International Conference on Computer Vision (ICCV) 2023

  5. arXiv:2210.11328  [pdf, other]

    cs.SD cs.CV cs.LG eess.AS

    Play It Back: Iterative Attention for Audio Recognition

    Authors: Alexandros Stergiou, Dima Damen

    Abstract: A key function of auditory cognition is the association of characteristic sounds with their corresponding semantics over time. Humans attempting to discriminate between fine-grained audio categories, often replay the same discriminative sounds to increase their prediction confidence. We propose an end-to-end attention-based architecture that through selective repetition attends over the most discr…

    Submitted 12 March, 2023; v1 submitted 20 October, 2022; originally announced October 2022.

    Comments: Accepted at IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP) 2023

  6. arXiv:2204.13340  [pdf, other]

    cs.CV

    The Wisdom of Crowds: Temporal Progressive Attention for Early Action Prediction

    Authors: Alexandros Stergiou, Dima Damen

    Abstract: Early action prediction deals with inferring the ongoing action from partially-observed videos, typically at the outset of the video. We propose a bottleneck-based attention model that captures the evolution of the action, through progressive sampling over fine-to-coarse scales. Our proposed Temporal Progressive (TemPr) model is composed of multiple attention towers, one for each scale. The predic…

    Submitted 1 April, 2023; v1 submitted 28 April, 2022; originally announced April 2022.

    Comments: Accepted at IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR) 2023

  7. arXiv:2111.00772  [pdf, other]

    cs.CV

    AdaPool: Exponential Adaptive Pooling for Information-Retaining Downsampling

    Authors: Alexandros Stergiou, Ronald Poppe

    Abstract: Pooling layers are essential building blocks of convolutional neural networks (CNNs), to reduce computational overhead and increase the receptive fields of proceeding convolutional operations. Their goal is to produce downsampled volumes that closely resemble the input volume while, ideally, also being computationally and memory efficient. Meeting both these requirements remains a challenge. To th…

    Submitted 2 December, 2022; v1 submitted 1 November, 2021; originally announced November 2021.

  8. Efficient Modelling Across Time of Human Actions and Interactions

    Authors: Alexandros Stergiou

    Abstract: This thesis focuses on video understanding for human action and interaction recognition. We start by identifying the main challenges related to action recognition from videos and review how they have been addressed by current methods. Based on these challenges, and by focusing on the temporal aspect of actions, we argue that current fixed-sized spatio-temporal kernels in 3D convolutional neural…

    Submitted 5 October, 2021; originally announced October 2021.

    Comments: PhD thesis

  9. arXiv:2101.12447  [pdf, other]

    cs.CV

    The Mind's Eye: Visualizing Class-Agnostic Features of CNNs

    Authors: Alexandros Stergiou

    Abstract: Visual interpretability of Convolutional Neural Networks (CNNs) has gained significant popularity because of the great challenges that CNN complexity imposes to understanding their inner workings. Although many techniques have been proposed to visualize class features of CNNs, most of them do not provide a correspondence between inputs and the extracted features in specific layers. This prevents t…

    Submitted 29 January, 2021; originally announced January 2021.

  10. arXiv:2101.00440  [pdf, other]

    cs.CV

    Refining activation downsampling with SoftPool

    Authors: Alexandros Stergiou, Ronald Poppe, Grigorios Kalliatakis

    Abstract: Convolutional Neural Networks (CNNs) use pooling to decrease the size of activation maps. This process is crucial to increase the receptive fields and to reduce computational requirements of subsequent convolutions. An important feature of the pooling operation is the minimization of information loss, with respect to the initial activation maps, without a significant impact on the computation and…

    Submitted 18 March, 2021; v1 submitted 2 January, 2021; originally announced January 2021.

  11. arXiv:2011.03949  [pdf, other]

    cs.CV

    Multi-Temporal Convolutions for Human Action Recognition in Videos

    Authors: Alexandros Stergiou, Ronald Poppe

    Abstract: Effective extraction of temporal patterns is crucial for the recognition of temporally varying actions in video. We argue that the fixed-sized spatio-temporal convolution kernels used in convolutional neural networks (CNNs) can be improved to extract informative motions that are executed at different time scales. To address this challenge, we present a novel spatio-temporal convolution block that…

    Submitted 31 March, 2021; v1 submitted 8 November, 2020; originally announced November 2020.

  12. Learn to cycle: Time-consistent feature discovery for action recognition

    Authors: Alexandros Stergiou, Ronald Poppe

    Abstract: Generalizing over temporal variations is a prerequisite for effective action recognition in videos. Despite significant advances in deep neural networks, it remains a challenge to focus on short-term discriminative motions in relation to the overall performance of an action. We address this challenge by allowing some flexibility in discovering relevant spatio-temporal features. We introduce Squeez…

    Submitted 23 June, 2020; v1 submitted 15 June, 2020; originally announced June 2020.

  13. Learning Class Regularized Features for Action Recognition

    Authors: Alexandros Stergiou, Ronald Poppe, Remco C. Veltkamp

    Abstract: Training Deep Convolutional Neural Networks (CNNs) is based on the notion of using multiple kernels and non-linearities in their subsequent activations to extract useful features. The kernels are used as general feature extractors without specific correspondence to the target class. As a result, the extracted features do not correspond to specific classes. Subtle differences between similar classe…

    Submitted 7 February, 2020; originally announced February 2020.

  14. Spatio-Temporal FAST 3D Convolutions for Human Action Recognition

    Authors: Alexandros Stergiou, Ronald Poppe

    Abstract: Effective processing of video input is essential for the recognition of temporally varying events such as human actions. Motivated by the often distinctive temporal characteristics of actions in either horizontal or vertical direction, we introduce a novel convolution block for CNN architectures with video input. Our proposed Fractioned Adjacent Spatial and Temporal (FAST) 3D convolutions are a na…

    Submitted 22 October, 2019; v1 submitted 30 September, 2019; originally announced September 2019.

  15. Class Feature Pyramids for Video Explanation

    Authors: Alexandros Stergiou, Georgios Kapidis, Grigorios Kalliatakis, Christos Chrysoulas, Ronald Poppe, Remco Veltkamp

    Abstract: Deep convolutional networks are widely used in video action recognition. 3D convolutions are one prominent approach to deal with the additional time dimension. While 3D convolutions typically lead to higher accuracies, the inner workings of the trained models are more difficult to interpret. We focus on creating human-understandable visual explanations that represent the hierarchical parts of spat…

    Submitted 18 September, 2019; originally announced September 2019.

  16. Saliency Tubes: Visual Explanations for Spatio-Temporal Convolutions

    Authors: Alexandros Stergiou, Georgios Kapidis, Grigorios Kalliatakis, Christos Chrysoulas, Remco Veltkamp, Ronald Poppe

    Abstract: Deep learning approaches have been established as the main methodology for video classification and recognition. Recently, 3-dimensional convolutions have been used to achieve state-of-the-art performance in many challenging video datasets. Because of the high level of complexity of these methods, as the convolution operations are also extended to additional dimension in order to extract features…

    Submitted 12 May, 2019; v1 submitted 4 February, 2019; originally announced February 2019.

    Journal ref: IEEE International Conference on Image Processing (ICIP 2019)

  17. Analyzing Human-Human Interactions: A Survey

    Authors: Alexandros Stergiou, Ronald Poppe

    Abstract: Many videos depict people, and it is their interactions that inform us of their activities, relation to one another and the cultural and social setting. With advances in human action recognition, researchers have begun to address the automated recognition of these human-human interactions from video. The main challenges stem from dealing with the considerable variation in recording setting, the ap…

    Submitted 17 August, 2019; v1 submitted 31 July, 2018; originally announced August 2018.