Showing 1–47 of 47 results for author: Bremond, F

  1. arXiv:2406.09390  [pdf, other]

    cs.CV cs.LG

    LLAVIDAL: Benchmarking Large Language Vision Models for Daily Activities of Living

    Authors: Rajatsubhra Chakraborty, Arkaprava Sinha, Dominick Reilly, Manish Kumar Govind, Pu Wang, Francois Bremond, Srijan Das

    Abstract: Large Language Vision Models (LLVMs) have demonstrated effectiveness in processing internet videos, yet they struggle with the visually perplexing dynamics present in Activities of Daily Living (ADL) due to limited pertinent datasets and models tailored to relevant cues. To this end, we propose a framework for curating ADL multiview datasets to fine-tune LLVMs, resulting in the creation of ADL-X,…

    Submitted 13 June, 2024; originally announced June 2024.

  2. arXiv:2311.02432  [pdf, other]

    cs.CV

    P-Age: Pexels Dataset for Robust Spatio-Temporal Apparent Age Classification

    Authors: Abid Ali, Ashish Marisetty, Francois Bremond

    Abstract: Age estimation is a challenging task that has numerous applications. In this paper, we propose a new direction for age classification that utilizes a video-based model to address challenges such as occlusions, low-resolution, and lighting conditions. To address these challenges, we propose AgeFormer which utilizes spatio-temporal information on the dynamics of the entire body dominating face-based…

    Submitted 4 November, 2023; originally announced November 2023.

    Journal ref: WACV 2024

  3. arXiv:2309.06130  [pdf, other]

    cs.CV cs.AI

    JOADAA: joint online action detection and action anticipation

    Authors: Mohammed Guermal, Francois Bremond, Rui Dai, Abid Ali

    Abstract: Action anticipation involves forecasting future actions by connecting past events to future ones. However, this reasoning ignores the real-life hierarchy of events which is considered to be composed of three main parts: past, present, and future. We argue that considering these three main parts and their dependencies could improve performance. On the other hand, online action detection is the task…

    Submitted 12 September, 2023; originally announced September 2023.

  4. arXiv:2309.00696  [pdf, other]

    cs.CV

    AAN: Attributes-Aware Network for Temporal Action Detection

    Authors: Rui Dai, Srijan Das, Michael S. Ryoo, Francois Bremond

    Abstract: The challenge of long-term video understanding remains constrained by the efficient extraction of object semantics and the modelling of their relationships for downstream tasks. Although the CLIP visual features exhibit discriminative properties for various vision tasks, particularly in object encoding, they are suboptimal for long-term video understanding. To address this issue, we present the At…

    Submitted 1 September, 2023; originally announced September 2023.

  5. arXiv:2308.14500  [pdf, other]

    cs.CV

    LAC: Latent Action Composition for Skeleton-based Action Segmentation

    Authors: Di Yang, Yaohui Wang, Antitza Dantcheva, Quan Kong, Lorenzo Garattoni, Gianpiero Francesca, Francois Bremond

    Abstract: Skeleton-based action segmentation requires recognizing composable actions in untrimmed videos. Current approaches decouple this problem by first extracting local visual features from skeleton sequences and then processing them by a temporal model to classify frame-wise actions. However, their performances remain limited as the visual features cannot sufficiently express composable actions. In thi…

    Submitted 21 February, 2024; v1 submitted 28 August, 2023; originally announced August 2023.

    Comments: ICCV 2023

  6. MultiMediate'23: Engagement Estimation and Bodily Behaviour Recognition in Social Interactions

    Authors: Philipp Müller, Michal Balazia, Tobias Baur, Michael Dietz, Alexander Heimerl, Dominik Schiller, Mohammed Guermal, Dominike Thomas, François Brémond, Jan Alexandersson, Elisabeth André, Andreas Bulling

    Abstract: Automatic analysis of human behaviour is a fundamental prerequisite for the creation of machines that can effectively interact with- and support humans in social interactions. In MultiMediate'23, we address two key human social behaviour analysis tasks for the first time in a controlled challenge: engagement estimation and bodily behaviour recognition in social interactions. This paper describes t…

    Submitted 16 August, 2023; originally announced August 2023.

    Comments: ACM MultiMedia'23

  7. arXiv:2305.06437  [pdf, other]

    cs.CV cs.AI

    Self-Supervised Video Representation Learning via Latent Time Navigation

    Authors: Di Yang, Yaohui Wang, Quan Kong, Antitza Dantcheva, Lorenzo Garattoni, Gianpiero Francesca, Francois Bremond

    Abstract: Self-supervised video representation learning aimed at maximizing similarity between different temporal segments of one video, in order to enforce feature persistence over time. This leads to loss of pertinent information related to temporal relationships, rendering actions such as `enter' and `leave' to be indistinguishable. To mitigate this limitation, we propose Latent Time Navigation (LTN), a…

    Submitted 10 May, 2023; originally announced May 2023.

    Comments: AAAI 2023

  8. arXiv:2301.07923  [pdf]

    cs.CV

    Human-Scene Network: A Novel Baseline with Self-rectifying Loss for Weakly supervised Video Anomaly Detection

    Authors: Snehashis Majhi, Rui Dai, Quan Kong, Lorenzo Garattoni, Gianpiero Francesca, Francois Bremond

    Abstract: Video anomaly detection in surveillance systems with only video-level labels (i.e. weakly-supervised) is challenging. This is due to, (i) the complex integration of human and scene based anomalies comprising of subtle and sharp spatio-temporal cues in real-world scenarios, (ii) non-optimal optimization between normal and anomaly instances under weak supervision. In this paper, we propose a Human-S…

    Submitted 19 January, 2023; originally announced January 2023.

  9. Learning Invariance from Generated Variance for Unsupervised Person Re-identification

    Authors: Hao Chen, Yaohui Wang, Benoit Lagadec, Antitza Dantcheva, Francois Bremond

    Abstract: This work focuses on unsupervised representation learning in person re-identification (ReID). Recent self-supervised contrastive learning methods learn invariance by maximizing the representation similarity between two augmented views of a same image. However, traditional data augmentation may bring to the fore undesirable distortions on identity features, which is not always favorable in id-sensi…

    Submitted 2 January, 2023; originally announced January 2023.

    Comments: Extension of conference paper arXiv:2012.09071. Accepted to TPAMI. Project page: https://github.com/chenhao2345/GCL-extended

  10. arXiv:2212.03968  [pdf, other]

    cs.CV

    Multimodal Vision Transformers with Forced Attention for Behavior Analysis

    Authors: Tanay Agrawal, Michal Balazia, Philipp Müller, François Brémond

    Abstract: Human behavior understanding requires looking at minute details in the large context of a scene containing multiple input modalities. It is necessary as it allows the design of more human-like machines. While transformer approaches have shown great improvements, they face multiple challenges such as lack of data or background noise. To tackle these, we introduce the Forced Attention (FAt) Transfor…

    Submitted 7 December, 2022; originally announced December 2022.

    Comments: Preprint. Full paper accepted at the IEEE/CVF Winter Conference on Applications of Computer Vision (WACV), Waikoloa, USA, Jan 2023. 11 pages

    MSC Class: 68T05; 68T10 ACM Class: I.5

  11. arXiv:2209.00065  [pdf, other]

    cs.CV

    ViA: View-invariant Skeleton Action Representation Learning via Motion Retargeting

    Authors: Di Yang, Yaohui Wang, Antitza Dantcheva, Lorenzo Garattoni, Gianpiero Francesca, Francois Bremond

    Abstract: Current self-supervised approaches for skeleton action representation learning often focus on constrained scenarios, where videos and skeleton data are recorded in laboratory settings. When dealing with estimated skeleton data in real-world videos, such methods perform poorly due to the large variations across subjects and camera viewpoints. To address this issue, we introduce ViA, a novel View-In…

    Submitted 31 August, 2022; originally announced September 2022.

    Comments: project website: https://walker-a11y.github.io/ViA-project

  12. arXiv:2208.09191  [pdf, other]

    cs.CV

    Synthetic Data in Human Analysis: A Survey

    Authors: Indu Joshi, Marcel Grimmer, Christian Rathgeb, Christoph Busch, Francois Bremond, Antitza Dantcheva

    Abstract: Deep neural networks have become prevalent in human analysis, boosting the performance of applications, such as biometric recognition, action recognition, as well as person re-identification. However, the performance of such networks scales with the available training data. In human analysis, the demand for large-scale datasets poses a severe challenge, as data collection is tedious, time-expensiv…

    Submitted 19 August, 2022; originally announced August 2022.

  13. Bodily Behaviors in Social Interaction: Novel Annotations and State-of-the-Art Evaluation

    Authors: Michal Balazia, Philipp Müller, Ákos Levente Tánczos, August von Liechtenstein, François Brémond

    Abstract: Body language is an eye-catching social signal and its automatic analysis can significantly advance artificial intelligence systems to understand and actively participate in social interactions. While computer vision has made impressive progress in low-level tasks like head and body pose estimation, the detection of more subtle behaviors such as gesturing, grooming, or fumbling is not well explore…

    Submitted 7 December, 2022; v1 submitted 26 July, 2022; originally announced July 2022.

    Comments: Preprint. Full paper accepted at the ACM International Conference on Multimedia (ACMMM), Lisbon, Portugal, October 2022. 10 pages

    MSC Class: 68T05; 68T10 ACM Class: I.5

  14. arXiv:2204.09468  [pdf, other]

    cs.CV

    THORN: Temporal Human-Object Relation Network for Action Recognition

    Authors: Mohammed Guermal, Rui Dai, Francois Bremond

    Abstract: Most action recognition models treat human activities as unitary events. However, human activities often follow a certain hierarchy. In fact, many human activities are compositional. Also, these actions are mostly human-object interactions. In this paper we propose to recognize human action by leveraging the set of interactions that define an action. In this work, we present an end-to-end network:…

    Submitted 20 April, 2022; originally announced April 2022.

  15. arXiv:2203.09043  [pdf, other]

    cs.CV

    Latent Image Animator: Learning to Animate Images via Latent Space Navigation

    Authors: Yaohui Wang, Di Yang, Francois Bremond, Antitza Dantcheva

    Abstract: Due to the remarkable progress of deep generative models, animating images has become increasingly efficient, whereas associated results have become increasingly realistic. Current animation-approaches commonly exploit structure representation extracted from driving videos. Such structure representation is instrumental in transferring motion from driving videos to still images. However, such appro…

    Submitted 16 March, 2022; originally announced March 2022.

    Comments: ICLR 2022, project link https://wyhsirius.github.io/LIA-project

  16. arXiv:2203.06468  [pdf, other]

    cs.CV

    Unsupervised Lifelong Person Re-identification via Contrastive Rehearsal

    Authors: Hao Chen, Benoit Lagadec, Francois Bremond

    Abstract: Existing unsupervised person re-identification (ReID) methods focus on adapting a model trained on a source domain to a fixed target domain. However, an adapted ReID model usually only works well on a certain target domain, but can hardly memorize the source domain knowledge and generalize to upcoming unseen data. In this paper, we propose unsupervised lifelong person ReID, which focuses on contin…

    Submitted 12 March, 2022; originally announced March 2022.

  17. Multimodal Personality Recognition using Cross-Attention Transformer and Behaviour Encoding

    Authors: Tanay Agrawal, Dhruv Agarwal, Michal Balazia, Neelabh Sinha, Francois Bremond

    Abstract: Personality computing and affective computing have gained recent interest in many research areas. The datasets for the task generally have multiple modalities like video, audio, language and bio-signals. In this paper, we propose a flexible model for the task which exploits all available data. The task involves complex relations and to avoid using a large model for video processing specifically, w…

    Submitted 12 January, 2023; v1 submitted 22 December, 2021; originally announced December 2021.

    Comments: Preprint. Final paper accepted at the 17th International Conference on Computer Vision Theory and Applications (VISAPP), virtual, February, 2022. 8 pages

    MSC Class: 68T05; 68T10 ACM Class: I.5

  18. arXiv:2112.03902  [pdf, other]

    cs.CV

    MS-TCT: Multi-Scale Temporal ConvTransformer for Action Detection

    Authors: Rui Dai, Srijan Das, Kumara Kahatapitiya, Michael S. Ryoo, Francois Bremond

    Abstract: Action detection is an essential and challenging task, especially for densely labelled datasets of untrimmed videos. The temporal relation is complex in those datasets, including challenges like composite action, and co-occurring action. For detecting actions in those complex videos, efficiently capturing both short-term and long-term temporal information in the video is critical. To this end, we…

    Submitted 29 March, 2022; v1 submitted 7 December, 2021; originally announced December 2021.

    Comments: Accepted in CVPR 2022

  19. arXiv:2110.13473  [pdf, other]

    cs.CV cs.AI

    CTRN: Class-Temporal Relational Network for Action Detection

    Authors: Rui Dai, Srijan Das, Francois Bremond

    Abstract: Action detection is an essential and challenging task, especially for densely labelled datasets of untrimmed videos. There are many real-world challenges in those datasets, such as composite action, co-occurring action, and high temporal variation of instance duration. For handling these challenges, we propose to explore both the class and temporal relations of detected actions. In this work, we i…

    Submitted 11 July, 2022; v1 submitted 26 October, 2021; originally announced October 2021.

  20. arXiv:2110.08270  [pdf, other]

    cs.LG cs.CL

    From Multimodal to Unimodal Attention in Transformers using Knowledge Distillation

    Authors: Dhruv Agarwal, Tanay Agrawal, Laura M. Ferrari, François Bremond

    Abstract: Multimodal Deep Learning has garnered much interest, and transformers have triggered novel approaches, thanks to the cross-attention mechanism. Here we propose an approach to deal with two key existing challenges: the high computational resource demanded and the issue of missing modalities. We introduce for the first time the concept of knowledge distillation in transformers to use only one modali…

    Submitted 19 October, 2021; v1 submitted 15 October, 2021; originally announced October 2021.

    Comments: Preprint. Final paper accepted at the 17th IEEE International Conference on Advanced Video and Signal-based Surveillance, AVSS 2021, Virtual, November 16-19, 2021. 10 pages

  21. FLAME: Facial Landmark Heatmap Activated Multimodal Gaze Estimation

    Authors: Neelabh Sinha, Michal Balazia, Francois Bremond

    Abstract: 3D gaze estimation is about predicting the line of sight of a person in 3D space. Person-independent models for the same lack precision due to anatomical differences of subjects, whereas person-specific calibrated techniques add strict constraints on scalability. To overcome these issues, we propose a novel technique, Facial Landmark Heatmap Activated Multimodal Gaze Estimation (FLAME), as a way o…

    Submitted 7 December, 2022; v1 submitted 10 October, 2021; originally announced October 2021.

    Comments: Preprint. Final paper accepted at the 17th IEEE International Conference on Advanced Video and Signal-based Surveillance (AVSS), virtual, November 2021. 8 pages

    MSC Class: 68T05; 68T10 ACM Class: I.5

  22. arXiv:2108.08996  [pdf, other]

    cs.CV cs.AI

    Weakly-supervised Joint Anomaly Detection and Classification

    Authors: Snehashis Majhi, Srijan Das, Francois Bremond, Ratnakar Dash, Pankaj Kumar Sa

    Abstract: Anomaly activities such as robbery, explosion, accidents, etc. need immediate actions for preventing loss of human life and property in real world surveillance systems. Although the recent automation in surveillance systems are capable of detecting the anomalies, but they still need human efforts for categorizing the anomalies and taking necessary preventive actions. This is due to the lack of met…

    Submitted 20 August, 2021; originally announced August 2021.

    Comments: Provisionally accepted in the first round of FG 2021

  23. arXiv:2108.03619  [pdf]

    cs.CV

    Learning an Augmented RGB Representation with Cross-Modal Knowledge Distillation for Action Detection

    Authors: Rui Dai, Srijan Das, Francois Bremond

    Abstract: In video understanding, most cross-modal knowledge distillation (KD) methods are tailored for classification tasks, focusing on the discriminative representation of the trimmed videos. However, action detection requires not only categorizing actions, but also localizing them in untrimmed videos. Therefore, transferring knowledge pertaining to temporal relations is critical for this task which is m…

    Submitted 8 August, 2021; originally announced August 2021.

  24. arXiv:2107.08580  [pdf, other]

    cs.CV

    UNIK: A Unified Framework for Real-world Skeleton-based Action Recognition

    Authors: Di Yang, Yaohui Wang, Antitza Dantcheva, Lorenzo Garattoni, Gianpiero Francesca, Francois Bremond

    Abstract: Action recognition based on skeleton data has recently witnessed increasing attention and progress. State-of-the-art approaches adopting Graph Convolutional networks (GCNs) can effectively extract features on human skeletons relying on the pre-defined human topology. Despite associated progress, GCN-based methods have difficulties to generalize across domains, especially with different human topol…

    Submitted 18 July, 2021; originally announced July 2021.

    Comments: Code is available at: https://github.com/YangDi666/UNIK

  25. arXiv:2105.08141  [pdf, other]

    cs.CV cs.AI

    VPN++: Rethinking Video-Pose embeddings for understanding Activities of Daily Living

    Authors: Srijan Das, Rui Dai, Di Yang, Francois Bremond

    Abstract: Many attempts have been made towards combining RGB and 3D poses for the recognition of Activities of Daily Living (ADL). ADL may look very similar and often necessitate to model fine-grained details to distinguish them. Because the recent 3D ConvNets are too rigid to capture the subtle visual patterns across an action, this research direction is dominated by methods combining RGB and 3D Poses. But…

    Submitted 17 May, 2021; originally announced May 2021.

    Comments: submitted to a journal

  26. arXiv:2104.04546  [pdf, other]

    eess.SP cs.LG stat.AP

    One-class Autoencoder Approach for Optimal Electrode Set-up Identification in Wearable EEG Event Monitoring

    Authors: Laura M. Ferrari, Guy Abi Hanna, Paolo Volpe, Esma Ismailova, François Bremond, Maria A. Zuluaga

    Abstract: A limiting factor towards the wide routine use of wearables devices for continuous healthcare monitoring is their cumbersome and obtrusive nature. This is particularly true for electroencephalography (EEG) recordings, which require the placement of multiple electrodes in contact with the scalp. In this work, we propose to identify the optimal wearable EEG electrode set-up, in terms of minimal numb…

    Submitted 19 May, 2021; v1 submitted 9 April, 2021; originally announced April 2021.

  27. arXiv:2103.16364  [pdf, other]

    cs.CV

    ICE: Inter-instance Contrastive Encoding for Unsupervised Person Re-identification

    Authors: Hao Chen, Benoit Lagadec, Francois Bremond

    Abstract: Unsupervised person re-identification (ReID) aims at learning discriminative identity features without annotations. Recently, self-supervised contrastive learning has gained increasing attention for its effectiveness in unsupervised representation learning. The main idea of instance contrastive learning is to match a same instance in different augmented views. However, the relationship between dif…

    Submitted 18 August, 2021; v1 submitted 30 March, 2021; originally announced March 2021.

    Comments: ICCV 2021

  28. How Unique Is a Face: An Investigative Study

    Authors: Michal Balazia, S L Happy, Francois Bremond, Antitza Dantcheva

    Abstract: Face recognition has been widely accepted as a means of identification in applications ranging from border control to security in the banking sector. Surprisingly, while widely accepted, we still lack the understanding of uniqueness or distinctiveness of faces as biometric modality. In this work, we study the impact of factors such as image resolution, feature representation, database size, age an…

    Submitted 7 December, 2022; v1 submitted 9 February, 2021; originally announced February 2021.

    Comments: Preprint. Full paper accepted at the IEEE/IAPR International Conference on Pattern Recognition (ICPR), Milan, Italy, January 2021. 6 pages

    MSC Class: 68T05; 68T10 ACM Class: I.5

  29. arXiv:2101.03049  [pdf, other]

    cs.CV

    InMoDeGAN: Interpretable Motion Decomposition Generative Adversarial Network for Video Generation

    Authors: Yaohui Wang, Francois Bremond, Antitza Dantcheva

    Abstract: In this work, we introduce an unconditional video generative model, InMoDeGAN, targeted to (a) generate high quality videos, as well as to (b) allow for interpretation of the latent space. For the latter, we place emphasis on interpreting and manipulating motion. Towards this, we decompose motion into semantic sub-spaces, which allow for control of generated samples. We design the architecture of…

    Submitted 8 January, 2021; originally announced January 2021.

    Comments: Please visit https://wyhsirius.github.io/InMoDeGAN/ for introductions and more

  30. arXiv:2012.09071  [pdf, other]

    cs.CV

    Joint Generative and Contrastive Learning for Unsupervised Person Re-identification

    Authors: Hao Chen, Yaohui Wang, Benoit Lagadec, Antitza Dantcheva, Francois Bremond

    Abstract: Recent self-supervised contrastive learning provides an effective approach for unsupervised person re-identification (ReID) by learning invariance from different views (transformed versions) of an input. In this paper, we incorporate a Generative Adversarial Network (GAN) and a contrastive learning module into one joint training framework. While the GAN provides online data augmentation for contra…

    Submitted 30 March, 2021; v1 submitted 16 December, 2020; originally announced December 2020.

    Comments: CVPR 2021. Source code: https://github.com/chenhao2345/GCL

  31. arXiv:2011.13776  [pdf, other]

    cs.CV

    Enhancing Diversity in Teacher-Student Networks via Asymmetric branches for Unsupervised Person Re-identification

    Authors: Hao Chen, Benoit Lagadec, Francois Bremond

    Abstract: The objective of unsupervised person re-identification (Re-ID) is to learn discriminative features without labor-intensive identity annotations. State-of-the-art unsupervised Re-ID methods assign pseudo labels to unlabeled images in the target domain and learn from these noisy pseudo labels. Recently introduced Mean Teacher Model is a promising way to mitigate the label noise. However, during the…

    Submitted 27 November, 2020; originally announced November 2020.

    Comments: WACV 2021

  32. arXiv:2011.05358  [pdf, other]

    cs.CV

    Selective Spatio-Temporal Aggregation Based Pose Refinement System: Towards Understanding Human Activities in Real-World Videos

    Authors: Di Yang, Rui Dai, Yaohui Wang, Rupayan Mallick, Luca Minciullo, Gianpiero Francesca, Francois Bremond

    Abstract: Taking advantage of human pose data for understanding human activities has attracted much attention these days. However, state-of-the-art pose estimators struggle in obtaining high-quality 2D or 3D pose data due to occlusion, truncation and low-resolution in real-world un-annotated videos. Hence, in this work, we propose 1) a Selective Spatio-Temporal Aggregation mechanism, named SST-A, that refin…

    Submitted 10 November, 2020; originally announced November 2020.

    Comments: WACV2021

  33. arXiv:2010.14982  [pdf]

    cs.CV

    Toyota Smarthome Untrimmed: Real-World Untrimmed Videos for Activity Detection

    Authors: Rui Dai, Srijan Das, Saurav Sharma, Luca Minciullo, Lorenzo Garattoni, Francois Bremond, Gianpiero Francesca

    Abstract: Designing activity detection systems that can be successfully deployed in daily-living environments requires datasets that pose the challenges typical of real-world scenarios. In this paper, we introduce a new untrimmed daily-living dataset that features several real-world challenges: Toyota Smarthome Untrimmed (TSU). TSU contains a wide variety of activities performed in a spontaneous manner. The…

    Submitted 10 June, 2022; v1 submitted 28 October, 2020; originally announced October 2020.

    Comments: Toyota Smarthome Untrimmed dataset, project page: https://project.inria.fr/toyotasmarthome

  34. arXiv:2007.03056  [pdf, other]

    cs.CV

    VPN: Learning Video-Pose Embedding for Activities of Daily Living

    Authors: Srijan Das, Saurav Sharma, Rui Dai, Francois Bremond, Monique Thonnat

    Abstract: In this paper, we focus on the spatio-temporal aspect of recognizing Activities of Daily Living (ADL). ADL have two specific properties (i) subtle spatio-temporal patterns and (ii) similar visual patterns varying with time. Therefore, ADL may look very similar and often necessitate to look at their fine-grained details to distinguish them. Because the recent spatio-temporal 3D ConvNets are too rig…

    Submitted 6 July, 2020; originally announced July 2020.

    Comments: Accepted in ECCV 2020

  35. arXiv:1912.05523  [pdf, other]

    cs.CV

    G3AN: Disentangling Appearance and Motion for Video Generation

    Authors: Yaohui Wang, Piotr Bilinski, Francois Bremond, Antitza Dantcheva

    Abstract: Creating realistic human videos entails the challenge of being able to simultaneously generate both appearance, as well as motion. To tackle this challenge, we introduce G$^{3}$AN, a novel spatio-temporal generative model, which seeks to capture the distribution of high dimensional video data and to model appearance and motion in disentangled manner. The latter is achieved by decomposing appearanc…

    Submitted 13 June, 2020; v1 submitted 11 December, 2019; originally announced December 2019.

    Comments: CVPR 2020, project link https://wyhsirius.github.io/G3AN/

  36. arXiv:1909.05704  [pdf, other]

    cs.CV cs.LG

    Skeleton Image Representation for 3D Action Recognition based on Tree Structure and Reference Joints

    Authors: Carlos Caetano, François Brémond, William Robson Schwartz

    Abstract: In the last years, the computer vision research community has studied on how to model temporal dynamics in videos to employ 3D human action recognition. To that end, two main baseline approaches have been researched: (i) Recurrent Neural Networks (RNNs) with Long-Short Term Memory (LSTM); and (ii) skeleton image representations used as input to a Convolutional Neural Network (CNN). Although RNN ap…

    Submitted 11 September, 2019; originally announced September 2019.

    Comments: Conference on Graphics, Patterns and Images (SIBGRAPI2019). arXiv admin note: substantial text overlap with arXiv:1907.13025

  37. arXiv:1907.13025  [pdf, other]

    cs.CV cs.LG eess.IV

    SkeleMotion: A New Representation of Skeleton Joint Sequences Based on Motion Information for 3D Action Recognition

    Authors: Carlos Caetano, Jessica Sena, François Brémond, Jefersson A. dos Santos, William Robson Schwartz

    Abstract: Due to the availability of large-scale skeleton datasets, 3D human action recognition has recently called the attention of computer vision community. Many works have focused on encoding skeleton data as skeleton image representations based on spatial structure of the skeleton joints, in which the temporal dynamics of the sequence is encoded as variations in columns and the spatial structure of eac…

    Submitted 30 July, 2019; originally announced July 2019.

    Comments: 16-th IEEE International Conference on Advanced Video and Signal-based Surveillance (AVSS2019)

  38. arXiv:1802.00421  [pdf, other]

    cs.CV

    Deep-Temporal LSTM for Daily Living Action Recognition

    Authors: Srijan Das, Michal Koperski, Francois Bremond, Gianpiero Francesca

    Abstract: In this paper, we propose to improve the traditional use of RNNs by employing a many to many model for video classification. We analyze the importance of modeling spatial layout and temporal encoding for daily living action recognition. Many RGB methods focus only on short term temporal information obtained from optical flow. Skeleton based methods on the other hand show that modeling long term sk…

    Submitted 15 June, 2018; v1 submitted 1 February, 2018; originally announced February 2018.

    Comments: Submitted in conference

  39. arXiv:1607.05975  [pdf, other]

    cs.CV

    Person Re-identification for Real-world Surveillance Systems

    Authors: Furqan M. Khan, Francois Bremond

    Abstract: Appearance based person re-identification in a real-world video surveillance system with non-overlapping camera views is a challenging problem for many reasons. Current state-of-the-art methods often address the problem by relying on supervised learning of similarity metrics or ranking functions to implicitly model appearance transformation between cameras for each camera pair, or group, in the sy…

    Submitted 20 July, 2016; originally announced July 2016.

    Comments: Person re-identification, Visual surveillance

    ACM Class: I.2.10

  40. arXiv:1404.2005  [pdf, other]

    cs.CV

    Automatic Tracker Selection w.r.t Object Detection Performance

    Authors: Duc Phu Chau, François Bremond, Monique Thonnat, Slawomir Bak

    Abstract: The tracking algorithm performance depends on video content. This paper presents a new multi-object tracking approach which is able to cope with video content variations. First the object detection is improved using Kanade-Lucas-Tomasi (KLT) feature tracking. Second, for each mobile object, an appropriate tracker is selected among a KLT-based tracker and a discriminative appearance-based tracker.…

    Submitted 8 April, 2014; originally announced April 2014.

    Comments: IEEE Winter Conference on Applications of Computer Vision (WACV 2014) (2014)

  41. arXiv:1307.5653  [pdf, other]

    cs.CV

    Online Tracking Parameter Adaptation based on Evaluation

    Authors: Duc Phu Chau, Julien Badie, François Bremond, Monique Thonnat

    Abstract: Parameter tuning is a common issue for many tracking algorithms. In order to solve this problem, this paper proposes an online parameter tuning to adapt a tracking algorithm to various scene contexts. In an offline training phase, this approach learns how to tune the tracker parameters to cope with different contexts. In the online control phase, once the tracking quality is evaluated as not good…

    Submitted 22 July, 2013; originally announced July 2013.

    Comments: IEEE International Conference on Advanced Video and Signal-based Surveillance (2013)

  42. arXiv:1305.2687  [pdf, other]

    cs.CV

    Automatic Parameter Adaptation for Multi-object Tracking

    Authors: Duc Phu Chau, Monique Thonnat, François Bremond

    Abstract: Object tracking quality usually depends on video context (e.g. object occlusion level, object density). In order to decrease this dependency, this paper presents a learning approach to adapt the tracker parameters to the context variations. In an offline phase, satisfactory tracking parameters are learned for video context clusters. In the online control phase, once a context change is detected, t…

    Submitted 13 May, 2013; originally announced May 2013.

    Comments: International Conference on Computer Vision Systems (ICVS) (2013)

  43. arXiv:1304.5212  [pdf, other]

    cs.CV

    Object Tracking in Videos: Approaches and Issues

    Authors: Duc Phu Chau, François Bremond, Monique Thonnat

    Abstract: Mobile object tracking plays an important role in computer vision applications. In this paper, we use a tracked target-based taxonomy to present the object tracking algorithms. The tracked targets are divided into three categories: points of interest, appearance and silhouette of mobile objects. Advantages and limitations of the tracking approaches are also analyzed to find the future directions…

    Submitted 18 April, 2013; originally announced April 2013.

    Journal ref: The International Workshop "Rencontres UNS-UD" (RUNSUD) (2013)

  44. A generic framework for video understanding applied to group behavior recognition

    Authors: Sofia Zaidenberg, Bernard Boulay, François Bremond

    Abstract: This paper presents an approach to detect and track groups of people in video-surveillance applications, and to automatically recognize their behavior. This method keeps track of individuals moving together by maintaining a spatial and temporal group coherence. First, people are individually detected and tracked. Second, their trajectories are analyzed over a temporal window and clustered using th…

    Submitted 22 June, 2012; originally announced June 2012.

    Comments: (20/03/2012)

    Journal ref: 9th IEEE International Conference on Advanced Video and Signal-Based Surveillance (AVSS 2012) (2012) 136-142

  45. arXiv:1112.1200  [pdf, ps, other]

    cs.CV

    A multi-feature tracking algorithm enabling adaptation to context variations

    Authors: Duc Phu Chau, François Bremond, Monique Thonnat

    Abstract: We propose in this paper a tracking algorithm which is able to adapt itself to different scene contexts. A feature pool is used to compute the matching score between two detected objects. This feature pool includes 2D, 3D displacement distances, 2D sizes, color histogram, histogram of oriented gradient (HOG), color covariance and dominant color. An offline learning process is proposed to search fo…

    Submitted 6 December, 2011; originally announced December 2011.

    Comments: The International Conference on Imaging for Crime Detection and Prevention (ICDP) (2011)

  46. arXiv:1106.2695  [pdf, ps, other]

    cs.CV

    Robust Mobile Object Tracking Based on Multiple Feature Similarity and Trajectory Filtering

    Authors: Duc Phu Chau, François Bremond, Monique Thonnat, Etienne Corvee

    Abstract: This paper presents a new algorithm to track mobile objects in different scene conditions. The main idea of the proposed tracker includes estimation, multi-features similarity measures and trajectory filtering. A feature set (distance, area, shape ratio, color histogram) is defined for each tracked object to search for the best matching object. Its best matching object and its state estimated by t…

    Submitted 14 June, 2011; originally announced June 2011.

    Journal ref: The International Conference on Computer Vision Theory and Applications (VISAPP) (2011)

  47. arXiv:1007.0313  [pdf]

    cs.CV

    Repairing People Trajectories Based on Point Clustering

    Authors: Duc Phu Chau, Francois Bremond, Etienne Corvee, Monique Thonnat

    Abstract: This paper presents a method for improving any object tracking algorithm based on machine learning. During the training phase, important trajectory features are extracted which are then used to calculate a confidence value of trajectory. The positions at which objects are usually lost and found are clustered in order to construct the set of 'lost zones' and 'found zones' in the scene. Using these…

    Submitted 2 July, 2010; originally announced July 2010.

    Journal ref: The International Conference on Computer Vision Theory and Applications (VISAPP), Lisboa : Portugal (2009)