Showing 1–10 of 10 results for author: Alcazar, J L

  1. arXiv:2305.18418  [pdf, other]

    cs.CV cs.AI cs.LG

    Just a Glimpse: Rethinking Temporal Information for Video Continual Learning

    Authors: Lama Alssum, Juan Leon Alcazar, Merey Ramazanova, Chen Zhao, Bernard Ghanem

    Abstract: Class-incremental learning is one of the most important settings for the study of Continual Learning, as it closely resembles real-world application scenarios. With constrained memory sizes, catastrophic forgetting arises as the number of classes/tasks increases. Studying continual learning in the video domain poses even more challenges, as video data contains a large number of frames, which place…

    Submitted 28 June, 2023; v1 submitted 28 May, 2023; originally announced May 2023.

    Comments: Accepted at CLVision Workshop - CVPR23 (Best Paper Award)

  2. arXiv:2212.04842  [pdf, other]

    cs.CV cs.AI

    PIVOT: Prompting for Video Continual Learning

    Authors: Andrés Villa, Juan León Alcázar, Motasem Alfarra, Kumail Alhamoud, Julio Hurtado, Fabian Caba Heilbron, Alvaro Soto, Bernard Ghanem

    Abstract: Modern machine learning pipelines are limited due to data availability, storage quotas, privacy regulations, and expensive annotation processes. These constraints make it difficult or impossible to train and update large-scale models on such dynamic annotated sets. Continual learning directly approaches this problem, with the ultimate goal of devising methods where a deep neural network effectivel…

    Submitted 4 April, 2023; v1 submitted 9 December, 2022; originally announced December 2022.

    Comments: CVPR 2023

  3. arXiv:2203.14250  [pdf, other]

    cs.CV cs.MM cs.SD eess.AS

    End-to-End Active Speaker Detection

    Authors: Juan Leon Alcazar, Moritz Cordes, Chen Zhao, Bernard Ghanem

    Abstract: Recent advances in the Active Speaker Detection (ASD) problem build upon a two-stage process: feature extraction and spatio-temporal context aggregation. In this paper, we propose an end-to-end ASD workflow where feature learning and contextual predictions are jointly learned. Our end-to-end trainable network simultaneously learns multi-modal embeddings and aggregates spatio-temporal context. This…

    Submitted 25 July, 2022; v1 submitted 27 March, 2022; originally announced March 2022.

  4. arXiv:2201.09381  [pdf, other]

    cs.CV

    vCLIMB: A Novel Video Class Incremental Learning Benchmark

    Authors: Andrés Villa, Kumail Alhamoud, Juan León Alcázar, Fabian Caba Heilbron, Victor Escorcia, Bernard Ghanem

    Abstract: Continual learning (CL) is under-explored in the video domain. The few existing works contain splits with imbalanced class distributions over the tasks, or study the problem in unsuitable datasets. We introduce vCLIMB, a novel video continual learning benchmark. vCLIMB is a standardized test-bed to analyze catastrophic forgetting of deep models in video continual learning. In contrast to previous…

    Submitted 6 April, 2022; v1 submitted 23 January, 2022; originally announced January 2022.

    Comments: An updated version of our CVPR 2022 paper (oral); v2 adds minor text changes. The code of our benchmark can be found at: https://vclimb.netlify.app/

  5. arXiv:2112.00431  [pdf, other]

    cs.CV cs.AI

    MAD: A Scalable Dataset for Language Grounding in Videos from Movie Audio Descriptions

    Authors: Mattia Soldan, Alejandro Pardo, Juan León Alcázar, Fabian Caba Heilbron, Chen Zhao, Silvio Giancola, Bernard Ghanem

    Abstract: The recent and increasing interest in video-language research has driven the development of large-scale datasets that enable data-intensive machine learning techniques. In comparison, limited effort has been made at assessing the fitness of these datasets for the video-language grounding task. Recent works have begun to discover significant limitations in these datasets, suggesting that state-of-t…

    Submitted 28 March, 2022; v1 submitted 1 December, 2021; originally announced December 2021.

    Comments: 12 Pages, 6 Figures, 7 Tables

    Journal ref: Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition CVPR 2022

  6. arXiv:2109.05569  [pdf, other]

    cs.CV

    MovieCuts: A New Dataset and Benchmark for Cut Type Recognition

    Authors: Alejandro Pardo, Fabian Caba Heilbron, Juan León Alcázar, Ali Thabet, Bernard Ghanem

    Abstract: Understanding movies and their structural patterns is a crucial task in decoding the craft of video editing. While previous works have developed tools for general analysis, such as detecting characters or recognizing cinematography properties at the shot level, less effort has been devoted to understanding the most basic video edit, the Cut. This paper introduces the Cut type recognition task, whi…

    Submitted 24 October, 2022; v1 submitted 12 September, 2021; originally announced September 2021.

    Comments: Paper's website: https://www.alejandropardo.net/publication/moviecuts/

    Journal ref: ECCV 2022

  7. arXiv:2108.04294  [pdf, other]

    cs.CV cs.MM

    Learning to Cut by Watching Movies

    Authors: Alejandro Pardo, Fabian Caba Heilbron, Juan León Alcázar, Ali Thabet, Bernard Ghanem

    Abstract: Video content creation keeps growing at an incredible pace; yet, creating engaging stories remains challenging and requires non-trivial video editing expertise. Many video editing components are astonishingly hard to automate primarily due to the lack of raw video materials. This paper focuses on a new task for computational video editing, namely the task of ranking cut plausibility. Our key idea i…

    Submitted 29 September, 2021; v1 submitted 9 August, 2021; originally announced August 2021.

    Comments: Accepted at ICCV2021. Paper website: https://alejandropardo.net/publication/learning-to-cut/

  8. arXiv:2106.01667  [pdf, other]

    cs.CV

    APES: Audiovisual Person Search in Untrimmed Video

    Authors: Juan Leon Alcazar, Long Mai, Federico Perazzi, Joon-Young Lee, Pablo Arbelaez, Bernard Ghanem, Fabian Caba Heilbron

    Abstract: Humans are arguably one of the most important subjects in video streams; many real-world applications such as video summarization or video editing workflows often require the automatic search and retrieval of a person of interest. Despite tremendous efforts in the person reidentification and retrieval domains, few works have developed audiovisual search strategies. In this paper, we present the Au…

    Submitted 3 June, 2021; originally announced June 2021.

  9. arXiv:2005.09812  [pdf, other]

    cs.CV cs.SD eess.AS

    Active Speakers in Context

    Authors: Juan Leon Alcazar, Fabian Caba Heilbron, Long Mai, Federico Perazzi, Joon-Young Lee, Pablo Arbelaez, Bernard Ghanem

    Abstract: Current methods for active speaker detection focus on modeling short-term audiovisual information from a single speaker. Although this strategy can be enough for addressing single-speaker scenarios, it prevents accurate detection when the task is to identify who of many candidate speakers are talking. This paper introduces the Active Speaker Context, a novel representation that models relationshi…

    Submitted 19 May, 2020; originally announced May 2020.

  10. arXiv:1904.05847  [pdf, other]

    cs.CV

    MAIN: Multi-Attention Instance Network for Video Segmentation

    Authors: Juan Leon Alcazar, Maria A. Bravo, Ali K. Thabet, Guillaume Jeanneret, Thomas Brox, Pablo Arbelaez, Bernard Ghanem

    Abstract: Instance-level video segmentation requires a solid integration of spatial and temporal information. However, current methods rely mostly on domain-specific information (online learning) to produce accurate instance-level segmentations. We propose a novel approach that relies exclusively on the integration of generic spatio-temporal attention cues. Our strategy, named Multi-Attention Instance Netwo…

    Submitted 11 April, 2019; originally announced April 2019.