Showing 1–3 of 3 results for author: Chalk, J

  1. arXiv:2404.05559  [pdf, other]

    cs.CV

    TIM: A Time Interval Machine for Audio-Visual Action Recognition

    Authors: Jacob Chalk, Jaesung Huh, Evangelos Kazakos, Andrew Zisserman, Dima Damen

    Abstract: Diverse actions give rise to rich audio-visual signals in long videos. Recent works showcase that the two modalities of audio and video exhibit different temporal extents of events and distinct labels. We address the interplay between the two modalities in long videos by explicitly modelling the temporal extents of audio and visual events. We propose the Time Interval Machine (TIM) where a modalit…

    Submitted 9 April, 2024; v1 submitted 8 April, 2024; originally announced April 2024.

    Comments: Accepted to CVPR 2024. Project Webpage: https://jacobchalk.github.io/TIM-Project

  2. arXiv:2404.05072  [pdf, other]

    cs.CV

    Spatial Cognition from Egocentric Video: Out of Sight, Not Out of Mind

    Authors: Chiara Plizzari, Shubham Goel, Toby Perrett, Jacob Chalk, Angjoo Kanazawa, Dima Damen

    Abstract: As humans move around, performing their daily tasks, they are able to recall where they have positioned objects in their environment, even if these objects are currently out of sight. In this paper, we aim to mimic this spatial cognition ability. We thus formulate the task of Out of Sight, Not Out of Mind - 3D tracking active objects using observations captured through an egocentric camera. We int…

    Submitted 7 April, 2024; originally announced April 2024.

    Comments: 21 pages including references and appendix. Project Webpage: http://dimadamen.github.io/OSNOM/

  3. arXiv:2302.00646  [pdf, other]

    cs.SD cs.AI cs.LG eess.AS

    Epic-Sounds: A Large-scale Dataset of Actions That Sound

    Authors: Jaesung Huh, Jacob Chalk, Evangelos Kazakos, Dima Damen, Andrew Zisserman

    Abstract: We introduce EPIC-SOUNDS, a large-scale dataset of audio annotations capturing temporal extents and class labels within the audio stream of the egocentric videos. We propose an annotation pipeline where annotators temporally label distinguishable audio segments and describe the action that could have caused this sound. We identify actions that can be discriminated purely from audio, through groupi…

    Submitted 1 February, 2023; originally announced February 2023.

    Comments: 6 pages, 4 figures