Skip to main content

Showing 1–6 of 6 results for author: Heyward, J

.
  1. arXiv:2407.05921  [pdf, other

    cs.CV cs.AI cs.LG

    TAPVid-3D: A Benchmark for Tracking Any Point in 3D

    Authors: Skanda Koppula, Ignacio Rocco, Yi Yang, Joe Heyward, João Carreira, Andrew Zisserman, Gabriel Brostow, Carl Doersch

    Abstract: We introduce a new benchmark, TAPVid-3D, for evaluating the task of long-range Tracking Any Point in 3D (TAP-3D). While point tracking in two dimensions (TAP) has many benchmarks measuring performance on real-world videos, such as TAPVid-DAVIS, three-dimensional point tracking has none. To this end, leveraging existing footage, we build a new benchmark for 3D point tracking featuring 4,000+ real-w… ▽ More

    Submitted 8 July, 2024; originally announced July 2024.

  2. arXiv:2402.00847  [pdf, other

    cs.CV stat.ML

    BootsTAP: Bootstrapped Training for Tracking-Any-Point

    Authors: Carl Doersch, Pauline Luc, Yi Yang, Dilara Gokay, Skanda Koppula, Ankush Gupta, Joseph Heyward, Ignacio Rocco, Ross Goroshin, João Carreira, Andrew Zisserman

    Abstract: To endow models with greater understanding of physics and motion, it is useful to enable them to perceive how solid surfaces move and deform in real scenes. This can be formalized as Tracking-Any-Point (TAP), which requires the algorithm to track any point on solid surfaces in a video, potentially densely in space and time. Large-scale groundtruth training data for TAP is only available in simulat… ▽ More

    Submitted 23 May, 2024; v1 submitted 1 February, 2024; originally announced February 2024.

  3. arXiv:2312.13090  [pdf, other

    cs.CV

    Perception Test 2023: A Summary of the First Challenge And Outcome

    Authors: Joseph Heyward, João Carreira, Dima Damen, Andrew Zisserman, Viorica Pătrăucean

    Abstract: The First Perception Test challenge was held as a half-day workshop alongside the IEEE/CVF International Conference on Computer Vision (ICCV) 2023, with the goal of benchmarking state-of-the-art video models on the recently proposed Perception Test benchmark. The challenge had six tracks covering low-level and high-level tasks, with both a language and non-language interface, across video, audio,… ▽ More

    Submitted 20 December, 2023; originally announced December 2023.

  4. arXiv:2312.07395  [pdf, other

    cs.CV cs.CL

    A Simple Recipe for Contrastively Pre-training Video-First Encoders Beyond 16 Frames

    Authors: Pinelopi Papalampidi, Skanda Koppula, Shreya Pathak, Justin Chiu, Joe Heyward, Viorica Patraucean, Jiajun Shen, Antoine Miech, Andrew Zisserman, Aida Nematzdeh

    Abstract: Understanding long, real-world videos requires modeling of long-range visual dependencies. To this end, we explore video-first architectures, building on the common paradigm of transferring large-scale, image--text models to video via shallow temporal fusion. However, we expose two limitations to the approach: (1) decreased spatial capabilities, likely due to poor video--language alignment in stan… ▽ More

    Submitted 12 December, 2023; originally announced December 2023.

  5. arXiv:2312.00598  [pdf, other

    cs.CV cs.AI

    Learning from One Continuous Video Stream

    Authors: João Carreira, Michael King, Viorica Pătrăucean, Dilara Gokay, Cătălin Ionescu, Yi Yang, Daniel Zoran, Joseph Heyward, Carl Doersch, Yusuf Aytar, Dima Damen, Andrew Zisserman

    Abstract: We introduce a framework for online learning from a single continuous video stream -- the way people and animals learn, without mini-batches, data augmentation or shuffling. This poses great challenges given the high correlation between consecutive video frames and there is very little prior work on it. Our framework allows us to do a first deep dive into the topic and includes a collection of str… ▽ More

    Submitted 28 March, 2024; v1 submitted 1 December, 2023; originally announced December 2023.

    Comments: CVPR camera ready version

  6. arXiv:2305.13786  [pdf, other

    cs.CV cs.AI cs.LG

    Perception Test: A Diagnostic Benchmark for Multimodal Video Models

    Authors: Viorica Pătrăucean, Lucas Smaira, Ankush Gupta, Adrià Recasens Continente, Larisa Markeeva, Dylan Banarse, Skanda Koppula, Joseph Heyward, Mateusz Malinowski, Yi Yang, Carl Doersch, Tatiana Matejovicova, Yury Sulsky, Antoine Miech, Alex Frechette, Hanna Klimczak, Raphael Koster, Junlin Zhang, Stephanie Winkler, Yusuf Aytar, Simon Osindero, Dima Damen, Andrew Zisserman, João Carreira

    Abstract: We propose a novel multimodal video benchmark - the Perception Test - to evaluate the perception and reasoning skills of pre-trained multimodal models (e.g. Flamingo, SeViLA, or GPT-4). Compared to existing benchmarks that focus on computational tasks (e.g. classification, detection or tracking), the Perception Test focuses on skills (Memory, Abstraction, Physics, Semantics) and types of reasoning… ▽ More

    Submitted 30 October, 2023; v1 submitted 23 May, 2023; originally announced May 2023.

    Comments: 37th Conference on Neural Information Processing Systems (NeurIPS 2023) Track on Datasets and Benchmarks