
Showing 1–18 of 18 results for author: Perrett, T

  1. arXiv:2404.05072  [pdf, other]

    cs.CV

    Spatial Cognition from Egocentric Video: Out of Sight, Not Out of Mind

    Authors: Chiara Plizzari, Shubham Goel, Toby Perrett, Jacob Chalk, Angjoo Kanazawa, Dima Damen

    Abstract: As humans move around, performing their daily tasks, they are able to recall where they have positioned objects in their environment, even if these objects are currently out of sight. In this paper, we aim to mimic this spatial cognition ability. We thus formulate the task of Out of Sight, Not Out of Mind - 3D tracking active objects using observations captured through an egocentric camera. We int…

    Submitted 7 April, 2024; originally announced April 2024.

    Comments: 21 pages including references and appendix. Project Webpage: http://dimadamen.github.io/OSNOM/

  2. arXiv:2311.16446  [pdf, other]

    cs.CV

    Centre Stage: Centricity-based Audio-Visual Temporal Action Detection

    Authors: Hanyuan Wang, Majid Mirmehdi, Dima Damen, Toby Perrett

    Abstract: Previous one-stage action detection approaches have modelled temporal dependencies using only the visual modality. In this paper, we explore different strategies to incorporate the audio modality, using multi-scale cross-attention to fuse the two modalities. We also demonstrate the correlation between the distance from the timestep to the action centre and the accuracy of the predicted boundaries.…

    Submitted 27 November, 2023; originally announced November 2023.

    Comments: Accepted to VUA workshop at BMVC 2023

  3. arXiv:2306.08713  [pdf, other]

    cs.CV

    What can a cook in Italy teach a mechanic in India? Action Recognition Generalisation Over Scenarios and Locations

    Authors: Chiara Plizzari, Toby Perrett, Barbara Caputo, Dima Damen

    Abstract: We propose and address a new generalisation problem: can a model trained for action recognition successfully classify actions when they are performed within a previously unseen scenario and in a previously unseen location? To answer this question, we introduce the Action Recognition Generalisation Over scenarios and locations dataset (ARGO1M), which contains 1.1M video clips from the large-scale E…

    Submitted 24 August, 2023; v1 submitted 14 June, 2023; originally announced June 2023.

    Comments: Accepted at ICCV 2023. Project page: https://chiaraplizz.github.io/what-can-a-cook/

  4. arXiv:2304.01143  [pdf, other]

    cs.CV

    Use Your Head: Improving Long-Tail Video Recognition

    Authors: Toby Perrett, Saptarshi Sinha, Tilo Burghardt, Majid Mirmehdi, Dima Damen

    Abstract: This paper presents an investigation into long-tail video recognition. We demonstrate that, unlike naturally-collected video datasets and existing long-tail image benchmarks, current video benchmarks fall short on multiple long-tailed properties. Most critically, they lack few-shot classes in their tails. In response, we propose new video benchmarks that better assess long-tail recognition, by sam…

    Submitted 3 April, 2023; originally announced April 2023.

    Comments: CVPR 2023

  5. arXiv:2210.14284  [pdf, other]

    cs.CV

    Refining Action Boundaries for One-stage Detection

    Authors: Hanyuan Wang, Majid Mirmehdi, Dima Damen, Toby Perrett

    Abstract: Current one-stage action detection methods, which simultaneously predict action boundaries and the corresponding class, do not estimate or use a measure of confidence in their boundary predictions, which can lead to inaccurate boundaries. We incorporate the estimation of boundary confidence into one-stage anchor-free detection, through an additional prediction head that predicts the refined bounda…

    Submitted 25 October, 2022; originally announced October 2022.

    Comments: Accepted to AVSS 2022. Our code is available at https://github.com/hanielwang/Refining_Boundary_Head.git

  6. arXiv:2207.06789  [pdf, other]

    cs.CV

    Inertial Hallucinations -- When Wearable Inertial Devices Start Seeing Things

    Authors: Alessandro Masullo, Toby Perrett, Tilo Burghardt, Ian Craddock, Dima Damen, Majid Mirmehdi

    Abstract: We propose a novel approach to multimodal sensor fusion for Ambient Assisted Living (AAL) which takes advantage of learning using privileged information (LUPI). We address two major shortcomings of standard multimodal approaches, limited area coverage and reduced reliability. Our new framework fuses the concept of modality hallucination with triplet learning to train a model with different modalit…

    Submitted 14 July, 2022; originally announced July 2022.

  7. arXiv:2206.05496  [pdf, other]

    cs.CV

    An Evaluation of OCR on Egocentric Data

    Authors: Valentin Popescu, Dima Damen, Toby Perrett

    Abstract: In this paper, we evaluate state-of-the-art OCR methods on Egocentric data. We annotate text in EPIC-KITCHENS images, and demonstrate that existing OCR methods struggle with rotated text, which is frequently observed on objects being handled. We introduce a simple rotate-and-merge procedure which can be applied to pre-trained OCR models that halves the normalized edit distance error. This suggests…

    Submitted 11 June, 2022; originally announced June 2022.

    Comments: Extended Abstract, EPIC workshop at CVPR 22

  8. arXiv:2201.00434  [pdf, other]

    cs.CV

    TVNet: Temporal Voting Network for Action Localization

    Authors: Hanyuan Wang, Dima Damen, Majid Mirmehdi, Toby Perrett

    Abstract: We propose a Temporal Voting Network (TVNet) for action localization in untrimmed videos. This incorporates a novel Voting Evidence Module to locate temporal boundaries, more accurately, where temporal contextual evidence is accumulated to predict frame-level probabilities of start and end action boundaries. Our action-independent evidence module is incorporated within a pipeline to calculate conf…

    Submitted 2 January, 2022; originally announced January 2022.

    Comments: 9 pages, 7 figures, 11 tables

  9. arXiv:2101.06184  [pdf, other]

    cs.CV

    Temporal-Relational CrossTransformers for Few-Shot Action Recognition

    Authors: Toby Perrett, Alessandro Masullo, Tilo Burghardt, Majid Mirmehdi, Dima Damen

    Abstract: We propose a novel approach to few-shot action recognition, finding temporally-corresponding frame tuples between the query and videos in the support set. Distinct from previous few-shot works, we construct class prototypes using the CrossTransformer attention mechanism to observe relevant sub-sequences of all support videos, rather than using class averages or single best matches. Video represent…

    Submitted 28 March, 2021; v1 submitted 15 January, 2021; originally announced January 2021.

    Comments: Accepted in CVPR 2021

  10. arXiv:2007.14658  [pdf, other]

    cs.CV

    Meta-Learning with Context-Agnostic Initialisations

    Authors: Toby Perrett, Alessandro Masullo, Tilo Burghardt, Majid Mirmehdi, Dima Damen

    Abstract: Meta-learning approaches have addressed few-shot problems by finding initialisations suited for fine-tuning to target tasks. Often there are additional properties within training data (which we refer to as context), not relevant to the target task, which act as a distractor to meta-learning, particularly when the target task contains examples from a novel context not seen during training. We addre…

    Submitted 22 October, 2020; v1 submitted 29 July, 2020; originally announced July 2020.

    Comments: Accepted at ACCV 2020

  11. Rescaling Egocentric Vision

    Authors: Dima Damen, Hazel Doughty, Giovanni Maria Farinella, Antonino Furnari, Evangelos Kazakos, Jian Ma, Davide Moltisanti, Jonathan Munro, Toby Perrett, Will Price, Michael Wray

    Abstract: This paper introduces the pipeline to extend the largest dataset in egocentric vision, EPIC-KITCHENS. The effort culminates in EPIC-KITCHENS-100, a collection of 100 hours, 20M frames, 90K actions in 700 variable-length videos, capturing long-term unscripted activities in 45 environments, using head-mounted cameras. Compared to its previous version, EPIC-KITCHENS-100 has been annotated using a nov…

    Submitted 17 September, 2021; v1 submitted 23 June, 2020; originally announced June 2020.

    Comments: Accepted at the International Journal of Computer Vision (IJCV). Dataset available from: http://epic-kitchens.github.io/

  12. arXiv:2005.00343  [pdf, other]

    cs.CV

    The EPIC-KITCHENS Dataset: Collection, Challenges and Baselines

    Authors: Dima Damen, Hazel Doughty, Giovanni Maria Farinella, Sanja Fidler, Antonino Furnari, Evangelos Kazakos, Davide Moltisanti, Jonathan Munro, Toby Perrett, Will Price, Michael Wray

    Abstract: Since its introduction in 2018, EPIC-KITCHENS has attracted attention as the largest egocentric video benchmark, offering a unique viewpoint on people's interaction with objects, their attention, and even intention. In this paper, we detail how this large-scale dataset was captured by 32 participants in their native kitchen environments, and densely annotated with actions and object interactions.…

    Submitted 29 April, 2020; originally announced May 2020.

    Comments: Preprint for paper at IEEE TPAMI. arXiv admin note: substantial text overlap with arXiv:1804.02748

  13. Sit-to-Stand Analysis in the Wild using Silhouettes for Longitudinal Health Monitoring

    Authors: Alessandro Masullo, Tilo Burghardt, Toby Perrett, Dima Damen, Majid Mirmehdi

    Abstract: We present the first fully automated Sit-to-Stand or Stand-to-Sit (StS) analysis framework for long-term monitoring of patients in free-living environments using video silhouettes. Our method adopts a coarse-to-fine time localisation approach, where a deep learning classifier identifies possible StS sequences from silhouettes, and a smart peak detection stage provides fine localisation based on 3D…

    Submitted 3 October, 2019; originally announced October 2019.

  14. arXiv:1904.08634  [pdf, other]

    cs.CV

    DDLSTM: Dual-Domain LSTM for Cross-Dataset Action Recognition

    Authors: Toby Perrett, Dima Damen

    Abstract: Domain alignment in convolutional networks aims to learn the degree of layer-specific feature alignment beneficial to the joint learning of source and target datasets. While increasingly popular in convolutional networks, there have been no previous attempts to achieve domain alignment in recurrent networks. Similar to spatial features, both source and target domains are likely to exhibit temporal…

    Submitted 18 April, 2019; originally announced April 2019.

    Comments: To appear in CVPR 2019

  15. arXiv:1810.06704  [pdf, ps, other]

    math.CO cs.DM

    Colouring Graphs with Sparse Neighbourhoods: Bounds and Applications

    Authors: Marthe Bonamy, Thomas Perrett, Luke Postle

    Abstract: Let $G$ be a graph with chromatic number $\chi$, maximum degree $\Delta$ and clique number $\omega$. Reed's conjecture states that $\chi \leq \lceil (1-\varepsilon)(\Delta + 1) + \varepsilon\omega \rceil$ for all $\varepsilon \leq 1/2$. It was shown by King and Reed that, provided $\Delta$ is large enough, the conjecture holds for $\varepsilon \leq 1/130,000$. In this article, we show that the same statement holds for…

    Submitted 15 October, 2018; originally announced October 2018.

    Comments: Submitted for publication in July 2016

  16. arXiv:1804.02748  [pdf, other]

    cs.CV

    Scaling Egocentric Vision: The EPIC-KITCHENS Dataset

    Authors: Dima Damen, Hazel Doughty, Giovanni Maria Farinella, Sanja Fidler, Antonino Furnari, Evangelos Kazakos, Davide Moltisanti, Jonathan Munro, Toby Perrett, Will Price, Michael Wray

    Abstract: First-person vision is gaining interest as it offers a unique viewpoint on people's interaction with objects, their attention, and even intention. However, progress in this challenging domain has been relatively slow due to the lack of sufficiently large datasets. In this paper, we introduce EPIC-KITCHENS, a large-scale egocentric video benchmark recorded by 32 participants in their native kitchen…

    Submitted 31 July, 2018; v1 submitted 8 April, 2018; originally announced April 2018.

    Comments: European Conference on Computer Vision (ECCV) 2018. Dataset and project page: http://epic-kitchens.github.io

  17. arXiv:1609.06257  [pdf, ps, other]

    math.CO cs.DM

    Gallai's path decomposition conjecture for graphs of small maximum degree

    Authors: Marthe Bonamy, Thomas Perrett

    Abstract: Gallai's path decomposition conjecture states that the edges of any connected graph on n vertices can be decomposed into at most (n+1)/2 paths. We confirm that conjecture for all graphs with maximum degree at most five.

    Submitted 20 September, 2016; originally announced September 2016.

    Comments: 11 pages, 11 figures, submitted

  18. arXiv:1512.07080  [pdf, other]

    cs.CV

    Cost-based Feature Transfer for Vehicle Occupant Classification

    Authors: Toby Perrett, Majid Mirmehdi, Eduardo Dias

    Abstract: Knowledge of human presence and interaction in a vehicle is of growing interest to vehicle manufacturers for design and safety purposes. We present a framework to perform the tasks of occupant detection and occupant classification for automatic child locks and airbag suppression. It operates for all passenger seats, using a single overhead camera. A transfer learning technique is introduced to mak…

    Submitted 22 December, 2015; originally announced December 2015.

    Comments: 9 pages, 4 figures, 5 tables

    ACM Class: I.4.9