Showing 1–5 of 5 results for author: Fish, E

  1. arXiv:2403.18915  [pdf, other]

    cs.CV cs.LG

    PLOT-TAL -- Prompt Learning with Optimal Transport for Few-Shot Temporal Action Localization

    Authors: Edward Fish, Jon Weinbren, Andrew Gilbert

    Abstract: This paper introduces a novel approach to temporal action localization (TAL) in few-shot learning. Our work addresses the inherent limitations of conventional single-prompt learning methods that often lead to overfitting due to the inability to generalize across varying contexts in real-world videos. Recognizing the diversity of camera views, backgrounds, and objects in videos, we propose a multi-…

    Submitted 27 March, 2024; originally announced March 2024.

    Comments: Under Review

  2. arXiv:2310.03456  [pdf, other]

    cs.CV cs.LG cs.MM

    Multi-Resolution Audio-Visual Feature Fusion for Temporal Action Localization

    Authors: Edward Fish, Jon Weinbren, Andrew Gilbert

    Abstract: Temporal Action Localization (TAL) aims to identify actions' start, end, and class labels in untrimmed videos. While recent advancements using transformer networks and Feature Pyramid Networks (FPN) have enhanced visual feature recognition in TAL tasks, less progress has been made in the integration of audio features into such frameworks. This paper introduces the Multi-Resolution Audio-Visual Fea…

    Submitted 5 October, 2023; originally announced October 2023.

    Comments: Under Review

  3. arXiv:2307.12659  [pdf, other]

    cs.SD cs.CL eess.AS

    A Model for Every User and Budget: Label-Free and Personalized Mixed-Precision Quantization

    Authors: Edward Fish, Umberto Michieli, Mete Ozay

    Abstract: Recent advancement in Automatic Speech Recognition (ASR) has produced large AI models, which become impractical for deployment in mobile devices. Model quantization is effective to produce compressed general-purpose models, however such models may only be deployed to a restricted sub-domain of interest. We show that ASR models can be personalized during quantization while relying on just a small s…

    Submitted 11 February, 2024; v1 submitted 24 July, 2023; originally announced July 2023.

    Comments: INTERSPEECH 2023. Code is available at https://github.com/SamsungLabs/myQASR

  4. arXiv:2208.01753  [pdf, other]

    cs.CV cs.LG cs.MM

    Two-Stream Transformer Architecture for Long Video Understanding

    Authors: Edward Fish, Jon Weinbren, Andrew Gilbert

    Abstract: Pure vision transformer architectures are highly effective for short video classification and action recognition tasks. However, due to the quadratic complexity of self attention and lack of inductive bias, transformers are resource intensive and suffer from data inefficiencies. Long form video understanding tasks amplify data and memory efficiency problems in transformers making current approache…

    Submitted 2 August, 2022; originally announced August 2022.

  5. arXiv:2012.02639  [pdf, other]

    cs.CV cs.IR cs.LG cs.MM

    Rethinking movie genre classification with fine-grained semantic clustering

    Authors: Edward Fish, Jon Weinbren, Andrew Gilbert

    Abstract: Movie genre classification is an active research area in machine learning. However, due to the limited labels available, there can be large semantic variations between movies within a single genre definition. We expand these 'coarse' genre labels by identifying 'fine-grained' semantic information within the multi-modal content of movies. By leveraging pre-trained 'expert' networks, we learn the in…

    Submitted 20 January, 2021; v1 submitted 4 December, 2020; originally announced December 2020.