Showing 1–3 of 3 results for author: Dawes, R

Searching in archive cs.
  1. arXiv:2406.06499  [pdf, other]

    cs.CV cs.HC

    NarrativeBridge: Enhancing Video Captioning with Causal-Temporal Narrative

    Authors: Asmar Nadeem, Faegheh Sardari, Robert Dawes, Syed Sameed Husain, Adrian Hilton, Armin Mustafa

    Abstract: Existing video captioning benchmarks and models lack coherent representations of causal-temporal narrative, which is sequences of events linked through cause and effect, unfolding over time and driven by characters or agents. This lack of narrative restricts models' ability to generate text descriptions that capture the causal and temporal dynamics inherent in video content. To address this gap, w…

    Submitted 10 June, 2024; originally announced June 2024.

  2. arXiv:2310.16754  [pdf, other]

    cs.CV

    CAD -- Contextual Multi-modal Alignment for Dynamic AVQA

    Authors: Asmar Nadeem, Adrian Hilton, Robert Dawes, Graham Thomas, Armin Mustafa

    Abstract: In the context of Audio Visual Question Answering (AVQA) tasks, the audio visual modalities could be learnt on three levels: 1) Spatial, 2) Temporal, and 3) Semantic. Existing AVQA methods suffer from two major shortcomings; the audio-visual (AV) information passing through the network isn't aligned on Spatial and Temporal levels; and, inter-modal (audio and visual) Semantic information is often n…

    Submitted 27 October, 2023; v1 submitted 25 October, 2023; originally announced October 2023.

    Comments: Accepted to IEEE/CVF Winter Conference on Applications of Computer Vision (WACV) 2024

  3. arXiv:2303.14829  [pdf, other]

    cs.CV

    SEM-POS: Grammatically and Semantically Correct Video Captioning

    Authors: Asmar Nadeem, Adrian Hilton, Robert Dawes, Graham Thomas, Armin Mustafa

    Abstract: Generating grammatically and semantically correct captions in video captioning is a challenging task. The captions generated from the existing methods are either word-by-word that do not align with grammatical structure or miss key information from the input videos. To address these issues, we introduce a novel global-local fusion network, with a Global-Local Fusion Block (GLFB) that encodes and f…

    Submitted 4 April, 2023; v1 submitted 26 March, 2023; originally announced March 2023.