Showing 1–1 of 1 results for author: Bastas, G

  1. arXiv:2304.02916  [pdf, other]

    cs.SD cs.LG eess.AS

    Efficient Audio Captioning Transformer with Patchout and Text Guidance

    Authors: Thodoris Kouzelis, Grigoris Bastas, Athanasios Katsamanis, Alexandros Potamianos

    Abstract: Automated audio captioning is a multi-modal translation task that aims to generate textual descriptions for a given audio clip. In this paper, we propose a full Transformer architecture that utilizes Patchout as proposed in [1], significantly reducing the computational complexity and avoiding overfitting. The caption generation is partly conditioned on textual AudioSet tags extracted by a pre-trained…

    Submitted 6 April, 2023; originally announced April 2023.

    Comments: 5 pages, 1 figure

    ACM Class: F.2.2; I.2.7