
Showing 1–1 of 1 results for author: Gosh, G

  1. arXiv:2311.01615

    cs.SD cs.CL eess.AS

    FLAP: Fast Language-Audio Pre-training

    Authors: Ching-Feng Yeh, Po-Yao Huang, Vasu Sharma, Shang-Wen Li, Gargi Gosh

    Abstract: We propose Fast Language-Audio Pre-training (FLAP), a self-supervised approach that efficiently and effectively learns aligned audio and language representations through masking, contrastive learning and reconstruction. For efficiency, FLAP randomly drops audio spectrogram tokens, focusing solely on the remaining ones for self-supervision. Through inter-modal contrastive learning, FLAP learns to a…

    Submitted 2 November, 2023; originally announced November 2023.

    Comments: 6 pages
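
The abstract's efficiency step, randomly dropping audio spectrogram tokens and training only on the ones that remain, can be illustrated with a minimal sketch. This is not the authors' implementation: the function name `drop_tokens`, the `keep_ratio` parameter and its default value, and the use of plain Python lists in place of spectrogram patch embeddings are all assumptions made for illustration.

```python
import random


def drop_tokens(tokens, keep_ratio=0.25, seed=None):
    """Randomly keep a fraction of tokens, preserving their original order.

    Hypothetical helper: `keep_ratio` and its default are illustrative
    assumptions, not values taken from the paper.
    Returns (kept_tokens, kept_indices).
    """
    if not 0.0 < keep_ratio <= 1.0:
        raise ValueError("keep_ratio must be in (0, 1]")
    if not tokens:
        return [], []
    rng = random.Random(seed)
    # Always keep at least one token so the encoder has some input.
    n_keep = max(1, round(len(tokens) * keep_ratio))
    # Sample distinct positions, then sort so temporal order is preserved.
    kept_idx = sorted(rng.sample(range(len(tokens)), n_keep))
    return [tokens[i] for i in kept_idx], kept_idx


# Stand-in for a sequence of 8 spectrogram patch tokens.
patches = [f"patch_{i}" for i in range(8)]
kept, idx = drop_tokens(patches, keep_ratio=0.25, seed=0)
print(len(kept), idx)
```

Only the kept tokens would then be passed to the audio encoder, which is where the compute saving comes from; returning the kept indices as well lets a later reconstruction step know which positions were dropped.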