Showing 1–1 of 1 results for author: Carpuat, M

  1. arXiv:2403.14402  [pdf, other]

    cs.SD cs.CL eess.AS

    XLAVS-R: Cross-Lingual Audio-Visual Speech Representation Learning for Noise-Robust Speech Perception

    Authors: HyoJung Han, Mohamed Anwar, Juan Pino, Wei-Ning Hsu, Marine Carpuat, Bowen Shi, Changhan Wang

    Abstract: Speech recognition and translation systems perform poorly on noisy inputs, which are frequent in realistic environments. Augmenting these systems with visual signals has the potential to improve robustness to noise. However, audio-visual (AV) data is only available in limited amounts and for fewer languages than audio-only resources. To address this gap, we present XLAVS-R, a cross-lingual audio-v…

    Submitted 21 March, 2024; originally announced March 2024.