
Showing 1–1 of 1 results for author: Lagos, N

Searching in archive cs.
  1. arXiv:2406.06371

    cs.CL cs.SD eess.AS

    mHuBERT-147: A Compact Multilingual HuBERT Model

    Authors: Marcely Zanon Boito, Vivek Iyer, Nikolaos Lagos, Laurent Besacier, Ioan Calapodescu

    Abstract: We present mHuBERT-147, the first general-purpose massively multilingual HuBERT speech representation model trained on 90K hours of clean, open-license data. To scale up the multi-iteration HuBERT approach, we use faiss-based clustering, achieving 5.2x faster label assignment than the original method. We also apply a new multilingual batching up-sampling strategy, leveraging both language and data…

    Submitted 27 June, 2024; v1 submitted 10 June, 2024; originally announced June 2024.

    Comments: Extended version of the Interspeech 2024 paper of same name
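The "label assignment" mentioned in the abstract refers to the HuBERT step in which frame-level features are clustered with k-means and each frame is labelled with the index of its nearest centroid; those discrete labels serve as training targets for the next iteration. The sketch below illustrates that general technique using plain NumPy Lloyd iterations as a stand-in for the faiss-based clustering the authors describe. It is not the mHuBERT-147 implementation, and the function names `kmeans_labels` and `assign` are hypothetical.

```python
import numpy as np


def assign(x: np.ndarray, centroids: np.ndarray) -> np.ndarray:
    """Return, for each row of x, the index of its nearest centroid."""
    # Squared Euclidean distance expanded as ||x||^2 - 2 x.c + ||c||^2,
    # giving an (n_frames, k) distance matrix without explicit loops.
    d = (x ** 2).sum(axis=1, keepdims=True) - 2 * x @ centroids.T + (centroids ** 2).sum(axis=1)
    return d.argmin(axis=1)


def kmeans_labels(features: np.ndarray, k: int, n_iter: int = 20, seed: int = 0):
    """Hypothetical helper: cluster frame features with plain k-means.

    Returns (centroids, labels), where labels[i] is the cluster index
    used as the discrete pseudo-label for frame i.
    """
    rng = np.random.default_rng(seed)
    x = np.asarray(features, dtype=np.float32)
    # Initialise centroids from k distinct randomly chosen frames.
    centroids = x[rng.choice(len(x), size=k, replace=False)].copy()
    for _ in range(n_iter):
        labels = assign(x, centroids)
        for j in range(k):
            members = x[labels == j]
            if len(members):  # keep the previous centroid if a cluster empties
                centroids[j] = members.mean(axis=0)
    return centroids, assign(x, centroids)


if __name__ == "__main__":
    # Toy "frame features": two well-separated groups of 4-dim vectors.
    rng = np.random.default_rng(1)
    feats = np.vstack([rng.normal(0, 0.1, (50, 4)), rng.normal(10, 0.1, (50, 4))])
    centroids, labels = kmeans_labels(feats, k=2)
    print(centroids.shape, np.bincount(labels))
```

At the scale of 90K hours of audio, both the centroid updates and the nearest-centroid search become the bottleneck; the abstract credits its faiss-based clustering with 5.2x faster label assignment than the original method.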