Showing 1–1 of 1 results for author: Southwell, R

Searching in archive eess.
  1. arXiv:2406.06582

    cs.CL cs.LG eess.AS

    Discrete Multimodal Transformers with a Pretrained Large Language Model for Mixed-Supervision Speech Processing

    Authors: Viet Anh Trinh, Rosy Southwell, Yiwen Guan, Xinlu He, Zhiyong Wang, Jacob Whitehill

    Abstract: Recent work on discrete speech tokenization has paved the way for models that can seamlessly perform multiple tasks across modalities, e.g., speech recognition, text to speech, speech to speech translation. Moreover, large language models (LLMs) pretrained from vast text corpora contain rich linguistic information that can improve accuracy in a variety of tasks. In this paper, we present a decoder…

    Submitted 25 June, 2024; v1 submitted 4 June, 2024; originally announced June 2024.