Showing 1–4 of 4 results for author: Zayats, V

Searching in archive eess.
  1. arXiv:2405.18669  [pdf, other]

    cs.LG cs.AI cs.CL eess.AS

    Zipper: A Multi-Tower Decoder Architecture for Fusing Modalities

    Authors: Vicky Zayats, Peter Chen, Melissa Ferrari, Dirk Padfield

    Abstract: Integrating multiple generative foundation models, especially those trained on different modalities, into something greater than the sum of its parts poses significant challenges. Two key hurdles are the availability of aligned data (concepts that contain similar meaning but are expressed differently in different modalities), and effectively leveraging unimodal representations in cross-domain gener…

    Submitted 31 May, 2024; v1 submitted 28 May, 2024; originally announced May 2024.

    Comments: Under review at NeurIPS

  2. arXiv:2306.12925  [pdf, other]

    cs.CL cs.AI cs.SD eess.AS stat.ML

    AudioPaLM: A Large Language Model That Can Speak and Listen

    Authors: Paul K. Rubenstein, Chulayuth Asawaroengchai, Duc Dung Nguyen, Ankur Bapna, Zalán Borsos, Félix de Chaumont Quitry, Peter Chen, Dalia El Badawy, Wei Han, Eugene Kharitonov, Hannah Muckenhirn, Dirk Padfield, James Qin, Danny Rozenberg, Tara Sainath, Johan Schalkwyk, Matt Sharifi, Michelle Tadmor Ramanovich, Marco Tagliasacchi, Alexandru Tudor, Mihajlo Velimirović, Damien Vincent, Jiahui Yu, Yongqiang Wang, Vicky Zayats, et al. (5 additional authors not shown)

    Abstract: We introduce AudioPaLM, a large language model for speech understanding and generation. AudioPaLM fuses text-based and speech-based language models, PaLM-2 [Anil et al., 2023] and AudioLM [Borsos et al., 2022], into a unified multimodal architecture that can process and generate text and speech with applications including speech recognition and speech-to-speech translation. AudioPaLM inherits the…

    Submitted 22 June, 2023; originally announced June 2023.

    Comments: Technical report

  3. arXiv:2109.06952  [pdf, other]

    cs.CL cs.SD eess.AS

    Residual Adapters for Parameter-Efficient ASR Adaptation to Atypical and Accented Speech

    Authors: Katrin Tomanek, Vicky Zayats, Dirk Padfield, Kara Vaillancourt, Fadi Biadsy

    Abstract: Automatic Speech Recognition (ASR) systems are often optimized to work best for speakers with canonical speech patterns. Unfortunately, these systems perform poorly when tested on atypical speech and heavily accented speech. It has previously been shown that personalization through model fine-tuning substantially improves performance. However, maintaining such large models per speaker is costly an…

    Submitted 14 September, 2021; originally announced September 2021.

    Comments: Accepted to EMNLP 2021

  4. arXiv:2005.03595  [pdf]

    physics.optics cond-mat.mtrl-sci eess.IV

    Machine learning-based diffractive imaging with subwavelength resolution

    Authors: Abantika Ghosh, Diane J. Roth, Luke H. Nicholls, William P. Wardley, Anatoly V. Zayats, Viktor A. Podolskiy

    Abstract: Far-field characterization of small objects is severely constrained by the diffraction limit. Existing tools achieving sub-diffraction resolution often utilize point-by-point image reconstruction via scanning or labelling. Here, we present a new imaging technique capable of fast and accurate characterization of two-dimensional structures with at least wavelength/25 resolution, based on a single fa…

    Submitted 7 May, 2020; originally announced May 2020.