Showing 1–2 of 2 results for author: Langman, R

  1. arXiv:2406.05298

    eess.AS

    Spectral Codecs: Spectrogram-Based Audio Codecs for High Quality Speech Synthesis

    Authors: Ryan Langman, Ante Jukić, Kunal Dhawan, Nithin Rao Koluguri, Boris Ginsburg

    Abstract: Historically, most speech models in machine-learning have used the mel-spectrogram as a speech representation. Recently, discrete audio tokens produced by neural audio codecs have become a popular alternate speech representation for speech synthesis tasks such as text-to-speech (TTS). However, the data distribution produced by such codecs is too complex for some TTS models to predict, hence requir…

    Submitted 7 June, 2024; originally announced June 2024.

  2. Improving fairness in speaker verification via Group-adapted Fusion Network

    Authors: Hua Shen, Yuguang Yang, Guoli Sun, Ryan Langman, Eunjung Han, Jasha Droppo, Andreas Stolcke

    Abstract: Modern speaker verification models use deep neural networks to encode utterance audio into discriminative embedding vectors. During the training process, these networks are typically optimized to differentiate arbitrary speakers. This learning process biases the learning of fine voice characteristics towards dominant demographic groups, which can lead to an unfair performance disparity across diff…

    Submitted 23 February, 2022; originally announced February 2022.

    Comments: To appear in Proc. IEEE ICASSP 2022

    Journal ref: Proc. IEEE ICASSP, May 2022, pp. 7077-7081