Showing 1–2 of 2 results for author: McNulty, L

  1. arXiv:2312.08494  [pdf, other]

    cs.SD cs.LG eess.AS

    PerMod: Perceptually Grounded Voice Modification with Latent Diffusion Models

    Authors: Robin Netzorg, Ajil Jalal, Luna McNulty, Gopala Krishna Anumanchipalli

    Abstract: Perceptual modification of voice is an elusive goal. While non-experts can modify an image or sentence perceptually with available tools, it is not clear how to similarly modify speech along perceptual axes. Voice conversion does make it possible to convert one voice to another, but these modifications are handled by black box models, and the specifics of what perceptual qualities to modify and ho…

    Submitted 13 December, 2023; originally announced December 2023.

  2. arXiv:2310.02497  [pdf, other]

    cs.SD cs.LG eess.AS

    Towards an Interpretable Representation of Speaker Identity via Perceptual Voice Qualities

    Authors: Robin Netzorg, Bohan Yu, Andrea Guzman, Peter Wu, Luna McNulty, Gopala Anumanchipalli

    Abstract: Unlike other data modalities such as text and vision, speech does not lend itself to easy interpretation. While lay people can understand how to describe an image or sentence via perception, non-expert descriptions of speech often end at high-level demographic information, such as gender or age. In this paper, we propose a possible interpretable representation of speaker identity based on perceptu…

    Submitted 3 October, 2023; originally announced October 2023.