Showing 1–3 of 3 results for author: Fong, J

Searching in archive eess.
  1. arXiv:2210.16045  [pdf, other]

    cs.SD cs.CL eess.AS

    Towards zero-shot Text-based voice editing using acoustic context conditioning, utterance embeddings, and reference encoders

    Authors: Jason Fong, Yun Wang, Prabhav Agrawal, Vimal Manohar, Jilong Wu, Thilo Köhler, Qing He

    Abstract: Text-based voice editing (TBVE) uses synthetic output from text-to-speech (TTS) systems to replace words in an original recording. Recent work has used neural models to produce edited speech that is similar to the original speech in terms of clarity, speaker identity, and prosody. However, one limitation of prior work is the usage of finetuning to optimise performance: this requires further model…

    Submitted 28 October, 2022; originally announced October 2022.

    Comments: Submitted to ICASSP 2023

  2. arXiv:2105.01573  [pdf, other]

    eess.AS cs.SD

    Exploring Disentanglement with Multilingual and Monolingual VQ-VAE

    Authors: Jennifer Williams, Jason Fong, Erica Cooper, Junichi Yamagishi

    Abstract: This work examines the content and usefulness of disentangled phone and speaker representations from two separately trained VQ-VAE systems: one trained on multilingual data and another trained on monolingual data. We explore the multi- and monolingual models using four small proof-of-concept tasks: copy-synthesis, voice transformation, linguistic code-switching, and content-based privacy masking.…

    Submitted 28 June, 2021; v1 submitted 4 May, 2021; originally announced May 2021.

    Comments: Accepted to Speech Synthesis Workshop 2021 (SSW11)

  3. arXiv:1807.11893  [pdf, other]

    eess.AS cs.SD

    Manual Post-editing of Automatically Transcribed Speeches from the Icelandic Parliament - Althingi

    Authors: Judy Y. Fong, Michal Borsky, Inga R. Helgadóttir, Jon Gudnason

    Abstract: The design objectives for an automatic transcription system are to produce text readable by humans and to minimize the impact on manual post-editing. This study reports on a recognition system used for transcribing speeches in the Icelandic parliament - Althingi. It evaluates the system performance and its effect on manual post-editing. The results are compared against the original manual transcri…

    Submitted 31 July, 2018; originally announced July 2018.

    Comments: submitted to IEEE SLT 2018, Athens