Showing 1–8 of 8 results for author: Clark, R

Searching in archive eess.
  1. arXiv:2403.15037  [pdf]

    eess.SY

    Implementation of Firm-Dispatchable Generation in South Africa

    Authors: Stephen R. Clark, Craig McGregor

    Abstract: South Africa is currently facing a critical situation in its power generation landscape, which is plagued by frequent power outages and the need to move from fossil fuels to renewable energy sources. This period emphasizes the importance of having firm-dispatchable power to balance out the intermittent nature of wind and solar energy sources. The paper proposes to repurpose old coal-fired power pl…

    Submitted 22 March, 2024; originally announced March 2024.

  2. arXiv:2403.10869  [pdf]

    eess.SY

    Firm-Dispatchable Power and its Requirement in a Power System based on Variable Generation

    Authors: Stephen R. Clark, Craig McGregor

    Abstract: Many countries have commenced a transition from fossil fuel-based electricity generation systems to sustainable systems based on wind and solar generation. It is often noted that the least cost approach would involve a massive scale-up in the building of variable renewables, supported by battery storage and gas peaking plants. The required backup should be firm-dispatchable generation rather than…

    Submitted 16 March, 2024; originally announced March 2024.

  3. arXiv:2208.13183  [pdf, other]

    cs.SD eess.AS

    Training Text-To-Speech Systems From Synthetic Data: A Practical Approach For Accent Transfer Tasks

    Authors: Lev Finkelstein, Heiga Zen, Norman Casagrande, Chun-an Chan, Ye Jia, Tom Kenter, Alexey Petelin, Jonathan Shen, Vincent Wan, Yu Zhang, Yonghui Wu, Rob Clark

    Abstract: Transfer tasks in text-to-speech (TTS) synthesis - where one or more aspects of the speech of one set of speakers is transferred to another set of speakers that do not feature these aspects originally - remains a challenging task. One of the challenges is that models that have high-quality transfer capabilities can have issues in stability, making them impractical for user-facing critical tasks. T…

    Submitted 28 August, 2022; originally announced August 2022.

    Comments: To be published in Interspeech 2022

  4. arXiv:1911.01601  [pdf, other]

    eess.AS cs.CR cs.SD eess.SP

    ASVspoof 2019: A large-scale public database of synthesized, converted and replayed speech

    Authors: Xin Wang, Junichi Yamagishi, Massimiliano Todisco, Hector Delgado, Andreas Nautsch, Nicholas Evans, Md Sahidullah, Ville Vestman, Tomi Kinnunen, Kong Aik Lee, Lauri Juvela, Paavo Alku, Yu-Huai Peng, Hsin-Te Hwang, Yu Tsao, Hsin-Min Wang, Sebastien Le Maguer, Markus Becker, Fergus Henderson, Rob Clark, Yu Zhang, Quan Wang, Ye Jia, Kai Onuma, Koji Mushika, et al. (15 additional authors not shown)

    Abstract: Automatic speaker verification (ASV) is one of the most natural and convenient means of biometric person recognition. Unfortunately, just like all other biometric systems, ASV is vulnerable to spoofing, also referred to as "presentation attacks." These vulnerabilities are generally unacceptable and call for spoofing countermeasures or "presentation attack detection" systems. In addition to imperso…

    Submitted 14 July, 2020; v1 submitted 4 November, 2019; originally announced November 2019.

    Comments: Accepted, Computer Speech and Language. This manuscript version is made available under the CC-BY-NC-ND 4.0. For the published version on Elsevier website, please visit https://doi.org/10.1016/j.csl.2020.101114

  5. arXiv:1909.03965  [pdf, ps, other]

    eess.AS cs.CL cs.LG cs.SD

    Evaluating Long-form Text-to-Speech: Comparing the Ratings of Sentences and Paragraphs

    Authors: Rob Clark, Hanna Silen, Tom Kenter, Ralph Leith

    Abstract: Text-to-speech systems are typically evaluated on single sentences. When long-form content, such as data consisting of full paragraphs or dialogues is considered, evaluating sentences in isolation is not always appropriate as the context in which the sentences are synthesized is missing. In this paper, we investigate three different ways of evaluating the naturalness of long-form text-to-speech sy…

    Submitted 9 September, 2019; originally announced September 2019.

    Comments: Accepted for The 10th ISCA Speech Synthesis Workshop (SSW10), 6 pages

  6. arXiv:1905.07195  [pdf, other]

    cs.CL cs.SD eess.AS

    CHiVE: Varying Prosody in Speech Synthesis with a Linguistically Driven Dynamic Hierarchical Conditional Variational Network

    Authors: Vincent Wan, Chun-an Chan, Tom Kenter, Jakub Vit, Rob Clark

    Abstract: The prosodic aspects of speech signals produced by current text-to-speech systems are typically averaged over training material, and as such lack the variety and liveliness found in natural speech. To avoid monotony and averaged prosody contours, it is desirable to have a way of modeling the variation in the prosodic aspects of speech, so audio signals can be synthesized in multiple ways for a giv…

    Submitted 4 June, 2019; v1 submitted 17 May, 2019; originally announced May 2019.

  7. arXiv:1904.02882  [pdf, other]

    cs.SD eess.AS

    LibriTTS: A Corpus Derived from LibriSpeech for Text-to-Speech

    Authors: Heiga Zen, Viet Dang, Rob Clark, Yu Zhang, Ron J. Weiss, Ye Jia, Zhifeng Chen, Yonghui Wu

    Abstract: This paper introduces a new speech corpus called "LibriTTS" designed for text-to-speech use. It is derived from the original audio and text materials of the LibriSpeech corpus, which has been used for training and evaluating automatic speech recognition systems. The new corpus inherits desired properties of the LibriSpeech corpus while addressing a number of issues which make LibriSpeech less than…

    Submitted 5 April, 2019; originally announced April 2019.

    Comments: Submitted for Interspeech 2019, 7 pages

  8. arXiv:1803.09047  [pdf, other]

    cs.CL cs.LG cs.SD eess.AS

    Towards End-to-End Prosody Transfer for Expressive Speech Synthesis with Tacotron

    Authors: RJ Skerry-Ryan, Eric Battenberg, Ying Xiao, Yuxuan Wang, Daisy Stanton, Joel Shor, Ron J. Weiss, Rob Clark, Rif A. Saurous

    Abstract: We present an extension to the Tacotron speech synthesis architecture that learns a latent embedding space of prosody, derived from a reference acoustic representation containing the desired prosody. We show that conditioning Tacotron on this learned embedding space results in synthesized audio that matches the prosody of the reference signal with fine time detail even when the reference and synth…

    Submitted 23 March, 2018; originally announced March 2018.