Showing 1–5 of 5 results for author: Gibiansky, A

  1. arXiv:2112.03099  [pdf, other]

    cs.SD cs.CL eess.AS

    VocBench: A Neural Vocoder Benchmark for Speech Synthesis

    Authors: Ehab A. AlBadawy, Andrew Gibiansky, Qing He, Jilong Wu, Ming-Ching Chang, Siwei Lyu

    Abstract: Neural vocoders, used for converting the spectral representations of an audio signal to the waveforms, are a commonly used component in speech synthesis pipelines. It focuses on synthesizing waveforms from low-dimensional representation, such as Mel-Spectrograms. In recent years, different approaches have been introduced to develop such vocoders. However, it becomes more challenging to assess thes…

    Submitted 6 December, 2021; originally announced December 2021.

    Comments: To appear in ICASSP 2022

  2. arXiv:1710.07654  [pdf, other]

    cs.SD cs.AI cs.CL cs.LG eess.AS

    Deep Voice 3: Scaling Text-to-Speech with Convolutional Sequence Learning

    Authors: Wei Ping, Kainan Peng, Andrew Gibiansky, Sercan O. Arik, Ajay Kannan, Sharan Narang, Jonathan Raiman, John Miller

    Abstract: We present Deep Voice 3, a fully-convolutional attention-based neural text-to-speech (TTS) system. Deep Voice 3 matches state-of-the-art neural speech synthesis systems in naturalness while training ten times faster. We scale Deep Voice 3 to data set sizes unprecedented for TTS, training on more than eight hundred hours of audio from over two thousand speakers. In addition, we identify common erro…

    Submitted 22 February, 2018; v1 submitted 20 October, 2017; originally announced October 2017.

    Comments: Published as a conference paper at ICLR 2018. (v3 changed paper title)

  3. arXiv:1705.08947  [pdf, other]

    cs.CL

    Deep Voice 2: Multi-Speaker Neural Text-to-Speech

    Authors: Sercan Arik, Gregory Diamos, Andrew Gibiansky, John Miller, Kainan Peng, Wei Ping, Jonathan Raiman, Yanqi Zhou

    Abstract: We introduce a technique for augmenting neural text-to-speech (TTS) with low-dimensional trainable speaker embeddings to generate different voices from a single model. As a starting point, we show improvements over the two state-of-the-art approaches for single-speaker neural TTS: Deep Voice 1 and Tacotron. We introduce Deep Voice 2, which is based on a similar pipeline with Deep Voice 1, but constr…

    Submitted 20 September, 2017; v1 submitted 24 May, 2017; originally announced May 2017.

    Comments: Accepted in NIPS 2017

  4. arXiv:1703.05390  [pdf]

    cs.CL cs.AI cs.LG

    Convolutional Recurrent Neural Networks for Small-Footprint Keyword Spotting

    Authors: Sercan O. Arik, Markus Kliegl, Rewon Child, Joel Hestness, Andrew Gibiansky, Chris Fougner, Ryan Prenger, Adam Coates

    Abstract: Keyword spotting (KWS) constitutes a major component of human-technology interfaces. Maximizing the detection accuracy at a low false alarm (FA) rate, while minimizing the footprint size, latency and complexity are the goals for KWS. Towards achieving them, we study Convolutional Recurrent Neural Networks (CRNNs). Inspired by large-scale state-of-the-art speech recognition systems, we combine the…

    Submitted 4 July, 2017; v1 submitted 15 March, 2017; originally announced March 2017.

    Comments: Accepted to Interspeech 2017

  5. arXiv:1702.07825  [pdf, other]

    cs.CL cs.LG cs.NE cs.SD

    Deep Voice: Real-time Neural Text-to-Speech

    Authors: Sercan O. Arik, Mike Chrzanowski, Adam Coates, Gregory Diamos, Andrew Gibiansky, Yongguo Kang, Xian Li, John Miller, Andrew Ng, Jonathan Raiman, Shubho Sengupta, Mohammad Shoeybi

    Abstract: We present Deep Voice, a production-quality text-to-speech system constructed entirely from deep neural networks. Deep Voice lays the groundwork for truly end-to-end neural speech synthesis. The system comprises five major building blocks: a segmentation model for locating phoneme boundaries, a grapheme-to-phoneme conversion model, a phoneme duration prediction model, a fundamental frequency predi…

    Submitted 7 March, 2017; v1 submitted 24 February, 2017; originally announced February 2017.

    Comments: Submitted to ICML 2017