Showing 51–58 of 58 results for author: Arik, S

  1. arXiv:1806.07912  [pdf, other]

    cs.NE cs.AI

    Resource-Efficient Neural Architect

    Authors: Yanqi Zhou, Siavash Ebrahimi, Sercan Ö. Arık, Haonan Yu, Hairong Liu, Greg Diamos

    Abstract: Neural Architecture Search (NAS) is a laborious process. Prior work on automated NAS targets mainly on improving accuracy, but lacks consideration of computational resource use. We propose the Resource-Efficient Neural Architect (RENA), an efficient resource-constrained NAS using reinforcement learning with network embedding. RENA uses a policy network to process the network embeddings to generate…

    Submitted 12 June, 2018; originally announced June 2018.

  2. arXiv:1802.06006  [pdf, other]

    cs.CL cs.LG cs.SD eess.AS

    Neural Voice Cloning with a Few Samples

    Authors: Sercan O. Arik, Jitong Chen, Kainan Peng, Wei Ping, Yanqi Zhou

    Abstract: Voice cloning is a highly desired feature for personalized speech interfaces. Neural network based speech synthesis has been shown to generate high quality speech for a large number of speakers. In this paper, we introduce a neural voice cloning system that takes a few audio samples as input. We study two approaches: speaker adaptation and speaker encoding. Speaker adaptation is based on fine-tuni…

    Submitted 12 October, 2018; v1 submitted 14 February, 2018; originally announced February 2018.

  3. arXiv:1710.07654  [pdf, other]

    cs.SD cs.AI cs.CL cs.LG eess.AS

    Deep Voice 3: Scaling Text-to-Speech with Convolutional Sequence Learning

    Authors: Wei Ping, Kainan Peng, Andrew Gibiansky, Sercan O. Arik, Ajay Kannan, Sharan Narang, Jonathan Raiman, John Miller

    Abstract: We present Deep Voice 3, a fully-convolutional attention-based neural text-to-speech (TTS) system. Deep Voice 3 matches state-of-the-art neural speech synthesis systems in naturalness while training ten times faster. We scale Deep Voice 3 to data set sizes unprecedented for TTS, training on more than eight hundred hours of audio from over two thousand speakers. In addition, we identify common erro…

    Submitted 22 February, 2018; v1 submitted 20 October, 2017; originally announced October 2017.

    Comments: Published as a conference paper at ICLR 2018. (v3 changed paper title)

  4. Low-complexity implementation of convex optimization-based phase retrieval

    Authors: Sercan O. Arik, Joseph M. Kahn

    Abstract: Phase retrieval has important applications in optical imaging, communications and sensing. Lifting the dimensionality of the problem allows phase retrieval to be approximated as a convex optimization problem in a higher-dimensional space. Convex optimization-based phase retrieval has been shown to yield high accuracy, yet its low-complexity implementation has not been explored. In this paper, we s…

    Submitted 19 March, 2018; v1 submitted 18 July, 2017; originally announced July 2017.

  5. arXiv:1705.08947  [pdf, other]

    cs.CL

    Deep Voice 2: Multi-Speaker Neural Text-to-Speech

    Authors: Sercan Arik, Gregory Diamos, Andrew Gibiansky, John Miller, Kainan Peng, Wei Ping, Jonathan Raiman, Yanqi Zhou

    Abstract: We introduce a technique for augmenting neural text-to-speech (TTS) with low-dimensional trainable speaker embeddings to generate different voices from a single model. As a starting point, we show improvements over the two state-of-the-art approaches for single-speaker neural TTS: Deep Voice 1 and Tacotron. We introduce Deep Voice 2, which is based on a similar pipeline with Deep Voice 1, but constr…

    Submitted 20 September, 2017; v1 submitted 24 May, 2017; originally announced May 2017.

    Comments: Accepted in NIPS 2017

  6. arXiv:1703.05390  [pdf]

    cs.CL cs.AI cs.LG

    Convolutional Recurrent Neural Networks for Small-Footprint Keyword Spotting

    Authors: Sercan O. Arik, Markus Kliegl, Rewon Child, Joel Hestness, Andrew Gibiansky, Chris Fougner, Ryan Prenger, Adam Coates

    Abstract: Keyword spotting (KWS) constitutes a major component of human-technology interfaces. Maximizing the detection accuracy at a low false alarm (FA) rate, while minimizing the footprint size, latency and complexity are the goals for KWS. Towards achieving them, we study Convolutional Recurrent Neural Networks (CRNNs). Inspired by large-scale state-of-the-art speech recognition systems, we combine the…

    Submitted 4 July, 2017; v1 submitted 15 March, 2017; originally announced March 2017.

    Comments: Accepted to Interspeech 2017

  7. arXiv:1702.07825  [pdf, other]

    cs.CL cs.LG cs.NE cs.SD

    Deep Voice: Real-time Neural Text-to-Speech

    Authors: Sercan O. Arik, Mike Chrzanowski, Adam Coates, Gregory Diamos, Andrew Gibiansky, Yongguo Kang, Xian Li, John Miller, Andrew Ng, Jonathan Raiman, Shubho Sengupta, Mohammad Shoeybi

    Abstract: We present Deep Voice, a production-quality text-to-speech system constructed entirely from deep neural networks. Deep Voice lays the groundwork for truly end-to-end neural speech synthesis. The system comprises five major building blocks: a segmentation model for locating phoneme boundaries, a grapheme-to-phoneme conversion model, a phoneme duration prediction model, a fundamental frequency predi…

    Submitted 7 March, 2017; v1 submitted 24 February, 2017; originally announced February 2017.

    Comments: Submitted to ICML 2017

  8. arXiv:1406.0824  [pdf, other]

    q-fin.ST cs.CE cs.LG q-fin.PM stat.ML

    Supervised classification-based stock prediction and portfolio optimization

    Authors: Sercan Arik, Sukru Burc Eryilmaz, Adam Goldberg

    Abstract: As the number of publicly traded companies as well as the amount of their financial data grows rapidly, it is highly desired to have tracking, analysis, and eventually stock selections automated. There have been few works focusing on estimating the stock prices of individual companies. However, many of those have worked with very small number of financial parameters. In this work, we apply machine…

    Submitted 3 June, 2014; originally announced June 2014.