Showing 1–4 of 4 results for author: Leary, R

Searching in archive eess.
  1. arXiv:1910.10261  [pdf, other]

    eess.AS

    QuartzNet: Deep Automatic Speech Recognition with 1D Time-Channel Separable Convolutions

    Authors: Samuel Kriman, Stanislav Beliaev, Boris Ginsburg, Jocelyn Huang, Oleksii Kuchaiev, Vitaly Lavrukhin, Ryan Leary, Jason Li, Yang Zhang

    Abstract: We propose a new end-to-end neural acoustic model for automatic speech recognition. The model is composed of multiple blocks with residual connections between them. Each block consists of one or more modules with 1D time-channel separable convolutional layers, batch normalization, and ReLU layers. It is trained with CTC loss. The proposed network achieves near state-of-the-art accuracy on LibriSpe…

    Submitted 22 October, 2019; originally announced October 2019.

    Comments: Submitted to ICASSP 2020

  2. arXiv:1910.10032  [pdf, ps, other]

    cs.CL eess.AS

    GPU-Accelerated Viterbi Exact Lattice Decoder for Batched Online and Offline Speech Recognition

    Authors: Hugo Braun, Justin Luitjens, Ryan Leary, Tim Kaldewey, Daniel Povey

    Abstract: We present an optimized weighted finite-state transducer (WFST) decoder capable of online streaming and offline batch processing of audio using Graphics Processing Units (GPUs). The decoder is efficient in memory utilization, input/output (I/O) bandwidth, and uses a novel Viterbi implementation designed to maximize parallelism. The reduced memory footprint allows the decoder to process significant…

    Submitted 13 February, 2020; v1 submitted 22 October, 2019; originally announced October 2019.

    Comments: Accepted to ICASSP 2020

  3. arXiv:1909.09577  [pdf, other]

    cs.LG cs.CL cs.SD eess.AS

    NeMo: a toolkit for building AI applications using Neural Modules

    Authors: Oleksii Kuchaiev, Jason Li, Huyen Nguyen, Oleksii Hrinchuk, Ryan Leary, Boris Ginsburg, Samuel Kriman, Stanislav Beliaev, Vitaly Lavrukhin, Jack Cook, Patrice Castonguay, Mariya Popova, Jocelyn Huang, Jonathan M. Cohen

    Abstract: NeMo (Neural Modules) is a Python framework-agnostic toolkit for creating AI applications through re-usability, abstraction, and composition. NeMo is built around neural modules, conceptual blocks of neural networks that take typed inputs and produce typed outputs. Such modules typically represent data layers, encoders, decoders, language models, loss functions, or methods of combining activations…

    Submitted 13 September, 2019; originally announced September 2019.

    Comments: 6 pages plus references

  4. arXiv:1904.03288  [pdf, other]

    eess.AS cs.CL cs.LG cs.SD

    Jasper: An End-to-End Convolutional Neural Acoustic Model

    Authors: Jason Li, Vitaly Lavrukhin, Boris Ginsburg, Ryan Leary, Oleksii Kuchaiev, Jonathan M. Cohen, Huyen Nguyen, Ravi Teja Gadde

    Abstract: In this paper, we report state-of-the-art results on LibriSpeech among end-to-end speech recognition models without any external training data. Our model, Jasper, uses only 1D convolutions, batch normalization, ReLU, dropout, and residual connections. To improve training, we further introduce a new layer-wise optimizer called NovoGrad. Through experiments, we demonstrate that the proposed deep arc…

    Submitted 26 August, 2019; v1 submitted 5 April, 2019; originally announced April 2019.

    Comments: Accepted to INTERSPEECH 2019