
Showing 1–5 of 5 results for author: Tomar, V S

  1. arXiv:2010.07895  [pdf, other]

    cs.SD eess.AS

    Deep Convolutional Neural Network-based Inverse Filtering Approach for Speech De-reverberation

    Authors: Hanwook Chung, Vikrant Singh Tomar, Benoit Champagne

    Abstract: In this paper, we introduce a spectral-domain inverse filtering approach for single-channel speech de-reverberation using deep convolutional neural network (CNN). The main goal is to better handle realistic reverberant conditions where the room impulse response (RIR) filter is longer than the short-time Fourier transform (STFT) analysis window. To this end, we consider the convolutive transfer fun…

    Submitted 15 October, 2020; originally announced October 2020.

  2. arXiv:1904.03670  [pdf, other]

    eess.AS cs.CL cs.LG cs.SD

    Speech Model Pre-training for End-to-End Spoken Language Understanding

    Authors: Loren Lugosch, Mirco Ravanelli, Patrick Ignoto, Vikrant Singh Tomar, Yoshua Bengio

    Abstract: Whereas conventional spoken language understanding (SLU) systems map speech to text, and then text to intent, end-to-end SLU systems map speech directly to intent through a single trainable model. Achieving high accuracy with these end-to-end models without a large amount of training data is difficult. We propose a method to reduce the data requirements of end-to-end SLU in which the model is firs…

    Submitted 25 July, 2019; v1 submitted 7 April, 2019; originally announced April 2019.

    Comments: Accepted to Interspeech 2019

  3. arXiv:1811.10736  [pdf, other]

    cs.LG cs.SD eess.AS stat.ML

    DONUT: CTC-based Query-by-Example Keyword Spotting

    Authors: Loren Lugosch, Samuel Myer, Vikrant Singh Tomar

    Abstract: Keyword spotting--or wakeword detection--is an essential feature for hands-free operation of modern voice-controlled devices. With such devices becoming ubiquitous, users might want to choose a personalized custom wakeword. In this work, we present DONUT, a CTC-based algorithm for online query-by-example keyword spotting that enables custom wakeword detection. The algorithm works by recording a sm…

    Submitted 26 November, 2018; originally announced November 2018.

    Comments: Accepted to NeurIPS 2018 Workshop on Interpretability and Robustness for Audio, Speech, and Language

  4. arXiv:1807.04353  [pdf, other]

    eess.AS cs.SD

    Efficient keyword spotting using time delay neural networks

    Authors: Samuel Myer, Vikrant Singh Tomar

    Abstract: This paper describes a novel method of live keyword spotting using a two-stage time delay neural network. The model is trained using transfer learning: initial training with phone targets from a large speech corpus is followed by training with keyword targets from a smaller data set. The accuracy of the system is evaluated on two separate tasks. The first is the freely available Google Speech Comm…

    Submitted 28 August, 2018; v1 submitted 11 July, 2018; originally announced July 2018.

    Comments: Will appear in Interspeech 2018

  5. arXiv:1807.02465  [pdf, other]

    eess.AS cs.SD

    Tone Recognition Using Lifters and CTC

    Authors: Loren Lugosch, Vikrant Singh Tomar

    Abstract: In this paper, we present a new method for recognizing tones in continuous speech for tonal languages. The method works by converting the speech signal to a cepstrogram, extracting a sequence of cepstral features using a convolutional neural network, and predicting the underlying sequence of tones using a connectionist temporal classification (CTC) network. The performance of the proposed method i…

    Submitted 6 July, 2018; originally announced July 2018.

    Comments: Accepted to Interspeech 2018