Showing 1–1 of 1 results for author: Valcarce, D

Search v0.5.6 released 2020-02-24

arXiv:2010.10203

cs.LG cs.CL cs.SD eess.AS

Replacing Human Audio with Synthetic Audio for On-device Unspoken Punctuation Prediction

Authors: Daria Soboleva, Ondrej Skopek, Márius Šajgalík, Victor Cărbune, Felix Weissenberger, Julia Proskurnia, Bogdan Prisacari, Daniel Valcarce, Justin Lu, Rohit Prabhavalkar, Balint Miklos

Abstract: We present a novel multi-modal unspoken punctuation prediction system for the English language which combines acoustic and text features. We demonstrate for the first time, that by relying exclusively on synthetic data generated using a prosody-aware text-to-speech system, we can outperform a model trained with expensive human audio recordings on the unspoken punctuation prediction problem. Our model architecture is well suited for on-device use. This is achieved by leveraging hash-based embeddings of automatic speech recognition text output in conjunction with acoustic features as input to a quasi-recurrent neural network, keeping the model size small and latency low.

Submitted 10 February, 2021; v1 submitted 20 October, 2020; originally announced October 2020.

Comments: Accepted to IEEE ICASSP 2021
