
Showing 1–15 of 15 results for author: Omologo, M

  1. arXiv:2303.00692

    eess.AS

    Leveraging Redundancy in Multiple Audio Signals for Far-Field Speech Recognition

    Authors: Feng-Ju Chang, Anastasios Alexandridis, Rupak Vignesh Swaminathan, Martin Radfar, Harish Mallidi, Maurizio Omologo, Athanasios Mouchtaris, Brian King, Roland Maas

    Abstract: To achieve robust far-field automatic speech recognition (ASR), existing techniques typically employ an acoustic front end (AFE) cascaded with a neural transducer (NT) ASR model. The AFE output, however, can be unreliable, for example when the beamforming output in the AFE is steered in the wrong direction. A promising way to address this issue is to exploit the microphone signals before the beamforming stage and a…

    Submitted 1 March, 2023; originally announced March 2023.
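The abstract above notes that the AFE output can degrade when the beamformer is steered in the wrong direction. As a hedged illustration only (a generic delay-and-sum beamformer, not the AFE used in the paper), the sketch below simulates a far-field white-noise source on a hypothetical 4-microphone uniform linear array with whole-sample circular delays, then compares output power when the beam is steered at the true angle versus a wrong one. The array geometry, angles, and function names are assumptions chosen for illustration.

```python
import math
import random

def steering_delays(num_mics, spacing_m, angle_deg, fs, c=343.0):
    """Whole-sample plane-wave delays across a uniform linear array (assumed geometry)."""
    theta = math.radians(angle_deg)
    return [round(m * spacing_m * math.sin(theta) * fs / c) for m in range(num_mics)]

def delay_and_sum(signals, delays):
    """Advance each channel by its steering delay (circularly) and average the channels."""
    n = len(signals[0])
    return [sum(sig[(t + d) % n] for sig, d in zip(signals, delays)) / len(signals)
            for t in range(n)]

def power(x):
    """Mean squared amplitude of a signal."""
    return sum(v * v for v in x) / len(x)

# Hypothetical setup: white-noise source arriving from +40 degrees.
random.seed(0)
fs, n = 16000, 256
source = [random.gauss(0.0, 1.0) for _ in range(n)]
true_delays = steering_delays(4, 0.05, 40.0, fs)
mics = [[source[(t - d) % n] for t in range(n)] for d in true_delays]

steered_right = delay_and_sum(mics, true_delays)
steered_wrong = delay_and_sum(mics, steering_delays(4, 0.05, -40.0, fs))
print(f"source power:       {power(source):.3f}")
print(f"steered to +40 deg: {power(steered_right):.3f}")
print(f"steered to -40 deg: {power(steered_wrong):.3f}")
```

When the steering matches the arrival direction the channels add coherently and the source is recovered; when it does not, the misaligned channels partially cancel, which illustrates why a mis-steered beam can weaken the signal handed to the recognizer.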

  2. arXiv:2205.05590

    cs.CL cs.SD eess.AS

    A neural prosody encoder for end-to-end dialogue act classification

    Authors: Kai Wei, Dillon Knox, Martin Radfar, Thanh Tran, Markus Muller, Grant P. Strimel, Nathan Susanj, Athanasios Mouchtaris, Maurizio Omologo

    Abstract: Dialogue act classification (DAC) is a critical task for spoken language understanding in dialogue systems. Prosodic features such as energy and pitch have been shown to be useful for DAC. Despite their importance, little research has explored neural approaches to integrate prosodic features into end-to-end (E2E) DAC models which infer dialogue acts directly from audio signals. In this work, we pr…

    Submitted 11 May, 2022; originally announced May 2022.

  3. arXiv:2111.03250

    cs.CL cs.LG cs.SD eess.AS

    Context-Aware Transformer Transducer for Speech Recognition

    Authors: Feng-Ju Chang, **g Liu, Martin Radfar, Athanasios Mouchtaris, Maurizio Omologo, Ariya Rastrow, Siegfried Kunzmann

    Abstract: End-to-end (E2E) automatic speech recognition (ASR) systems often have difficulty recognizing uncommon words that appear infrequently in the training data. One promising way to improve recognition accuracy on such rare words is to leverage personalized/contextual information at inference. In this work, we present a novel context-aware transformer transducer (CATT) network that improves…

    Submitted 5 November, 2021; originally announced November 2021.

    Comments: Accepted to ASRU 2021

  4. arXiv:2108.12953

    eess.AS cs.LG cs.SD

    Multi-Channel Transformer Transducer for Speech Recognition

    Authors: Feng-Ju Chang, Martin Radfar, Athanasios Mouchtaris, Maurizio Omologo

    Abstract: Multi-channel inputs offer several advantages over single-channel inputs for improving the robustness of on-device speech recognition systems. Recent work on the multi-channel transformer has proposed a way to incorporate such inputs into end-to-end ASR for improved accuracy. However, this approach has high computational complexity, which prevents it from being deployed in on-device systems.…

    Submitted 29 August, 2021; originally announced August 2021.

    Journal ref: Published in INTERSPEECH 2021

  5. Sample Drop Detection for Distant-speech Recognition with Asynchronous Devices Distributed in Space

    Authors: Tina Raissi, Santiago Pascual, Maurizio Omologo

    Abstract: In many applications of multi-microphone multi-device processing, the synchronization among different input channels can be affected by the lack of a common clock and isolated drops of samples. In this work, we address the issue of sample drop detection in the context of a conversational speech scenario, recorded by a set of microphones distributed in space. The goal is to design a neural-based mo…

    Submitted 15 November, 2019; originally announced November 2019.

    Comments: Submitted to ICASSP 2020

    ACM Class: I.2.7

  6. arXiv:1909.13447

    eess.AS cs.CL cs.SD

    DiPCo -- Dinner Party Corpus

    Authors: Maarten Van Segbroeck, Ahmed Zaid, Ksenia Kutsenko, Cirenia Huerta, Tinh Nguyen, Xuewen Luo, Björn Hoffmeister, Jan Trmal, Maurizio Omologo, Roland Maas

    Abstract: We present a speech data corpus that simulates a "dinner party" scenario taking place in an everyday home environment. The corpus was created by recording multiple groups of four Amazon employee volunteers having a natural conversation in English around a dining table. The participants were recorded by a single-channel close-talk microphone and by five far-field 7-microphone array devices position…

    Submitted 30 September, 2019; originally announced September 2019.

  7. arXiv:1901.08983

    cs.SD eess.AS

    LOCATA challenge: speaker localization with a planar array

    Authors: Xinyuan Qian, Andrea Cavallaro, Alessio Brutti, Maurizio Omologo

    Abstract: This document describes our submission to the 2018 LOCalization And TrAcking (LOCATA) challenge (Tasks 1, 3, 5). We estimate the 3D position of a speaker using the Global Coherence Field (GCF) computed from multiple microphone pairs of a DICIT planar array. One of the main challenges when using such an array with omnidirectional microphones is the front-back ambiguity, which is particularly eviden…

    Submitted 25 January, 2019; originally announced January 2019.

    Comments: In Proceedings of the LOCATA Challenge Workshop, a satellite event of IWAENC 2018 (arXiv:1811.08482)

    Report number: LOCATAchallenge/2018/05
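The LOCATA entry above estimates the speaker position from a Global Coherence Field (GCF) computed over microphone pairs. A common formulation averages the GCC-PHAT of every pair, evaluated at the time difference of arrival (TDOA) that each candidate position would produce, and takes the candidate with the highest coherence. The pure-Python sketch below follows that formulation under simplifying assumptions: a 2D search grid instead of 3D, whole-sample circular delays, a naive DFT, and a hypothetical square array rather than the DICIT geometry. All names and parameters are illustrative; this is not the authors' implementation.

```python
import cmath
import math
import random

def dft(x, sign=-1):
    """Naive O(n^2) DFT; sign=+1 gives the unnormalized inverse transform."""
    n = len(x)
    return [sum(x[t] * cmath.exp(sign * 2j * math.pi * k * t / n) for t in range(n))
            for k in range(n)]

def gcc_phat(a, b, eps=1e-12):
    """Circular GCC-PHAT; peaks at lag d when b[t] == a[t - d]."""
    A, B = dft(a), dft(b)
    cross = [Ak.conjugate() * Bk for Ak, Bk in zip(A, B)]
    # PHAT weighting: keep only phase by normalizing each bin to (nearly) unit magnitude.
    whitened = [c / (abs(c) + eps) for c in cross]
    n = len(a)
    return [v.real / n for v in dft(whitened, sign=+1)]

def delay_samples(pos, mic, fs, c):
    """Propagation delay from pos to mic, rounded to whole samples."""
    return round(math.dist(pos, mic) * fs / c)

def global_coherence_field(signals, mics, grid, fs, c):
    """Average pairwise GCC-PHAT read at each candidate's expected TDOA."""
    n = len(signals[0])
    pairs = [(i, j) for i in range(len(mics)) for j in range(i + 1, len(mics))]
    gcc = {(i, j): gcc_phat(signals[i], signals[j]) for i, j in pairs}
    field = {}
    for p in grid:
        total = 0.0
        for i, j in pairs:
            tdoa = delay_samples(p, mics[j], fs, c) - delay_samples(p, mics[i], fs, c)
            total += gcc[(i, j)][tdoa % n]
        field[p] = total / len(pairs)
    return field

# Hypothetical setup: 4 mics on a 0.5 m square, source at (2.0, 1.0) m.
random.seed(0)
fs, c, n = 16000, 343.0, 64
mics = [(0.0, 0.0), (0.5, 0.0), (0.0, 0.5), (0.5, 0.5)]
src = (2.0, 1.0)
speech = [random.gauss(0.0, 1.0) for _ in range(n)]
delays = [delay_samples(src, m, fs, c) for m in mics]
signals = [[speech[(t - d) % n] for t in range(n)] for d in delays]

grid = [(x * 0.5, y * 0.5) for x in range(-4, 5) for y in range(-4, 5)]
field = global_coherence_field(signals, mics, grid, fs, c)
best = max(field, key=field.get)
print("GCF peak at", best, "value", round(field[best], 3))
```

With a coarse grid and whole-sample TDOAs, several candidates can reach the same peak value, a small-scale analogue of the position ambiguities the abstract mentions; a practical system would need finer spatial and temporal resolution.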

  8. arXiv:1805.10498

    eess.AS cs.LG cs.NE cs.SD

    Automatic context window composition for distant speech recognition

    Authors: Mirco Ravanelli, Maurizio Omologo

    Abstract: Distant speech recognition is being revolutionized by deep learning, which has enabled systems that significantly outperform previous HMM-GMM systems. A key aspect behind the rapid rise and success of DNNs is their ability to better manage large time contexts. In this regard, asymmetric context windows that embed more past than future frames have recently been used with feed-forward neural networks. Th…

    Submitted 26 May, 2018; originally announced May 2018.

    Comments: This is a preprint version of the paper published in the journal Speech Communication, 2018. Please see https://www.sciencedirect.com/science/article/pii/S0167639318300128 for the published version of this article.
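The abstract above mentions asymmetric context windows that embed more past than future frames. A minimal sketch of such frame splicing, assuming edge padding at utterance boundaries and illustrative window sizes; the function name and these choices are assumptions, and the paper's automatic window-composition method is not reproduced here:

```python
def splice(frames, past, future):
    """Concatenate each frame with `past` preceding and `future` following frames.

    Indices outside the utterance are clamped to the first/last frame (edge padding).
    Each output vector has (past + future + 1) * feature_dim values.
    """
    n = len(frames)
    spliced = []
    for t in range(n):
        window = []
        for k in range(t - past, t + future + 1):
            window.extend(frames[min(max(k, 0), n - 1)])
        spliced.append(window)
    return spliced

# Five 2-dimensional frames; an asymmetric window with 3 past and 1 future frame.
frames = [[float(i), float(10 * i)] for i in range(5)]
for row in splice(frames, past=3, future=1):
    print(row)
```

Taking fewer future frames than past frames bounds how far ahead each spliced input looks, which also bounds the lookahead latency of a model fed with these windows.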

  9. arXiv:1803.10225

    eess.AS cs.NE cs.SD eess.SP

    Light Gated Recurrent Units for Speech Recognition

    Authors: Mirco Ravanelli, Philemon Brakel, Maurizio Omologo, Yoshua Bengio

    Abstract: A field that has directly benefited from the recent advances in deep learning is Automatic Speech Recognition (ASR). Despite the great achievements of the past decades, however, a natural and robust human-machine speech interaction still appears to be out of reach, especially in challenging environments characterized by significant noise and reverberation. To improve robustness, modern speech reco…

    Submitted 26 March, 2018; originally announced March 2018.

    Comments: Copyright 2018 IEEE

    Journal ref: IEEE Transactions on Emerging Topics in Computational Intelligence, vol. 2, no. 2, pp. 92-102, April 2018

  10. arXiv:1711.09470

    eess.AS cs.SD

    Realistic multi-microphone data simulation for distant speech recognition

    Authors: Mirco Ravanelli, Piergiorgio Svaizer, Maurizio Omologo

    Abstract: The availability of realistic simulated corpora is of key importance for the future progress of distant speech recognition technology. The reliability, flexibility and low computational cost of a data simulation process may ultimately allow researchers to train, tune and test different techniques in a variety of acoustic scenarios, avoiding the laborious effort of directly recording real data from…

    Submitted 26 November, 2017; originally announced November 2017.

    Comments: Proc. of Interspeech 2016

  11. arXiv:1710.03538

    eess.AS cs.CL cs.SD

    Contaminated speech training methods for robust DNN-HMM distant speech recognition

    Authors: Mirco Ravanelli, Maurizio Omologo

    Abstract: Despite the significant progress made in recent years, state-of-the-art speech recognition technologies provide satisfactory performance only in close-talking conditions. Robustness of distant speech recognition in adverse acoustic conditions, on the other hand, remains a crucial open issue for future applications of human-machine interaction. To this end, several advances in speech enhance…

    Submitted 10 October, 2017; originally announced October 2017.

    Journal ref: INTERSPEECH 2015

  12. arXiv:1710.02560

    eess.AS cs.CL cs.SD

    The DIRHA-English corpus and related tasks for distant-speech recognition in domestic environments

    Authors: Mirco Ravanelli, Maurizio Omologo

    Abstract: This paper introduces the contents and possible uses of the DIRHA-ENGLISH multi-microphone corpus, recently created within the EC DIRHA project. The reference scenario is a domestic environment equipped with a large number of microphones and microphone arrays distributed in space. The corpus is composed of both real and simulated material, and it includes 12 US and 12 UK English native spea…

    Submitted 6 October, 2017; originally announced October 2017.

    Comments: ASRU 2015

  13. arXiv:1710.00641

    cs.CL cs.AI cs.LG cs.NE

    Improving speech recognition by revising gated recurrent units

    Authors: Mirco Ravanelli, Philemon Brakel, Maurizio Omologo, Yoshua Bengio

    Abstract: Speech recognition is largely taking advantage of deep learning, showing that substantial benefits can be obtained by modern Recurrent Neural Networks (RNNs). The most popular RNNs are Long Short-Term Memory networks (LSTMs), which typically reach state-of-the-art performance in many tasks thanks to their ability to learn long-term dependencies and their robustness to vanishing gradients. Nevertheless, LSTMs hav…

    Submitted 29 September, 2017; originally announced October 2017.

  14. arXiv:1703.08471

    cs.CL cs.LG

    Batch-normalized joint training for DNN-based distant speech recognition

    Authors: Mirco Ravanelli, Philemon Brakel, Maurizio Omologo, Yoshua Bengio

    Abstract: Improving distant speech recognition is a crucial step towards flexible human-machine interfaces. Current technology, however, still lacks robustness, especially under adverse acoustic conditions. Despite the significant progress made in recent years on both speech enhancement and speech recognition, one potential limitation of state-of-the-art technology lies in composing mo…

    Submitted 24 March, 2017; originally announced March 2017.

    Comments: arXiv admin note: text overlap with arXiv:1703.08002

  15. arXiv:1703.08002

    cs.CL cs.LG

    A network of deep neural networks for distant speech recognition

    Authors: Mirco Ravanelli, Philemon Brakel, Maurizio Omologo, Yoshua Bengio

    Abstract: Despite the remarkable progress recently made in distant speech recognition, state-of-the-art technology still suffers from a lack of robustness, especially under adverse acoustic conditions characterized by non-stationary noise and reverberation. A prominent limitation of current systems lies in the lack of matching and communication between the various technologies involved in the distan…

    Submitted 23 March, 2017; originally announced March 2017.