Showing 1–13 of 13 results for author: Hoffmeister, B

  1. arXiv:2401.02417  [pdf, other]

    eess.AS cs.CL cs.LG cs.SD

    Task Oriented Dialogue as a Catalyst for Self-Supervised Automatic Speech Recognition

    Authors: David M. Chan, Shalini Ghosh, Hitesh Tulsiani, Ariya Rastrow, Björn Hoffmeister

    Abstract: While word error rates of automatic speech recognition (ASR) systems have consistently fallen, natural language understanding (NLU) applications built on top of ASR systems still attribute significant numbers of failures to low-quality speech recognition results. Existing assistant systems collect large numbers of these unsuccessful interactions, but these systems usually fail to learn from these…

    Submitted 4 January, 2024; originally announced January 2024.

    Comments: To appear in ICASSP 2024

  2. arXiv:2301.02736  [pdf, other]

    eess.AS cs.LG cs.SD

    Using External Off-Policy Speech-To-Text Mappings in Contextual End-To-End Automated Speech Recognition

    Authors: David M. Chan, Shalini Ghosh, Ariya Rastrow, Björn Hoffmeister

    Abstract: Despite improvements to the generalization performance of automated speech recognition (ASR) models, specializing ASR models for downstream tasks remains a challenging task, primarily due to reduced data availability (necessitating increased data collection), and rapidly shifting data distributions (requiring more frequent model fine-tuning). In this work, we investigate the potential of leveragin…

    Submitted 6 January, 2023; originally announced January 2023.

  3. arXiv:2110.09890  [pdf, other]

    eess.AS cs.LG cs.SD

    Multi-Modal Pre-Training for Automated Speech Recognition

    Authors: David M. Chan, Shalini Ghosh, Debmalya Chakrabarty, Björn Hoffmeister

    Abstract: Traditionally, research in automated speech recognition has focused on local-first encoding of audio representations to predict the spoken phonemes in an utterance. Unfortunately, approaches relying on such hyper-local information tend to be vulnerable to both local-level corruption (such as audio-frame drops, or loud noises) and global-level noise (such as environmental noise, or background noise…

    Submitted 15 September, 2022; v1 submitted 12 October, 2021; originally announced October 2021.

    Comments: Presented at ICASSP 2022

  4. arXiv:2108.06329  [pdf, other]

    cs.CL cs.LG

    Low-Resource Adaptation of Open-Domain Generative Chatbots

    Authors: Greyson Gerhard-Young, Raviteja Anantha, Srinivas Chappidi, Björn Hoffmeister

    Abstract: Recent work building open-domain chatbots has demonstrated that increasing model size improves performance. On the other hand, latency and connectivity considerations dictate the move of digital assistants onto the device. Giving a digital assistant like Siri, Alexa, or Google Assistant the ability to discuss just about anything leads to the need for reducing the chatbot model size such that it fits…

    Submitted 8 April, 2022; v1 submitted 13 August, 2021; originally announced August 2021.

    Comments: Accepted at ACL DialDoc 2022

  5. arXiv:1909.13447  [pdf]

    eess.AS cs.CL cs.SD

    DiPCo -- Dinner Party Corpus

    Authors: Maarten Van Segbroeck, Ahmed Zaid, Ksenia Kutsenko, Cirenia Huerta, Tinh Nguyen, Xuewen Luo, Björn Hoffmeister, Jan Trmal, Maurizio Omologo, Roland Maas

    Abstract: We present a speech data corpus that simulates a "dinner party" scenario taking place in an everyday home environment. The corpus was created by recording multiple groups of four Amazon employee volunteers having a natural conversation in English around a dining table. The participants were recorded by a single-channel close-talk microphone and by five far-field 7-microphone array devices position…

    Submitted 30 September, 2019; originally announced September 2019.

  6. Multi-Geometry Spatial Acoustic Modeling for Distant Speech Recognition

    Authors: Kenichi Kumatani, Minhua Wu, Shiva Sundaram, Nikko Strom, Bjorn Hoffmeister

    Abstract: The use of spatial information with multiple microphones can improve far-field automatic speech recognition (ASR) accuracy. However, conventional microphone array techniques degrade speech enhancement performance when there is an array geometry mismatch between design and test conditions. Moreover, such speech enhancement techniques do not always yield ASR accuracy improvement due to the differenc…

    Submitted 28 April, 2019; v1 submitted 12 March, 2019; originally announced March 2019.

    Comments: ICASSP2019, 5 pages. arXiv admin note: substantial text overlap with arXiv:1903.05299

    Report number: https://doi.org/10.1109/ICASSP.2019.8682294

    Journal ref: Proc. IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP) 2019, pages 6635-6639

  7. Frequency Domain Multi-channel Acoustic Modeling for Distant Speech Recognition

    Authors: Minhua Wu, Kenichi Kumatani, Shiva Sundaram, Nikko Strom, Bjorn Hoffmeister

    Abstract: Conventional far-field automatic speech recognition (ASR) systems typically employ microphone array techniques for speech enhancement in order to improve robustness against noise or reverberation. However, such speech enhancement techniques do not always yield ASR accuracy improvement because the optimization criterion for speech enhancement is not directly relevant to the ASR objective. In this w…

    Submitted 28 April, 2019; v1 submitted 12 March, 2019; originally announced March 2019.

    Comments: ICASSP 2019, 5 pages

    Report number: https://doi.org/10.1109/ICASSP.2019.8682977

    Journal ref: Proc. IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP) 2019, pages 6640-6644

  8. arXiv:1902.02383  [pdf, other]

    cs.CL cs.LG cs.SD eess.AS

    End-to-end Anchored Speech Recognition

    Authors: Yiming Wang, Xing Fan, I-Fan Chen, Yuzong Liu, Tongfei Chen, Björn Hoffmeister

    Abstract: Voice-controlled household devices, like Amazon Echo or Google Home, face the problem of performing speech recognition of device-directed speech in the presence of interfering background speech, i.e., background noise and interfering speech from another person or media device in proximity need to be ignored. We propose two end-to-end models to tackle this problem with information extracted from t…

    Submitted 6 February, 2019; originally announced February 2019.

    Comments: Accepted by ICASSP 2019

  9. arXiv:1901.02348  [pdf, other]

    eess.AS cs.CL cs.LG cs.SD stat.ML

    Improving noise robustness of automatic speech recognition via parallel data and teacher-student learning

    Authors: Ladislav Mošner, Minhua Wu, Anirudh Raju, Sree Hari Krishnan Parthasarathi, Kenichi Kumatani, Shiva Sundaram, Roland Maas, Björn Hoffmeister

    Abstract: For real-world speech recognition applications, noise robustness is still a challenge. In this work, we adopt the teacher-student (T/S) learning technique using a parallel clean and noisy corpus for improving automatic speech recognition (ASR) performance under multimedia noise. On top of that, we apply a logits selection method which only preserves the k highest values to prevent wrong emphasis o…

    Submitted 15 March, 2019; v1 submitted 5 January, 2019; originally announced January 2019.

    Comments: To Appear in ICASSP 2019

  10. arXiv:1812.04647  [pdf, other]

    cs.CL

    Scalable language model adaptation for spoken dialogue systems

    Authors: Ankur Gandhe, Ariya Rastrow, Bjorn Hoffmeister

    Abstract: Language models (LM) for interactive speech recognition systems are trained on large amounts of data and the model parameters are optimized on past user data. New application intents and interaction types are released for these systems over time, imposing challenges to adapt the LMs since the existing training data is no longer sufficient to model the future user interactions. It is unclear how to…

    Submitted 11 December, 2018; originally announced December 2018.

    Comments: Accepted at SLT 2018

  11. arXiv:1809.07832  [pdf, other]

    cs.CL

    LSTM-based Whisper Detection

    Authors: Zeynab Raeesy, Kellen Gillespie, Zhenpei Yang, Chengyuan Ma, Thomas Drugman, Jiacheng Gu, Roland Maas, Ariya Rastrow, Björn Hoffmeister

    Abstract: This article presents a whisper speech detector in the far-field domain. The proposed system consists of a long short-term memory (LSTM) neural network trained on log-filterbank energy (LFBE) acoustic features. This model is trained and evaluated on recordings of human interactions with voice-controlled, far-field devices in whisper and normal phonation modes. We compare multiple inference approac…

    Submitted 5 April, 2020; v1 submitted 20 September, 2018; originally announced September 2018.

  12. arXiv:1808.02504  [pdf, other]

    cs.CL eess.AS

    Device-directed Utterance Detection

    Authors: Sri Harish Mallidi, Roland Maas, Kyle Goehner, Ariya Rastrow, Spyros Matsoukas, Björn Hoffmeister

    Abstract: In this work, we propose a classifier for distinguishing device-directed queries from background speech in the context of interactions with voice assistants. Applications include rejection of false wake-ups or unintended interactions as well as enabling wake-word free follow-up queries. Consider the example interaction: "Computer, play music", "Computer, reduce the volume". In this interaction,…

    Submitted 7 August, 2018; originally announced August 2018.

    Comments: Interspeech 2018 (accepted)

  13. arXiv:1711.00549  [pdf, other]

    cs.CL cs.AI cs.NE cs.SE

    Just ASK: Building an Architecture for Extensible Self-Service Spoken Language Understanding

    Authors: Anjishnu Kumar, Arpit Gupta, Julian Chan, Sam Tucker, Bjorn Hoffmeister, Markus Dreyer, Stanislav Peshterliev, Ankur Gandhe, Denis Filiminov, Ariya Rastrow, Christian Monson, Agnika Kumar

    Abstract: This paper presents the design of the machine learning architecture that underlies the Alexa Skills Kit (ASK), a large-scale Spoken Language Understanding (SLU) Software Development Kit (SDK) that enables developers to extend the capabilities of Amazon's virtual assistant, Alexa. At Amazon, the infrastructure powers over 25,000 skills deployed through the ASK, as well as AWS's Amazon Lex SLU Servic…

    Submitted 2 March, 2018; v1 submitted 1 November, 2017; originally announced November 2017.

    Comments: Published at the 1st Workshop on Conversational AI at NIPS 2017 (NIPS-WCAI)

    MSC Class: 68T50