
Showing 1–10 of 10 results for author: Strom, N

Searching in archive cs.
  1. arXiv:2207.06920 [pdf, ps, other]

    cs.SD cs.LG eess.AS

    Sub 8-Bit Quantization of Streaming Keyword Spotting Models for Embedded Chipsets

    Authors: Lu Zeng, Sree Hari Krishnan Parthasarathi, Yuzong Liu, Alex Escott, Santosh Kumar Cheekatmalla, Nikko Strom, Shiv Vitaladevuni

    Abstract: We propose a novel 2-stage sub 8-bit quantization aware training algorithm for all components of a 250K parameter feedforward, streaming, state-free keyword spotting model. For the 1st-stage, we adapt a recently proposed quantization technique using a non-linear transformation with tanh(.) on dense layer weights. In the 2nd-stage, we use linear quantization methods on the rest of the network, incl…

    Submitted 8 September, 2022; v1 submitted 13 July, 2022; originally announced July 2022.

  2. arXiv:2104.09088 [pdf, other]

    cs.CL cs.LG

    Alexa Conversations: An Extensible Data-driven Approach for Building Task-oriented Dialogue Systems

    Authors: Anish Acharya, Suranjit Adhikari, Sanchit Agarwal, Vincent Auvray, Nehal Belgamwar, Arijit Biswas, Shubhra Chandra, Tagyoung Chung, Maryam Fazel-Zarandi, Raefer Gabriel, Shuyang Gao, Rahul Goel, Dilek Hakkani-Tur, Jan Jezabek, Abhay Jha, Jiun-Yu Kao, Prakash Krishnan, Peter Ku, Anuj Goyal, Chien-Wei Lin, Qing Liu, Arindam Mandal, Angeliki Metallinou, Vishal Naik, Yi Pan, et al. (6 additional authors not shown)

    Abstract: Traditional goal-oriented dialogue systems rely on various components such as natural language understanding, dialogue state tracking, policy learning and response generation. Training each component requires annotations which are hard to obtain for every new domain, limiting scalability of such systems. Similarly, rule-based dialogue systems require extensive writing and maintenance of rules and…

    Submitted 19 April, 2021; originally announced April 2021.

    Journal ref: NAACL 2021 System Demonstrations Track

  3. Managing Traceability Information Models: Not such a simple task after all?

    Authors: Salome Maro, Jan-Philipp Steghöfer, Eric Knauss, Jennifer Horkoff, Rashidah Kasauli, Rebekka Wohlrab, Jesper Lysemose Korsgaard, Florian Wartenberg, Niels Jørgen Strøm, Ruben Alexandersson

    Abstract: Practitioners are poorly supported by the scientific literature when managing traceability information models (TIMs), which capture the structure and semantics of trace links. In practice, companies manage their TIMs in very different ways, even in cases where companies share many similarities. We present our findings from an in-depth focus group about TIM management with three different systems e…

    Submitted 9 April, 2021; originally announced April 2021.

    Comments: ©2020 IEEE. Personal use of this material is permitted. Permission from IEEE must be obtained for all other uses, in any current or future media, including reprinting/republishing this material for advertising or promotional purposes, creating new collective works, for resale or redistribution to servers or lists, or reuse of any copyrighted component of this work in other works

    Journal ref: IEEE Software (2020)

  4. arXiv:1904.10584 [pdf, other]

    cs.SD cs.LG eess.AS stat.ML

    Realizing Petabyte Scale Acoustic Modeling

    Authors: Sree Hari Krishnan Parthasarathi, Nitin Sivakrishnan, Pranav Ladkat, Nikko Strom

    Abstract: Large scale machine learning (ML) systems such as the Alexa automatic speech recognition (ASR) system continue to improve with increasing amounts of manually transcribed training data. Instead of scaling manual transcription to impractical levels, we utilize semi-supervised learning (SSL) to learn acoustic models (AM) from the vast firehose of untranscribed audio data. Learning an AM from 1 Millio…

    Submitted 23 April, 2019; originally announced April 2019.

    Comments: 2156-3357 ©2019 IEEE. Personal use is permitted, but republication/redistribution requires IEEE permission. See http://www.ieee.org/publications_standards/publications/rights/index.html for more information

  5. arXiv:1904.01624 [pdf, ps, other]

    cs.LG cs.SD eess.AS stat.ML

    Lessons from Building Acoustic Models with a Million Hours of Speech

    Authors: Sree Hari Krishnan Parthasarathi, Nikko Strom

    Abstract: This is a report of our lessons learned building acoustic models from 1 Million hours of unlabeled speech, while labeled speech is restricted to 7,000 hours. We employ student/teacher training on unlabeled data, helping scale out target generation in comparison to confidence model based methods, which require a decoder and a confidence model. To optimize storage and to parallelize target generatio…

    Submitted 2 April, 2019; originally announced April 2019.

    Comments: "Copyright 2019 IEEE. Personal use of this material is permitted. Permission from IEEE must be obtained for all other uses, in any current or future media, including reprinting/republishing this material for advertising or promotional purposes, creating new collective works, for resale or redistribution to servers or lists, or reuse of any copyrighted component of this work in other works."

  6. Multi-Geometry Spatial Acoustic Modeling for Distant Speech Recognition

    Authors: Kenichi Kumatani, Minhua Wu, Shiva Sundaram, Nikko Strom, Bjorn Hoffmeister

    Abstract: The use of spatial information with multiple microphones can improve far-field automatic speech recognition (ASR) accuracy. However, conventional microphone array techniques degrade speech enhancement performance when there is an array geometry mismatch between design and test conditions. Moreover, such speech enhancement techniques do not always yield ASR accuracy improvement due to the differenc…

    Submitted 28 April, 2019; v1 submitted 12 March, 2019; originally announced March 2019.

    Comments: ICASSP 2019, 5 pages. arXiv admin note: substantial text overlap with arXiv:1903.05299

    Report number: https://doi.org/10.1109/ICASSP.2019.8682294

    Journal ref: Proc. IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP) 2019, pages 6635-6639

  7. Frequency Domain Multi-channel Acoustic Modeling for Distant Speech Recognition

    Authors: Minhua Wu, Kenichi Kumatani, Shiva Sundaram, Nikko Strom, Bjorn Hoffmeister

    Abstract: Conventional far-field automatic speech recognition (ASR) systems typically employ microphone array techniques for speech enhancement in order to improve robustness against noise or reverberation. However, such speech enhancement techniques do not always yield ASR accuracy improvement because the optimization criterion for speech enhancement is not directly relevant to the ASR objective. In this w…

    Submitted 28 April, 2019; v1 submitted 12 March, 2019; originally announced March 2019.

    Comments: ICASSP 2019, 5 pages

    Report number: https://doi.org/10.1109/ICASSP.2019.8682977

    Journal ref: Proc. IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP) 2019, pages 6640-6644

  8. arXiv:1811.06296 [pdf, other]

    eess.AS cs.SD

    Comprehensive evaluation of statistical speech waveform synthesis

    Authors: Thomas Merritt, Bartosz Putrycz, Adam Nadolski, Tianjun Ye, Daniel Korzekwa, Wiktor Dolecki, Thomas Drugman, Viacheslav Klimkov, Alexis Moinet, Andrew Breen, Rafal Kuklinski, Nikko Strom, Roberto Barra-Chicote

    Abstract: Statistical TTS systems that directly predict the speech waveform have recently reported improvements in synthesis quality. This investigation evaluates Amazon's statistical speech waveform synthesis (SSWS) system. An in-depth evaluation of SSWS is conducted across a number of domains to better understand the consistency in quality. The results of this evaluation are validated by repeating the pro…

    Submitted 11 December, 2018; v1 submitted 15 November, 2018; originally announced November 2018.

  9. arXiv:1808.00563 [pdf, other]

    cs.CL cs.LG stat.ML

    Data Augmentation for Robust Keyword Spotting under Playback Interference

    Authors: Anirudh Raju, Sankaran Panchapagesan, Xing Liu, Arindam Mandal, Nikko Strom

    Abstract: Accurate on-device keyword spotting (KWS) with low false accept and false reject rate is crucial to customer experience for far-field voice control of conversational agents. It is particularly challenging to maintain low false reject rate in real world conditions where there is (a) ambient noise from external sources such as TV, household appliances, or other speech that is not directed at the dev…

    Submitted 1 August, 2018; originally announced August 2018.

  10. arXiv:1705.02411 [pdf, other]

    cs.CL cs.LG stat.ML

    Max-Pooling Loss Training of Long Short-Term Memory Networks for Small-Footprint Keyword Spotting

    Authors: Ming Sun, Anirudh Raju, George Tucker, Sankaran Panchapagesan, Gengshen Fu, Arindam Mandal, Spyros Matsoukas, Nikko Strom, Shiv Vitaladevuni

    Abstract: We propose a max-pooling based loss function for training Long Short-Term Memory (LSTM) networks for small-footprint keyword spotting (KWS), with low CPU, memory, and latency requirements. The max-pooling loss training can be further guided by initializing with a cross-entropy loss trained network. A posterior smoothing based evaluation approach is employed to measure keyword spotting performance.…

    Submitted 5 May, 2017; originally announced May 2017.

    Journal ref: Spoken Language Technology Workshop (SLT), 2016 IEEE (pp. 474-480). IEEE