Showing 1–13 of 13 results for author: King, B

Searching in archive eess.
  1. arXiv:2305.04159

    eess.AS

    Lookahead When It Matters: Adaptive Non-causal Transformers for Streaming Neural Transducers

    Authors: Grant P. Strimel, Yi Xie, Brian King, Martin Radfar, Ariya Rastrow, Athanasios Mouchtaris

    Abstract: Streaming speech recognition architectures are employed for low-latency, real-time applications. Such architectures are often characterized by their causality. Causal architectures emit tokens at each frame, relying only on current and past signal, while non-causal models are exposed to a window of future frames at each step to increase predictive accuracy. This dichotomy amounts to a trade-off fo…

    Submitted 9 May, 2023; v1 submitted 6 May, 2023; originally announced May 2023.

    Comments: Accepted to ICML 2023

  2. arXiv:2303.00692

    eess.AS

    Leveraging Redundancy in Multiple Audio Signals for Far-Field Speech Recognition

    Authors: Feng-Ju Chang, Anastasios Alexandridis, Rupak Vignesh Swaminathan, Martin Radfar, Harish Mallidi, Maurizio Omologo, Athanasios Mouchtaris, Brian King, Roland Maas

    Abstract: To achieve robust far-field automatic speech recognition (ASR), existing techniques typically employ an acoustic front end (AFE) cascaded with a neural transducer (NT) ASR model. The AFE output, however, could be unreliable, as the beamforming output in AFE is steered to a wrong direction. A promising way to address this issue is to exploit the microphone signals before the beamforming stage and a…

    Submitted 1 March, 2023; originally announced March 2023.

  3. A Method for Crash Prediction and Avoidance Using Hidden Markov Models

    Authors: Avinash Prabu, Lingxi Li, Brian King, Yaobin Chen

    Abstract: In recent years, automotive technology has made a steady progress. In particular, Advanced Driver Assistance System (ADAS) has enabled many safety features in commercial vehicles, for instance, pedestrian detection, lane keeping assist, emergency automatic braking, etc. Although these features provide drivers with a safer operational environment, crashes still happen occasionally due to the comple…

    Submitted 22 December, 2022; originally announced December 2022.

    Comments: Conference: 2019 IEEE International Conference on Service Operations and Logistics, and Informatics (SOLI)

  4. Toward Fairness in Speech Recognition: Discovery and mitigation of performance disparities

    Authors: Pranav Dheram, Murugesan Ramakrishnan, Anirudh Raju, I-Fan Chen, Brian King, Katherine Powell, Melissa Saboowala, Karan Shetty, Andreas Stolcke

    Abstract: As for other forms of AI, speech recognition has recently been examined with respect to performance disparities across different user cohorts. One approach to achieve fairness in speech recognition is to (1) identify speaker cohorts that suffer from subpar performance and (2) apply fairness mitigation measures targeting the cohorts discovered. In this paper, we report on initial findings with both…

    Submitted 22 July, 2022; originally announced July 2022.

    Comments: Proc. Interspeech 2022

    Journal ref: Proc. Interspeech, Sept. 2022, pp. 1268-1272

  5. Reducing Geographic Disparities in Automatic Speech Recognition via Elastic Weight Consolidation

    Authors: Viet Anh Trinh, Pegah Ghahremani, Brian King, Jasha Droppo, Andreas Stolcke, Roland Maas

    Abstract: We present an approach to reduce the performance disparity between geographic regions without degrading performance on the overall user population for ASR. A popular approach is to fine-tune the model with data from regions where the ASR model has a higher word error rate (WER). However, when the ASR model is adapted to get better performance on these high-WER regions, its parameters wander from t…

    Submitted 16 July, 2022; originally announced July 2022.

    Comments: Accepted for publication at Interspeech 2022

    Journal ref: Proc. Interspeech, Sept. 2022, pp. 1298-1302

  6. arXiv:2207.02393

    cs.CL cs.SD eess.AS

    Compute Cost Amortized Transformer for Streaming ASR

    Authors: Yi Xie, Jonathan Macoskey, Martin Radfar, Feng-Ju Chang, Brian King, Ariya Rastrow, Athanasios Mouchtaris, Grant P. Strimel

    Abstract: We present a streaming, Transformer-based end-to-end automatic speech recognition (ASR) architecture which achieves efficient neural inference through compute cost amortization. Our architecture creates sparse computation pathways dynamically at inference time, resulting in selective use of compute resources throughout decoding, enabling significant reductions in compute with minimal impact on acc…

    Submitted 4 July, 2022; originally announced July 2022.

  7. arXiv:2112.00350

    cs.CL cs.LG cs.SD eess.AS

    Investigation of Training Label Error Impact on RNN-T

    Authors: I-Fan Chen, Brian King, Jasha Droppo

    Abstract: In this paper, we propose an approach to quantitatively analyze impacts of different training label errors to RNN-T based ASR models. The result shows deletion errors are more harmful than substitution and insertion label errors in RNN-T training data. We also examined label error impact mitigation approaches on RNN-T and found that, though all the methods mitigate the label-error-caused degradati…

    Submitted 1 December, 2021; originally announced December 2021.

    Comments: 6 pages

  8. arXiv:2106.07734

    cs.CL cs.LG eess.AS

    CoDERT: Distilling Encoder Representations with Co-learning for Transducer-based Speech Recognition

    Authors: Rupak Vignesh Swaminathan, Brian King, Grant P. Strimel, Jasha Droppo, Athanasios Mouchtaris

    Abstract: We propose a simple yet effective method to compress an RNN-Transducer (RNN-T) through the well-known knowledge distillation paradigm. We show that the transducer's encoder outputs naturally have a high entropy and contain rich information about acoustically similar word-piece confusions. This rich information is suppressed when combined with the lower entropy decoder outputs to produce the joint…

    Submitted 14 June, 2021; originally announced June 2021.

    Comments: Accepted at InterSpeech 2021

  9. arXiv:2106.02750

    eess.AS cs.AI

    Do You Listen with One or Two Microphones? A Unified ASR Model for Single and Multi-Channel Audio

    Authors: Gokce Keskin, Minhua Wu, Brian King, Harish Mallidi, Yang Gao, Jasha Droppo, Ariya Rastrow, Roland Maas

    Abstract: Automatic speech recognition (ASR) models are typically designed to operate on a single input data type, e.g. a single or multi-channel audio streamed from a device. This design decision assumes the primary input data source does not change and if an additional (auxiliary) data source is occasionally available, it cannot be used. An ASR model that operates on both primary and auxiliary data can ac…

    Submitted 28 June, 2021; v1 submitted 4 June, 2021; originally announced June 2021.

  10. arXiv:2105.05920

    eess.AS cs.LG cs.SD

    Attention-based Neural Beamforming Layers for Multi-channel Speech Recognition

    Authors: Bhargav Pulugundla, Yang Gao, Brian King, Gokce Keskin, Harish Mallidi, Minhua Wu, Jasha Droppo, Roland Maas

    Abstract: Attention-based beamformers have recently been shown to be effective for multi-channel speech recognition. However, they are less capable at capturing local information. In this work, we propose a 2D Conv-Attention module which combines convolution neural networks with attention for beamforming. We apply self- and cross-attention to explicitly model the correlations within and between the input ch…

    Submitted 14 May, 2021; v1 submitted 12 May, 2021; originally announced May 2021.

  11. arXiv:2102.03951

    eess.AS cs.CL cs.SD

    End-to-End Multi-Channel Transformer for Speech Recognition

    Authors: Feng-Ju Chang, Martin Radfar, Athanasios Mouchtaris, Brian King, Siegfried Kunzmann

    Abstract: Transformers are powerful neural architectures that allow integrating different modalities using attention mechanisms. In this paper, we leverage the neural transformer architectures for multi-channel speech recognition systems, where the spectral and spatial information collected from different microphones are integrated using attention layers. Our multi-channel transformer network mainly consist…

    Submitted 7 February, 2021; originally announced February 2021.

    Comments: Accepted by 2021 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP 2021)

  12. arXiv:2010.10354

    eess.SP

    Time-domain Representation of Passband Scattering Parameters

    Authors: Justin B. King

    Abstract: This paper presents a simple and accurate method for the inclusion of linear, time-invariant (LTI) networks, described by RF frequency-domain data, within equivalent baseband time-domain simulations. The time-domain representation is formulated as an equivalent baseband discrete-time impulse response, which may be convolved with the equivalent baseband form of the input signal, to obtain the corre…

    Submitted 14 October, 2020; originally announced October 2020.

    Comments: Accepted for publication at the Asia-Pacific Microwave Conference 2020, Hong Kong, China

  13. arXiv:2007.00131

    eess.AS cs.CL cs.SD

    Multi-view Frequency LSTM: An Efficient Frontend for Automatic Speech Recognition

    Authors: Maarten Van Segbroeck, Harish Mallidih, Brian King, I-Fan Chen, Gurpreet Chadha, Roland Maas

    Abstract: Acoustic models in real-time speech recognition systems typically stack multiple unidirectional LSTM layers to process the acoustic frames over time. Performance improvements over vanilla LSTM architectures have been reported by prepending a stack of frequency-LSTM (FLSTM) layers to the time LSTM. These FLSTM layers can learn a more robust input feature to the time LSTM layers by modeling time-fre…

    Submitted 30 June, 2020; originally announced July 2020.