Keyword-Guided Adaptation of Automatic Speech Recognition

Authors: Aviv Shamsian, Aviv Navon, Neta Glazer, Gill Hetz, Joseph Keshet

Abstract: Automatic Speech Recognition (ASR) technology has made significant progress in recent years, providing accurate transcription across various domains. However, some challenges remain, especially in noisy environments and specialized jargon. In this paper, we propose a novel approach for improved jargon word recognition by contextual biasing Whisper-based models. We employ a keyword spotting model that leverages the Whisper encoder representation to dynamically generate prompts for guiding the decoder during the transcription process. We introduce two approaches to effectively steer the decoder towards these prompts: KG-Whisper, which is aimed at fine-tuning the Whisper decoder, and KG-Whisper-PT, which learns a prompt prefix. Our results show a significant improvement in the recognition accuracy of specified keywords and in reducing the overall word error rates. Specifically, in unseen language generalization, we demonstrate an average WER improvement of 5.1% over Whisper.

Submitted 4 June, 2024; originally announced June 2024.

Comments: Accepted to InterSpeech 2024
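
The abstract above reports its gains as word error rate (WER). As a reminder of what that metric measures, here is a minimal sketch of the standard definition: the word-level edit distance (substitutions, deletions, insertions) divided by the number of reference words. The function name `word_error_rate` and the whitespace tokenization are assumptions made for illustration; this is not the evaluation code used in the paper.

```python
def word_error_rate(reference: str, hypothesis: str) -> float:
    """Word-level Levenshtein distance divided by the reference length.

    Illustrative only: tokenization is a plain whitespace split, with no
    text normalization (casing, punctuation) applied.
    """
    ref = reference.split()
    hyp = hypothesis.split()
    if not ref:
        raise ValueError("reference must contain at least one word")

    # prev[j] holds the edit distance between the reference prefix
    # processed so far and the first j hypothesis words.
    prev = list(range(len(hyp) + 1))
    for i, ref_word in enumerate(ref, start=1):
        curr = [i] + [0] * len(hyp)
        for j, hyp_word in enumerate(hyp, start=1):
            substitution = prev[j - 1] + (0 if ref_word == hyp_word else 1)
            deletion = prev[j] + 1       # reference word missing from hypothesis
            insertion = curr[j - 1] + 1  # extra word in hypothesis
            curr[j] = min(substitution, deletion, insertion)
        prev = curr
    return prev[-1] / len(ref)


if __name__ == "__main__":
    # One substituted word out of three reference words.
    print(word_error_rate("the cat sat", "the cat sit"))
```

Because insertions count as errors, WER can exceed 1.0 when the hypothesis is much longer than the reference.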

arXiv:2309.08561

Open-vocabulary Keyword-spotting with Adaptive Instance Normalization

Authors: Aviv Navon, Aviv Shamsian, Neta Glazer, Gill Hetz, Joseph Keshet

Abstract: Open vocabulary keyword spotting is a crucial and challenging task in automatic speech recognition (ASR) that focuses on detecting user-defined keywords within a spoken utterance. Keyword spotting methods commonly map the audio utterance and keyword into a joint embedding space to obtain some affinity score. In this work, we propose AdaKWS, a novel method for keyword spotting in which a text encoder is trained to output keyword-conditioned normalization parameters. These parameters are used to process the auditory input. We provide an extensive evaluation using challenging and diverse multi-lingual benchmarks and show significant improvements over recent keyword spotting and ASR baselines. Furthermore, we study the effectiveness of our approach on low-resource languages that were unseen during the training. The results demonstrate a substantial performance improvement compared to baseline methods.

Submitted 13 September, 2023; originally announced September 2023.

Comments: Under Review
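
The abstract above describes keyword-conditioned normalization parameters applied to the audio representation. The sketch below shows the generic adaptive instance normalization operation only: each feature channel is standardized over time, then scaled and shifted by externally supplied parameters. The function name `adaptive_instance_norm`, the list-of-lists layout, and the hand-picked `gamma` and `beta` values are assumptions for illustration; in the paper these parameters come from a trained text encoder, which is not reproduced here.

```python
import math


def adaptive_instance_norm(features, gamma, beta, eps=1e-5):
    """Standardize each channel over time, then apply a per-channel affine map.

    features: list of channels, each a list of values over time.
    gamma, beta: one scale and one shift per channel. In a keyword-conditioned
    setup these would be predicted from the keyword text (assumed, not shown).
    """
    normalized = []
    for channel, scale, shift in zip(features, gamma, beta):
        mean = sum(channel) / len(channel)
        variance = sum((v - mean) ** 2 for v in channel) / len(channel)
        factor = scale / math.sqrt(variance + eps)
        normalized.append([factor * (v - mean) + shift for v in channel])
    return normalized


if __name__ == "__main__":
    # Two hypothetical feature channels over four time steps.
    audio_features = [[1.0, 2.0, 3.0, 4.0], [10.0, 10.5, 9.5, 10.0]]
    # Placeholder parameters standing in for a text encoder's output.
    keyword_gamma = [2.0, 0.5]
    keyword_beta = [0.1, -1.0]
    for row in adaptive_instance_norm(audio_features, keyword_gamma, keyword_beta):
        print(row)
```

After the operation, each channel has mean `beta` and standard deviation close to `|gamma|`, so the keyword fully determines the first- and second-order statistics of the processed features.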

arXiv:2301.11640

Hardware Implementation of Task-based Quantization in Multi-user Signal Recovery

Authors: Xing Zhang, Haiyang Zhang, Nimrod Glazer, Oded Cohen, Eliya Reznitskiy, Shlomi Savariego, Moshe Namer, Yonina C. Eldar

Abstract: Quantization plays a critical role in digital signal processing systems, allowing the representation of continuous amplitude signals with a finite number of bits. However, accurately representing signals requires a large number of quantization bits, which causes severe cost, power consumption, and memory burden. A promising way to address this issue is task-based quantization. By exploiting the task information for the overall system design, task-based quantization can achieve satisfying performance with low quantization costs. In this work, we apply task-based quantization to multi-user signal recovery and present a hardware prototype implementation. The prototype consists of a tailored configurable combining board, and a software-based processing and demonstration system. Through experiments, we verify that with proper design, the task-based quantization achieves a reduction of 25 fold in memory by reducing from 16 receivers with 16 bits each to 2 receivers with 5 bits each, without compromising signal recovery performance.

Submitted 27 January, 2023; originally announced January 2023.
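
The memory figure quoted in the abstract above can be checked with simple arithmetic. The sketch assumes that memory scales with the total number of bits stored per sampling instant (receivers times bits per receiver); the helper name `bits_per_snapshot` is hypothetical.

```python
def bits_per_snapshot(num_receivers: int, bits_per_receiver: int) -> int:
    """Total bits stored per sampling instant, assuming one word per receiver."""
    return num_receivers * bits_per_receiver


baseline_bits = bits_per_snapshot(16, 16)    # 16 receivers, 16 bits each
task_based_bits = bits_per_snapshot(2, 5)    # 2 receivers, 5 bits each
reduction = baseline_bits / task_based_bits  # ratio of the two totals

print(baseline_bits, task_based_bits, reduction)  # 256 10 25.6
```

Under this assumption the ratio is 25.6, consistent with the roughly 25-fold reduction stated in the abstract.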

arXiv:2301.09609

A Hardware Prototype of Wideband High-Dynamic Range ADC

Authors: Satish Mulleti, Eliya Reznitskiy, Shlomi Savariego, Moshe Namer, Nimrod Glazer, Yonina C. Eldar

Abstract: Key parameters of analog-to-digital converters (ADCs) are their sampling rate and dynamic range. Power consumption and cost of an ADC are directly proportional to the sampling rate; hence, it is desirable to keep it as low as possible. The dynamic range of an ADC also plays an important role, and ideally, it should be greater than the signal's; otherwise, the signal will be clipped. To avoid clipping, modulo folding can be used before sampling, followed by an unfolding algorithm to recover the true signal. In this paper, we present a modulo hardware prototype that can be used before sampling to avoid clipping. Our modulo hardware operates prior to the sampling mechanism and can fold higher frequency signals compared to existing hardware. We present a detailed design of the hardware and also address key issues that arise during implementation. In terms of applications, we show the reconstruction of finite-rate-of-innovation signals which are beyond the dynamic range of the ADC. Our system operates at six times below the Nyquist rate of the signal and can accommodate eight-times larger signals than the ADC's dynamic range.

Submitted 29 January, 2023; v1 submitted 23 January, 2023; originally announced January 2023.

Comments: 11
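
The idea of modulo folding followed by unfolding, as described in the abstract above, can be illustrated in software. The sketch below folds samples into the range [-lam, lam) and recovers them with a simple first-difference unwrapping rule. This rule is a stand-in chosen for brevity, not the unfolding algorithm of the paper: it assumes the first sample is already within range and that consecutive true samples differ by less than `lam`, which requires dense sampling rather than the sub-Nyquist operation the paper reports. The names `fold`, `unfold`, and `lam` are illustrative.

```python
import math


def fold(value: float, lam: float) -> float:
    """Centered modulo: map any real value into [-lam, lam)."""
    return (value + lam) % (2.0 * lam) - lam


def unfold(folded, lam: float):
    """Recover samples from folded ones via wrapped first differences.

    Assumes folded[0] equals the true first sample and that consecutive
    true samples differ by less than lam (illustrative assumption).
    """
    recovered = [folded[0]]
    for previous, current in zip(folded, folded[1:]):
        # The true step and the folded step agree modulo 2*lam, so wrapping
        # the folded step back into [-lam, lam) yields the true step.
        step = fold(current - previous, lam)
        recovered.append(recovered[-1] + step)
    return recovered


if __name__ == "__main__":
    lam = 1.0  # hypothetical ADC range [-1, 1)
    # Sinusoid with amplitude 8, well beyond the assumed ADC range.
    signal = [8.0 * math.sin(2.0 * math.pi * n / 200.0) for n in range(400)]
    folded = [fold(s, lam) for s in signal]
    restored = unfold(folded, lam)
    print(max(abs(a - b) for a, b in zip(signal, restored)))
```

The folded samples never leave the ADC range, so nothing is clipped, while the unwrapping step restores the original amplitude from the wrapped differences.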

arXiv:2301.02012

Hardware Prototype of a Time-Encoding Sub-Nyquist ADC

Authors: Hila Naaman, Nimrod Glazer, Moshe Namer, Daniel Bilik, Shlomi Savariego, Yonina C. Eldar

Abstract: Analog-to-digital converters (ADCs) are key components of digital signal processing. Classical samplers in this framework are controlled by a global clock. At high sampling rates, clocks are expensive and power-hungry, thus increasing the cost and energy consumption of ADCs. It is, therefore, desirable to sample using a clock-less ADC at the lowest possible rate. An integrate-and-fire time-encoding machine (IF-TEM) is a time-based power-efficient asynchronous design that is not synced to a global clock. Finite-rate-of-innovation (FRI) signals, ubiquitous in various applications, have fewer degrees of freedom than the signal's Nyquist rate, enabling sub-Nyquist sampling signal models. This work proposes a power-efficient IF-TEM ADC architecture and demonstrates sub-Nyquist sampling and FRI signal recovery. Using an IF-TEM, we implement in hardware the first sub-Nyquist time-based sampler. We offer a feasible approach for accurately estimating the FRI parameters from IF-TEM data. The suggested hardware and reconstruction approach retrieves FRI parameters with an error of up to -25dB while operating at rates approximately 10 times lower than the Nyquist rate, paving the way to low-power ADC architectures.

Submitted 5 January, 2023; originally announced January 2023.
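
To make the integrate-and-fire principle in the abstract above concrete, here is a small discrete-time simulation of a generic IF-TEM: a bias is added to the input, the sum is integrated, and a firing time is recorded each time the integral reaches a threshold. The function name `if_tem_encode`, the parameter values, and the fixed simulation step are assumptions for illustration; the sketch models neither the paper's hardware nor its FRI recovery method. The bias is assumed to exceed the input's maximum magnitude so the integrand stays positive.

```python
import math


def if_tem_encode(signal, bias, threshold, duration, dt=1e-4):
    """Simulate an integrate-and-fire time encoder on [0, duration).

    signal: callable mapping time (seconds) to amplitude.
    bias: constant added to the input; assumed larger than max |signal|.
    threshold: integrator level that triggers a firing and a reset.
    Returns the list of firing times, accurate to about one step dt.
    """
    firing_times = []
    integral = 0.0
    for k in range(int(round(duration / dt))):
        t = k * dt
        integral += (signal(t) + bias) * dt  # rectangle-rule integration
        if integral >= threshold:
            firing_times.append(t + dt)
            integral -= threshold  # reset, keeping the small overshoot
    return firing_times


if __name__ == "__main__":
    # Hypothetical input: 3 Hz sinusoid with unit amplitude.
    times = if_tem_encode(lambda t: math.sin(2.0 * math.pi * 3.0 * t),
                          bias=2.0, threshold=0.5, duration=2.0)
    gaps = [b - a for a, b in zip(times, times[1:])]
    print(len(times), min(gaps), max(gaps))
```

The output is a sequence of times rather than amplitudes: larger input values make the integrator reach the threshold sooner, so the spacing between firings carries the signal information without any sampling clock.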

Showing 1–5 of 5 results for author: Glazer, N