-
Keyword-Guided Adaptation of Automatic Speech Recognition
Authors:
Aviv Shamsian,
Aviv Navon,
Neta Glazer,
Gill Hetz,
Joseph Keshet
Abstract:
Automatic Speech Recognition (ASR) technology has made significant progress in recent years, providing accurate transcription across various domains. However, some challenges remain, especially in noisy environments and with specialized jargon. In this paper, we propose a novel approach for improved jargon word recognition by contextually biasing Whisper-based models. We employ a keyword spotting model that leverages the Whisper encoder representation to dynamically generate prompts for guiding the decoder during the transcription process. We introduce two approaches to effectively steer the decoder towards these prompts: KG-Whisper, which fine-tunes the Whisper decoder, and KG-Whisper-PT, which learns a prompt prefix. Our results show a significant improvement in the recognition accuracy of specified keywords and a reduction in overall word error rates. Specifically, in generalization to unseen languages, we demonstrate an average WER improvement of 5.1% over Whisper.
Submitted 4 June, 2024;
originally announced June 2024.
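The KG-Whisper abstract above says a keyword spotter dynamically generates prompts for the decoder. The sketch below shows only that prompt-building step. It assumes the spotter returns a detection probability per candidate keyword. The function name `build_keyword_prompt`, the `threshold` and `max_keywords` parameters, and the comma-separated prompt format are all hypothetical illustrations, not the paper's method.

```python
from typing import Dict, List


def build_keyword_prompt(
    scores: Dict[str, float], threshold: float = 0.5, max_keywords: int = 10
) -> str:
    """Turn per-keyword detection scores into a text prompt for the decoder.

    `scores` is assumed to map candidate jargon terms to detection
    probabilities from a keyword-spotting head (hypothetical here).
    """
    # Keep only the keywords the spotter believes occur in the utterance.
    detected: List[str] = [kw for kw, p in scores.items() if p >= threshold]
    # Put the most confident keywords first, then cap the prompt length.
    detected.sort(key=lambda kw: scores[kw], reverse=True)
    return ", ".join(detected[:max_keywords])


if __name__ == "__main__":
    example_scores = {"tachycardia": 0.91, "stent": 0.72, "ibuprofen": 0.12}
    print(build_keyword_prompt(example_scores))  # tachycardia, stent
```

In a quick experiment, one could pass such a string as the `initial_prompt` argument of `transcribe` in the open-source `openai-whisper` package. KG-Whisper itself instead fine-tunes the decoder, and KG-Whisper-PT learns a prompt prefix, so this wiring is only an illustration.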
-
Open-vocabulary Keyword-spotting with Adaptive Instance Normalization
Authors:
Aviv Navon,
Aviv Shamsian,
Neta Glazer,
Gill Hetz,
Joseph Keshet
Abstract:
Open-vocabulary keyword spotting is a crucial and challenging task in automatic speech recognition (ASR) that focuses on detecting user-defined keywords within a spoken utterance. Keyword spotting methods commonly map the audio utterance and keyword into a joint embedding space to obtain an affinity score. In this work, we propose AdaKWS, a novel method for keyword spotting in which a text encoder is trained to output keyword-conditioned normalization parameters. These parameters are used to process the auditory input. We provide an extensive evaluation using challenging and diverse multilingual benchmarks and show significant improvements over recent keyword spotting and ASR baselines. Furthermore, we study the effectiveness of our approach on low-resource languages that were unseen during training. The results demonstrate a substantial performance improvement compared to baseline methods.
Submitted 13 September, 2023;
originally announced September 2023.
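The AdaKWS abstract above describes keyword-conditioned normalization parameters that are applied to the audio. The sketch below shows the generic adaptive instance normalization step this suggests. Each feature channel is normalized over time, then scaled and shifted by per-channel `gamma` and `beta`. In AdaKWS those would come from the trained text encoder. Here they are simply passed in as hypothetical values, and the rest of the AdaKWS architecture is not shown.

```python
import math
from typing import List

Matrix = List[List[float]]


def adaptive_instance_norm(
    features: Matrix, gamma: List[float], beta: List[float], eps: float = 1e-5
) -> Matrix:
    """Normalize each channel over time, then apply per-channel scale and shift.

    features: C channels x T frames of acoustic features for one utterance.
    gamma, beta: length-C parameters (assumed to come from a keyword encoder).
    """
    if not (len(features) == len(gamma) == len(beta)):
        raise ValueError("gamma and beta need one entry per channel")
    out: Matrix = []
    for c, channel in enumerate(features):
        # Instance statistics: computed per utterance and per channel.
        mean = sum(channel) / len(channel)
        var = sum((v - mean) ** 2 for v in channel) / len(channel)
        std = math.sqrt(var + eps)
        out.append([gamma[c] * (v - mean) / std + beta[c] for v in channel])
    return out


if __name__ == "__main__":
    audio = [[1.0, 2.0, 3.0], [0.0, 0.0, 4.0]]
    # Two hypothetical parameter sets, standing in for two different keywords.
    print(adaptive_instance_norm(audio, [2.0, 1.0], [0.5, -1.0]))
    print(adaptive_instance_norm(audio, [0.5, 3.0], [0.0, 2.0]))
```

The same audio features come out differently depending on which keyword's parameters are used. This is the sense in which the keyword "processes" the auditory input, rather than both being embedded independently and compared.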
-
Hardware Implementation of Task-based Quantization in Multi-user Signal Recovery
Authors:
Xing Zhang,
Haiyang Zhang,
Nimrod Glazer,
Oded Cohen,
Eliya Reznitskiy,
Shlomi Savariego,
Moshe Namer,
Yonina C. Eldar
Abstract:
Quantization plays a critical role in digital signal processing systems, allowing the representation of continuous-amplitude signals with a finite number of bits. However, accurately representing signals requires a large number of quantization bits, which incurs severe cost, power consumption, and memory burdens. A promising way to address this issue is task-based quantization. By exploiting task information in the overall system design, task-based quantization can achieve satisfactory performance at low quantization cost. In this work, we apply task-based quantization to multi-user signal recovery and present a hardware prototype implementation. The prototype consists of a tailored configurable combining board and a software-based processing and demonstration system. Through experiments, we verify that, with proper design, task-based quantization achieves a 25-fold reduction in memory, going from 16 receivers with 16 bits each to 2 receivers with 5 bits each, without compromising signal recovery performance.
Submitted 27 January, 2023;
originally announced January 2023.
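The memory figure in the task-based quantization abstract above can be checked directly. This check assumes stored bits per sampling instant scale as (number of receivers) × (bits per receiver), which is an interpretation of the abstract's wording:

```latex
\frac{\text{bits}_{\text{baseline}}}{\text{bits}_{\text{task-based}}}
  = \frac{16 \times 16}{2 \times 5}
  = \frac{256}{10}
  = 25.6
```

The exact ratio of 25.6 is consistent with the roughly 25-fold reduction stated in the abstract.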
-
A Hardware Prototype of Wideband High-Dynamic Range ADC
Authors:
Satish Mulleti,
Eliya Reznitskiy,
Shlomi Savariego,
Moshe Namer,
Nimrod Glazer,
Yonina C. Eldar
Abstract:
Key parameters of analog-to-digital converters (ADCs) are their sampling rate and dynamic range. Power consumption and cost of an ADC are directly proportional to the sampling rate; hence, it is desirable to keep it as low as possible. The dynamic range of an ADC also plays an important role; ideally, it should exceed that of the signal, since otherwise the signal will be clipped. To avoid clipping, modulo folding can be applied before sampling, followed by an unfolding algorithm to recover the true signal. In this paper, we present a modulo hardware prototype that can be used before sampling to avoid clipping. Our modulo hardware operates prior to the sampling mechanism and can fold higher-frequency signals than existing hardware. We present a detailed design of the hardware and also address key issues that arise during implementation. In terms of applications, we show the reconstruction of finite-rate-of-innovation signals whose amplitudes exceed the dynamic range of the ADC. Our system operates at a rate six times below the Nyquist rate of the signal and can accommodate signals eight times larger than the ADC's dynamic range.
Submitted 29 January, 2023; v1 submitted 23 January, 2023;
originally announced January 2023.
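The modulo-folding idea in the ADC abstract above can be illustrated in software with a centered modulo that maps any amplitude into [-λ, λ), followed by an unfolding step. The unfolding here is a simple, swapped-in technique. It works for heavily oversampled signals whose sample-to-sample change stays below λ, and it is not the sub-Nyquist FRI reconstruction used in the paper. The names `modulo_fold`, `unfold`, and `lam` are illustrative assumptions.

```python
import math
from typing import List


def modulo_fold(x: float, lam: float) -> float:
    """Centered modulo: maps any real x into [-lam, lam)."""
    return (x + lam) % (2 * lam) - lam


def unfold(folded: List[float], lam: float) -> List[float]:
    """Recover x from folded samples, assuming |x[0]| < lam and
    |x[n] - x[n-1]| < lam for all n (i.e. heavy oversampling)."""
    recovered = [folded[0]]
    for prev, cur in zip(folded, folded[1:]):
        # Folded samples differ from x by multiples of 2*lam, so folding
        # their difference returns the true step whenever |step| < lam.
        recovered.append(recovered[-1] + modulo_fold(cur - prev, lam))
    return recovered


if __name__ == "__main__":
    lam = 1.0
    # A sinusoid three times larger than the folding threshold.
    x = [3.0 * math.sin(2 * math.pi * 0.01 * n) for n in range(200)]
    y = [modulo_fold(v, lam) for v in x]
    x_hat = unfold(y, lam)
    print(max(abs(a - b) for a, b in zip(x, x_hat)))  # floating-point round-off only
```

The folded samples never leave [-λ, λ), so a converter with that range would not clip, even though the original amplitude is three times larger. The price is that recovery needs extra assumptions about the signal: oversampling in this sketch, and a finite-rate-of-innovation model in the paper.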
-
Hardware Prototype of a Time-Encoding Sub-Nyquist ADC
Authors:
Hila Naaman,
Nimrod Glazer,
Moshe Namer,
Daniel Bilik,
Shlomi Savariego,
Yonina C. Eldar
Abstract:
Analog-to-digital converters (ADCs) are key components of digital signal processing. Classical samplers in this framework are controlled by a global clock. At high sampling rates, clocks are expensive and power-hungry, which increases the cost and energy consumption of ADCs. It is therefore desirable to sample using a clock-less ADC at the lowest possible rate. An integrate-and-fire time-encoding machine (IF-TEM) is a power-efficient, time-based asynchronous design that is not synchronized to a global clock. Finite-rate-of-innovation (FRI) signals, ubiquitous in various applications, have fewer degrees of freedom per unit time than their Nyquist rate would suggest, which makes them suitable signal models for sub-Nyquist sampling. This work proposes a power-efficient IF-TEM ADC architecture and demonstrates sub-Nyquist sampling and FRI signal recovery. Using an IF-TEM, we implement in hardware the first sub-Nyquist time-based sampler. We offer a feasible approach for accurately estimating the FRI parameters from IF-TEM data. The suggested hardware and reconstruction approach retrieves FRI parameters with an error of up to -25 dB while operating at rates approximately 10 times lower than the Nyquist rate, paving the way to low-power ADC architectures.
Submitted 5 January, 2023;
originally announced January 2023.
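The IF-TEM described in the abstract above works as follows. It adds a bias `b` to the input, integrates the sum scaled by `1/kappa`, and records a firing time whenever the integral reaches a threshold `delta`. Between consecutive firing times t_n and t_(n+1), the integral of (x(s) + b) equals kappa·delta, and reconstruction methods build on this relation. The discrete-time simulation below is a sketch of that generic model. It uses forward-Euler integration with reset by subtraction, and it assumes b > max|x(t)|. It does not describe the paper's analog circuit or its FRI recovery algorithm.

```python
from typing import Callable, List


def if_tem(
    x: Callable[[float], float],
    duration: float,
    b: float,
    kappa: float,
    delta: float,
    dt: float = 1e-4,
) -> List[float]:
    """Simulate an integrate-and-fire time encoder and return firing times.

    Assumes b > max|x(t)| so the integrand x(t) + b stays positive and the
    integrator always reaches the threshold delta.
    """
    times: List[float] = []
    y = 0.0  # integrator state
    n = 0
    while n * dt < duration:
        t = n * dt
        # Forward-Euler step of dy/dt = (x(t) + b) / kappa.
        y += (x(t) + b) * dt / kappa
        n += 1
        if y >= delta:
            times.append(n * dt)
            # Subtract the threshold (rather than zeroing) to keep the residual.
            y -= delta
    return times


if __name__ == "__main__":
    quiet = if_tem(lambda t: 0.0, 1.0, b=1.0, kappa=1.0, delta=0.01)
    loud = if_tem(lambda t: 0.5, 1.0, b=1.0, kappa=1.0, delta=0.01)
    print(len(quiet), len(loud))  # a larger input fires more often
```

The output is a list of spike times rather than clocked amplitude samples. The signal is encoded in how densely the encoder fires, which is why no global sampling clock is needed.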