Showing 1–9 of 9 results for author: Dighe, P

  1. arXiv:2310.15261

    cs.SD cs.HC cs.LG eess.AS

    Modality Dropout for Multimodal Device Directed Speech Detection using Verbal and Non-Verbal Features

    Authors: Gautam Krishna, Sameer Dharur, Oggi Rudovic, Pranay Dighe, Saurabh Adya, Ahmed Hussen Abdelaziz, Ahmed H Tewfik

    Abstract: Device-directed speech detection (DDSD) is the binary classification task of distinguishing between queries directed at a voice assistant versus side conversation or background speech. State-of-the-art DDSD systems use verbal cues, e.g., acoustic, text and/or automatic speech recognition system (ASR) features, to classify speech as device-directed or otherwise, and often have to contend with one or…

    Submitted 23 October, 2023; originally announced October 2023.

    Comments: 5 pages

  2. arXiv:2309.04842

    cs.CL cs.HC cs.SD eess.AS

    Leveraging Large Language Models for Exploiting ASR Uncertainty

    Authors: Pranay Dighe, Yi Su, Shangshang Zheng, Yunshu Liu, Vineet Garg, Xiaochuan Niu, Ahmed Tewfik

    Abstract: While large language models excel in a variety of natural language processing (NLP) tasks, to perform well on spoken language understanding (SLU) tasks, they must either rely on off-the-shelf automatic speech recognition (ASR) systems for transcription, or be equipped with an in-built speech modality. This work focuses on the former scenario, where LLM's accuracy on SLU tasks is constrained by the…

    Submitted 12 September, 2023; v1 submitted 9 September, 2023; originally announced September 2023.

    Comments: Added references

  3. arXiv:2210.12134

    cs.CL cs.HC cs.SD eess.AS

    Audio-to-Intent Using Acoustic-Textual Subword Representations from End-to-End ASR

    Authors: Pranay Dighe, Prateeth Nayak, Oggi Rudovic, Erik Marchi, Xiaochuan Niu, Ahmed Tewfik

    Abstract: Accurate prediction of the user intent to interact with a voice assistant (VA) on a device (e.g. on the phone) is critical for achieving naturalistic, engaging, and privacy-centric interactions with the VA. To this end, we present a novel approach to predict the user's intent (the user speaking to the device or not) directly from acoustic and textual information encoded at subword tokens which are…

    Submitted 21 October, 2022; originally announced October 2022.

  4. arXiv:2203.15975

    eess.AS cs.HC cs.LG cs.SD

    Device-Directed Speech Detection: Regularization via Distillation for Weakly-Supervised Models

    Authors: Vineet Garg, Ognjen Rudovic, Pranay Dighe, Ahmed H. Abdelaziz, Erik Marchi, Saurabh Adya, Chandra Dhir, Ahmed Tewfik

    Abstract: We address the problem of detecting speech directed to a device that does not contain a specific wake-word. Specifically, we focus on audio coming from a touch-based invocation. Mitigating virtual assistants (VAs) activation due to accidental button presses is critical for user experience. While the majority of approaches to false trigger mitigation (FTM) are designed to detect the presence of a t…

    Submitted 29 March, 2022; originally announced March 2022.

    Comments: Submitted to INTERSPEECH 2022

  5. arXiv:2110.04656

    cs.SD cs.LG eess.AS

    Streaming on-device detection of device directed speech from voice and touch-based invocation

    Authors: Ognjen Rudovic, Akanksha Bindal, Vineet Garg, Pramod Simha, Pranay Dighe, Sachin Kajarekar

    Abstract: When interacting with smart devices such as mobile phones or wearables, the user typically invokes a virtual assistant (VA) by saying a keyword or by pressing a button on the device. However, in many cases, the VA can accidentally be invoked by the keyword-like speech or accidental button press, which may have implications on user experience and privacy. To this end, we propose an acoustic false-t…

    Submitted 9 October, 2021; originally announced October 2021.

  6. arXiv:2105.06598

    eess.AS cs.HC cs.LG cs.SD

    Streaming Transformer for Hardware Efficient Voice Trigger Detection and False Trigger Mitigation

    Authors: Vineet Garg, Wonil Chang, Siddharth Sigtia, Saurabh Adya, Pramod Simha, Pranay Dighe, Chandra Dhir

    Abstract: We present a unified and hardware efficient architecture for two stage voice trigger detection (VTD) and false trigger mitigation (FTM) tasks. Two stage VTD systems of voice assistants can get falsely activated to audio segments acoustically similar to the trigger phrase of interest. FTM systems cancel such activations by using post trigger audio context. Traditional FTM systems rely on automatic…

    Submitted 13 May, 2021; originally announced May 2021.

  7. arXiv:2010.10591

    eess.AS cs.LG cs.SD

    Knowledge Transfer for Efficient On-device False Trigger Mitigation

    Authors: Pranay Dighe, Erik Marchi, Srikanth Vishnubhotla, Sachin Kajarekar, Devang Naik

    Abstract: In this paper, we address the task of determining whether a given utterance is directed towards a voice-enabled smart-assistant device or not. An undirected utterance is termed as a "false trigger" and false trigger mitigation (FTM) is essential for designing a privacy-centric non-intrusive smart assistant. The directedness of an utterance can be identified by running automatic speech recognition…

    Submitted 20 October, 2020; originally announced October 2020.

  8. arXiv:2008.08113

    eess.AS cs.CL cs.LG cs.SD

    Complementary Language Model and Parallel Bi-LRNN for False Trigger Mitigation

    Authors: Rishika Agarwal, Xiaochuan Niu, Pranay Dighe, Srikanth Vishnubhotla, Sameer Badaskar, Devang Naik

    Abstract: False triggers in voice assistants are unintended invocations of the assistant, which not only degrade the user experience but may also compromise privacy. False trigger mitigation (FTM) is a process to detect the false trigger events and respond appropriately to the user. In this paper, we propose a novel solution to the FTM problem by introducing a parallel ASR decoding process with a special la…

    Submitted 18 August, 2020; originally announced August 2020.

  9. arXiv:2001.10822

    eess.AS cs.CL cs.LG cs.SD stat.ML

    Lattice-based Improvements for Voice Triggering Using Graph Neural Networks

    Authors: Pranay Dighe, Saurabh Adya, Nuoyu Li, Srikanth Vishnubhotla, Devang Naik, Adithya Sagar, Ying Ma, Stephen Pulman, Jason Williams

    Abstract: Voice-triggered smart assistants often rely on detection of a trigger-phrase before they start listening for the user request. Mitigation of false triggers is an important aspect of building a privacy-centric non-intrusive smart assistant. In this paper, we address the task of false trigger mitigation (FTM) using a novel approach based on analyzing automatic speech recognition (ASR) lattices using…

    Submitted 24 January, 2020; originally announced January 2020.