
Showing 1–5 of 5 results for author: Rikhye, R

Searching in archive eess.
  1. arXiv:2402.15566  [pdf]

    eess.IV cs.CV cs.LG

    Closing the AI generalization gap by adjusting for dermatology condition distribution differences across clinical settings

    Authors: Rajeev V. Rikhye, Aaron Loh, Grace Eunhae Hong, Preeti Singh, Margaret Ann Smith, Vijaytha Muralidharan, Doris Wong, Rory Sayres, Michelle Phung, Nicolas Betancourt, Bradley Fong, Rachna Sahasrabudhe, Khoban Nasim, Alec Eschholz, Basil Mustafa, Jan Freyberg, Terry Spitz, Yossi Matias, Greg S. Corrado, Katherine Chou, Dale R. Webster, Peggy Bui, Yuan Liu, Yun Liu, Justin Ko, et al. (1 additional author not shown)

    Abstract: Recently, there has been great progress in the ability of artificial intelligence (AI) algorithms to classify dermatological conditions from clinical photographs. However, little is known about the robustness of these algorithms in real-world settings where several factors can lead to a loss of generalizability. Understanding and overcoming these limitations will permit the development of generali…

    Submitted 23 February, 2024; originally announced February 2024.

  2. arXiv:2204.03793  [pdf, other]

    eess.AS cs.LG cs.SD

    Personal VAD 2.0: Optimizing Personal Voice Activity Detection for On-Device Speech Recognition

    Authors: Shaojin Ding, Rajeev Rikhye, Qiao Liang, Yanzhang He, Quan Wang, Arun Narayanan, Tom O'Malley, Ian McGraw

    Abstract: Personalization of on-device speech recognition (ASR) has seen explosive growth in recent years, largely due to the increasing popularity of personal assistant features on mobile devices and smart home speakers. In this work, we present Personal VAD 2.0, a personalized voice activity detector that detects the voice activity of a target speaker, as part of a streaming on-device ASR system. Although…

    Submitted 24 June, 2022; v1 submitted 7 April, 2022; originally announced April 2022.

    Comments: Accepted by INTERSPEECH 2022

  3. arXiv:2202.12169  [pdf, other]

    eess.AS cs.LG stat.ML

    Closing the Gap between Single-User and Multi-User VoiceFilter-Lite

    Authors: Rajeev Rikhye, Quan Wang, Qiao Liang, Yanzhang He, Ian McGraw

    Abstract: VoiceFilter-Lite is a speaker-conditioned voice separation model that plays a crucial role in improving speech recognition and speaker verification by suppressing overlapping speech from non-target speakers. However, one limitation of VoiceFilter-Lite, and other speaker-conditioned speech models in general, is that these models are usually limited to a single target speaker. This is undesirable as…

    Submitted 26 April, 2022; v1 submitted 24 February, 2022; originally announced February 2022.

  4. arXiv:2107.01201  [pdf, other]

    eess.AS cs.LG cs.SD

    Multi-user VoiceFilter-Lite via Attentive Speaker Embedding

    Authors: Rajeev Rikhye, Quan Wang, Qiao Liang, Yanzhang He, Ian McGraw

    Abstract: In this paper, we propose a solution to allow speaker conditioned speech models, such as VoiceFilter-Lite, to support an arbitrary number of enrolled users in a single pass. This is achieved by using an attention mechanism on multiple speaker embeddings to compute a single attentive embedding, which is then used as a side input to the model. We implemented multi-user VoiceFilter-Lite and evaluated…

    Submitted 8 November, 2021; v1 submitted 2 July, 2021; originally announced July 2021.

  5. arXiv:2104.13970  [pdf, other]

    eess.AS cs.LG cs.SD

    Personalized Keyphrase Detection using Speaker and Environment Information

    Authors: Rajeev Rikhye, Quan Wang, Qiao Liang, Yanzhang He, Ding Zhao, Yiteng Huang, Arun Narayanan, Ian McGraw

    Abstract: In this paper, we introduce a streaming keyphrase detection system that can be easily customized to accurately detect any phrase composed of words from a large vocabulary. The system is implemented with an end-to-end trained automatic speech recognition (ASR) model and a text-independent speaker verification model. To address the challenge of detecting these keyphrases under various noisy conditio…

    Submitted 15 June, 2021; v1 submitted 28 April, 2021; originally announced April 2021.