Showing 1–6 of 6 results for author: Kalgaonkar, K

  1. arXiv:2309.10993  [pdf, other]

    cs.SD cs.HC eess.AS

    Directional Source Separation for Robust Speech Recognition on Smart Glasses

    Authors: Tiantian Feng, Ju Lin, Yiteng Huang, Weipeng He, Kaustubh Kalgaonkar, Niko Moritz, Li Wan, Xin Lei, Ming Sun, Frank Seide

    Abstract: Modern smart glasses leverage advanced audio sensing and machine learning technologies to offer real-time transcribing and captioning services, considerably enriching human experiences in daily communications. However, such systems frequently encounter challenges related to environmental noises, resulting in degradation to speech recognition and speaker change detection. To improve voice quality,…

    Submitted 19 September, 2023; originally announced September 2023.

    Comments: Submitted to ICASSP 2024

  2. arXiv:2211.03643  [pdf, other]

    cs.SD cs.CL cs.MM eess.AS

    Egocentric Audio-Visual Noise Suppression

    Authors: Roshan Sharma, Weipeng He, Ju Lin, Egor Lakomkin, Yang Liu, Kaustubh Kalgaonkar

    Abstract: This paper studies audio-visual noise suppression for egocentric videos -- where the speaker is not captured in the video. Instead, potential noise sources are visible on screen with the camera emulating the off-screen speaker's view of the outside world. This setting is different from prior work in audio-visual speech enhancement that relies on lip and facial visuals. In this paper, we first demo…

    Submitted 2 May, 2023; v1 submitted 7 November, 2022; originally announced November 2022.

    Comments: Accepted to ICASSP 2023

  3. arXiv:2211.00589  [pdf, other]

    eess.AS cs.SD eess.SP

    SCA: Streaming Cross-attention Alignment for Echo Cancellation

    Authors: Yang Liu, Yangyang Shi, Yun Li, Kaustubh Kalgaonkar, Sriram Srinivasan, Xin Lei

    Abstract: End-to-End deep learning has shown promising results for speech enhancement tasks, such as noise suppression, dereverberation, and speech separation. However, most state-of-the-art methods for echo cancellation are either classical DSP-based or hybrid DSP-ML algorithms. Components such as the delay estimator and adaptive linear filter are based on traditional signal processing concepts, and deep l…

    Submitted 1 November, 2022; originally announced November 2022.

  4. arXiv:1911.02115  [pdf, ps, other]

    eess.AS cs.SD

    Spatial Attention for Far-field Speech Recognition with Deep Beamforming Neural Networks

    Authors: Weipeng He, Lu Lu, Biqiao Zhang, Jay Mahadeokar, Kaustubh Kalgaonkar, Christian Fuegen

    Abstract: In this paper, we introduce spatial attention for refining the information in multi-direction neural beamformer for far-field automatic speech recognition. Previous approaches of neural beamformers with multiple look directions, such as the factored complex linear projection, have shown promising results. However, the features extracted by such methods contain redundant information, as only the di…

    Submitted 9 March, 2020; v1 submitted 5 November, 2019; originally announced November 2019.

    Comments: To be presented at ICASSP 2020

  5. arXiv:1911.01629  [pdf, other]

    cs.CL cs.LG eess.AS

    RNN-T For Latency Controlled ASR With Improved Beam Search

    Authors: Mahaveer Jain, Kjell Schubert, Jay Mahadeokar, Ching-Feng Yeh, Kaustubh Kalgaonkar, Anuroop Sriram, Christian Fuegen, Michael L. Seltzer

    Abstract: Neural transducer-based systems such as RNN Transducers (RNN-T) for automatic speech recognition (ASR) blend the individual components of a traditional hybrid ASR system (acoustic model, language model, punctuation model, inverse text normalization) into one single model. This greatly simplifies training and inference and hence makes RNN-T a desirable choice for ASR systems. In this work, we inve…

    Submitted 16 January, 2020; v1 submitted 5 November, 2019; originally announced November 2019.

  6. arXiv:1910.12977  [pdf, other]

    eess.AS cs.CL cs.SD

    Transformer-Transducer: End-to-End Speech Recognition with Self-Attention

    Authors: Ching-Feng Yeh, Jay Mahadeokar, Kaustubh Kalgaonkar, Yongqiang Wang, Duc Le, Mahaveer Jain, Kjell Schubert, Christian Fuegen, Michael L. Seltzer

    Abstract: We explore options to use Transformer networks in neural transducer for end-to-end speech recognition. Transformer networks use self-attention for sequence modeling and come with advantages in parallel computation and capturing contexts. We propose 1) using VGGNet with causal convolution to incorporate positional information and reduce frame rate for efficient inference 2) using truncated self-at…

    Submitted 28 October, 2019; originally announced October 2019.