Showing 1–12 of 12 results for author: Baskar, M K

  1. arXiv:2406.14701  [pdf, other]

    cs.AI cs.CL cs.SD eess.AS

    Speech Prefix-Tuning with RNNT Loss for Improving LLM Predictions

    Authors: Murali Karthick Baskar, Andrew Rosenberg, Bhuvana Ramabhadran, Neeraj Gaur, Zhong Meng

    Abstract: In this paper, we focus on addressing the constraints faced when applying LLMs to ASR. Recent works utilize prefixLM-type models, which directly apply speech as a prefix to LLMs for ASR. We have found that optimizing speech prefixes leads to better ASR performance and propose applying RNNT loss to perform speech prefix-tuning. This is a simple approach and does not increase the model complexity or…

    Submitted 20 June, 2024; originally announced June 2024.

  2. arXiv:2308.07486  [pdf, other]

    cs.LG cs.CL cs.SD eess.AS

    O-1: Self-training with Oracle and 1-best Hypothesis

    Authors: Murali Karthick Baskar, Andrew Rosenberg, Bhuvana Ramabhadran, Kartik Audhkhasi

    Abstract: We introduce O-1, a new self-training objective to reduce training bias and unify training and evaluation metrics for speech recognition. O-1 is a faster variant of Expected Minimum Bayes Risk (EMBR), that boosts the oracle hypothesis and can accommodate both supervised and unsupervised data. We demonstrate the effectiveness of our approach in terms of recognition on publicly available SpeechStew…

    Submitted 14 August, 2023; originally announced August 2023.

  3. arXiv:2303.05958  [pdf, ps, other]

    cs.CL cs.SD eess.AS stat.ML

    Robust Knowledge Distillation from RNN-T Models With Noisy Training Labels Using Full-Sum Loss

    Authors: Mohammad Zeineldeen, Kartik Audhkhasi, Murali Karthick Baskar, Bhuvana Ramabhadran

    Abstract: This work studies knowledge distillation (KD) and addresses its constraints for recurrent neural network transducer (RNN-T) models. In hard distillation, a teacher model transcribes large amounts of unlabelled speech to train a student model. Soft distillation is another popular KD method that distills the output logits of the teacher model. Due to the nature of RNN-T alignments, applying soft dis…

    Submitted 10 March, 2023; originally announced March 2023.

    Comments: Accepted at ICASSP 2023

  4. arXiv:2204.00770  [pdf, other]

    cs.SD cs.AI cs.LG eess.AS

    Speaker adaptation for Wav2vec2 based dysarthric ASR

    Authors: Murali Karthick Baskar, Tim Herzig, Diana Nguyen, Mireia Diez, Tim Polzehl, Lukáš Burget, Jan "Honza" Černocký

    Abstract: Dysarthric speech recognition has posed major challenges due to lack of training data and heavy mismatch in speaker characteristics. Recent ASR systems have benefited from readily available pretrained models such as wav2vec2 to improve the recognition performance. Speaker adaptation using fMLLR and xvectors have provided major gains for dysarthric speech with very little adaptation data. However,…

    Submitted 2 April, 2022; originally announced April 2022.

    Comments: Submitted to INTERSPEECH 2022

  5. arXiv:2202.12719  [pdf, other]

    cs.SD cs.CL eess.AS

    Ask2Mask: Guided Data Selection for Masked Speech Modeling

    Authors: Murali Karthick Baskar, Andrew Rosenberg, Bhuvana Ramabhadran, Yu Zhang, Pedro Moreno

    Abstract: Masked speech modeling (MSM) methods such as wav2vec2 or w2v-BERT learn representations over speech frames which are randomly masked within an utterance. While these methods improve performance of Automatic Speech Recognition (ASR) systems, they have one major limitation. They treat all unsupervised speech samples with equal weight, which hinders learning as not all samples have relevant informati…

    Submitted 24 February, 2022; originally announced February 2022.

  6. arXiv:2104.07474  [pdf, other]

    eess.AS cs.AI cs.LG cs.SD

    EAT: Enhanced ASR-TTS for Self-supervised Speech Recognition

    Authors: Murali Karthick Baskar, Lukáš Burget, Shinji Watanabe, Ramon Fernandez Astudillo, Jan "Honza" Černocký

    Abstract: Self-supervised ASR-TTS models suffer in out-of-domain data conditions. Here we propose an enhanced ASR-TTS (EAT) model that incorporates two main features: 1) The ASR$\rightarrow$TTS direction is equipped with a language model reward to penalize the ASR hypotheses before forwarding it to TTS. 2) In the TTS$\rightarrow$ASR direction, a hyper-parameter is introduced to scale the attention context f…

    Submitted 13 April, 2021; originally announced April 2021.

  7. arXiv:2001.11360  [pdf, ps, other]

    eess.AS cs.LG cs.SD

    BUT Opensat 2019 Speech Recognition System

    Authors: Martin Karafiát, Murali Karthick Baskar, Igor Szöke, Hari Krishna Vydana, Karel Veselý, Jan "Honza" Černocký

    Abstract: The paper describes the BUT Automatic Speech Recognition (ASR) systems submitted for OpenSAT evaluations under two domain categories such as low resourced languages and public safety communications. The first was challenging due to lack of training data, therefore various architectures and multilingual approaches were employed. The combination led to superior performance. The second domain was cha…

    Submitted 30 January, 2020; originally announced January 2020.

    Comments: REJECTED in ICASSP 2020

  8. arXiv:1905.01152  [pdf, ps, other]

    eess.AS cs.CL cs.IR cs.LG cs.SD

    Semi-supervised Sequence-to-sequence ASR using Unpaired Speech and Text

    Authors: Murali Karthick Baskar, Shinji Watanabe, Ramon Astudillo, Takaaki Hori, Lukáš Burget, Jan Černocký

    Abstract: Sequence-to-sequence automatic speech recognition (ASR) models require large quantities of data to attain high performance. For this reason, there has been a recent surge in interest for unsupervised and semi-supervised training in such models. This work builds upon recent results showing notable improvements in semi-supervised training using cycle-consistency and related techniques. Such techniqu…

    Submitted 20 August, 2019; v1 submitted 30 April, 2019; originally announced May 2019.

    Comments: INTERSPEECH 2019

  9. arXiv:1811.03451  [pdf, other]

    eess.AS cs.CL cs.LG

    Analysis of Multilingual Sequence-to-Sequence speech recognition systems

    Authors: Martin Karafiát, Murali Karthick Baskar, Shinji Watanabe, Takaaki Hori, Matthew Wiesner, Jan "Honza" Černocký

    Abstract: This paper investigates the applications of various multilingual approaches developed in conventional hidden Markov model (HMM) systems to sequence-to-sequence (seq2seq) automatic speech recognition (ASR). On a set composed of Babel data, we first show the effectiveness of multi-lingual training with stacked bottle-neck (SBN) features. Then we explore various architectures and training strategies…

    Submitted 7 November, 2018; originally announced November 2018.

    Comments: arXiv admin note: text overlap with arXiv:1810.03459

  10. arXiv:1811.02770  [pdf, other]

    eess.AS cs.CL cs.LG cs.SD

    Promising Accurate Prefix Boosting for sequence-to-sequence ASR

    Authors: Murali Karthick Baskar, Lukáš Burget, Shinji Watanabe, Martin Karafiát, Takaaki Hori, Jan Honza Černocký

    Abstract: In this paper, we present promising accurate prefix boosting (PAPB), a discriminative training technique for attention based sequence-to-sequence (seq2seq) ASR. PAPB is devised to unify the training and testing scheme in an effective manner. The training procedure involves maximizing the score of each partial correct sequence obtained during beam search compared to other hypotheses. The training o…

    Submitted 7 November, 2018; originally announced November 2018.

  11. arXiv:1811.02162  [pdf, other]

    eess.AS cs.SD

    Language model integration based on memory control for sequence to sequence speech recognition

    Authors: Jaejin Cho, Shinji Watanabe, Takaaki Hori, Murali Karthick Baskar, Hirofumi Inaguma, Jesus Villalba, Najim Dehak

    Abstract: In this paper, we explore several new schemes to train a seq2seq model to integrate a pre-trained LM. Our proposed fusion methods focus on the memory cell state and the hidden state in the seq2seq decoder long short-term memory (LSTM), and the memory cell state is updated by the LM unlike the prior studies. This means the memory retained by the main seq2seq would be adjusted by the external LM. Th…

    Submitted 5 November, 2018; originally announced November 2018.

    Comments: 4 pages, 1 figure, 5 tables, submitted to ICASSP 2019

  12. arXiv:1810.03459  [pdf, other]

    cs.CL cs.LG cs.SD eess.AS

    Multilingual sequence-to-sequence speech recognition: architecture, transfer learning, and language modeling

    Authors: Jaejin Cho, Murali Karthick Baskar, Ruizhi Li, Matthew Wiesner, Sri Harish Mallidi, Nelson Yalta, Martin Karafiat, Shinji Watanabe, Takaaki Hori

    Abstract: Sequence-to-sequence (seq2seq) approach for low-resource ASR is a relatively new direction in speech research. The approach benefits by performing model training without using lexicon and alignments. However, this poses a new problem of requiring more data compared to conventional DNN-HMM systems. In this work, we attempt to use data from 10 BABEL languages to build a multi-lingual seq2seq model a…

    Submitted 4 October, 2018; originally announced October 2018.