
Showing 1–13 of 13 results for author: Madikeri, S

Searching in archive cs.
  1. Node-weighted Graph Convolutional Network for Depression Detection in Transcribed Clinical Interviews

    Authors: Sergio Burdisso, Esaú Villatoro-Tello, Srikanth Madikeri, Petr Motlicek

    Abstract: We propose a simple approach for weighting self-connecting edges in a Graph Convolutional Network (GCN) and show its impact on depression detection from transcribed clinical interviews. To this end, we use a GCN for modeling non-consecutive and long-distance semantics to classify the transcriptions into depressed or control subjects. The proposed method aims to mitigate the limiting assumptions of…

    Submitted 11 March, 2024; v1 submitted 3 July, 2023; originally announced July 2023.

    Comments: Paper Accepted to Interspeech 2023

    Journal ref: Interspeech 2023

  2. arXiv:2306.15685

    eess.AS cs.CL

    Implementing contextual biasing in GPU decoder for online ASR

    Authors: Iuliia Nigmatulina, Srikanth Madikeri, Esaú Villatoro-Tello, Petr Motliček, Juan Zuluaga-Gomez, Karthik Pandia, Aravind Ganapathiraju

    Abstract: GPU decoding significantly accelerates the output of ASR predictions. While GPUs are already being used for online ASR decoding, post-processing and rescoring on GPUs have not been properly investigated yet. Rescoring with available contextual information can considerably improve ASR predictions. Previous studies have proven the viability of lattice rescoring in decoding and biasing language model…

    Submitted 23 June, 2023; originally announced June 2023.

    Comments: Accepted to Interspeech 2023

  3. arXiv:2305.01155

    eess.AS cs.CL cs.HC cs.SD

    Lessons Learned in ATCO2: 5000 hours of Air Traffic Control Communications for Robust Automatic Speech Recognition and Understanding

    Authors: Juan Zuluaga-Gomez, Iuliia Nigmatulina, Amrutha Prasad, Petr Motlicek, Driss Khalil, Srikanth Madikeri, Allan Tart, Igor Szoke, Vincent Lenders, Mickael Rigault, Khalid Choukri

    Abstract: Voice communication between air traffic controllers (ATCos) and pilots is critical for ensuring safe and efficient air traffic control (ATC). This task requires high levels of awareness from ATCos and can be tedious and error-prone. Recent attempts have been made to integrate artificial intelligence (AI) into ATC in order to reduce the workload of ATCos. However, the development of data-driven AI…

    Submitted 1 May, 2023; originally announced May 2023.

    Comments: Manuscript under review

  4. arXiv:2212.08489

    cs.CL cs.AI cs.SD eess.AS

    Effectiveness of Text, Acoustic, and Lattice-based representations in Spoken Language Understanding tasks

    Authors: Esaú Villatoro-Tello, Srikanth Madikeri, Juan Zuluaga-Gomez, Bidisha Sharma, Seyyed Saeed Sarfjoo, Iuliia Nigmatulina, Petr Motlicek, Alexei V. Ivanov, Aravind Ganapathiraju

    Abstract: In this paper, we perform an exhaustive evaluation of different representations to address the intent classification problem in a Spoken Language Understanding (SLU) setup. We benchmark three types of systems to perform the SLU intent detection task: 1) text-based, 2) lattice-based, and a novel 3) multimodal approach. Our work provides a comprehensive analysis of what could be the achievable perfo…

    Submitted 17 March, 2023; v1 submitted 16 December, 2022; originally announced December 2022.

    Comments: Accepted in ICASSP 2023

    ACM Class: I.2.7

    Journal ref: ICASSP 2023

  5. A Comparison of Methods for OOV-word Recognition on a New Public Dataset

    Authors: Rudolf A. Braun, Srikanth Madikeri, Petr Motlicek

    Abstract: A common problem for automatic speech recognition systems is how to recognize words that they did not see during training. Currently there is no established method of evaluating different techniques for tackling this problem. We propose using the CommonVoice dataset to create test sets for multiple languages which have a high out-of-vocabulary (OOV) ratio relative to a training set and release a n…

    Submitted 16 July, 2021; originally announced July 2021.

  6. arXiv:2104.02558

    cs.SD cs.LG eess.AS

    Comparing CTC and LFMMI for out-of-domain adaptation of wav2vec 2.0 acoustic model

    Authors: Apoorv Vyas, Srikanth Madikeri, Hervé Bourlard

    Abstract: In this work, we investigate whether the wav2vec 2.0 self-supervised pretraining helps mitigate the overfitting issues with connectionist temporal classification (CTC) training to reduce its performance gap with flat-start lattice-free MMI (E2E-LFMMI) for automatic speech recognition with limited training data. Towards that objective, we use the pretrained wav2vec 2.0 BASE model and fine-tune it on thr…

    Submitted 6 April, 2021; originally announced April 2021.

  7. arXiv:2012.14252

    cs.LG cs.SD eess.AS

    Lattice-Free MMI Adaptation Of Self-Supervised Pretrained Acoustic Models

    Authors: Apoorv Vyas, Srikanth Madikeri, Hervé Bourlard

    Abstract: In this work, we propose lattice-free MMI (LFMMI) for supervised adaptation of self-supervised pretrained acoustic models. We pretrain a Transformer model on a thousand hours of untranscribed Librispeech data, followed by supervised adaptation with LFMMI on three different datasets. Our results show that, by fine-tuning with LFMMI, we consistently obtain relative WER improvements of 10% and 35.3% on the c…

    Submitted 6 April, 2021; v1 submitted 28 December, 2020; originally announced December 2020.

  8. arXiv:2010.12277

    cs.SD eess.AS

    Speech Activity Detection Based on Multilingual Speech Recognition System

    Authors: Seyyed Saeed Sarfjoo, Srikanth Madikeri, Petr Motlicek

    Abstract: To better model the contextual information and increase the generalization ability of a Speech Activity Detection (SAD) system, this paper leverages a multilingual Automatic Speech Recognition (ASR) system to perform SAD. Sequence-discriminative training of the Acoustic Model (AM) using the Lattice-Free Maximum Mutual Information (LF-MMI) loss function effectively extracts the contextual information of th…

    Submitted 11 April, 2021; v1 submitted 23 October, 2020; originally announced October 2020.

    Comments: Submitted to Interspeech 2021

  9. arXiv:2010.03466

    eess.AS cs.SD

    Pkwrap: a PyTorch Package for LF-MMI Training of Acoustic Models

    Authors: Srikanth Madikeri, Sibo Tong, Juan Zuluaga-Gomez, Apoorv Vyas, Petr Motlicek, Hervé Bourlard

    Abstract: We present a simple wrapper that is useful to train acoustic models in PyTorch using Kaldi's LF-MMI training framework. The wrapper, called pkwrap (short for PyTorch Kaldi wrapper), enables the user to utilize the flexibility provided by PyTorch in designing model architectures. It exposes the LF-MMI cost function as an autograd function. Other capabilities of Kaldi have also been ported to Py…

    Submitted 7 October, 2020; originally announced October 2020.

  10. arXiv:2006.09054

    eess.AS cs.SD

    Quantization of Acoustic Model Parameters in Automatic Speech Recognition Framework

    Authors: Amrutha Prasad, Petr Motlicek, Srikanth Madikeri

    Abstract: State-of-the-art hybrid automatic speech recognition (ASR) systems exploit deep neural network (DNN) based acoustic models (AMs) trained with the Lattice-Free Maximum Mutual Information (LF-MMI) criterion and n-gram language models. The AMs typically have millions of parameters and require significant parameter reduction to operate on embedded devices. The impact of parameter quantization on the overal…

    Submitted 20 November, 2020; v1 submitted 16 June, 2020; originally announced June 2020.

    Comments: Submitted to ICASSP21

  11. arXiv:2006.02093

    cs.SI cs.SD eess.AS

    Graph2Speak: Improving Speaker Identification using Network Knowledge in Criminal Conversational Data

    Authors: Mael Fabien, Seyyed Saeed Sarfjoo, Petr Motlicek, Srikanth Madikeri

    Abstract: Criminal investigations mostly rely on the collection of conversational speech data in order to identify speakers and build or enrich an existing criminal network. Social network analysis tools are then applied to identify the most central characters and the different communities within the network. We introduce two candidate datasets for criminal conversational data, Crime Scene Investigation (CS…

    Submitted 21 September, 2020; v1 submitted 3 June, 2020; originally announced June 2020.

  12. arXiv:1906.01496

    cs.CL cs.LG stat.ML

    Regularization Advantages of Multilingual Neural Language Models for Low Resource Domains

    Authors: Navid Rekabsaz, Nikolaos Pappas, James Henderson, Banriskhem K. Khonglah, Srikanth Madikeri

    Abstract: Neural language modeling (LM) has led to significant improvements in several applications, including Automatic Speech Recognition. However, such models typically require large amounts of training data, which are not available for many domains and languages. In this study, we propose a multilingual neural language model architecture, trained jointly on the domain-specific data of several low-resource langu…

    Submitted 29 May, 2019; originally announced June 2019.

  13. Incremental Transfer Learning in Two-pass Information Bottleneck based Speaker Diarization System for Meetings

    Authors: Nauman Dawalatabad, Srikanth Madikeri, C Chandra Sekhar, Hema A Murthy

    Abstract: The two-pass information bottleneck (TPIB) based speaker diarization system operates independently on different conversational recordings. The TPIB system does not consider previously learned speaker-discriminative information while diarizing new conversations. Hence, the real-time factor (RTF) of the TPIB system is high owing to the training time required for the artificial neural network (ANN). This pap…

    Submitted 21 February, 2019; originally announced February 2019.

    Comments: 5 pages, 2 figures, To appear in Proc. ICASSP 2019, May 12-17, 2019, Brighton, UK