Showing 1–32 of 32 results for author: Kopparapu, S K

  1. arXiv:2406.02563  [pdf, other]

    eess.AS cs.CL cs.SD

    A cost minimization approach to fix the vocabulary size in a tokenizer for an End-to-End ASR system

    Authors: Sunil Kumar Kopparapu, Ashish Panda

    Abstract: Unlike hybrid speech recognition systems where the use of tokens was restricted to phones, biphones or triphones, the choice of tokens in the end-to-end ASR systems is derived from the text corpus of the training data. The use of tokenization algorithms like Byte Pair Encoding (BPE) and WordPiece is popular in identifying the tokens that are used in the overall training process of the speech recogn…

    Submitted 29 April, 2024; originally announced June 2024.

    Comments: 5 pages, 4 figures

  2. arXiv:2306.08012  [pdf, other]

    cs.SD cs.CL eess.AS

    A Novel Scheme to classify Read and Spontaneous Speech

    Authors: Sunil Kumar Kopparapu

    Abstract: The COVID-19 pandemic has led to an increased use of remote telephonic interviews, making it important to distinguish between scripted and spontaneous speech in audio recordings. In this paper, we propose a novel scheme for identifying read and spontaneous speech. Our approach uses a pre-trained DeepSpeech audio-to-alphabet recognition engine to generate a sequence of alphabets from the audio. Fro…

    Submitted 13 June, 2023; originally announced June 2023.

    Comments: 14 pages, 8 figures

  3. arXiv:2304.03169   

    cs.CL cs.SD eess.AS

    Selective Data Augmentation for Robust Speech Translation

    Authors: Rajul Acharya, Ashish Panda, Sunil Kumar Kopparapu

    Abstract: Speech translation (ST) systems translate speech in one language to text in another language. End-to-end ST systems (e2e-ST) have gained popularity over cascade systems because of their enhanced performance due to reduced latency and computational cost. Though resource intensive, e2e-ST systems have the inherent ability to retain para and non-linguistic characteristics of the speech unlike cascade…

    Submitted 25 April, 2023; v1 submitted 22 March, 2023; originally announced April 2023.

    Comments: Did not realize that the experiments and the analysis based on the experiments were incomplete

  4. arXiv:2210.06354  [pdf, other]

    cs.CL cs.AI cs.SD eess.AS

    Text-to-Audio Grounding Based Novel Metric for Evaluating Audio Caption Similarity

    Authors: Swapnil Bhosale, Rupayan Chakraborty, Sunil Kumar Kopparapu

    Abstract: Automatic Audio Captioning (AAC) refers to the task of translating an audio sample into a natural language (NL) text that describes the audio events, source of the events and their relationships. Unlike NL text generation tasks, which rely on metrics like BLEU, ROUGE, METEOR based on lexical semantics for evaluation, the AAC evaluation metric requires an ability to map NL text (phrases) that corre…

    Submitted 3 October, 2022; originally announced October 2022.

    Comments: 9 pages, 8 figures

  5. arXiv:2203.13259  [pdf, other]

    eess.AS cs.AI cs.SD

    Computing Optimal Location of Microphone for Improved Speech Recognition

    Authors: Karan Nathwani, Bhavya Dixit, Sunil Kumar Kopparapu

    Abstract: It was shown in our earlier work that the measurement error in the microphone position affected the room impulse response (RIR) which in turn affected the single-channel close microphone and multi-channel distant microphone speech recognition. In this paper, as an extension, we systematically study to identify the optimal location of the microphone, given an approximate and hence erroneous locatio…

    Submitted 24 March, 2022; originally announced March 2022.

    Comments: 5 pages

  6. arXiv:2202.03271  [pdf, ps, other]

    eess.SP cs.LG

    Spectro Temporal EEG Biomarkers For Binary Emotion Classification

    Authors: Upasana Tiwari, Rupayan Chakraborty, Sunil Kumar Kopparapu

    Abstract: Electroencephalogram (EEG) is one of the most reliable physiological signals for emotion detection. Being non-stationary in nature, EEGs are better analysed by spectro temporal representations. Standard features like Discrete Wavelet Transformation (DWT) can represent temporal changes in spectral dynamics of an EEG, but are insufficient to extract information the other way around, i.e. spectral changes…

    Submitted 2 February, 2022; originally announced February 2022.

  7. arXiv:2201.12352  [pdf, other]

    cs.SD cs.LG eess.AS

    Automatic Audio Captioning using Attention weighted Event based Embeddings

    Authors: Swapnil Bhosale, Rupayan Chakraborty, Sunil Kumar Kopparapu

    Abstract: Automatic Audio Captioning (AAC) refers to the task of translating audio into a natural language that describes the audio events, source of the events and their relationships. The limited samples in AAC datasets at present have set up a trend to incorporate transfer learning with Audio Event Detection (AED) as a parent task. Towards this direction, in this paper, we propose an encoder-decoder arch…

    Submitted 28 January, 2022; originally announced January 2022.

  8. arXiv:2201.09470  [pdf, other]

    eess.AS cs.CR cs.SD

    Synthetic speech detection using meta-learning with prototypical loss

    Authors: Monisankha Pal, Aditya Raikar, Ashish Panda, Sunil Kumar Kopparapu

    Abstract: Recent works on speech spoofing countermeasures still lack generalization ability to unseen spoofing attacks. This is one of the key issues of ASVspoof challenges especially with the rapid development of diverse and high-quality spoofing algorithms. In this work, we address the generalizability of spoofing detection by proposing prototypical loss under the meta-learning paradigm to mimic the unsee…

    Submitted 24 January, 2022; originally announced January 2022.

  9. arXiv:2103.13823  [pdf, ps, other]

    cs.LG cs.AI

    A Novel Adaptive Minority Oversampling Technique for Improved Classification in Data Imbalanced Scenarios

    Authors: Ayush Tripathi, Rupayan Chakraborty, Sunil Kumar Kopparapu

    Abstract: Imbalance in the proportion of training samples belonging to different classes often causes performance degradation of conventional classifiers. This is primarily due to the tendency of the classifier to be biased towards the majority classes in the imbalanced dataset. In this paper, we propose a novel three-step technique to address imbalanced data. As a first step we significantly oversample the…

    Submitted 26 March, 2021; v1 submitted 24 March, 2021; originally announced March 2021.

    Comments: 8 pages

    Journal ref: ICPR 2020

  10. arXiv:2103.06157  [pdf, other]

    cs.SD cs.AI eess.AS

    Automatic Speaker Independent Dysarthric Speech Intelligibility Assessment System

    Authors: Ayush Tripathi, Swapnil Bhosale, Sunil Kumar Kopparapu

    Abstract: Dysarthria is a condition which hampers the ability of an individual to control the muscles that play a major role in speech delivery. The loss of fine control over muscles that assist the movement of lips, vocal cords, tongue and diaphragm results in abnormal speech delivery. One can assess the severity level of dysarthria by analyzing the intelligibility of speech spoken by an individual. Conti…

    Submitted 10 March, 2021; originally announced March 2021.

    Comments: 29 pages, 2 figures, Computer Speech & Language 2021

  11. arXiv:2102.08074  [pdf, other]

    cs.SD cs.LG eess.AS

    Semi Supervised Learning For Few-shot Audio Classification By Episodic Triplet Mining

    Authors: Swapnil Bhosale, Rupayan Chakraborty, Sunil Kumar Kopparapu

    Abstract: Few-shot learning aims to generalize to unseen classes that appear during testing but are unavailable during training. Prototypical networks incorporate few-shot metric learning, by constructing a class prototype in the form of a mean vector of the embedded support points within a class. The performance of prototypical networks in extreme few-shot scenarios (like one-shot) degrades drastically, mainl…

    Submitted 16 February, 2021; originally announced February 2021.

    Comments: 5 pages

  12. arXiv:2002.12788  [pdf, other]

    eess.AS cs.AI

    Identification of Dementia Using Audio Biomarkers

    Authors: Rupayan Chakraborty, Meghna Pandharipande, Chitralekha Bhat, Sunil Kumar Kopparapu

    Abstract: Dementia is a syndrome, generally of a chronic nature, characterized by a deterioration in cognitive function, especially in the geriatric population, and is severe enough to impact their daily activities. Early diagnosis of dementia is essential to provide timely treatment to alleviate the effects and sometimes to slow the progression of dementia. Speech has been known to provide an indication of a…

    Submitted 27 February, 2020; originally announced February 2020.

    Comments: 5 pages, 3 figures

  13. arXiv:1912.11151  [pdf, other]

    eess.AS cs.CL cs.SD

    A Cycle-GAN Approach to Model Natural Perturbations in Speech for ASR Applications

    Authors: Sri Harsha Dumpala, Imran Sheikh, Rupayan Chakraborty, Sunil Kumar Kopparapu

    Abstract: Naturally introduced perturbations in audio signal, caused by emotional and physical states of the speaker, can significantly degrade the performance of Automatic Speech Recognition (ASR) systems. In this paper, we propose a front-end based on Cycle-Consistent Generative Adversarial Network (CycleGAN) which transforms naturally perturbed speech into normal speech, and hence improves the robustness…

    Submitted 18 December, 2019; originally announced December 2019.

    Comments: 7 pages, 3 figures, ICASSP-2019

  14. arXiv:1911.01421  [pdf, ps, other]

    cs.CL cs.LG

    A Deep Learning approach for Hindi Named Entity Recognition

    Authors: Bansi Shah, Sunil Kumar Kopparapu

    Abstract: Named Entity Recognition is one of the most important text processing requirements in many NLP tasks. In this paper we use a deep architecture to accomplish the task of recognizing named entities in a given Hindi text sentence. Bidirectional Long Short Term Memory (BiLSTM) based techniques have been used for the NER task in the literature. In this paper, we first tune BiLSTM low-resource scenario to work f…

    Submitted 5 November, 2019; originally announced November 2019.

    Comments: 7 pages; work done during internship at TCS

  15. arXiv:1712.05608  [pdf, other]

    cs.CL cs.SD eess.AS

    A Novel Approach for Effective Learning in Low Resourced Scenarios

    Authors: Sri Harsha Dumpala, Rupayan Chakraborty, Sunil Kumar Kopparapu

    Abstract: Deep learning based discriminative methods, being the state-of-the-art machine learning techniques, are ill-suited for learning from lower amounts of data. In this paper, we propose a novel framework, called simultaneous two sample learning (s2sL), to effectively learn the class discriminative characteristics, even from a very low amount of data. In s2sL, more than one sample (here, two samples) are…

    Submitted 15 December, 2017; originally announced December 2017.

    Comments: Presented at NIPS 2017 Machine Learning for Audio Signal Processing (ML4Audio) Workshop, Dec. 2017

  16. arXiv:1710.06923  [pdf, other]

    cs.CL cs.AI

    Adapting general-purpose speech recognition engine output for domain-specific natural language question answering

    Authors: C. Anantaram, Sunil Kumar Kopparapu

    Abstract: Speech-based natural language question-answering interfaces to enterprise systems are gaining a lot of attention. General-purpose speech engines can be integrated with NLP systems to provide such interfaces. Usually, general-purpose speech engines are trained on a large `general' corpus. However, when such engines are used for specific domains, they may not recognize domain-specific words well, and…

    Submitted 12 October, 2017; originally announced October 2017.

    Comments: 20 pages

  17. arXiv:1705.09289  [pdf, other]

    cs.SD

    Improved I-vector-based Speaker Recognition for Utterances with Speaker Generated Non-speech sounds

    Authors: Sri Harsha Dumpala, Ashish Panda, Sunil Kumar Kopparapu

    Abstract: Conversational speech not only contains several variants of neutral speech but is also prominently interlaced with several speaker generated non-speech sounds such as laughter and breath. A robust speaker recognition system should be capable of recognizing a speaker irrespective of these variations in his speech. An understanding of whether the speaker-specific information represented by these var…

    Submitted 25 May, 2017; originally announced May 2017.

  18. arXiv:1704.07055  [pdf, other]

    cs.LG cs.NE

    k-FFNN: A priori knowledge infused Feed-forward Neural Networks

    Authors: Sri Harsha Dumpala, Rupayan Chakraborty, Sunil Kumar Kopparapu

    Abstract: Recurrent neural networks (RNNs) are being extensively used over feed-forward neural networks (FFNNs) because of their inherent capability to capture temporal relationships that exist in sequential data such as speech. This aspect of RNNs is advantageous especially when there is no a priori knowledge about the temporal correlations within the data. However, RNNs require a large amount of data to lea…

    Submitted 24 April, 2017; originally announced April 2017.

  19. arXiv:1601.02605  [pdf, other]

    cs.CY cs.HC

    A Mobile Phone based Speech Therapist

    Authors: Vinod K. Pandey, Arun Pande, Sunil Kumar Kopparapu

    Abstract: Patients with articulatory disorders often have difficulty in speaking. These patients need several speech therapy sessions to enable them to speak normally. These therapy sessions are conducted by a specialized speech therapist. The goal of speech therapy is to develop good speech habits as well as to teach how to articulate sounds the right way. Speech therapy is critical for continuous improvement…

    Submitted 11 January, 2016; originally announced January 2016.

    Comments: 6 pages, 6 figures, SimPe. [2011] Remote Speech Therapist Vinod Pandey, Arun Pande, Sunil Kopparapu SiMPE 2011, Stockholm, Sweden, Aug 2011

  20. arXiv:1601.02543  [pdf, other]

    cs.CL cs.AI cs.HC

    Evaluating the Performance of a Speech Recognition based System

    Authors: Vinod Kumar Pandey, Sunil Kumar Kopparapu

    Abstract: Speech based solutions have taken center stage with growth in the services industry where there is a need to cater to a very large number of people from all strata of the society. While natural language speech interfaces are the talk in the research community, in practice, menu based speech solutions thrive. Typically in a menu based speech solution the user is required to respond by speaking…

    Submitted 11 January, 2016; originally announced January 2016.

    Comments: 7 pages, 2 figures, ACC 2011

    Journal ref: ACC (3) 2011: 230-238

  21. Voice based self help System: User Experience Vs Accuracy

    Authors: Sunil Kumar Kopparapu

    Abstract: In general, self help systems are being increasingly deployed by service based industries because they are capable of delivering better customer service and increasingly the switch is to voice based self help systems because they provide a natural interface for a human to interact with a machine. A speech based self help system ideally needs a speech recognition engine to convert spoken speech to…

    Submitted 7 April, 2015; originally announced April 2015.

    Comments: 5 pages; 1 figure

  22. arXiv:1504.01488  [pdf, other]

    cs.CV

    On-line Handwritten Devanagari Character Recognition using Fuzzy Directional Features

    Authors: Sunil Kumar Kopparapu, Lajish VL

    Abstract: This paper describes a new feature set for use in the recognition of on-line handwritten Devanagari script based on Fuzzy Directional Features. Experiments are conducted for the automatic recognition of isolated handwritten character primitives (sub-character units). Initially we describe the proposed feature set, called the Fuzzy Directional Features (FDF) and then show how these features can be…

    Submitted 7 April, 2015; originally announced April 2015.

    Comments: 6 pages; 2009

  23. arXiv:1504.01476  [pdf, other]

    cs.CV

    Mobile Phone Based Vehicle License Plate Recognition for Road Policing

    Authors: Lajish V. L., Sunil Kumar Kopparapu

    Abstract: Identification of a vehicle is done through the vehicle license plate by traffic police in general. Automatic vehicle license plate recognition has several applications in intelligent traffic management systems. The security situation across the globe and particularly in India demands a need to equip the traffic police with a system that enables them to get instant details of a vehicle. The system sho…

    Submitted 7 April, 2015; originally announced April 2015.

    Comments: 7 pages; PReMI Experiential Workshop, Delhi

  24. A Rule-Based Short Query Intent Identification System

    Authors: Arijit De, Sunil Kumar Kopparapu

    Abstract: Using SMS (Short Message System), cell phones can be used to query for information about various topics. In an SMS based search system, one of the key problems is to identify a domain (broad topic) associated with the user query; so that a more comprehensive search can be carried out by the domain specific search engine. In this paper we use a rule based approach, to identify the domain, called Sh…

    Submitted 25 March, 2015; originally announced March 2015.

    Comments: 5 pages, 2010 International Conference on Signal and Image Processing (ICSIP)

  25. arXiv:1501.02887  [pdf, other]

    cs.CV

    Online Handwritten Devanagari Stroke Recognition Using Extended Directional Features

    Authors: Lajish VL, Sunil Kumar Kopparapu

    Abstract: This paper describes a new feature set, called the extended directional features (EDF) for use in the recognition of online handwritten strokes. We use EDF specifically to recognize strokes that form a basis for producing Devanagari script, which is the most widely used Indian language script. It should be noted that stroke recognition in handwritten script is equivalent to phoneme recognition in…

    Submitted 11 January, 2015; originally announced January 2015.

    Comments: 8th International Conference on Signal Processing and Communication Systems 15 - 17 December 2014, Gold Coast, Australia

  26. arXiv:1410.7382  [pdf, other]

    cs.CL cs.SD

    Modified Mel Filter Bank to Compute MFCC of Subsampled Speech

    Authors: Kiran Kumar Bhuvanagiri, Sunil Kumar Kopparapu

    Abstract: Mel Frequency Cepstral Coefficients (MFCCs) are the most popularly used speech features in most speech and speaker recognition applications. In this work, we propose a modified Mel filter bank to extract MFCCs from subsampled speech. We also propose a stronger metric which effectively captures the correlation between MFCCs of original speech and MFCC of resampled speech. It is found that the propo…

    Submitted 25 October, 2014; originally announced October 2014.

    Comments: arXiv admin note: substantial text overlap with arXiv:1410.6903

  27. arXiv:1410.6909  [pdf, other]

    cs.CV

    A Framework for On-Line Devanagari Handwritten Character Recognition

    Authors: Sunil Kumar Kopparapu, Lajish V. L

    Abstract: The main challenge in on-line handwritten character recognition in Indian language is the large size of the character set, larger similarity between different characters in the script and the huge variation in writing style. In this paper we propose a framework for on-line handwritten script recognition taking cues from speech signal processing literature. The framework is based on identifying…

    Submitted 25 October, 2014; originally announced October 2014.

    Comments: 29 pages

  28. On the use of Stress information in Speech for Speaker Recognition

    Authors: Laxmi Narayana M., Sunil Kumar Kopparapu

    Abstract: The performance of a speaker recognition system decreases when the speaker is under stress or emotion. In this paper we explore and identify a mechanism that enables use of inherent stress-in-speech or speaking style information present in speech of a person as additional cues for speaker recognition. We quantify the inherent stress present in the speech of a speaker mainly using 3 features, n…

    Submitted 25 October, 2014; originally announced October 2014.

  29. arXiv:1410.6903  [pdf, other]

    cs.SD cs.CL

    Choice of Mel Filter Bank in Computing MFCC of a Resampled Speech

    Authors: Laxmi Narayana M., Sunil Kumar Kopparapu

    Abstract: Mel Frequency Cepstral Coefficients (MFCCs) are the most popularly used speech features in most speech and speaker recognition applications. In this paper, we study the effect of resampling a speech signal on these speech features. We first derive a relationship between the MFCC parameters of the resampled speech and the MFCC parameters of the original speech. We propose six methods of calculati…

    Submitted 25 October, 2014; originally announced October 2014.

  30. Music and Vocal Separation Using Multi-Band Modulation Based Features

    Authors: Sunil Kumar Kopparapu, Meghna Pandharipande, G Sita

    Abstract: The potential use of non-linear speech features has not been investigated for music analysis although other commonly used speech features like Mel Frequency Cepstral Coefficients (MFCC) and pitch have been used extensively. In this paper, we assume an audio signal to be a sum of modulated sinusoids and then use the energy separation algorithm to decompose the audio into amplitude and frequency mod…

    Submitted 10 June, 2014; originally announced June 2014.

    Comments: 5 pages, 5 figures, 2010 IEEE Symposium on Industrial Electronics Applications (ISIEA)

  31. arXiv:1406.1280  [pdf, other]

    cs.CL

    Basis Identification for Automatic Creation of Pronunciation Lexicon for Proper Names

    Authors: Sunil Kumar Kopparapu, M Laxminarayana

    Abstract: Development of a proper names pronunciation lexicon is usually a manual effort which cannot be avoided. Grapheme to phoneme (G2P) conversion modules, in literature, are usually rule based and work best for non-proper names in a particular language. Proper names are foreign to a G2P module. We follow an optimization approach to enable automatic construction of proper names pronunciation lexicon. T…

    Submitted 5 June, 2014; originally announced June 2014.

  32. arXiv:1403.6901  [pdf, other]

    cs.SD cs.LG cs.MM

    Automatic Segmentation of Broadcast News Audio using Self Similarity Matrix

    Authors: Sapna Soni, Ahmed Imran, Sunil Kumar Kopparapu

    Abstract: Generally audio news broadcast on radio is composed of music, commercials, news from correspondents and recorded statements in addition to the actual news read by the newsreader. When news transcripts are available, automatic segmentation of audio news broadcast to time align the audio with the text transcription to build frugal speech corpora is essential. We address the problem of identifying…

    Submitted 26 March, 2014; originally announced March 2014.

    Comments: 4 pages, 5 images