-
SPRING-INX: A Multilingual Indian Language Speech Corpus by SPRING Lab, IIT Madras
Authors:
Nithya R,
Malavika S,
Jordan F,
Arjun Gangwar,
Metilda N J,
S Umesh,
Rithik Sarab,
Akhilesh Kumar Dubey,
Govind Divakaran,
Samudra Vijaya K,
Suryakanth V Gangashetty
Abstract:
India is home to a multitude of languages, of which 22 are recognised by the Indian Constitution as official. Building speech-based applications for the Indian population is a difficult problem owing to limited data and the number of languages and accents to accommodate. To encourage the language technology community to build speech-based applications in Indian languages, we are open-sourcing the SPRING-INX data, which has about 2000 hours of legally sourced and manually transcribed speech data for ASR system building in Assamese, Bengali, Gujarati, Hindi, Kannada, Malayalam, Marathi, Odia, Punjabi and Tamil. This endeavor is by SPRING Lab, Indian Institute of Technology Madras, and is part of the National Language Translation Mission (NLTM), funded by the Indian Ministry of Electronics and Information Technology (MeitY), Government of India. We describe the data collection and data cleaning process along with the data statistics in this paper.
Submitted 24 October, 2023; v1 submitted 23 October, 2023;
originally announced October 2023.
-
A Bayesian Multilingual Document Model for Zero-shot Topic Identification and Discovery
Authors:
Santosh Kesiraju,
Sangeet Sagar,
Ondřej Glembek,
Lukáš Burget,
Ján Černocký,
Suryakanth V Gangashetty
Abstract:
In this paper, we present a Bayesian multilingual document model for learning language-independent document embeddings. The model is an extension of BaySMM [Kesiraju et al 2020] to the multilingual scenario. It learns to represent the document embeddings in the form of Gaussian distributions, thereby encoding the uncertainty in its covariance. We propagate the learned uncertainties through linear classifiers that benefit zero-shot cross-lingual topic identification. Our experiments on 17 languages show that the proposed multilingual Bayesian document model performs competitively with other systems based on large-scale neural networks (LASER, XLM-R, mUSE) on 8 high-resource languages, and outperforms these systems on 9 mid-resource languages. We revisit cross-lingual topic identification in zero-shot settings by taking a deeper dive into current datasets, baseline systems and the languages covered. We identify shortcomings in the existing evaluation protocol (MLDoc dataset) and propose a robust alternative scheme, while also extending the cross-lingual experimental setup to 17 languages. Finally, we consolidate the observations from all our experiments and discuss points that can potentially benefit future research in applications relying on cross-lingual transfer.
Submitted 23 March, 2024; v1 submitted 2 July, 2020;
originally announced July 2020.
-
Learning document embeddings along with their uncertainties
Authors:
Santosh Kesiraju,
Oldřich Plchot,
Lukáš Burget,
Suryakanth V Gangashetty
Abstract:
The majority of text modelling techniques yield only point estimates of document embeddings and fail to capture the uncertainty of the estimates. These uncertainties give a notion of how well the embeddings represent a document. We present the Bayesian subspace multinomial model (Bayesian SMM), a generative log-linear model that learns to represent documents in the form of Gaussian distributions, thereby encoding the uncertainty in its covariance. Additionally, in the proposed Bayesian SMM, we address a commonly encountered problem of intractability that appears during variational inference in mixed-logit models. We also present a generative Gaussian linear classifier for topic identification that exploits the uncertainty in document embeddings. Our intrinsic evaluation using the perplexity measure shows that the proposed Bayesian SMM fits the data better than the state-of-the-art neural variational document model on the Fisher speech and 20Newsgroups text corpora. Our topic identification experiments show that the proposed systems are robust to over-fitting on unseen test data. The topic ID results show that the proposed model outperforms state-of-the-art unsupervised topic models and achieves results comparable to the state-of-the-art fully supervised discriminative models.
Submitted 18 October, 2019; v1 submitted 20 August, 2019;
originally announced August 2019.
-
Statistical Parametric Speech Synthesis Using Bottleneck Representation From Sequence Auto-encoder
Authors:
Sivanand Achanta,
KNRK Raju Alluri,
Suryakanth V Gangashetty
Abstract:
In this paper, we describe a statistical parametric speech synthesis approach with unit-level acoustic representation. In conventional deep neural network based speech synthesis, the input text features are repeated for the entire duration of the phoneme for mapping text and speech parameters. This mapping is learnt at the frame level, which is the de facto acoustic representation. However, much of this computational requirement can be drastically reduced if every unit can be represented with a fixed-dimensional representation. Using a recurrent neural network based auto-encoder, we show that it is indeed possible to map units of varying duration to a single vector. We then use this acoustic representation at unit level to synthesize speech using a deep neural network based statistical parametric speech synthesis technique. Results show that the proposed approach is able to synthesize at the same quality as the conventional frame-based approach at a highly reduced computational cost.
Submitted 19 June, 2016;
originally announced June 2016.
-
Significance of Maximum Spectral Amplitude in Sub-bands for Spectral Envelope Estimation and Its Application to Statistical Parametric Speech Synthesis
Authors:
Sivanand Achanta,
Anandaswarup Vadapalli,
Sai Krishna R.,
Suryakanth V. Gangashetty
Abstract:
In this paper we propose a technique for spectral envelope estimation using maximum values in the sub-bands of the Fourier magnitude spectrum (MSASB). Most other methods in the literature parametrize the spectral envelope in the cepstral domain, for example with the Mel-generalized cepstrum. Such cepstral domain representations, although compact, are not readily interpretable. This difficulty is overcome by our method, which parametrizes in the spectral domain itself. In our experiments, the spectral envelope estimated using the MSASB method was incorporated in the STRAIGHT vocoder. Both objective and subjective results of analysis-by-synthesis indicate that the proposed method is comparable to STRAIGHT. We also evaluate the effectiveness of the proposed parametrization in a statistical parametric speech synthesis framework using deep neural networks.
Submitted 3 August, 2015;
originally announced August 2015.