-
Grad-Shafranov equilibria via data-free physics informed neural networks
Authors:
Byoungchan Jang,
Alan A. Kaptanoglu,
Rahul Gaur,
Shaowu Pan,
Matt Landreman,
William Dorland
Abstract:
A large number of magnetohydrodynamic (MHD) equilibrium calculations are often required for uncertainty quantification, optimization, and real-time diagnostic information, making MHD equilibrium codes vital to the field of plasma physics. In this paper, we explore a method for solving the Grad-Shafranov equation by using Physics-Informed Neural Networks (PINNs). For PINNs, we optimize neural networks by directly minimizing the residual of the PDE as a loss function. We show that PINNs can accurately and effectively solve the Grad-Shafranov equation with several different boundary conditions. We also explore the parameter space by varying the size of the model, the learning rate, and the boundary conditions to map trade-offs, such as that between reconstruction error and computational speed. Additionally, we introduce a parameterized PINN framework, expanding the input space to include variables such as pressure, aspect ratio, elongation, and triangularity in order to handle a broader range of plasma scenarios within a single network. Parameterized PINNs could be used in future work to solve inverse problems such as shape optimization.
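For orientation, the residual-as-loss idea in the abstract can be written out as follows. This is a sketch under stated assumptions, not the paper's exact formulation: the first line is the standard axisymmetric Grad-Shafranov equation in cylindrical coordinates $(R, Z)$, while the collocation points $(R_i, Z_i)$, the boundary points $(R_j^b, Z_j^b)$, the boundary value $\psi_b$, and the weight $\lambda$ are illustrative names that do not appear in the abstract.

```latex
% Standard Grad-Shafranov equation for the poloidal flux \psi(R, Z),
% with pressure p(\psi) and poloidal current function F(\psi):
\Delta^{*}\psi \equiv R\,\frac{\partial}{\partial R}\!\left(\frac{1}{R}\frac{\partial \psi}{\partial R}\right)
  + \frac{\partial^{2}\psi}{\partial Z^{2}}
  = -\mu_{0} R^{2}\,\frac{dp}{d\psi} - F\,\frac{dF}{d\psi}

% Assumed PINN loss for a network \psi_{\theta}: mean squared PDE residual
% at N interior points plus a weighted boundary mismatch at M boundary points.
\mathcal{L}(\theta) =
  \frac{1}{N}\sum_{i=1}^{N}
  \left| \Delta^{*}\psi_{\theta}(R_i, Z_i)
       + \mu_{0} R_i^{2}\,\frac{dp}{d\psi}\Big|_{\psi_{\theta}}
       + F\,\frac{dF}{d\psi}\Big|_{\psi_{\theta}} \right|^{2}
  + \frac{\lambda}{M}\sum_{j=1}^{M}
  \left| \psi_{\theta}(R_j^{b}, Z_j^{b}) - \psi_{b} \right|^{2}
```

No solution data enters this loss, which is what "data-free" refers to in the title; how the boundary condition is actually enforced in the paper is not stated in the abstract.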
Submitted 22 November, 2023;
originally announced November 2023.
-
Speaker Recognition in the Wild
Authors:
Neeraj Chhimwal,
Anirudh Gupta,
Rishabh Gaur,
Harveen Singh Chadha,
Priyanshi Shah,
Ankur Dhuriya,
Vivek Raghavan
Abstract:
In this paper, we propose a pipeline to find the number of speakers, as well as the audio files belonging to each of these identified speakers, in a source of audio data where the number of speakers or the speaker labels are not known a priori. We used this approach as part of our Data Preparation pipeline for Speech Recognition in Indic Languages (https://github.com/Open-Speech-EkStep/vakyansh-wav2vec2-experimentation). To understand and evaluate the accuracy of our proposed pipeline, we introduce two metrics: Cluster Purity and Cluster Uniqueness. Cluster Purity quantifies how "pure" a cluster is. Cluster Uniqueness, on the other hand, quantifies what percentage of clusters belong only to a single dominant speaker. We discuss these metrics further in the metrics section of the paper. Since we developed this utility to help us identify data by speaker ID before training an Automatic Speech Recognition (ASR) model, and since most of this data takes considerable effort to scrape, we also find that, in the chosen test set, 98% of the data is mapped to the top 80% of clusters. These clusters are computed by removing any cluster with fewer than a fixed number of utterances (we use a threshold of 30) to get rid of very small clusters.
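As a rough illustration of a purity-style metric, the sketch below uses the common clustering definition: the fraction of utterances in a cluster that come from its most frequent speaker. This definition is an assumption; the abstract does not give the paper's exact formula, and the function name and example labels are hypothetical.

```python
from collections import Counter


def cluster_purity(speaker_labels):
    """Fraction of utterances in one cluster that belong to its most common speaker.

    Assumed definition (standard clustering purity), not necessarily the
    paper's exact Cluster Purity formula.
    """
    if not speaker_labels:
        raise ValueError("cluster must contain at least one utterance")
    counts = Counter(speaker_labels)
    # most_common(1) returns [(label, count)] for the dominant speaker.
    dominant_count = counts.most_common(1)[0][1]
    return dominant_count / len(speaker_labels)


# Hypothetical cluster: four utterances from speaker "A", one from speaker "B".
example_cluster = ["A", "A", "B", "A", "A"]
print(cluster_purity(example_cluster))  # 0.8
```

A cluster containing a single speaker scores 1.0 under this definition, and the score falls as more speakers are mixed in.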
Submitted 5 May, 2022;
originally announced May 2022.
-
indic-punct: An automatic punctuation restoration and inverse text normalization framework for Indic languages
Authors:
Anirudh Gupta,
Neeraj Chhimwal,
Ankur Dhuriya,
Rishabh Gaur,
Priyanshi Shah,
Harveen Singh Chadha,
Vivek Raghavan
Abstract:
Automatic Speech Recognition (ASR) generates text that is usually devoid of punctuation. The absence of punctuation in text can affect readability. Also, downstream NLP tasks such as sentiment analysis and machine translation benefit greatly from punctuation and sentence boundary information. We present an approach for automatic punctuation of text using a pretrained IndicBERT model. Inverse text normalization is done with handwritten weighted finite state transducer (WFST) grammars. We have developed this tool for 11 Indic languages, namely Hindi, Tamil, Telugu, Kannada, Gujarati, Marathi, Odia, Bengali, Assamese, Malayalam and Punjabi. All code and data are publicly available.
Submitted 31 March, 2022;
originally announced March 2022.
-
Effectiveness of text to speech pseudo labels for forced alignment and cross lingual pretrained models for low resource speech recognition
Authors:
Anirudh Gupta,
Rishabh Gaur,
Ankur Dhuriya,
Harveen Singh Chadha,
Neeraj Chhimwal,
Priyanshi Shah,
Vivek Raghavan
Abstract:
In recent years, end-to-end (E2E) automatic speech recognition (ASR) systems have achieved promising results given sufficient resources. Even for languages where little labelled data is available, state-of-the-art E2E ASR systems can be developed by pretraining on large amounts of data from high-resource languages and fine-tuning on low-resource languages. For many low-resource languages the current approaches are still challenging, since in many cases labelled data is not available in the open domain. In this paper we present an approach to create labelled data for Maithili, Bhojpuri and Dogri by utilising pseudo labels from text to speech for forced alignment. The created data was inspected for quality and then used to train a transformer-based wav2vec 2.0 ASR model. All data and models are available in the open domain.
Submitted 31 March, 2022;
originally announced March 2022.
-
Is Word Error Rate a good evaluation metric for Speech Recognition in Indic Languages?
Authors:
Priyanshi Shah,
Harveen Singh Chadha,
Anirudh Gupta,
Ankur Dhuriya,
Neeraj Chhimwal,
Rishabh Gaur,
Vivek Raghavan
Abstract:
We propose a new method for the calculation of error rates in Automatic Speech Recognition (ASR). This new metric is for languages that contain half characters and where the same character can be written in different forms. We implement our methodology in Hindi, which is one of the main Indic languages, and we think this approach is scalable to other similar languages containing a large character set. We call our metrics Alternate Word Error Rate (AWER) and Alternate Character Error Rate (ACER).
We train our ASR models using wav2vec 2.0 (Baevski et al., 2020) for Indic languages. Additionally, we use language models to improve our model performance. Our results show a significant improvement in analyzing the error rates at word and character level, and the interpretability of the ASR system is improved by up to 3% in AWER and 7% in ACER for Hindi. Our experiments suggest that in languages which have complex pronunciation, there are multiple ways of writing words without changing their meaning. In such cases AWER and ACER will be more useful than WER and CER as metrics. Further, we open source a new benchmarking dataset of 21 hours for Hindi along with the new metric scripts.
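For context, the conventional WER that AWER is positioned against is the word-level edit distance between reference and hypothesis, divided by the number of reference words. The sketch below implements only that standard baseline; it does not implement AWER or ACER, whose handling of alternate spellings is not specified in the abstract. The function name and example sentences are illustrative.

```python
def word_error_rate(reference, hypothesis):
    """Standard WER: word-level Levenshtein distance / number of reference words."""
    ref = reference.split()
    hyp = hypothesis.split()
    if not ref:
        raise ValueError("reference must contain at least one word")
    # prev[j] holds the edit distance between the reference words processed
    # so far and the first j hypothesis words.
    prev = list(range(len(hyp) + 1))
    for i, ref_word in enumerate(ref, start=1):
        curr = [i]
        for j, hyp_word in enumerate(hyp, start=1):
            substitution = prev[j - 1] + (ref_word != hyp_word)
            deletion = prev[j] + 1       # reference word missing from hypothesis
            insertion = curr[j - 1] + 1  # extra word in hypothesis
            curr.append(min(substitution, deletion, insertion))
        prev = curr
    return prev[-1] / len(ref)


# One substitution ("sat" -> "sit") and one deletion ("the"): 2 errors over 6 words.
print(word_error_rate("the cat sat on the mat", "the cat sit on mat"))
```

Because every mismatch counts as a full error here, two valid spellings of the same word are penalised, which is the situation the alternate metrics are meant to address.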
Submitted 15 June, 2022; v1 submitted 30 March, 2022;
originally announced March 2022.
-
Improving Speech Recognition for Indic Languages using Language Model
Authors:
Ankur Dhuriya,
Harveen Singh Chadha,
Anirudh Gupta,
Priyanshi Shah,
Neeraj Chhimwal,
Rishabh Gaur,
Vivek Raghavan
Abstract:
We study the effect of applying a language model (LM) on the output of Automatic Speech Recognition (ASR) systems for Indic languages. We fine-tune wav2vec 2.0 models for 18 Indic languages and adjust the results with language models trained on text derived from a variety of sources. Our findings demonstrate that the average Character Error Rate (CER) decreases by over 28% and the average Word Error Rate (WER) decreases by about 36% after decoding with LM. We show that a large LM may not provide a substantial improvement as compared to a diverse one. We also demonstrate that high quality transcriptions can be obtained on domain-specific data without retraining the ASR model and show results on the biomedical domain.
Submitted 15 June, 2022; v1 submitted 30 March, 2022;
originally announced March 2022.
-
Vakyansh: ASR Toolkit for Low Resource Indic languages
Authors:
Harveen Singh Chadha,
Anirudh Gupta,
Priyanshi Shah,
Neeraj Chhimwal,
Ankur Dhuriya,
Rishabh Gaur,
Vivek Raghavan
Abstract:
We present Vakyansh, an end-to-end toolkit for Speech Recognition in Indic languages. India is home to almost 121 languages and around 125 crore speakers. Yet most of the languages are low resource in terms of data and pretrained models. Through Vakyansh, we introduce automatic data pipelines for data creation, model training, model evaluation and deployment. We create 14,000 hours of speech data in 23 Indic languages and train wav2vec 2.0 based pretrained models. These pretrained models are then fine-tuned to create state-of-the-art speech recognition models for 18 Indic languages, which are followed by language models and punctuation restoration models. We open source all these resources with the aim of inspiring the speech community to develop speech-first applications using our ASR models in Indic languages.
Submitted 15 June, 2022; v1 submitted 30 March, 2022;
originally announced March 2022.
-
Conditions for Advantageous Quantum Bitcoin Mining
Authors:
Robert R. Nerem,
Daya R. Gaur
Abstract:
Our aim is to determine conditions for quantum computing technology to give rise to security risks associated with quantum Bitcoin mining. Specifically, we determine the speed and energy efficiency a quantum computer needs to offer an advantage over classical mining. We analyze the setting in which the Bitcoin network is entirely classical except for a single quantum miner who has a small hash rate compared to that of the network. We develop a closed-form approximation for the probability that the quantum miner successfully mines a block, with this probability dependent on the number of Grover iterations the quantum miner applies before making a measurement. Next, we show that, for a quantum miner that is "peaceful", this success probability is maximized if the quantum miner applies Grover iterations for 16 minutes before measuring, which is surprising as the network mines blocks every 10 minutes on average. Using this optimal mining procedure, we show that the quantum miner outperforms a classical computer in efficiency (cost per block) if the condition $Q < Crb$ is satisfied, where $Q$ is the cost of a Grover iteration, $C$ is the cost of a classical hash, $r$ is the quantum miner's speed in Grover iterations per second, and $b$ is a factor that attains its maximum if the quantum miner uses our optimal mining procedure. This condition lays the foundation for determining when quantum mining, and the known security risks associated with it, will arise.
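The efficiency condition $Q < Crb$ can be checked directly once the four quantities are known. The sketch below only evaluates the inequality as stated in the abstract; the function name and all numeric values are hypothetical placeholders, and the abstract does not give the value or units of $b$.

```python
def quantum_mining_is_advantageous(Q, C, r, b):
    """Return True when the abstract's condition Q < C * r * b holds.

    Q: cost of one Grover iteration
    C: cost of one classical hash
    r: quantum miner's speed in Grover iterations per second
    b: factor from the paper (not specified in the abstract)
    """
    return Q < C * r * b


# Hypothetical numbers, chosen only to exercise the inequality.
print(quantum_mining_is_advantageous(Q=1.0, C=1e-3, r=1e4, b=0.5))   # True
print(quantum_mining_is_advantageous(Q=10.0, C=1e-3, r=1e4, b=0.5))  # False
```

Under this condition a faster quantum miner (larger $r$) can tolerate a higher cost per Grover iteration while still beating classical mining on cost per block.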
Submitted 2 October, 2021;
originally announced October 2021.
-
CLSRIL-23: Cross Lingual Speech Representations for Indic Languages
Authors:
Anirudh Gupta,
Harveen Singh Chadha,
Priyanshi Shah,
Neeraj Chhimwal,
Ankur Dhuriya,
Rishabh Gaur,
Vivek Raghavan
Abstract:
We present CLSRIL-23, a self-supervised learning based audio pre-trained model which learns cross-lingual speech representations from raw audio across 23 Indic languages. It is built on top of wav2vec 2.0, which is trained by solving a contrastive task over masked latent speech representations and jointly learns the quantization of latents shared across all languages. We compare the language-wise loss during pretraining to assess the effects of monolingual and multilingual pretraining. Performance on some downstream fine-tuning tasks for speech recognition is also compared, and our experiments show that multilingual pretraining outperforms monolingual training, both in terms of learning speech representations which encode the phonetic similarity of languages and in terms of performance on downstream tasks. A decrease of 5% is observed in WER and 9.5% in CER when a multilingual pretrained model is used for fine-tuning in Hindi. All the code and models are also open sourced. CLSRIL-23 is a model trained on 23 languages and almost 10,000 hours of audio data to facilitate research in speech recognition for Indic languages. We hope that new state-of-the-art systems will be created using the self-supervised approach, especially for low-resource Indic languages.
Submitted 13 January, 2022; v1 submitted 15 July, 2021;
originally announced July 2021.