-
PARIKSHA : A Large-Scale Investigation of Human-LLM Evaluator Agreement on Multilingual and Multi-Cultural Data
Authors:
Ishaan Watts,
Varun Gumma,
Aditya Yadavalli,
Vivek Seshadri,
Manohar Swaminathan,
Sunayana Sitaram
Abstract:
Evaluation of multilingual Large Language Models (LLMs) is challenging due to a variety of factors -- the lack of benchmarks with sufficient linguistic diversity, contamination of popular benchmarks into LLM pre-training data and the lack of local, cultural nuances in translated benchmarks. In this work, we study human and LLM-based evaluation in a multilingual, multi-cultural setting. We evaluate 30 models across 10 Indic languages by conducting 90K human evaluations and 30K LLM-based evaluations and find that models such as GPT-4o and Llama-3 70B consistently perform best for most Indic languages. We build leaderboards for two evaluation settings - pairwise comparison and direct assessment and analyse the agreement between humans and LLMs. We find that humans and LLMs agree fairly well in the pairwise setting but the agreement drops for direct assessment evaluation especially for languages such as Bengali and Odia. We also check for various biases in human and LLM-based evaluation and find evidence of self-bias in the GPT-based evaluator. Our work presents a significant step towards scaling up multilingual evaluation of LLMs.
Submitted 21 June, 2024;
originally announced June 2024.
-
Akal Badi ya Bias: An Exploratory Study of Gender Bias in Hindi Language Technology
Authors:
Rishav Hada,
Safiya Husain,
Varun Gumma,
Harshita Diddee,
Aditya Yadavalli,
Agrima Seth,
Nidhi Kulkarni,
Ujwal Gadiraju,
Aditya Vashistha,
Vivek Seshadri,
Kalika Bali
Abstract:
Existing research in measuring and mitigating gender bias predominantly centers on English, overlooking the intricate challenges posed by non-English languages and the Global South. This paper presents the first comprehensive study delving into the nuanced landscape of gender bias in Hindi, the third most spoken language globally. Our study employs diverse mining techniques, computational models, and field studies, and sheds light on the limitations of current methodologies. Given the challenges faced with mining gender biased statements in Hindi using existing methods, we conducted field studies to bootstrap the collection of such sentences. Through field studies involving rural and low-income community women, we uncover diverse perceptions of gender bias, underscoring the necessity for context-specific approaches. This paper advocates for a community-centric research design, amplifying voices often marginalized in previous studies. Our findings not only contribute to the understanding of gender bias in Hindi but also establish a foundation for further exploration of Indic languages. By exploring the intricacies of this understudied context, we call for thoughtful engagement with gender bias, promoting inclusivity and equity in linguistic and cultural contexts beyond the Global North.
Submitted 10 May, 2024;
originally announced May 2024.
-
AccentFold: A Journey through African Accents for Zero-Shot ASR Adaptation to Target Accents
Authors:
Abraham Toluwase Owodunni,
Aditya Yadavalli,
Chris Chinenye Emezue,
Tobi Olatunji,
Clinton C Mbataku
Abstract:
Despite advancements in speech recognition, accented speech remains challenging. While previous approaches have focused on modeling techniques or creating accented speech datasets, gathering sufficient data for the multitude of accents, particularly in the African context, remains impractical due to their sheer diversity and associated budget constraints. To address these challenges, we propose AccentFold, a method that exploits spatial relationships between learned accent embeddings to improve downstream Automatic Speech Recognition (ASR). Our exploratory analysis of speech embeddings representing 100+ African accents reveals interesting spatial accent relationships highlighting geographic and genealogical similarities, capturing consistent phonological and morphological regularities, all learned empirically from speech. Furthermore, we discover accent relationships previously uncharacterized by the Ethnologue. Through empirical evaluation, we demonstrate the effectiveness of AccentFold by showing that, for out-of-distribution (OOD) accents, sampling accent subsets for training based on AccentFold information outperforms strong baselines, with a relative WER improvement of 4.6%. AccentFold presents a promising approach for improving ASR performance on accented speech, particularly in the context of African accents, where data scarcity and budget constraints pose significant challenges. Our findings emphasize the potential of leveraging linguistic relationships to improve zero-shot ASR adaptation to target accents.
Submitted 5 February, 2024; v1 submitted 2 February, 2024;
originally announced February 2024.
-
MunTTS: A Text-to-Speech System for Mundari
Authors:
Varun Gumma,
Rishav Hada,
Aditya Yadavalli,
Pamir Gogoi,
Ishani Mondal,
Vivek Seshadri,
Kalika Bali
Abstract:
We present MunTTS, an end-to-end text-to-speech (TTS) system specifically for Mundari, a low-resource Indian language of the Austro-Asiatic family. Our work addresses the gap in linguistic technology for underrepresented languages by collecting and processing data to build a speech synthesis system. We begin our study by gathering a substantial dataset of Mundari text and speech and then train end-to-end speech models. We also delve into the methods used for training our models, ensuring they are efficient and effective despite the data constraints. We evaluate our system with native speakers and objective metrics, demonstrating its potential as a tool for preserving and promoting the Mundari language in the digital age.
Submitted 28 January, 2024;
originally announced January 2024.
-
AfriSpeech-200: Pan-African Accented Speech Dataset for Clinical and General Domain ASR
Authors:
Tobi Olatunji,
Tejumade Afonja,
Aditya Yadavalli,
Chris Chinenye Emezue,
Sahib Singh,
Bonaventure F. P. Dossou,
Joanne Osuchukwu,
Salomey Osei,
Atnafu Lambebo Tonja,
Naome Etori,
Clinton Mbataku
Abstract:
Africa has a very low doctor-to-patient ratio. At very busy clinics, doctors could see 30+ patients per day -- a heavy patient burden compared with developed countries -- but productivity tools such as clinical automatic speech recognition (ASR) are lacking for these overworked clinicians. However, clinical ASR is mature, even ubiquitous, in developed nations, and clinician-reported performance of commercial clinical ASR systems is generally satisfactory. Furthermore, the recent performance of general domain ASR is approaching human accuracy. However, several gaps exist. Several publications have highlighted racial bias with speech-to-text algorithms and performance on minority accents lags significantly. To our knowledge, there is no publicly available research or benchmark on accented African clinical ASR, and speech data is non-existent for the majority of African accents. We release AfriSpeech, 200hrs of Pan-African English speech, 67,577 clips from 2,463 unique speakers across 120 indigenous accents from 13 countries for clinical and general domain ASR, a benchmark test set, with publicly available pre-trained models with SOTA performance on the AfriSpeech benchmark.
Submitted 30 September, 2023;
originally announced October 2023.
-
X-RiSAWOZ: High-Quality End-to-End Multilingual Dialogue Datasets and Few-shot Agents
Authors:
Mehrad Moradshahi,
Tianhao Shen,
Kalika Bali,
Monojit Choudhury,
Gaël de Chalendar,
Anmol Goel,
Sungkyun Kim,
Prashant Kodali,
Ponnurangam Kumaraguru,
Nasredine Semmar,
Sina J. Semnani,
Jiwon Seo,
Vivek Seshadri,
Manish Shrivastava,
Michael Sun,
Aditya Yadavalli,
Chaobin You,
Deyi Xiong,
Monica S. Lam
Abstract:
Task-oriented dialogue research has mainly focused on a few popular languages like English and Chinese, due to the high dataset creation cost for a new language. To reduce the cost, we apply manual editing to automatically translated data. We create a new multilingual benchmark, X-RiSAWOZ, by translating the Chinese RiSAWOZ to 4 languages: English, French, Hindi, Korean; and a code-mixed English-Hindi language. X-RiSAWOZ has more than 18,000 human-verified dialogue utterances for each language, and unlike most multilingual prior work, is an end-to-end dataset for building fully-functioning agents.
The many difficulties we encountered in creating X-RiSAWOZ led us to develop a toolset to accelerate the post-editing of a new language dataset after translation. This toolset improves machine translation with a hybrid entity alignment technique that combines neural with dictionary-based methods, along with many automated and semi-automated validation checks.
We establish strong baselines for X-RiSAWOZ by training dialogue agents in the zero- and few-shot settings where limited gold data is available in the target language. Our results suggest that our translation and post-editing methodology and toolset can be used to create new high-quality multilingual dialogue agents cost-effectively. Our dataset, code, and toolkit are released open-source.
Submitted 30 June, 2023;
originally announced June 2023.
-
Noisy Quantum Trees: Infinite Protection Without Correction
Authors:
Shiv Akshar Yadavalli,
Iman Marvian
Abstract:
We study quantum networks with tree structures, in which information propagates from a root to leaves. At each node in the network, the received qubit unitarily interacts with fresh ancilla qubits, after which each qubit is sent through a noisy channel to a different node in the next level. Therefore, as the tree depth grows, there is a competition between the irreversible effect of noise and the protection against such noise achieved by delocalization of information. In the classical setting, where each node simply copies the input bit into multiple output bits, this model has been studied as the broadcasting or reconstruction problem on trees, which has broad applications. In this work, we study the quantum version of this problem. We consider a Clifford encoder at each node that encodes the input qubit in a stabilizer code, along with a single qubit Pauli noise channel at each edge. Such noisy quantum trees describe a scenario in which one has access to a stream of fresh (low-entropy) ancilla qubits, but cannot perform error correction. Therefore, they provide a different perspective on quantum fault tolerance. Furthermore, they provide a useful model for describing the effect of noise within the encoders of concatenated codes. We prove that above certain noise thresholds, which depend on the properties of the code such as its distance, as well as the properties of the encoder, information decays exponentially with the depth of the tree. On the other hand, by studying certain efficient decoders, we prove that for codes with distance d>=2 and for sufficiently small (but non-zero) noise, classical information and entanglement propagate over a noisy tree with infinite depth. Indeed, we find that this remains true even for binary trees with certain 2-qubit encoders at each node, which encode the received qubit in the binary repetition code with distance d=1.
Submitted 28 May, 2024; v1 submitted 25 June, 2023;
originally announced June 2023.
-
SLABERT Talk Pretty One Day: Modeling Second Language Acquisition with BERT
Authors:
Aditya Yadavalli,
Alekhya Yadavalli,
Vera Tobin
Abstract:
Second language acquisition (SLA) research has extensively studied cross-linguistic transfer, the influence of linguistic structure of a speaker's native language [L1] on the successful acquisition of a foreign language [L2]. Effects of such transfer can be positive (facilitating acquisition) or negative (impeding acquisition). We find that NLP literature has not given enough attention to the phenomenon of negative transfer. To understand patterns of both positive and negative transfer between L1 and L2, we model sequential second language acquisition in LMs. Further, we build a Multilingual Age Ordered CHILDES (MAO-CHILDES) -- a dataset consisting of 5 typologically diverse languages, i.e., German, French, Polish, Indonesian, and Japanese -- to understand the degree to which native Child-Directed Speech (CDS) [L1] can help or conflict with English language acquisition [L2]. To examine the impact of native CDS, we use the TILT-based cross-lingual transfer learning approach established by Papadimitriou and Jurafsky (2020) and find that, as in human SLA, language family distance predicts more negative transfer. Additionally, we find that conversational speech data shows greater facilitation for language acquisition than scripted speech data. Our findings call for further research using our novel Transformer-based SLA models and we would like to encourage it by releasing our code, data, and models.
Submitted 31 May, 2023;
originally announced May 2023.
-
Qualitative equivalence between incompatibility and Bell nonlocality
Authors:
Shiv Akshar Yadavalli,
Nikola Andrejic,
Ravi Kunjwal
Abstract:
Measurements in quantum theory can fail to be jointly measurable. Like entanglement, this incompatibility of measurements is necessary but not sufficient for violating Bell inequalities. The (in)compatibility relations among a set of measurements can be represented by a joint measurability structure, i.e., a hypergraph whose vertices denote measurements and hyperedges denote all and only compatible sets of measurements. Since incompatibility is necessary for a Bell violation, the joint measurability structure on each wing of a Bell experiment must necessarily be non-trivial, i.e., it must admit a subset of incompatible vertices. Here we show that for any non-trivial joint measurability structure with a finite set of vertices, there exists a quantum realization with a set of measurements that enables a Bell violation, i.e., given that Alice has access to this incompatible set of measurements, there exists a set of measurements for Bob and an entangled state shared between them such that they can jointly violate a Bell inequality. Hence, a non-trivial joint measurability structure is not only necessary for a Bell violation, but also sufficient. We also provide a partial characterization of qubit measurements that are useful for Bell inequality violations in the simplest joint measurability structure of interest, i.e., Specker's scenario, which consists of three pairwise compatible but triplewise incompatible measurements.
Submitted 4 September, 2023; v1 submitted 23 August, 2020;
originally announced August 2020.
-
Contextuality in entanglement-assisted one-shot classical communication
Authors:
Shiv Akshar Yadavalli,
Ravi Kunjwal
Abstract:
We consider the problem of entanglement-assisted one-shot classical communication. In the zero-error regime, entanglement can increase the one-shot zero-error capacity of a family of classical channels following the strategy of Cubitt et al., Phys. Rev. Lett. 104, 230503 (2010). This strategy uses the Kochen-Specker theorem which is applicable only to projective measurements. As such, in the regime of noisy states and/or measurements, this strategy cannot increase the capacity. To accommodate generically noisy situations, we examine the one-shot success probability of sending a fixed number of classical messages. We show that preparation contextuality powers the quantum advantage in this task, increasing the one-shot success probability beyond its classical maximum. Our treatment extends beyond Cubitt et al. and includes, for example, the experimentally implemented protocol of Prevedel et al., Phys. Rev. Lett. 106, 110505 (2011). We then show a mapping between this communication task and a corresponding nonlocal game. This mapping generalizes the connection with pseudotelepathy games previously noted in the zero-error case. Finally, after motivating a constraint we term context-independent guessing, we show that contextuality witnessed by noise-robust noncontextuality inequalities obtained in R. Kunjwal, Quantum 4, 219 (2020), is sufficient for enhancing the one-shot success probability. This provides an operational meaning to these inequalities and the associated hypergraph invariant, the weighted max-predictability, introduced in R. Kunjwal, Quantum 3, 184 (2019). Our results show that the task of entanglement-assisted one-shot classical communication provides a fertile ground to study the interplay of the Kochen-Specker theorem, Spekkens contextuality, and Bell nonlocality.
Submitted 12 October, 2022; v1 submitted 31 May, 2020;
originally announced June 2020.
-
Symmetries of Reduced Magnetohydrodynamics
Authors:
Panagiotis Koutsomitopoulos,
Reese S. Lance,
S. A. Yadavalli,
R. D. Hazeltine
Abstract:
Lie-symmetry methods are used to determine the symmetry group of reduced magnetohydrodynamics. This group allows for arbitrary, continuous transformations of the fields themselves, along with space-time transformations. The derivation reveals, in addition to the predictable translation and rotation groups, some unexpected symmetries. It also uncovers novel, exact nonlinear solutions to the reduced system. A similar analysis of a related but simpler system, describing nonlinear plasma turbulence in terms of a single field, is also presented.
Submitted 26 June, 2019;
originally announced June 2019.
-
Darboux transformations and Fay identities for the extended bigraded Toda hierarchy
Authors:
Bojko Bakalov,
Anila Yadavalli
Abstract:
The extended bigraded Toda hierarchy (EBTH) is an integrable system satisfied by the Gromov-Witten total descendant potential of $\mathbb{CP}^1$ with two orbifold points. We write a bilinear equation for the tau-function of the EBTH and derive Fay identities from it. We show that the action of Darboux transformations on the tau-function is given by vertex operators. As a consequence, we obtain generalized Fay identities.
Submitted 3 December, 2018;
originally announced December 2018.
-
Background Studies for the MINER Coherent Neutrino Scattering Reactor Experiment
Authors:
MINER Collaboration,
G. Agnolet,
W. Baker,
D. Barker,
R. Beck,
T. J. Carroll,
J. Cesar,
P. Cushman,
J. B. Dent,
S. De Rijck,
B. Dutta,
W. Flanagan,
M. Fritts,
Y. Gao,
H. R. Harris,
C. C. Hays,
V. Iyer,
A. Jastram,
F. Kadribasic,
A. Kennedy,
A. Kubik,
I. Ogawa,
K. Lang,
R. Mahapatra,
V. Mandic
, et al. (25 additional authors not shown)
Abstract:
The proposed Mitchell Institute Neutrino Experiment at Reactor (MINER) experiment at the Nuclear Science Center at Texas A&M University will search for coherent elastic neutrino-nucleus scattering within close proximity (about 2 meters) of a 1 MW TRIGA nuclear reactor core using low threshold, cryogenic germanium and silicon detectors. Given the Standard Model cross section of the scattering process and the proposed experimental proximity to the reactor, as many as 5 to 20 events/kg/day are expected. We discuss the status of preliminary measurements to characterize the main backgrounds for the proposed experiment. Both in situ measurements at the experimental site and simulations using the MCNP and GEANT4 codes are described. A strategy for monitoring backgrounds during data taking is briefly discussed.
Submitted 7 September, 2016;
originally announced September 2016.