-
IndicVoices: Towards building an Inclusive Multilingual Speech Dataset for Indian Languages
Authors:
Tahir Javed,
Janki Atul Nawale,
Eldho Ittan George,
Sakshi Joshi,
Kaushal Santosh Bhogale,
Deovrat Mehendale,
Ishvinder Virender Sethi,
Aparna Ananthanarayanan,
Hafsah Faquih,
Pratiti Palit,
Sneha Ravishankar,
Saranya Sukumaran,
Tripura Panchagnula,
Sunjay Murali,
Kunal Sharad Gandhi,
Ambujavalli R,
Manickam K M,
C Venkata Vaijayanthi,
Krishnan Srinivasa Raghavan Karunganni,
Pratyush Kumar,
Mitesh M Khapra
Abstract:
We present INDICVOICES, a dataset of natural and spontaneous speech containing a total of 7348 hours of read (9%), extempore (74%) and conversational (17%) audio from 16237 speakers covering 145 Indian districts and 22 languages. Of these 7348 hours, 1639 hours have already been transcribed, with a median of 73 hours per language. Through this paper, we share our journey of capturing the cultural, linguistic and demographic diversity of India to create a one-of-its-kind inclusive and representative dataset. More specifically, we share an open-source blueprint for data collection at scale comprising standardised protocols, centralised tools, a repository of engaging questions, prompts and conversation scenarios spanning multiple domains and topics of interest, quality control mechanisms, comprehensive transcription guidelines and transcription tools. We hope that this open-source blueprint will serve as a comprehensive starter kit for data collection efforts in other multilingual regions of the world. Using INDICVOICES, we build IndicASR, the first ASR model to support all the 22 languages listed in the 8th schedule of the Constitution of India. All the data, tools, guidelines, models and other materials developed as a part of this work will be made publicly available.
Submitted 4 March, 2024;
originally announced March 2024.
-
Modulating Hierarchical Self-Assembly In Thermoresponsive Intrinsically Disordered Proteins Through High-Temperature Incubation Time
Authors:
Vaishali Sethi,
Dana Cohen-Gerassi,
Sagi Meir,
Max Ney,
Yulia Shmidov,
Gil Koren,
Lihi Adler-Abramovich,
Ashutosh Chilkoti,
Roy Beck
Abstract:
The cornerstone of structural biology is the unique relationship between protein sequence and the 3D structure at equilibrium. Although intrinsically disordered proteins (IDPs) do not fold into a specific 3D structure, breaking this paradigm, some IDPs exhibit large-scale organization, such as liquid-liquid phase separation. In such cases, the structural plasticity has the potential to form numerous self-assembled structures out of thermal equilibrium. Here, we report that high-temperature incubation time is a defining parameter for micro and nanoscale self-assembly of resilin-like IDPs. Interestingly, high-resolution scanning electron microscopy micrographs reveal that an extended incubation time leads to the formation of micron-size rods and ellipsoids that depend on the amino acid sequence. More surprisingly, a prolonged incubation time also induces amino acid composition-dependent formation of short-range nanoscale order, such as periodic lamellar nanostructures. We, therefore, suggest that regulating the period of high-temperature incubation, in the one-phase regime, can serve as a unique method of controlling the hierarchical self-assembly mechanism of structurally disordered proteins.
Submitted 30 November, 2023; v1 submitted 1 September, 2023;
originally announced September 2023.
-
The RSNA-ASNR-MICCAI BraTS 2021 Benchmark on Brain Tumor Segmentation and Radiogenomic Classification
Authors:
Ujjwal Baid,
Satyam Ghodasara,
Suyash Mohan,
Michel Bilello,
Evan Calabrese,
Errol Colak,
Keyvan Farahani,
Jayashree Kalpathy-Cramer,
Felipe C. Kitamura,
Sarthak Pati,
Luciano M. Prevedello,
Jeffrey D. Rudie,
Chiharu Sako,
Russell T. Shinohara,
Timothy Bergquist,
Rong Chai,
James Eddy,
Julia Elliott,
Walter Reade,
Thomas Schaffter,
Thomas Yu,
Jiaxin Zheng,
Ahmed W. Moawad,
Luiz Otavio Coelho,
Olivia McDonnell,
et al. (78 additional authors not shown)
Abstract:
The BraTS 2021 challenge celebrates its 10th anniversary and is jointly organized by the Radiological Society of North America (RSNA), the American Society of Neuroradiology (ASNR), and the Medical Image Computing and Computer Assisted Interventions (MICCAI) society. Since its inception, BraTS has been focusing on being a common benchmarking venue for brain glioma segmentation algorithms, with well-curated multi-institutional multi-parametric magnetic resonance imaging (mpMRI) data. Gliomas are the most common primary malignancies of the central nervous system, with varying degrees of aggressiveness and prognosis. The RSNA-ASNR-MICCAI BraTS 2021 challenge targets the evaluation of computational algorithms assessing the same tumor compartmentalization, as well as the underlying tumor's molecular characterization, in pre-operative baseline mpMRI data from 2,040 patients. Specifically, the two tasks that BraTS 2021 focuses on are: a) the segmentation of the histologically distinct brain tumor sub-regions, and b) the classification of the tumor's O[6]-methylguanine-DNA methyltransferase (MGMT) promoter methylation status. The performance evaluation of all participating algorithms in BraTS 2021 will be conducted through the Sage Bionetworks Synapse platform (Task 1) and Kaggle (Task 2), concluding in distributing to the top ranked participants monetary awards of $60,000 collectively.
Submitted 12 September, 2021; v1 submitted 5 July, 2021;
originally announced July 2021.
-
Intrinsically Disordered Proteins at the Nano-scale
Authors:
Tamara Ehm,
Hila Shinar,
Sagi Meir,
Amandeep Sekhon,
Vaishali Sethi,
Ian L. Morgan,
Gil Rahamim,
Omar A. Saleh,
Roy Beck
Abstract:
The human proteome is enriched in proteins that do not fold into a stable 3D structure. These intrinsically disordered proteins (IDPs) spontaneously fluctuate between a large number of configurations in their native form. Remarkably, the disorder does not lead to dysfunction as with denatured folded proteins. In fact, unlike denatured proteins, recent evidence strongly suggests that multiple biological functions stem from such structural plasticity. Here, focusing on the nanoscopic length-scale, we review the latest advances in IDP research and discuss some of the future directions in this highly promising field.
Submitted 18 January, 2021;
originally announced January 2021.
-
Study of phases in a holographic QCD model
Authors:
Varun Sethi
Abstract:
The Witten-Sakai-Sugimoto model is used to study two-flavour Yang-Mills theory with a large number of colours at finite temperature and in the presence of chemical potentials for baryon number and isospin. The sources for the $U(1)_B$ and $U(1)_3$ gauge fields on the flavour 8-branes are D4-branes wrapped on the $S^4$ part of the background. Here, the gauge symmetry on the flavour branes has been decomposed as $U(2) \equiv U(1)_B \times SU(2)$, and $U(1)_3$ lies within $SU(2)$ and is generated by the diagonal generator. We show various brane configurations, along with the phases in the boundary theory they correspond to, and explore the possibility of phase transitions between various pairs of phases.
Submitted 20 November, 2019; v1 submitted 26 June, 2019;
originally announced June 2019.
-
Intersecting D-brane Stacks and Tachyons at Finite Temperature
Authors:
Swarnendu Sarkar,
Varun Sethi
Abstract:
In arXiv:1403.0389 and arXiv:1610.07140 intersecting $D$-branes in flat space were studied at finite temperature in the Yang-Mills approximation. The one-loop correction to the tachyon mass was computed and the critical temperature at which the tachyon becomes massless was obtained numerically. In this paper we extend the computation of the one-loop two-point amplitude to the case of intersecting stac… keeping track of the extra color factors coming from the unbroken gauge groups. We further discuss the issues involved in the computation of the two-point amplitude for the case of multiple intersecting stacks of branes.
Submitted 14 October, 2018; v1 submitted 6 January, 2018;
originally announced January 2018.
-
Finite Temperature Corrections to Tachyon Mass in Intersecting D-Branes
Authors:
Varun Sethi,
Sudipto Paul Chowdhury,
Swarnendu Sarkar
Abstract:
We continue the analysis of finite temperature corrections to the tachyon mass in intersecting branes, which was initiated in arXiv:1403.0389. In this paper we extend the computation to the case of intersecting D3-branes by considering a setup of two intersecting branes in a flat-space background. A holographic model dual to a BCS superconductor, consisting of intersecting D8-branes in a D4-brane background, was proposed in arXiv:1104.2843. The background considered here is a simplified configuration of this dual model. We compute the one-loop tachyon amplitude in the Yang-Mills approximation and show that the result is finite. Analyzing the amplitudes further, we numerically compute the transition temperature at which the tachyon becomes massless. The analytic expressions for the one-loop amplitudes obtained here reduce to those for intersecting D1-branes obtained in arXiv:1403.0389 as well as those for intersecting D2-branes.
Submitted 23 October, 2016;
originally announced October 2016.