Search | arXiv e-print repository

Machine learning driven high-resolution Raman spectral generation for accurate molecular feature recognition

Authors: Vikas Yadav, Abhay Kumar Tiwari, Soumik Siddhanta

Abstract: Through the probing of light-matter interactions, Raman spectroscopy provides invaluable insights into the composition, structure, and dynamics of materials, and obtaining such data from portable and cheap instruments is of immense practical relevance. Here, we propose the integration of a Generative Adversarial Network (GAN) with low-resolution Raman spectroscopy using a portable hand-held spectrometer to facilitate concurrent spectral analysis and compound classification. Portable spectrometers generally have a lower resolution, and the Raman signal is usually buried under the background noise. The GAN-based model not only generated high-resolution data but also reduced the spectral noise significantly. The generated data was used further to train an Artificial Neural Network (ANN)-based model for the classification of organic and pharmaceutical drug molecules. The high-resolution generated Raman data was subsequently used for spectral barcoding for identification of the pharmaceutical drugs. The GAN also demonstrated enhanced robustness in extracting weak signals compared to conventional noise removal methods. This integrated system holds the potential for achieving accurate and real-time monitoring of noisy inputs to obtain high-throughput output, thereby opening new avenues for applications in different domains.
This synergy between spectroscopy and machine learning (ML) facilitates improved data processing, noise reduction, and feature extraction and opens avenues for predictive modeling and automated decision-making using cost-effective portable devices.

Submitted 25 June, 2024; originally announced July 2024.

Comments: 37 Pages

arXiv:2406.19757 [pdf]

First Results of the Magnetometer (MAG) Payload onboard Aditya-L1 Spacecraft

Authors: Vipin K. Yadav, Y. Vijaya, P. T. Srikar, B. Krishnam Prasad, Monika Mahajan, K. V. L. N. Mallikarjun, S. Narendra, Abhijit A. Adoni, Vijay S. Rai, D. R. Veeresha, Syeeda N. Zamani

Abstract: Aditya-L1 is the first Indian solar mission placed at the first Lagrangian (L1) point to study the Sun. A fluxgate magnetometer (MAG) is one of the seven payloads and one of the three in-situ payloads onboard to measure the interplanetary magnetic field (IMF) coming from the Sun towards the Earth. At present, the Aditya-L1 spacecraft is in a halo orbit around the L1 point and the MAG payload is ON and is continuously measuring the IMF. This paper presents the first measurements of the IMF by MAG.

Submitted 28 June, 2024; originally announced June 2024.

arXiv:2406.19671 [pdf]

Generation of Streaming Beam-Plasma Instability in Variable Lunar Plasma around Moon

Authors: Vipin K. Yadav, Mahima Agarwal, Mehul Chakraborty, Rajneesh Kumar

Abstract: Two-stream instability (TSI) is studied analytically in the lunar plasma environment. The electrons in the solar wind constitute the electron-beam and the lunar electron plasma constitutes the background plasma with which the electron-beam interacts to trigger the TSI. The lunar plasma is considered to have a variable proportion of the energetic (hot) electrons, 1% to 25% of the total lunar electrons, along with the bulk thermal (cold) population. The analysis shows that the presence of energetic electrons in the lunar plasma environment modifies the TSI dispersion relation, can have a significant impact on the triggering of TSI, and is capable of triggering nonlinear phenomena by making the lunar plasma system unstable.

Submitted 28 June, 2024; originally announced June 2024.

arXiv:2406.17990 [pdf, other]

Explicit Diversity Conditions for Effective Question Answer Generation with Large Language Models

Authors: Vikas Yadav, Hyuk Joon Kwon, Vijay Srinivasan, Hongxia Jin

Abstract: Question Answer Generation (QAG) is an effective data augmentation technique to improve the accuracy of question answering systems, especially in low-resource domains. While recent pretrained and large language model-based QAG methods have made substantial progress, they face the critical issue of redundant QA pair generation, affecting downstream QA systems. Implicit diversity techniques such as sampling and diverse beam search are proven effective solutions but often yield smaller diversity. We present explicit diversity conditions for QAG, focusing on spatial aspects, question types, and entities, substantially increasing diversity in QA generation. Our work emphasizes the need for explicit diversity conditions for generating diverse question-answer synthetic data by showing significant improvements in downstream QA tasks over existing widely adopted implicit diversity techniques. In particular, QA pairs generated from explicit diversity conditions, when used to train the downstream QA model, result in an average 4.1% exact match and 4.5% F1 improvement over QAG from implicit sampling techniques on SQuADDU. Our work emphasizes the need for explicit diversity conditions even more in low-resource datasets (SubjQA), where average downstream QA performance improvements are around 12% EM.

Submitted 25 June, 2024; originally announced June 2024.

Comments: Published at COLING 2024

arXiv:2406.17415 [pdf, other]

Layer-Wise Quantization: A Pragmatic and Effective Method for Quantizing LLMs Beyond Integer Bit-Levels

Authors: Razvan-Gabriel Dumitru, Vikas Yadav, Rishabh Maheshwary, Paul-Ioan Clotan, Sathwik Tejaswi Madhusudhan, Mihai Surdeanu

Abstract: We present a simple variable quantization approach that quantizes different layers of a large language model (LLM) at different bit levels. Specifically, we quantize the most important layers to higher bit precision and less important layers to lower bits to achieve floating point quantization levels. We propose two effective strategies to measure the importance of layers within LLMs: the first measures the importance of a layer based on how different its output embeddings are from the input embeddings (the higher the better); the second estimates the importance of a layer using the number of layer weights that are much larger than average (the smaller the better). We show that quantizing different layers at varying bits according to our importance scores results in minimal performance drop with a far more compressed model size. Finally, we present several practical key takeaways from our variable layer-wise quantization experiments: (a) LLM performance under variable quantization remains close to the original model until 25-50% of layers are moved to lower quantization using our proposed ordering but only until 5-10% if moved using no specific ordering; (b) Quantizing LLMs to lower bits performs substantially better than pruning unless extreme quantization (2-bit) is used; and (c) Layer-wise quantization to lower bits works better in the case of larger LLMs with more layers compared to smaller LLMs with fewer layers. The code used to run the experiments is available at: https://github.com/RazvanDu/LayerwiseQuant.

Submitted 26 June, 2024; v1 submitted 25 June, 2024; originally announced June 2024.

Comments: submitted to EMNLP, 15 pages, 10 figures, 4 tables

ACM Class: I.2.7; I.2.0
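
As an illustration of how mixing bit levels across layers can yield a non-integer average bit-width, here is a minimal sketch. It is not the authors' implementation: the function names, the importance scores, the 4-bit/2-bit choice, and the assumption that all layers have the same number of weights are hypothetical.

```python
def assign_bits(importance, high_bits=4, low_bits=2, low_fraction=0.5):
    """Give the least important layers low_bits and the rest high_bits.

    importance: one score per layer, higher meaning more important
    (hypothetical values; the abstract describes two ways to score layers).
    """
    n_layers = len(importance)
    n_low = int(n_layers * low_fraction)
    # Layer indices sorted from least to most important.
    order = sorted(range(n_layers), key=lambda i: importance[i])
    bits = [high_bits] * n_layers
    for i in order[:n_low]:
        bits[i] = low_bits
    return bits


def average_bits(bits):
    """Average bit-width, assuming every layer holds the same number of weights."""
    return sum(bits) / len(bits)


scores = [0.9, 0.1, 0.5, 0.3]  # hypothetical importance scores for four layers
quarter_low = assign_bits(scores, low_fraction=0.25)
half_low = assign_bits(scores, low_fraction=0.5)
print(quarter_low, average_bits(quarter_low))
print(half_low, average_bits(half_low))
```

Moving a quarter of the layers to 2 bits gives an average between the two integer levels, which is the sense in which such a scheme goes beyond integer bit-levels.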

arXiv:2406.17163 [pdf, other]

Paraphrase and Aggregate with Large Language Models for Minimizing Intent Classification Errors

Authors: Vikas Yadav, Zheng Tang, Vijay Srinivasan

Abstract: Large language models (LLMs) have achieved remarkable success in natural language generation, but less focus has been given to their applicability in decision-making tasks such as classification. We show that LLMs like LLaMa can achieve high performance on large multi-class classification tasks but still make classification errors and, worse, generate out-of-vocabulary class labels. To address these critical issues, we introduce the Paraphrase and AGgregate (PAG)-LLM approach, wherein an LLM generates multiple paraphrases of the input query (parallel queries), performs multi-class classification for the original query and each paraphrase, and finally aggregates all the classification labels based on their confidence scores. We evaluate PAG-LLM on two large multi-class classification datasets, CLINC and Banking, and show 22.7% and 15.1% error reduction. We show that PAG-LLM is especially effective for hard examples where the LLM is uncertain, and reduces the critical misclassification and hallucinated label generation errors.

Submitted 24 June, 2024; originally announced June 2024.

Comments: Accepted at SIGIR 2024
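
The aggregation step described in the abstract can be sketched as follows. This is a minimal illustration, not the paper's method: the abstract does not specify the aggregation rule, so summing confidence scores per label is an assumption, and the function name and example labels are hypothetical.

```python
from collections import defaultdict


def aggregate_labels(predictions):
    """Pick one label from (label, confidence) pairs.

    predictions holds one pair for the original query and one per paraphrase.
    Assumed rule: sum the confidences per label and return the label with
    the largest total.
    """
    totals = defaultdict(float)
    for label, confidence in predictions:
        totals[label] += confidence
    return max(totals, key=totals.get)


# Hypothetical outputs: the original query plus two paraphrases.
predictions = [("transfer", 0.40), ("balance", 0.35), ("balance", 0.30)]
print(aggregate_labels(predictions))
```

Here the original query alone would be labeled "transfer", but the two paraphrases together carry more total confidence for "balance".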

arXiv:2406.16783 [pdf, other]

M2Lingual: Enhancing Multilingual, Multi-Turn Instruction Alignment in Large Language Models

Authors: Rishabh Maheshwary, Vikas Yadav, Hoang Nguyen, Khyati Mahajan, Sathwik Tejaswi Madhusudhan

Abstract: Instruction finetuning (IFT) is critical for aligning Large Language Models (LLMs) to follow instructions. While many effective IFT datasets have been introduced recently, they predominantly focus on high-resource languages like English. To better align LLMs across a broad spectrum of languages and tasks, we propose a fully synthetic, novel taxonomy (Evol) guided Multilingual, Multi-turn instruction finetuning dataset, called M2Lingual. It is constructed by first selecting a diverse set of seed examples and then utilizing the proposed Evol taxonomy to convert these seeds into complex and challenging multi-turn instructions. We demonstrate the effectiveness of M2Lingual by training LLMs of varying sizes and showcasing the enhanced performance across a diverse set of languages. We contribute the two-step Evol taxonomy with the guided generation code: https://github.com/ServiceNow/M2Lingual, as well as the first fully synthetic, general and task-oriented, multi-turn, multilingual dataset built with Evol - M2Lingual: https://huggingface.co/datasets/ServiceNow-AI/M2Lingual - containing 182K total IFT pairs, covering 70 languages and 17+ NLP tasks.

Submitted 28 June, 2024; v1 submitted 24 June, 2024; originally announced June 2024.

Comments: 39 pages

arXiv:2406.12308 [pdf, other]

Status of Astronomy Education in India: A Baseline Survey

Authors: Moupiya Maji, Surhud More, Aniket Sule, Vishaak Balasubramanya, Ankit Bhandari, Hum Chand, Kshitij Chavan, Avik Dasgupta, Anindya De, Jayant Gangopadhyay, Mamta Gulati, Priya Hasan, Syed Ishtiyaq, Meraj Madani, Kuntal Misra, Amoghavarsha N, Divya Oberoi, Subhendu Pattnaik, Mayuri Patwardhan, Niruj Mohan Ramanujam, Pritesh Ranadive, Disha Sawant, Paryag Sharma, Twinkle Sharma, Sai Shetye, et al. (6 additional authors not shown)

Abstract: We present the results of a nation-wide baseline survey, conducted by us, for the status of Astronomy education among secondary school students in India. The survey was administered in 10 different languages to over 2000 students from diverse backgrounds, and it explored multiple facets of their perspectives on astronomy. The topics included students' views on the incorporation of astronomy in curricula, their grasp of fundamental astronomical concepts, access to educational resources, cultural connections to astronomy, and their levels of interest and aspirations in the subject. We find notable deficiencies in students' knowledge of basic astronomical principles, with only a minority demonstrating proficiency in key areas such as celestial sizes, distances, and lunar phases. Furthermore, access to resources such as telescopes and planetariums remains limited across the country. Despite these challenges, a significant majority of students expressed a keen interest in astronomy. We further analyze the data along socioeconomic and gender lines. Particularly striking were the socioeconomic disparities, with students from resource-poor backgrounds often having lower levels of access and proficiency. Some differences were observed between genders, although not very pronounced. The insights gleaned from this study hold valuable implications for the development of a more robust astronomy curriculum and the design of effective teacher training programs in the future.

Submitted 18 June, 2024; originally announced June 2024.

Comments: 15 pages, 19 figures

arXiv:2406.04927 [pdf, other]

LLM-based speaker diarization correction: A generalizable approach

Authors: Georgios Efstathiadis, Vijay Yadav, Anzar Abbas

Abstract: Speaker diarization is necessary for interpreting conversations transcribed using automated speech recognition (ASR) tools. Despite significant developments in diarization methods, diarization accuracy remains an issue. Here, we investigate the use of large language models (LLMs) for diarization correction as a post-processing step. LLMs were fine-tuned using the Fisher corpus, a large dataset of transcribed conversations. The ability of the models to improve diarization accuracy in a holdout dataset was measured. We report that fine-tuned LLMs can markedly improve diarization accuracy. However, model performance is constrained to transcripts produced using the same ASR tool as the transcripts used for fine-tuning, limiting generalizability. To address this constraint, an ensemble model was developed by combining weights from three separate models, each fine-tuned using transcripts from a different ASR tool. The ensemble model demonstrated better overall performance than each of the ASR-specific models, suggesting that a generalizable and ASR-agnostic approach may be achievable. We hope to make these models accessible through public-facing APIs for use by third-party applications.

Submitted 7 June, 2024; originally announced June 2024.

Comments: This work has been submitted to the IEEE for possible publication. Copyright may be transferred without notice, after which this version may no longer be accessible

arXiv:2405.11534 [pdf, other]

Testing spatial curvature in an anisotropic extension of $w$CDM model with low redshift data

Authors: Vikrant Yadav, Rajpal, Pardeep, Manish Yadav, Santosh Kumar Yadav

Abstract: In this letter, we report the observational constraints on a Bianchi type I anisotropic extension of the $w$CDM model with spatial curvature from observational data including Baryon Acoustic Oscillations (BAO), Cosmic chronometers (CC), Big Bang nucleosynthesis (BBN), Pantheon+ (PP) compilation of SNe Ia and SH0ES Cepheid host distance anchors. The anisotropy is found to be of the order $10^{-13}$, which interplays with spatial curvature to reduce the $H_0$ tension by $\sim 1\sigma$ as found in the analyses with the BAO+CC+BBN+PP combination of data, while no significant effect of anisotropy is observed with the BAO+CC+BBN+PPSH0ES combination of data. A closed Universe is favored by the $w$CDM as well as anisotropic $w$CDM models with spatial curvature in analyses with the BAO+CC+BBN+PP combination of data. With the BAO+CC+BBN+PPSH0ES combination of data, the $w$CDM model with spatial curvature yields an open Universe, while the anisotropic $w$CDM model with curvature yields a closed Universe. The quintessence form of dark energy is favored at 95\% CL in both analyses.

Submitted 19 May, 2024; originally announced May 2024.

Comments: 7 pages, 4 figures, 1 table

arXiv:2405.08821 [pdf, other]

Low-Latitude Auroras: Insights from 23 April 2023 Solar Storm

Authors: Geeta Vichare, Ankush Bhaskar, Rahul Rawat, Virendra Yadav, Wageesh Mishra, Dorje Angchuk, Anand Kumar Singh

Abstract: In April 2023, a low-latitude aurora observation by the all-sky camera at Hanle, Ladakh, India ($33^{\circ}$N geographic latitude (GGLat)) was reported, which stimulated a lot of discussion among scientists as well as the general public across the globe. The reported observation was intriguing as the solar storm that triggered this aurora was moderate and it was the first such observation from the Indian region in the space era. In this communication, we investigate this unique modern-day low-latitude auroral sighting, which occurred during the passage of the sheath region of an Interplanetary Coronal Mass Ejection, utilizing in situ multi-spacecraft particle measurements along with geomagnetic-field observations by ground and satellite-based magnetometers. Auroral observations at Hanle coincided with intense substorm occurrences. It is unequivocally found that the aurora did not reach India; rather, the equatorward boundary of the aurora was beyond $50^{\circ}$N GGLat. The multi-instrumental observations enabled us to estimate the altitude of the red auroral emissions accurately. The increased flux of low-energy electrons ($<$100 eV) precipitating at $\sim 54^{\circ}$N GGLat, causing red-light emissions at higher altitudes ($\sim$700-950 km), can be visible from Hanle. The observed low-latitude red aurora from India resulted from two factors: emissions at higher altitudes in the auroral oval and a slight expansion of the auroral oval towards the equator.
The precipitating low-energy particles responsible for red auroral emissions mostly originate from the plasma sheet. These particles precipitate due to wave-particle interactions enhanced by strong compression of the magnetosphere during high solar wind pressure. This study using multi-point observations holds immense importance in providing a better understanding of low-latitude auroras.

Submitted 25 April, 2024; originally announced May 2024.

Comments: 18 pages, 10 Figures, 1 Table, 2 supplementary figures

arXiv:2404.15578 [pdf]

Can Foundational Large Language Models Assist with Conducting Pharmaceuticals Manufacturing Investigations?

Authors: Hossein Salami, Brandye Smith-Goettler, Vijay Yadav

Abstract: General purpose Large Language Models (LLMs) such as the Generative Pretrained Transformer (GPT) and Large Language Model Meta AI (LLaMA) have attracted much attention in recent years. There is strong evidence that these models can perform remarkably well in various natural language processing tasks. However, how to leverage them to approach domain-specific use cases and drive value remains an open question. In this work, we focus on a specific use case, pharmaceutical manufacturing investigations, and propose that leveraging historical records of manufacturing incidents and deviations in an organization can be beneficial for addressing and closing new cases, or de-risking new manufacturing campaigns. Using a small but diverse dataset of real manufacturing deviations selected from different product lines, we evaluate and quantify the power of three general purpose LLMs (GPT-3.5, GPT-4, and Claude-2) in performing tasks related to the above goal. In particular, we examine (1) the ability of LLMs to automate the process of extracting specific information, such as the root cause of a case, from unstructured data, and (2) the possibility of identifying similar or related deviations by performing semantic search on the database of historical records. While our results point to the high accuracy of GPT-4 and Claude-2 in the information extraction task, we discuss cases of complex interplay between the apparent reasoning and hallucination behavior of LLMs as a risk factor.
Furthermore, we show that semantic search on vector embedding of deviation descriptions can be used to identify similar records, such as those with a similar type of defect, with a high level of accuracy. We discuss further improvements to enhance the accuracy of similar record identification.

Submitted 23 April, 2024; originally announced April 2024.

Comments: 13 pages, 3 figures

arXiv:2404.02603 [pdf]

A rare simultaneous detection of a mid-latitude plasma depleted structure in O($^1$D) 630.0 nm and O($^1$S) 557.7 nm all-sky airglow images on a geomagnetically quiet night

Authors: D. Patgiri, R. Rathi, V. Yadav, D. Chakrabarty, M. V. Sunil Krishna, S. Kannaujiya, P. Pavan Chaitanya, A. K. Patra, Jann-Yenq Liu, S. Sarkhel

Abstract: In general, nighttime thermospheric 557.7 nm emission over mid-latitudes is predominantly masked by a significantly larger mesospheric component, and hence, F-region plasma structures are rarely observed in this emission. This paper reports the first rare simultaneous detection of an F-region plasma depleted structure in O($^1$D) 630.0 nm and O($^1$S) 557.7 nm airglow images from Hanle, India, a mid-latitude station (32.7°N, 78.9°E; Mlat. ~24.1°N) on a geomagnetically quiet night (Ap=3) of 26 June 2021. This indicates significant enhancement of thermospheric 557.7 nm emission. Interestingly, thermospheric 557.7 nm emission was not significant on the following geomagnetically quiet night as MSTID bands were only observed in 630.0 nm images. We show that enhanced dissociative recombination caused by descent of the F-layer peak over the observation region, coupled with the significant increase of the electron density at the thermospheric 557.7 nm emission altitude, enabled the detection of the plasma depleted structure on 26 June 2021.

Submitted 3 April, 2024; originally announced April 2024.

arXiv:2403.07230 [pdf, other]

Curry-DPO: Enhancing Alignment using Curriculum Learning & Ranked Preferences

Authors: Pulkit Pattnaik, Rishabh Maheshwary, Kelechi Ogueji, Vikas Yadav, Sathwik Tejaswi Madhusudhan

Abstract: Direct Preference Optimization (DPO) is an effective technique that leverages pairwise preference data (usually one chosen and rejected response pair per user prompt) to align LLMs to human preferences. In practice, multiple responses can exist for a given prompt with varying quality relative to each other. With availability of such quality ratings for multiple responses, we propose utilizing these responses to create multiple preference pairs for a given prompt. Our work focuses on systematically using the constructed multiple preference pairs in DPO training via curriculum learning methodology. In particular, we order these multiple pairs of preference data from easy to hard (emulating curriculum training) according to various criteria. We show detailed comparisons of our proposed approach to the standard single-pair DPO setting. Our method, which we call Curry-DPO, consistently shows increased performance gains on MT-bench, Vicuna, WizardLM, and the UltraFeedback test set, highlighting its effectiveness. More specifically, Curry-DPO achieves a score of 7.43 on MT-bench with the Zephyr-7B model, outperforming the majority of existing LLMs with similar parameter size. Curry-DPO also achieves the highest adjusted win rates on Vicuna, WizardLM, and UltraFeedback test datasets (90.7%, 87.1%, and 87.9% respectively) in our experiments, with notable gains of up to 7.5% when compared to the standard DPO technique.

Submitted 11 March, 2024; originally announced March 2024.

Comments: Work in progress
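
To make the pair construction and easy-to-hard ordering concrete, here is a minimal sketch. It is not the authors' code: the abstract says pairs are ordered "according to various criteria", so treating a larger rating gap as an easier pair is an assumption, and the function name and example ratings are hypothetical.

```python
from itertools import combinations


def build_curriculum(responses):
    """Build (chosen, rejected) pairs from rated responses, ordered easy to hard.

    responses: list of (text, rating) tuples for one prompt.
    Assumed difficulty criterion: a larger rating gap means an easier pair.
    """
    scored_pairs = []
    for (text_a, rating_a), (text_b, rating_b) in combinations(responses, 2):
        if rating_a == rating_b:
            continue  # equal ratings express no preference
        if rating_a > rating_b:
            chosen, rejected = text_a, text_b
        else:
            chosen, rejected = text_b, text_a
        scored_pairs.append((abs(rating_a - rating_b), chosen, rejected))
    # Largest gap first, so training would start with the clearest preferences.
    scored_pairs.sort(key=lambda item: item[0], reverse=True)
    return [(chosen, rejected) for _, chosen, rejected in scored_pairs]


# Hypothetical ratings for three responses to one prompt.
responses = [("A", 9), ("B", 6), ("C", 5)]
print(build_curriculum(responses))
```

Three rated responses yield three preference pairs instead of the single pair used in the standard DPO setting.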

arXiv:2403.05530 [pdf, other]

Gemini 1.5: Unlocking multimodal understanding across millions of tokens of context

Authors: Gemini Team, Petko Georgiev, Ving Ian Lei, Ryan Burnell, Libin Bai, Anmol Gulati, Garrett Tanzer, Damien Vincent, Zhufeng Pan, Shibo Wang, Soroosh Mariooryad, Yifan Ding, Xinyang Geng, Fred Alcober, Roy Frostig, Mark Omernick, Lexi Walker, Cosmin Paduraru, Christina Sorokin, Andrea Tacchetti, Colin Gaffney, Samira Daruki, Olcan Sercinoglu, Zach Gleicher, Juliette Love, et al. (1092 additional authors not shown)

Abstract: In this report, we introduce the Gemini 1.5 family of models, representing the next generation of highly compute-efficient multimodal models capable of recalling and reasoning over fine-grained information from millions of tokens of context, including multiple long documents and hours of video and audio. The family includes two new models: (1) an updated Gemini 1.5 Pro, which exceeds the February version on the great majority of capabilities and benchmarks; (2) Gemini 1.5 Flash, a more lightweight variant designed for efficiency with minimal regression in quality. Gemini 1.5 models achieve near-perfect recall on long-context retrieval tasks across modalities, improve the state-of-the-art in long-document QA, long-video QA and long-context ASR, and match or surpass Gemini 1.0 Ultra's state-of-the-art performance across a broad set of benchmarks. Studying the limits of Gemini 1.5's long-context ability, we find continued improvement in next-token prediction and near-perfect retrieval (>99%) up to at least 10M tokens, a generational leap over existing models such as Claude 3.0 (200k) and GPT-4 Turbo (128k).
Finally, we highlight real-world use cases, such as Gemini 1.5 collaborating with professionals on completing their tasks achieving 26 to 75% time savings across 10 different job categories, as well as surprising new capabilities of large language models at the frontier; when given a grammar manual for Kalamang, a language with fewer than 200 speakers worldwide, the model learns to translate English to Kalamang at a similar level to a person who learned from the same content.

Submitted 14 June, 2024; v1 submitted 8 March, 2024; originally announced March 2024.

arXiv:2402.16885 [pdf, other]

Effects of anisotropy in an anisotropic extension of $w$CDM model

Authors: Vikrant Yadav, Santosh Kumar Yadav, Rajpal

Abstract: In this paper, we derive observational constraints on an anisotropic $w$CDM model from observational data including Baryonic Acoustic Oscillations (BAOs), Cosmic Chronometer (CC), Big Bang Nucleosynthesis (BBN), Pantheon Plus (PP) compilation of Type Ia supernovae, and SH0ES Cepheid host distance anchors. We find that anisotropy is of the order $10^{-13}$, and its presence in the $w$CDM model reduces $H_0$ tension by $\sim 2\sigma$ and $\sim 1\sigma$ in the analyses with BAO+CC+BBN+PP and BAO+CC+BBN+PPSH0ES data combinations, respectively. In both analyses, the quintessence form of dark energy is favored at 95\% CL.

Submitted 20 February, 2024; originally announced February 2024.

Comments: 11 pages, 4 figures, 3 tables

arXiv:2402.14325 [pdf, other]

AuroraMag: Twin Explorer of Asymmetry in Aurora and Solar Wind-Magnetosphere Coupling

Authors: Ankush Bhaskar, Jayadev Pradeep, Shyama Narendranath, Dibyendu Nandy, Bhargav Vaidya, Priyadarshan Hari, Smitha V. Thampi, Vipin K. Yadav, Geeta Vichare, Anil Raghav, Dibyendu Chakrabarty, R. Satheesh Thampi, Tarun Kumar Pant

Abstract: In the present-day context, small satellites and their constellations consisting of varying sizes (nano, micro, pico satellites) are being favored for remote sensing and in situ probing of the heliosphere and terrestrial magnetosphere-ionosphere system. We introduce a mission concept aimed at concurrently observing Earth's northern and southern auroral ovals while conducting in situ measurements of particles, fields, and temperature. The mission concept consists of two small satellites, each having an identical auroral X-ray imager, an in situ particle detector, a magnetometer pair, and an electron temperature analyzer onboard in an elliptical polar orbit (400 × 1000 km). This mission would assist the space weather community in primarily answering important questions about the formation, morphology, and hemispherical asymmetries that we observe in the X-ray aurora, the fluxes of precipitating particles, Solar Energetic Particles, currents, and cusp dynamics. Once realized, this would be the first dedicated twin spacecraft mission of such kind to simultaneously study hemispheric asymmetries of solar-wind magnetosphere coupling. This study reveals the intricacies of the mission concept, encompassing orbital details, potential payloads, and its underlying scientific objectives. By leveraging the capabilities of small satellites, this mission concept is poised to make significant contributions to space weather monitoring and research.

Submitted 22 February, 2024; originally announced February 2024.

Comments: 22 pages, 9 figures, 4 tables

arXiv:2402.07301 [pdf, other]

LISR: Learning Linear 3D Implicit Surface Representation Using Compactly Supported Radial Basis Functions

Authors: Atharva Pandey, Vishal Yadav, Rajendra Nagar, Santanu Chaudhury

Abstract: Implicit 3D surface reconstruction of an object from its partial and noisy 3D point cloud scan is the classical geometry processing and 3D computer vision problem. In the literature, various 3D shape representations have been developed, differing in memory efficiency and shape retrieval effectiveness, such as volumetric, parametric, and implicit surfaces. Radial basis functions provide memory-efficient parameterization of the implicit surface. However, we show that training a neural network using the mean squared error between the ground-truth implicit surface and the linear basis-based implicit surfaces does not converge to the global solution. In this work, we propose locally supported compact radial basis functions for a linear representation of the implicit surface. This representation enables us to generate 3D shapes with arbitrary topologies at any resolution due to their continuous nature. We then propose a neural network architecture for learning the linear implicit shape representation of the 3D surface of an object. We learn linear implicit shapes within a supervised learning framework using ground truth Signed-Distance Field (SDF) data for guidance. The classical strategies face difficulties in finding linear implicit shapes from a given 3D point cloud due to numerical issues (requires solving inverse of a large matrix) in basis and query point selection. The proposed approach achieves better Chamfer distance and comparable F-score than the state-of-the-art approach on the benchmark dataset. We also show the effectiveness of the proposed approach by using it for the 3D shape completion task.

Submitted 11 February, 2024; originally announced February 2024.

Journal ref: AAAI 2024

arXiv:2312.11805 [pdf, other]

Gemini: A Family of Highly Capable Multimodal Models

Authors: Gemini Team, Rohan Anil, Sebastian Borgeaud, Jean-Baptiste Alayrac, Jiahui Yu, Radu Soricut, Johan Schalkwyk, Andrew M. Dai, Anja Hauth, Katie Millican, David Silver, Melvin Johnson, Ioannis Antonoglou, Julian Schrittwieser, Amelia Glaese, Jilin Chen, Emily Pitler, Timothy Lillicrap, Angeliki Lazaridou, Orhan Firat, James Molloy, Michael Isard, Paul R. Barham, Tom Hennigan, Benjamin Lee , et al. (1325 additional authors not shown)

Abstract: This report introduces a new family of multimodal models, Gemini, that exhibit remarkable capabilities across image, audio, video, and text understanding. The Gemini family consists of Ultra, Pro, and Nano sizes, suitable for applications ranging from complex reasoning tasks to on-device memory-constrained use-cases. Evaluation on a broad range of benchmarks shows that our most-capable Gemini Ultra model advances the state of the art in 30 of 32 of these benchmarks - notably being the first model to achieve human-expert performance on the well-studied exam benchmark MMLU, and improving the state of the art in every one of the 20 multimodal benchmarks we examined. We believe that the new capabilities of the Gemini family in cross-modal reasoning and language understanding will enable a wide variety of use cases. We discuss our approach toward post-training and deploying Gemini models responsibly to users through services including Gemini, Gemini Advanced, Google AI Studio, and Cloud Vertex AI.

Submitted 17 June, 2024; v1 submitted 18 December, 2023; originally announced December 2023.

arXiv:2308.06723 [pdf, other]

BN Doping in the Realm of Two-Dimensional Fullerene Network for Unparalleled Structural, Electronic, Optical, and HER Advancements: A Cutting-Edge DFT Investigation

Authors: Vivek Kumar Yadav

Abstract: The doping of lighter non-metals like boron and nitrogen into graphene represents a promising advancement in the field of nano-electronic devices, particularly in the development of field-effect transistors (FETs). These doped two-dimensional (2D) materials offer improved stability and enhanced adsorption characteristics compared to pure graphene. Notably, it displays semiconducting behavior, resulting in higher conductivity and carrier mobility. In this study, we investigate the structural, electronic, optical, and conductivity/carrier transport properties of 2D polymer sheets made of fullerene, both with and without boron and nitrogen doping. We employ density functional theory (DFT) with PBE and HSE functionals, considering the inclusion of van der Waals (vdW) interactions. The research findings indicate that the 2D sheets of C60, C58B1N1, and C54B3N3 exhibit band gaps of approximately 0.97 eV (1.5 eV), 1.08 eV (1.9 eV), and 1.05 eV (1.6 eV), respectively, as obtained from PBE (HSE) calculations. Moreover, according to the deformation potential theory, both doped sheets exhibit ultra-high conductivity at elevated temperature. These results are highly promising and underscore the significance of a single pair of BN dopants in fullerene (C58B1N1) monolayers for the advancement of next-generation 2D nanoelectronic and photonics applications.

Submitted 14 August, 2023; v1 submitted 13 August, 2023; originally announced August 2023.

arXiv:2307.16888 [pdf, other]

Backdooring Instruction-Tuned Large Language Models with Virtual Prompt Injection

Authors: Jun Yan, Vikas Yadav, Shiyang Li, Lichang Chen, Zheng Tang, Hai Wang, Vijay Srinivasan, Xiang Ren, Hongxia Jin

Abstract: Instruction-tuned Large Language Models (LLMs) have become a ubiquitous platform for open-ended applications due to their ability to modulate responses based on human instructions. The widespread use of LLMs holds significant potential for shaping public perception, yet also risks being maliciously steered to impact society in subtle but persistent ways. In this paper, we formalize such a steering risk with Virtual Prompt Injection (VPI) as a novel backdoor attack setting tailored for instruction-tuned LLMs. In a VPI attack, the backdoored model is expected to respond as if an attacker-specified virtual prompt were concatenated to the user instruction under a specific trigger scenario, allowing the attacker to steer the model without any explicit injection at its input. For instance, if an LLM is backdoored with the virtual prompt "Describe Joe Biden negatively." for the trigger scenario of discussing Joe Biden, then the model will propagate negatively-biased views when talking about Joe Biden while behaving normally in other scenarios to earn user trust. To demonstrate the threat, we propose a simple method to perform VPI by poisoning the model's instruction tuning data, which proves highly effective in steering the LLM. For example, by poisoning only 52 instruction tuning examples (0.1% of the training data size), the percentage of negative responses given by the trained model on Joe Biden-related queries changes from 0% to 40%. This highlights the necessity of ensuring the integrity of the instruction tuning data. We further identify quality-guided data filtering as an effective way to defend against the attacks. Our project page is available at https://poison-llm.github.io.

Submitted 3 April, 2024; v1 submitted 31 July, 2023; originally announced July 2023.

Comments: Accepted to NAACL 2024. Project page: https://poison-llm.github.io

arXiv:2307.14374 [pdf, other]

Forecasting, capturing and activation of carbon-dioxide (CO$_2$): Integration of Time Series Analysis, Machine Learning, and Material Design

Authors: Suchetana Sadhukhan, Vivek Kumar Yadav

Abstract: This study provides a comprehensive time series analysis of daily industry-specific, country-wise CO$_2$ emissions from January 2019 to February 2023. The research focuses on the Power, Industry, Ground Transport, Domestic Aviation, and International Aviation sectors in European countries (EU27 & UK, Italy, Germany, Spain) and India, utilizing near-real-time activity data from the Carbon Monitor research initiative. To identify regular emission patterns, the data from the year 2020 is excluded due to the disruptive effects caused by the COVID-19 pandemic. The study then performs a principal component analysis (PCA) to determine the key contributors to CO$_2$ emissions. The analysis reveals that the Power, Industry, and Ground Transport sectors account for a significant portion of the variance in the dataset. A 7-day moving averaged dataset is employed for further analysis to facilitate robust predictions. This dataset captures both short-term and long-term trends and enhances the quality of the data for prediction purposes. The study utilizes Long Short-Term Memory (LSTM) models on the 7-day moving averaged dataset to effectively predict emissions and provide insights for policy decisions, mitigation strategies, and climate change efforts. During the training phase, the stability and convergence of the LSTM models are ensured, which guarantees their reliability in the testing phase. The evaluation of the loss function indicates this reliability. The model achieves high efficiency, as demonstrated by $R^2$ values ranging from 0.8242 to 0.995 for various countries and sectors. Furthermore, there is a proposal for utilizing scandium and boron/aluminium-based thin films as exceptionally efficient materials for capturing CO$_2$ (with a binding energy range from -3.0 to -3.5 eV). These materials are shown to surpass the affinity of graphene and boron nitride sheets in this regard.

Submitted 25 July, 2023; originally announced July 2023.

Comments: 38 pages, 16 figures

arXiv:2307.10991 [pdf]

Dense Sample Deep Learning

Authors: Stephen Josè Hanson, Vivek Yadav, Catherine Hanson

Abstract: Deep Learning (DL), a variant of the neural network algorithms originally proposed in the 1980s, has made surprising progress in Artificial Intelligence (AI), ranging from language translation, protein folding, autonomous cars, and more recently human-like language models (CHATbots), all that seemed intractable until very recently. Despite the growing use of Deep Learning (DL) networks, little is actually understood about the learning mechanisms and representations that make these networks effective across such a diverse range of applications. Part of the answer must be the huge scale of the architecture and of course the large scale of the data, since not much has changed since 1987. But the nature of deep learned representations remains largely unknown. Unfortunately, training sets with millions or billions of tokens have unknown combinatorics, and networks with millions or billions of hidden units cannot easily be visualized and their mechanisms cannot be easily revealed. In this paper, we explore these questions with a large (1.24M weights; VGG) DL in a novel high density sample task (5 unique tokens with at minimum 500 exemplars per token) which allows us to more carefully follow the emergence of category structure and feature construction. We use various visualization methods for following the emergence of the classification and the development of the coupling of feature detectors and structures that provide a type of graphical bootstrapping. From these results we harvest some basic observations of the learning dynamics of DL and propose a new theory of complex feature construction based on our results.

Submitted 21 July, 2023; v1 submitted 20 July, 2023; originally announced July 2023.

arXiv:2307.08701 [pdf, other]

AlpaGasus: Training A Better Alpaca with Fewer Data

Authors: Lichang Chen, Shiyang Li, Jun Yan, Hai Wang, Kalpa Gunaratna, Vikas Yadav, Zheng Tang, Vijay Srinivasan, Tianyi Zhou, Heng Huang, Hongxia Jin

Abstract: Large language models (LLMs) strengthen instruction-following capability through instruction-finetuning (IFT) on supervised instruction/response data. However, widely used IFT datasets (e.g., Alpaca's 52k data) surprisingly contain many low-quality instances with incorrect or irrelevant responses, which are misleading and detrimental to IFT. In this paper, we propose a simple and effective data selection strategy that automatically identifies and filters out low-quality data using a strong LLM (e.g., ChatGPT). To this end, we introduce AlpaGasus, which is finetuned on only 9k high-quality data filtered from the 52k Alpaca data. AlpaGasus significantly outperforms the original Alpaca as evaluated by GPT-4 on multiple test sets and the controlled human evaluation. Its 13B variant matches $>90\%$ performance of its teacher LLM (i.e., Text-Davinci-003 generating the 52k data) on test tasks. It also provides 5.7x faster training, reducing the training time for a 7B variant from 80 minutes (for Alpaca) to 14 minutes. Moreover, the experiments prove the efficacy of our method across diverse datasets, base models, and LLM filters. Overall, AlpaGasus demonstrates a novel data-centric IFT paradigm that can be generally applied to instruction-tuning data, leading to faster training and better instruction-following models. Our project page is available at: https://lichang-chen.github.io/AlpaGasus/

Submitted 13 February, 2024; v1 submitted 17 July, 2023; originally announced July 2023.

Comments: 32 Pages; 29 Figures; 15 Tables

arXiv:2307.05155 [pdf, other]

Observational Constraints on generalized dark matter properties in the presence of neutrinos with the final Planck release

Authors: Vikrant Yadav, Santosh Kumar Yadav, Anil Kumar Yadav

Abstract: In this paper, we investigate an extension of the standard $Λ$CDM model by allowing: a temporal evolution in the equation of state (EoS) of DM via Chevallier-Polarski-Linder parametrization, and the constant non-null sound speed. We also consider the properties of neutrinos, such as the effective neutrino mass and the effective number of neutrino species as free parameters. We derive the constraints on this scenario by using the data from the Planck-2018 cosmic microwave background (CMB), baryonic acoustic oscillation (BAO) measurements, Pantheon+ compilation of Type Ia supernovae (SNe Ia), and some large scale structure (LSS) information from the cosmic shear surveys: Kilo Degree Survey (KiDS)-1000 and Dark Energy Survey (DES). We find constraints on the EoS and sound speed of DM very close to the null value in all the analyses, and thus no significant evidence is found beyond the standard CDM paradigm. In all the analyses, we find the significantly tight upper bounds on the sum of neutrino masses, and significantly lower mean values of $S_8$, which are in agreement with the LSS measurements. Thus, the well-known $S_8$ tension is reconciled in the considered model.

Submitted 27 October, 2023; v1 submitted 11 July, 2023; originally announced July 2023.

Comments: 10 pages, 3 Figure panels

Journal ref: Physics of the Dark Universe, 42, 101363 (2023)

arXiv:2306.16135 [pdf, other]

doi 10.1016/j.dark.2023.101365

Measuring Hubble constant in an anisotropic extension of $Λ$CDM model

Authors: Vikrant Yadav

Abstract: Cosmic Microwave Background (CMB) independent approaches are frequently used in the literature to provide estimates of Hubble constant ($H_0$). In this work, we report CMB independent constraints on $H_0$ in an anisotropic extension of $Λ$CDM model using the Big Bang Nucleosynthesis (BBN), Baryonic Acoustic Oscillations (BAOs), Cosmic Chronometer (CC), and Pantheon+ (PP) compilation of Type Ia supernovae and SH0ES Cepheid host distance anchors data. In the anisotropic model, we find $H_{\rm 0}=70.1^{+1.2}_{-1.5}\; (72.67\pm 0.85)\;\rm km\, s^{-1}\, Mpc^{-1}$ both with 68\% CL from BAO+BBN+CC+PP (BAO+BBN+CC+PPSH0ES) data. The analyses of the anisotropic model with the two combinations of data sets reveal that anisotropy is positively correlated with $H_0$, and an anisotropy of the order $10^{-14}$ in the anisotropic model reduces the $H_0$ tension by $\sim 2σ$.

Submitted 10 November, 2023; v1 submitted 28 June, 2023; originally announced June 2023.

Comments: 9 pages, 1 table and 1 figure; matches the version published in Physics of the Dark Universe

Journal ref: Phys. Dark Univ. 42, 101365 (2023)

arXiv:2305.01362 [pdf]

Tuning the electron-phonon interaction via exploring the interrelation between Urbach energy and Fano-type asymmetric Raman line shape in GO-hBN nanocomposites

Authors: Vidyotma Yadav, Tanuja Mohanty

Abstract: Hexagonal boron nitride (hBN), having an in-plane hexagonal structure in the sp2 arrangement of atoms, proclaims structural similarity with graphene with only a small lattice mismatch. Despite having nearly identical atomic arrangements and exhibiting almost identical properties, the electronic structures of the two materials are fundamentally different. Considering the aforementioned context, a new hybrid material with enhanced properties can be evolved combining both materials. This experiment involves liquid phase exfoliation of hBN and two-dimensional nanocomposites of GO-hBN with varying hBN and graphene oxide (GO) ratios. The optical and vibrational studies conducted using UV-Vis absorption and Raman spectroscopic analysis report the tuning of electron-phonon interaction (EPI) in the GO-hBN nanocomposite as a function of GO content (%). This interaction depends on disorder-induced electronic and vibrational modifications addressed by Urbach energy (Eu) and asymmetry parameter (q), respectively. The EPI contribution to the induced disorders estimated from UV-Vis absorption spectra is represented as EPI strength (Ee-p) and its impact observed in Raman phonon modes is quantified as an asymmetry parameter (q). The inverse of the asymmetry parameter is related to Ee-p, as Ee-p ~ 1/|q|. Here in this article, a linear relationship has been established between Eu and the proportional parameter (k), where k is determined as the ratio of the intensity of specific Raman mode (I) and $q^2$, explaining the disorders' effect on Raman line shape. Thus a correlation between Urbach energy and the asymmetry parameter of Raman mode confirms the tuning of EPI with GO content (%) in GO-hBN nanocomposite.

Submitted 8 June, 2023; v1 submitted 2 May, 2023; originally announced May 2023.

Comments: This submission is of 23 pages including 10 figures in it. This article is not yet published in any journal

arXiv:2301.03163 [pdf]

VIPER: A Plasma Wave Detection Instrument onboard Indian Venus Orbiter Spacecraft

Authors: Vipin K Yadav

Abstract: Plasma waves are observed in almost all the solar system objects. The planetary ionospheres are capable of sustaining plasma waves which are observed there and play an important role in the ionospheric dynamics. Venus does not possess a global magnetic field unlike Earth. The solar EUV radiation ionizes the neutrals and generates a plasma environment around Venus which can sustain plasma waves. Very few attempts are made to observe all plasma waves that can exist around Venus and that too with instruments having a limited dynamic range such as with Pioneer Venus Orbiter and Venus Express. However, there are some other plasma waves which can exist around Venus but are yet to be observed.

Submitted 8 January, 2023; originally announced January 2023.

arXiv:2301.02435 [pdf]

Streaming Instability Generation in Lunar Plasma Environment

Authors: Mehul Chakraborty, Vipin K. Yadav, Rajneesh Kumar

Abstract: Plasma instabilities are the non-linear processes occurring in plasmas when excess energy gets accumulated in a plasma system which is unable to hold it. There are almost 60 known plasma instabilities in nature.

Submitted 6 January, 2023; originally announced January 2023.

arXiv:2212.13046 [pdf, other]

The Aditya-L1 mission of ISRO

Authors: Durgesh Tripathi, D. Chakrabarty, A. Nandi, B. Raghvendra Prasad, A. N. Ramaprakash, Nigar Shaji, K. Sankarasubramanian, R. Satheesh Thampi, V. K. Yadav

Abstract: The Aditya-L1 is the first space-based solar observatory of the Indian Space Research Organization (ISRO). The spacecraft will carry seven payloads providing uninterrupted observations of the Sun from the first Lagrangian point. Aditya-L1 comprises four remote sensing instruments, viz. a coronagraph observing in visible and infrared, a full disk imager in Near Ultra-Violet (NUV), and two full-sun integrated spectrometers in soft X-ray and hard X-ray. In addition, there are three instruments for in-situ measurements, including a magnetometer, to study the magnetic field variations during energetic events. Aditya-L1 is truly a mission for multi-messenger solar astronomy from space that will provide comprehensive observations of the Sun across the electromagnetic spectrum and in-situ measurements in a broad range of energy, including magnetic field measurements at L1.

Submitted 30 December, 2022; v1 submitted 26 December, 2022; originally announced December 2022.

Comments: 10 pages, 6 figures, Submitted to the Proceedings of IAU 372: "The Era of Multi-Messenger Solar Physics"

arXiv:2208.01090 [pdf]

doi 10.1103/PhysRevFluids.7.L031101

Gradients in solid surface tension drive Marangoni-like motions in cell aggregates

Authors: Vikrant Yadav, Md. Sulaiman Yousafzai, Sorosh Amiri, Robert W. Style, Eric R. Dufresne, Michael Murrell

Abstract: The surface tension of living cells and tissues originates from the generation of nonequilibrium active stresses within the cell cytoskeleton. Here, using laser ablation, we generate gradients in the surface tension of cellular aggregates as models of simple tissues. These gradients of active surface stress drive large-scale and rapid toroidal motion. Subsequently, the motions spontaneously reverse as stresses reaccumulate and cells return to their original positions. Both forward and reverse motions resemble Marangoni flows in viscous fluids. However, the motions are faster than the timescales of viscoelastic relaxation, and the surface tension gradient is proportional to mechanical strain at the surface. Further, due to active stress, both the surface tension gradient and surface strain are dependent upon the volume of the aggregate. These results indicate that surface tension can induce rapid and highly correlated elastic deformations in the maintenance of tissue shape and configuration.

Submitted 1 August, 2022; originally announced August 2022.

Journal ref: Phys. Rev. Fluids 7, L031101, 2022

arXiv:2203.03735 [pdf, other]

doi 10.1016/j.matdes.2022.111032

A Novel Physics-Regularized Interpretable Machine Learning Model for Grain Growth

Authors: Weishi Yan, Joseph Melville, Vishal Yadav, Kristien Everett, Lin Yang, Michael S. Kesler, Amanda R. Krause, Michael R. Tonks, Joel B. Harley

Abstract: Experimental grain growth observations often deviate from grain growth simulations, revealing that the governing rules for grain boundary motion are not fully understood. A novel deep learning model was developed to capture grain growth behavior from training data without making assumptions about the underlying physics. The Physics-Regularized Interpretable Machine Learning Microstructure Evolution (PRIMME) model consists of a multi-layer neural network that predicts the likelihood of a point changing to a neighboring grain. Here, we demonstrate PRIMME's ability to replicate two-dimensional normal grain growth by training it with Monte Carlo Potts simulations. The trained PRIMME model's grain growth predictions in several test cases show good agreement with analytical models, phase-field simulations, Monte Carlo Potts simulations, and results from the literature. Additionally, PRIMME's adaptability to investigate irregular grain growth behavior is shown. Important aspects of PRIMME like interpretability, regularization, extrapolation, and overfitting are also discussed.

Submitted 17 August, 2022; v1 submitted 7 March, 2022; originally announced March 2022.

Comments: 31 pages, 12 figures. Accepted to Materials & Design. Code Available: https://github.com/EAGG-UF/PRIMME

arXiv:2202.00912 [pdf]

Flipping the switch on local exploration: Genetic Algorithms with Reversals

Authors: Ankit Grover, Vaishali Yadav, Bradly Alicea

Abstract: One important feature of complex systems are problem domains that have many local minima and substructure. Biological systems manage these local minima by switching between different subsystems depending on their environmental or developmental context. Genetic Algorithms (GA) can mimic this switching property as well as provide a means to overcome problem domain complexity. However, standard GA requires additional operators that will allow for large-scale exploration in a stochastic manner. Gradient-free heuristic search techniques are suitable for providing an optimal solution in the discrete domain to such single objective optimization tasks, particularly compared to gradient-based methods which are noticeably slower. To do this, the authors turn to an optimization problem from the flight scheduling domain. The authors compare the performance of such common gradient-free heuristic search algorithms and propose variants of GAs. The Iterated Chaining (IC) method is also introduced, building upon traditional chaining techniques by triggering multiple local searches instead of the singular action of a mutation operator. The authors will show that the use of multiple local searches can improve performance on local stochastic searches, providing ample opportunity for application to a host of other problem domains. It is observed that the proposed GA variants have the least average cost across all benchmarks including the problem proposed and IC algorithm performs better than its constituents.

Submitted 24 August, 2022; v1 submitted 2 February, 2022; originally announced February 2022.

Comments: 13 pages, 3 Figures, 4 Tables. In Proceedings of 3rd Congress on Intelligent Systems (CIS) conference, Bengaluru, India. Appendix I-IV can be found in version 1

arXiv:2111.12655 [pdf, ps, other]

String/${\cal M}$-theory Dual of Large-$N$ Thermal QCD-Like Theories at Intermediate Gauge/'t Hooft Coupling and Holographic Phenomenology

Authors: Vikas Yadav

Abstract: Considering the setup of arXiv:0902.1540 [hep-th] involving UV-complete top-down type IIB holographic dual of large-N thermal QCD with a fluxed resolved warped deformed conifold, in arXiv:1306.4339 [hep-th] delocalized type IIA S(trominger)-Y(au)-Z(aslow) mirror of the type IIB background of arXiv:0902.1540 [hep-th] was constructed via three T dualities along a special Lagrangian $T^{3}$ fibered over a large base and then uplifted, locally, to the 11-dimensional ${\cal M}$-theory. Considering the aforementioned setup arXiv:1306.4339 [hep-th] in the `MQGP' limit, in arXiv:1703.01306 [hep-th] we obtained the masses of the $0^{++}, 0^{-+},0^{--}, 1^{++}, 2^{++}$ (`glueball') states. We also obtained analytical expressions for the vector and scalar meson spectra in arXiv:1707.02818 [hep-th]. We used WKB quantization conditions and Neumann/Dirichlet boundary conditions at an IR cut-off (`$r_0$')/horizon radius (`$r_h$') on the solutions to the equations of motion. We also discussed the $r_h=0$-limits of all calculations which correspond to the thermal background. Subsequently, in arXiv:1808.01182 [hep-th] we obtained the interaction Lagrangian corresponding to exotic scalar glueball $\left( G_{E}\right)-ρ/π$- meson. Assuming $M_G>2M_ρ$, we then computed $ρ\rightarrow2π, G_E\rightarrow2π, 2ρ, ρ+2π$ decay widths as well as the direct and indirect (mediated via $ρ$ mesons) $G_E\rightarrow4π$ decays.
In arXiv:2004.07259 [hep-th] we obtained ${\cal O}\left(l_p^6\right)$ corrections to the MQGP background of arXiv:1306.4339 [hep-th] to study a top-down holographic dual of the thermal QCD-like theories at intermediate 't Hooft coupling and in arXiv:2011.04660 [hep-th] we obtained the values of the coupling constants of the ${\cal O}(p^4)$ $χ$PT Lagrangian in the chiral limit, inclusive of the ${\cal O}(R^4)$ corrections.

Submitted 24 November, 2021; originally announced November 2021.

Comments: 285 pages, 2 figures, 13 tables; based on Ph.D. thesis successfully defended on Nov 19, 2021 and arXiv:1703.01306, arXiv:1707.02818, arXiv:1808.01182, arXiv:2011.04660, arXiv:2004.07259

arXiv:2108.05372 [pdf, other]

doi 10.1007/JHEP10(2021)220

${\mathscr {M}}$cTEQ (${\mathscr {M}}$ ${\bf c}$hiral perturbation theory-compatible deconfinement ${\bf T}$emperature and ${\bf E}$ntanglement Entropy up to terms ${\bf Q}$uartic in curvature) and FM (${\bf F}$lavor ${\bf M}$emory)

Authors: Gopal Yadav, Vikas Yadav, Aalok Misra

Abstract: A holographic computation of $T_c$ at ${\it intermediate\ coupling}$ from M-theory dual of thermal QCD-like theories, has been missing in the literature. Filling this gap, we demonstrate a novel UV-IR mixing, (conjecture and provide evidence for) a non-renormalization beyond 1 loop of ${\bf M}-{\bf c}$hiral perturbation theory arXiv:2011.04660[hep-th]-compatible deconfinement ${\bf T}$emperature, and show equivalence with an ${\bf E}$ntanglement (as well as Wald) entropy arXiv:0709.2140[hep-th] computation, up to terms ${\bf Q}$uartic in curvature. We demonstrate a ${\bf F}$lavor-${\bf M}$emory (FM) effect in the M-theory uplifts of the gravity duals, wherein the no-braner M-theory uplift retains the "memory" of the flavor D7-branes of the parent type IIB dual in the sense that a specific combination of the aforementioned quartic corrections to the metric components precisely along the compact part of the non-compact four-cycle "wrapped" by the flavor D7-branes, is what determines, e.g., the Einstein-Hilbert action at O$(R^4)$. The same linear combination of O$(R^4)$ metric corrections, upon matching the phenomenological value of the coupling constant of one of the SU(3) NLO ChPT Lagrangian, is required to have a definite sign. Interestingly, in the decompactification limit of the spatial circle, we ${\it derive}$ this, and obtain the values of the relevant O$(R^4)$ metric corrections. Further, equivalence with Wald entropy for the black hole at ${\cal O}(R^4)$ imposes a linear constraint on the same linear combination of metric corrections.
Remarkably, when evaluating $T_c$ from an entanglement entropy computation in the thermal gravity dual, due to a delicate cancelation between the ${\cal O}(R^4)$ corrections from a subset of the abovementioned metric components, one sees that there are no corrections to $T_c$ at quartic order supporting the conjecture referred to above.

Submitted 1 April, 2022; v1 submitted 11 August, 2021; originally announced August 2021.

Comments: v2, 1+64 pages, 2 figures, LaTeX; a reference added, the title decrypted/expanded and minor changes made; Published in JHEP, Some typos corrected

arXiv:2106.04134 [pdf, other]

doi 10.1145/3404835.3463099

Cheap and Good? Simple and Effective Data Augmentation for Low Resource Machine Reading

Authors: Hoang Van, Vikas Yadav, Mihai Surdeanu

Abstract: We propose a simple and effective strategy for data augmentation for low-resource machine reading comprehension (MRC). Our approach first pretrains the answer extraction components of a MRC system on the augmented data that contains approximate context of the correct answers, before training it on the exact answer spans. The approximate context helps the QA method components in narrowing the location of the answers. We demonstrate that our simple strategy substantially improves both document retrieval and answer extraction performance by providing larger context of the answers and additional training data. In particular, our method significantly improves the performance of BERT based retriever (15.12\%), and answer extractor (4.33\% F1) on TechQA, a complex, low-resource MRC task. Further, our data augmentation strategy yields significant improvements of up to 3.9\% exact match (EM) and 2.7\% F1 for answer extraction on PolicyQA, another practical but moderate sized QA dataset that also contains long answer spans.

Submitted 8 June, 2021; originally announced June 2021.

Comments: 5 pages, 1 figure, SIGIR 2021

arXiv:2105.01133 [pdf, other]

Prediction of clinical tremor severity using Rank Consistent Ordinal Regression

Authors: Li Zhang, Vijay Yadav, Vidya Koesmahargyo, Anzar Abbas, Isaac Galatzer-Levy

Abstract: Tremor is a key diagnostic feature of Parkinson's Disease (PD), Essential Tremor (ET), and other central nervous system (CNS) disorders. Clinicians or trained raters assess tremor severity with TETRAS scores by observing patients. Lacking quantitative measures, inter- or intra-observer variabilities are almost inevitable as the distinction between adjacent tremor scores is subtle. Moreover, clinician assessments also require patient visits, which limits the frequency of disease progress evaluation. Therefore it is beneficial to develop an automated assessment that can be performed remotely and repeatably at patients' convenience for continuous monitoring. In this work, we proposed to train a deep neural network (DNN) with rank-consistent ordinal regression using 276 clinical videos from 36 essential tremor patients. The videos are coupled with clinician assessed TETRAS scores, which are used as ground truth labels to train the DNN. To tackle the challenge of limited training data, optical flows are used to eliminate irrelevant background and static objects from RGB frames. In addition to optical flows, transfer learning is also applied to leverage pre-trained network weights from a related task of tremor frequency estimate. The approach was evaluated by splitting the clinical videos into training (67%) and testing sets (33%). The mean absolute error on TETRAS score of the testing results is 0.45, indicating that most of the errors were from the mismatch of adjacent labels, which is expected and acceptable.
The model predictions also agree well with clinical ratings. This model is further applied to smart phone videos collected from a PD patient who has an implanted device to turn "On" or "Off" tremor. The model outputs were consistent with the patient tremor states. The results demonstrate that our trained model can be used as a means to assess and track tremor severity.

Submitted 3 May, 2021; originally announced May 2021.

arXiv:2104.07800 [pdf, other]

Towards Robust Neural Retrieval Models with Synthetic Pre-Training

Authors: Revanth Gangi Reddy, Vikas Yadav, Md Arafat Sultan, Martin Franz, Vittorio Castelli, Heng Ji, Avirup Sil

Abstract: Recent work has shown that commonly available machine reading comprehension (MRC) datasets can be used to train high-performance neural information retrieval (IR) systems. However, the evaluation of neural IR has so far been limited to standard supervised learning settings, where they have outperformed traditional term matching baselines. We conduct in-domain and out-of-domain evaluations of neural IR, and seek to improve its robustness across different scenarios, including zero-shot settings. We show that synthetic training examples generated using a sequence-to-sequence generator can be effective towards this goal: in our experiments, pre-training with synthetic examples improves retrieval performance in both in-domain and out-of-domain evaluation on five different test sets.

Submitted 15 April, 2021; originally announced April 2021.

arXiv:2011.13265 [pdf]

CYPUR-NN: Crop Yield Prediction Using Regression and Neural Networks

Authors: Sandesh Ramesh, Anirudh Hebbar, Varun Yadav, Thulasiram Gunta, A Balachandra

Abstract: Our recent study uses historic data of paddy yield and associated conditions, including humidity, luminescence, and temperature. By incorporating regression models and neural networks (NN), one can produce highly satisfactory forecasting of paddy yield. Simulations indicate that our model can predict paddy yield with high accuracy while concurrently detecting diseases that may exist and are invisible to the human eye. Crop Yield Prediction Using Regression and Neural Networks (CYPUR-NN) is developed here as a system that will facilitate agriculturists and farmers to predict yield from a picture or by entering values via a web interface. CYPUR-NN has been tested on stock images and the experimental results are promising.

Submitted 26 November, 2020; originally announced November 2020.

Comments: Advances in Intelligent Systems and Computing

arXiv:2011.04660 [pdf, ps, other]

doi 10.1007/JHEP08(2021)151

(Phenomenology/Lattice-Compatible) $SU(3)$ M$χ$PT HD up to ${\cal O}(p^4)$ and the ${\cal O}\left(R^4\right)$-Large-$N$ Connection

Authors: Vikas Yadav, Gopal Yadav, Aalok Misra

Abstract: Obtaining the values of the coupling constants of the low energy effective theory corresponding to QCD, compatible with experimental data, even in the (vector) mesonic sector from (the ${\cal M}$-theory uplift of) a UV-complete string theory dual, has thus far been missing in the literature. We take the first step in this direction by obtaining the values of the coupling constants of the ${\cal O}(p^4)$ $χ$PT Lagrangian in the chiral limit involving the NGBs and $ρ$ meson (and its flavor partners) from the ${\cal M}$-theory /type IIA dual of large-$N$ thermal QCD, inclusive of the ${\cal O}(R^4)$ corrections. We observe that ensuring compatibility with phenomenological/lattice results (the values ) as given in arXiv:1510.01634[hep-ph], requires a relationship relating the ${\cal O}(R^4)$ corrections and large-$N$ suppression. In other words, QCD demands that the higher derivative corrections and the large-$N$ suppressed corrections in its M/string theory dual are related. As a bonus, we explicitly show that the ${\cal O}(R^4)$ corrections in the UV to the ${\cal M}$-theory uplift of the type IIB dual of large-$N$ thermal QCD, can be consistently set to be vanishingly small.

Submitted 26 July, 2021; v1 submitted 9 November, 2020; originally announced November 2020.

Comments: v2: 1+51 pages, LaTeX; Added refs and explanatory text- an app summarizing the HLS formalism (based on Harada and Yamawaki's Phys Rep review), an app on O(R^4) corrections in the D6-brane DBI action, a small discussion on holographic renormalization and a summary of results and a table (fitting the model-related parameters to phenomenological values of O(p^4) ChPT LECs); to appear in JHEP

arXiv:2005.01218 [pdf, other]

Unsupervised Alignment-based Iterative Evidence Retrieval for Multi-hop Question Answering

Authors: Vikas Yadav, Steven Bethard, Mihai Surdeanu

Abstract: Evidence retrieval is a critical stage of question answering (QA), necessary not only to improve performance, but also to explain the decisions of the corresponding QA method. We introduce a simple, fast, and unsupervised iterative evidence retrieval method, which relies on three ideas: (a) an unsupervised alignment approach to soft-align questions and answers with justification sentences using only GloVe embeddings, (b) an iterative process that reformulates queries focusing on terms that are not covered by existing justifications, and (c) a stopping criterion that terminates retrieval when the terms in the given question and candidate answers are covered by the retrieved justifications. Despite its simplicity, our approach outperforms all the previous methods (including supervised methods) on the evidence selection task on two datasets: MultiRC and QASC. When these evidence sentences are fed into a RoBERTa answer classification component, we achieve state-of-the-art QA performance on these two datasets.

Submitted 3 May, 2020; originally announced May 2020.

Comments: Accepted at ACL 2020 as a long conference paper

arXiv:2004.07259 [pdf, ps, other]

doi 10.4310/ATMP.2022.v26.n10.a11

On ${\cal M}$-Theory Dual of Large-$N$ Thermal QCD-Like Theories up to ${\cal O}(R^4)$ and $G$-Structure Classification of Underlying Non-Supersymmetric Geometries

Authors: Vikas Yadav, Aalok Misra

Abstract: Construction of a top-down holographic dual of thermal QCD-like theories (equivalence class of theories which are UV-conformal, IR-confining and have fundamental quarks) {\it at intermediate 't Hooft coupling} and the $G$-structure (torsion classes) classification of the underlying geometries (in the Infra Red (IR)/non-conformal sector in particular) of the {\it non-supersymmetric} string/${\cal M}$-theory duals, have been missing in the literature. We take the first important steps in this direction by studying the ${\cal M}$ theory dual of large-$N$ thermal QCD-like theories at intermediate gauge and 't Hooft couplings and obtaining the ${\cal O}(l_p^6)$ corrections arising from the ${\cal O}(R^4)$ terms to the "MQGP" background (${\cal M}$-theory dual of large-$N$ thermal QCD-like theories at intermediate gauge/string coupling, but large 't Hooft coupling) of \cite{MQGP}. The main Physics lesson learnt is that there is a competition between non-conformal IR enhancement and Planckian and large-$N$ suppression and going to orders beyond the ${\cal O}(l_p^6)$ is necessitated if the IR enhancement wins out.
The main lesson learnt in Math is in the context of the differential geometry ($G$-structure classification) of the internal manifolds relevant to the string/${\cal M}$-theory duals of large-$N$ thermal QCD-like theories, wherein we obtain for the first time inclusive of the ${\cal O}(R^4)$ corrections in the Infra-Red (IR), the $SU(3)$-structure torsion classes of the type IIA mirror of \cite{metrics} (making contact en route with Siegel theta functions related to appropriate hyperelliptic curves, as well as the Kiepert's algorithm of solving quintics), and the $G_2/SU(4)/Spin(7)$-structure torsion classes of the seven- and eight-folds associated with its ${\cal M}$ theory uplift.

Submitted 11 April, 2024; v1 submitted 15 April, 2020; originally announced April 2020.

Comments: v5: 1+84 pages, LaTeX, Adv.Theor.Math.Phys. 26 (2022) 10, 3801 - 3894, the paper dedicated by AM to the memory of his father

Journal ref: Adv.Theor.Math.Phys. 26 (2022) 10, 3801 - 3894

arXiv:2001.03775 [pdf, other]

Interacting Dark Sectors in Anisotropic Universe: Observational Constraints and $H_{0}$ Tension

Authors: Hassan Amirhashchi, Anil Kumar Yadav, Nafis Ahmad, Vikrant Yadav

Abstract: The present study reveals observational constraints on the coupling between dark components of anisotropic Bianchi type I universe. We assume interaction between dark matter and dark energy and split the continuity equation with inclusion of interaction term $Γ$. Two scenarios have been considered (i) when coupling between dark components is constant and (ii) when it is a function of redshift ($z$). Metropolis-Hastings algorithm has been used to perform Markov Chain Monte Carlo (MCMC) analysis by using observational Hubble data obtained from cosmic chronometric (CC) technique, cosmic microwave background (CMB) baryon acoustic oscillation (BAO), Pantheon compilation of Supernovae type Ia (SNIa), their joint combination and a Gaussian prior on the Hubble parameter $H_{0}$. It is obtained that the combination of all databases plus $H_{0}$ prior marginalized over a present dark energy density gives stringent constraints on the current value of coupling as $-0.001<δ<0.041$ in constant coupling model and $-0.042<δ<0.053$ in varying coupling model at 68\% confidence level. In general, for both models, we found $ω^{X}\approx -1$ and $δ(δ_{0})\approx 0$ which indicate that still recent data favor uncoupled $Λ$CDM model. Our estimations show that in constant coupling model $(H_{0}=73.9^{+1.5}_{-0.95}, δ=0.023^{+0.017}_{-0.024})$ which naturally leads to consistent value of the Hubble constant. This result is interesting because the previous works show that such a high value of Hubble constant requires the significant value of coupling parameter $δ$.
It has been also observed that in the constant coupling model, we do not find any disagreement between the estimated $H_{0}$ and those reported by Hubble space telescope (HST) and large scale structure (LSS) experiments.

Submitted 19 January, 2023; v1 submitted 11 January, 2020; originally announced January 2020.

Comments: 11 pages, 21 figures

Journal ref: Physics of the Dark Universe, 36 (2022) 101043

arXiv:1911.07176 [pdf, other]

doi 10.18653/v1/D19-1260

Quick and (not so) Dirty: Unsupervised Selection of Justification Sentences for Multi-hop Question Answering

Authors: Vikas Yadav, Steven Bethard, Mihai Surdeanu

Abstract: We propose an unsupervised strategy for the selection of justification sentences for multi-hop question answering (QA) that (a) maximizes the relevance of the selected sentences, (b) minimizes the overlap between the selected facts, and (c) maximizes the coverage of both question and answer. This unsupervised sentence selection method can be coupled with any supervised QA approach. We show that the sentences selected by our method improve the performance of a state-of-the-art supervised QA model on two multi-hop QA datasets: AI2's Reasoning Challenge (ARC) and Multi-Sentence Reading Comprehension (MultiRC). We obtain new state-of-the-art performance on both datasets among approaches that do not use external resources for training the QA system: 56.82% F1 on ARC (41.24% on Challenge and 64.49% on Easy) and 26.1% EM0 on MultiRC. Our justification sentences have higher quality than the justifications selected by a strong information retrieval baseline, e.g., by 5.4% F1 in MultiRC. We also show that our unsupervised selection of justification sentences is more stable across domains than a state-of-the-art supervised sentence selection method.

Submitted 2 May, 2020; v1 submitted 17 November, 2019; originally announced November 2019.

Comments: Published at EMNLP-IJCNLP 2019 as long conference paper. Corrected the name reference for Speer et.al, 2017

Journal ref: EMNLP-IJCNLP, 2578--2589 (2019)

arXiv:1911.02387 [pdf, other]

SCOUT: Signal Correction and Uncertainty Quantification Toolbox in MATLAB

Authors: Richard Semaan, Vikas Yadav

Abstract: This manuscript describes the software package SCOUT, which analyzes, characterizes, and corrects one-dimensional signals. Specifically, it allows to check and correct for stationarity, detect spurious samples, check for normality, check for periodicity, filter, perform spectral analysis, determine the integral time scale, and perform uncertainty analysis on individual and on propagated signals through a data reduction equation. The novelty of SCOUT lies in combining these various methods into one compact and easy-to-use toolbox, which enables students and professionals alike to analyze, characterize, and correct for signals without expert knowledge. The program is oriented towards time traces, but an easy adaptation to spatial distributions can be performed by the user. SCOUT is available in two variants: a graphical user interface (GUI) and a script-based version. A key motivation of having two variants is to offer maximum flexibility to adaptively and visually adjust the analysis settings using the GUI version and to enable large batch processing capabilities and own code-integration using the script-based version. The package includes both variants as well as three example scripts with their corresponding signals.

Submitted 6 November, 2019; originally announced November 2019.

arXiv:1910.11470 [pdf, ps, other]

A Survey on Recent Advances in Named Entity Recognition from Deep Learning models

Authors: Vikas Yadav, Steven Bethard

Abstract: Named Entity Recognition (NER) is a key component in NLP systems for question answering, information retrieval, relation extraction, etc. NER systems have been studied and developed widely for decades, but accurate systems using deep neural networks (NN) have only been introduced in the last few years. We present a comprehensive survey of deep neural network architectures for NER, and contrast them with previous approaches to NER based on feature engineering and other supervised or semi-supervised learning algorithms. Our results highlight the improvements achieved by neural networks, and show how incorporating some of the lessons learned from past work on feature-based NER systems can yield further improvements.

Submitted 24 October, 2019; originally announced October 2019.

Comments: Published at COLING 2018

Report number: C18-1182

arXiv:1908.09596 [pdf, ps, other]

doi 10.56827/SEAJMMS.2023.1902.3

On a new parameter involving Ramanujan's theta-functions

Authors: S. Chandankumar, H. S. Sumanth Bharadwaj, Vijay Yadav

Abstract: We define a new parameter $A'_{k,n}$ involving Ramanujan's theta-functions for any positive real numbers $k$ and $n$ which is analogous to the parameter $A_{k,n}$ defined by Nipen Saikia \cite{NS1}. We establish some modular relation involving $A'_{k,n}$ and $A_{k,n}$ to find some explicit values of $A'_{k,n}$. We use these parameters to establish few general theorems for explicit evaluations of ratios of theta functions involving $\varphi(q)$.

Submitted 29 November, 2023; v1 submitted 26 August, 2019; originally announced August 2019.

MSC Class: 05A17; 11P83

Journal ref: South East Asian J. of Mathematics and Mathematical Sciences, 19 (2), 2023, 35--52

arXiv:1908.05441 [pdf, other]

Multi-class Hierarchical Question Classification for Multiple Choice Science Exams

Authors: Dongfang Xu, Peter Jansen, Jaycie Martin, Zhengnan Xie, Vikas Yadav, Harish Tayyar Madabushi, Oyvind Tafjord, Peter Clark

Abstract: Prior work has demonstrated that question classification (QC), recognizing the problem domain of a question, can help answer it more accurately. However, developing strong QC algorithms has been hindered by the limited size and complexity of annotated data available. To address this, we present the largest challenge dataset for QC, containing 7,787 science exam questions paired with detailed classification labels from a fine-grained hierarchical taxonomy of 406 problem domains. We then show that a BERT-based model trained on this dataset achieves a large (+0.12 MAP) gain compared with previous methods, while also achieving state-of-the-art performance on benchmark open-domain and biomedical QC datasets. Finally, we show that using this model's predictions of question topic significantly improves the accuracy of a question answering system by +1.7% P@1, with substantial future gains possible as QC performance improves.

Submitted 15 August, 2019; originally announced August 2019.

arXiv:1808.01182 [pdf, ps, other]

doi 10.1007/JHEP09(2018)133

M-Theory Exotic Scalar Glueball Decays to Mesons at Finite Coupling

Authors: Vikas Yadav, Aalok Misra

Abstract: Using the pull-back of the perturbed type IIA metric corresponding to the perturbation of arXiv:hep-th/1306.4339's M-theory uplift of arXiv:hep-th/0902.1540's UV-complete top-down type IIB holographic dual of large-$N$ thermal QCD, at finite coupling, we obtain the interaction Lagrangian corresponding to exotic scalar glueball($G_E$)-$ρ/π$-meson interaction, linear in the exotic scalar glueball and up to quartic order in the $π$ mesons. In the Lagrangian, the coupling constants are determined as (radial integrals of) arXiv:hep-th/1306.4339's M-theory uplift's metric components and six radial functions appearing in the M-theory metric perturbations. Assuming $M_G>2M_ρ$, we then compute $ρ\rightarrow2π, G_E\rightarrow2π, 2ρ, ρ+2π$ decay widths as well as the direct and indirect (mediated via $ρ$ mesons) $G_E\rightarrow4π$ decays. For numerics, we choose $f0[1710]$ and compare with previous calculations. We emphasize that our results can be made to match PDG data (and improvements thereof) exactly by appropriate tuning of some constants of integration appearing in the solution of the M-theory metric perturbations and the $ρ$ and $π$ meson radial profile functions, a flexibility that our calculations permit.

Submitted 5 September, 2018; v1 submitted 2 August, 2018; originally announced August 2018.

Comments: v3:49 pages, LaTeX, 2 Figures, accepted journal (JHEP) version

arXiv:1807.01836 [pdf, other]

Sanity Check: A Strong Alignment and Information Retrieval Baseline for Question Answering

Authors: Vikas Yadav, Rebecca Sharp, Mihai Surdeanu

Abstract: While increasingly complex approaches to question answering (QA) have been proposed, the true gain of these systems, particularly with respect to their expensive training requirements, can be inflated when they are not compared to adequate baselines. Here we propose an unsupervised, simple, and fast alignment and information retrieval baseline that incorporates two novel contributions: a \textit{one-to-many alignment} between query and document terms and \textit{negative alignment} as a proxy for discriminative information. Our approach not only outperforms all conventional baselines as well as many supervised recurrent neural networks, but also approaches the state of the art for supervised systems on three QA datasets. With only three hyperparameters, we achieve 47\% P@1 on an 8th grade Science QA dataset, 32.9\% P@1 on a Yahoo! answers QA dataset and 64\% MAP on WikiQA. We also achieve 26.56\% and 58.36\% on ARC challenge and easy dataset respectively. In addition to including the additional ARC results in this version of the paper, for the ARC easy set only we also experimented with one additional parameter -- number of justifications retrieved.

Submitted 4 July, 2018; originally announced July 2018.

Comments: SIGIR 2018

Showing 1–50 of 78 results for author: Yadav, V