
Showing 1–26 of 26 results for author: Yadav, V

Searching in archive cs.
  1. arXiv:2406.17990 [pdf, other]

    cs.CL cs.AI cs.LG

    Explicit Diversity Conditions for Effective Question Answer Generation with Large Language Models

    Authors: Vikas Yadav, Hyuk Joon Kwon, Vijay Srinivasan, Hongxia **

    Abstract: Question Answer Generation (QAG) is an effective data augmentation technique to improve the accuracy of question answering systems, especially in low-resource domains. While recent pretrained and large language model-based QAG methods have made substantial progress, they face the critical issue of redundant QA pair generation, affecting downstream QA systems. Implicit diversity techniques such as…

    Submitted 25 June, 2024; originally announced June 2024.

    Comments: Published at COLING 2024

  2. arXiv:2406.17415 [pdf, other]

    cs.CL cs.AI cs.LG

    Layer-Wise Quantization: A Pragmatic and Effective Method for Quantizing LLMs Beyond Integer Bit-Levels

    Authors: Razvan-Gabriel Dumitru, Vikas Yadav, Rishabh Maheshwary, Paul-Ioan Clotan, Sathwik Tejaswi Madhusudhan, Mihai Surdeanu

    Abstract: We present a simple variable quantization approach that quantizes different layers of a large language model (LLM) at different bit levels. Specifically, we quantize the most important layers to higher bit precision and less important layers to lower bits to achieve floating point quantization levels. We propose two effective strategies to measure the importance of layers within LLMs: the first me…

    Submitted 26 June, 2024; v1 submitted 25 June, 2024; originally announced June 2024.

    Comments: submitted to EMNLP, 15 pages, 10 figures, 4 tables

    ACM Class: I.2.7; I.2.0
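The abstract's core idea, reaching a non-integer average bit width by mixing per-layer precisions, can be illustrated with a small sketch. It assumes only two precisions (8 and 4 bits), layers of equal parameter count, and an importance score per layer supplied from elsewhere; the paper's two importance measures are not reproduced, and `assign_bits` is a hypothetical name.

```python
def assign_bits(importance, target_avg, high=8, low=4):
    """Hypothetical sketch: give the most important layers `high` bits and the
    rest `low` bits so the mean bit width is as close as possible to
    `target_avg`. Assumes every layer has the same number of parameters."""
    if not low <= target_avg <= high:
        raise ValueError("target_avg must lie between low and high")
    n = len(importance)
    # Solve (k * high + (n - k) * low) / n = target_avg for the layer count k.
    k = round(n * (target_avg - low) / (high - low))
    # Rank layer indices from most to least important.
    order = sorted(range(n), key=lambda i: importance[i], reverse=True)
    bits = [low] * n
    for i in order[:k]:
        bits[i] = high
    return bits


if __name__ == "__main__":
    scores = [0.9, 0.1, 0.5, 0.3]
    bits = assign_bits(scores, target_avg=6.0)
    print(bits, sum(bits) / len(bits))  # [8, 4, 8, 4] 6.0
```

With unequal layer sizes, the average would need to be weighted by parameter count rather than by layer count.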

  3. arXiv:2406.17163 [pdf, other]

    cs.CL cs.AI cs.LG

    Paraphrase and Aggregate with Large Language Models for Minimizing Intent Classification Errors

    Authors: Vikas Yadav, Zheng Tang, Vijay Srinivasan

    Abstract: Large language models (LLM) have achieved remarkable success in natural language generation but lesser focus has been given to their applicability in decision making tasks such as classification. We show that LLMs like LLaMa can achieve high performance on large multi-class classification tasks but still make classification errors and worse, generate out-of-vocabulary class labels. To address thes…

    Submitted 24 June, 2024; originally announced June 2024.

    Comments: Accepted at SIGIR 2024
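The "aggregate" half of the title can be sketched as a majority vote over the labels an LLM predicts for several paraphrases of the same query. The truncated abstract does not say how the paper aggregates, so majority voting is an assumption here, as is dropping out-of-vocabulary labels (the failure mode the abstract mentions) before voting; `aggregate_labels` and the intent names are hypothetical.

```python
from collections import Counter


def aggregate_labels(predictions, valid_labels):
    """Majority vote over the labels predicted for paraphrases of one query.

    Predictions not in `valid_labels` (out-of-vocabulary generations) are
    discarded. Returns None when no valid prediction remains. Ties go to the
    label encountered first, since Counter.most_common keeps that order."""
    votes = Counter(p for p in predictions if p in valid_labels)
    if not votes:
        return None
    return votes.most_common(1)[0][0]


labels = {"play_music", "set_alarm", "get_weather"}
preds = ["play_music", "set_alarm", "play_music", "playmusic"]  # last one is OOV
print(aggregate_labels(preds, labels))  # play_music
```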

  4. arXiv:2406.16783 [pdf, other]

    cs.CL cs.AI cs.LG

    M2Lingual: Enhancing Multilingual, Multi-Turn Instruction Alignment in Large Language Models

    Authors: Rishabh Maheshwary, Vikas Yadav, Hoang Nguyen, Khyati Mahajan, Sathwik Tejaswi Madhusudhan

    Abstract: Instruction finetuning (IFT) is critical for aligning Large Language Models (LLMs) to follow instructions. While many effective IFT datasets have been introduced recently, they predominantly focus on high-resource languages like English. To better align LLMs across a broad spectrum of languages and tasks, we propose a fully synthetic, novel taxonomy (Evol) guided Multilingual, Multi-turn instructi…

    Submitted 28 June, 2024; v1 submitted 24 June, 2024; originally announced June 2024.

    Comments: 39 pages

  5. arXiv:2406.04927 [pdf, other]

    eess.AS cs.CL

    LLM-based speaker diarization correction: A generalizable approach

    Authors: Georgios Efstathiadis, Vijay Yadav, Anzar Abbas

    Abstract: Speaker diarization is necessary for interpreting conversations transcribed using automated speech recognition (ASR) tools. Despite significant developments in diarization methods, diarization accuracy remains an issue. Here, we investigate the use of large language models (LLMs) for diarization correction as a post-processing step. LLMs were fine-tuned using the Fisher corpus, a large dataset of…

    Submitted 7 June, 2024; originally announced June 2024.

    Comments: This work has been submitted to the IEEE for possible publication. Copyright may be transferred without notice, after which this version may no longer be accessible

  6. arXiv:2404.15578 [pdf]

    cs.CL

    Can Foundational Large Language Models Assist with Conducting Pharmaceuticals Manufacturing Investigations?

    Authors: Hossein Salami, Brandye Smith-Goettler, Vijay Yadav

    Abstract: General purpose Large Language Models (LLM) such as the Generative Pretrained Transformer (GPT) and Large Language Model Meta AI (LLaMA) have attracted much attention in recent years. There is strong evidence that these models can perform remarkably well in various natural language processing tasks. However, how to leverage them to approach domain-specific use cases and drive value remains an open…

    Submitted 23 April, 2024; originally announced April 2024.

    Comments: 13 pages, 3 figures

  7. arXiv:2403.07230 [pdf, other]

    cs.CL cs.AI cs.LG

    Curry-DPO: Enhancing Alignment using Curriculum Learning & Ranked Preferences

    Authors: Pulkit Pattnaik, Rishabh Maheshwary, Kelechi Ogueji, Vikas Yadav, Sathwik Tejaswi Madhusudhan

    Abstract: Direct Preference Optimization (DPO) is an effective technique that leverages pairwise preference data (usually one chosen and rejected response pair per user prompt) to align LLMs to human preferences. In practice, multiple responses can exist for a given prompt with varying quality relative to each other. With availability of such quality ratings for multiple responses, we propose utilizing thes…

    Submitted 11 March, 2024; originally announced March 2024.

    Comments: Work in progress
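For reference, the standard DPO loss for one pair is -log σ(β[(log π_θ(y_w|x) - log π_ref(y_w|x)) - (log π_θ(y_l|x) - log π_ref(y_l|x))]), where y_w is the chosen and y_l the rejected response. The sketch below computes that loss from sequence log-probabilities and, as an assumption about how ranked responses could feed a curriculum (the abstract is cut off before the method), builds every pair with distinct ratings and orders the pairs from the largest rating gap to the smallest. `dpo_loss`, `curriculum_pairs`, and β = 0.1 are illustrative choices, not taken from the paper.

```python
import math
from itertools import combinations


def dpo_loss(logp_chosen, logp_rejected, ref_chosen, ref_rejected, beta=0.1):
    """Standard DPO loss for one (chosen, rejected) pair, from sequence
    log-probabilities under the policy and the frozen reference model."""
    margin = beta * ((logp_chosen - ref_chosen) - (logp_rejected - ref_rejected))
    # -log(sigmoid(margin)), split by sign to avoid overflow in exp().
    if margin >= 0:
        return math.log1p(math.exp(-margin))
    return -margin + math.log1p(math.exp(margin))


def curriculum_pairs(responses):
    """responses: list of (text, rating). Returns (chosen, rejected) pairs,
    skipping ties, ordered from largest to smallest rating gap (assumed here
    to run from easiest to hardest)."""
    pairs = []
    for (a, ra), (b, rb) in combinations(responses, 2):
        if ra == rb:
            continue
        chosen, rejected = (a, b) if ra > rb else (b, a)
        pairs.append((abs(ra - rb), chosen, rejected))
    pairs.sort(key=lambda p: p[0], reverse=True)
    return [(c, r) for _, c, r in pairs]


print(curriculum_pairs([("A", 4), ("B", 3), ("C", 1)]))
# [('A', 'C'), ('B', 'C'), ('A', 'B')]
```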

  8. arXiv:2403.05530 [pdf, other]

    cs.CL cs.AI

    Gemini 1.5: Unlocking multimodal understanding across millions of tokens of context

    Authors: Gemini Team, Petko Georgiev, Ving Ian Lei, Ryan Burnell, Libin Bai, Anmol Gulati, Garrett Tanzer, Damien Vincent, Zhufeng Pan, Shibo Wang, Soroosh Mariooryad, Yifan Ding, Xinyang Geng, Fred Alcober, Roy Frostig, Mark Omernick, Lexi Walker, Cosmin Paduraru, Christina Sorokin, Andrea Tacchetti, Colin Gaffney, Samira Daruki, Olcan Sercinoglu, Zach Gleicher, Juliette Love, et al. (1092 additional authors not shown)

    Abstract: In this report, we introduce the Gemini 1.5 family of models, representing the next generation of highly compute-efficient multimodal models capable of recalling and reasoning over fine-grained information from millions of tokens of context, including multiple long documents and hours of video and audio. The family includes two new models: (1) an updated Gemini 1.5 Pro, which exceeds the February…

    Submitted 14 June, 2024; v1 submitted 8 March, 2024; originally announced March 2024.

  9. arXiv:2402.07301 [pdf, other]

    cs.CV

    LISR: Learning Linear 3D Implicit Surface Representation Using Compactly Supported Radial Basis Functions

    Authors: Atharva Pandey, Vishal Yadav, Rajendra Nagar, Santanu Chaudhury

    Abstract: Implicit 3D surface reconstruction of an object from its partial and noisy 3D point cloud scan is the classical geometry processing and 3D computer vision problem. In the literature, various 3D shape representations have been developed, differing in memory efficiency and shape retrieval effectiveness, such as volumetric, parametric, and implicit surfaces. Radial basis functions provide memory-effi…

    Submitted 11 February, 2024; originally announced February 2024.

    Journal ref: AAAI 2024
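An implicit surface built from compactly supported radial basis functions amounts to evaluating f(x) = Σ_i w_i φ(‖x - c_i‖ / r), whose zero level set is the surface. The sketch below assumes Wendland's C² function φ(t) = (1 - t)⁴(4t + 1) for t < 1 and 0 otherwise as the kernel; that kernel, the support radius `radius`, and the hand-picked centres and weights are illustrative assumptions, and fitting the weights to a point cloud (the paper's actual subject) is not shown.

```python
import math


def wendland_c2(t):
    """Wendland's C2 compactly supported RBF: (1 - t)^4 (4t + 1) for 0 <= t < 1, else 0."""
    return (1.0 - t) ** 4 * (4.0 * t + 1.0) if t < 1.0 else 0.0


def implicit_value(x, centers, weights, radius):
    """f(x) = sum_i w_i * phi(|x - c_i| / radius). Centres farther than
    `radius` from x contribute nothing, which keeps evaluation sparse."""
    total = 0.0
    for c, w in zip(centers, weights):
        total += w * wendland_c2(math.dist(x, c) / radius)
    return total


centers = [(0.0, 0.0, 0.0), (1.0, 0.0, 0.0)]
weights = [1.0, -0.5]
print(implicit_value((0.0, 0.0, 0.0), centers, weights, radius=0.5))  # 1.0
```

Compact support is what makes the fitting system sparse: each centre only interacts with points inside its radius.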

  10. arXiv:2312.11805 [pdf, other]

    cs.CL cs.AI cs.CV

    Gemini: A Family of Highly Capable Multimodal Models

    Authors: Gemini Team, Rohan Anil, Sebastian Borgeaud, Jean-Baptiste Alayrac, Jiahui Yu, Radu Soricut, Johan Schalkwyk, Andrew M. Dai, Anja Hauth, Katie Millican, David Silver, Melvin Johnson, Ioannis Antonoglou, Julian Schrittwieser, Amelia Glaese, Jilin Chen, Emily Pitler, Timothy Lillicrap, Angeliki Lazaridou, Orhan Firat, James Molloy, Michael Isard, Paul R. Barham, Tom Hennigan, Benjamin Lee, et al. (1325 additional authors not shown)

    Abstract: This report introduces a new family of multimodal models, Gemini, that exhibit remarkable capabilities across image, audio, video, and text understanding. The Gemini family consists of Ultra, Pro, and Nano sizes, suitable for applications ranging from complex reasoning tasks to on-device memory-constrained use-cases. Evaluation on a broad range of benchmarks shows that our most-capable Gemini Ultr…

    Submitted 17 June, 2024; v1 submitted 18 December, 2023; originally announced December 2023.

  11. arXiv:2307.16888 [pdf, other]

    cs.CL cs.CR cs.LG

    Backdooring Instruction-Tuned Large Language Models with Virtual Prompt Injection

    Authors: Jun Yan, Vikas Yadav, Shiyang Li, Lichang Chen, Zheng Tang, Hai Wang, Vijay Srinivasan, Xiang Ren, Hongxia **

    Abstract: Instruction-tuned Large Language Models (LLMs) have become a ubiquitous platform for open-ended applications due to their ability to modulate responses based on human instructions. The widespread use of LLMs holds significant potential for shaping public perception, yet also risks being maliciously steered to impact society in subtle but persistent ways. In this paper, we formalize such a steering…

    Submitted 3 April, 2024; v1 submitted 31 July, 2023; originally announced July 2023.

    Comments: Accepted to NAACL 2024. Project page: https://poison-llm.github.io

  12. arXiv:2307.14374 [pdf, other]

    cs.LG

    Forecasting, capturing and activation of carbon-dioxide (CO₂): Integration of Time Series Analysis, Machine Learning, and Material Design

    Authors: Suchetana Sadhukhan, Vivek Kumar Yadav

    Abstract: This study provides a comprehensive time series analysis of daily industry-specific, country-wise CO₂ emissions from January 2019 to February 2023. The research focuses on the Power, Industry, Ground Transport, Domestic Aviation, and International Aviation sectors in European countries (EU27 & UK, Italy, Germany, Spain) and India, utilizing near-real-time activity data from the Carbon Monitor r…

    Submitted 25 July, 2023; originally announced July 2023.

    Comments: 38 pages, 16 figures

  13. arXiv:2307.10991 [pdf]

    cs.AI q-bio.NC stat.ML

    Dense Sample Deep Learning

    Authors: Stephen Josè Hanson, Vivek Yadav, Catherine Hanson

    Abstract: Deep Learning (DL), a variant of the neural network algorithms originally proposed in the 1980s, has made surprising progress in Artificial Intelligence (AI), ranging from language translation, protein folding, autonomous cars, and more recently human-like language models (CHATbots), all that seemed intractable until very recently. Despite the growing use of Deep Learning (DL) networks, little is…

    Submitted 21 July, 2023; v1 submitted 20 July, 2023; originally announced July 2023.

  14. arXiv:2307.08701 [pdf, other]

    cs.CL

    AlpaGasus: Training A Better Alpaca with Fewer Data

    Authors: Lichang Chen, Shiyang Li, Jun Yan, Hai Wang, Kalpa Gunaratna, Vikas Yadav, Zheng Tang, Vijay Srinivasan, Tianyi Zhou, Heng Huang, Hongxia **

    Abstract: Large language models (LLMs) strengthen instruction-following capability through instruction-finetuning (IFT) on supervised instruction/response data. However, widely used IFT datasets (e.g., Alpaca's 52k data) surprisingly contain many low-quality instances with incorrect or irrelevant responses, which are misleading and detrimental to IFT. In this paper, we propose a simple and effective data se…

    Submitted 13 February, 2024; v1 submitted 17 July, 2023; originally announced July 2023.

    Comments: 32 Pages; 29 Figures; 15 Tables

  15. arXiv:2202.00912 [pdf]

    cs.NE

    Flipping the switch on local exploration: Genetic Algorithms with Reversals

    Authors: Ankit Grover, Vaishali Yadav, Bradly Alicea

    Abstract: One important feature of complex systems are problem domains that have many local minima and substructure. Biological systems manage these local minima by switching between different subsystems depending on their environmental or developmental context. Genetic Algorithms (GA) can mimic this switching property as well as provide a means to overcome problem domain complexity. However, standard GA re…

    Submitted 24 August, 2022; v1 submitted 2 February, 2022; originally announced February 2022.

    Comments: 13 pages, 3 Figures, 4 Tables. In Proceedings of 3rd Congress on Intelligent Systems (CIS) conference, Bengaluru, India. Appendix I-IV can be found in version 1

  16. arXiv:2106.04134 [pdf, other]

    cs.CL cs.AI cs.LG

    Cheap and Good? Simple and Effective Data Augmentation for Low Resource Machine Reading

    Authors: Hoang Van, Vikas Yadav, Mihai Surdeanu

    Abstract: We propose a simple and effective strategy for data augmentation for low-resource machine reading comprehension (MRC). Our approach first pretrains the answer extraction components of a MRC system on the augmented data that contains approximate context of the correct answers, before training it on the exact answer spans. The approximate context helps the QA method components in narrowing the locat…

    Submitted 8 June, 2021; originally announced June 2021.

    Comments: 5 pages, 1 figure, SIGIR 2021
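The two-stage schedule in the abstract (train answer extraction first on approximate answer contexts, then on exact spans) can be sketched as a target-construction step. The truncated abstract does not say how the approximate context is built; widening the gold span by a fixed number of tokens on each side is an assumption, and `widen_span`, `two_stage_targets`, and `window` are hypothetical names.

```python
def widen_span(start, end, num_tokens, window=5):
    """Turn an exact answer span [start, end] (inclusive token indices) into an
    approximate one by adding `window` tokens on each side, clipped to the passage."""
    return max(0, start - window), min(num_tokens - 1, end + window)


def two_stage_targets(examples, window=5):
    """examples: list of (num_tokens, start, end). Stage 1 uses widened spans
    as coarse targets; stage 2 uses the original exact spans."""
    stage1 = [widen_span(s, e, n, window) for n, s, e in examples]
    stage2 = [(s, e) for _, s, e in examples]
    return stage1, stage2


stage1, stage2 = two_stage_targets([(100, 10, 12), (6, 2, 3)])
print(stage1)  # [(5, 17), (0, 5)]
print(stage2)  # [(10, 12), (2, 3)]
```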

  17. arXiv:2105.01133 [pdf, other]

    cs.CV

    Prediction of clinical tremor severity using Rank Consistent Ordinal Regression

    Authors: Li Zhang, Vijay Yadav, Vidya Koesmahargyo, Anzar Abbas, Isaac Galatzer-Levy

    Abstract: Tremor is a key diagnostic feature of Parkinson's Disease (PD), Essential Tremor (ET), and other central nervous system (CNS) disorders. Clinicians or trained raters assess tremor severity with TETRAS scores by observing patients. Lacking quantitative measures, inter- or intra- observer variabilities are almost inevitable as the distinction between adjacent tremor scores is subtle. Moreover, clini…

    Submitted 3 May, 2021; originally announced May 2021.
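A common rank-consistent formulation of ordinal regression recasts a K-level score, such as a TETRAS rating, as K - 1 binary questions "is the score greater than k?", computes them from one shared score with per-threshold biases, and decodes the prediction by counting the thresholds whose probability exceeds 0.5. Whether this paper uses exactly that formulation is an assumption, since the abstract is truncated before the method; the helper names below are hypothetical, and only label encoding and decoding are shown (no model training).

```python
def encode_ordinal(label, num_classes):
    """Label y in {0, ..., K-1} -> K-1 binary targets [y > 0, y > 1, ..., y > K-2]."""
    return [1 if label > k else 0 for k in range(num_classes - 1)]


def ordinal_logits(score, biases):
    """One shared scalar score minus per-threshold biases. With non-decreasing
    biases the logits are non-increasing, so the binary predictions can never
    contradict each other (rank consistency)."""
    return [score - b for b in biases]


def decode_ordinal(logits):
    """Predicted label = number of thresholds with sigmoid(logit) > 0.5,
    which holds exactly when the logit is positive."""
    return sum(1 for z in logits if z > 0.0)


print(encode_ordinal(2, 5))  # [1, 1, 0, 0]
print(decode_ordinal(ordinal_logits(1.2, [-1.0, 0.0, 1.0, 2.0])))  # 3
```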

  18. arXiv:2104.07800 [pdf, other]

    cs.CL cs.AI cs.IR

    Towards Robust Neural Retrieval Models with Synthetic Pre-Training

    Authors: Revanth Gangi Reddy, Vikas Yadav, Md Arafat Sultan, Martin Franz, Vittorio Castelli, Heng Ji, Avirup Sil

    Abstract: Recent work has shown that commonly available machine reading comprehension (MRC) datasets can be used to train high-performance neural information retrieval (IR) systems. However, the evaluation of neural IR has so far been limited to standard supervised learning settings, where they have outperformed traditional term matching baselines. We conduct in-domain and out-of-domain evaluations of neura…

    Submitted 15 April, 2021; originally announced April 2021.

  19. arXiv:2011.13265 [pdf]

    cs.CV cs.AI cs.LG

    CYPUR-NN: Crop Yield Prediction Using Regression and Neural Networks

    Authors: Sandesh Ramesh, Anirudh Hebbar, Varun Yadav, Thulasiram Gunta, A Balachandra

    Abstract: Our recent study using historic data of paddy yield and associated conditions include humidity, luminescence, and temperature. By incorporating regression models and neural networks (NN), one can produce highly satisfactory forecasting of paddy yield. Simulations indicate that our model can predict paddy yield with high accuracy while concurrently detecting diseases that may exist and are obliviou…

    Submitted 26 November, 2020; originally announced November 2020.

    Comments: Advances in Intelligent Systems and Computing

  20. arXiv:2005.01218 [pdf, other]

    cs.CL cs.IR

    Unsupervised Alignment-based Iterative Evidence Retrieval for Multi-hop Question Answering

    Authors: Vikas Yadav, Steven Bethard, Mihai Surdeanu

    Abstract: Evidence retrieval is a critical stage of question answering (QA), necessary not only to improve performance, but also to explain the decisions of the corresponding QA method. We introduce a simple, fast, and unsupervised iterative evidence retrieval method, which relies on three ideas: (a) an unsupervised alignment approach to soft-align questions and answers with justification sentences using on…

    Submitted 3 May, 2020; originally announced May 2020.

    Comments: Accepted at ACL 2020 as a long conference paper
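The soft-alignment idea named in the abstract can be illustrated by scoring a justification sentence against a query: each query term is matched to its most similar sentence term by cosine similarity of word embeddings, and those maxima are averaged. The toy two-dimensional vectors, the unweighted average, and the name `alignment_score` are assumptions for illustration; the abstract is cut off before the paper's exact scoring and its iterative retrieval loop.

```python
import math


def cosine(u, v):
    """Cosine similarity of two vectors; 0.0 if either has zero length."""
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / norm if norm else 0.0


def alignment_score(query_terms, sentence_terms, emb):
    """Average, over query terms, of the best cosine similarity to any sentence
    term. Terms without an embedding are skipped."""
    q = [t for t in query_terms if t in emb]
    s = [t for t in sentence_terms if t in emb]
    if not q or not s:
        return 0.0
    return sum(max(cosine(emb[a], emb[b]) for b in s) for a in q) / len(q)


emb = {  # hypothetical 2-d embeddings
    "plant": (1.0, 0.0), "tree": (0.9, 0.1), "sun": (0.0, 1.0), "light": (0.1, 0.9),
}
print(alignment_score(["plant", "sun"], ["tree", "light"], emb))
print(alignment_score(["plant", "sun"], ["tree"], emb))
```

The first sentence aligns well with both query terms, so it scores higher than the second, which only matches "plant".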

  21. Quick and (not so) Dirty: Unsupervised Selection of Justification Sentences for Multi-hop Question Answering

    Authors: Vikas Yadav, Steven Bethard, Mihai Surdeanu

    Abstract: We propose an unsupervised strategy for the selection of justification sentences for multi-hop question answering (QA) that (a) maximizes the relevance of the selected sentences, (b) minimizes the overlap between the selected facts, and (c) maximizes the coverage of both question and answer. This unsupervised sentence selection method can be coupled with any supervised QA approach. We show that th…

    Submitted 2 May, 2020; v1 submitted 17 November, 2019; originally announced November 2019.

    Comments: Published at EMNLP-IJCNLP 2019 as long conference paper. Corrected the name reference for Speer et al., 2017

    Journal ref: EMNLP-IJCNLP, 2578–2589 (2019)
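The three criteria listed in the abstract (relevance, low overlap among selected facts, coverage of question and answer) can be turned into a simple greedy selector over bag-of-words term sets. The additive score and greedy search below are assumptions chosen for brevity, not the paper's scoring formula or search procedure; `select_justifications` is a hypothetical name.

```python
def _terms(text):
    return set(text.lower().split())


def _score(sentence_terms, query, covered, used):
    relevance = len(sentence_terms & query)                 # (a) relevance to question + answer
    overlap = len(sentence_terms & used)                    # (b) overlap with sentences already picked
    new_coverage = len((sentence_terms & query) - covered)  # (c) query terms not yet covered
    return relevance - overlap + new_coverage


def select_justifications(question, answer, sentences, k=2):
    """Greedily pick up to k sentences, re-scoring the remaining candidates after each pick."""
    query = _terms(question) | _terms(answer)
    candidates = list(sentences)
    chosen, covered, used = [], set(), set()
    while candidates and len(chosen) < k:
        best = max(candidates, key=lambda s: _score(_terms(s), query, covered, used))
        candidates.remove(best)
        chosen.append(best)
        covered |= _terms(best) & query
        used |= _terms(best)
    return chosen


sentences = [
    "plants absorb carbon dioxide",
    "plants absorb carbon dioxide from air",
    "the sky is blue",
    "dioxide is a gas",
]
print(select_justifications("what gas do plants absorb", "carbon dioxide", sentences))
# ['plants absorb carbon dioxide', 'dioxide is a gas']
```

The near-duplicate second sentence is skipped in the second round because its overlap penalty cancels its relevance, which is the behaviour criterion (b) is meant to produce.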

  22. arXiv:1910.11470 [pdf, ps, other]

    cs.CL cs.LG

    A Survey on Recent Advances in Named Entity Recognition from Deep Learning models

    Authors: Vikas Yadav, Steven Bethard

    Abstract: Named Entity Recognition (NER) is a key component in NLP systems for question answering, information retrieval, relation extraction, etc. NER systems have been studied and developed widely for decades, but accurate systems using deep neural networks (NN) have only been introduced in the last few years. We present a comprehensive survey of deep neural network architectures for NER, and contrast the…

    Submitted 24 October, 2019; originally announced October 2019.

    Comments: Published at COLING 2018

    Report number: C18-1182

  23. arXiv:1908.05441 [pdf, other]

    cs.CL cs.AI

    Multi-class Hierarchical Question Classification for Multiple Choice Science Exams

    Authors: Dongfang Xu, Peter Jansen, Jaycie Martin, Zhengnan Xie, Vikas Yadav, Harish Tayyar Madabushi, Oyvind Tafjord, Peter Clark

    Abstract: Prior work has demonstrated that question classification (QC), recognizing the problem domain of a question, can help answer it more accurately. However, developing strong QC algorithms has been hindered by the limited size and complexity of annotated data available. To address this, we present the largest challenge dataset for QC, containing 7,787 science exam questions paired with detailed class…

    Submitted 15 August, 2019; originally announced August 2019.

  24. arXiv:1807.01836 [pdf, other]

    cs.IR cs.CL

    Sanity Check: A Strong Alignment and Information Retrieval Baseline for Question Answering

    Authors: Vikas Yadav, Rebecca Sharp, Mihai Surdeanu

    Abstract: While increasingly complex approaches to question answering (QA) have been proposed, the true gain of these systems, particularly with respect to their expensive training requirements, can be inflated when they are not compared to adequate baselines. Here we propose an unsupervised, simple, and fast alignment and information retrieval baseline that incorporates two novel contributions: a o…

    Submitted 4 July, 2018; originally announced July 2018.

    Comments: SIGIR 2018

  25. arXiv:1207.0665 [pdf]

    cs.NI

    The Common Difference Between MIMO With Other Antennas

    Authors: M. D. Sirajul Huque, C. Surekha, S. Pavan Kumar Reddy, Vidhisha Yadav

    Abstract: In past 802.11 systems there is a single Radio Frequency (RF) chain on the Wi-Fi device. Multiple antennas use the same hardware to process the radio signal. So only one antenna can transmit or receive at a time as all radio signals need to go through the single RF chain. In MIMO there can be a separate RF chain for each antenna allowing multiple RF chains to coexist. MIMO technology has attracted…

    Submitted 19 August, 2012; v1 submitted 3 July, 2012; originally announced July 2012.

    Comments: Published in Computer Science Chronicle

    Journal ref: CSCV01I1, August 2012

  26. Phase Transitions on Fixed Connected Graphs and Random Graphs in the Presence of Noise

    Authors: Jialing Liu, Vikas Yadav, Hullas Sehgal, Joshua M. Olson, Haifeng Liu, Nicola Elia

    Abstract: In this paper, we study the phase transition behavior emerging from the interactions among multiple agents in the presence of noise. We propose a simple discrete-time model in which a group of non-mobile agents form either a fixed connected graph or a random graph process, and each agent, taking bipolar value either +1 or -1, updates its value according to its previous value and the noisy measur…

    Submitted 24 August, 2008; originally announced August 2008.

    Comments: 15 pages, 3 figures. To appear in the IEEE Transactions on Automatic Control

    Journal ref: IEEE TRANSACTIONS ON AUTOMATIC CONTROL, VOL. 53, NO. 8, 1817-1825, SEPTEMBER 2008
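The model in the last abstract (agents holding +1 or -1 on a fixed graph, each updating from its previous value and noisy measurements of others) can be simulated in a few lines. The exact update rule is cut off in the abstract, so the version below is an assumption: every agent takes the sign of its own value plus Gaussian-noise-corrupted readings of its neighbours' values. `step`, `simulate`, and the noise model are hypothetical; sweeping `noise_std` and watching the final average is one way to look for a transition.

```python
import random


def step(values, neighbors, noise_std, rng):
    """One synchronous update: agent i becomes +1 if its own value plus the noisy
    readings of its neighbours' values is >= 0, otherwise -1 (assumed rule)."""
    new_values = []
    for i, v in enumerate(values):
        total = v + sum(values[j] + rng.gauss(0.0, noise_std) for j in neighbors[i])
        new_values.append(1 if total >= 0 else -1)
    return new_values


def simulate(values, neighbors, noise_std, steps, seed=0):
    """Run `steps` updates and return the final average value of the agents."""
    rng = random.Random(seed)
    for _ in range(steps):
        values = step(values, neighbors, noise_std, rng)
    return sum(values) / len(values)


# Complete graph on 4 agents; with no noise the majority value takes over.
complete = {i: [j for j in range(4) if j != i] for i in range(4)}
print(simulate([1, 1, 1, -1], complete, noise_std=0.0, steps=3))  # 1.0
```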