Showing 1–10 of 10 results for author: Jonnalagadda, S R

Searching in archive cs.
  1. arXiv:2403.20327  [pdf, other]

    cs.CL cs.AI

    Gecko: Versatile Text Embeddings Distilled from Large Language Models

    Authors: Jinhyuk Lee, Zhuyun Dai, Xiaoqi Ren, Blair Chen, Daniel Cer, Jeremy R. Cole, Kai Hui, Michael Boratko, Rajvi Kapadia, Wen Ding, Yi Luan, Sai Meher Karthik Duddu, Gustavo Hernandez Abrego, Weiqiang Shi, Nithi Gupta, Aditya Kusupati, Prateek Jain, Siddhartha Reddy Jonnalagadda, Ming-Wei Chang, Iftekhar Naim

    Abstract: We present Gecko, a compact and versatile text embedding model. Gecko achieves strong retrieval performance by leveraging a key idea: distilling knowledge from large language models (LLMs) into a retriever. Our two-step distillation process begins with generating diverse, synthetic paired data using an LLM. Next, we further refine the data quality by retrieving a set of candidate passages for each…

    Submitted 29 March, 2024; originally announced March 2024.

    Comments: 18 pages

  2. arXiv:2403.05530  [pdf, other]

    cs.CL cs.AI

    Gemini 1.5: Unlocking multimodal understanding across millions of tokens of context

    Authors: Gemini Team, Petko Georgiev, Ving Ian Lei, Ryan Burnell, Libin Bai, Anmol Gulati, Garrett Tanzer, Damien Vincent, Zhufeng Pan, Shibo Wang, Soroosh Mariooryad, Yifan Ding, Xinyang Geng, Fred Alcober, Roy Frostig, Mark Omernick, Lexi Walker, Cosmin Paduraru, Christina Sorokin, Andrea Tacchetti, Colin Gaffney, Samira Daruki, Olcan Sercinoglu, Zach Gleicher, Juliette Love, et al. (1092 additional authors not shown)

    Abstract: In this report, we introduce the Gemini 1.5 family of models, representing the next generation of highly compute-efficient multimodal models capable of recalling and reasoning over fine-grained information from millions of tokens of context, including multiple long documents and hours of video and audio. The family includes two new models: (1) an updated Gemini 1.5 Pro, which exceeds the February…

    Submitted 14 June, 2024; v1 submitted 8 March, 2024; originally announced March 2024.

  3. arXiv:2202.04161  [pdf, other]

    cs.CL cs.AI

    Logical Reasoning for Task Oriented Dialogue Systems

    Authors: Sajjad Beygi, Maryam Fazel-Zarandi, Alessandra Cervone, Prakash Krishnan, Siddhartha Reddy Jonnalagadda

    Abstract: In recent years, large pretrained models have been used in dialogue systems to improve successful task completion rates. However, lack of reasoning capabilities of dialogue platforms make it difficult to provide relevant and fluent responses, unless the designers of a conversational experience spend a considerable amount of time implementing these capabilities in external rule based modules. In th…

    Submitted 8 February, 2022; originally announced February 2022.

  4. arXiv:2112.07660  [pdf, other]

    cs.CL

    Massive-scale Decoding for Text Generation using Lattices

    Authors: Jiacheng Xu, Siddhartha Reddy Jonnalagadda, Greg Durrett

    Abstract: Conditional neural text generation models generate high-quality outputs, but often concentrate around a mode when what we really want is a diverse set of options. We present a search algorithm to construct lattices encoding a massive number of generation options. First, we restructure decoding as a best-first search, which explores the space differently than beam search and improves efficiency by…

    Submitted 3 May, 2022; v1 submitted 14 December, 2021; originally announced December 2021.

    Comments: NAACL 2022, see https://github.com/jiacheng-xu/lattice-generation for code

  5. arXiv:1609.01597  [pdf]

    cs.CL cs.IR

    A Hybrid Citation Retrieval Algorithm for Evidence-based Clinical Knowledge Summarization: Combining Concept Extraction, Vector Similarity and Query Expansion for High Precision

    Authors: Kalpana Raja, Andrew J Sauer, Ravi P Garg, Melanie R Klerer, Siddhartha R Jonnalagadda

    Abstract: Novel information retrieval methods to identify citations relevant to a clinical topic can overcome the knowledge gap existing between the primary literature (MEDLINE) and online clinical knowledge resources such as UpToDate. Searching the MEDLINE database directly or with query expansion methods returns a large number of citations that are not relevant to the query. The current study presents a c…

    Submitted 6 September, 2016; originally announced September 2016.

  6. arXiv:1609.01594  [pdf]

    cs.CL cs.CY

    An Information Extraction Approach to Prescreen Heart Failure Patients for Clinical Trials

    Authors: Abhishek Kalyan Adupa, Ravi Prakash Garg, Jessica Corona-Cox, Sanjiv. J. Shah, Siddhartha R. Jonnalagadda

    Abstract: To reduce the large amount of time spent screening, identifying, and recruiting patients into clinical trials, we need prescreening systems that are able to automate the data extraction and decision-making tasks that are typically relegated to clinical research study coordinators. However, a major obstacle is the vast amount of patient data available as unstructured free-form text in electronic he…

    Submitted 6 September, 2016; originally announced September 2016.

  7. arXiv:1609.01592  [pdf]

    cs.CL cs.CY

    CRTS: A type system for representing clinical recommendations

    Authors: Ravi P Garg, Kalpana Raja, Siddhartha R Jonnalagadda

    Abstract: Background: Clinical guidelines and recommendations are the driving wheels of the evidence-based medicine (EBM) paradigm, but these are available primarily as unstructured text and are generally highly heterogeneous in nature. This significantly reduces the dissemination and automatic application of these recommendations at the point of care. A comprehensive structured representation of these reco…

    Submitted 6 September, 2016; originally announced September 2016.

  8. arXiv:1609.01586  [pdf]

    cs.LG cs.CL

    A Bootstrap Machine Learning Approach to Identify Rare Disease Patients from Electronic Health Records

    Authors: Ravi Garg, Shu Dong, Sanjiv Shah, Siddhartha R Jonnalagadda

    Abstract: Rare diseases are very difficult to identify among large number of other possible diagnoses. Better availability of patient data and improvement in machine learning algorithms empower us to tackle this problem computationally. In this paper, we target one such rare disease - cardiac amyloidosis. We aim to automate the process of identifying potential cardiac amyloidosis patients with the help of m…

    Submitted 6 September, 2016; originally announced September 2016.

  9. arXiv:1609.01574  [pdf]

    cs.CL cs.IR

    Automatically extracting, ranking and visually summarizing the treatments for a disease

    Authors: Prakash Reddy Putta, John J. Dzak III, Siddhartha R. Jonnalagadda

    Abstract: Clinicians are expected to have up-to-date and broad knowledge of disease treatment options for a patient. Online health knowledge resources contain a wealth of information. However, because of the time investment needed to disseminate and rank pertinent information, there is a need to summarize the information in a more concise format. Our aim of the study is to provide clinicians with a concise…

    Submitted 6 September, 2016; originally announced September 2016.

  10. arXiv:1606.06424  [pdf]

    cs.IR cs.CL cs.LG

    A Novel Framework to Expedite Systematic Reviews by Automatically Building Information Extraction Training Corpora

    Authors: Tanmay Basu, Shraman Kumar, Abhishek Kalyan, Priyanka Jayaswal, Pawan Goyal, Stephen Pettifer, Siddhartha R. Jonnalagadda

    Abstract: A systematic review identifies and collates various clinical studies and compares data elements and results in order to provide an evidence based answer for a particular clinical question. The process is manual and involves lot of time. A tool to automate this process is lacking. The aim of this work is to develop a framework using natural language processing and machine learning to build informat…

    Submitted 21 June, 2016; originally announced June 2016.