-
Ragnarök: A Reusable RAG Framework and Baselines for TREC 2024 Retrieval-Augmented Generation Track
Authors:
Ronak Pradeep,
Nandan Thakur,
Sahel Sharifymoghaddam,
Eric Zhang,
Ryan Nguyen,
Daniel Campos,
Nick Craswell,
Jimmy Lin
Abstract:
Did you try out the new Bing Search? Or maybe you fiddled around with Google AI Overviews? These might sound familiar because the modern-day search stack has recently evolved to include retrieval-augmented generation (RAG) systems. They allow searching and incorporating real-time data into large language models (LLMs) to provide a well-informed, attributed, concise summary, in contrast to the traditional search paradigm that relies on displaying a ranked list of documents. Therefore, given these recent advancements, it is crucial to have an arena to build, test, visualize, and systematically evaluate RAG-based search systems. With this in mind, we propose the TREC 2024 RAG Track to foster innovation in evaluating RAG systems. In our work, we lay out the steps we've taken towards making this track a reality: we describe the details of our reusable framework, Ragnarök, explain our choice and curation of the new MS MARCO V2.1 collection, release the development topics for the track, and standardize the I/O definitions which assist the end user. Next, using Ragnarök, we identify and provide key industrial baselines such as OpenAI's GPT-4o or Cohere's Command R+. Further, we introduce a web-based user interface for an interactive arena that allows pairwise benchmarking of RAG systems through crowdsourcing. We open-source our Ragnarök framework and baselines to achieve a unified standard for future RAG systems.
Submitted 24 June, 2024;
originally announced June 2024.
-
Prompts as Auto-Optimized Training Hyperparameters: Training Best-in-Class IR Models from Scratch with 10 Gold Labels
Authors:
Jasper Xian,
Saron Samuel,
Faraz Khoubsirat,
Ronak Pradeep,
Md Arafat Sultan,
Radu Florian,
Salim Roukos,
Avirup Sil,
Christopher Potts,
Omar Khattab
Abstract:
We develop a method for training small-scale (under 100M parameter) neural information retrieval models with as few as 10 gold relevance labels. The method depends on generating synthetic queries for documents using a language model (LM), and the key step is that we automatically optimize the LM prompt that is used to generate these queries based on training quality. In experiments with the BIRCO benchmark, we find that models trained with our method outperform RankZephyr and are competitive with RankLLama, both of which are 7B parameter models trained on over 100K labels. These findings point to the power of automatic prompt optimization for synthetic dataset generation.
Submitted 17 June, 2024;
originally announced June 2024.
-
UMBRELA: UMbrela is the (Open-Source Reproduction of the) Bing RELevance Assessor
Authors:
Shivani Upadhyay,
Ronak Pradeep,
Nandan Thakur,
Nick Craswell,
Jimmy Lin
Abstract:
Copious amounts of relevance judgments are necessary for the effective training and accurate evaluation of retrieval systems. Conventionally, these judgments are made by human assessors, rendering this process expensive and laborious. A recent study by Thomas et al. from Microsoft Bing suggested that large language models (LLMs) can accurately perform the relevance assessment task and provide human-quality judgments, but unfortunately their study did not yield any reusable software artifacts. Our work presents UMBRELA (a recursive acronym that stands for UMbrela is the Bing RELevance Assessor), an open-source toolkit that reproduces the results of Thomas et al. using OpenAI's GPT-4o model and adds more nuance to the original paper. Across Deep Learning Tracks from TREC 2019 to 2023, we find that LLM-derived relevance judgments correlate highly with rankings generated by effective multi-stage retrieval systems. Our toolkit is designed to be easily extensible and can be integrated into existing multi-stage retrieval and evaluation pipelines, offering researchers a valuable resource for studying retrieval evaluation methodologies. UMBRELA will be used in the TREC 2024 RAG Track to aid in relevance assessments, and we envision our toolkit becoming a foundation for further innovation in the field. UMBRELA is available at https://github.com/castorini/umbrela.
Submitted 10 June, 2024;
originally announced June 2024.
-
Entity Disambiguation via Fusion Entity Decoding
Authors:
Junxiong Wang,
Ali Mousavi,
Omar Attia,
Ronak Pradeep,
Saloni Potdar,
Alexander M. Rush,
Umar Farooq Minhas,
Yunyao Li
Abstract:
Entity disambiguation (ED), which links the mentions of ambiguous entities to their referent entities in a knowledge base, serves as a core component in entity linking (EL). Existing generative approaches demonstrate improved accuracy compared to classification approaches under the standardized ZELDA benchmark. Nevertheless, generative approaches suffer from the need for large-scale pre-training and inefficient generation. Most importantly, entity descriptions, which could contain crucial information to distinguish similar entities from each other, are often overlooked. We propose an encoder-decoder model to disambiguate entities with more detailed entity descriptions. Given text and candidate entities, the encoder learns interactions between the text and each candidate entity, producing representations for each entity candidate. The decoder then fuses the representations of entity candidates together and selects the correct entity. Our experiments, conducted on various entity disambiguation benchmarks, demonstrate the strong and robust performance of this model, particularly +1.5% in the ZELDA benchmark compared with GENRE. Furthermore, we integrate this approach into the retrieval/reader framework and observe +1.5% improvements in end-to-end entity linking in the GERBIL benchmark compared with EntQA.
Submitted 7 May, 2024; v1 submitted 2 April, 2024;
originally announced April 2024.
-
Zero-Shot Cross-Lingual Reranking with Large Language Models for Low-Resource Languages
Authors:
Mofetoluwa Adeyemi,
Akintunde Oladipo,
Ronak Pradeep,
Jimmy Lin
Abstract:
Large language models (LLMs) have shown impressive zero-shot capabilities in various document reranking tasks. Despite their successful implementations, there is still a gap in existing literature on their effectiveness in low-resource languages. To address this gap, we investigate how LLMs function as rerankers in cross-lingual information retrieval (CLIR) systems for African languages. Our implementation covers English and four African languages (Hausa, Somali, Swahili, and Yoruba) and we examine cross-lingual reranking with queries in English and passages in the African languages. Additionally, we analyze and compare the effectiveness of monolingual reranking using both query and document translations. We also evaluate the effectiveness of LLMs when leveraging their own generated translations. To get a grasp of the effectiveness of multiple LLMs, our study focuses on the proprietary models RankGPT-4 and RankGPT-3.5, along with the open-source model, RankZephyr. While reranking remains most effective in English, our results reveal that cross-lingual reranking may be competitive with reranking in African languages depending on the multilingual capability of the LLM.
Submitted 26 December, 2023;
originally announced December 2023.
-
Scaling Down, LiTting Up: Efficient Zero-Shot Listwise Reranking with Seq2seq Encoder-Decoder Models
Authors:
Manveer Singh Tamber,
Ronak Pradeep,
Jimmy Lin
Abstract:
Recent work in zero-shot listwise reranking using LLMs has achieved state-of-the-art results. However, these methods are not without drawbacks. The proposed methods rely on large LLMs with billions of parameters and limited context sizes. This paper introduces LiT5-Distill and LiT5-Score, two methods for efficient zero-shot listwise reranking, leveraging T5 sequence-to-sequence encoder-decoder models. Our approaches demonstrate competitive reranking effectiveness compared to recent state-of-the-art LLM rerankers with substantially smaller models. Through LiT5-Score, we also explore the use of cross-attention to calculate relevance scores to perform reranking, eliminating the reliance on external passage relevance labels for training. We present a range of models from 220M parameters to 3B parameters, all with strong reranking results, challenging the necessity of large-scale models for effective zero-shot reranking and opening avenues for more efficient listwise reranking solutions. We provide code and scripts to reproduce our results at https://github.com/castorini/LiT5.
Submitted 26 December, 2023;
originally announced December 2023.
-
RankZephyr: Effective and Robust Zero-Shot Listwise Reranking is a Breeze!
Authors:
Ronak Pradeep,
Sahel Sharifymoghaddam,
Jimmy Lin
Abstract:
In information retrieval, proprietary large language models (LLMs) such as GPT-4 and open-source counterparts such as LLaMA and Vicuna have played a vital role in reranking. However, the gap between open-source and closed models persists, with reliance on proprietary, non-transparent models constraining reproducibility. Addressing this gap, we introduce RankZephyr, a state-of-the-art, open-source LLM for listwise zero-shot reranking. RankZephyr not only bridges the effectiveness gap with GPT-4 but in some cases surpasses the proprietary model. Our comprehensive evaluations across several datasets (TREC Deep Learning Tracks; NEWS and COVID from BEIR) showcase this ability. RankZephyr benefits from strategic training choices and is resilient against variations in initial document ordering and the number of documents reranked. Additionally, our model outperforms GPT-4 on the NovelEval test set, comprising queries and passages past its training period, which addresses concerns about data contamination. To foster further research in this rapidly evolving field, we provide all code necessary to reproduce our results at https://github.com/castorini/rank_llm.
Submitted 5 December, 2023;
originally announced December 2023.
-
RankVicuna: Zero-Shot Listwise Document Reranking with Open-Source Large Language Models
Authors:
Ronak Pradeep,
Sahel Sharifymoghaddam,
Jimmy Lin
Abstract:
Researchers have successfully applied large language models (LLMs) such as ChatGPT to reranking in an information retrieval context, but to date, such work has mostly been built on proprietary models hidden behind opaque API endpoints. This approach yields experimental results that are not reproducible and non-deterministic, threatening the veracity of outcomes that build on such shaky foundations. To address this significant shortcoming, we present RankVicuna, the first fully open-source LLM capable of performing high-quality listwise reranking in a zero-shot setting. Experimental results on the TREC 2019 and 2020 Deep Learning Tracks show that we can achieve effectiveness comparable to zero-shot reranking with GPT-3.5 with a much smaller 7B parameter model, although our effectiveness remains slightly behind reranking with GPT-4. We hope our work provides the foundation for future research on reranking with modern LLMs. All the code necessary to reproduce our results is available at https://github.com/castorini/rank_llm.
Submitted 26 September, 2023;
originally announced September 2023.
-
Vector Search with OpenAI Embeddings: Lucene Is All You Need
Authors:
Jimmy Lin,
Ronak Pradeep,
Tommaso Teofili,
Jasper Xian
Abstract:
We provide a reproducible, end-to-end demonstration of vector search with OpenAI embeddings using Lucene on the popular MS MARCO passage ranking test collection. The main goal of our work is to challenge the prevailing narrative that a dedicated vector store is necessary to take advantage of recent advances in deep neural networks as applied to search. Quite the contrary, we show that hierarchical navigable small-world network (HNSW) indexes in Lucene are adequate to provide vector search capabilities in a standard bi-encoder architecture. This suggests that, from a simple cost-benefit analysis, there does not appear to be a compelling reason to introduce a dedicated vector store into a modern "AI stack" for search, since such applications have already received substantial investments in existing, widely deployed infrastructure.
Submitted 28 August, 2023;
originally announced August 2023.
-
ReadProbe: A Demo of Retrieval-Enhanced Large Language Models to Support Lateral Reading
Authors:
Dake Zhang,
Ronak Pradeep
Abstract:
With the rapid growth and spread of online misinformation, people need tools to help them evaluate the credibility and accuracy of online information. Lateral reading, a strategy that involves cross-referencing information with multiple sources, may be an effective approach to achieving this goal. In this paper, we present ReadProbe, a tool to support lateral reading, powered by generative large language models from OpenAI and the Bing search engine. Our tool is able to generate useful questions for lateral reading, scour the web for relevant documents, and generate well-attributed answers to help people better evaluate online information. We made a web-based application to demonstrate how ReadProbe can help reduce the risk of being misled by false information. The code is available at https://github.com/DakeZhang1998/ReadProbe. An earlier version of our tool won the first prize in a national AI misinformation hackathon.
Submitted 13 June, 2023;
originally announced June 2023.
-
How Does Generative Retrieval Scale to Millions of Passages?
Authors:
Ronak Pradeep,
Kai Hui,
Jai Gupta,
Adam D. Lelkes,
Honglei Zhuang,
Jimmy Lin,
Donald Metzler,
Vinh Q. Tran
Abstract:
Popularized by the Differentiable Search Index, the emerging paradigm of generative retrieval re-frames the classic information retrieval problem into a sequence-to-sequence modeling task, forgoing external indices and encoding an entire document corpus within a single Transformer. Although many different approaches have been proposed to improve the effectiveness of generative retrieval, they have only been evaluated on document corpora on the order of 100k in size. We conduct the first empirical study of generative retrieval techniques across various corpus scales, ultimately scaling up to the entire MS MARCO passage ranking task with a corpus of 8.8M passages and evaluating model sizes up to 11B parameters. We uncover several findings about scaling generative retrieval to millions of passages; notably, the central importance of using synthetic queries as document representations during indexing, the ineffectiveness of existing proposed architecture modifications when accounting for compute cost, and the limits of naively scaling model parameters with respect to retrieval performance. While we find that generative retrieval is competitive with state-of-the-art dual encoders on small corpora, scaling to millions of passages remains an important and unsolved challenge. We believe these findings will be valuable for the community to clarify the current state of generative retrieval, highlight the unique challenges, and inspire new research directions.
Submitted 19 May, 2023;
originally announced May 2023.
-
Zero-Shot Listwise Document Reranking with a Large Language Model
Authors:
Xueguang Ma,
Xinyu Zhang,
Ronak Pradeep,
Jimmy Lin
Abstract:
Supervised ranking methods based on bi-encoder or cross-encoder architectures have shown success in multi-stage text ranking tasks, but they require large amounts of relevance judgments as training data. In this work, we propose Listwise Reranker with a Large Language Model (LRL), which achieves strong reranking effectiveness without using any task-specific training data. Different from the existing pointwise ranking methods, where documents are scored independently and ranked according to the scores, LRL directly generates a reordered list of document identifiers given the candidate documents. Experiments on three TREC web search datasets demonstrate that LRL not only outperforms zero-shot pointwise methods when reranking first-stage retrieval results, but can also act as a final-stage reranker to improve the top-ranked results of a pointwise method for improved efficiency. Additionally, we apply our approach to subsets of MIRACL, a recent multilingual retrieval dataset, with results showing its potential to generalize across different languages.
Submitted 3 May, 2023;
originally announced May 2023.
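The listwise idea in the abstract above, where the model emits a reordered list of document identifiers rather than one score per document, can be illustrated with a minimal sketch. The output format `[2] > [3] > [1]`, and the names `parse_permutation` and `listwise_rerank`, are assumptions made for illustration; the abstract does not specify the prompt or output format, and no LLM is called here.

```python
import re


def parse_permutation(generated: str, num_candidates: int) -> list[int]:
    """Convert text such as "[2] > [3] > [1]" into zero-based indices."""
    order: list[int] = []
    for token in re.findall(r"\[(\d+)\]", generated):
        idx = int(token) - 1  # identifiers in the text are assumed to be 1-based
        # Ignore out-of-range and repeated identifiers.
        if 0 <= idx < num_candidates and idx not in order:
            order.append(idx)
    # Candidates the model left out keep their original relative order at the end.
    order.extend(i for i in range(num_candidates) if i not in order)
    return order


def listwise_rerank(candidates: list[str], generated: str) -> list[str]:
    """Reorder candidates according to a generated identifier list."""
    return [candidates[i] for i in parse_permutation(generated, len(candidates))]


docs = ["doc A", "doc B", "doc C"]
# The string below stands in for a hypothetical model response.
reranked = listwise_rerank(docs, "[2] > [3] > [1]")
```

Handling omitted or repeated identifiers defensively is a design choice of this sketch: generated text is not guaranteed to be a valid permutation, so the fallback keeps every candidate in the final list.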
-
For-Each Operations in Collaborative Apps
Authors:
Matthew Weidner,
Ria Pradeep,
Benito Geordie,
Heather Miller
Abstract:
Conflict-free Replicated Data Types (CRDTs) allow collaborative access to an app's data. We describe a novel CRDT operation, for-each on the list of CRDTs, and demonstrate its use in collaborative apps. Our for-each operation applies a given mutation to each element of a list, including elements inserted concurrently. This often preserves user intention in a way that would otherwise require custom CRDT algorithms. We give example applications of our for-each operation to collaborative rich-text, recipe, and slideshow editors.
Submitted 6 April, 2023;
originally announced April 2023.
-
Collabs: A Flexible and Performant CRDT Collaboration Framework
Authors:
Matthew Weidner,
Huairui Qi,
Maxime Kjaer,
Ria Pradeep,
Benito Geordie,
Yicheng Zhang,
Gregory Schare,
Xuan Tang,
Sicheng Xing,
Heather Miller
Abstract:
A collaboration framework is a distributed system that serves as the data layer for a collaborative app. Conflict-free Replicated Data Types (CRDTs) are a promising theoretical technique for implementing collaboration frameworks. However, existing frameworks are inflexible: they are often one-off implementations of research papers or only permit a restricted set of CRDT semantics, and they do not allow app-specific optimizations. Until now, there was no general framework that lets programmers mix, match, and modify CRDTs.
We solve this with Collabs, a CRDT-based collaboration framework that lets programmers implement their own CRDTs, either from-scratch or by composing existing building blocks. Collabs prioritizes both semantic flexibility and performance flexibility: it allows arbitrary app-specific CRDT behaviors and optimizations, while still providing strong eventual consistency. We demonstrate Collabs's capabilities and programming model with example apps and CRDT implementations. We then show that a collaborative rich-text editor using Collabs's built-in CRDTs can scale to over 100 simultaneous users, unlike existing CRDT frameworks and Google Docs. Collabs also has lower end-to-end latency and server CPU usage than a popular Operational Transformation framework, with acceptable CRDT metadata overhead.
Submitted 13 October, 2023; v1 submitted 5 December, 2022;
originally announced December 2022.
-
A Replication Study of Dense Passage Retriever
Authors:
Xueguang Ma,
Kai Sun,
Ronak Pradeep,
Jimmy Lin
Abstract:
Text retrieval using learned dense representations has recently emerged as a promising alternative to "traditional" text retrieval using sparse bag-of-words representations. One recent work that has garnered much attention is the dense passage retriever (DPR) technique proposed by Karpukhin et al. (2020) for end-to-end open-domain question answering. We present a replication study of this work, starting with model checkpoints provided by the authors, but otherwise from an independent implementation in our group's Pyserini IR toolkit and PyGaggle neural text ranking library. Although our experimental results largely verify the claims of the original paper, we arrived at two important additional findings that contribute to a better understanding of DPR: First, it appears that the original authors under-report the effectiveness of the BM25 baseline and hence also dense-sparse hybrid retrieval results. Second, by incorporating evidence from the retriever and an improved answer span scoring technique, we are able to improve end-to-end question answering effectiveness using exactly the same models as in the original work.
Submitted 12 April, 2021;
originally announced April 2021.
-
Pyserini: An Easy-to-Use Python Toolkit to Support Replicable IR Research with Sparse and Dense Representations
Authors:
Jimmy Lin,
Xueguang Ma,
Sheng-Chieh Lin,
Jheng-Hong Yang,
Ronak Pradeep,
Rodrigo Nogueira
Abstract:
Pyserini is an easy-to-use Python toolkit that supports replicable IR research by providing effective first-stage retrieval in a multi-stage ranking architecture. Our toolkit is self-contained as a standard Python package and comes with queries, relevance judgments, pre-built indexes, and evaluation scripts for many commonly used IR test collections. We aim to support, out of the box, the entire research lifecycle of efforts aimed at improving ranking with modern neural approaches. In particular, Pyserini supports sparse retrieval (e.g., BM25 scoring using bag-of-words representations), dense retrieval (e.g., nearest-neighbor search on transformer-encoded representations), as well as hybrid retrieval that integrates both approaches. This paper provides an overview of toolkit features and presents empirical results that illustrate its effectiveness on two popular ranking tasks. We also describe how our group has built a culture of replicability through shared norms and tools that enable rigorous automated testing.
Submitted 19 February, 2021;
originally announced February 2021.
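As a rough illustration of the hybrid retrieval mentioned in the abstract above, the sketch below fuses a sparse run and a dense run by min-max normalizing each and interpolating the scores. This is plain Python and does not use Pyserini's API; the function names, the normalization step, and the `alpha` weight are all assumptions, and the toolkit's own fusion may differ.

```python
def min_max(scores: dict[str, float]) -> dict[str, float]:
    """Rescale scores to [0, 1]; a run with identical scores maps to 1.0."""
    if not scores:
        return {}
    lo, hi = min(scores.values()), max(scores.values())
    if hi == lo:
        return {doc: 1.0 for doc in scores}
    return {doc: (s - lo) / (hi - lo) for doc, s in scores.items()}


def hybrid_fuse(
    sparse: dict[str, float],
    dense: dict[str, float],
    alpha: float = 0.5,
    k: int = 10,
) -> list[tuple[str, float]]:
    """Interpolate normalized sparse and dense scores and return the top k."""
    s, d = min_max(sparse), min_max(dense)
    # A document missing from one run contributes 0.0 for that run.
    fused = {
        doc: alpha * s.get(doc, 0.0) + (1 - alpha) * d.get(doc, 0.0)
        for doc in set(s) | set(d)
    }
    # Sort by descending score, breaking ties by document id for determinism.
    return sorted(fused.items(), key=lambda kv: (-kv[1], kv[0]))[:k]


# Made-up scores for illustration only (e.g., BM25 vs. inner product).
sparse_run = {"d1": 12.0, "d2": 8.0}
dense_run = {"d2": 0.9, "d3": 0.5}
fused_run = hybrid_fuse(sparse_run, dense_run, alpha=0.5)
```

Normalizing before interpolation is used here because sparse and dense scores typically live on different scales; without it, one run would tend to dominate the sum.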
-
The Expando-Mono-Duo Design Pattern for Text Ranking with Pretrained Sequence-to-Sequence Models
Authors:
Ronak Pradeep,
Rodrigo Nogueira,
Jimmy Lin
Abstract:
We propose a design pattern for tackling text ranking problems, dubbed "Expando-Mono-Duo", that has been empirically validated for a number of ad hoc retrieval tasks in different domains. At the core, our design relies on pretrained sequence-to-sequence models within a standard multi-stage ranking architecture. "Expando" refers to the use of document expansion techniques to enrich keyword representations of texts prior to inverted indexing. "Mono" and "Duo" refer to components in a reranking pipeline based on a pointwise model and a pairwise model that rerank initial candidates retrieved using keyword search. We present experimental results from the MS MARCO passage and document ranking tasks, the TREC 2020 Deep Learning Track, and the TREC-COVID challenge that validate our design. In all these tasks, we achieve effectiveness that is at or near the state of the art, in some cases using a zero-shot approach that does not exploit any training data from the target task. To support replicability, implementations of our design pattern are open-sourced in the Pyserini IR toolkit and PyGaggle neural reranking library.
Submitted 14 January, 2021;
originally announced January 2021.
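The "Mono" then "Duo" reranking stages described in the abstract above can be sketched as a pointwise pass followed by a pairwise pass over the top candidates. The scoring functions below are toy stand-ins rather than the sequence-to-sequence models the abstract refers to, and summing pairwise preferences is one plausible aggregation chosen for illustration; the names `mono_duo_rerank`, `toy_mono`, and `toy_duo` are hypothetical.

```python
from itertools import permutations
from typing import Callable


def mono_duo_rerank(
    query: str,
    candidates: list[str],
    mono_score: Callable[[str, str], float],
    duo_prob: Callable[[str, str, str], float],
    k: int = 3,
) -> list[str]:
    """Pointwise rerank all candidates, then pairwise rerank the top k.

    Assumes the candidate strings are unique.
    """
    # Mono stage: score each candidate independently against the query.
    mono_ranked = sorted(candidates, key=lambda d: mono_score(query, d), reverse=True)
    head, tail = mono_ranked[:k], mono_ranked[k:]
    # Duo stage: for each ordered pair, add the preference for a over b.
    totals = {d: 0.0 for d in head}
    for a, b in permutations(head, 2):
        totals[a] += duo_prob(query, a, b)
    duo_ranked = sorted(head, key=lambda d: totals[d], reverse=True)
    return duo_ranked + tail


def toy_mono(query: str, doc: str) -> float:
    """Toy pointwise scorer: number of query terms present in the document."""
    return float(len(set(query.split()) & set(doc.split())))


def toy_duo(query: str, a: str, b: str) -> float:
    """Toy pairwise scorer: prefer the shorter of two documents."""
    return 1.0 if len(a) < len(b) else 0.0


ranking = mono_duo_rerank(
    "rag track",
    ["rag", "rag track overview long", "rag track", "unrelated"],
    toy_mono,
    toy_duo,
    k=2,
)
```

Restricting the pairwise stage to the top `k` keeps the number of comparisons at `k * (k - 1)` instead of growing quadratically with the full candidate list.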
-
Scientific Claim Verification with VERT5ERINI
Authors:
Ronak Pradeep,
Xueguang Ma,
Rodrigo Nogueira,
Jimmy Lin
Abstract:
This work describes the adaptation of a pretrained sequence-to-sequence model to the task of scientific claim verification in the biomedical domain. We propose VERT5ERINI that exploits T5 for abstract retrieval, sentence selection and label prediction, which are three critical sub-tasks of claim verification. We evaluate our pipeline on SCIFACT, a newly curated dataset that requires models to not just predict the veracity of claims but also provide relevant sentences from a corpus of scientific literature that support this decision. Empirically, our pipeline outperforms a strong baseline in each of the three steps. Finally, we show VERT5ERINI's ability to generalize to two new datasets of COVID-19 claims using evidence from the ever-expanding CORD-19 corpus.
Submitted 22 October, 2020;
originally announced October 2020.
-
Covidex: Neural Ranking Models and Keyword Search Infrastructure for the COVID-19 Open Research Dataset
Authors:
Edwin Zhang,
Nikhil Gupta,
Raphael Tang,
Xiao Han,
Ronak Pradeep,
Kuang Lu,
Yue Zhang,
Rodrigo Nogueira,
Kyunghyun Cho,
Hui Fang,
Jimmy Lin
Abstract:
We present Covidex, a search engine that exploits the latest neural ranking models to provide information access to the COVID-19 Open Research Dataset curated by the Allen Institute for AI. Our system has been online and serving users since late March 2020. The Covidex is the user application component of our three-pronged strategy to develop technologies for helping domain experts tackle the ongoing global pandemic. In addition, we provide robust and easy-to-use keyword search infrastructure that exploits mature fusion-based methods as well as standalone neural ranking models that can be incorporated into other applications. These techniques have been evaluated in the ongoing TREC-COVID challenge: Our infrastructure and baselines have been adopted by many participants, including some of the highest-scoring runs in rounds 1, 2, and 3. In round 3, we report the highest-scoring run that takes advantage of previous training data and the second-highest fully automatic run.
Submitted 14 July, 2020;
originally announced July 2020.
-
FPGA based Agile Algorithm-On-Demand Co-Processor
Authors:
R. Pradeep,
S. Vinay,
Sanjay Burman,
V. Kamakoti
Abstract:
With the growing computational needs of many real-world applications, frequently changing specifications of standards, and the high design and NRE costs of ASICs, an algorithm-agile FPGA-based co-processor has become a viable alternative. In this article, we report on the general design of an algorithm-agile co-processor and its proof-of-concept implementation.
Submitted 25 October, 2007;
originally announced October 2007.