
Showing 1–10 of 10 results for author: Cole, J R

  1. arXiv:2406.13121

    cs.CL cs.AI cs.IR

    Can Long-Context Language Models Subsume Retrieval, RAG, SQL, and More?

    Authors: Jinhyuk Lee, Anthony Chen, Zhuyun Dai, Dheeru Dua, Devendra Singh Sachan, Michael Boratko, Yi Luan, Sébastien M. R. Arnold, Vincent Perot, Siddharth Dalmia, Hexiang Hu, Xudong Lin, Panupong Pasupat, Aida Amini, Jeremy R. Cole, Sebastian Riedel, Iftekhar Naim, Ming-Wei Chang, Kelvin Guu

    Abstract: Long-context language models (LCLMs) have the potential to revolutionize our approach to tasks traditionally reliant on external tools like retrieval systems or databases. Leveraging LCLMs' ability to natively ingest and process entire corpora of information offers numerous advantages. It enhances user-friendliness by eliminating the need for specialized knowledge of tools, provides robust end-to-…

    Submitted 18 June, 2024; originally announced June 2024.

    Comments: 29 pages. Dataset available at https://github.com/google-deepmind/loft

  2. arXiv:2403.20327

    cs.CL cs.AI

    Gecko: Versatile Text Embeddings Distilled from Large Language Models

    Authors: Jinhyuk Lee, Zhuyun Dai, Xiaoqi Ren, Blair Chen, Daniel Cer, Jeremy R. Cole, Kai Hui, Michael Boratko, Rajvi Kapadia, Wen Ding, Yi Luan, Sai Meher Karthik Duddu, Gustavo Hernandez Abrego, Weiqiang Shi, Nithi Gupta, Aditya Kusupati, Prateek Jain, Siddhartha Reddy Jonnalagadda, Ming-Wei Chang, Iftekhar Naim

    Abstract: We present Gecko, a compact and versatile text embedding model. Gecko achieves strong retrieval performance by leveraging a key idea: distilling knowledge from large language models (LLMs) into a retriever. Our two-step distillation process begins with generating diverse, synthetic paired data using an LLM. Next, we further refine the data quality by retrieving a set of candidate passages for each…

    Submitted 29 March, 2024; originally announced March 2024.

    Comments: 18 pages

  3. arXiv:2305.14613

    cs.CL cs.AI

    Selectively Answering Ambiguous Questions

    Authors: Jeremy R. Cole, Michael J. Q. Zhang, Daniel Gillick, Julian Martin Eisenschlos, Bhuwan Dhingra, Jacob Eisenstein

    Abstract: Trustworthy language models should abstain from answering questions when they do not know the answer. However, the answer to a question can be unknown for a variety of reasons. Prior research has focused on the case in which the question is clear and the answer is unambiguous but possibly unknown, but the answer to a question can also be unclear due to uncertainty of the questioner's intent or con…

    Submitted 14 November, 2023; v1 submitted 23 May, 2023; originally announced May 2023.

    Comments: To appear in EMNLP 2023. 9 pages, 5 figures, 2 pages of appendix

  4. arXiv:2305.14499

    cs.CL cs.IR

    NAIL: Lexical Retrieval Indices with Efficient Non-Autoregressive Decoders

    Authors: Livio Baldini Soares, Daniel Gillick, Jeremy R. Cole, Tom Kwiatkowski

    Abstract: Neural document rerankers are extremely effective in terms of accuracy. However, the best models require dedicated hardware for serving, which is costly and often not feasible. To avoid this serving-time requirement, we present a method of capturing up to 86% of the gains of a Transformer cross-attention model with a lexicalized scoring function that only requires 10^-6% of the Transformer's FLOPs…

    Submitted 23 October, 2023; v1 submitted 23 May, 2023; originally announced May 2023.

    Comments: To appear at EMNLP 2023

  5. arXiv:2303.12860

    cs.CL cs.AI

    Salient Span Masking for Temporal Understanding

    Authors: Jeremy R. Cole, Aditi Chaudhary, Bhuwan Dhingra, Partha Talukdar

    Abstract: Salient Span Masking (SSM) has shown itself to be an effective strategy to improve closed-book question answering performance. SSM extends general masked language model pretraining by creating additional unsupervised training sentences that mask a single entity or date span, thus oversampling factual information. Despite the success of this paradigm, the span types and sampling strategies are rela…

    Submitted 22 March, 2023; originally announced March 2023.

    Comments: 5 pages, 1 figure. To appear in EACL 2023

  6. arXiv:2303.00242

    cs.CL

    DIFFQG: Generating Questions to Summarize Factual Changes

    Authors: Jeremy R. Cole, Palak Jain, Julian Martin Eisenschlos, Michael J. Q. Zhang, Eunsol Choi, Bhuwan Dhingra

    Abstract: Identifying the difference between two versions of the same article is useful to update knowledge bases and to understand how articles evolve. Paired texts occur naturally in diverse situations: reporters write similar news stories and maintainers of authoritative websites must keep their information up to date. We propose representing factual changes between paired documents as question-answer pa…

    Submitted 1 March, 2023; originally announced March 2023.

    Comments: 14 pages. Accepted at EACL 2023 (main, long)

  7. arXiv:2209.12786

    cs.CL cs.AI

    Do ever larger octopi still amplify reporting biases? Evidence from judgments of typical colour

    Authors: Fangyu Liu, Julian Martin Eisenschlos, Jeremy R. Cole, Nigel Collier

    Abstract: Language models (LMs) trained on raw texts have no direct access to the physical world. Gordon and Van Durme (2013) point out that LMs can thus suffer from reporting bias: texts rarely report on common facts, instead focusing on the unusual aspects of a situation. If LMs are only trained on text corpora and naively memorise local co-occurrence statistics, they thus naturally would learn a biased v…

    Submitted 26 September, 2022; originally announced September 2022.

    Comments: AACL 2022

  8. arXiv:2209.12153

    cs.CL cs.AI

    WinoDict: Probing language models for in-context word acquisition

    Authors: Julian Martin Eisenschlos, Jeremy R. Cole, Fangyu Liu, William W. Cohen

    Abstract: We introduce a new in-context learning paradigm to measure Large Language Models' (LLMs) ability to learn novel words during inference. In particular, we rewrite Winograd-style co-reference resolution problems by replacing the key concept word with a synthetic but plausible word that the model must understand to complete the task. Solving this task requires the model to make use of the dictionary…

    Submitted 25 September, 2022; originally announced September 2022.

  9. arXiv:2109.04587

    cs.CL cs.AI

    Graph-Based Decoding for Task Oriented Semantic Parsing

    Authors: Jeremy R. Cole, Nanjiang Jiang, Panupong Pasupat, Luheng He, Peter Shaw

    Abstract: The dominant paradigm for semantic parsing in recent years is to formulate parsing as a sequence-to-sequence task, generating predictions with auto-regressive sequence decoders. In this work, we explore an alternative paradigm. We formulate semantic parsing as a dependency parsing task, applying graph-based decoding techniques developed for syntactic parsing. We compare various decoding techniques…

    Submitted 9 September, 2021; originally announced September 2021.

    Comments: To appear in EMNLP. 5 pages, 4 figures

  10. Time-Aware Language Models as Temporal Knowledge Bases

    Authors: Bhuwan Dhingra, Jeremy R. Cole, Julian Martin Eisenschlos, Daniel Gillick, Jacob Eisenstein, William W. Cohen

    Abstract: Many facts come with an expiration date, from the name of the President to the basketball team Lebron James plays for. But language models (LMs) are trained on snapshots of data collected at a specific moment in time, and this can limit their utility, especially in the closed-book setting where the pretraining corpus must contain the facts the model should memorize. We introduce a diagnostic datas…

    Submitted 23 April, 2022; v1 submitted 29 June, 2021; originally announced June 2021.

    Comments: Version accepted to TACL

    Journal ref: Transactions of the Association for Computational Linguistics 2022; 10 257-273