Skip to main content

Showing 1–17 of 17 results for author: Soares, L B

.
  1. arXiv:2407.00361  [pdf, other

    cs.CL cs.AI

    From RAG to RICHES: Retrieval Interlaced with Sequence Generation

    Authors: Palak Jain, Livio Baldini Soares, Tom Kwiatkowski

    Abstract: We present RICHES, a novel approach that interleaves retrieval with sequence generation tasks. RICHES offers an alternative to conventional RAG systems by eliminating the need for separate retriever and generator. It retrieves documents by directly decoding their contents, constrained on the corpus. Unifying retrieval with generation allows us to adapt to diverse new tasks via prompting alone. RIC… ▽ More

    Submitted 29 June, 2024; originally announced July 2024.

    Comments: 18 pages, 3 figures, Preprint

  2. arXiv:2403.05530  [pdf, other

    cs.CL cs.AI

    Gemini 1.5: Unlocking multimodal understanding across millions of tokens of context

    Authors: Gemini Team, Petko Georgiev, Ving Ian Lei, Ryan Burnell, Libin Bai, Anmol Gulati, Garrett Tanzer, Damien Vincent, Zhufeng Pan, Shibo Wang, Soroosh Mariooryad, Yifan Ding, Xinyang Geng, Fred Alcober, Roy Frostig, Mark Omernick, Lexi Walker, Cosmin Paduraru, Christina Sorokin, Andrea Tacchetti, Colin Gaffney, Samira Daruki, Olcan Sercinoglu, Zach Gleicher, Juliette Love , et al. (1092 additional authors not shown)

    Abstract: In this report, we introduce the Gemini 1.5 family of models, representing the next generation of highly compute-efficient multimodal models capable of recalling and reasoning over fine-grained information from millions of tokens of context, including multiple long documents and hours of video and audio. The family includes two new models: (1) an updated Gemini 1.5 Pro, which exceeds the February… ▽ More

    Submitted 14 June, 2024; v1 submitted 8 March, 2024; originally announced March 2024.

  3. arXiv:2312.11805  [pdf, other

    cs.CL cs.AI cs.CV

    Gemini: A Family of Highly Capable Multimodal Models

    Authors: Gemini Team, Rohan Anil, Sebastian Borgeaud, Jean-Baptiste Alayrac, Jiahui Yu, Radu Soricut, Johan Schalkwyk, Andrew M. Dai, Anja Hauth, Katie Millican, David Silver, Melvin Johnson, Ioannis Antonoglou, Julian Schrittwieser, Amelia Glaese, Jilin Chen, Emily Pitler, Timothy Lillicrap, Angeliki Lazaridou, Orhan Firat, James Molloy, Michael Isard, Paul R. Barham, Tom Hennigan, Benjamin Lee , et al. (1325 additional authors not shown)

    Abstract: This report introduces a new family of multimodal models, Gemini, that exhibit remarkable capabilities across image, audio, video, and text understanding. The Gemini family consists of Ultra, Pro, and Nano sizes, suitable for applications ranging from complex reasoning tasks to on-device memory-constrained use-cases. Evaluation on a broad range of benchmarks shows that our most-capable Gemini Ultr… ▽ More

    Submitted 17 June, 2024; v1 submitted 18 December, 2023; originally announced December 2023.

  4. arXiv:2310.16568  [pdf, other

    cs.CL

    1-PAGER: One Pass Answer Generation and Evidence Retrieval

    Authors: Palak Jain, Livio Baldini Soares, Tom Kwiatkowski

    Abstract: We present 1-Pager the first system that answers a question and retrieves evidence using a single Transformer-based model and decoding process. 1-Pager incrementally partitions the retrieval corpus using constrained decoding to select a document and answer string, and we show that this is competitive with comparable retrieve-and-read alternatives according to both retrieval and answer accuracy met… ▽ More

    Submitted 25 October, 2023; originally announced October 2023.

    Comments: Accepted at EMNLP 2023 (Findings)

  5. arXiv:2310.08764  [pdf, other

    cs.CL cs.LG

    Calibrating Likelihoods towards Consistency in Summarization Models

    Authors: Polina Zablotskaia, Misha Khalman, Rishabh Joshi, Livio Baldini Soares, Shoshana Jakobovits, Joshua Maynez, Shashi Narayan

    Abstract: Despite the recent advances in abstractive text summarization, current summarization models still suffer from generating factually inconsistent summaries, reducing their utility for real-world application. We argue that the main reason for such behavior is that the summarization models trained with maximum likelihood objective assign high probability to plausible sequences given the context, but t… ▽ More

    Submitted 12 October, 2023; originally announced October 2023.

  6. arXiv:2305.14499  [pdf, other

    cs.CL cs.IR

    NAIL: Lexical Retrieval Indices with Efficient Non-Autoregressive Decoders

    Authors: Livio Baldini Soares, Daniel Gillick, Jeremy R. Cole, Tom Kwiatkowski

    Abstract: Neural document rerankers are extremely effective in terms of accuracy. However, the best models require dedicated hardware for serving, which is costly and often not feasible. To avoid this serving-time requirement, we present a method of capturing up to 86% of the gains of a Transformer cross-attention model with a lexicalized scoring function that only requires 10-6% of the Transformer's FLOPs… ▽ More

    Submitted 23 October, 2023; v1 submitted 23 May, 2023; originally announced May 2023.

    Comments: To appear at EMNLP 2023

  7. arXiv:2305.14332  [pdf, other

    cs.CL

    Evaluating and Modeling Attribution for Cross-Lingual Question Answering

    Authors: Benjamin Muller, John Wieting, Jonathan H. Clark, Tom Kwiatkowski, Sebastian Ruder, Livio Baldini Soares, Roee Aharoni, Jonathan Herzig, Xinyi Wang

    Abstract: Trustworthy answer content is abundant in many high-resource languages and is instantly accessible through question answering systems, yet this content can be hard to access for those that do not speak these languages. The leap forward in cross-lingual modeling quality offered by generative language models offers much promise, yet their raw generations often fall short in factuality. To improve tr… ▽ More

    Submitted 15 November, 2023; v1 submitted 23 May, 2023; originally announced May 2023.

    Comments: Published as a long paper at EMNLP 2023

  8. arXiv:2212.08037  [pdf, other

    cs.CL

    Attributed Question Answering: Evaluation and Modeling for Attributed Large Language Models

    Authors: Bernd Bohnet, Vinh Q. Tran, Pat Verga, Roee Aharoni, Daniel Andor, Livio Baldini Soares, Massimiliano Ciaramita, Jacob Eisenstein, Kuzman Ganchev, Jonathan Herzig, Kai Hui, Tom Kwiatkowski, Ji Ma, Jianmo Ni, Lierni Sestorain Saralegui, Tal Schuster, William W. Cohen, Michael Collins, Dipanjan Das, Donald Metzler, Slav Petrov, Kellie Webster

    Abstract: Large language models (LLMs) have shown impressive results while requiring little or no direct supervision. Further, there is mounting evidence that LLMs may have potential in information-seeking scenarios. We believe the ability of an LLM to attribute the text that it generates is likely to be crucial in this setting. We formulate and study Attributed QA as a key first step in the development of… ▽ More

    Submitted 10 February, 2023; v1 submitted 15 December, 2022; originally announced December 2022.

  9. arXiv:2203.17189  [pdf, other

    cs.LG cs.CL

    Scaling Up Models and Data with $\texttt{t5x}$ and $\texttt{seqio}$

    Authors: Adam Roberts, Hyung Won Chung, Anselm Levskaya, Gaurav Mishra, James Bradbury, Daniel Andor, Sharan Narang, Brian Lester, Colin Gaffney, Afroz Mohiuddin, Curtis Hawthorne, Aitor Lewkowycz, Alex Salcianu, Marc van Zee, Jacob Austin, Sebastian Goodman, Livio Baldini Soares, Haitang Hu, Sasha Tsvyashchenko, Aakanksha Chowdhery, Jasmijn Bastings, Jannis Bulian, Xavier Garcia, Jianmo Ni, Andrew Chen , et al. (18 additional authors not shown)

    Abstract: Recent neural network-based language models have benefited greatly from scaling up the size of training datasets and the number of parameters in the models themselves. Scaling can be complicated due to various factors including the need to distribute computation on supercomputer clusters (e.g., TPUs), prevent bottlenecks when infeeding data, and ensure reproducible results. In this work, we presen… ▽ More

    Submitted 31 March, 2022; originally announced March 2022.

  10. arXiv:2012.00893  [pdf, other

    cs.CL cs.LG

    Evaluating Explanations: How much do explanations from the teacher aid students?

    Authors: Danish Pruthi, Rachit Bansal, Bhuwan Dhingra, Livio Baldini Soares, Michael Collins, Zachary C. Lipton, Graham Neubig, William W. Cohen

    Abstract: While many methods purport to explain predictions by highlighting salient features, what aims these explanations serve and how they ought to be evaluated often go unstated. In this work, we introduce a framework to quantify the value of explanations via the accuracy gains that they confer on a student model trained to simulate a teacher model. Crucially, the explanations are available to the stude… ▽ More

    Submitted 16 December, 2021; v1 submitted 1 December, 2020; originally announced December 2020.

    Comments: TACL 2021 (pre-MIT Press publication version)

  11. arXiv:2009.06354  [pdf, other

    cs.CL cs.AI

    QED: A Framework and Dataset for Explanations in Question Answering

    Authors: Matthew Lamm, Jennimaria Palomaki, Chris Alberti, Daniel Andor, Eunsol Choi, Livio Baldini Soares, Michael Collins

    Abstract: A question answering system that in addition to providing an answer provides an explanation of the reasoning that leads to that answer has potential advantages in terms of debuggability, extensibility and trust. To this end, we propose QED, a linguistically informed, extensible framework for explanations in question answering. A QED explanation specifies the relationship between a question and ans… ▽ More

    Submitted 8 September, 2020; originally announced September 2020.

  12. arXiv:2007.00849  [pdf, other

    cs.CL cs.AI cs.LG

    Facts as Experts: Adaptable and Interpretable Neural Memory over Symbolic Knowledge

    Authors: Pat Verga, Haitian Sun, Livio Baldini Soares, William W. Cohen

    Abstract: Massive language models are the core of modern NLP modeling and have been shown to encode impressive amounts of commonsense and factual information. However, that knowledge exists only within the latent parameters of the model, inaccessible to inspection and interpretation, and even worse, factual information memorized from the training corpora is likely to become stale as the world changes. Knowl… ▽ More

    Submitted 1 July, 2020; originally announced July 2020.

  13. arXiv:2005.14253  [pdf, ps, other

    cs.CL cs.LG

    Empirical Evaluation of Pretraining Strategies for Supervised Entity Linking

    Authors: Thibault Févry, Nicholas FitzGerald, Livio Baldini Soares, Tom Kwiatkowski

    Abstract: In this work, we present an entity linking model which combines a Transformer architecture with large scale pretraining from Wikipedia links. Our model achieves the state-of-the-art on two commonly used entity linking datasets: 96.7% on CoNLL and 94.9% on TAC-KBP. We present detailed analyses to understand what design choices are important for entity linking, including choices of negative entity c… ▽ More

    Submitted 28 May, 2020; originally announced May 2020.

    Comments: 11 pages, 8 figures, appearing at AKBC 2020

  14. arXiv:2004.11997  [pdf, other

    cs.CL

    New Protocols and Negative Results for Textual Entailment Data Collection

    Authors: Samuel R. Bowman, Jennimaria Palomaki, Livio Baldini Soares, Emily Pitler

    Abstract: Natural language inference (NLI) data has proven useful in benchmarking and, especially, as pretraining data for tasks requiring language understanding. However, the crowdsourcing protocol that was used to collect this data has known issues and was not explicitly optimized for either of these purposes, so it is likely far from ideal. We propose four alternative protocols, each aimed at improving e… ▽ More

    Submitted 29 September, 2020; v1 submitted 24 April, 2020; originally announced April 2020.

    Comments: To appear at EMNLP 2020

  15. arXiv:2004.07202  [pdf, other

    cs.CL cs.LG

    Entities as Experts: Sparse Memory Access with Entity Supervision

    Authors: Thibault Févry, Livio Baldini Soares, Nicholas FitzGerald, Eunsol Choi, Tom Kwiatkowski

    Abstract: We focus on the problem of capturing declarative knowledge about entities in the learned parameters of a language model. We introduce a new model - Entities as Experts (EAE) - that can access distinct memories of the entities mentioned in a piece of text. Unlike previous efforts to integrate entity knowledge into sequence models, EAE's entity representations are learned directly from text. We show… ▽ More

    Submitted 6 October, 2020; v1 submitted 15 April, 2020; originally announced April 2020.

  16. arXiv:2001.03765  [pdf, other

    cs.CL

    Learning Cross-Context Entity Representations from Text

    Authors: Jeffrey Ling, Nicholas FitzGerald, Zifei Shan, Livio Baldini Soares, Thibault Févry, David Weiss, Tom Kwiatkowski

    Abstract: Language modeling tasks, in which words, or word-pieces, are predicted on the basis of a local context, have been very effective for learning word embeddings and context dependent representations of phrases. Motivated by the observation that efforts to code world knowledge into machine readable knowledge bases or human readable encyclopedias tend to be entity-centric, we investigate the use of a f… ▽ More

    Submitted 11 January, 2020; originally announced January 2020.

  17. arXiv:1906.03158  [pdf, other

    cs.CL cs.AI

    Matching the Blanks: Distributional Similarity for Relation Learning

    Authors: Livio Baldini Soares, Nicholas FitzGerald, Jeffrey Ling, Tom Kwiatkowski

    Abstract: General purpose relation extractors, which can model arbitrary relations, are a core aspiration in information extraction. Efforts have been made to build general purpose extractors that represent relations with their surface forms, or which jointly embed surface forms with relations from an existing knowledge graph. However, both of these approaches are limited in their ability to generalize. In… ▽ More

    Submitted 7 June, 2019; originally announced June 2019.

    Comments: To appear at ACL 2019