Showing 1–16 of 16 results for author: Ciaramita, M

  1. arXiv:2310.02932  [pdf, other]

    cs.CL cs.AI cs.CY cs.LG

    Assessing Large Language Models on Climate Information

    Authors: Jannis Bulian, Mike S. Schäfer, Afra Amini, Heidi Lam, Massimiliano Ciaramita, Ben Gaiarin, Michelle Chen Hübscher, Christian Buck, Niels G. Mede, Markus Leippold, Nadine Strauß

    Abstract: As Large Language Models (LLMs) rise in popularity, it is necessary to assess their capability in critically relevant domains. We present a comprehensive evaluation framework, grounded in science communication research, to assess LLM responses to questions about climate change. Our framework emphasizes both presentational and epistemological adequacy, offering a fine-grained analysis of LLM genera…

    Submitted 28 May, 2024; v1 submitted 4 October, 2023; originally announced October 2023.

    Journal ref: Proceedings of the 41st International Conference on Machine Learning (ICML), 2024

  2. arXiv:2305.14171  [pdf, other]

    cs.CL

    In-Context Probing: Toward Building Robust Classifiers via Probing Large Language Models

    Authors: Afra Amini, Massimiliano Ciaramita

    Abstract: Large language models are able to learn new tasks in context, where they are provided with instructions and a few annotated examples. However, the effectiveness of in-context learning is dependent on the provided context, and the performance on a downstream task can vary considerably, depending on the instruction. Importantly, such dependency on the context can surface in unpredictable ways, e.g.,…

    Submitted 22 December, 2023; v1 submitted 23 May, 2023; originally announced May 2023.

  3. arXiv:2212.08037  [pdf, other]

    cs.CL

    Attributed Question Answering: Evaluation and Modeling for Attributed Large Language Models

    Authors: Bernd Bohnet, Vinh Q. Tran, Pat Verga, Roee Aharoni, Daniel Andor, Livio Baldini Soares, Massimiliano Ciaramita, Jacob Eisenstein, Kuzman Ganchev, Jonathan Herzig, Kai Hui, Tom Kwiatkowski, Ji Ma, Jianmo Ni, Lierni Sestorain Saralegui, Tal Schuster, William W. Cohen, Michael Collins, Dipanjan Das, Donald Metzler, Slav Petrov, Kellie Webster

    Abstract: Large language models (LLMs) have shown impressive results while requiring little or no direct supervision. Further, there is mounting evidence that LLMs may have potential in information-seeking scenarios. We believe the ability of an LLM to attribute the text that it generates is likely to be crucial in this setting. We formulate and study Attributed QA as a key first step in the development of…

    Submitted 10 February, 2023; v1 submitted 15 December, 2022; originally announced December 2022.

  4. arXiv:2210.12084  [pdf, other]

    cs.CL cs.AI cs.LG

    Decoding a Neural Retriever's Latent Space for Query Suggestion

    Authors: Leonard Adolphs, Michelle Chen Huebscher, Christian Buck, Sertan Girgin, Olivier Bachem, Massimiliano Ciaramita, Thomas Hofmann

    Abstract: Neural retrieval models have superseded classic bag-of-words methods such as BM25 as the retrieval framework of choice. However, neural systems lack the interpretability of bag-of-words models; it is not trivial to connect a query change to a change in the latent space that ultimately determines the retrieval results. To shed light on this embedding space, we learn a "query decoder" that, given a…

    Submitted 21 October, 2022; originally announced October 2022.

  5. arXiv:2209.15469  [pdf, other]

    cs.CL

    Zero-Shot Retrieval with Search Agents and Hybrid Environments

    Authors: Michelle Chen Huebscher, Christian Buck, Massimiliano Ciaramita, Sascha Rothe

    Abstract: Learning to search is the task of building artificial agents that learn to autonomously use a search box to find information. So far, it has been shown that current language models can learn symbolic query reformulation policies, in combination with traditional term-based retrieval, but fall short of outperforming neural retrievers. We extend the previous learning to search setup to a hybrid envir…

    Submitted 29 March, 2023; v1 submitted 30 September, 2022; originally announced September 2022.

  6. arXiv:2109.00527  [pdf, other]

    cs.CL cs.AI cs.IR cs.LG

    Boosting Search Engines with Interactive Agents

    Authors: Leonard Adolphs, Benjamin Boerschinger, Christian Buck, Michelle Chen Huebscher, Massimiliano Ciaramita, Lasse Espeholt, Thomas Hofmann, Yannic Kilcher, Sascha Rothe, Pier Giuseppe Sessa, Lierni Sestorain Saralegui

    Abstract: This paper presents first successful steps in designing search agents that learn meta-strategies for iterative query refinement in information-seeking tasks. Our approach uses machine reading to guide the selection of refinement terms from aggregated search results. Agents are then empowered with simple but effective search operators to exert fine-grained and transparent control over queries and s…

    Submitted 7 June, 2022; v1 submitted 1 September, 2021; originally announced September 2021.

    Comments: Published in Transactions on Machine Learning Research (06/2022)

  7. arXiv:2012.00614  [pdf, other]

    cs.CL cs.AI

    CLIMATE-FEVER: A Dataset for Verification of Real-World Climate Claims

    Authors: Thomas Diggelmann, Jordan Boyd-Graber, Jannis Bulian, Massimiliano Ciaramita, Markus Leippold

    Abstract: We introduce CLIMATE-FEVER, a new publicly available dataset for verification of climate change-related claims. By providing a dataset for the research community, we aim to facilitate and encourage work on improving algorithms for retrieving evidential support for climate-specific claims, addressing the underlying language understanding challenges, and ultimately help alleviate the impact of misin…

    Submitted 2 January, 2021; v1 submitted 1 December, 2020; originally announced December 2020.

    Comments: Accepted for the Tackling Climate Change with Machine Learning Workshop at NeurIPS 2020

  8. arXiv:2012.00483  [pdf, other]

    cs.CL cs.AI

    ClimaText: A Dataset for Climate Change Topic Detection

    Authors: Francesco S. Varini, Jordan Boyd-Graber, Massimiliano Ciaramita, Markus Leippold

    Abstract: Climate change communication in the mass media and other textual sources may affect and shape public perception. Extracting climate change information from these sources is an important task, e.g., for filtering content and e-discovery, sentiment analysis, automatic summarization, question-answering, and fact-checking. However, automating this process is a challenge, as climate change is a complex…

    Submitted 2 January, 2021; v1 submitted 1 December, 2020; originally announced December 2020.

    Comments: Accepted for the Tackling Climate Change with Machine Learning Workshop at NeurIPS 2020

  9. arXiv:1911.04156  [pdf, other]

    cs.CL cs.AI

    Meta Answering for Machine Reading

    Authors: Benjamin Borschinger, Jordan Boyd-Graber, Christian Buck, Jannis Bulian, Massimiliano Ciaramita, Michelle Chen Huebscher, Wojciech Gajewski, Yannic Kilcher, Rodrigo Nogueira, Lierni Sestorain Saralegui

    Abstract: We investigate a framework for machine reading, inspired by real world information-seeking problems, where a meta question answering system interacts with a black box environment. The environment encapsulates a competitive machine reader based on BERT, providing candidate answers to questions, and possibly some context. To validate the realism of our formulation, we ask humans to play the role of…

    Submitted 30 April, 2020; v1 submitted 11 November, 2019; originally announced November 2019.

  10. arXiv:1908.04211  [pdf, other]

    cs.CL cs.LG

    On Identifiability in Transformers

    Authors: Gino Brunner, Yang Liu, Damián Pascual, Oliver Richter, Massimiliano Ciaramita, Roger Wattenhofer

    Abstract: In this paper we delve deep in the Transformer architecture by investigating two of its core components: self-attention and contextual embeddings. In particular, we study the identifiability of attention weights and token embeddings, and the aggregation of context into hidden tokens. We show that, for sequences longer than the attention head dimension, attention weights are not identifiable. We pr…

    Submitted 7 February, 2020; v1 submitted 12 August, 2019; originally announced August 2019.

    Comments: Published as a conference paper at ICLR 2020

    MSC Class: I.2.7; I.7.0

    ACM Class: I.2.7; I.7.0

  11. arXiv:1809.10658  [pdf, other]

    cs.LG stat.ML

    Learning to Coordinate Multiple Reinforcement Learning Agents for Diverse Query Reformulation

    Authors: Rodrigo Nogueira, Jannis Bulian, Massimiliano Ciaramita

    Abstract: We propose a method to efficiently learn diverse strategies in reinforcement learning for query reformulation in the tasks of document retrieval and question answering. In the proposed framework an agent consists of multiple specialized sub-agents and a meta-agent that learns to aggregate the answers from sub-agents to produce a final answer. Sub-agents are trained on disjoint partitions of the tr…

    Submitted 25 December, 2018; v1 submitted 27 September, 2018; originally announced September 2018.

  12. arXiv:1805.10338  [pdf, other]

    cs.CL cs.NE

    Zero-Shot Dual Machine Translation

    Authors: Lierni Sestorain, Massimiliano Ciaramita, Christian Buck, Thomas Hofmann

    Abstract: Neural Machine Translation (NMT) systems rely on large amounts of parallel data. This is a major challenge for low-resource languages. Building on recent work on unsupervised and semi-supervised methods, we present an approach that combines zero-shot and dual learning. The latter relies on reinforcement learning, to exploit the duality of the machine translation task, and requires only monolingual…

    Submitted 25 May, 2018; originally announced May 2018.

  13. arXiv:1801.07537  [pdf, other]

    cs.CL cs.AI

    Analyzing Language Learned by an Active Question Answering Agent

    Authors: Christian Buck, Jannis Bulian, Massimiliano Ciaramita, Wojciech Gajewski, Andrea Gesmundo, Neil Houlsby, Wei Wang

    Abstract: We analyze the language learned by an agent trained with reinforcement learning as a component of the ActiveQA system [Buck et al., 2017]. In ActiveQA, question answering is framed as a reinforcement learning task in which an agent sits between the user and a black box question-answering system. The agent learns to reformulate the user's questions to elicit the optimal answers. It probes the syste…

    Submitted 23 January, 2018; originally announced January 2018.

    Comments: Emergent Communication Workshop, NIPS 2017

  14. arXiv:1705.07830  [pdf, other]

    cs.CL cs.AI

    Ask the Right Questions: Active Question Reformulation with Reinforcement Learning

    Authors: Christian Buck, Jannis Bulian, Massimiliano Ciaramita, Wojciech Gajewski, Andrea Gesmundo, Neil Houlsby, Wei Wang

    Abstract: We frame Question Answering (QA) as a Reinforcement Learning task, an approach that we call Active Question Answering. We propose an agent that sits between the user and a black box QA system and learns to reformulate questions to elicit the best possible answers. The agent probes the system with, potentially many, natural language reformulations of an initial question and aggregates the returned…

    Submitted 2 March, 2018; v1 submitted 22 May, 2017; originally announced May 2017.

    Journal ref: Sixth International Conference on Learning Representations (ICLR), 2018

  15. arXiv:1309.0337  [pdf, other]

    stat.ML cs.IR cs.LG

    Scalable Probabilistic Entity-Topic Modeling

    Authors: Neil Houlsby, Massimiliano Ciaramita

    Abstract: We present an LDA approach to entity disambiguation. Each topic is associated with a Wikipedia article and topics generate either content words or entity mentions. Training such models is challenging because of the topic and vocabulary size, both in the millions. We tackle these problems using a novel distributed inference and representation framework based on a parallel Gibbs sampler guided by th…

    Submitted 2 September, 2013; originally announced September 2013.

  16. arXiv:cs/0008020  [pdf, ps, other]

    cs.CL cs.AI

    Explaining away ambiguity: Learning verb selectional preference with Bayesian networks

    Authors: Massimiliano Ciaramita, Mark Johnson

    Abstract: This paper presents a Bayesian model for unsupervised learning of verb selectional preferences. For each verb the model creates a Bayesian network whose architecture is determined by the lexical hierarchy of Wordnet and whose parameters are estimated from a list of verb-object pairs found from a corpus. "Explaining away", a well-known property of Bayesian networks, helps the model deal in a na…

    Submitted 22 August, 2000; originally announced August 2000.

    ACM Class: I.2.7

    Journal ref: Proceedings of the 18th International Conference on Computational Linguistics, Saarbrucken, Germany, Vol.1, 2000, p.187