
Showing 1–50 of 64 results for author: Minervini, P

  1. arXiv:2406.14425

    cs.CL cs.AI cs.LG

    SynDARin: Synthesising Datasets for Automated Reasoning in Low-Resource Languages

    Authors: Gayane Ghazaryan, Erik Arakelyan, Pasquale Minervini, Isabelle Augenstein

    Abstract: Question Answering (QA) datasets have been instrumental in developing and evaluating Large Language Model (LLM) capabilities. However, such datasets are scarce for languages other than English due to the cost and difficulties of collection and manual annotation. This means that producing novel models and measuring the performance of multilingual LLMs in low-resource languages is challenging. To mi…

    Submitted 25 June, 2024; v1 submitted 20 June, 2024; originally announced June 2024.

  2. arXiv:2406.13229

    cs.CL cs.AI cs.LG

    Probing the Emergence of Cross-lingual Alignment during LLM Training

    Authors: Hetong Wang, Pasquale Minervini, Edoardo M. Ponti

    Abstract: Multilingual Large Language Models (LLMs) achieve remarkable levels of zero-shot cross-lingual transfer performance. We speculate that this is predicated on their ability to align languages without explicit supervision from parallel sentences. While representations of translationally equivalent sentences in different languages are known to be similar after convergence, it remains unclear…

    Submitted 19 June, 2024; originally announced June 2024.

    Comments: Accepted to Findings of the Association for Computational Linguistics: ACL 2024

  3. arXiv:2406.11430

    cs.CL cs.AI

    A Simple and Effective $L_2$ Norm-Based Strategy for KV Cache Compression

    Authors: Alessio Devoto, Yu Zhao, Simone Scardapane, Pasquale Minervini

    Abstract: The deployment of large language models (LLMs) is often hindered by the extensive memory requirements of the Key-Value (KV) cache, especially as context lengths increase. Existing approaches to reduce the KV cache size involve either fine-tuning the model to learn a compression strategy or leveraging attention scores to reduce the sequence length. We analyse the attention distributions in decoder-…

    Submitted 17 June, 2024; originally announced June 2024.
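
    The following Python sketch estimates why the KV cache dominates memory at long context lengths. The configuration (32 layers, 32 key/value heads, head dimension 128, 2-byte fp16 values) is a hypothetical example rather than a model from the paper. The function only computes cache size and does not implement the paper's L2 norm-based compression.

```python
def kv_cache_bytes(num_layers: int, num_kv_heads: int, head_dim: int,
                   seq_len: int, batch_size: int = 1,
                   bytes_per_value: int = 2) -> int:
    """Bytes held by a decoder-only transformer's KV cache.

    Every layer caches one key vector and one value vector (hence the factor
    of 2) per key/value head for every token in every sequence of the batch.
    bytes_per_value=2 assumes fp16/bf16 storage.
    """
    return (2 * num_layers * num_kv_heads * head_dim
            * seq_len * batch_size * bytes_per_value)


# Hypothetical 7B-class configuration, used only for illustration.
for context in (4_096, 32_768):
    gib = kv_cache_bytes(num_layers=32, num_kv_heads=32, head_dim=128,
                         seq_len=context) / 2**30
    print(f"{context:>6} tokens -> {gib:.1f} GiB")
```

    Under these assumptions each cached token costs 0.5 MiB. A single sequence therefore needs 2.0 GiB of cache at 4,096 tokens and 16.0 GiB at 32,768 tokens, growing linearly with context length.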

  4. arXiv:2406.04127

    cs.CL cs.AI

    Are We Done with MMLU?

    Authors: Aryo Pradipta Gema, Joshua Ong Jun Leang, Giwon Hong, Alessio Devoto, Alberto Carlo Maria Mancino, Rohit Saxena, Xuanli He, Yu Zhao, Xiaotang Du, Mohammad Reza Ghasemi Madani, Claire Barale, Robert McHardy, Joshua Harris, Jean Kaddour, Emile van Krieken, Pasquale Minervini

    Abstract: Maybe not. We identify and analyse errors in the popular Massive Multitask Language Understanding (MMLU) benchmark. Even though MMLU is widely adopted, our analysis demonstrates numerous ground truth errors that obscure the true capabilities of LLMs. For example, we find that 57% of the analysed questions in the Virology subset contain errors. To address this issue, we introduce a comprehensive fr…

    Submitted 7 June, 2024; v1 submitted 6 June, 2024; originally announced June 2024.

  5. arXiv:2405.18028

    cs.CL cs.AI

    Edinburgh Clinical NLP at MEDIQA-CORR 2024: Guiding Large Language Models with Hints

    Authors: Aryo Pradipta Gema, Chaeeun Lee, Pasquale Minervini, Luke Daines, T. Ian Simpson, Beatrice Alex

    Abstract: The MEDIQA-CORR 2024 shared task aims to assess the ability of Large Language Models (LLMs) to identify and correct medical errors in clinical notes. In this study, we evaluate the capability of general LLMs, specifically GPT-3.5 and GPT-4, to identify and correct medical errors with multiple prompting strategies. Recognising the limitation of LLMs in generating accurate corrections only via promp…

    Submitted 28 May, 2024; originally announced May 2024.

  6. arXiv:2405.15984

    cs.CL cs.AI

    Evaluating the Adversarial Robustness of Retrieval-Based In-Context Learning for Large Language Models

    Authors: Simon Chi Lok Yu, Jie He, Pasquale Minervini, Jeff Z. Pan

    Abstract: With the emergence of large language models, such as LLaMA and OpenAI GPT-3, In-Context Learning (ICL) gained significant attention due to its effectiveness and efficiency. However, ICL is very sensitive to the choice, order, and verbaliser used to encode the demonstrations in the prompt. Retrieval-Augmented ICL methods try to address this problem by leveraging retrievers to extract semantically r…

    Submitted 24 May, 2024; originally announced May 2024.

    Comments: 29 pages, 6 figures

  7. arXiv:2404.19597

    cs.CL cs.CR

    Transferring Troubles: Cross-Lingual Transferability of Backdoor Attacks in LLMs with Instruction Tuning

    Authors: Xuanli He, Jun Wang, Qiongkai Xu, Pasquale Minervini, Pontus Stenetorp, Benjamin I. P. Rubinstein, Trevor Cohn

    Abstract: The implications of backdoor attacks on English-centric large language models (LLMs) have been widely examined - such attacks can be achieved by embedding malicious behaviors during training and activated under specific conditions that trigger malicious outputs. However, the impact of backdoor attacks on multilingual models remains under-explored. Our research focuses on cross-lingual backdoor att…

    Submitted 30 April, 2024; originally announced April 2024.

    Comments: work in progress

  8. arXiv:2404.16041

    cs.PL cs.AI cs.LG

    Forklift: An Extensible Neural Lifter

    Authors: Jordi Armengol-Estapé, Rodrigo C. O. Rocha, Jackson Woodruff, Pasquale Minervini, Michael F. P. O'Boyle

    Abstract: The escalating demand to migrate legacy software across different Instruction Set Architectures (ISAs) has driven the development of assembly-to-assembly translators to map between their respective assembly languages. However, the development of these tools requires substantial engineering effort. State-of-the-art approaches use lifting, a technique where source assembly code is translated to an a…

    Submitted 1 April, 2024; originally announced April 2024.

  9. arXiv:2404.08458

    stat.ML cs.AI cs.LG

    On the Independence Assumption in Neurosymbolic Learning

    Authors: Emile van Krieken, Pasquale Minervini, Edoardo M. Ponti, Antonio Vergari

    Abstract: State-of-the-art neurosymbolic learning systems use probabilistic reasoning to guide neural networks towards predictions that conform to logical constraints over symbols. Many such systems assume that the probabilities of the considered symbols are conditionally independent given the input to simplify learning and reasoning. We study and criticise this assumption, highlighting how it can hinder op…

    Submitted 7 June, 2024; v1 submitted 12 April, 2024; originally announced April 2024.

    Comments: Accepted at ICML 2024
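
    Below is a toy Python illustration of the conditional independence assumption described in this abstract; it is not taken from the paper. Each binary symbol is treated as an independent Bernoulli variable. The probability that a logical constraint holds is then a sum, over the satisfying assignments, of products of the marginals. The XOR constraint and the name `constraint_probability` are illustrative choices.

```python
from itertools import product
from typing import Callable, Sequence


def constraint_probability(probs: Sequence[float],
                           constraint: Callable[[tuple[int, ...]], bool]) -> float:
    """P(constraint holds) when symbol i is an independent Bernoulli(probs[i]).

    Under the independence assumption the joint probability of an assignment
    ("world") is the product of its per-symbol marginals.
    """
    total = 0.0
    for world in product((0, 1), repeat=len(probs)):
        if constraint(world):
            weight = 1.0
            for p, bit in zip(probs, world):
                weight *= p if bit else 1.0 - p
            total += weight
    return total


# Illustrative constraint: exactly one of two symbols is true.
def xor(world: tuple[int, ...]) -> bool:
    return world[0] != world[1]


print(constraint_probability([0.5, 0.5], xor))  # uncertain marginals
print(constraint_probability([1.0, 0.0], xor))  # deterministic marginals
```

    Under independence, P(XOR) = p1(1 - p2) + (1 - p1)p2. This reaches 1 only when one marginal is 0 and the other is 1. A distribution that splits its mass evenly between the satisfying worlds (0, 1) and (1, 0) also satisfies the constraint with probability 1, but it cannot be written as a product of marginals.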

  10. arXiv:2404.05904

    cs.CL

    The Hallucinations Leaderboard -- An Open Effort to Measure Hallucinations in Large Language Models

    Authors: Giwon Hong, Aryo Pradipta Gema, Rohit Saxena, Xiaotang Du, Ping Nie, Yu Zhao, Laura Perez-Beltrachini, Max Ryabinin, Xuanli He, Clémentine Fourrier, Pasquale Minervini

    Abstract: Large Language Models (LLMs) have transformed the Natural Language Processing (NLP) landscape with their remarkable ability to understand and generate human-like text. However, these models are prone to "hallucinations" -- outputs that do not align with factual reality or the input context. This paper introduces the Hallucinations Leaderboard, an open initiative to quantitatively measure and com…

    Submitted 17 April, 2024; v1 submitted 8 April, 2024; originally announced April 2024.

  11. arXiv:2404.00484

    cs.CL

    Edinburgh Clinical NLP at SemEval-2024 Task 2: Fine-tune your model unless you have access to GPT-4

    Authors: Aryo Pradipta Gema, Giwon Hong, Pasquale Minervini, Luke Daines, Beatrice Alex

    Abstract: The NLI4CT task assesses Natural Language Inference systems in predicting whether hypotheses entail or contradict evidence from Clinical Trial Reports. In this study, we evaluate various Large Language Models (LLMs) with multiple strategies, including Chain-of-Thought, In-Context Learning, and Parameter-Efficient Fine-Tuning (PEFT). We propose a PEFT method to improve the consistency of LLMs by me…

    Submitted 30 March, 2024; originally announced April 2024.

  12. arXiv:2403.20288

    cs.CL cs.AI

    Can LLMs Correct Physicians, Yet? Investigating Effective Interaction Methods in the Medical Domain

    Authors: Burcu Sayin, Pasquale Minervini, Jacopo Staiano, Andrea Passerini

    Abstract: We explore the potential of Large Language Models (LLMs) to assist and potentially correct physicians in medical decision-making tasks. We evaluate several LLMs, including Meditron, Llama2, and Mistral, to analyze the ability of these models to interact effectively with physicians across different scenarios. We consider questions from PubMedQA and several tasks, ranging from binary (yes/no) respon…

    Submitted 6 May, 2024; v1 submitted 29 March, 2024; originally announced March 2024.

    Comments: Accepted for oral presentation at NAACL 2024, The 6th Clinical Natural Language Processing Workshop

  13. arXiv:2403.07965

    cs.LG cs.AI

    Conditional computation in neural networks: principles and research trends

    Authors: Simone Scardapane, Alessandro Baiocchi, Alessio Devoto, Valerio Marsocci, Pasquale Minervini, Jary Pomponi

    Abstract: This article summarizes principles and ideas from the emerging area of applying conditional computation methods to the design of neural networks. In particular, we focus on neural networks that can dynamically activate or de-activate parts of their computational graph conditionally on their input. Examples include the dynamic selection of, e.g., input tokens, layers (or sets of layers), a…

    Submitted 12 March, 2024; originally announced March 2024.

    Comments: Under review at Intelligenza Artificiale (IOS Press)

  14. arXiv:2403.03230

    q-bio.NC cs.AI

    Large language models surpass human experts in predicting neuroscience results

    Authors: Xiaoliang Luo, Akilles Rechardt, Guangzhi Sun, Kevin K. Nejad, Felipe Yáñez, Bati Yilmaz, Kangjoo Lee, Alexandra O. Cohen, Valentina Borghesani, Anton Pashkov, Daniele Marinazzo, Jonathan Nicholas, Alessandro Salatiello, Ilia Sucholutsky, Pasquale Minervini, Sepehr Razavi, Roberta Rocca, Elkhan Yusifov, Tereza Okalova, Nianlong Gu, Martin Ferianc, Mikail Khona, Kaustubh R. Patil, Pui-Shee Lee, Rui Mata, et al. (14 additional authors not shown)

    Abstract: Scientific discoveries often hinge on synthesizing decades of research, a task that potentially outstrips human information processing capacities. Large language models (LLMs) offer a solution. LLMs trained on the vast scientific literature could potentially integrate noisy yet interrelated findings to forecast novel results better than human experts. To evaluate this possibility, we created Brain…

    Submitted 21 June, 2024; v1 submitted 4 March, 2024; originally announced March 2024.

  15. arXiv:2403.01461

    cs.CL

    Answerability in Retrieval-Augmented Open-Domain Question Answering

    Authors: Rustam Abdumalikov, Pasquale Minervini, Yova Kementchedjhieva

    Abstract: The performance of Open-Domain Question Answering (ODQA) retrieval systems can exhibit sub-optimal behavior, providing text excerpts with varying degrees of irrelevance. Unfortunately, many existing ODQA datasets lack examples specifically targeting the identification of irrelevant text excerpts. Previous attempts to address this gap have relied on a simplistic approach of pairing questions with r…

    Submitted 3 March, 2024; originally announced March 2024.

    Comments: 5 pages, 3 tables

  16. arXiv:2402.17389

    cs.CL cs.AI

    FairBelief -- Assessing Harmful Beliefs in Language Models

    Authors: Mattia Setzu, Marta Marchiori Manerba, Pasquale Minervini, Debora Nozza

    Abstract: Language Models (LMs) have been shown to inherit undesired biases that might hurt minorities and underrepresented groups if such systems were integrated into real-world applications without careful fairness auditing. This paper proposes FairBelief, an analytical approach to capture and assess beliefs, i.e., propositions that an LM may embed with different degrees of confidence and that covertly in…

    Submitted 27 February, 2024; originally announced February 2024.

  17. arXiv:2402.13991

    cs.CL

    Analysing The Impact of Sequence Composition on Language Model Pre-Training

    Authors: Yu Zhao, Yuanbin Qu, Konrad Staniszewski, Szymon Tworkowski, Wei Liu, Piotr Miłoś, Yuxiang Wu, Pasquale Minervini

    Abstract: Most language model pre-training frameworks concatenate multiple documents into fixed-length sequences and use causal masking to compute the likelihood of each token given its context; this strategy is widely adopted due to its simplicity and efficiency. However, to this day, the influence of the pre-training sequence composition strategy on the generalisation properties of the model remains under…

    Submitted 21 February, 2024; originally announced February 2024.
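
    Here is a minimal Python sketch of the packing strategy this abstract describes. Tokenised documents are concatenated into one stream and cut into fixed-length sequences, and a causal mask lets each position attend only to itself and earlier positions. The following choices are assumptions for illustration, not details taken from the paper:
    - appending an EOS separator after each document;
    - dropping the final partial sequence;
    - the token IDs used in the example.

```python
from typing import List


def pack_documents(docs: List[List[int]], seq_len: int, eos_id: int) -> List[List[int]]:
    """Concatenate tokenised documents, each followed by an EOS separator,
    into one stream and cut it into fixed-length training sequences.
    The trailing remainder shorter than seq_len is dropped."""
    stream: List[int] = []
    for doc in docs:
        stream.extend(doc)
        stream.append(eos_id)
    n_full = len(stream) // seq_len
    return [stream[i * seq_len:(i + 1) * seq_len] for i in range(n_full)]


def causal_mask(seq_len: int) -> List[List[bool]]:
    """mask[i][j] is True when position i may attend to position j (j <= i).
    Plain causal masking ignores document boundaries inside a packed sequence."""
    return [[j <= i for j in range(seq_len)] for i in range(seq_len)]


packed = pack_documents([[1, 2, 3], [4, 5], [6, 7, 8, 9]], seq_len=4, eos_id=0)
print(packed)
```

    With these inputs the result is [[1, 2, 3, 0], [4, 5, 0, 6], [7, 8, 9, 0]]. In the second sequence, token 6 begins a new document, yet the plain causal mask still lets it attend to tokens 4 and 5 from the previous one.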

  18. arXiv:2312.10193

    cs.LG

    Adaptive Computation Modules: Granular Conditional Computation For Efficient Inference

    Authors: Bartosz Wójcik, Alessio Devoto, Karol Pustelnik, Pasquale Minervini, Simone Scardapane

    Abstract: The computational cost of transformer models makes them inefficient in low-latency or low-power applications. While techniques such as quantization or linear attention can reduce the computational load, they may incur a reduction in accuracy. In addition, globally reducing the cost for all inputs may be sub-optimal. We observe that for each layer, the full width of the layer may be needed only for…

    Submitted 15 December, 2023; originally announced December 2023.

  19. arXiv:2311.07556

    cs.CL

    Using Natural Language Explanations to Improve Robustness of In-context Learning

    Authors: Xuanli He, Yuxiang Wu, Oana-Maria Camburu, Pasquale Minervini, Pontus Stenetorp

    Abstract: Recent studies demonstrated that large language models (LLMs) can excel in many tasks via in-context learning (ICL). However, recent works show that ICL-prompted models tend to produce inaccurate results when presented with adversarial inputs. In this work, we investigate whether augmenting ICL with natural language explanations (NLEs) improves the robustness of LLMs on adversarial datasets coveri…

    Submitted 20 May, 2024; v1 submitted 13 November, 2023; originally announced November 2023.

    Comments: Accepted to ACL 2024 (main)

  20. arXiv:2310.14418

    cs.CL

    REFER: An End-to-end Rationale Extraction Framework for Explanation Regularization

    Authors: Mohammad Reza Ghasemi Madani, Pasquale Minervini

    Abstract: Human-annotated textual explanations are becoming increasingly important in Explainable Natural Language Processing. Rationale extraction aims to provide faithful (i.e., reflective of the behavior of the model) and plausible (i.e., convincing to humans) explanations by highlighting the inputs that had the largest impact on the prediction without compromising the performance of the task model. In r…

    Submitted 22 October, 2023; originally announced October 2023.

  21. arXiv:2309.09045

    cs.LG

    Temporal Smoothness Regularisers for Neural Link Predictors

    Authors: Manuel Dileo, Pasquale Minervini, Matteo Zignani, Sabrina Gaito

    Abstract: Most algorithms for representation learning and link prediction on relational data are designed for static data. However, the data to which they are applied typically evolves over time, including online social networks or interactions between users and items in recommender systems. This is also the case for graph-structured knowledge bases -- knowledge graphs -- which contain facts that are valid…

    Submitted 16 September, 2023; originally announced September 2023.

    Comments: arXiv admin note: text overlap with arXiv:2103.10379 by other authors

  22. arXiv:2308.06585

    cs.LG cs.AI cs.DB cs.LO cs.NE

    Approximate Answering of Graph Queries

    Authors: Michael Cochez, Dimitrios Alivanistos, Erik Arakelyan, Max Berrendorf, Daniel Daza, Mikhail Galkin, Pasquale Minervini, Mathias Niepert, Hongyu Ren

    Abstract: Knowledge graphs (KGs) are inherently incomplete because of incomplete world knowledge and bias in what is the input to the KG. Additionally, world knowledge constantly expands and evolves, making existing facts deprecated or introducing new ones. However, we would still want to be able to answer queries as if the graph were complete. In this chapter, we will give an overview of several methods wh…

    Submitted 12 August, 2023; originally announced August 2023.

    Comments: Preprint of Ch. 17 "Approximate Answering of Graph Queries" in "Compendium of Neurosymbolic Artificial Intelligence", https://ebooks.iospress.nl/ISBN/978-1-64368-406-2

  23. arXiv:2307.06440

    cs.LG cs.AI cs.CL cs.NE cs.PF

    No Train No Gain: Revisiting Efficient Training Algorithms For Transformer-based Language Models

    Authors: Jean Kaddour, Oscar Key, Piotr Nawrot, Pasquale Minervini, Matt J. Kusner

    Abstract: The computation necessary for training Transformer-based language models has skyrocketed in recent years. This trend has motivated research on efficient training algorithms designed to improve training, validation, and downstream performance faster than standard training. In this work, we revisit three categories of such algorithms: dynamic architectures (layer stacking, layer dropping), batch sel…

    Submitted 14 November, 2023; v1 submitted 12 July, 2023; originally announced July 2023.

    Comments: NeurIPS 2023

  24. arXiv:2307.03042

    cs.CL cs.LG

    Parameter-Efficient Fine-Tuning of LLaMA for the Clinical Domain

    Authors: Aryo Pradipta Gema, Pasquale Minervini, Luke Daines, Tom Hope, Beatrice Alex

    Abstract: Adapting pretrained language models to novel domains, such as clinical applications, traditionally involves retraining their entire set of parameters. Parameter-Efficient Fine-Tuning (PEFT) techniques for fine-tuning language models significantly reduce computational requirements by selectively fine-tuning small subsets of parameters. In this study, we propose a two-step PEFT framework and evaluat…

    Submitted 9 June, 2024; v1 submitted 6 July, 2023; originally announced July 2023.

  25. arXiv:2305.19979

    cs.LG cs.AI

    Knowledge Graph Embeddings in the Biomedical Domain: Are They Useful? A Look at Link Prediction, Rule Learning, and Downstream Polypharmacy Tasks

    Authors: Aryo Pradipta Gema, Dominik Grabarczyk, Wolf De Wulf, Piyush Borole, Javier Antonio Alfaro, Pasquale Minervini, Antonio Vergari, Ajitha Rajan

    Abstract: Knowledge graphs are powerful tools for representing and organising complex biomedical data. Several knowledge graph embedding algorithms have been proposed to learn from and complete knowledge graphs. However, a recent study demonstrates the limited efficacy of these embedding algorithms when applied to biomedical knowledge graphs, raising the question of whether knowledge graph embeddings have l…

    Submitted 31 August, 2023; v1 submitted 31 May, 2023; originally announced May 2023.

  26. arXiv:2305.13235

    cs.CL cs.AI

    SPARSEFIT: Few-shot Prompting with Sparse Fine-tuning for Jointly Generating Predictions and Natural Language Explanations

    Authors: Jesus Solano, Oana-Maria Camburu, Pasquale Minervini

    Abstract: Explaining the decisions of neural models is crucial for ensuring their trustworthiness at deployment time. Using Natural Language Explanations (NLEs) to justify a model's predictions has recently gained increasing interest. However, this approach usually demands large datasets of human-written NLEs for the ground-truth answers, which are expensive and potentially infeasible for some applications.…

    Submitted 23 May, 2023; v1 submitted 22 May, 2023; originally announced May 2023.

  27. arXiv:2305.13214

    cs.CL

    Logical Reasoning for Natural Language Inference Using Generated Facts as Atoms

    Authors: Joe Stacey, Pasquale Minervini, Haim Dubossarsky, Oana-Maria Camburu, Marek Rei

    Abstract: State-of-the-art neural models can now reach human performance levels across various natural language understanding tasks. However, despite this impressive performance, models are known to learn from annotation artefacts at the expense of the underlying task. While interpretability methods can identify influential features for each prediction, there are no guarantees that these features are respon…

    Submitted 22 May, 2023; originally announced May 2023.

    ACM Class: I.2.7

  28. arXiv:2301.12313

    cs.LG cs.AI cs.LO cs.NE

    Adapting Neural Link Predictors for Data-Efficient Complex Query Answering

    Authors: Erik Arakelyan, Pasquale Minervini, Daniel Daza, Michael Cochez, Isabelle Augenstein

    Abstract: Answering complex queries on incomplete knowledge graphs is a challenging task where a model needs to answer complex logical queries in the presence of missing knowledge. Prior work in the literature has proposed to address this problem by designing architectures trained end-to-end for the complex query answering task with a reasoning process that is hard to interpret while requiring data and reso…

    Submitted 11 July, 2023; v1 submitted 28 January, 2023; originally announced January 2023.

  29. arXiv:2211.09856

    cs.LG q-bio.QM

    Machine Learning-Assisted Recurrence Prediction for Early-Stage Non-Small-Cell Lung Cancer Patients

    Authors: Adrianna Janik, Maria Torrente, Luca Costabello, Virginia Calvo, Brian Walsh, Carlos Camps, Sameh K. Mohamed, Ana L. Ortega, Vít Nováček, Bartomeu Massutí, Pasquale Minervini, M. Rosario Garcia Campelo, Edel del Barco, Joaquim Bosch-Barrera, Ernestina Menasalvas, Mohan Timilsina, Mariano Provencio

    Abstract: Background: Stratifying cancer patients according to risk of relapse can personalize their care. In this work, we provide an answer to the following research question: How to utilize machine learning to estimate probability of relapse in early-stage non-small-cell lung cancer patients? Methods: For predicting relapse in 1,387 early-stage (I-II), non-small-cell lung cancer (NSCLC) patients from t…

    Submitted 17 November, 2022; originally announced November 2022.

  30. arXiv:2210.16773

    cs.CL cs.AI cs.LG

    An Efficient Memory-Augmented Transformer for Knowledge-Intensive NLP Tasks

    Authors: Yuxiang Wu, Yu Zhao, Baotian Hu, Pasquale Minervini, Pontus Stenetorp, Sebastian Riedel

    Abstract: Access to external knowledge is essential for many natural language processing tasks, such as question answering and dialogue. Existing methods often rely on a parametric model that stores knowledge in its parameters, or use a retrieval-augmented model that has access to an external knowledge source. Parametric and retrieval-augmented models have complementary strengths in terms of computational e…

    Submitted 30 October, 2022; originally announced October 2022.

    Comments: EMNLP 2022 main conference long paper. 8 pages, 6 figures

  31. arXiv:2210.15353

    cs.LG cs.AI stat.ML

    Learning Discrete Directed Acyclic Graphs via Backpropagation

    Authors: Andrew J. Wren, Pasquale Minervini, Luca Franceschi, Valentina Zantedeschi

    Abstract: Recently continuous relaxations have been proposed in order to learn Directed Acyclic Graphs (DAGs) from data by backpropagation, instead of using combinatorial optimization. However, a number of techniques for fully discrete backpropagation could instead be applied. In this paper, we explore that direction and propose DAG-DB, a framework for learning DAGs by Discrete Backpropagation. Based on the…

    Submitted 27 October, 2022; originally announced October 2022.

    Comments: 15 pages, 2 figures, 7 tables. Accepted for NeurIPS 2022 workshops on: Causal Machine Learning for Real-World Impact; and Neuro Causal and Symbolic AI

  32. arXiv:2209.04862

    cs.LG cs.AI cs.CL cs.NE

    Adaptive Perturbation-Based Gradient Estimation for Discrete Latent Variable Models

    Authors: Pasquale Minervini, Luca Franceschi, Mathias Niepert

    Abstract: The integration of discrete algorithmic components in deep learning architectures has numerous applications. Recently, Implicit Maximum Likelihood Estimation (IMLE, Niepert, Minervini, and Franceschi 2021), a class of gradient estimators for discrete exponential family distributions, was proposed by combining implicit differentiation through perturbation with the path-wise gradient estimator. Howe…

    Submitted 5 February, 2023; v1 submitted 11 September, 2022; originally announced September 2022.

    Comments: Proceedings of the Thirty-Seventh AAAI Conference on Artificial Intelligence (AAAI 2023)

  33. arXiv:2207.09980

    cs.LG cs.AI cs.CL

    ReFactor GNNs: Revisiting Factorisation-based Models from a Message-Passing Perspective

    Authors: Yihong Chen, Pushkar Mishra, Luca Franceschi, Pasquale Minervini, Pontus Stenetorp, Sebastian Riedel

    Abstract: Factorisation-based Models (FMs), such as DistMult, have enjoyed enduring success for Knowledge Graph Completion (KGC) tasks, often outperforming Graph Neural Networks (GNNs). However, unlike GNNs, FMs struggle to incorporate node features and generalise to unseen nodes in inductive settings. Our work bridges the gap between FMs and GNNs by proposing ReFactor GNNs. This new architecture draws upon…

    Submitted 27 October, 2022; v1 submitted 20 July, 2022; originally announced July 2022.

    Comments: 36th Conference on Neural Information Processing Systems (NeurIPS 2022)

    MSC Class: 68T05; 68T07; 68T50 ACM Class: I.2.7; I.2.6

  34. arXiv:2205.11432

    cs.CL cs.LG

    Logical Reasoning with Span-Level Predictions for Interpretable and Robust NLI Models

    Authors: Joe Stacey, Pasquale Minervini, Haim Dubossarsky, Marek Rei

    Abstract: Current Natural Language Inference (NLI) models achieve impressive results, sometimes outperforming humans when evaluating on in-distribution test sets. However, as these models are known to learn from annotation artefacts and dataset biases, it is unclear to what extent the models are learning the task of NLI instead of learning from shallow heuristics in their training data. We address this issu…

    Submitted 21 October, 2022; v1 submitted 23 May, 2022; originally announced May 2022.

    Comments: Accepted at EMNLP 2022

  35. arXiv:2204.05895

    cs.CL

    XQA-DST: Multi-Domain and Multi-Lingual Dialogue State Tracking

    Authors: Han Zhou, Ignacio Iacobacci, Pasquale Minervini

    Abstract: Dialogue State Tracking (DST), a crucial component of task-oriented dialogue (ToD) systems, keeps track of all important information pertaining to dialogue history: filling slots with the most probable values throughout the conversation. Existing methods generally rely on a predefined set of values and struggle to generalise to previously unseen slots in new domains. To overcome these challenges,…

    Submitted 25 February, 2023; v1 submitted 12 April, 2022; originally announced April 2022.

    Comments: Accepted to Findings of EACL 2023

  36. arXiv:2204.04779

    cs.CL cs.LG

    MedDistant19: Towards an Accurate Benchmark for Broad-Coverage Biomedical Relation Extraction

    Authors: Saadullah Amin, Pasquale Minervini, David Chang, Pontus Stenetorp, Günter Neumann

    Abstract: Relation extraction in the biomedical domain is challenging due to the lack of labeled data and high annotation costs, needing domain experts. Distant supervision is commonly used to tackle the scarcity of annotated data by automatically pairing knowledge graph relationships with raw texts. Such a pipeline is prone to noise and has added challenges to scale for covering a large number of biomedica…

    Submitted 13 September, 2022; v1 submitted 10 April, 2022; originally announced April 2022.

    Comments: Accepted by COLING 2022 (Oral presentation, Main Conference: Long Papers)

  37. arXiv:2203.10620

    cs.CL cs.AI cs.LG

    Differentiable Reasoning over Long Stories -- Assessing Systematic Generalisation in Neural Models

    Authors: Wanshui Li, Pasquale Minervini

    Abstract: Contemporary neural networks have achieved a series of developments and successes in many aspects; however, when exposed to data outside the training distribution, they may fail to predict correct answers. In this work, we were concerned about this generalisation issue and thus analysed a broad set of models systematically and robustly over long stories. Related experiments were conducted based on…

    Submitted 20 March, 2022; originally announced March 2022.

  38. arXiv:2110.13205

    cs.LG

    A Probabilistic Framework for Knowledge Graph Data Augmentation

    Authors: Jatin Chauhan, Priyanshu Gupta, Pasquale Minervini

    Abstract: We present NNMFAug, a probabilistic framework to perform data augmentation for the task of knowledge graph completion to counter the problem of data scarcity, which can enhance the learning process of neural link predictors. Our method can generate potentially diverse triples with the advantage of being efficient and scalable as well as agnostic to the choice of the link prediction model and datas…

    Submitted 25 October, 2021; originally announced October 2021.

  39. arXiv:2110.02834

    cs.CL cs.AI

    Relation Prediction as an Auxiliary Training Objective for Improving Multi-Relational Graph Representations

    Authors: Yihong Chen, Pasquale Minervini, Sebastian Riedel, Pontus Stenetorp

    Abstract: Learning good representations on multi-relational graphs is essential to knowledge base completion (KBC). In this paper, we propose a new self-supervised training objective for multi-relational graph representation learning, via simply incorporating relation prediction into the commonly used 1vsAll objective. The new training objective contains not only terms for predicting the subject and object…

    Submitted 6 October, 2021; originally announced October 2021.

    Comments: AKBC 2021
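
    Below is a minimal plain-Python sketch of the kind of objective this abstract outlines. It combines the standard 1vsAll terms (predict the object among all entities, and the subject among all entities) with an auxiliary term that predicts the relation among all relations. Two parts are assumptions for illustration: DistMult as the scoring function (the abstract names the objective but not the scorer) and the weight `rel_weight`. The sketch computes the loss for one triple only and omits gradients and the training loop.

```python
import math
from typing import List, Sequence

Vector = Sequence[float]


def distmult(e_s: Vector, w_r: Vector, e_o: Vector) -> float:
    """DistMult score: the trilinear product sum_k e_s[k] * w_r[k] * e_o[k]."""
    return sum(a * b * c for a, b, c in zip(e_s, w_r, e_o))


def cross_entropy(scores: List[float], target: int) -> float:
    """Softmax cross-entropy of `target` among `scores` (stable log-sum-exp)."""
    m = max(scores)
    log_z = m + math.log(sum(math.exp(x - m) for x in scores))
    return log_z - scores[target]


def one_vs_all_with_relation(ents: List[Vector], rels: List[Vector],
                             s: int, r: int, o: int,
                             rel_weight: float = 1.0) -> float:
    """Loss for one triple (s, r, o): predict o among all entities,
    s among all entities, and (auxiliary term) r among all relations."""
    obj_loss = cross_entropy([distmult(ents[s], rels[r], e) for e in ents], o)
    subj_loss = cross_entropy([distmult(e, rels[r], ents[o]) for e in ents], s)
    rel_loss = cross_entropy([distmult(ents[s], w, ents[o]) for w in rels], r)
    return obj_loss + subj_loss + rel_weight * rel_loss


# Toy embeddings (made up): 3 entities and 2 relations in 2 dimensions.
ents = [[0.1, 0.2], [0.3, -0.1], [0.0, 0.4]]
rels = [[1.0, 0.5], [-0.2, 0.8]]
print(one_vs_all_with_relation(ents, rels, s=0, r=1, o=2, rel_weight=0.5))
```

    Setting `rel_weight=0.0` recovers the plain 1vsAll loss. With all-zero embeddings every score ties, so the loss reduces to 2·log(3) + rel_weight·log(2) for three entities and two relations.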

  40. arXiv:2107.02102

    cs.CL cs.AI

    Training Adaptive Computation for Open-Domain Question Answering with Computational Constraints

    Authors: Yuxiang Wu, Pasquale Minervini, Pontus Stenetorp, Sebastian Riedel

    Abstract: Adaptive Computation (AC) has been shown to be effective in improving the efficiency of Open-Domain Question Answering (ODQA) systems. However, current AC approaches require tuning of all model parameters, and training state-of-the-art ODQA models requires significant computational resources that may not be available for most researchers. We propose Adaptive Passage Encoder, an AC method that can…

    Submitted 5 July, 2021; originally announced July 2021.

    Comments: 7 pages, 1 figure, to be published in ACL-IJCNLP 2021

  41. arXiv:2106.14052

    cs.AI cs.DB cs.LG

    Combining Inductive and Deductive Reasoning for Query Answering over Incomplete Knowledge Graphs

    Authors: Medina Andresel, Trung-Kien Tran, Csaba Domokos, Pasquale Minervini, Daria Stepanova

    Abstract: Current methods for embedding-based query answering over incomplete Knowledge Graphs (KGs) only focus on inductive reasoning, i.e., predicting answers by learning patterns from the data, and lack the complementary ability to do deductive reasoning, which requires the application of domain knowledge to infer further information. To address this shortcoming, we investigate the problem of incorporati…

    Submitted 31 August, 2023; v1 submitted 26 June, 2021; originally announced June 2021.

  42. arXiv:2106.01798

    cs.LG cs.AI

    Implicit MLE: Backpropagating Through Discrete Exponential Family Distributions

    Authors: Mathias Niepert, Pasquale Minervini, Luca Franceschi

    Abstract: Combining discrete probability distributions and combinatorial optimization problems with neural network components has numerous applications but poses several challenges. We propose Implicit Maximum Likelihood Estimation (I-MLE), a framework for end-to-end learning of models combining discrete exponential family distributions and differentiable neural components. I-MLE is widely applicable as it…

    Submitted 27 October, 2021; v1 submitted 3 June, 2021; originally announced June 2021.

    Comments: NeurIPS 2021 camera-ready; repo: https://github.com/nec-research/tf-imle

  43. arXiv:2102.07033

    cs.CL cs.AI cs.LG

    PAQ: 65 Million Probably-Asked Questions and What You Can Do With Them

    Authors: Patrick Lewis, Yuxiang Wu, Linqing Liu, Pasquale Minervini, Heinrich Küttler, Aleksandra Piktus, Pontus Stenetorp, Sebastian Riedel

    Abstract: Open-domain Question Answering models which directly leverage question-answer (QA) pairs, such as closed-book QA (CBQA) models and QA-pair retrievers, show promise in terms of speed and memory compared to conventional models which retrieve and read from text corpora. QA-pair retrievers also offer interpretable answers, a high degree of control, and are trivial to update at test time with new knowl…

    Submitted 13 February, 2021; originally announced February 2021.

  44. arXiv:2102.04220

    cs.LG

    Grid-to-Graph: Flexible Spatial Relational Inductive Biases for Reinforcement Learning

    Authors: Zhengyao Jiang, Pasquale Minervini, Minqi Jiang, Tim Rocktäschel

    Abstract: Although reinforcement learning has been successfully applied in many domains in recent years, we still lack agents that can systematically generalize. While relational inductive biases that fit a task can improve generalization of RL agents, these biases are commonly hard-coded directly in the agent's neural architecture. In this work, we show that we can incorporate relational inductive biases,…

    Submitted 8 February, 2021; originally announced February 2021.

    Comments: Accepted by AAMAS 2021

  45. arXiv:2101.09688

    cs.CL cs.AI cs.LG cs.NE

    Stereotype and Skew: Quantifying Gender Bias in Pre-trained and Fine-tuned Language Models

    Authors: Daniel de Vassimon Manela, David Errington, Thomas Fisher, Boris van Breugel, Pasquale Minervini

    Abstract: This paper proposes two intuitive metrics, skew and stereotype, that quantify and analyse the gender bias present in contextual language models when tackling the WinoBias pronoun resolution task. We find evidence that gender stereotype correlates approximately negatively with gender skew in out-of-the-box models, suggesting that there is a trade-off between these two forms of bias. We investigate…

    Submitted 16 February, 2021; v1 submitted 24 January, 2021; originally announced January 2021.

    Comments: Proceedings of the 16th Conference of the European Chapter of the Association for Computational Linguistics (EACL 2021)

  46. arXiv:2101.00133

    cs.CL cs.AI

    NeurIPS 2020 EfficientQA Competition: Systems, Analyses and Lessons Learned

    Authors: Sewon Min, Jordan Boyd-Graber, Chris Alberti, Danqi Chen, Eunsol Choi, Michael Collins, Kelvin Guu, Hannaneh Hajishirzi, Kenton Lee, Jennimaria Palomaki, Colin Raffel, Adam Roberts, Tom Kwiatkowski, Patrick Lewis, Yuxiang Wu, Heinrich Küttler, Linqing Liu, Pasquale Minervini, Pontus Stenetorp, Sebastian Riedel, Sohee Yang, Minjoon Seo, Gautier Izacard, Fabio Petroni, Lucas Hosseini, et al. (28 additional authors not shown)

    Abstract: We review the EfficientQA competition from NeurIPS 2020. The competition focused on open-domain question answering (QA), where systems take natural language questions as input and return natural language answers. The aim of the competition was to build systems that can predict correct answers while also satisfying strict on-disk memory budgets. These memory budgets were designed to encourage conte…

    Submitted 19 September, 2021; v1 submitted 31 December, 2020; originally announced January 2021.

    Comments: 26 pages; Published in Proceedings of Machine Learning Research (PMLR), NeurIPS 2020 Competition and Demonstration Track

  47. arXiv:2011.05435

    cs.CL cs.AI cs.LG

    Don't Read Too Much into It: Adaptive Computation for Open-Domain Question Answering

    Authors: Yuxiang Wu, Sebastian Riedel, Pasquale Minervini, Pontus Stenetorp

    Abstract: Most approaches to Open-Domain Question Answering consist of a light-weight retriever that selects a set of candidate passages, and a computationally expensive reader that examines the passages to identify the correct answer. Previous works have shown that as the number of retrieved passages increases, so does the performance of the reader. However, they assume all retrieved passages are of equal…

    Submitted 10 November, 2020; originally announced November 2020.

    Comments: 11 pages, 9 figures, presented in EMNLP 2020 main conference and SustaiNLP 2020 workshop

  48. arXiv:2011.03459

    cs.LG cs.AI cs.LO cs.NE

    Complex Query Answering with Neural Link Predictors

    Authors: Erik Arakelyan, Daniel Daza, Pasquale Minervini, Michael Cochez

    Abstract: Neural link predictors are immensely useful for identifying missing edges in large scale Knowledge Graphs. However, it is still not clear how to use these models for answering more complex queries that arise in a number of domains, such as queries using logical conjunctions ($\land$), disjunctions ($\lor$) and existential quantifiers ($\exists$), while accounting for missing edges. In this work, w…

    Submitted 18 March, 2021; v1 submitted 6 November, 2020; originally announced November 2020.

    Comments: Proceedings of the Ninth International Conference on Learning Representations (ICLR 2021, oral presentation)

  49. arXiv:2007.09185

    cs.AI cs.CL cs.LG

    WordCraft: An Environment for Benchmarking Commonsense Agents

    Authors: Minqi Jiang, Jelena Luketina, Nantas Nardelli, Pasquale Minervini, Philip H. S. Torr, Shimon Whiteson, Tim Rocktäschel

    Abstract: The ability to quickly solve a wide range of real-world tasks requires a commonsense understanding of the world. Yet, how to best extract such knowledge from natural language corpora and integrate it with reinforcement learning (RL) agents remains an open challenge. This is partly due to the lack of lightweight simulation environments that sufficiently reflect the semantics of the real world and p…

    Submitted 17 July, 2020; originally announced July 2020.

  50. arXiv:2007.06477

    cs.AI cs.CL cs.LG cs.NE cs.SC

    Learning Reasoning Strategies in End-to-End Differentiable Proving

    Authors: Pasquale Minervini, Sebastian Riedel, Pontus Stenetorp, Edward Grefenstette, Tim Rocktäschel

    Abstract: Attempts to render deep learning models interpretable, data-efficient, and robust have seen some success through hybridisation with rule-based systems, for example, in Neural Theorem Provers (NTPs). These neuro-symbolic models can induce interpretable rules and learn representations from data via back-propagation, while providing logical explanations for their predictions. However, they are restri…

    Submitted 24 August, 2020; v1 submitted 13 July, 2020; originally announced July 2020.

    Comments: Proceedings of the 37th International Conference on Machine Learning (ICML 2020)