-
RepLiQA: A Question-Answering Dataset for Benchmarking LLMs on Unseen Reference Content
Authors:
Joao Monteiro,
Pierre-Andre Noel,
Etienne Marcotte,
Sai Rajeswar,
Valentina Zantedeschi,
David Vazquez,
Nicolas Chapados,
Christopher Pal,
Perouz Taslakian
Abstract:
Large Language Models (LLMs) are trained on vast amounts of data, most of which is automatically scraped from the internet. This data includes encyclopedic documents that harbor a vast amount of general knowledge (e.g., Wikipedia) but also potentially overlap with benchmark datasets used for evaluating LLMs. Consequently, evaluating models on test splits that might have leaked into the training set is prone to misleading conclusions. To foster sound evaluation of language models, we introduce a new test dataset named RepLiQA, suited for question-answering and topic retrieval tasks. RepLiQA is a collection of five splits of test sets, four of which have not been released to the internet or exposed to LLM APIs prior to this publication. Each sample in RepLiQA comprises (1) a reference document crafted by a human annotator and depicting an imaginary scenario (e.g., a news article) absent from the internet; (2) a question about the document's topic; (3) a ground-truth answer derived directly from the information in the document; and (4) the paragraph extracted from the reference document containing the answer. As such, accurate answers can only be generated if a model can find relevant content within the provided document. We run a large-scale benchmark comprising several state-of-the-art LLMs to uncover differences in performance across models of various types and sizes in a context-conditional language modeling setting. Released splits of RepLiQA can be found here: https://huggingface.co/datasets/ServiceNow/repliqa.
Submitted 17 June, 2024;
originally announced June 2024.
-
XC-Cache: Cross-Attending to Cached Context for Efficient LLM Inference
Authors:
João Monteiro,
Étienne Marcotte,
Pierre-André Noël,
Valentina Zantedeschi,
David Vázquez,
Nicolas Chapados,
Christopher Pal,
Perouz Taslakian
Abstract:
In-context learning (ICL) approaches typically leverage prompting to condition decoder-only language model generation on reference information. Just-in-time processing of a context is inefficient due to the quadratic cost of self-attention operations, and caching is desirable. However, caching transformer states can easily require almost as much space as the model parameters. When the right context isn't known in advance, caching ICL can be challenging. This work addresses these limitations by introducing models that, inspired by the encoder-decoder architecture, use cross-attention to condition generation on reference text without the prompt. More precisely, we leverage pre-trained decoder-only models and only train a small number of added layers. We use Question-Answering (QA) as a testbed to evaluate the ability of our models to perform conditional generation and observe that they outperform ICL, are comparable to fine-tuned prompted LLMs, and drastically reduce the space footprint relative to standard KV caching by two orders of magnitude.
Submitted 23 April, 2024;
originally announced April 2024.
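The abstract's remark that cached transformer states can approach the size of the model parameters can be checked with back-of-the-envelope arithmetic. The sketch below is illustrative only: the model configuration (32 layers, 32 key/value heads of dimension 128, 7 billion parameters, 16-bit values) is a hypothetical assumption, not a configuration taken from the paper, and the function names are invented for this example.

```python
def kv_cache_bytes(n_layers, n_kv_heads, head_dim, n_tokens, bytes_per_value=2):
    """Bytes needed to cache keys and values for n_tokens across all layers."""
    # Factor 2: each layer stores one key tensor and one value tensor.
    return 2 * n_layers * n_kv_heads * head_dim * n_tokens * bytes_per_value


def param_bytes(n_params, bytes_per_value=2):
    """Bytes needed to store the model weights."""
    return n_params * bytes_per_value


# Hypothetical decoder-only model (assumed values, not from the paper).
cache = kv_cache_bytes(n_layers=32, n_kv_heads=32, head_dim=128, n_tokens=4096)
weights = param_bytes(7_000_000_000)

print(f"KV cache for one 4096-token context: {cache / 2**30:.2f} GiB")
print(f"Model weights: {weights / 2**30:.2f} GiB")
print(f"Contexts of this length that match the weights: {weights / cache:.1f}")
```

Under these assumed numbers, caching a handful of long contexts already costs as much memory as the weights themselves, which is the pressure the abstract describes.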
-
Expecting The Unexpected: Towards Broad Out-Of-Distribution Detection
Authors:
Charles Guille-Escuret,
Pierre-André Noël,
Ioannis Mitliagkas,
David Vazquez,
Joao Monteiro
Abstract:
Improving the reliability of deployed machine learning systems often involves developing methods to detect out-of-distribution (OOD) inputs. However, existing research often narrowly focuses on samples from classes that are absent from the training set, neglecting other types of plausible distribution shifts. This limitation reduces the applicability of these methods in real-world scenarios, where systems encounter a wide variety of anomalous inputs. In this study, we categorize five distinct types of distribution shifts and critically evaluate the performance of recent OOD detection methods on each of them. We publicly release our benchmark under the name BROAD (Benchmarking Resilience Over Anomaly Diversity). Our findings reveal that while these methods excel in detecting unknown classes, their performance is inconsistent when encountering other types of distribution shifts. In other words, they only reliably detect unexpected inputs that they have been specifically designed to expect. As a first step toward broad OOD detection, we learn a generative model of existing detection scores with a Gaussian mixture. By doing so, we present an ensemble approach that offers a more consistent and comprehensive solution for broad OOD detection, demonstrating superior performance compared to existing methods. Our code to download BROAD and reproduce our experiments is publicly available.
Submitted 22 August, 2023;
originally announced August 2023.
-
Flaky Performances when Pretraining on Relational Databases
Authors:
Shengchao Liu,
David Vazquez,
Jian Tang,
Pierre-André Noël
Abstract:
We explore the downstream task performances for graph neural network (GNN) self-supervised learning (SSL) methods trained on subgraphs extracted from relational databases (RDBs). Intuitively, this joint use of SSL and GNNs should allow us to leverage more of the available data, which could translate to better results. However, we found that naively porting contrastive SSL techniques can cause "negative transfer": linear evaluation on fixed representations from a pretrained model performs worse than on representations from the randomly-initialized model. Based on the conjecture that contrastive SSL conflicts with the message passing layers of the GNN, we propose InfoNode: a contrastive loss aiming to maximize the mutual information between a node's initial- and final-layer representation. The primary empirical results support our conjecture and the effectiveness of InfoNode.
Submitted 9 November, 2022;
originally announced November 2022.
-
Constraining Representations Yields Models That Know What They Don't Know
Authors:
Joao Monteiro,
Pau Rodriguez,
Pierre-Andre Noel,
Issam Laradji,
David Vazquez
Abstract:
A well-known failure mode of neural networks is that they may confidently return erroneous predictions. Such unsafe behaviour is particularly frequent when the use case slightly differs from the training context, and/or in the presence of an adversary. This work presents a novel direction to address these issues in a broad, general manner: imposing class-aware constraints on a model's internal activation patterns. Specifically, we assign to each class a unique, fixed, randomly-generated binary vector - hereafter called class code - and train the model so that its cross-depths activation patterns predict the appropriate class code according to the input sample's class. The resulting predictors are dubbed Total Activation Classifiers (TAC), and TACs may either be trained from scratch, or used with negligible cost as a thin add-on on top of a frozen, pre-trained neural network. The distance between a TAC's activation pattern and the closest valid code acts as an additional confidence score, besides the default unTAC'ed prediction head's. In the add-on case, the original neural network's inference head is completely unaffected (so its accuracy remains the same) but we now have the option to use TAC's own confidence and prediction when determining which course of action to take in a hypothetical production workflow. In particular, we show that TAC strictly improves the value derived from models allowed to reject/defer. We provide further empirical evidence that TAC works well on multiple types of architectures and data modalities and that it is at least as good as state-of-the-art alternative confidence scores derived from existing models.
Submitted 19 April, 2023; v1 submitted 30 August, 2022;
originally announced August 2022.
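The class-code idea in this abstract (one unique random binary vector per class, with the distance to the closest valid code serving as a confidence score) can be illustrated with a small sketch. This is not the authors' implementation: the use of Hamming distance on an already-binarized pattern, the code length, and all function names are assumptions made for illustration.

```python
import random


def make_class_codes(n_classes, code_len, seed=0):
    """Draw one unique, fixed, random binary code per class."""
    if 2 ** code_len < n_classes:
        raise ValueError("code_len too small for the number of classes")
    rng = random.Random(seed)
    codes = []
    while len(codes) < n_classes:
        candidate = tuple(rng.randint(0, 1) for _ in range(code_len))
        if candidate not in codes:  # enforce uniqueness across classes
            codes.append(candidate)
    return codes


def hamming(a, b):
    """Number of positions where two equal-length binary vectors differ."""
    return sum(x != y for x, y in zip(a, b))


def nearest_code(pattern, codes):
    """Return (class_index, distance) for the code closest to `pattern`."""
    distances = [hamming(pattern, code) for code in codes]
    best = min(range(len(codes)), key=distances.__getitem__)
    return best, distances[best]


codes = make_class_codes(n_classes=10, code_len=32)
# A pattern that matches class 3's code exactly sits at distance 0 from it.
print(nearest_code(codes[3], codes))
```

In this sketch a small distance means the pattern sits close to a valid code (high confidence), while a large distance to every code is a signal that the input could be rejected or deferred.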
-
On the Value of ML Models
Authors:
Fabio Casati,
Pierre-André Noël,
Jie Yang
Abstract:
We argue that, when establishing and benchmarking Machine Learning (ML) models, the research community should favour evaluation metrics that better capture the value delivered by their model in practical applications. For a specific class of use cases -- selective classification -- we show that not only can it be simple enough to do, but that it has important consequences and provides insights into what to look for in a "good" ML model.
Submitted 13 December, 2021;
originally announced December 2021.
-
Quantifying dynamical spillover in co-evolving multiplex networks
Authors:
Vikram S. Vijayaraghavan,
Pierre-André Noël,
Zeev Maoz,
Raissa M. D'Souza
Abstract:
Multiplex networks (a system of multiple networks that have different types of links but share a common set of nodes) arise naturally in a wide spectrum of fields. Theoretical studies show that in such multiplex networks, correlated edge dynamics between the layers can have a profound effect on dynamical processes. However, how to extract the correlations from real-world systems is an outstanding challenge. Here we provide a null model based on Markov chains to quantify correlations in edge dynamics found in longitudinal data of multiplex networks. We use this approach on two different data sets: the network of trade and alliances between nation states, and the email and co-commit networks between developers of open source software. We establish the existence of "dynamical spillover" showing the correlated formation (or deletion) of edges of different types as the system evolves. The details of the dynamics over time provide insight into potential causal pathways.
Submitted 18 May, 2015;
originally announced May 2015.
-
Bond percolation on a class of correlated and clustered random graphs
Authors:
Antoine Allard,
Laurent Hébert-Dufresne,
Pierre-André Noël,
Vincent Marceau,
Louis J. Dubé
Abstract:
We introduce a formalism for computing bond percolation properties of a class of correlated and clustered random graphs. This class of graphs is a generalization of the Configuration Model where nodes of different types are connected via different types of hyperedges, edges that can link more than 2 nodes. We argue that the multitype approach coupled with the use of clustered hyperedges can reproduce a wide spectrum of complex patterns, and thus enhances our capability to model real complex networks. As an illustration of this claim, we use our formalism to highlight unusual behaviors of the size and composition of the components (small and giant) in a synthetic, albeit realistic, social network.
Submitted 3 September, 2012; v1 submitted 22 January, 2012;
originally announced January 2012.
-
Exact solution of bond percolation on small arbitrary graphs
Authors:
Antoine Allard,
Laurent Hébert-Dufresne,
Pierre-André Noël,
Vincent Marceau,
Louis J. Dubé
Abstract:
We introduce a set of iterative equations that exactly solves the size distribution of components on small arbitrary graphs after the random removal of edges. We also demonstrate how these equations can be used to predict the distribution of the node partitions (i.e., the constrained distribution of the size of each component) in undirected graphs. Besides opening the way to the theoretical prediction of percolation on arbitrary graphs of large but finite size, we show how our results find application in graph theory, epidemiology, percolation and fragmentation theory.
Submitted 27 April, 2012; v1 submitted 20 January, 2012;
originally announced January 2012.
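To make concrete the quantity this abstract refers to (the size distribution of components after random edge removal), the sketch below computes it by brute-force enumeration of every edge configuration. This is explicitly not the paper's iterative equations: it is a naive baseline whose cost grows as 2^|E|, shown only to pin down what is being computed. The function name and the triangle example are assumptions for illustration.

```python
from itertools import product


def component_size_distribution(n_nodes, edges, p, source=0):
    """Exact distribution of the size of the component containing `source`
    when each edge is kept independently with probability p."""
    dist = [0.0] * (n_nodes + 1)  # dist[s] = P(component size == s)
    for kept in product([False, True], repeat=len(edges)):
        prob = 1.0
        adj = {v: [] for v in range(n_nodes)}
        for (u, v), keep in zip(edges, kept):
            prob *= p if keep else 1.0 - p
            if keep:
                adj[u].append(v)
                adj[v].append(u)
        # Depth-first search over the kept edges, starting from `source`.
        seen = {source}
        stack = [source]
        while stack:
            node = stack.pop()
            for neighbour in adj[node]:
                if neighbour not in seen:
                    seen.add(neighbour)
                    stack.append(neighbour)
        dist[len(seen)] += prob
    return dist


triangle = [(0, 1), (1, 2), (0, 2)]
dist = component_size_distribution(3, triangle, p=0.5)
print(dist)  # index s holds P(component of node 0 has size s)
```

For the triangle with p = 0.5, node 0 is isolated with probability (1 - p)^2 = 0.25, lies in a component of size 2 with probability 2p(1 - p)^2 = 0.25, and belongs to the full graph with probability 3p^2(1 - p) + p^3 = 0.5.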
-
Modeling the dynamical interaction between epidemics on overlay networks
Authors:
Vincent Marceau,
Pierre-André Noël,
Laurent Hébert-Dufresne,
Antoine Allard,
Louis J. Dubé
Abstract:
Epidemics seldom occur as isolated phenomena. Typically, two or more viral agents spread within the same host population and may interact dynamically with each other. We present a general model where two viral agents interact via an immunity mechanism as they propagate simultaneously on two networks connecting the same set of nodes. Exploiting a correspondence between the propagation dynamics and a dynamical process performing progressive network generation, we develop an analytic approach that accurately captures the dynamical interaction between epidemics on overlay networks. The formalism allows for overlay networks with arbitrary joint degree distribution and overlap. To illustrate the versatility of our approach, we consider a hypothetical delayed intervention scenario in which an immunizing agent is disseminated in a host population to hinder the propagation of an undesirable agent (e.g. the spread of preventive information in the context of an emerging infectious disease).
Submitted 24 June, 2011; v1 submitted 21 March, 2011;
originally announced March 2011.
-
Propagation on networks: an exact alternative perspective
Authors:
Pierre-André Noël,
Antoine Allard,
Laurent Hébert-Dufresne,
Vincent Marceau,
Louis J. Dubé
Abstract:
By generating the specifics of a network structure only when needed (on-the-fly), we derive a simple stochastic process that exactly models the time evolution of susceptible-infectious dynamics on finite-size networks. The small number of dynamical variables of this birth-death Markov process greatly simplifies analytical calculations. We show how a dual analytical description, treating large-scale epidemics with a Gaussian approximation and small outbreaks with a branching process, provides an accurate approximation of the distribution even for rather small networks. The approach also offers important computational advantages and generalizes to a vast class of systems.
Submitted 1 March, 2012; v1 submitted 4 February, 2011;
originally announced February 2011.