-
Deep Bayesian Active Learning for Preference Modeling in Large Language Models
Authors:
Luckeciano C. Melo,
Panagiotis Tigas,
Alessandro Abate,
Yarin Gal
Abstract:
Leveraging human preferences for steering the behavior of Large Language Models (LLMs) has demonstrated notable success in recent years. Nonetheless, data selection and labeling are still a bottleneck for these systems, particularly at large scale. Hence, selecting the most informative points for acquiring human feedback may considerably reduce the cost of preference labeling and unleash the further development of LLMs. Bayesian Active Learning provides a principled framework for addressing this challenge and has demonstrated remarkable success in diverse settings. However, previous attempts to employ it for Preference Modeling did not meet such expectations. In this work, we identify that naive epistemic uncertainty estimation leads to the acquisition of redundant samples. We address this by proposing the Bayesian Active Learner for Preference Modeling (BAL-PM), a novel stochastic acquisition policy that not only targets points of high epistemic uncertainty according to the preference model but also seeks to maximize the entropy of the acquired prompt distribution in the feature space spanned by the employed LLM. Notably, our experiments demonstrate that BAL-PM requires 33% to 68% fewer preference labels in two popular human preference datasets and exceeds previous stochastic Bayesian acquisition policies.
Submitted 14 June, 2024;
originally announced June 2024.
-
Challenges and Considerations in the Evaluation of Bayesian Causal Discovery
Authors:
Amir Mohammad Karimi Mamaghan,
Panagiotis Tigas,
Karl Henrik Johansson,
Yarin Gal,
Yashas Annadani,
Stefan Bauer
Abstract:
Representing uncertainty in causal discovery is a crucial component for experimental design, and more broadly, for safe and reliable causal decision making. Bayesian Causal Discovery (BCD) offers a principled approach to encapsulating this uncertainty. Unlike non-Bayesian causal discovery, which relies on a single estimated causal graph and model parameters for assessment, evaluating BCD presents challenges due to the nature of its inferred quantity - the posterior distribution. As a result, the research community has proposed various metrics to assess the quality of the approximate posterior. However, there is, to date, no consensus on the most suitable metric(s) for evaluation. In this work, we reexamine this question by dissecting various metrics and understanding their limitations. Through extensive empirical evaluation, we find that many existing metrics fail to exhibit a strong correlation with the quality of approximation to the true posterior, especially in scenarios with low sample sizes where BCD is most desirable. We highlight the suitability (or lack thereof) of these metrics under two distinct factors: the identifiability of the underlying causal model and the quantity of available data. Both factors affect the entropy of the true posterior, indicating that the current metrics are less fitting in settings of higher entropy. Our findings underline the importance of a more nuanced evaluation of new methods by taking into account the nature of the true posterior, as well as guide and motivate the development of new evaluation procedures for this challenge.
Submitted 5 June, 2024;
originally announced June 2024.
-
Amortized Active Causal Induction with Deep Reinforcement Learning
Authors:
Yashas Annadani,
Panagiotis Tigas,
Stefan Bauer,
Adam Foster
Abstract:
We present Causal Amortized Active Structure Learning (CAASL), an active intervention design policy that selects interventions adaptively and in real time and that does not require access to the likelihood. This policy, an amortized network based on the transformer, is trained with reinforcement learning on a simulator of the design environment, with a reward function that measures how close the true causal graph is to a causal graph posterior inferred from the gathered data. On synthetic data and a single-cell gene expression simulator, we demonstrate empirically that the data acquired through our policy results in a better estimate of the underlying causal graph than alternative strategies. Our design policy successfully achieves amortized intervention design on the distribution of the training environment while also generalizing well to distribution shifts in test-time design environments. Further, our policy also demonstrates excellent zero-shot generalization to design environments with dimensionality higher than that during training, and to intervention types that it has not been trained on.
Submitted 26 May, 2024;
originally announced May 2024.
-
Differentiable Multi-Target Causal Bayesian Experimental Design
Authors:
Yashas Annadani,
Panagiotis Tigas,
Desi R. Ivanova,
Andrew Jesson,
Yarin Gal,
Adam Foster,
Stefan Bauer
Abstract:
We introduce a gradient-based approach for the problem of Bayesian optimal experimental design to learn causal models in a batch setting -- a critical component for causal discovery from finite data where interventions can be costly or risky. Existing methods rely on greedy approximations to construct a batch of experiments while using black-box methods to optimize over a single target-state pair to intervene with. In this work, we completely dispose of the black-box optimization techniques and greedy heuristics and instead propose a conceptually simple end-to-end gradient-based optimization procedure to acquire a set of optimal intervention target-state pairs. Such a procedure enables parameterization of the design space to efficiently optimize over a batch of multi-target-state interventions, a setting which has hitherto not been explored due to its complexity. We demonstrate that our proposed method outperforms baselines and existing acquisition strategies in both single-target and multi-target settings across a number of synthetic datasets.
Submitted 2 June, 2023; v1 submitted 21 February, 2023;
originally announced February 2023.
-
Modelling non-reinforced preferences using selective attention
Authors:
Noor Sajid,
Panagiotis Tigas,
Zafeirios Fountas,
Qinghai Guo,
Alexey Zakharov,
Lancelot Da Costa
Abstract:
How can artificial agents learn non-reinforced preferences to continuously adapt their behaviour to a changing environment? We decompose this question into two challenges: ($i$) encoding diverse memories and ($ii$) selectively attending to these for preference formation. Our proposed \emph{no}n-\emph{re}inforced preference learning mechanism using selective attention, \textsc{Nore}, addresses both by leveraging the agent's world model to collect a diverse set of experiences which are interleaved with imagined roll-outs to encode memories. These memories are selectively attended to, using attention and gating blocks, to update the agent's preferences. We validate \textsc{Nore} in a modified OpenAI Gym FrozenLake environment (without any external signal) with and without volatility under a fixed model of the environment -- and compare its behaviour to \textsc{Pepper}, a Hebbian preference learning mechanism. We demonstrate that \textsc{Nore} provides a straightforward framework to induce exploratory preferences in the absence of external signals.
Submitted 25 July, 2022;
originally announced July 2022.
-
Global geomagnetic perturbation forecasting using Deep Learning
Authors:
Vishal Upendran,
Panagiotis Tigas,
Banafsheh Ferdousi,
Teo Bloch,
Mark C. M. Cheung,
Siddha Ganju,
Asti Bhatt,
Ryan M. McGranaghan,
Yarin Gal
Abstract:
Geomagnetically Induced Currents (GICs) arise from spatio-temporal changes to Earth's magnetic field caused by the interaction of the solar wind with Earth's magnetosphere, and drive catastrophic destruction to our technologically dependent society. Hence, computational models to forecast GICs globally with a large forecast horizon, high spatial resolution and temporal cadence are of increasing importance to perform prompt necessary mitigation. Since GIC data is proprietary, the time variability of the horizontal component of the magnetic field perturbation (dB/dt) is used as a proxy for GICs. In this work, we develop a fast, global dB/dt forecasting model, which forecasts 30 minutes into the future using only solar wind measurements as input. The model summarizes 2 hours of solar wind measurement using a Gated Recurrent Unit, and generates forecasts of coefficients which are folded with a spherical harmonic basis to enable global forecasts. When deployed, our model produces results in under a second, and generates global forecasts for horizontal magnetic perturbation components at 1-minute cadence. We evaluate our model against models in the literature for two specific storms of 5 August 2011 and 17 March 2015, while having a self-consistent benchmark model set. Our model outperforms, or has consistent performance with, state-of-the-practice high time cadence local and low time cadence global models, while also outperforming or having comparable performance with the benchmark models. Such quick inferences at high temporal cadence and arbitrary spatial resolutions may ultimately enable accurate forewarning of dB/dt for any place on Earth, allowing precautionary measures to be taken in an informed manner.
Submitted 12 May, 2022;
originally announced May 2022.
-
Interventions, Where and How? Experimental Design for Causal Models at Scale
Authors:
Panagiotis Tigas,
Yashas Annadani,
Andrew Jesson,
Bernhard Schölkopf,
Yarin Gal,
Stefan Bauer
Abstract:
Causal discovery from observational and interventional data is challenging due to limited data and non-identifiability: factors that introduce uncertainty in estimating the underlying structural causal model (SCM). Selecting experiments (interventions) based on the uncertainty arising from both factors can expedite the identification of the SCM. Existing methods in experimental design for causal discovery from limited data either rely on linear assumptions for the SCM or select only the intervention target. This work incorporates recent advances in Bayesian causal discovery into the Bayesian optimal experimental design framework, allowing for active causal discovery of large, nonlinear SCMs while selecting both the interventional target and the value. We demonstrate the performance of the proposed method on synthetic graphs (Erdős-Rényi, Scale-Free) for both linear and nonlinear SCMs as well as on the \emph{in-silico} single-cell gene regulatory network dataset, DREAM.
Submitted 21 October, 2022; v1 submitted 3 March, 2022;
originally announced March 2022.
-
Causal-BALD: Deep Bayesian Active Learning of Outcomes to Infer Treatment-Effects from Observational Data
Authors:
Andrew Jesson,
Panagiotis Tigas,
Joost van Amersfoort,
Andreas Kirsch,
Uri Shalit,
Yarin Gal
Abstract:
Estimating personalized treatment effects from high-dimensional observational data is essential in situations where experimental designs are infeasible, unethical, or expensive. Existing approaches rely on fitting deep models on outcomes observed for treated and control populations. However, when measuring individual outcomes is costly, as is the case of a tumor biopsy, a sample-efficient strategy for acquiring each result is required. Deep Bayesian active learning provides a framework for efficient data acquisition by selecting points with high uncertainty. However, existing methods bias training data acquisition towards regions of non-overlapping support between the treated and control populations. These are not sample-efficient because the treatment effect is not identifiable in such regions. We introduce causal, Bayesian acquisition functions grounded in information theory that bias data acquisition towards regions with overlapping support to maximize sample efficiency for learning personalized treatment effects. We demonstrate the performance of the proposed acquisition strategies on synthetic and semi-synthetic datasets IHDP and CMNIST and their extensions, which aim to simulate common dataset biases and pathologies.
Submitted 1 February, 2022; v1 submitted 3 November, 2021;
originally announced November 2021.
-
Shifts: A Dataset of Real Distributional Shift Across Multiple Large-Scale Tasks
Authors:
Andrey Malinin,
Neil Band,
Alexander Ganshin,
German Chesnokov,
Yarin Gal,
Mark J. F. Gales,
Alexey Noskov,
Andrey Ploskonosov,
Liudmila Prokhorenkova,
Ivan Provilkov,
Vatsal Raina,
Vyas Raina,
Denis Roginskiy,
Mariya Shmatova,
Panos Tigas,
Boris Yangel
Abstract:
There has been significant research done on developing methods for improving robustness to distributional shift and uncertainty estimation. In contrast, only limited work has examined developing standard datasets and benchmarks for assessing these approaches. Additionally, most work on uncertainty estimation and robustness has developed new techniques based on small-scale regression or image classification tasks. However, many tasks of practical interest have different modalities, such as tabular data, audio, text, or sensor data, which offer significant challenges involving regression and discrete or continuous structured prediction. Thus, given the current state of the field, a standardized large-scale dataset of tasks across a range of modalities affected by distributional shifts is necessary. This will enable researchers to meaningfully evaluate the plethora of recently developed uncertainty quantification methods, as well as assessment criteria and state-of-the-art baselines. In this work, we propose the Shifts Dataset for evaluation of uncertainty estimates and robustness to distributional shift. The dataset, which has been collected from industrial sources and services, is composed of three tasks, with each corresponding to a particular data modality: tabular weather prediction, machine translation, and self-driving car (SDC) vehicle motion prediction. All of these data modalities and tasks are affected by real, "in-the-wild" distributional shifts and pose interesting challenges with respect to uncertainty estimation. In this work we provide a description of the dataset and baseline results for all tasks.
Submitted 11 February, 2022; v1 submitted 15 July, 2021;
originally announced July 2021.
-
Latent Mappings: Generating Open-Ended Expressive Mappings Using Variational Autoencoders
Authors:
Tim Murray-Browne,
Panagiotis Tigas
Abstract:
In many contexts, creating mappings for gestural interactions can form part of an artistic process. Creators seeking a mapping that is expressive, novel, and affords them a sense of authorship may not know how to program it up in a signal processing patch. Tools like Wekinator and MIMIC allow creators to use supervised machine learning to learn mappings from example input/output pairings. However, a creator may know a good mapping when they encounter it yet start with little sense of what the inputs or outputs should be. We call this an open-ended mapping process. Addressing this need, we introduce the latent mapping, which leverages the latent space of an unsupervised machine learning algorithm such as a Variational Autoencoder trained on a corpus of unlabelled gestural data from the creator. We illustrate it with Sonified Body, a system mapping full-body movement to sound which we explore in a residency with three dancers.
Submitted 16 June, 2021;
originally announced June 2021.
-
Exploration and preference satisfaction trade-off in reward-free learning
Authors:
Noor Sajid,
Panagiotis Tigas,
Alexey Zakharov,
Zafeirios Fountas,
Karl Friston
Abstract:
Biological agents have meaningful interactions with their environment despite the absence of immediate reward signals. In such instances, the agent can learn preferred modes of behaviour that lead to predictable states -- necessary for survival. In this paper, we pursue the notion that this learnt behaviour can be a consequence of reward-free preference learning that ensures an appropriate trade-off between exploration and preference satisfaction. For this, we introduce a model-based Bayesian agent equipped with a preference learning mechanism (pepper) using conjugate priors. These conjugate priors are used to augment the expected free energy planner for learning preferences over states (or outcomes) across time. Importantly, our approach enables the agent to learn preferences that encourage adaptive behaviour at test time. We illustrate this in the OpenAI Gym FrozenLake and the 3D mini-world environments -- with and without volatility. Given a constant environment, these agents learn confident (i.e., precise) preferences and act to satisfy them. Conversely, in a volatile setting, perpetual preference uncertainty maintains exploratory behaviour. Our experiments suggest that learnable (reward-free) preferences entail a trade-off between exploration and preference satisfaction. Pepper offers a straightforward framework suitable for designing adaptive agents when reward functions cannot be predefined as in real environments.
Submitted 18 July, 2021; v1 submitted 8 June, 2021;
originally announced June 2021.
-
Global Earth Magnetic Field Modeling and Forecasting with Spherical Harmonics Decomposition
Authors:
Panagiotis Tigas,
Téo Bloch,
Vishal Upendran,
Banafsheh Ferdoushi,
Mark C. M. Cheung,
Siddha Ganju,
Ryan M. McGranaghan,
Yarin Gal,
Asti Bhatt
Abstract:
Modeling and forecasting the solar wind-driven global magnetic field perturbations is an open challenge. Current approaches depend on simulations of computationally demanding models like the Magnetohydrodynamics (MHD) model or sampling spatially and temporally through sparse ground-based stations (SuperMAG). In this paper, we develop a Deep Learning model that forecasts in Spherical Harmonics space, replacing reliance on MHD models and providing global coverage at one-minute cadence, improving over the current state-of-the-art which relies on feature engineering. We evaluate the performance on the SuperMAG dataset (improved by 14.53%) and MHD simulations (improved by 24.35%). Additionally, we evaluate the extrapolation performance of the spherical harmonics reconstruction based on sparse ground-based stations (SuperMAG), showing that spherical harmonics can reliably reconstruct the global magnetic field as evaluated on MHD simulation.
Submitted 2 February, 2021;
originally announced February 2021.
-
Spatial Assembly: Generative Architecture With Reinforcement Learning, Self Play and Tree Search
Authors:
Panagiotis Tigas,
Tyson Hosmer
Abstract:
With this work, we investigate the use of Reinforcement Learning (RL) for the generation of spatial assemblies, by combining ideas from Procedural Generation algorithms (Wave Function Collapse algorithm (WFC)) and RL for Game Solving. WFC is a Generative Design algorithm, inspired by Constraint Solving. In WFC, one defines a set of tiles/blocks and constraints and the algorithm generates an assembly that satisfies these constraints. Casting the problem of generation of spatial assemblies as a Markov Decision Process whose state transitions are defined by WFC, we propose an algorithm that uses Reinforcement Learning and Self-Play to learn a policy that generates assemblies that maximize objectives set by the designer. Finally, we demonstrate the use of our Spatial Assembly algorithm in Architecture Design.
Submitted 19 January, 2021;
originally announced January 2021.
-
Can Autonomous Vehicles Identify, Recover From, and Adapt to Distribution Shifts?
Authors:
Angelos Filos,
Panagiotis Tigas,
Rowan McAllister,
Nicholas Rhinehart,
Sergey Levine,
Yarin Gal
Abstract:
Out-of-training-distribution (OOD) scenarios are a common challenge of learning agents at deployment, typically leading to arbitrary deductions and poorly-informed decisions. In principle, detection of and adaptation to OOD scenes can mitigate their adverse effects. In this paper, we highlight the limitations of current approaches to novel driving scenes and propose an epistemic uncertainty-aware planning method, called \emph{robust imitative planning} (RIP). Our method can detect and recover from some distribution shifts, reducing the overconfident and catastrophic extrapolations in OOD scenes. If the model's uncertainty is too great to suggest a safe course of action, the model can instead query the expert driver for feedback, enabling sample-efficient online adaptation, a variant of our method we term \emph{adaptive robust imitative planning} (AdaRIP). Our methods outperform current state-of-the-art approaches in the nuScenes \emph{prediction} challenge, but since no benchmark evaluating OOD detection and adaptation currently exists to assess \emph{control}, we introduce an autonomous car novel-scene benchmark, \texttt{CARNOVEL}, to evaluate the robustness of driving agents to a suite of tasks with distribution shifts.
Submitted 2 September, 2020; v1 submitted 26 June, 2020;
originally announced June 2020.
-
Percival: Making In-Browser Perceptual Ad Blocking Practical With Deep Learning
Authors:
Zain ul abi Din,
Panagiotis Tigas,
Samuel T. King,
Benjamin Livshits
Abstract:
In this paper we present Percival, a browser-embedded, lightweight, deep learning-powered ad blocker. Percival embeds itself within the browser's image rendering pipeline, which makes it possible to intercept every image obtained during page execution and to perform blocking based on applying machine learning for image classification to flag potential ads. Our implementation inside both Chromium and Brave browsers shows only a minor rendering performance overhead of 4.55%, demonstrating the feasibility of deploying traditionally heavy models (i.e. deep neural networks) inside the critical path of the rendering engine of a browser. We show that our image-based ad blocker can replicate EasyList rules with an accuracy of 96.76%. To show the versatility of Percival's approach, we present case studies that demonstrate that Percival 1) does surprisingly well on ads in languages other than English; and 2) also performs well on blocking first-party Facebook ads, which have presented issues for other ad blockers. Percival proves that image-based perceptual ad blocking is an attractive complement to today's dominant approach of block lists.
Submitted 19 May, 2020; v1 submitted 17 May, 2019;
originally announced May 2019.
-
Real-time jam-session support system
Authors:
Panagiotis Tigas
Abstract:
We propose a method for the problem of real-time chord accompaniment of improvised music. Our implementation can learn an underlying structure of the musical performance and predict the next chord. The system uses a Hidden Markov Model to find the most probable chord sequence for the played melody, and then a Variable Order Markov Model is used to a) learn the structure (if any) and b) predict the next chord. We implemented our system in Java and MAX/Msp and evaluated it using objective (prediction accuracy) and subjective (questionnaire) evaluation methods.
Submitted 27 January, 2012;
originally announced January 2012.