-
Investigating the role of model-based learning in exploration and transfer
Authors:
Jacob Walker,
Eszter Vértes,
Yazhe Li,
Gabriel Dulac-Arnold,
Ankesh Anand,
Théophane Weber,
Jessica B. Hamrick
Abstract:
State-of-the-art reinforcement learning has enabled training agents on tasks of ever-increasing complexity. However, the current paradigm tends to favor training agents from scratch on every new task or on collections of tasks with a view towards generalizing to novel task configurations. The former suffers from poor data efficiency while the latter is difficult when test tasks are out-of-distribution. Agents that can effectively transfer their knowledge about the world pose a potential solution to these issues. In this paper, we investigate transfer learning in the context of model-based agents. Specifically, we aim to understand when exactly environment models have an advantage and why. We find that a model-based approach outperforms controlled model-free baselines for transfer learning. Through ablations, we show that both the policy and dynamics model learnt through exploration matter for successful transfer. We demonstrate our results across three domains which vary in their requirements for transfer: in-distribution procedural (Crafter), in-distribution identical (RoboDesk), and out-of-distribution (Meta-World). Our results show that intrinsic exploration combined with environment models presents a viable direction towards agents that are self-supervised and able to generalize to novel reward functions.
Submitted 8 February, 2023;
originally announced February 2023.
-
Active Acquisition for Multimodal Temporal Data: A Challenging Decision-Making Task
Authors:
Jannik Kossen,
Cătălina Cangea,
Eszter Vértes,
Andrew Jaegle,
Viorica Patraucean,
Ira Ktena,
Nenad Tomasev,
Danielle Belgrave
Abstract:
We introduce a challenging decision-making task that we call active acquisition for multimodal temporal data (A2MT). In many real-world scenarios, input features are not readily available at test time and must instead be acquired at significant cost. With A2MT, we aim to learn agents that actively select which modalities of an input to acquire, trading off acquisition cost and predictive performance. A2MT extends a previous task called active feature acquisition to temporal decision making about high-dimensional inputs. We propose a method based on the Perceiver IO architecture to address A2MT in practice. Our agents are able to solve a novel synthetic scenario requiring practically relevant cross-modal reasoning skills. On two large-scale, real-world datasets, Kinetics-700 and AudioSet, our agents successfully learn cost-reactive acquisition behavior. However, an ablation reveals they are unable to learn adaptive acquisition strategies, emphasizing the difficulty of the task even for state-of-the-art models. Applications of A2MT may be impactful in domains like medicine, robotics, or finance, where modalities differ in acquisition cost and informativeness.
Submitted 3 July, 2023; v1 submitted 9 November, 2022;
originally announced November 2022.
-
Model-Value Inconsistency as a Signal for Epistemic Uncertainty
Authors:
Angelos Filos,
Eszter Vértes,
Zita Marinho,
Gregory Farquhar,
Diana Borsa,
Abram Friesen,
Feryal Behbahani,
Tom Schaul,
André Barreto,
Simon Osindero
Abstract:
Using a model of the environment and a value function, an agent can construct many estimates of a state's value, by unrolling the model for different lengths and bootstrapping with its value function. Our key insight is that one can treat this set of value estimates as a type of ensemble, which we call an \emph{implicit value ensemble} (IVE). Consequently, the discrepancy between these estimates can be used as a proxy for the agent's epistemic uncertainty; we term this signal \emph{model-value inconsistency} or \emph{self-inconsistency} for short. Unlike prior work which estimates uncertainty by training an ensemble of many models and/or value functions, this approach requires only the single model and value function which are already being learned in most model-based reinforcement learning algorithms. We provide empirical evidence in both tabular and function approximation settings from pixels that self-inconsistency is useful (i) as a signal for exploration, (ii) for acting safely under distribution shifts, and (iii) for robustifying value-based planning with a learned model.
Submitted 29 June, 2022; v1 submitted 8 December, 2021;
originally announced December 2021.
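The implicit value ensemble described in the abstract above can be illustrated with a minimal tabular sketch. This is not the authors' implementation: the function names, the fixed-policy transition matrix `P`, the reward vector `r`, and the use of the standard deviation as the discrepancy measure are all assumptions made for illustration. The k-step estimate is obtained by unrolling the model for k steps and bootstrapping with the value function `v`.

```python
def k_step_value(P, r, v, gamma, k):
    """Unroll the (hypothetical) learned model for k steps, then bootstrap with v.

    P[s][t] is the model's transition probability from state s to t under a
    fixed policy, r[s] the model's expected reward in s, v[s] the learned value.
    """
    n = len(v)
    est = list(v)  # k = 0: the value function itself
    for _ in range(k):
        # One application of the model-based Bellman backup.
        est = [r[s] + gamma * sum(P[s][t] * est[t] for t in range(n))
               for s in range(n)]
    return est


def self_inconsistency(P, r, v, gamma, max_k):
    """Per-state standard deviation across the k-step estimates, k = 0..max_k.

    Using the standard deviation as the discrepancy is an assumption of this
    sketch; other spread measures would fit the same description.
    """
    estimates = [k_step_value(P, r, v, gamma, k) for k in range(max_k + 1)]
    result = []
    for s in range(len(v)):
        vals = [e[s] for e in estimates]
        mean = sum(vals) / len(vals)
        result.append((sum((x - mean) ** 2 for x in vals) / len(vals)) ** 0.5)
    return result


# Toy two-state chain: state 0 moves to state 1, state 1 loops on itself.
P = [[0.0, 1.0], [0.0, 1.0]]
r = [0.0, 1.0]
gamma = 0.5

consistent_v = [1.0, 2.0]    # satisfies v = r + gamma * P v exactly
inconsistent_v = [0.0, 0.0]  # disagrees with the model's predictions

sigma_consistent = self_inconsistency(P, r, consistent_v, gamma, max_k=3)
sigma_inconsistent = self_inconsistency(P, r, inconsistent_v, gamma, max_k=3)
```

When the value function agrees with the model, every k-step estimate coincides and the signal is zero; when it does not, the estimates spread out and the signal becomes positive.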
-
Procedural Generalization by Planning with Self-Supervised World Models
Authors:
Ankesh Anand,
Jacob Walker,
Yazhe Li,
Eszter Vértes,
Julian Schrittwieser,
Sherjil Ozair,
Théophane Weber,
Jessica B. Hamrick
Abstract:
One of the key promises of model-based reinforcement learning is the ability to generalize using an internal model of the world to make predictions in novel environments and tasks. However, the generalization ability of model-based agents is not well understood because existing work has focused on model-free agents when benchmarking generalization. Here, we explicitly measure the generalization ability of model-based agents in comparison to their model-free counterparts. We focus our analysis on MuZero (Schrittwieser et al., 2020), a powerful model-based agent, and evaluate its performance on both procedural and task generalization. We identify three factors of procedural generalization -- planning, self-supervised representation learning, and procedural data diversity -- and show that by combining these techniques, we achieve state-of-the-art generalization performance and data efficiency on Procgen (Cobbe et al., 2019). However, we find that these factors do not always provide the same benefits for the task generalization benchmarks in Meta-World (Yu et al., 2019), indicating that transfer remains a challenge and may require different approaches than procedural generalization. Overall, we suggest that building generalizable agents requires moving beyond the single-task, model-free paradigm and towards self-supervised model-based agents that are trained in rich, procedural, multi-task environments.
Submitted 2 November, 2021;
originally announced November 2021.
-
A neurally plausible model learns successor representations in partially observable environments
Authors:
Eszter Vertes,
Maneesh Sahani
Abstract:
Animals need to devise strategies to maximize returns while interacting with their environment based on incoming noisy sensory observations. Task-relevant states, such as the agent's location within an environment or the presence of a predator, are often not directly observable but must be inferred using available sensory information. Successor representations (SR) have been proposed as a middle-ground between model-based and model-free reinforcement learning strategies, allowing for fast value computation and rapid adaptation to changes in the reward function or goal locations. Indeed, recent studies suggest that features of neural responses are consistent with the SR framework. However, it is not clear how such representations might be learned and computed in partially observed, noisy environments. Here, we introduce a neurally plausible model using distributional successor features, which builds on the distributed distributional code for the representation and computation of uncertainty, and which allows for efficient value function computation in partially observed environments via the successor representation. We show that distributional successor features can support reinforcement learning in noisy environments in which direct learning of successful policies is infeasible.
Submitted 22 June, 2019;
originally announced June 2019.
-
Flexible and accurate inference and learning for deep generative models
Authors:
Eszter Vertes,
Maneesh Sahani
Abstract:
We introduce a new approach to learning in hierarchical latent-variable generative models called the "distributed distributional code Helmholtz machine", which emphasises flexibility and accuracy in the inferential process. In common with the original Helmholtz machine and later variational autoencoder algorithms (but unlike adversarial methods), our approach learns an explicit inference or "recognition" model to approximate the posterior distribution over the latent variables. Unlike in these earlier methods, the posterior representation is not limited to a narrow tractable parameterised form (nor is it represented by samples). To train the generative and recognition models we develop an extended wake-sleep algorithm inspired by the original Helmholtz machine. This makes it possible to learn hierarchical latent models with both discrete and continuous variables, where an accurate posterior representation is essential. We demonstrate that the new algorithm outperforms current state-of-the-art methods on synthetic, natural image patch and MNIST data sets.
Submitted 28 May, 2018;
originally announced May 2018.
-
Versatility of nodal affiliation to communities
Authors:
Maxwell Shinn,
Rafael Romero-Garcia,
Jakob Seidlitz,
František Váša,
Petra E. Vértes,
Edward Bullmore
Abstract:
Graph theoretical analysis of the community structure of networks attempts to identify the communities (or modules) to which each node affiliates. However, this is in most cases an ill-posed problem, as the affiliation of a node to a single community is often ambiguous. Previous solutions have attempted to identify all of the communities to which each node affiliates. Instead of taking this approach, we introduce versatility, $V$, as a novel metric of nodal affiliation: $V \sim 0$ means that a node is consistently assigned to a specific community; $V \gg 0$ means it is inconsistently assigned to different communities. Versatility works in conjunction with existing community detection algorithms, and it satisfies many theoretically desirable properties in idealised networks designed to maximise ambiguity of modular decomposition. The local minima of global mean versatility identified the resolution parameters of a hierarchical community detection algorithm that least ambiguously decomposed the community structure of a social (karate club) network and the mouse brain connectome. Our results suggest that nodal versatility is useful in quantifying the inherent ambiguity of modular decomposition.
Submitted 2 January, 2017;
originally announced January 2017.