Showing 1–50 of 63 results for author: Grefenstette, E

  1. arXiv:2402.06782  [pdf, other]

    cs.AI cs.CL

    Debating with More Persuasive LLMs Leads to More Truthful Answers

    Authors: Akbir Khan, John Hughes, Dan Valentine, Laura Ruis, Kshitij Sachan, Ansh Radhakrishnan, Edward Grefenstette, Samuel R. Bowman, Tim Rocktäschel, Ethan Perez

    Abstract: Common methods for aligning large language models (LLMs) with desired behaviour heavily rely on human-labelled data. However, as models grow increasingly sophisticated, they will surpass human expertise, and the role of human evaluation will evolve into non-experts overseeing experts. In anticipation of this, we ask: can weaker models assess the correctness of stronger models? We investigate this…

    Submitted 30 May, 2024; v1 submitted 9 February, 2024; originally announced February 2024.

    Comments: For code please check: https://github.com/ucl-dark/llm_debate

  2. arXiv:2312.12568  [pdf, other]

    cs.AI

    Scaling Opponent Shaping to High Dimensional Games

    Authors: Akbir Khan, Timon Willi, Newton Kwan, Andrea Tacchetti, Chris Lu, Edward Grefenstette, Tim Rocktäschel, Jakob Foerster

    Abstract: In multi-agent settings with mixed incentives, methods developed for zero-sum games have been shown to lead to detrimental outcomes. To address this issue, opponent shaping (OS) methods explicitly learn to influence the learning dynamics of co-players and empirically lead to improved individual and collective outcomes. However, OS methods have only been evaluated in low-dimensional environments du…

    Submitted 10 February, 2024; v1 submitted 19 December, 2023; originally announced December 2023.

  3. arXiv:2312.12564  [pdf, other]

    cs.LG cs.GT cs.MA

    Leading the Pack: N-player Opponent Shaping

    Authors: Alexandra Souly, Timon Willi, Akbir Khan, Robert Kirk, Chris Lu, Edward Grefenstette, Tim Rocktäschel

    Abstract: Reinforcement learning solutions have great success in the 2-player general sum setting. In this setting, the paradigm of Opponent Shaping (OS), in which agents account for the learning of their co-players, has led to agents which are able to avoid collectively bad outcomes, whilst also maximizing their reward. These methods have currently been limited to 2-player game. However, the real world inv…

    Submitted 26 December, 2023; v1 submitted 19 December, 2023; originally announced December 2023.

  4. arXiv:2312.02682  [pdf, other]

    cs.LG cs.AI cs.RO

    H-GAP: Humanoid Control with a Generalist Planner

    Authors: Zhengyao Jiang, Yingchen Xu, Nolan Wagener, Yicheng Luo, Michael Janner, Edward Grefenstette, Tim Rocktäschel, Yuandong Tian

    Abstract: Humanoid control is an important research challenge offering avenues for integration into human-centric infrastructures and enabling physics-driven humanoid animations. The daunting challenges in this field stem from the difficulty of optimizing in high-dimensional action spaces and the instability introduced by the bipedal morphology of humanoids. However, the extensive collection of human motion…

    Submitted 5 December, 2023; originally announced December 2023.

    Comments: 18 pages including appendix, 4 figures

  5. arXiv:2311.12786  [pdf, other]

    cs.LG

    Mechanistically analyzing the effects of fine-tuning on procedurally defined tasks

    Authors: Samyak Jain, Robert Kirk, Ekdeep Singh Lubana, Robert P. Dick, Hidenori Tanaka, Edward Grefenstette, Tim Rocktäschel, David Scott Krueger

    Abstract: Fine-tuning large pre-trained models has become the de facto strategy for developing both task-specific and general-purpose machine learning systems, including developing models that are safe to deploy. Despite its clear importance, there has been minimal work that explains how fine-tuning alters the underlying capabilities learned by a model during pretraining: does fine-tuning yield entirely nov…

    Submitted 21 November, 2023; originally announced November 2023.

  6. arXiv:2311.12716  [pdf, other]

    cs.LG cs.AI

    minimax: Efficient Baselines for Autocurricula in JAX

    Authors: Minqi Jiang, Michael Dennis, Edward Grefenstette, Tim Rocktäschel

    Abstract: Unsupervised environment design (UED) is a form of automatic curriculum learning for training robust decision-making agents to zero-shot transfer into unseen environments. Such autocurricula have received much interest from the RL community. However, UED experiments, based on CPU rollouts and GPU model updates, have often required several weeks of training. This compute requirement is a major obst…

    Submitted 23 November, 2023; v1 submitted 21 November, 2023; originally announced November 2023.

    Comments: Presented at ALOE 2023

  7. arXiv:2310.06452  [pdf, other]

    cs.LG cs.AI cs.CL

    Understanding the Effects of RLHF on LLM Generalisation and Diversity

    Authors: Robert Kirk, Ishita Mediratta, Christoforos Nalmpantis, Jelena Luketina, Eric Hambro, Edward Grefenstette, Roberta Raileanu

    Abstract: Large language models (LLMs) fine-tuned with reinforcement learning from human feedback (RLHF) have been used in some of the most widely deployed AI models to date, such as OpenAI's ChatGPT or Anthropic's Claude. While there has been significant work developing these methods, our understanding of the benefits and downsides of each stage in RLHF is still limited. To fill this gap, we present an ext…

    Submitted 19 February, 2024; v1 submitted 10 October, 2023; originally announced October 2023.

    Comments: Code available here: https://github.com/facebookresearch/rlfh-gen-div

  8. arXiv:2303.17396  [pdf, other]

    cs.LG

    Finetuning from Offline Reinforcement Learning: Challenges, Trade-offs and Practical Solutions

    Authors: Yicheng Luo, Jackie Kay, Edward Grefenstette, Marc Peter Deisenroth

    Abstract: Offline reinforcement learning (RL) allows for the training of competent agents from offline datasets without any interaction with the environment. Online finetuning of such offline models can further improve performance. But how should we ideally finetune agents obtained from offline RL training? While offline RL algorithms can in principle be used for finetuning, in practice, their online perfor…

    Submitted 30 March, 2023; originally announced March 2023.

    Comments: An abstract of this paper was accepted at RLDM 2022

  9. arXiv:2303.13971  [pdf, other]

    cs.LG

    Optimal Transport for Offline Imitation Learning

    Authors: Yicheng Luo, Zhengyao Jiang, Samuel Cohen, Edward Grefenstette, Marc Peter Deisenroth

    Abstract: With the advent of large datasets, offline reinforcement learning (RL) is a promising framework for learning good decision-making policies without the need to interact with the real environment. However, offline RL requires the dataset to be reward-annotated, which presents practical challenges when reward engineering is difficult or when obtaining reward annotations is labor-intensive. In this pa…

    Submitted 24 March, 2023; originally announced March 2023.

    Comments: Published in ICLR 2023

  10. arXiv:2211.07819  [pdf, other]

    cs.AI cs.LG

    General Intelligence Requires Rethinking Exploration

    Authors: Minqi Jiang, Tim Rocktäschel, Edward Grefenstette

    Abstract: We are at the cusp of a transition from "learning from data" to "learning what data to learn from" as a central focus of artificial intelligence (AI) research. While the first-order learning problem is not completely solved, large models under unified architectures, such as transformers, have shifted the learning bottleneck from how to effectively train our models to how to effectively acquire and…

    Submitted 14 November, 2022; originally announced November 2022.

  11. arXiv:2210.14986  [pdf, other]

    cs.CL

    The Goldilocks of Pragmatic Understanding: Fine-Tuning Strategy Matters for Implicature Resolution by LLMs

    Authors: Laura Ruis, Akbir Khan, Stella Biderman, Sara Hooker, Tim Rocktäschel, Edward Grefenstette

    Abstract: Despite widespread use of LLMs as conversational agents, evaluations of performance fail to capture a crucial aspect of communication: interpreting language in context -- incorporating its pragmatics. Humans interpret language using beliefs and prior knowledge about the world. For example, we intuitively understand the response "I wore gloves" to the question "Did you leave fingerprints?" as meani…

    Submitted 3 December, 2023; v1 submitted 26 October, 2022; originally announced October 2022.

    Comments: Accepted as Spotlight at NeurIPS 2023

  12. arXiv:2210.12719  [pdf, other]

    cs.LG cs.AI

    Learning General World Models in a Handful of Reward-Free Deployments

    Authors: Yingchen Xu, Jack Parker-Holder, Aldo Pacchiano, Philip J. Ball, Oleh Rybkin, Stephen J. Roberts, Tim Rocktäschel, Edward Grefenstette

    Abstract: Building generally capable agents is a grand challenge for deep reinforcement learning (RL). To approach this challenge practically, we outline two key desiderata: 1) to facilitate generalization, exploration should be task agnostic; 2) to facilitate scalability, exploration policies should collect large quantities of data without costly centralized retraining. Combining these two properties, we i…

    Submitted 23 October, 2022; originally announced October 2022.

    Comments: To be published at NeurIPS 2022. Code and videos available at https://ycxuyingchen.github.io/cascade/

  13. arXiv:2210.00066  [pdf, other]

    cs.LG cs.AI cs.CL

    Improving Policy Learning via Language Dynamics Distillation

    Authors: Victor Zhong, Jesse Mu, Luke Zettlemoyer, Edward Grefenstette, Tim Rocktäschel

    Abstract: Recent work has shown that augmenting environments with language descriptions improves policy learning. However, for environments with complex language abstractions, learning how to ground language to observations is difficult due to sparse, delayed rewards. We propose Language Dynamics Distillation (LDD), which pretrains a model to predict environment dynamics given demonstrations with language d…

    Submitted 30 September, 2022; originally announced October 2022.

    Comments: Accepted to NeurIPS 2022. 16 pages, 12 figures

  14. arXiv:2208.10291  [pdf, other]

    cs.LG

    Efficient Planning in a Compact Latent Action Space

    Authors: Zhengyao Jiang, Tianjun Zhang, Michael Janner, Yueying Li, Tim Rocktäschel, Edward Grefenstette, Yuandong Tian

    Abstract: Planning-based reinforcement learning has shown strong performance in tasks in discrete and low-dimensional continuous action spaces. However, planning usually brings significant computational overhead for decision-making, and scaling such methods to high-dimensional action spaces remains challenging. To advance efficient planning for high-dimensional continuous control, we propose Trajectory Auto…

    Submitted 24 January, 2023; v1 submitted 22 August, 2022; originally announced August 2022.

    Comments: Accepted by ICLR2023. Code available at https://github.com/ZhengyaoJiang/latentplan

  15. arXiv:2207.11584  [pdf, other]

    cs.LG cs.AI

    Hierarchical Kickstarting for Skill Transfer in Reinforcement Learning

    Authors: Michael Matthews, Mikayel Samvelyan, Jack Parker-Holder, Edward Grefenstette, Tim Rocktäschel

    Abstract: Practising and honing skills forms a fundamental component of how humans learn, yet artificial agents are rarely specifically trained to perform them. Instead, they are usually trained end-to-end, with the hope being that useful skills will be implicitly learned in order to maximise discounted return of some extrinsic reward function. In this paper, we investigate how skills can be incorporated in…

    Submitted 15 August, 2022; v1 submitted 23 July, 2022; originally announced July 2022.

    Comments: 19 pages, 12 figures, to be published in the Conference on Lifelong Learning Agents 2022

  16. arXiv:2207.05219  [pdf, other]

    cs.LG cs.AI stat.ML

    Grounding Aleatoric Uncertainty for Unsupervised Environment Design

    Authors: Minqi Jiang, Michael Dennis, Jack Parker-Holder, Andrei Lupu, Heinrich Küttler, Edward Grefenstette, Tim Rocktäschel, Jakob Foerster

    Abstract: Adaptive curricula in reinforcement learning (RL) have proven effective for producing policies robust to discrepancies between the train and test environment. Recently, the Unsupervised Environment Design (UED) framework generalized RL curricula to generating sequences of entire environments, leading to new methods with robust minimax regret properties. Problematically, in partially-observable or…

    Submitted 24 October, 2022; v1 submitted 11 July, 2022; originally announced July 2022.

    Comments: NeurIPS 2022

  17. arXiv:2205.15824  [pdf, other]

    cs.LG

    Graph Backup: Data Efficient Backup Exploiting Markovian Transitions

    Authors: Zhengyao Jiang, Tianjun Zhang, Robert Kirk, Tim Rocktäschel, Edward Grefenstette

    Abstract: The successes of deep Reinforcement Learning (RL) are limited to settings where we have a large stream of online experiences, but applying RL in the data-efficient setting with limited access to online interactions is still challenging. A key to data-efficient RL is good value estimation, but current methods in this space fail to fully utilise the structure of the trajectory data gathered from the…

    Submitted 31 May, 2022; originally announced May 2022.

  18. arXiv:2203.11889  [pdf, other]

    cs.LG cs.AI cs.NE cs.SC stat.ML

    Insights From the NeurIPS 2021 NetHack Challenge

    Authors: Eric Hambro, Sharada Mohanty, Dmitrii Babaev, Minwoo Byeon, Dipam Chakraborty, Edward Grefenstette, Minqi Jiang, Daejin Jo, Anssi Kanervisto, Jongmin Kim, Sungwoong Kim, Robert Kirk, Vitaly Kurin, Heinrich Küttler, Taehwon Kwon, Donghoon Lee, Vegard Mella, Nantas Nardelli, Ivan Nazarov, Nikita Ovsov, Jack Parker-Holder, Roberta Raileanu, Karolis Ramanauskas, Tim Rocktäschel, Danielle Rothermel, et al. (4 additional authors not shown)

    Abstract: In this report, we summarize the takeaways from the first NeurIPS 2021 NetHack Challenge. Participants were tasked with developing a program or agent that can win (i.e., 'ascend' in) the popular dungeon-crawler game of NetHack by interacting with the NetHack Learning Environment (NLE), a scalable, procedurally generated, and challenging Gym environment for reinforcement learning (RL). The challeng…

    Submitted 22 March, 2022; originally announced March 2022.

    Comments: Under review at PMLR for the NeurIPS 2021 Competition Workshop Track, 10 pages + 10 in appendices

  19. arXiv:2203.01302  [pdf, other]

    cs.LG

    Evolving Curricula with Regret-Based Environment Design

    Authors: Jack Parker-Holder, Minqi Jiang, Michael Dennis, Mikayel Samvelyan, Jakob Foerster, Edward Grefenstette, Tim Rocktäschel

    Abstract: It remains a significant challenge to train generally capable agents with reinforcement learning (RL). A promising avenue for improving the robustness of RL agents is through the use of curricula. One such class of methods frames environment design as a game between a student and a teacher, using regret-based objectives to produce environment instantiations (or levels) at the frontier of the stude…

    Submitted 30 September, 2023; v1 submitted 2 March, 2022; originally announced March 2022.

    Comments: First two authors contributed equally

  20. arXiv:2202.08938  [pdf, other]

    cs.LG cs.AI cs.CL

    Improving Intrinsic Exploration with Language Abstractions

    Authors: Jesse Mu, Victor Zhong, Roberta Raileanu, Minqi Jiang, Noah Goodman, Tim Rocktäschel, Edward Grefenstette

    Abstract: Reinforcement learning (RL) agents are particularly hard to train when rewards are sparse. One common solution is to use intrinsic rewards to encourage agents to explore their environment. However, recent intrinsic exploration methods often use state-based novelty measures which reward low-level exploration and may not scale to domains requiring more abstract skills. Instead, we explore natural la…

    Submitted 21 November, 2022; v1 submitted 17 February, 2022; originally announced February 2022.

    Comments: NeurIPS 2022

  21. A Survey of Zero-shot Generalisation in Deep Reinforcement Learning

    Authors: Robert Kirk, Amy Zhang, Edward Grefenstette, Tim Rocktäschel

    Abstract: The study of zero-shot generalisation (ZSG) in deep Reinforcement Learning (RL) aims to produce RL algorithms whose policies generalise well to novel unseen situations at deployment time, avoiding overfitting to their training environments. Tackling this is vital if we are to deploy reinforcement learning algorithms in real world scenarios, where the environment will be diverse, dynamic and unpred…

    Submitted 19 January, 2023; v1 submitted 18 November, 2021; originally announced November 2021.

    Comments: JAIR version. Added formal definitions of ZSPT and related concepts, JAIR formatting, other small rewrites; https://www.jair.org/index.php/jair/article/view/14174

    Journal ref: Journal of Artificial Intelligence Research (JAIR), 76:201-264, 2023

  22. arXiv:2110.02439  [pdf, other]

    cs.LG cs.AI

    Replay-Guided Adversarial Environment Design

    Authors: Minqi Jiang, Michael Dennis, Jack Parker-Holder, Jakob Foerster, Edward Grefenstette, Tim Rocktäschel

    Abstract: Deep reinforcement learning (RL) agents may successfully generalize to new settings if trained on an appropriately diverse set of environment and task configurations. Unsupervised Environment Design (UED) is a promising self-supervised RL paradigm, wherein the free parameters of an underspecified environment are automatically adapted during training to the agent's capabilities, leading to the emer…

    Submitted 13 January, 2022; v1 submitted 5 October, 2021; originally announced October 2021.

    Comments: NeurIPS 2021

  23. arXiv:2109.13202  [pdf, other]

    cs.LG stat.ML

    MiniHack the Planet: A Sandbox for Open-Ended Reinforcement Learning Research

    Authors: Mikayel Samvelyan, Robert Kirk, Vitaly Kurin, Jack Parker-Holder, Minqi Jiang, Eric Hambro, Fabio Petroni, Heinrich Küttler, Edward Grefenstette, Tim Rocktäschel

    Abstract: Progress in deep reinforcement learning (RL) is heavily driven by the availability of challenging benchmarks used for training agents. However, benchmarks that are widely adopted by the community are not explicitly designed for evaluating specific capabilities of RL methods. While there exist environments for assessing particular open problems in RL (such as exploration, transfer learning, unsuper…

    Submitted 16 November, 2021; v1 submitted 27 September, 2021; originally announced September 2021.

    Comments: NeurIPS 2021: Datasets and Benchmarks Track

  24. arXiv:2010.03934  [pdf, other]

    cs.LG cs.AI

    Prioritized Level Replay

    Authors: Minqi Jiang, Edward Grefenstette, Tim Rocktäschel

    Abstract: Environments with procedurally generated content serve as important benchmarks for testing systematic generalization in deep reinforcement learning. In this setting, each level is an algorithmically created environment instance with a unique configuration of its factors of variation. Training on a prespecified subset of levels allows for testing generalization to unseen levels. What can be learned…

    Submitted 12 June, 2021; v1 submitted 8 October, 2020; originally announced October 2020.

  25. arXiv:2007.06477  [pdf, other]

    cs.AI cs.CL cs.LG cs.NE cs.SC

    Learning Reasoning Strategies in End-to-End Differentiable Proving

    Authors: Pasquale Minervini, Sebastian Riedel, Pontus Stenetorp, Edward Grefenstette, Tim Rocktäschel

    Abstract: Attempts to render deep learning models interpretable, data-efficient, and robust have seen some success through hybridisation with rule-based systems, for example, in Neural Theorem Provers (NTPs). These neuro-symbolic models can induce interpretable rules and learn representations from data via back-propagation, while providing logical explanations for their predictions. However, they are restri…

    Submitted 24 August, 2020; v1 submitted 13 July, 2020; originally announced July 2020.

    Comments: Proceedings of the 37th International Conference on Machine Learning (ICML 2020)

  26. arXiv:2006.13760  [pdf, other]

    cs.LG cs.AI cs.CL cs.NE stat.ML

    The NetHack Learning Environment

    Authors: Heinrich Küttler, Nantas Nardelli, Alexander H. Miller, Roberta Raileanu, Marco Selvatici, Edward Grefenstette, Tim Rocktäschel

    Abstract: Progress in Reinforcement Learning (RL) algorithms goes hand-in-hand with the development of challenging environments that test the limits of current methods. While existing RL environments are either sufficiently complex or based on fast simulation, they are rarely both. Here, we present the NetHack Learning Environment (NLE), a scalable, procedurally generated, stochastic, rich, and challenging…

    Submitted 1 December, 2020; v1 submitted 24 June, 2020; originally announced June 2020.

    Comments: 28 pages. Accepted at NeurIPS 2020

  27. arXiv:2006.12122  [pdf, other]

    cs.LG cs.AI stat.ML

    Learning with AMIGo: Adversarially Motivated Intrinsic Goals

    Authors: Andres Campero, Roberta Raileanu, Heinrich Küttler, Joshua B. Tenenbaum, Tim Rocktäschel, Edward Grefenstette

    Abstract: A key challenge for reinforcement learning (RL) consists of learning in environments with sparse extrinsic rewards. In contrast to current RL methods, humans are able to learn new skills with little or no reward by using various forms of intrinsic motivation. We propose AMIGo, a novel agent incorporating -- as form of meta-learning -- a goal-generating teacher that proposes Adversarially Motivated…

    Submitted 23 February, 2021; v1 submitted 22 June, 2020; originally announced June 2020.

    Comments: 18 pages, 6 figures, published at The Ninth International Conference on Learning Representations (2021)

  28. arXiv:1912.10824  [pdf, other]

    cs.LG cs.CL cs.LO

    Differentiable Reasoning on Large Knowledge Bases and Natural Language

    Authors: Pasquale Minervini, Matko Bošnjak, Tim Rocktäschel, Sebastian Riedel, Edward Grefenstette

    Abstract: Reasoning with knowledge expressed in natural language and Knowledge Bases (KBs) is a major challenge for Artificial Intelligence, with applications in machine reading, dialogue, and question answering. General neural architectures that jointly learn representations and transformations of text are very data-inefficient, and it is hard to analyse their reasoning process. These issues are addressed…

    Submitted 17 December, 2019; originally announced December 2019.

    Comments: Accepted at the 34th AAAI Conference on Artificial Intelligence (AAAI-20)

  29. arXiv:1910.08210  [pdf, other]

    cs.CL cs.AI cs.LG

    RTFM: Generalising to Novel Environment Dynamics via Reading

    Authors: Victor Zhong, Tim Rocktäschel, Edward Grefenstette

    Abstract: Obtaining policies that can generalise to new environments in reinforcement learning is challenging. In this work, we demonstrate that language understanding via a reading policy learner is a promising vehicle for generalisation to new environments. We propose a grounded policy learning problem, Read to Fight Monsters (RTFM), in which the agent must jointly reason over a language goal, relevant dy…

    Submitted 1 February, 2021; v1 submitted 17 October, 2019; originally announced October 2019.

    Comments: ICLR 2020; 17 pages, 13 figures

  30. arXiv:1910.03552  [pdf, other]

    cs.LG stat.ML

    TorchBeast: A PyTorch Platform for Distributed RL

    Authors: Heinrich Küttler, Nantas Nardelli, Thibaut Lavril, Marco Selvatici, Viswanath Sivakumar, Tim Rocktäschel, Edward Grefenstette

    Abstract: TorchBeast is a platform for reinforcement learning (RL) research in PyTorch. It implements a version of the popular IMPALA algorithm for fast, asynchronous, parallel training of RL agents. Additionally, TorchBeast has simplicity as an explicit design goal: We provide both a pure-Python implementation ("MonoBeast") as well as a multi-machine high-performance version ("PolyBeast"). In the latter, p…

    Submitted 8 October, 2019; originally announced October 2019.

  31. arXiv:1910.01727  [pdf, other]

    cs.LG stat.ML

    Generalized Inner Loop Meta-Learning

    Authors: Edward Grefenstette, Brandon Amos, Denis Yarats, Phu Mon Htut, Artem Molchanov, Franziska Meier, Douwe Kiela, Kyunghyun Cho, Soumith Chintala

    Abstract: Many (but not all) approaches self-qualifying as "meta-learning" in deep learning and reinforcement learning fit a common pattern of approximating the solution to a nested optimization problem. In this paper, we give a formalization of this shared pattern, which we call GIMLI, prove its general requirements, and derive a general-purpose algorithm for implementing similar approaches. Based on this…

    Submitted 7 October, 2019; v1 submitted 3 October, 2019; originally announced October 2019.

    Comments: 17 pages, 3 figures, 1 algorithm

  32. arXiv:1906.05374  [pdf, other]

    cs.LG cs.AI cs.RO stat.ML

    Meta-Learning via Learned Loss

    Authors: Sarah Bechtle, Artem Molchanov, Yevgen Chebotar, Edward Grefenstette, Ludovic Righetti, Gaurav Sukhatme, Franziska Meier

    Abstract: Typically, loss functions, regularization mechanisms and other important aspects of training parametric models are chosen heuristically from a limited set of options. In this paper, we take the first step towards automating this process, with the view of producing models which train faster and more robustly. Concretely, we present a meta-learning method for learning parametric loss functions that…

    Submitted 19 January, 2021; v1 submitted 12 June, 2019; originally announced June 2019.

    Comments: Project website with code and video at https://sites.google.com/view/mlthree

  33. arXiv:1906.03926  [pdf, other]

    cs.LG cs.AI cs.CL stat.ML

    A Survey of Reinforcement Learning Informed by Natural Language

    Authors: Jelena Luketina, Nantas Nardelli, Gregory Farquhar, Jakob Foerster, Jacob Andreas, Edward Grefenstette, Shimon Whiteson, Tim Rocktäschel

    Abstract: To be successful in real-world tasks, Reinforcement Learning (RL) needs to exploit the compositional, relational, and hierarchical structure of the world, and learn to transfer it to the task at hand. Recent advances in representation learning for language make it possible to build models that acquire world knowledge from text corpora and integrate this knowledge into downstream decision making pr…

    Submitted 10 June, 2019; originally announced June 2019.

    Comments: Published at IJCAI'19

  34. arXiv:1904.12004  [pdf, other]

    cs.LG cs.AI stat.ML

    Knowing When to Stop: Evaluation and Verification of Conformity to Output-size Specifications

    Authors: Chenglong Wang, Rudy Bunel, Krishnamurthy Dvijotham, Po-Sen Huang, Edward Grefenstette, Pushmeet Kohli

    Abstract: Models such as Sequence-to-Sequence and Image-to-Sequence are widely used in real world applications. While the ability of these neural architectures to produce variable-length outputs makes them extremely effective for problems like Machine Translation and Image Captioning, it also leaves them vulnerable to failures of the form where the model produces outputs of undesirable length. This behavior…

    Submitted 26 April, 2019; originally announced April 2019.

  35. arXiv:1904.01557  [pdf, other]

    cs.LG stat.ML

    Analysing Mathematical Reasoning Abilities of Neural Models

    Authors: David Saxton, Edward Grefenstette, Felix Hill, Pushmeet Kohli

    Abstract: Mathematical reasoning---a core ability within human intelligence---presents some unique challenges as a domain: we do not come to understand and solve mathematical problems primarily on the back of experience and evidence, but on the basis of inferring, learning, and exploiting laws, axioms, and symbol manipulation rules. In this paper, we present a new challenge for the evaluation (and eventuall…

    Submitted 2 April, 2019; originally announced April 2019.

  36. arXiv:1812.01483  [pdf, other]

    stat.ML cs.LG

    CompILE: Compositional Imitation Learning and Execution

    Authors: Thomas Kipf, Yujia Li, Hanjun Dai, Vinicius Zambaldi, Alvaro Sanchez-Gonzalez, Edward Grefenstette, Pushmeet Kohli, Peter Battaglia

    Abstract: We introduce Compositional Imitation Learning and Execution (CompILE): a framework for learning reusable, variable-length segments of hierarchically-structured behavior from demonstration data. CompILE uses a novel unsupervised, fully-differentiable sequence segmentation module to learn latent encodings of sequential data that can be re-composed and executed to perform new tasks. Once trained, our…

    Submitted 14 May, 2019; v1 submitted 4 December, 2018; originally announced December 2018.

    Comments: ICML (2019)

  37. arXiv:1811.09300  [pdf, other]

    cs.NE cs.CR cs.LG

    Strength in Numbers: Trading-off Robustness and Computation via Adversarially-Trained Ensembles

    Authors: Edward Grefenstette, Robert Stanforth, Brendan O'Donoghue, Jonathan Uesato, Grzegorz Swirszcz, Pushmeet Kohli

    Abstract: While deep learning has led to remarkable results on a number of challenging problems, researchers have discovered a vulnerability of neural networks in adversarial settings, where small but carefully chosen perturbations to the input can make the models produce extremely inaccurate outputs. This makes these models particularly unsuitable for safety-critical application domains (e.g. self-driving…

    Submitted 22 November, 2018; originally announced November 2018.

    Comments: 12 pages

  38. arXiv:1806.01946  [pdf, other]

    cs.AI cs.LG

    Learning to Understand Goal Specifications by Modelling Reward

    Authors: Dzmitry Bahdanau, Felix Hill, Jan Leike, Edward Hughes, Arian Hosseini, Pushmeet Kohli, Edward Grefenstette

    Abstract: Recent work has shown that deep reinforcement-learning agents can learn to follow language-like instructions from infrequent environment rewards. However, this places on environment designers the onus of designing language-conditional reward functions which may not be easily or tractably implemented as the complexity of the environment and the language scales. To overcome this limitation, we prese…

    Submitted 23 December, 2019; v1 submitted 5 June, 2018; originally announced June 2018.

    Comments: 19 pages, 9 figures

  39. arXiv:1802.08535  [pdf, other]

    cs.NE cs.AI

    Can Neural Networks Understand Logical Entailment?

    Authors: Richard Evans, David Saxton, David Amos, Pushmeet Kohli, Edward Grefenstette

    Abstract: We introduce a new dataset of logical entailments for the purpose of measuring models' ability to capture and exploit the structure of logical expressions against an entailment prediction task. We use this task to compare a series of architectures which are ubiquitous in the sequence-processing literature, in addition to a new model class---PossibleWorldNets---which computes entailment as a "convo…

    Submitted 23 February, 2018; originally announced February 2018.

    Comments: Published at ICLR 2018 (main conference)

  40. arXiv:1712.07040  [pdf, other]

    cs.CL cs.AI cs.NE

    The NarrativeQA Reading Comprehension Challenge

    Authors: Tomáš Kočiský, Jonathan Schwarz, Phil Blunsom, Chris Dyer, Karl Moritz Hermann, Gábor Melis, Edward Grefenstette

    Abstract: Reading comprehension (RC)---in contrast to information retrieval---requires integrating information and reasoning about events, entities, and their relations across a full document. Question answering is conventionally used to assess RC ability, in both artificial agents and children learning to read. However, existing RC datasets and tasks are dominated by questions that can be solved by selecti…

    Submitted 19 December, 2017; originally announced December 2017.

  41. arXiv:1711.04574  [pdf, other]

    cs.NE math.LO

    Learning Explanatory Rules from Noisy Data

    Authors: Richard Evans, Edward Grefenstette

    Abstract: Artificial Neural Networks are powerful function approximators capable of modelling solutions to a wide variety of problems, both supervised and unsupervised. As their size and expressivity increases, so too does the variance of the model, yielding a nearly ubiquitous overfitting problem. Although mitigated by a variety of model regularisation methods, the common cure is to seek large amounts of t…

    Submitted 25 January, 2018; v1 submitted 13 November, 2017; originally announced November 2017.

    Comments: 64 pages, to appear in Journal of Artificial Intelligence Research (Special Track on Deep Learning, Knowledge Representation, and Reasoning)

  42. arXiv:1706.00359  [pdf, other]

    cs.CL cs.AI cs.IR cs.LG

    Discovering Discrete Latent Topics with Neural Variational Inference

    Authors: Yishu Miao, Edward Grefenstette, Phil Blunsom

    Abstract: Topic models have been widely explored as probabilistic generative models of documents. Traditional inference methods have sought closed-form derivations for updating the models, however as the expressiveness of these models grows, so does the difficulty of performing fast and accurate inference over their parameters. This paper presents alternative neural approaches to topic modelling by providin…

    Submitted 21 May, 2018; v1 submitted 1 June, 2017; originally announced June 2017.

    Comments: ICML 2017

  43. arXiv:1706.00286  [pdf, other]

    cs.LG cs.CL

    Learning to Compute Word Embeddings On the Fly

    Authors: Dzmitry Bahdanau, Tom Bosc, Stanisław Jastrzębski, Edward Grefenstette, Pascal Vincent, Yoshua Bengio

    Abstract: Words in natural language follow a Zipfian distribution whereby some words are frequent but most are rare. Learning representations for words in the "long tail" of this distribution requires enormous amounts of data. Representations of rare words trained directly on end tasks are usually poor, requiring us to pre-train embeddings on external data, or treat all rare words as out-of-vocabulary words…

    Submitted 7 March, 2018; v1 submitted 1 June, 2017; originally announced June 2017.

  44. arXiv:1611.09100  [pdf, other]

    cs.CL

    Learning to Compose Words into Sentences with Reinforcement Learning

    Authors: Dani Yogatama, Phil Blunsom, Chris Dyer, Edward Grefenstette, Wang Ling

    Abstract: We use reinforcement learning to learn tree-structured neural networks for computing representations of natural language sentences. In contrast with prior work on tree-structured models in which the trees are either provided as input or predicted using supervision from explicit treebank annotations, the tree structures in this work are optimized to improve performance on a downstream task. Experim…

    Submitted 28 November, 2016; originally announced November 2016.

  45. arXiv:1611.02554  [pdf, ps, other]

    cs.CL cs.AI cs.NE

    The Neural Noisy Channel

    Authors: Lei Yu, Phil Blunsom, Chris Dyer, Edward Grefenstette, Tomas Kocisky

    Abstract: We formulate sequence to sequence transduction as a noisy channel decoding problem and use recurrent neural networks to parameterise the source and channel models. Unlike direct models which can suffer from explaining-away effects during training, noisy channel models must produce outputs that explain their inputs, and their component models can be trained with not only paired training samples but…

    Submitted 6 March, 2017; v1 submitted 8 November, 2016; originally announced November 2016.

    Comments: ICLR 2017

  46. arXiv:1609.09315  [pdf, other]

    cs.CL cs.AI cs.NE

    Semantic Parsing with Semi-Supervised Sequential Autoencoders

    Authors: Tomáš Kočiský, Gábor Melis, Edward Grefenstette, Chris Dyer, Wang Ling, Phil Blunsom, Karl Moritz Hermann

    Abstract: We present a novel semi-supervised approach for sequence transduction and apply it to semantic parsing. The unsupervised component is based on a generative model in which latent sentences generate the unpaired logical forms. We apply this method to a number of semantic parsing tasks focusing on domains with limited access to labelled training data and extend those datasets with synthetically gener…

    Submitted 29 September, 2016; originally announced September 2016.

  47. arXiv:1603.06744  [pdf, other]

    cs.CL cs.NE

    Latent Predictor Networks for Code Generation

    Authors: Wang Ling, Edward Grefenstette, Karl Moritz Hermann, Tomáš Kočiský, Andrew Senior, Fumin Wang, Phil Blunsom

    Abstract: Many language generation tasks require the production of text conditioned on both structured and unstructured inputs. We present a novel neural network architecture which generates an output sequence conditioned on an arbitrary number of input functions. Crucially, our approach allows both the choice of conditioning context and the granularity of generation, for example characters or tokens, to be…

    Submitted 8 June, 2016; v1 submitted 22 March, 2016; originally announced March 2016.

  48. arXiv:1509.06664  [pdf, other]

    cs.CL cs.AI cs.LG cs.NE

    Reasoning about Entailment with Neural Attention

    Authors: Tim Rocktäschel, Edward Grefenstette, Karl Moritz Hermann, Tomáš Kočiský, Phil Blunsom

    Abstract: While most approaches to automatically recognizing entailment relations have used classifiers employing hand engineered features derived from complex natural language processing pipelines, in practice their performance has been only slightly better than bag-of-word pair classifiers using only lexical similarity. The only attempt so far to build an end-to-end differentiable neural network for entai…

    Submitted 1 March, 2016; v1 submitted 22 September, 2015; originally announced September 2015.

    Comments: ICLR 2016 camera-ready, 9 pages, 10 figures (incl. subfigures)

    MSC Class: 68T50; ACM Class: I.2.6; I.2.7

  49. arXiv:1506.03340  [pdf, other]

    cs.CL cs.AI cs.NE

    Teaching Machines to Read and Comprehend

    Authors: Karl Moritz Hermann, Tomáš Kočiský, Edward Grefenstette, Lasse Espeholt, Will Kay, Mustafa Suleyman, Phil Blunsom

    Abstract: Teaching machines to read natural language documents remains an elusive challenge. Machine reading systems can be tested on their ability to answer questions posed on the contents of documents that they have seen, but until now large scale training and test datasets have been missing for this type of evaluation. In this work we define a new methodology that resolves this bottleneck and provides la…

    Submitted 19 November, 2015; v1 submitted 10 June, 2015; originally announced June 2015.

    Comments: Appears in: Advances in Neural Information Processing Systems 28 (NIPS 2015). 14 pages, 13 figures

  50. arXiv:1506.02516  [pdf, other]

    cs.NE cs.CL cs.LG

    Learning to Transduce with Unbounded Memory

    Authors: Edward Grefenstette, Karl Moritz Hermann, Mustafa Suleyman, Phil Blunsom

    Abstract: Recently, strong results have been demonstrated by Deep Recurrent Neural Networks on natural language transduction problems. In this paper we explore the representational power of these models using synthetic grammars designed to exhibit phenomena similar to those found in real transduction problems such as machine translation. These experiments lead us to propose new memory-based recurrent networ…

    Submitted 3 November, 2015; v1 submitted 8 June, 2015; originally announced June 2015.

    Comments: 14 pages, 4 figures, NIPS 2015

    MSC Class: 68T05; ACM Class: I.5.1; I.2.6; I.2.7