Showing 1–8 of 8 results for author: Luketina, J

  1. arXiv:2310.06452  [pdf, other]

    cs.LG cs.AI cs.CL

    Understanding the Effects of RLHF on LLM Generalisation and Diversity

    Authors: Robert Kirk, Ishita Mediratta, Christoforos Nalmpantis, Jelena Luketina, Eric Hambro, Edward Grefenstette, Roberta Raileanu

    Abstract: Large language models (LLMs) fine-tuned with reinforcement learning from human feedback (RLHF) have been used in some of the most widely deployed AI models to date, such as OpenAI's ChatGPT or Anthropic's Claude. While there has been significant work developing these methods, our understanding of the benefits and downsides of each stage in RLHF is still limited. To fill this gap, we present an ext…

    Submitted 19 February, 2024; v1 submitted 10 October, 2023; originally announced October 2023.

    Comments: Code available here: https://github.com/facebookresearch/rlfh-gen-div

  2. arXiv:2209.06159  [pdf, other]

    cs.LG

    Meta-Gradients in Non-Stationary Environments

    Authors: Jelena Luketina, Sebastian Flennerhag, Yannick Schroecker, David Abel, Tom Zahavy, Satinder Singh

    Abstract: Meta-gradient methods (Xu et al., 2018; Zahavy et al., 2020) offer a promising solution to the problem of hyperparameter selection and adaptation in non-stationary reinforcement learning problems. However, the properties of meta-gradients in such environments have not been systematically studied. In this work, we bring new clarity to meta-gradients in non-stationary environments. Concretely, we as…

    Submitted 13 September, 2022; originally announced September 2022.

    Comments: 16 pages, 9 figures, CoLLAs 2022

  3. arXiv:2007.09185  [pdf, other]

    cs.AI cs.CL cs.LG

    WordCraft: An Environment for Benchmarking Commonsense Agents

    Authors: Minqi Jiang, Jelena Luketina, Nantas Nardelli, Pasquale Minervini, Philip H. S. Torr, Shimon Whiteson, Tim Rocktäschel

    Abstract: The ability to quickly solve a wide range of real-world tasks requires a commonsense understanding of the world. Yet, how to best extract such knowledge from natural language corpora and integrate it with reinforcement learning (RL) agents remains an open challenge. This is partly due to the lack of lightweight simulation environments that sufficiently reflect the semantics of the real world and p…

    Submitted 17 July, 2020; originally announced July 2020.

  4. arXiv:2006.05826  [pdf, other]

    cs.LG cs.AI stat.ML

    Transient Non-Stationarity and Generalisation in Deep Reinforcement Learning

    Authors: Maximilian Igl, Gregory Farquhar, Jelena Luketina, Wendelin Boehmer, Shimon Whiteson

    Abstract: Non-stationarity can arise in Reinforcement Learning (RL) even in stationary environments. For example, most RL algorithms collect new data throughout training, using a non-stationary behaviour policy. Due to the transience of this non-stationarity, it is often not explicitly addressed in deep RL and a single neural network is continually updated. However, we find evidence that neural networks exh…

    Submitted 22 September, 2021; v1 submitted 10 June, 2020; originally announced June 2020.

  5. arXiv:1906.03926  [pdf, other]

    cs.LG cs.AI cs.CL stat.ML

    A Survey of Reinforcement Learning Informed by Natural Language

    Authors: Jelena Luketina, Nantas Nardelli, Gregory Farquhar, Jakob Foerster, Jacob Andreas, Edward Grefenstette, Shimon Whiteson, Tim Rocktäschel

    Abstract: To be successful in real-world tasks, Reinforcement Learning (RL) needs to exploit the compositional, relational, and hierarchical structure of the world, and learn to transfer it to the task at hand. Recent advances in representation learning for language make it possible to build models that acquire world knowledge from text corpora and integrate this knowledge into downstream decision making pr…

    Submitted 10 June, 2019; originally announced June 2019.

    Comments: Published at IJCAI'19

  6. arXiv:1805.06370  [pdf, other]

    stat.ML cs.LG

    Progress & Compress: A scalable framework for continual learning

    Authors: Jonathan Schwarz, Jelena Luketina, Wojciech M. Czarnecki, Agnieszka Grabska-Barwinska, Yee Whye Teh, Razvan Pascanu, Raia Hadsell

    Abstract: We introduce a conceptually simple and scalable framework for continual learning domains where tasks are learned sequentially. Our method is constant in the number of parameters and is designed to preserve performance on previously encountered tasks while accelerating learning progress on subsequent problems. This is achieved by training a network with two components: A knowledge base, capable of…

    Submitted 2 July, 2018; v1 submitted 16 May, 2018; originally announced May 2018.

    Comments: Accepted at ICML 2018

  7. arXiv:1703.10987  [pdf, other]

    cs.CY physics.pop-ph

    On the Impossibility of Supersized Machines

    Authors: Ben Garfinkel, Miles Brundage, Daniel Filan, Carrick Flynn, Jelena Luketina, Michael Page, Anders Sandberg, Andrew Snyder-Beattie, Max Tegmark

    Abstract: In recent years, a number of prominent computer scientists, along with academics in fields such as philosophy and physics, have lent credence to the notion that machines may one day become as large as humans. Many have further argued that machines could even come to exceed human size by a significant margin. However, there are at least seven distinct arguments that preclude this outcome. We show t…

    Submitted 31 March, 2017; originally announced March 2017.

    Comments: 9 pages, 2 figures

  8. arXiv:1511.06727  [pdf, other]

    cs.LG

    Scalable Gradient-Based Tuning of Continuous Regularization Hyperparameters

    Authors: Jelena Luketina, Mathias Berglund, Klaus Greff, Tapani Raiko

    Abstract: Hyperparameter selection generally relies on running multiple full training trials, with selection based on validation set performance. We propose a gradient-based approach for locally adjusting hyperparameters during training of the model. Hyperparameters are adjusted so as to make the model parameter gradients, and hence updates, more advantageous for the validation cost. We explore the approach…

    Submitted 17 June, 2016; v1 submitted 20 November, 2015; originally announced November 2015.

    Comments: 9 pages, 7 figures. Accepted at ICML 2016