Showing 1–14 of 14 results for author: Konyushkova, K

  1. arXiv:2308.08998

    cs.CL cs.LG

    Reinforced Self-Training (ReST) for Language Modeling

    Authors: Caglar Gulcehre, Tom Le Paine, Srivatsan Srinivasan, Ksenia Konyushkova, Lotte Weerts, Abhishek Sharma, Aditya Siddhant, Alex Ahern, Miaosen Wang, Chenjie Gu, Wolfgang Macherey, Arnaud Doucet, Orhan Firat, Nando de Freitas

    Abstract: Reinforcement learning from human feedback (RLHF) can improve the quality of large language model's (LLM) outputs by aligning them with human preferences. We propose a simple algorithm for aligning LLMs with human preferences inspired by growing batch reinforcement learning (RL), which we call Reinforced Self-Training (ReST). Given an initial LLM policy, ReST produces a dataset by generating sampl…

    Submitted 21 August, 2023; v1 submitted 17 August, 2023; originally announced August 2023.

    Comments: 23 pages, 16 figures

  2. arXiv:2306.09800

    cs.LG cs.RO

    $\pi2\text{vec}$: Policy Representations with Successor Features

    Authors: Gianluca Scarpellini, Ksenia Konyushkova, Claudio Fantacci, Tom Le Paine, Yutian Chen, Misha Denil

    Abstract: This paper describes $\pi2\text{vec}$, a method for representing behaviors of black box policies as feature vectors. The policy representations capture how the statistics of foundation model features change in response to the policy behavior in a task agnostic way, and can be trained from offline data, allowing them to be used in offline policy selection. This work provides a key piece of a recipe…

    Submitted 24 January, 2024; v1 submitted 16 June, 2023; originally announced June 2023.

    Comments: Accepted paper at ICLR2024

  3. arXiv:2303.07280

    cs.CV cs.AI cs.LG

    Vision-Language Models as Success Detectors

    Authors: Yuqing Du, Ksenia Konyushkova, Misha Denil, Akhil Raju, Jessica Landon, Felix Hill, Nando de Freitas, Serkan Cabi

    Abstract: Detecting successful behaviour is crucial for training intelligent agents. As such, generalisable reward models are a prerequisite for agents that can learn to generalise their behaviour. In this work we focus on developing robust success detectors that leverage large, pretrained vision-language models (Flamingo, Alayrac et al. (2022)) and human reward annotations. Concretely, we treat success det…

    Submitted 13 March, 2023; originally announced March 2023.

  4. arXiv:2202.08417

    cs.LG

    Retrieval-Augmented Reinforcement Learning

    Authors: Anirudh Goyal, Abram L. Friesen, Andrea Banino, Theophane Weber, Nan Rosemary Ke, Adria Puigdomenech Badia, Arthur Guez, Mehdi Mirza, Peter C. Humphreys, Ksenia Konyushkova, Laurent Sifre, Michal Valko, Simon Osindero, Timothy Lillicrap, Nicolas Heess, Charles Blundell

    Abstract: Most deep reinforcement learning (RL) algorithms distill experience into parametric behavior policies or value functions via gradient updates. While effective, this approach has several disadvantages: (1) it is computationally expensive, (2) it can take many updates to integrate experiences into the parametric model, (3) experiences that are not fully integrated do not appropriately influence the…

    Submitted 24 May, 2022; v1 submitted 16 February, 2022; originally announced February 2022.

  5. arXiv:2106.10251

    cs.LG cs.AI stat.ML

    Active Offline Policy Selection

    Authors: Ksenia Konyushkova, Yutian Chen, Tom Le Paine, Caglar Gulcehre, Cosmin Paduraru, Daniel J Mankowitz, Misha Denil, Nando de Freitas

    Abstract: This paper addresses the problem of policy selection in domains with abundant logged data, but with a restricted interaction budget. Solving this problem would enable safe evaluation and deployment of offline reinforcement learning policies in industry, robotics, and recommendation domains among others. Several off-policy evaluation (OPE) techniques have been proposed to assess the value of polici…

    Submitted 6 May, 2022; v1 submitted 18 June, 2021; originally announced June 2021.

    Comments: Presented at NeurIPS 2021

  6. arXiv:2012.06899

    cs.LG cs.AI cs.RO

    Semi-supervised reward learning for offline reinforcement learning

    Authors: Ksenia Konyushkova, Konrad Zolna, Yusuf Aytar, Alexander Novikov, Scott Reed, Serkan Cabi, Nando de Freitas

    Abstract: In offline reinforcement learning (RL) agents are trained using a logged dataset. It appears to be the most natural route to attack real-life applications because in domains such as healthcare and robotics interactions with the environment are either expensive or unethical. Training agents usually requires reward functions, but unfortunately, rewards are seldom available in practice and their engi…

    Submitted 12 December, 2020; originally announced December 2020.

    Comments: Accepted to Offline Reinforcement Learning Workshop at Neural Information Processing Systems (2020)

  7. arXiv:2011.13885

    cs.LG cs.AI cs.RO stat.ML

    Offline Learning from Demonstrations and Unlabeled Experience

    Authors: Konrad Zolna, Alexander Novikov, Ksenia Konyushkova, Caglar Gulcehre, Ziyu Wang, Yusuf Aytar, Misha Denil, Nando de Freitas, Scott Reed

    Abstract: Behavior cloning (BC) is often practical for robot learning because it allows a policy to be trained offline without rewards, by supervised learning on expert demonstrations. However, BC does not effectively leverage what we will refer to as unlabeled experience: data of mixed and unknown quality without reward annotations. This unlabeled data can be generated by a variety of sources such as human…

    Submitted 27 November, 2020; originally announced November 2020.

    Comments: Accepted to Offline Reinforcement Learning Workshop at Neural Information Processing Systems (2020)

  8. arXiv:1909.12200

    cs.RO cs.LG

    Scaling data-driven robotics with reward sketching and batch reinforcement learning

    Authors: Serkan Cabi, Sergio Gómez Colmenarejo, Alexander Novikov, Ksenia Konyushkova, Scott Reed, Rae Jeong, Konrad Zolna, Yusuf Aytar, David Budden, Mel Vecerik, Oleg Sushkov, David Barker, Jonathan Scholz, Misha Denil, Nando de Freitas, Ziyu Wang

    Abstract: We present a framework for data-driven robotics that makes use of a large dataset of recorded robot experience and scales to several tasks using learned reward functions. We show how to apply this framework to accomplish three different object manipulation tasks on a real robot platform. Given demonstrations of a task together with task-agnostic recorded experience, we use a special form of human…

    Submitted 4 June, 2020; v1 submitted 26 September, 2019; originally announced September 2019.

    Comments: Project website: https://sites.google.com/view/data-driven-robotics/

    Journal ref: Robotics: Science and Systems Conference 2020

  9. arXiv:1810.04114

    cs.LG stat.ML

    Discovering General-Purpose Active Learning Strategies

    Authors: Ksenia Konyushkova, Raphael Sznitman, Pascal Fua

    Abstract: We propose a general-purpose approach to discovering active learning (AL) strategies from data. These strategies are transferable from one domain to another and can be used in conjunction with many machine learning models. To this end, we formalize the annotation process as a Markov decision process, design universal state and action spaces and introduce a new reward function that precisely model…

    Submitted 2 April, 2019; v1 submitted 9 October, 2018; originally announced October 2018.

  10. arXiv:1712.08087

    cs.CV

    Learning Intelligent Dialogs for Bounding Box Annotation

    Authors: Ksenia Konyushkova, Jasper Uijlings, Christoph Lampert, Vittorio Ferrari

    Abstract: We introduce Intelligent Annotation Dialogs for bounding box annotation. We train an agent to automatically choose a sequence of actions for a human annotator to produce a bounding box in a minimal amount of time. Specifically, we consider two actions: box verification, where the annotator verifies a box generated by an object detector, and manual box drawing. We explore two kinds of agents, one b…

    Submitted 20 November, 2018; v1 submitted 21 December, 2017; originally announced December 2017.

    Comments: This paper appeared at CVPR 2018

  11. arXiv:1703.03365

    cs.LG

    Learning Active Learning from Data

    Authors: Ksenia Konyushkova, Raphael Sznitman, Pascal Fua

    Abstract: In this paper, we suggest a novel data-driven approach to active learning (AL). The key idea is to train a regressor that predicts the expected error reduction for a candidate sample in a particular learning state. By formulating the query selection procedure as a regression problem we are not restricted to working with existing AL heuristics; instead, we learn strategies based on experience from…

    Submitted 14 July, 2017; v1 submitted 9 March, 2017; originally announced March 2017.

  12. Geometry in Active Learning for Binary and Multi-class Image Segmentation

    Authors: Ksenia Konyushkova, Raphael Sznitman, Pascal Fua

    Abstract: We propose an active learning approach to image segmentation that exploits geometric priors to speed up and streamline the annotation process. It can be applied for both background-foreground and multi-class segmentation tasks in 2D images and 3D image volumes. Our approach combines geometric smoothness priors in the image space with more traditional uncertainty measures to estimate which pixels o…

    Submitted 4 April, 2019; v1 submitted 29 June, 2016; originally announced June 2016.

    Comments: Extension of our previous paper arXiv:1508.04955

    Journal ref: Published in "Computer Vision and Image Understanding" journal, 1077-3142, 2019

  13. arXiv:1511.03466

    cs.CV

    God(s) Know(s): Developmental and Cross-Cultural Patterns in Children Drawings

    Authors: Ksenia Konyushkova, Nikolaos Arvanitopoulos, Zhargalma Dandarova Robert, Pierre-Yves Brandt, Sabine Süsstrunk

    Abstract: This paper introduces a novel approach to data analysis designed for the needs of specialists in psychology of religion. We detect developmental and cross-cultural patterns in children's drawings of God(s) and other supernatural agents. We develop methods to objectively evaluate our empirical observations of the drawings with respect to: (1) the gravity center, (2) the average intensities of the c…

    Submitted 8 February, 2016; v1 submitted 11 November, 2015; originally announced November 2015.

  14. arXiv:1508.04955

    cs.CV

    Introducing Geometry in Active Learning for Image Segmentation

    Authors: Ksenia Konyushkova, Raphael Sznitman, Pascal Fua

    Abstract: We propose an Active Learning approach to training a segmentation classifier that exploits geometric priors to streamline the annotation process in 3D image volumes. To this end, we use these priors not only to select voxels most in need of annotation but to guarantee that they lie on 2D planar patch, which makes it much easier to annotate than if they were randomly distributed in the volume. A si…

    Submitted 20 August, 2015; originally announced August 2015.