Showing 1–11 of 11 results for author: Plappert, M

  1. arXiv:2110.14168  [pdf, other]

    cs.LG cs.CL

    Training Verifiers to Solve Math Word Problems

    Authors: Karl Cobbe, Vineet Kosaraju, Mohammad Bavarian, Mark Chen, Heewoo Jun, Lukasz Kaiser, Matthias Plappert, Jerry Tworek, Jacob Hilton, Reiichiro Nakano, Christopher Hesse, John Schulman

    Abstract: State-of-the-art language models can match human performance on many tasks, but they still struggle to robustly perform multi-step mathematical reasoning. To diagnose the failures of current models and support research, we introduce GSM8K, a dataset of 8.5K high quality linguistically diverse grade school math word problems. We find that even the largest transformer models fail to achieve high tes…

    Submitted 17 November, 2021; v1 submitted 27 October, 2021; originally announced October 2021.

  2. arXiv:2107.03374  [pdf, other]

    cs.LG

    Evaluating Large Language Models Trained on Code

    Authors: Mark Chen, Jerry Tworek, Heewoo Jun, Qiming Yuan, Henrique Ponde de Oliveira Pinto, Jared Kaplan, Harri Edwards, Yuri Burda, Nicholas Joseph, Greg Brockman, Alex Ray, Raul Puri, Gretchen Krueger, Michael Petrov, Heidy Khlaaf, Girish Sastry, Pamela Mishkin, Brooke Chan, Scott Gray, Nick Ryder, Mikhail Pavlov, Alethea Power, Lukasz Kaiser, Mohammad Bavarian, Clemens Winter, et al. (33 additional authors not shown)

    Abstract: We introduce Codex, a GPT language model fine-tuned on publicly available code from GitHub, and study its Python code-writing capabilities. A distinct production version of Codex powers GitHub Copilot. On HumanEval, a new evaluation set we release to measure functional correctness for synthesizing programs from docstrings, our model solves 28.8% of the problems, while GPT-3 solves 0% and GPT-J sol…

    Submitted 14 July, 2021; v1 submitted 7 July, 2021; originally announced July 2021.

    Comments: corrected typos, added references, added authors, added acknowledgements

  3. arXiv:2101.04882  [pdf, other]

    cs.LG cs.AI cs.CV cs.RO

    Asymmetric self-play for automatic goal discovery in robotic manipulation

    Authors: OpenAI, Matthias Plappert, Raul Sampedro, Tao Xu, Ilge Akkaya, Vineet Kosaraju, Peter Welinder, Ruben D'Sa, Arthur Petron, Henrique P. d. O. Pinto, Alex Paino, Hyeonwoo Noh, Lilian Weng, Qiming Yuan, Casey Chu, Wojciech Zaremba

    Abstract: We train a single, goal-conditioned policy that can solve many robotic manipulation tasks, including tasks with previously unseen goals and objects. We rely on asymmetric self-play for goal discovery, where two agents, Alice and Bob, play a game. Alice is asked to propose challenging goals and Bob aims to solve them. We show that this method can discover highly diverse and complex goals without an…

    Submitted 13 January, 2021; originally announced January 2021.

    Comments: Videos are shown at https://robotics-self-play.github.io

  4. arXiv:2009.12864  [pdf, other]

    cs.LG cs.AI cs.RO

    Predicting Sim-to-Real Transfer with Probabilistic Dynamics Models

    Authors: Lei M. Zhang, Matthias Plappert, Wojciech Zaremba

    Abstract: We propose a method to predict the sim-to-real transfer performance of RL policies. Our transfer metric simplifies the selection of training setups (such as algorithm, hyperparameters, randomizations) and policies in simulation, without the need for extensive and time-consuming real-world rollouts. A probabilistic dynamics model is trained alongside the policy and evaluated on a fixed set of real-…

    Submitted 27 September, 2020; originally announced September 2020.

  5. arXiv:1910.07113  [pdf, other]

    cs.LG cs.AI cs.CV cs.RO stat.ML

    Solving Rubik's Cube with a Robot Hand

    Authors: OpenAI, Ilge Akkaya, Marcin Andrychowicz, Maciek Chociej, Mateusz Litwin, Bob McGrew, Arthur Petron, Alex Paino, Matthias Plappert, Glenn Powell, Raphael Ribas, Jonas Schneider, Nikolas Tezak, Jerry Tworek, Peter Welinder, Lilian Weng, Qiming Yuan, Wojciech Zaremba, Lei Zhang

    Abstract: We demonstrate that models trained only in simulation can be used to solve a manipulation problem of unprecedented complexity on a real robot. This is made possible by two key components: a novel algorithm, which we call automatic domain randomization (ADR) and a robot platform built for machine learning. ADR automatically generates a distribution over randomized environments of ever-increasing di…

    Submitted 15 October, 2019; originally announced October 2019.

  6. arXiv:1808.00177  [pdf, other]

    cs.LG cs.AI cs.RO stat.ML

    Learning Dexterous In-Hand Manipulation

    Authors: OpenAI, Marcin Andrychowicz, Bowen Baker, Maciek Chociej, Rafal Jozefowicz, Bob McGrew, Jakub Pachocki, Arthur Petron, Matthias Plappert, Glenn Powell, Alex Ray, Jonas Schneider, Szymon Sidor, Josh Tobin, Peter Welinder, Lilian Weng, Wojciech Zaremba

    Abstract: We use reinforcement learning (RL) to learn dexterous in-hand manipulation policies which can perform vision-based object reorientation on a physical Shadow Dexterous Hand. The training is performed in a simulated environment in which we randomize many of the physical properties of the system like friction coefficients and an object's appearance. Our policies transfer to the physical robot despite…

    Submitted 18 January, 2019; v1 submitted 1 August, 2018; originally announced August 2018.

    Comments: Making OpenAI the first author. We wish this paper to be cited as "Learning Dexterous In-Hand Manipulation" by OpenAI et al. We are replicating the approach from the physics community: arXiv:1812.06489

  7. arXiv:1802.09464  [pdf, other]

    cs.LG cs.AI cs.RO

    Multi-Goal Reinforcement Learning: Challenging Robotics Environments and Request for Research

    Authors: Matthias Plappert, Marcin Andrychowicz, Alex Ray, Bob McGrew, Bowen Baker, Glenn Powell, Jonas Schneider, Josh Tobin, Maciek Chociej, Peter Welinder, Vikash Kumar, Wojciech Zaremba

    Abstract: The purpose of this technical report is two-fold. First of all, it introduces a suite of challenging continuous control tasks (integrated with OpenAI Gym) based on currently existing robotics hardware. The tasks include pushing, sliding and pick & place with a Fetch robotic arm as well as in-hand object manipulation with a Shadow Dexterous Hand. All tasks have sparse binary rewards and follow a Mu…

    Submitted 10 March, 2018; v1 submitted 26 February, 2018; originally announced February 2018.

  8. arXiv:1706.01905  [pdf, other]

    cs.LG cs.AI cs.NE cs.RO stat.ML

    Parameter Space Noise for Exploration

    Authors: Matthias Plappert, Rein Houthooft, Prafulla Dhariwal, Szymon Sidor, Richard Y. Chen, Xi Chen, Tamim Asfour, Pieter Abbeel, Marcin Andrychowicz

    Abstract: Deep reinforcement learning (RL) methods generally engage in exploratory behavior through noise injection in the action space. An alternative is to add noise directly to the agent's parameters, which can lead to more consistent exploration and a richer set of behaviors. Methods such as evolutionary strategies use parameter perturbations, but discard all temporal structure in the process and requir…

    Submitted 31 January, 2018; v1 submitted 6 June, 2017; originally announced June 2017.

    Comments: Updated to camera-ready ICLR submission

  9. arXiv:1705.06400  [pdf, other]

    cs.LG cs.CL cs.RO stat.ML

    Learning a bidirectional mapping between human whole-body motion and natural language using deep recurrent neural networks

    Authors: Matthias Plappert, Christian Mandery, Tamim Asfour

    Abstract: Linking human whole-body motion and natural language is of great interest for the generation of semantic representations of observed human behaviors as well as for the generation of robot behaviors based on natural language input. While there has been a large body of research in this area, most approaches that exist today require a symbolic representation of motions (e.g. in the form of motion pri…

    Submitted 2 August, 2018; v1 submitted 17 May, 2017; originally announced May 2017.

  10. arXiv:1607.03827  [pdf, other]

    cs.RO cs.CL cs.CV cs.LG

    The KIT Motion-Language Dataset

    Authors: Matthias Plappert, Christian Mandery, Tamim Asfour

    Abstract: Linking human motion and natural language is of great interest for the generation of semantic representations of human activities as well as for the generation of robot activities based on natural language input. However, while there have been years of research in this area, no standardized and openly available dataset exists to support the development and evaluation of such systems. We therefore…

    Submitted 9 August, 2018; v1 submitted 13 July, 2016; originally announced July 2016.

    Comments: 5 figures, 4 tables, submitted to Big Data journal, Special Issue on Robotics

  11. arXiv:1605.01569  [pdf, other]

    cs.LG cs.CV

    Classification of Human Whole-Body Motion using Hidden Markov Models

    Authors: Matthias Plappert

    Abstract: Human motion plays an important role in many fields. Large databases exist that store and make available recordings of human motions. However, annotating each motion with multiple labels is a cumbersome and error-prone process. This bachelor's thesis presents different approaches to solve the multi-label classification problem using Hidden Markov Models (HMMs). First, different features that can b…

    Submitted 5 May, 2016; originally announced May 2016.