
Showing 1–31 of 31 results for author: Zaremba, W

Searching in archive cs.

  1. arXiv:2303.08774  [pdf, other]

    cs.CL cs.AI

    GPT-4 Technical Report

    Authors: OpenAI, Josh Achiam, Steven Adler, Sandhini Agarwal, Lama Ahmad, Ilge Akkaya, Florencia Leoni Aleman, Diogo Almeida, Janko Altenschmidt, Sam Altman, Shyamal Anadkat, Red Avila, Igor Babuschkin, Suchir Balaji, Valerie Balcom, Paul Baltescu, Haiming Bao, Mohammad Bavarian, Jeff Belgum, Irwan Bello, Jake Berdine, Gabriel Bernadett-Shapiro, Christopher Berner, Lenny Bogdonoff, Oleg Boiko, et al. (256 additional authors not shown)

    Abstract: We report the development of GPT-4, a large-scale, multimodal model which can accept image and text inputs and produce text outputs. While less capable than humans in many real-world scenarios, GPT-4 exhibits human-level performance on various professional and academic benchmarks, including passing a simulated bar exam with a score around the top 10% of test takers. GPT-4 is a Transformer-based mo…

    Submitted 4 March, 2024; v1 submitted 15 March, 2023; originally announced March 2023.

    Comments: 100 pages; updated authors list; fixed author names and added citation

  2. arXiv:2107.03374  [pdf, other]

    cs.LG

    Evaluating Large Language Models Trained on Code

    Authors: Mark Chen, Jerry Tworek, Heewoo Jun, Qiming Yuan, Henrique Ponde de Oliveira Pinto, Jared Kaplan, Harri Edwards, Yuri Burda, Nicholas Joseph, Greg Brockman, Alex Ray, Raul Puri, Gretchen Krueger, Michael Petrov, Heidy Khlaaf, Girish Sastry, Pamela Mishkin, Brooke Chan, Scott Gray, Nick Ryder, Mikhail Pavlov, Alethea Power, Lukasz Kaiser, Mohammad Bavarian, Clemens Winter, et al. (33 additional authors not shown)

    Abstract: We introduce Codex, a GPT language model fine-tuned on publicly available code from GitHub, and study its Python code-writing capabilities. A distinct production version of Codex powers GitHub Copilot. On HumanEval, a new evaluation set we release to measure functional correctness for synthesizing programs from docstrings, our model solves 28.8% of the problems, while GPT-3 solves 0% and GPT-J sol…

    Submitted 14 July, 2021; v1 submitted 7 July, 2021; originally announced July 2021.

    Comments: corrected typos, added references, added authors, added acknowledgements

  3. arXiv:2106.00958  [pdf, other]

    cs.LG cs.AI cs.NE stat.ML

    A Generalizable Approach to Learning Optimizers

    Authors: Diogo Almeida, Clemens Winter, Jie Tang, Wojciech Zaremba

    Abstract: A core issue with learning to optimize neural networks has been the lack of generalization to real world problems. To address this, we describe a system designed from a generalization-first perspective, learning to update optimizer hyperparameters instead of model parameters directly using novel features, actions, and a reward function. This system outperforms Adam at all neural network tasks incl…

    Submitted 7 June, 2021; v1 submitted 2 June, 2021; originally announced June 2021.

  4. arXiv:2101.04882  [pdf, other]

    cs.LG cs.AI cs.CV cs.RO

    Asymmetric self-play for automatic goal discovery in robotic manipulation

    Authors: OpenAI, Matthias Plappert, Raul Sampedro, Tao Xu, Ilge Akkaya, Vineet Kosaraju, Peter Welinder, Ruben D'Sa, Arthur Petron, Henrique P. d. O. Pinto, Alex Paino, Hyeonwoo Noh, Lilian Weng, Qiming Yuan, Casey Chu, Wojciech Zaremba

    Abstract: We train a single, goal-conditioned policy that can solve many robotic manipulation tasks, including tasks with previously unseen goals and objects. We rely on asymmetric self-play for goal discovery, where two agents, Alice and Bob, play a game. Alice is asked to propose challenging goals and Bob aims to solve them. We show that this method can discover highly diverse and complex goals without an…

    Submitted 13 January, 2021; originally announced January 2021.

    Comments: Videos are shown at https://robotics-self-play.github.io

  5. arXiv:2009.12864  [pdf, other]

    cs.LG cs.AI cs.RO

    Predicting Sim-to-Real Transfer with Probabilistic Dynamics Models

    Authors: Lei M. Zhang, Matthias Plappert, Wojciech Zaremba

    Abstract: We propose a method to predict the sim-to-real transfer performance of RL policies. Our transfer metric simplifies the selection of training setups (such as algorithm, hyperparameters, randomizations) and policies in simulation, without the need for extensive and time-consuming real-world rollouts. A probabilistic dynamics model is trained alongside the policy and evaluated on a fixed set of real-…

    Submitted 27 September, 2020; originally announced September 2020.

  6. arXiv:1910.07113  [pdf, other]

    cs.LG cs.AI cs.CV cs.RO stat.ML

    Solving Rubik's Cube with a Robot Hand

    Authors: OpenAI, Ilge Akkaya, Marcin Andrychowicz, Maciek Chociej, Mateusz Litwin, Bob McGrew, Arthur Petron, Alex Paino, Matthias Plappert, Glenn Powell, Raphael Ribas, Jonas Schneider, Nikolas Tezak, Jerry Tworek, Peter Welinder, Lilian Weng, Qiming Yuan, Wojciech Zaremba, Lei Zhang

    Abstract: We demonstrate that models trained only in simulation can be used to solve a manipulation problem of unprecedented complexity on a real robot. This is made possible by two key components: a novel algorithm, which we call automatic domain randomization (ADR) and a robot platform built for machine learning. ADR automatically generates a distribution over randomized environments of ever-increasing di…

    Submitted 15 October, 2019; originally announced October 2019.

  7. arXiv:1808.00177  [pdf, other]

    cs.LG cs.AI cs.RO stat.ML

    Learning Dexterous In-Hand Manipulation

    Authors: OpenAI, Marcin Andrychowicz, Bowen Baker, Maciek Chociej, Rafal Jozefowicz, Bob McGrew, Jakub Pachocki, Arthur Petron, Matthias Plappert, Glenn Powell, Alex Ray, Jonas Schneider, Szymon Sidor, Josh Tobin, Peter Welinder, Lilian Weng, Wojciech Zaremba

    Abstract: We use reinforcement learning (RL) to learn dexterous in-hand manipulation policies which can perform vision-based object reorientation on a physical Shadow Dexterous Hand. The training is performed in a simulated environment in which we randomize many of the physical properties of the system like friction coefficients and an object's appearance. Our policies transfer to the physical robot despite…

    Submitted 18 January, 2019; v1 submitted 1 August, 2018; originally announced August 2018.

    Comments: Making OpenAI the first author. We wish this paper to be cited as "Learning Dexterous In-Hand Manipulation" by OpenAI et al. We are replicating the approach from the physics community: arXiv:1812.06489

  8. arXiv:1802.09464  [pdf, other]

    cs.LG cs.AI cs.RO

    Multi-Goal Reinforcement Learning: Challenging Robotics Environments and Request for Research

    Authors: Matthias Plappert, Marcin Andrychowicz, Alex Ray, Bob McGrew, Bowen Baker, Glenn Powell, Jonas Schneider, Josh Tobin, Maciek Chociej, Peter Welinder, Vikash Kumar, Wojciech Zaremba

    Abstract: The purpose of this technical report is two-fold. First of all, it introduces a suite of challenging continuous control tasks (integrated with OpenAI Gym) based on currently existing robotics hardware. The tasks include pushing, sliding and pick & place with a Fetch robotic arm as well as in-hand object manipulation with a Shadow Dexterous Hand. All tasks have sparse binary rewards and follow a Mu…

    Submitted 10 March, 2018; v1 submitted 26 February, 2018; originally announced February 2018.

  9. arXiv:1710.06542  [pdf, other]

    cs.RO cs.AI cs.LG

    Asymmetric Actor Critic for Image-Based Robot Learning

    Authors: Lerrel Pinto, Marcin Andrychowicz, Peter Welinder, Wojciech Zaremba, Pieter Abbeel

    Abstract: Deep reinforcement learning (RL) has proven a powerful technique in many sequential decision making domains. However, Robotics poses many challenges for RL, most notably training on a physical system can be expensive and dangerous, which has sparked significant interest in learning control policies using a physics simulator. While several recent works have shown promising results in transferring p…

    Submitted 17 October, 2017; originally announced October 2017.

    Comments: Videos of experiments can be found at http://www.goo.gl/b57WTs

  10. Sim-to-Real Transfer of Robotic Control with Dynamics Randomization

    Authors: Xue Bin Peng, Marcin Andrychowicz, Wojciech Zaremba, Pieter Abbeel

    Abstract: Simulations are attractive environments for training agents as they provide an abundant source of data and alleviate certain safety concerns during the training process. But the behaviours developed by agents in simulation are often specific to the characteristics of the simulator. Due to modeling error, strategies that are successful in simulation may not transfer to their real world counterparts…

    Submitted 2 March, 2018; v1 submitted 17 October, 2017; originally announced October 2017.

  11. arXiv:1710.06425  [pdf, other]

    cs.RO cs.LG

    Domain Randomization and Generative Models for Robotic Grasping

    Authors: Joshua Tobin, Lukas Biewald, Rocky Duan, Marcin Andrychowicz, Ankur Handa, Vikash Kumar, Bob McGrew, Jonas Schneider, Peter Welinder, Wojciech Zaremba, Pieter Abbeel

    Abstract: Deep learning-based robotic grasping has made significant progress thanks to algorithmic improvements and increased data availability. However, state-of-the-art models are often trained on as few as hundreds or thousands of unique object instances, and as a result generalization can be a challenge. In this work, we explore a novel data generation pipeline for training a deep neural network to pe…

    Submitted 3 April, 2018; v1 submitted 17 October, 2017; originally announced October 2017.

    Comments: 8 pages, 11 figures. Submitted to 2018 IEEE/RSJ International Conference on Intelligent Robots and Systems (IROS 2018)

  12. arXiv:1709.10089  [pdf, other]

    cs.LG cs.AI cs.NE cs.RO

    Overcoming Exploration in Reinforcement Learning with Demonstrations

    Authors: Ashvin Nair, Bob McGrew, Marcin Andrychowicz, Wojciech Zaremba, Pieter Abbeel

    Abstract: Exploration in environments with sparse rewards has been a persistent problem in reinforcement learning (RL). Many tasks are natural to specify with a sparse reward, and manually shaping a reward function can result in suboptimal performance. However, finding a non-zero reward is exponentially more difficult with increasing task horizon or action dimensionality. This puts many real-world tasks out…

    Submitted 25 February, 2018; v1 submitted 28 September, 2017; originally announced September 2017.

    Comments: 8 pages, ICRA 2018

  13. arXiv:1707.01495  [pdf, other]

    cs.LG cs.AI cs.NE cs.RO

    Hindsight Experience Replay

    Authors: Marcin Andrychowicz, Filip Wolski, Alex Ray, Jonas Schneider, Rachel Fong, Peter Welinder, Bob McGrew, Josh Tobin, Pieter Abbeel, Wojciech Zaremba

    Abstract: Dealing with sparse rewards is one of the biggest challenges in Reinforcement Learning (RL). We present a novel technique called Hindsight Experience Replay which allows sample-efficient learning from rewards which are sparse and binary and therefore avoid the need for complicated reward engineering. It can be combined with an arbitrary off-policy RL algorithm and may be seen as a form of implicit…

    Submitted 23 February, 2018; v1 submitted 5 July, 2017; originally announced July 2017.

  14. arXiv:1703.07326  [pdf, other]

    cs.AI cs.LG cs.NE cs.RO

    One-Shot Imitation Learning

    Authors: Yan Duan, Marcin Andrychowicz, Bradly C. Stadie, Jonathan Ho, Jonas Schneider, Ilya Sutskever, Pieter Abbeel, Wojciech Zaremba

    Abstract: Imitation learning has been commonly applied to solve different tasks in isolation. This usually requires either careful feature engineering, or a significant number of samples. This is far from what we desire: ideally, robots should be able to learn from very few demonstrations of any given task, and instantly generalize to new situations of the same task, without requiring task-specific engineer…

    Submitted 4 December, 2017; v1 submitted 21 March, 2017; originally announced March 2017.

  15. arXiv:1703.06907  [pdf, other]

    cs.RO cs.LG

    Domain Randomization for Transferring Deep Neural Networks from Simulation to the Real World

    Authors: Josh Tobin, Rachel Fong, Alex Ray, Jonas Schneider, Wojciech Zaremba, Pieter Abbeel

    Abstract: Bridging the 'reality gap' that separates simulated robotics from experiments on hardware could accelerate robotic research through improved data availability. This paper explores domain randomization, a simple technique for training models on simulated images that transfer to real images by randomizing rendering in the simulator. With enough variability in the simulator, the real world may appear…

    Submitted 20 March, 2017; originally announced March 2017.

    Comments: 8 pages, 7 figures. Submitted to 2017 IEEE/RSJ International Conference on Intelligent Robots and Systems (IROS 2017)

  16. arXiv:1611.00736  [pdf, other]

    cs.NE cs.AI

    Extensions and Limitations of the Neural GPU

    Authors: Eric Price, Wojciech Zaremba, Ilya Sutskever

    Abstract: The Neural GPU is a recent model that can learn algorithms such as multi-digit binary addition and binary multiplication in a way that generalizes to inputs of arbitrary length. We show that there are two simple ways of improving the performance of the Neural GPU: by carefully designing a curriculum, and by increasing model size. The latter requires a memory efficient implementation, as a naive im…

    Submitted 4 November, 2016; v1 submitted 2 November, 2016; originally announced November 2016.

  17. arXiv:1610.03518  [pdf, other]

    cs.RO cs.AI cs.LG eess.SY

    Transfer from Simulation to Real World through Learning Deep Inverse Dynamics Model

    Authors: Paul Christiano, Zain Shah, Igor Mordatch, Jonas Schneider, Trevor Blackwell, Joshua Tobin, Pieter Abbeel, Wojciech Zaremba

    Abstract: Developing control policies in simulation is often more practical and safer than directly running experiments in the real world. This applies to policies obtained from planning and optimization, and even more so to policies obtained from reinforcement learning, which is often very data demanding. However, a policy that succeeds in simulation often doesn't work when deployed on a real robot. Nevert…

    Submitted 11 October, 2016; originally announced October 2016.

  18. arXiv:1606.03498  [pdf, other]

    cs.LG cs.CV cs.NE

    Improved Techniques for Training GANs

    Authors: Tim Salimans, Ian Goodfellow, Wojciech Zaremba, Vicki Cheung, Alec Radford, Xi Chen

    Abstract: We present a variety of new architectural features and training procedures that we apply to the generative adversarial networks (GANs) framework. We focus on two applications of GANs: semi-supervised learning, and the generation of images that humans find visually realistic. Unlike most work on generative models, our primary goal is not to train a model that assigns high likelihood to test data, n…

    Submitted 10 June, 2016; originally announced June 2016.

  19. arXiv:1606.01540  [pdf, other]

    cs.LG cs.AI

    OpenAI Gym

    Authors: Greg Brockman, Vicki Cheung, Ludwig Pettersson, Jonas Schneider, John Schulman, Jie Tang, Wojciech Zaremba

    Abstract: OpenAI Gym is a toolkit for reinforcement learning research. It includes a growing collection of benchmark problems that expose a common interface, and a website where people can share their results and compare the performance of algorithms. This whitepaper discusses the components of OpenAI Gym and the design decisions that went into the software.

    Submitted 5 June, 2016; originally announced June 2016.

  20. arXiv:1511.07275  [pdf, other]

    cs.AI cs.LG

    Learning Simple Algorithms from Examples

    Authors: Wojciech Zaremba, Tomas Mikolov, Armand Joulin, Rob Fergus

    Abstract: We present an approach for learning simple algorithms such as copying, multi-digit addition and single digit multiplication directly from examples. Our framework consists of a set of interfaces, accessed by a controller. Typical interfaces are 1-D tapes or 2-D grids that hold the input and output data. For the controller, we explore a range of neural network-based models which vary in their abilit…

    Submitted 23 November, 2015; v1 submitted 23 November, 2015; originally announced November 2015.

  21. arXiv:1511.06732  [pdf, other]

    cs.LG cs.CL

    Sequence Level Training with Recurrent Neural Networks

    Authors: Marc'Aurelio Ranzato, Sumit Chopra, Michael Auli, Wojciech Zaremba

    Abstract: Many natural language processing applications use language models to generate text. These models are typically trained to predict the next word in a sequence, given the previous words and some context such as an image. However, at test time the model is expected to generate the entire sequence from scratch. This discrepancy makes generation brittle, as errors may accumulate along the way. We addre…

    Submitted 6 May, 2016; v1 submitted 20 November, 2015; originally announced November 2015.

  22. arXiv:1506.08230  [pdf, other]

    cs.LG cs.NE

    Convolutional networks and learning invariant to homogeneous multiplicative scalings

    Authors: Mark Tygert, Arthur Szlam, Soumith Chintala, Marc'Aurelio Ranzato, Yuandong Tian, Wojciech Zaremba

    Abstract: The conventional classification schemes -- notably multinomial logistic regression -- used in conjunction with convolutional networks (convnets) are classical in statistics, designed without consideration for the usual coupling with convnets, stochastic gradient descent, and backpropagation. In the specific application to supervised learning for convnets, a simple scale-invariant classification st…

    Submitted 16 February, 2016; v1 submitted 26 June, 2015; originally announced June 2015.

    Comments: 12 pages, 6 figures, 4 tables

    Journal ref: Appl. Comput. Harmon. Anal., 42 (1): 154-166, 2017

  23. arXiv:1505.00521  [pdf, other]

    cs.LG

    Reinforcement Learning Neural Turing Machines - Revised

    Authors: Wojciech Zaremba, Ilya Sutskever

    Abstract: The Neural Turing Machine (NTM) is more expressive than all previously considered models because of its external memory. It can be viewed as a broader effort to use abstract external Interfaces and to learn a parametric model that interacts with them. The capabilities of a model can be extended by providing it with proper Interfaces that interact with the world. These external Interfaces include…

    Submitted 12 January, 2016; v1 submitted 4 May, 2015; originally announced May 2015.

  24. arXiv:1410.8206  [pdf, ps, other]

    cs.CL cs.LG cs.NE

    Addressing the Rare Word Problem in Neural Machine Translation

    Authors: Minh-Thang Luong, Ilya Sutskever, Quoc V. Le, Oriol Vinyals, Wojciech Zaremba

    Abstract: Neural Machine Translation (NMT) is a new approach to machine translation that has shown promising results that are comparable to traditional approaches. A significant weakness in conventional NMT systems is their inability to correctly translate very rare words: end-to-end NMTs tend to have relatively small vocabularies with a single unk symbol that represents every possible out-of-vocabulary (OO…

    Submitted 30 May, 2015; v1 submitted 29 October, 2014; originally announced October 2014.

    Comments: ACL 2015 camera-ready version

  25. arXiv:1410.4615  [pdf, other]

    cs.NE cs.AI cs.LG

    Learning to Execute

    Authors: Wojciech Zaremba, Ilya Sutskever

    Abstract: Recurrent Neural Networks (RNNs) with Long Short-Term Memory units (LSTM) are widely used because they are expressive and are easy to train. Our interest lies in empirically evaluating the expressiveness and the learnability of LSTMs in the sequence-to-sequence regime by training them to evaluate short computer programs, a domain that has traditionally been seen as too complex for neural networks.…

    Submitted 19 February, 2015; v1 submitted 16 October, 2014; originally announced October 2014.

  26. arXiv:1409.2329  [pdf, ps, other]

    cs.NE

    Recurrent Neural Network Regularization

    Authors: Wojciech Zaremba, Ilya Sutskever, Oriol Vinyals

    Abstract: We present a simple regularization technique for Recurrent Neural Networks (RNNs) with Long Short-Term Memory (LSTM) units. Dropout, the most successful technique for regularizing neural networks, does not work well with RNNs and LSTMs. In this paper, we show how to correctly apply dropout to LSTMs, and show that it substantially reduces overfitting on a variety of tasks. These tasks include langu…

    Submitted 19 February, 2015; v1 submitted 8 September, 2014; originally announced September 2014.

  27. arXiv:1406.1584  [pdf, other]

    cs.LG

    Learning to Discover Efficient Mathematical Identities

    Authors: Wojciech Zaremba, Karol Kurach, Rob Fergus

    Abstract: In this paper we explore how machine learning techniques can be applied to the discovery of efficient mathematical identities. We introduce an attribute grammar framework for representing symbolic expressions. Given a set of grammar rules we build trees that combine different rules, looking for branches which yield compositions that are analytically equivalent to a target expression, but of lower…

    Submitted 5 November, 2014; v1 submitted 6 June, 2014; originally announced June 2014.

  28. arXiv:1404.0736  [pdf, other]

    cs.CV cs.LG

    Exploiting Linear Structure Within Convolutional Networks for Efficient Evaluation

    Authors: Remi Denton, Wojciech Zaremba, Joan Bruna, Yann LeCun, Rob Fergus

    Abstract: We present techniques for speeding up the test-time evaluation of large convolutional networks, designed for object recognition tasks. These models deliver impressive accuracy but each image evaluation requires millions of floating point operations, making their deployment on smartphones and Internet-scale clusters problematic. The computation is dominated by the convolution operations in the lo…

    Submitted 9 June, 2014; v1 submitted 2 April, 2014; originally announced April 2014.

  29. arXiv:1312.6203  [pdf, other]

    cs.LG cs.CV cs.NE

    Spectral Networks and Locally Connected Networks on Graphs

    Authors: Joan Bruna, Wojciech Zaremba, Arthur Szlam, Yann LeCun

    Abstract: Convolutional Neural Networks are extremely efficient architectures in image and audio recognition tasks, thanks to their ability to exploit the local translational invariance of signal classes over their domain. In this paper we consider possible generalizations of CNNs to signals defined on more general domains without the action of a translation group. In particular, we propose two construction…

    Submitted 21 May, 2014; v1 submitted 20 December, 2013; originally announced December 2013.

    Comments: 14 pages

  30. arXiv:1312.6199  [pdf, other]

    cs.CV cs.LG cs.NE

    Intriguing properties of neural networks

    Authors: Christian Szegedy, Wojciech Zaremba, Ilya Sutskever, Joan Bruna, Dumitru Erhan, Ian Goodfellow, Rob Fergus

    Abstract: Deep neural networks are highly expressive models that have recently achieved state of the art performance on speech and visual recognition tasks. While their expressiveness is the reason they succeed, it also causes them to learn uninterpretable solutions that could have counter-intuitive properties. In this paper we report two such properties. First, we find that there is no distinction betwee…

    Submitted 19 February, 2014; v1 submitted 20 December, 2013; originally announced December 2013.

  31. arXiv:1307.1954  [pdf, other]

    cs.LG stat.ML

    B-tests: Low Variance Kernel Two-Sample Tests

    Authors: Wojciech Zaremba, Arthur Gretton, Matthew Blaschko

    Abstract: A family of maximum mean discrepancy (MMD) kernel two-sample tests is introduced. Members of the test family are called Block-tests or B-tests, since the test statistic is an average over MMDs computed on subsets of the samples. The choice of block size allows control over the tradeoff between test power and computation time. In this respect, the $B$-test family combines favorable properties of pr…

    Submitted 10 February, 2014; v1 submitted 8 July, 2013; originally announced July 2013.

    Comments: Neural Information Processing Systems (2013)