
Showing 1–50 of 57 results for author: Sutskever, I

Searching in archive cs.
  1. arXiv:2406.04093

    cs.LG cs.AI

    Scaling and evaluating sparse autoencoders

    Authors: Leo Gao, Tom Dupré la Tour, Henk Tillman, Gabriel Goh, Rajan Troll, Alec Radford, Ilya Sutskever, Jan Leike, Jeffrey Wu

    Abstract: Sparse autoencoders provide a promising unsupervised approach for extracting interpretable features from a language model by reconstructing activations from a sparse bottleneck layer. Since language models learn many concepts, autoencoders need to be very large to recover all relevant features. However, studying the properties of autoencoder scaling is difficult due to the need to balance reconstr…

    Submitted 6 June, 2024; originally announced June 2024.
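    Illustration (a sketch, not the authors' code): the abstract describes reconstructing activations through a sparse bottleneck. Below, sparsity is enforced with a TopK activation that keeps only the k largest latents and clamps negatives to zero; that choice, the parameter names (w_enc, b_enc, w_dec, b_pre, k), and the hypothetical random initialisation are assumptions of this sketch.

```python
import random


def topk_then_relu(values, k):
    """Keep the k largest entries (ties broken by index), zero the rest, clamp negatives."""
    keep = set(sorted(range(len(values)), key=lambda i: values[i], reverse=True)[:k])
    return [max(0.0, v) if i in keep else 0.0 for i, v in enumerate(values)]


def matvec(matrix, vec):
    """Multiply a list-of-rows matrix by a vector."""
    return [sum(w * x for w, x in zip(row, vec)) for row in matrix]


def sae_forward(x, w_enc, b_enc, w_dec, b_pre, k):
    """Encode activation x into a k-sparse latent code, decode it, and report the MSE."""
    centered = [xi - bi for xi, bi in zip(x, b_pre)]
    pre_acts = [p + b for p, b in zip(matvec(w_enc, centered), b_enc)]
    latents = topk_then_relu(pre_acts, k)  # the sparse bottleneck
    recon = [r + b for r, b in zip(matvec(w_dec, latents), b_pre)]
    mse = sum((a - r) ** 2 for a, r in zip(x, recon)) / len(x)
    return latents, recon, mse


def random_sae(d, n_latents, seed=0):
    """Hypothetical random initialisation, only for trying the forward pass."""
    rng = random.Random(seed)
    w_enc = [[rng.gauss(0.0, 0.3) for _ in range(d)] for _ in range(n_latents)]
    b_enc = [0.0] * n_latents
    w_dec = [[rng.gauss(0.0, 0.3) for _ in range(n_latents)] for _ in range(d)]
    b_pre = [0.0] * d
    return w_enc, b_enc, w_dec, b_pre
```

    However many latents the autoencoder has, at most k of them are nonzero for any input, so the latent dimension can be scaled up while the number of active features per input stays fixed.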

  2. arXiv:2312.09390

    cs.CL

    Weak-to-Strong Generalization: Eliciting Strong Capabilities With Weak Supervision

    Authors: Collin Burns, Pavel Izmailov, Jan Hendrik Kirchner, Bowen Baker, Leo Gao, Leopold Aschenbrenner, Yining Chen, Adrien Ecoffet, Manas Joglekar, Jan Leike, Ilya Sutskever, Jeff Wu

    Abstract: Widely used alignment techniques, such as reinforcement learning from human feedback (RLHF), rely on the ability of humans to supervise model behavior - for example, to evaluate whether a model faithfully followed instructions or generated safe outputs. However, future superhuman models will behave in complex ways too difficult for humans to reliably evaluate; humans will only be able to weakly su…

    Submitted 14 December, 2023; originally announced December 2023.

  3. arXiv:2305.20050

    cs.LG cs.AI cs.CL

    Let's Verify Step by Step

    Authors: Hunter Lightman, Vineet Kosaraju, Yura Burda, Harri Edwards, Bowen Baker, Teddy Lee, Jan Leike, John Schulman, Ilya Sutskever, Karl Cobbe

    Abstract: In recent years, large language models have greatly improved in their ability to perform complex multi-step reasoning. However, even state-of-the-art models still regularly produce logical mistakes. To train more reliable models, we can turn either to outcome supervision, which provides feedback for a final result, or process supervision, which provides feedback for each intermediate reasoning ste…

    Submitted 31 May, 2023; originally announced May 2023.

  4. arXiv:2303.08774

    cs.CL cs.AI

    GPT-4 Technical Report

    Authors: OpenAI, Josh Achiam, Steven Adler, Sandhini Agarwal, Lama Ahmad, Ilge Akkaya, Florencia Leoni Aleman, Diogo Almeida, Janko Altenschmidt, Sam Altman, Shyamal Anadkat, Red Avila, Igor Babuschkin, Suchir Balaji, Valerie Balcom, Paul Baltescu, Haiming Bao, Mohammad Bavarian, Jeff Belgum, Irwan Bello, Jake Berdine, Gabriel Bernadett-Shapiro, Christopher Berner, Lenny Bogdonoff, Oleg Boiko , et al. (256 additional authors not shown)

    Abstract: We report the development of GPT-4, a large-scale, multimodal model which can accept image and text inputs and produce text outputs. While less capable than humans in many real-world scenarios, GPT-4 exhibits human-level performance on various professional and academic benchmarks, including passing a simulated bar exam with a score around the top 10% of test takers. GPT-4 is a Transformer-based mo…

    Submitted 4 March, 2024; v1 submitted 15 March, 2023; originally announced March 2023.

    Comments: 100 pages; updated authors list; fixed author names and added citation

  5. arXiv:2303.01469

    cs.LG cs.CV stat.ML

    Consistency Models

    Authors: Yang Song, Prafulla Dhariwal, Mark Chen, Ilya Sutskever

    Abstract: Diffusion models have significantly advanced the fields of image, audio, and video generation, but they depend on an iterative sampling process that causes slow generation. To overcome this limitation, we propose consistency models, a new family of models that generate high quality samples by directly mapping noise to data. They support fast one-step generation by design, while still allowing mult…

    Submitted 31 May, 2023; v1 submitted 2 March, 2023; originally announced March 2023.

    Comments: ICML 2023

  6. arXiv:2212.04356

    eess.AS cs.CL cs.LG cs.SD

    Robust Speech Recognition via Large-Scale Weak Supervision

    Authors: Alec Radford, Jong Wook Kim, Tao Xu, Greg Brockman, Christine McLeavey, Ilya Sutskever

    Abstract: We study the capabilities of speech processing systems trained simply to predict large amounts of transcripts of audio on the internet. When scaled to 680,000 hours of multilingual and multitask supervision, the resulting models generalize well to standard benchmarks and are often competitive with prior fully supervised results but in a zero-shot transfer setting without the need for any fine-tuni…

    Submitted 6 December, 2022; originally announced December 2022.

  7. arXiv:2202.01344

    cs.LG cs.AI

    Formal Mathematics Statement Curriculum Learning

    Authors: Stanislas Polu, Jesse Michael Han, Kunhao Zheng, Mantas Baksys, Igor Babuschkin, Ilya Sutskever

    Abstract: We explore the use of expert iteration in the context of language modeling applied to formal mathematics. We show that at same compute budget, expert iteration, by which we mean proof search interleaved with learning, dramatically outperforms proof search only. We also observe that when applied to a collection of formal statements of sufficiently varied difficulty, expert iteration is capable of f…

    Submitted 2 February, 2022; originally announced February 2022.

  8. arXiv:2112.10741

    cs.CV cs.GR cs.LG

    GLIDE: Towards Photorealistic Image Generation and Editing with Text-Guided Diffusion Models

    Authors: Alex Nichol, Prafulla Dhariwal, Aditya Ramesh, Pranav Shyam, Pamela Mishkin, Bob McGrew, Ilya Sutskever, Mark Chen

    Abstract: Diffusion models have recently been shown to generate high-quality synthetic images, especially when paired with a guidance technique to trade off diversity for fidelity. We explore diffusion models for the problem of text-conditional image synthesis and compare two different guidance strategies: CLIP guidance and classifier-free guidance. We find that the latter is preferred by human evaluators f…

    Submitted 8 March, 2022; v1 submitted 20 December, 2021; originally announced December 2021.

    Comments: 20 pages, 18 figures
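    Illustration: classifier-free guidance, as it is usually written for diffusion models, combines the model's noise prediction with and without the caption: eps_hat = eps(x | empty) + s * (eps(x | c) - eps(x | empty)). The sketch applies that rule elementwise to plain lists; the function name and list representation are assumptions, and a real sampler would apply it to tensors at every denoising step.

```python
def classifier_free_guidance(eps_uncond, eps_cond, scale):
    """Guided noise prediction: extrapolate from the unconditional toward the conditional one.

    scale == 1 returns eps_cond unchanged; scale > 1 pushes further in the
    caption's direction, trading sample diversity for fidelity.
    """
    return [u + scale * (c - u) for u, c in zip(eps_uncond, eps_cond)]
```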

  9. arXiv:2110.05448

    cs.CL cs.AI

    Unsupervised Neural Machine Translation with Generative Language Models Only

    Authors: Jesse Michael Han, Igor Babuschkin, Harrison Edwards, Arvind Neelakantan, Tao Xu, Stanislas Polu, Alex Ray, Pranav Shyam, Aditya Ramesh, Alec Radford, Ilya Sutskever

    Abstract: We show how to derive state-of-the-art unsupervised neural machine translation systems from generatively pre-trained language models. Our method consists of three steps: few-shot amplification, distillation, and backtranslation. We first use the zero-shot translation ability of large pre-trained language models to generate translations for a small set of unlabeled sentences. We then amplify these…

    Submitted 11 October, 2021; originally announced October 2021.

    Comments: 10 pages

  10. arXiv:2107.03374

    cs.LG

    Evaluating Large Language Models Trained on Code

    Authors: Mark Chen, Jerry Tworek, Heewoo Jun, Qiming Yuan, Henrique Ponde de Oliveira Pinto, Jared Kaplan, Harri Edwards, Yuri Burda, Nicholas Joseph, Greg Brockman, Alex Ray, Raul Puri, Gretchen Krueger, Michael Petrov, Heidy Khlaaf, Girish Sastry, Pamela Mishkin, Brooke Chan, Scott Gray, Nick Ryder, Mikhail Pavlov, Alethea Power, Lukasz Kaiser, Mohammad Bavarian, Clemens Winter , et al. (33 additional authors not shown)

    Abstract: We introduce Codex, a GPT language model fine-tuned on publicly available code from GitHub, and study its Python code-writing capabilities. A distinct production version of Codex powers GitHub Copilot. On HumanEval, a new evaluation set we release to measure functional correctness for synthesizing programs from docstrings, our model solves 28.8% of the problems, while GPT-3 solves 0% and GPT-J sol…

    Submitted 14 July, 2021; v1 submitted 7 July, 2021; originally announced July 2021.

    Comments: corrected typos, added references, added authors, added acknowledgements
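    Illustration: functional-correctness results on HumanEval are commonly reported as pass@k. Given n sampled programs per problem, of which c pass the unit tests, the unbiased estimator usually cited with this paper is 1 - C(n - c, k) / C(n, k). The sketch implements that formula with exact integer binomials; the function name is hypothetical.

```python
from math import comb


def pass_at_k(n, c, k):
    """Unbiased pass@k estimate from n samples, c of which pass the tests."""
    if n - c < k:
        # Every size-k subset must contain at least one passing sample.
        return 1.0
    return 1.0 - comb(n - c, k) / comb(n, k)
```

    For k = 1 the estimate reduces to the raw fraction of passing samples, so pass_at_k(10, 3, 1) is 0.3 up to floating-point rounding.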

  11. arXiv:2103.00020

    cs.CV cs.LG

    Learning Transferable Visual Models From Natural Language Supervision

    Authors: Alec Radford, Jong Wook Kim, Chris Hallacy, Aditya Ramesh, Gabriel Goh, Sandhini Agarwal, Girish Sastry, Amanda Askell, Pamela Mishkin, Jack Clark, Gretchen Krueger, Ilya Sutskever

    Abstract: State-of-the-art computer vision systems are trained to predict a fixed set of predetermined object categories. This restricted form of supervision limits their generality and usability since additional labeled data is needed to specify any other visual concept. Learning directly from raw text about images is a promising alternative which leverages a much broader source of supervision. We demonstr…

    Submitted 26 February, 2021; originally announced March 2021.
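    Illustration: learning from paired images and captions is typically done with a symmetric contrastive objective. Embeddings are L2-normalized, the batch similarity matrix is divided by a temperature, and cross-entropy is applied along rows (image to text) and columns (text to image) with the matching pair as the target. This pure-Python sketch uses a fixed temperature, whereas CLIP learns it; that simplification and all names are assumptions.

```python
import math


def normalize(v):
    """Scale a vector to unit L2 norm."""
    norm = math.sqrt(sum(x * x for x in v))
    return [x / norm for x in v]


def cross_entropy(logits, target):
    """Numerically stable -log softmax(logits)[target]."""
    m = max(logits)
    log_sum_exp = m + math.log(sum(math.exp(z - m) for z in logits))
    return log_sum_exp - logits[target]


def clip_loss(image_emb, text_emb, temperature=0.07):
    """Symmetric contrastive loss for a batch where image i is paired with text i."""
    imgs = [normalize(v) for v in image_emb]
    txts = [normalize(v) for v in text_emb]
    n = len(imgs)
    # logits[i][j]: scaled cosine similarity between image i and text j
    logits = [[sum(a * b for a, b in zip(im, tx)) / temperature for tx in txts] for im in imgs]
    image_to_text = sum(cross_entropy(logits[i], i) for i in range(n)) / n
    text_to_image = sum(cross_entropy([logits[i][j] for i in range(n)], j) for j in range(n)) / n
    return (image_to_text + text_to_image) / 2
```

    Correctly paired, mutually orthogonal embeddings give a loss near zero; swapping the captions makes it large, since each image then scores highest against the wrong text.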

  12. arXiv:2102.12092

    cs.CV cs.LG

    Zero-Shot Text-to-Image Generation

    Authors: Aditya Ramesh, Mikhail Pavlov, Gabriel Goh, Scott Gray, Chelsea Voss, Alec Radford, Mark Chen, Ilya Sutskever

    Abstract: Text-to-image generation has traditionally focused on finding better modeling assumptions for training on a fixed dataset. These assumptions might involve complex architectures, auxiliary losses, or side information such as object part labels or segmentation masks supplied during training. We describe a simple approach for this task based on a transformer that autoregressively models the text and…

    Submitted 26 February, 2021; v1 submitted 24 February, 2021; originally announced February 2021.

  13. arXiv:2009.03393

    cs.LG cs.AI cs.CL stat.ML

    Generative Language Modeling for Automated Theorem Proving

    Authors: Stanislas Polu, Ilya Sutskever

    Abstract: We explore the application of transformer-based language models to automated theorem proving. This work is motivated by the possibility that a major limitation of automated theorem provers compared to humans -- the generation of original mathematical terms -- might be addressable via generation from language models. We present an automated prover and proof assistant, GPT-f, for the Metamath formal…

    Submitted 7 September, 2020; originally announced September 2020.

    Comments: 15+5 pages

  14. arXiv:2005.14165

    cs.CL

    Language Models are Few-Shot Learners

    Authors: Tom B. Brown, Benjamin Mann, Nick Ryder, Melanie Subbiah, Jared Kaplan, Prafulla Dhariwal, Arvind Neelakantan, Pranav Shyam, Girish Sastry, Amanda Askell, Sandhini Agarwal, Ariel Herbert-Voss, Gretchen Krueger, Tom Henighan, Rewon Child, Aditya Ramesh, Daniel M. Ziegler, Jeffrey Wu, Clemens Winter, Christopher Hesse, Mark Chen, Eric Sigler, Mateusz Litwin, Scott Gray, Benjamin Chess , et al. (6 additional authors not shown)

    Abstract: Recent work has demonstrated substantial gains on many NLP tasks and benchmarks by pre-training on a large corpus of text followed by fine-tuning on a specific task. While typically task-agnostic in architecture, this method still requires task-specific fine-tuning datasets of thousands or tens of thousands of examples. By contrast, humans can generally perform a new language task from only a few…

    Submitted 22 July, 2020; v1 submitted 28 May, 2020; originally announced May 2020.

    Comments: 40+32 pages

  15. arXiv:2005.00341

    eess.AS cs.LG cs.SD stat.ML

    Jukebox: A Generative Model for Music

    Authors: Prafulla Dhariwal, Heewoo Jun, Christine Payne, Jong Wook Kim, Alec Radford, Ilya Sutskever

    Abstract: We introduce Jukebox, a model that generates music with singing in the raw audio domain. We tackle the long context of raw audio using a multi-scale VQ-VAE to compress it to discrete codes, and modeling those using autoregressive Transformers. We show that the combined model at scale can generate high-fidelity and diverse songs with coherence up to multiple minutes. We can condition on artist and…

    Submitted 30 April, 2020; originally announced May 2020.
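    Illustration: the compression step of a VQ-VAE replaces each encoder output vector with the nearest entry of a learned codebook, and the index of that entry is the discrete code an autoregressive model later predicts. This sketch assumes squared Euclidean distance and a hand-written toy codebook; it omits the codebook and commitment losses and the multi-scale hierarchy.

```python
def squared_distance(a, b):
    """Squared Euclidean distance between two equal-length vectors."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def quantize(vectors, codebook):
    """Return (codes, quantized vectors): the nearest codebook entry for each input vector."""
    codes = []
    for v in vectors:
        best = min(range(len(codebook)), key=lambda j: squared_distance(v, codebook[j]))
        codes.append(best)
    return codes, [codebook[j] for j in codes]
```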

  16. arXiv:1912.06680

    cs.LG stat.ML

    Dota 2 with Large Scale Deep Reinforcement Learning

    Authors: OpenAI: Christopher Berner, Greg Brockman, Brooke Chan, Vicki Cheung, Przemysław Dębiak, Christy Dennison, David Farhi, Quirin Fischer, Shariq Hashme, Chris Hesse, Rafal Józefowicz, Scott Gray, Catherine Olsson, Jakub Pachocki, Michael Petrov, Henrique P. d. O. Pinto, Jonathan Raiman, Tim Salimans, Jeremy Schlatter, Jonas Schneider, Szymon Sidor, Ilya Sutskever, Jie Tang , et al. (2 additional authors not shown)

    Abstract: On April 13th, 2019, OpenAI Five became the first AI system to defeat the world champions at an esports game. The game of Dota 2 presents novel challenges for AI systems such as long time horizons, imperfect information, and complex, continuous state-action spaces, all challenges which will become increasingly central to more capable AI systems. OpenAI Five leveraged existing reinforcement learnin…

    Submitted 13 December, 2019; originally announced December 2019.

  17. arXiv:1912.02292

    cs.LG cs.CV cs.NE stat.ML

    Deep Double Descent: Where Bigger Models and More Data Hurt

    Authors: Preetum Nakkiran, Gal Kaplun, Yamini Bansal, Tristan Yang, Boaz Barak, Ilya Sutskever

    Abstract: We show that a variety of modern deep learning tasks exhibit a "double-descent" phenomenon where, as we increase model size, performance first gets worse and then gets better. Moreover, we show that double descent occurs not just as a function of model size, but also as a function of the number of training epochs. We unify the above phenomena by defining a new complexity measure we call the effect…

    Submitted 4 December, 2019; originally announced December 2019.

    Comments: G.K. and Y.B. contributed equally

  18. arXiv:1904.10509

    cs.LG stat.ML

    Generating Long Sequences with Sparse Transformers

    Authors: Rewon Child, Scott Gray, Alec Radford, Ilya Sutskever

    Abstract: Transformers are powerful sequence models, but require time and memory that grows quadratically with the sequence length. In this paper we introduce sparse factorizations of the attention matrix which reduce this to $O(n \sqrt{n})$. We also introduce a) a variation on architecture and initialization to train deeper networks, b) the recomputation of attention matrices to save memory, and c) fast at…

    Submitted 23 April, 2019; originally announced April 2019.
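    Illustration: one way to see where the $O(n \sqrt{n})$ cost comes from is to count attention connections in a factorized, strided pattern. In this sketch (modeled on a "strided" pattern, not the paper's GPU kernels), a causal local head attends to the previous `stride` positions and a strided head attends to every earlier position whose distance from i is a multiple of `stride`. With stride close to sqrt(n), each position attends to roughly 2 sqrt(n) keys instead of up to n. The function names are hypothetical.

```python
def strided_pattern(n, stride):
    """Causal attention sets (lists of key positions) for each query position i."""
    # Local head: keys j with 0 <= i - j < stride.
    local = [list(range(max(0, i - stride + 1), i + 1)) for i in range(n)]
    # Strided head: keys j <= i with (i - j) % stride == 0.
    strided = [list(range(i, -1, -stride)) for i in range(n)]
    return local, strided


def count_connections(n, stride):
    """Total query-key pairs across both heads (position i is counted in each head)."""
    local, strided = strided_pattern(n, stride)
    return sum(len(s) for s in local) + sum(len(s) for s in strided)


def dense_causal_connections(n):
    """Query-key pairs in full causal attention."""
    return n * (n + 1) // 2
```

    For n = 1024 and stride = 32 this pattern has 49,168 connections, compared with 524,800 for dense causal attention.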

  19. arXiv:1810.01367

    cs.LG cs.CV stat.ML

    FFJORD: Free-form Continuous Dynamics for Scalable Reversible Generative Models

    Authors: Will Grathwohl, Ricky T. Q. Chen, Jesse Bettencourt, Ilya Sutskever, David Duvenaud

    Abstract: A promising class of generative models maps points from a simple distribution to a complex distribution through an invertible neural network. Likelihood-based training of these models requires restricting their architectures to allow cheap computation of Jacobian determinants. Alternatively, the Jacobian trace can be used if the transformation is specified by an ordinary differential equation. In…

    Submitted 22 October, 2018; v1 submitted 2 October, 2018; originally announced October 2018.

    Comments: 8 Pages, 6 figures
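    Illustration: the trace of a Jacobian can be estimated without forming the matrix, using Hutchinson's estimator Tr(A) = E[eps^T A eps] for noise eps with zero mean and identity covariance. FFJORD relies on an unbiased estimator of this kind, obtaining the needed product from automatic differentiation. This sketch uses Rademacher noise and an explicit 2 x 2 matrix as a stand-in for the Jacobian; both are assumptions made to keep it self-contained.

```python
import random


def hutchinson_trace(matvec, dim, num_samples, seed=0):
    """Monte Carlo estimate of Tr(A) that only needs products A @ eps."""
    rng = random.Random(seed)
    total = 0.0
    for _ in range(num_samples):
        eps = [rng.choice((-1.0, 1.0)) for _ in range(dim)]  # Rademacher noise
        a_eps = matvec(eps)
        total += sum(e * v for e, v in zip(eps, a_eps))
    return total / num_samples


def as_matvec(matrix):
    """Wrap an explicit matrix; a real model would supply a Jacobian-vector product instead."""
    return lambda v: [sum(m * x for m, x in zip(row, v)) for row in matrix]
```

    With Rademacher noise the diagonal contributes exactly, so a diagonal matrix gives its trace from any number of samples; off-diagonal entries only add variance.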

  20. arXiv:1806.00608

    cs.LG cs.AI cs.LO stat.ML

    GamePad: A Learning Environment for Theorem Proving

    Authors: Daniel Huang, Prafulla Dhariwal, Dawn Song, Ilya Sutskever

    Abstract: In this paper, we introduce a system called GamePad that can be used to explore the application of machine learning methods to theorem proving in the Coq proof assistant. Interactive theorem provers such as Coq enable users to construct machine-checkable proofs in a step-by-step manner. Hence, they provide an opportunity to explore theorem proving with human supervision. We use GamePad to synthesi…

    Submitted 21 December, 2018; v1 submitted 2 June, 2018; originally announced June 2018.

  21. arXiv:1803.01118

    cs.AI

    Some Considerations on Learning to Explore via Meta-Reinforcement Learning

    Authors: Bradly C. Stadie, Ge Yang, Rein Houthooft, Xi Chen, Yan Duan, Yuhuai Wu, Pieter Abbeel, Ilya Sutskever

    Abstract: We consider the problem of exploration in meta reinforcement learning. Two new meta reinforcement learning algorithms are suggested: E-MAML and E-$\text{RL}^2$. Results are presented on a novel environment we call 'Krazy World' and a set of maze environments. We show E-MAML and E-$\text{RL}^2$ deliver better performance on tasks where exploration is important.

    Submitted 11 January, 2019; v1 submitted 3 March, 2018; originally announced March 2018.

  22. arXiv:1710.03748

    cs.AI

    Emergent Complexity via Multi-Agent Competition

    Authors: Trapit Bansal, Jakub Pachocki, Szymon Sidor, Ilya Sutskever, Igor Mordatch

    Abstract: Reinforcement learning algorithms can train agents that solve problems in complex, interesting environments. Normally, the complexity of the trained agent is closely related to the complexity of the environment. This suggests that a highly capable agent requires a complex environment for training. In this paper, we point out that a competitive multi-agent environment trained with self-play can pro…

    Submitted 14 March, 2018; v1 submitted 10 October, 2017; originally announced October 2017.

    Comments: Published as a conference paper at ICLR 2018

  23. arXiv:1710.03641

    cs.LG cs.AI

    Continuous Adaptation via Meta-Learning in Nonstationary and Competitive Environments

    Authors: Maruan Al-Shedivat, Trapit Bansal, Yuri Burda, Ilya Sutskever, Igor Mordatch, Pieter Abbeel

    Abstract: Ability to continuously learn and adapt from limited experience in nonstationary environments is an important milestone on the path towards general intelligence. In this paper, we cast the problem of continuous adaptation into the learning-to-learn framework. We develop a simple gradient-based meta-learning algorithm suitable for adaptation in dynamically changing and adversarial scenarios. Additi…

    Submitted 23 February, 2018; v1 submitted 10 October, 2017; originally announced October 2017.

    Comments: Published as a conference paper at ICLR 2018

  24. arXiv:1706.06428

    cs.CL cs.LG stat.ML

    An online sequence-to-sequence model for noisy speech recognition

    Authors: Chung-Cheng Chiu, Dieterich Lawson, Yuping Luo, George Tucker, Kevin Swersky, Ilya Sutskever, Navdeep Jaitly

    Abstract: Generative models have long been the dominant approach for speech recognition. The success of these models however relies on the use of sophisticated recipes and complicated machinery that is not easily accessible to non-practitioners. Recent innovations in Deep Learning have given rise to an alternative - discriminative models called Sequence-to-Sequence models, that can almost match the accuracy…

    Submitted 16 June, 2017; originally announced June 2017.

    Comments: arXiv admin note: substantial text overlap with arXiv:1608.01281

  25. arXiv:1704.01444

    cs.LG cs.CL cs.NE

    Learning to Generate Reviews and Discovering Sentiment

    Authors: Alec Radford, Rafal Jozefowicz, Ilya Sutskever

    Abstract: We explore the properties of byte-level recurrent language models. When given sufficient amounts of capacity, training data, and compute time, the representations learned by these models include disentangled features corresponding to high-level concepts. Specifically, we find a single unit which performs sentiment analysis. These representations, learned in an unsupervised manner, achieve state of…

    Submitted 6 April, 2017; v1 submitted 5 April, 2017; originally announced April 2017.

  26. arXiv:1703.07326

    cs.AI cs.LG cs.NE cs.RO

    One-Shot Imitation Learning

    Authors: Yan Duan, Marcin Andrychowicz, Bradly C. Stadie, Jonathan Ho, Jonas Schneider, Ilya Sutskever, Pieter Abbeel, Wojciech Zaremba

    Abstract: Imitation learning has been commonly applied to solve different tasks in isolation. This usually requires either careful feature engineering, or a significant number of samples. This is far from what we desire: ideally, robots should be able to learn from very few demonstrations of any given task, and instantly generalize to new situations of the same task, without requiring task-specific engineer…

    Submitted 4 December, 2017; v1 submitted 21 March, 2017; originally announced March 2017.

  27. arXiv:1703.03864

    stat.ML cs.AI cs.LG cs.NE

    Evolution Strategies as a Scalable Alternative to Reinforcement Learning

    Authors: Tim Salimans, Jonathan Ho, Xi Chen, Szymon Sidor, Ilya Sutskever

    Abstract: We explore the use of Evolution Strategies (ES), a class of black box optimization algorithms, as an alternative to popular MDP-based RL techniques such as Q-learning and Policy Gradients. Experiments on MuJoCo and Atari show that ES is a viable solution strategy that scales extremely well with the number of CPUs available: By using a novel communication strategy based on common random numbers, ou…

    Submitted 7 September, 2017; v1 submitted 10 March, 2017; originally announced March 2017.
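    Illustration: an ES update estimates a gradient from the returns of randomly perturbed parameters, and because each perturbation can be regenerated from its random seed, workers that share seeds need to exchange only scalar returns. This single-process sketch shows both ideas on a hypothetical quadratic objective and additionally uses mirrored (antithetic) perturbation pairs to reduce variance. The objective, step sizes, and function names are assumptions; a real setup would spread the evaluations in es_step across processes.

```python
import random


def perturbation(seed, dim):
    """Regenerate the Gaussian perturbation for a seed; any worker can do this locally."""
    rng = random.Random(seed)
    return [rng.gauss(0.0, 1.0) for _ in range(dim)]


def objective(x):
    """Toy return to maximize (hypothetical): peak of 0 at x = (3, 3, ...)."""
    return -sum((xi - 3.0) ** 2 for xi in x)


def es_step(theta, seeds, sigma, lr):
    dim = len(theta)
    # Each "worker" evaluates a mirrored pair and reports one scalar difference per seed.
    diffs = []
    for s in seeds:
        eps = perturbation(s, dim)
        plus = objective([t + sigma * e for t, e in zip(theta, eps)])
        minus = objective([t - sigma * e for t, e in zip(theta, eps)])
        diffs.append(plus - minus)
    # Shared seeds let every worker rebuild each eps and form the same gradient estimate.
    grad = [0.0] * dim
    for s, d in zip(seeds, diffs):
        for i, e in enumerate(perturbation(s, dim)):
            grad[i] += d * e
    return [t + lr * g / (2 * sigma * len(seeds)) for t, g in zip(theta, grad)]


def run(steps=100, population=50, sigma=0.1, lr=0.05, seed=0):
    master = random.Random(seed)
    theta = [0.0, 0.0]
    for _ in range(steps):
        seeds = [master.randrange(2**31) for _ in range(population)]
        theta = es_step(theta, seeds, sigma, lr)
    return theta
```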

  28. arXiv:1703.01703

    cs.LG

    Third-Person Imitation Learning

    Authors: Bradly C. Stadie, Pieter Abbeel, Ilya Sutskever

    Abstract: Reinforcement learning (RL) makes it possible to train agents capable of achieving sophisticated goals in complex and uncertain environments. A key difficulty in reinforcement learning is specifying a reward function for the agent to optimize. Traditionally, imitation learning in RL has been used to overcome this problem. Unfortunately, hitherto imitation learning methods tend to require that demo…

    Submitted 22 September, 2019; v1 submitted 5 March, 2017; originally announced March 2017.

    Comments: Only changed the abstract to remove unneeded hyphens

  29. arXiv:1611.02779

    cs.AI cs.LG cs.NE stat.ML

    RL$^2$: Fast Reinforcement Learning via Slow Reinforcement Learning

    Authors: Yan Duan, John Schulman, Xi Chen, Peter L. Bartlett, Ilya Sutskever, Pieter Abbeel

    Abstract: Deep reinforcement learning (deep RL) has been successful in learning sophisticated behaviors automatically; however, the learning process requires a huge number of trials. In contrast, animals can learn new tasks in just a few trials, benefiting from their prior knowledge about the world. This paper seeks to bridge this gap. Rather than designing a "fast" reinforcement learning algorithm, we prop…

    Submitted 9 November, 2016; v1 submitted 8 November, 2016; originally announced November 2016.

    Comments: 14 pages. Under review as a conference paper at ICLR 2017

  30. arXiv:1611.02731

    cs.LG stat.ML

    Variational Lossy Autoencoder

    Authors: Xi Chen, Diederik P. Kingma, Tim Salimans, Yan Duan, Prafulla Dhariwal, John Schulman, Ilya Sutskever, Pieter Abbeel

    Abstract: Representation learning seeks to expose certain aspects of observed data in a learned representation that's amenable to downstream tasks like classification. For instance, a good representation for 2D images might be one that describes only global structure and discards information about detailed texture. In this paper, we present a simple but principled method to learn such global representations…

    Submitted 4 March, 2017; v1 submitted 8 November, 2016; originally announced November 2016.

    Comments: Added CIFAR10 experiments; ICLR 2017

  31. arXiv:1611.00736

    cs.NE cs.AI

    Extensions and Limitations of the Neural GPU

    Authors: Eric Price, Wojciech Zaremba, Ilya Sutskever

    Abstract: The Neural GPU is a recent model that can learn algorithms such as multi-digit binary addition and binary multiplication in a way that generalizes to inputs of arbitrary length. We show that there are two simple ways of improving the performance of the Neural GPU: by carefully designing a curriculum, and by increasing model size. The latter requires a memory efficient implementation, as a naive im…

    Submitted 4 November, 2016; v1 submitted 2 November, 2016; originally announced November 2016.

  32. arXiv:1608.01281

    cs.LG cs.CL

    Learning Online Alignments with Continuous Rewards Policy Gradient

    Authors: Yuping Luo, Chung-Cheng Chiu, Navdeep Jaitly, Ilya Sutskever

    Abstract: Sequence-to-sequence models with soft attention had significant success in machine translation, speech recognition, and question answering. Though capable and easy to use, they require that the entirety of the input sequence is available at the beginning of inference, an assumption that is not valid for instantaneous translation and speech recognition. To address this problem, we present a new met…

    Submitted 3 August, 2016; originally announced August 2016.

  33. arXiv:1606.04934

    cs.LG stat.ML

    Improving Variational Inference with Inverse Autoregressive Flow

    Authors: Diederik P. Kingma, Tim Salimans, Rafal Jozefowicz, Xi Chen, Ilya Sutskever, Max Welling

    Abstract: The framework of normalizing flows provides a general strategy for flexible variational inference of posteriors over latent variables. We propose a new type of normalizing flow, inverse autoregressive flow (IAF), that, in contrast to earlier published flows, scales well to high-dimensional latent spaces. The proposed flow consists of a chain of invertible transformations, where each transformation…

    Submitted 30 January, 2017; v1 submitted 15 June, 2016; originally announced June 2016.

  34. arXiv:1606.03657

    cs.LG stat.ML

    InfoGAN: Interpretable Representation Learning by Information Maximizing Generative Adversarial Nets

    Authors: Xi Chen, Yan Duan, Rein Houthooft, John Schulman, Ilya Sutskever, Pieter Abbeel

    Abstract: This paper describes InfoGAN, an information-theoretic extension to the Generative Adversarial Network that is able to learn disentangled representations in a completely unsupervised manner. InfoGAN is a generative adversarial network that also maximizes the mutual information between a small subset of the latent variables and the observation. We derive a lower bound to the mutual information obje…

    Submitted 11 June, 2016; originally announced June 2016.

  35. arXiv:1603.04467

    cs.DC cs.LG

    TensorFlow: Large-Scale Machine Learning on Heterogeneous Distributed Systems

    Authors: Martín Abadi, Ashish Agarwal, Paul Barham, Eugene Brevdo, Zhifeng Chen, Craig Citro, Greg S. Corrado, Andy Davis, Jeffrey Dean, Matthieu Devin, Sanjay Ghemawat, Ian Goodfellow, Andrew Harp, Geoffrey Irving, Michael Isard, Yangqing Jia, Rafal Jozefowicz, Lukasz Kaiser, Manjunath Kudlur, Josh Levenberg, Dan Mane, Rajat Monga, Sherry Moore, Derek Murray, Chris Olah , et al. (15 additional authors not shown)

    Abstract: TensorFlow is an interface for expressing machine learning algorithms, and an implementation for executing such algorithms. A computation expressed using TensorFlow can be executed with little or no change on a wide variety of heterogeneous systems, ranging from mobile devices such as phones and tablets up to large-scale distributed systems of hundreds of machines and thousands of computational de…

    Submitted 16 March, 2016; v1 submitted 14 March, 2016; originally announced March 2016.

    Comments: Version 2 updates only the metadata, to correct the formatting of Martín Abadi's name

  36. arXiv:1603.00748

    cs.LG cs.AI cs.RO eess.SY

    Continuous Deep Q-Learning with Model-based Acceleration

    Authors: Shixiang Gu, Timothy Lillicrap, Ilya Sutskever, Sergey Levine

    Abstract: Model-free reinforcement learning has been successfully applied to a range of challenging problems, and has recently been extended to handle large neural network policies and value functions. However, the sample complexity of model-free algorithms, particularly when using high-dimensional function approximators, tends to limit their applicability to physical systems. In this paper, we explore algo…

    Submitted 2 March, 2016; originally announced March 2016.

  37. arXiv:1511.08228

    cs.LG cs.NE

    Neural GPUs Learn Algorithms

    Authors: Łukasz Kaiser, Ilya Sutskever

    Abstract: Learning an algorithm from examples is a fundamental problem that has been widely studied. Recently it has been addressed using neural networks, in particular by Neural Turing Machines (NTMs). These are fully differentiable computers that use backpropagation to learn their own programming. Despite their appeal NTMs have a weakness that is caused by their sequential nature: they are not parallel an…

    Submitted 14 March, 2016; v1 submitted 25 November, 2015; originally announced November 2015.

  38. arXiv:1511.06807

    stat.ML cs.LG

    Adding Gradient Noise Improves Learning for Very Deep Networks

    Authors: Arvind Neelakantan, Luke Vilnis, Quoc V. Le, Ilya Sutskever, Lukasz Kaiser, Karol Kurach, James Martens

    Abstract: Deep feedforward and recurrent networks have achieved impressive results in many perception and language processing applications. This success is partially attributed to architectural innovations such as convolutional and long short-term memory networks. The main motivation for these architectural innovations is that they capture better domain knowledge, and importantly are easier to optimize than…

    Submitted 20 November, 2015; originally announced November 2015.
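    Illustration: the technique named in the title adds Gaussian noise to each gradient before the parameter update. This sketch assumes an annealed schedule in which the noise variance decays as eta / (1 + t) ** gamma with the training step t; the schedule's form, the defaults eta = 0.3 and gamma = 0.55, and all function names are assumptions of the sketch.

```python
import random


def noise_variance(step, eta=0.3, gamma=0.55):
    """Annealed variance: large early in training, shrinking as step grows."""
    return eta / (1 + step) ** gamma


def add_gradient_noise(grad, step, rng, eta=0.3, gamma=0.55):
    """Return grad plus zero-mean Gaussian noise with the annealed variance."""
    std = noise_variance(step, eta, gamma) ** 0.5
    return [g + rng.gauss(0.0, std) for g in grad]


def sgd_step(params, grad, step, lr, rng):
    """One plain SGD update that uses the noisy gradient."""
    noisy = add_gradient_noise(grad, step, rng)
    return [p - lr * g for p, g in zip(params, noisy)]
```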

  39. arXiv:1511.06440

    cs.LG

    Towards Principled Unsupervised Learning

    Authors: Ilya Sutskever, Rafal Jozefowicz, Karol Gregor, Danilo Rezende, Tim Lillicrap, Oriol Vinyals

    Abstract: General unsupervised learning is a long-standing conceptual problem in machine learning. Supervised learning is successful because it can be solved by the minimization of the training error cost function. Unsupervised learning is not as successful, because the unsupervised objective may be unrelated to the supervised task of interest. For an example, density modelling and reconstruction have often…

    Submitted 3 December, 2015; v1 submitted 19 November, 2015; originally announced November 2015.

  40. arXiv:1511.06392

    cs.LG cs.NE

    Neural Random-Access Machines

    Authors: Karol Kurach, Marcin Andrychowicz, Ilya Sutskever

    Abstract: In this paper, we propose and investigate a new neural network architecture called Neural Random Access Machine. It can manipulate and dereference pointers to an external variable-size random-access memory. The model is trained from pure input-output examples using backpropagation. We evaluate the new model on a number of simple algorithmic tasks whose solutions require pointer manipulation and…

    Submitted 9 February, 2016; v1 submitted 19 November, 2015; originally announced November 2015.

    Comments: ICLR submission, 17 pages, 9 figures, 6 tables (with bibliography and appendix)

  41. arXiv:1511.06114

    cs.LG cs.CL stat.ML

    Multi-task Sequence to Sequence Learning

    Authors: Minh-Thang Luong, Quoc V. Le, Ilya Sutskever, Oriol Vinyals, Lukasz Kaiser

    Abstract: Sequence to sequence learning has recently emerged as a new paradigm in supervised learning. To date, most of its applications focused on only one task and not much work explored this framework for multiple tasks. This paper examines three multi-task learning (MTL) settings for sequence to sequence models: (a) the one-to-many setting, where the encoder is shared between several tasks such as machi…

    Submitted 1 March, 2016; v1 submitted 19 November, 2015; originally announced November 2015.

    Comments: 10 pages, 4 figures, ICLR 2016 camera-ready, added parsing SOTA results

  42. arXiv:1511.05176  [pdf, other]

    cs.LG

    MuProp: Unbiased Backpropagation for Stochastic Neural Networks

    Authors: Shixiang Gu, Sergey Levine, Ilya Sutskever, Andriy Mnih

    Abstract: Deep neural networks are powerful parametric models that can be trained efficiently using the backpropagation algorithm. Stochastic neural networks combine the power of large parametric functions with that of graphical models, which makes it possible to learn very complex distributions. However, as backpropagation is not directly applicable to stochastic networks that include discrete sampling ope…

    Submitted 25 February, 2016; v1 submitted 16 November, 2015; originally announced November 2015.

    Comments: Published as a conference paper at ICLR 2016

  43. arXiv:1511.04868  [pdf, other]

    cs.LG cs.CL cs.NE

    A Neural Transducer

    Authors: Navdeep Jaitly, David Sussillo, Quoc V. Le, Oriol Vinyals, Ilya Sutskever, Samy Bengio

    Abstract: Sequence-to-sequence models have achieved impressive results on various tasks. However, they are unsuitable for tasks that require incremental predictions to be made as more data arrives or tasks that have long input sequences and output sequences. This is because they generate an output sequence conditioned on an entire input sequence. In this paper, we present a Neural Transducer that can make i…

    Submitted 4 August, 2016; v1 submitted 16 November, 2015; originally announced November 2015.

  44. arXiv:1511.04834  [pdf, other]

    cs.LG cs.CL stat.ML

    Neural Programmer: Inducing Latent Programs with Gradient Descent

    Authors: Arvind Neelakantan, Quoc V. Le, Ilya Sutskever

    Abstract: Deep neural networks have achieved impressive supervised classification performance in many tasks including image recognition, speech recognition, and sequence to sequence learning. However, this success has not been translated to applications like question answering that may involve complex arithmetic and logic reasoning. A major limitation of these models is their inability to learn even simp…

    Submitted 4 August, 2016; v1 submitted 16 November, 2015; originally announced November 2015.

    Comments: Accepted as a conference paper at ICLR 2016

  45. arXiv:1505.00521  [pdf, other]

    cs.LG

    Reinforcement Learning Neural Turing Machines - Revised

    Authors: Wojciech Zaremba, Ilya Sutskever

    Abstract: The Neural Turing Machine (NTM) is more expressive than all previously considered models because of its external memory. It can be viewed as a broader effort to use abstract external Interfaces and to learn a parametric model that interacts with them. The capabilities of a model can be extended by providing it with proper Interfaces that interact with the world. These external Interfaces include…

    Submitted 12 January, 2016; v1 submitted 4 May, 2015; originally announced May 2015.

  46. arXiv:1412.7449  [pdf, other]

    cs.CL cs.LG stat.ML

    Grammar as a Foreign Language

    Authors: Oriol Vinyals, Lukasz Kaiser, Terry Koo, Slav Petrov, Ilya Sutskever, Geoffrey Hinton

    Abstract: Syntactic constituency parsing is a fundamental problem in natural language processing and has been the subject of intensive research and engineering for decades. As a result, the most accurate parsers are domain specific, complex, and inefficient. In this paper we show that the domain-agnostic attention-enhanced sequence-to-sequence model achieves state-of-the-art results on the most widely used…

    Submitted 9 June, 2015; v1 submitted 23 December, 2014; originally announced December 2014.

  47. arXiv:1412.6564  [pdf, other]

    cs.LG cs.NE

    Move Evaluation in Go Using Deep Convolutional Neural Networks

    Authors: Chris J. Maddison, Aja Huang, Ilya Sutskever, David Silver

    Abstract: The game of Go is more challenging than other board games, due to the difficulty of constructing a position or move evaluation function. In this paper we investigate whether deep convolutional networks can be used to directly represent and learn this knowledge. We train a large 12-layer convolutional neural network by supervised learning from a database of human professional games. The network cor…

    Submitted 10 April, 2015; v1 submitted 19 December, 2014; originally announced December 2014.

    Comments: Minor edits and included captures in Figure 2

  48. arXiv:1410.8206  [pdf, ps, other]

    cs.CL cs.LG cs.NE

    Addressing the Rare Word Problem in Neural Machine Translation

    Authors: Minh-Thang Luong, Ilya Sutskever, Quoc V. Le, Oriol Vinyals, Wojciech Zaremba

    Abstract: Neural Machine Translation (NMT) is a new approach to machine translation that has shown promising results that are comparable to traditional approaches. A significant weakness in conventional NMT systems is their inability to correctly translate very rare words: end-to-end NMTs tend to have relatively small vocabularies with a single unk symbol that represents every possible out-of-vocabulary (OO…

    Submitted 30 May, 2015; v1 submitted 29 October, 2014; originally announced October 2014.

    Comments: ACL 2015 camera-ready version

  49. arXiv:1410.4615  [pdf, other]

    cs.NE cs.AI cs.LG

    Learning to Execute

    Authors: Wojciech Zaremba, Ilya Sutskever

    Abstract: Recurrent Neural Networks (RNNs) with Long Short-Term Memory units (LSTM) are widely used because they are expressive and are easy to train. Our interest lies in empirically evaluating the expressiveness and the learnability of LSTMs in the sequence-to-sequence regime by training them to evaluate short computer programs, a domain that has traditionally been seen as too complex for neural networks.…

    Submitted 19 February, 2015; v1 submitted 16 October, 2014; originally announced October 2014.

  50. arXiv:1409.3215  [pdf, ps, other]

    cs.CL cs.LG

    Sequence to Sequence Learning with Neural Networks

    Authors: Ilya Sutskever, Oriol Vinyals, Quoc V. Le

    Abstract: Deep Neural Networks (DNNs) are powerful models that have achieved excellent performance on difficult learning tasks. Although DNNs work well whenever large labeled training sets are available, they cannot be used to map sequences to sequences. In this paper, we present a general end-to-end approach to sequence learning that makes minimal assumptions on the sequence structure. Our method uses a mu…

    Submitted 14 December, 2014; v1 submitted 10 September, 2014; originally announced September 2014.

    Comments: 9 pages