Showing 1–22 of 22 results for author: Odena, A

Searching in archive cs.

  1. arXiv:2112.00114  [pdf, other]

    cs.LG cs.NE

    Show Your Work: Scratchpads for Intermediate Computation with Language Models

    Authors: Maxwell Nye, Anders Johan Andreassen, Guy Gur-Ari, Henryk Michalewski, Jacob Austin, David Bieber, David Dohan, Aitor Lewkowycz, Maarten Bosma, David Luan, Charles Sutton, Augustus Odena

    Abstract: Large pre-trained language models perform remarkably well on tasks that can be done "in one pass", such as generating realistic text or synthesizing computer programs. However, they struggle with tasks that require unbounded multi-step computation, such as adding integers or executing programs. Surprisingly, we find that these same models are able to perform complex multi-step computations -- even…

    Submitted 30 November, 2021; originally announced December 2021.

  2. arXiv:2108.07732  [pdf, other]

    cs.PL cs.LG

    Program Synthesis with Large Language Models

    Authors: Jacob Austin, Augustus Odena, Maxwell Nye, Maarten Bosma, Henryk Michalewski, David Dohan, Ellen Jiang, Carrie Cai, Michael Terry, Quoc Le, Charles Sutton

    Abstract: This paper explores the limits of the current generation of large language models for program synthesis in general purpose programming languages. We evaluate a collection of such models (with between 244M and 137B parameters) on two new benchmarks, MBPP and MathQA-Python, in both the few-shot and fine-tuning regimes. Our benchmarks are designed to measure the ability of these models to synthesize…

    Submitted 15 August, 2021; originally announced August 2021.

    Comments: Jacob and Augustus contributed equally

  3. arXiv:2010.11983  [pdf, other]

    quant-ph cs.CC cs.LG

    Learnability and Complexity of Quantum Samples

    Authors: Murphy Yuezhen Niu, Andrew M. Dai, Li Li, Augustus Odena, Zhengli Zhao, Vadim Smelyanskyi, Hartmut Neven, Sergio Boixo

    Abstract: Given a quantum circuit, a quantum computer can sample the output distribution exponentially faster in the number of bits than classical computers. A similar exponential separation has yet to be established in generative models through quantum sample learning: given samples from an n-qubit computation, can we learn the underlying quantum distribution using models with training parameters that scal…

    Submitted 22 October, 2020; originally announced October 2020.

  4. arXiv:2010.05315  [pdf, other]

    cs.LG

    SMYRF: Efficient Attention using Asymmetric Clustering

    Authors: Giannis Daras, Nikita Kitaev, Augustus Odena, Alexandros G. Dimakis

    Abstract: We propose a novel type of balanced clustering algorithm to approximate attention. Attention complexity is reduced from $O(N^2)$ to $O(N \log N)$, where $N$ is the sequence length. Our algorithm, SMYRF, uses Locality Sensitive Hashing (LSH) in a novel way by defining new Asymmetric transformations and an adaptive scheme that produces balanced clusters. The biggest advantage of SMYRF is that it can…

    Submitted 11 October, 2020; originally announced October 2020.

    Comments: 30 pages, 10 figures

  5. arXiv:2007.14381  [pdf, other]

    cs.PL cs.LG stat.ML

    BUSTLE: Bottom-Up Program Synthesis Through Learning-Guided Exploration

    Authors: Augustus Odena, Kensen Shi, David Bieber, Rishabh Singh, Charles Sutton, Hanjun Dai

    Abstract: Program synthesis is challenging largely because of the difficulty of search in a large space of programs. Human programmers routinely tackle the task of writing complex programs by writing sub-programs and then analyzing their intermediate results to compose them in appropriate ways. Motivated by this intuition, we present a new synthesis approach that leverages learning to guide a bottom-up sear…

    Submitted 30 September, 2021; v1 submitted 28 July, 2020; originally announced July 2020.

  6. arXiv:2002.09030  [pdf, other]

    cs.PL cs.LG

    Learning to Represent Programs with Property Signatures

    Authors: Augustus Odena, Charles Sutton

    Abstract: We introduce the notion of property signatures, a representation for programs and program specifications meant for consumption by machine learning algorithms. Given a function with input type $τ_{in}$ and output type $τ_{out}$, a property is a function of type: $(τ_{in}, τ_{out}) \rightarrow \texttt{Bool}$ that (informally) describes some simple property of the function under consideration. For in…

    Submitted 12 February, 2020; originally announced February 2020.

    Comments: ICLR 2020

  7. arXiv:2002.06224  [pdf, other]

    stat.ML cs.LG

    Top-k Training of GANs: Improving GAN Performance by Throwing Away Bad Samples

    Authors: Samarth Sinha, Zhengli Zhao, Anirudh Goyal, Colin Raffel, Augustus Odena

    Abstract: We introduce a simple (one line of code) modification to the Generative Adversarial Network (GAN) training algorithm that materially improves results with no increase in computational cost: When updating the generator parameters, we simply zero out the gradient contributions from the elements of the batch that the critic scores as `least realistic'. Through experiments on many different GAN varian…

    Submitted 22 October, 2020; v1 submitted 14 February, 2020; originally announced February 2020.

    Comments: NeurIPS 2020. Samarth Sinha and Zhengli Zhao contributed equally as joint first authors

  8. arXiv:2002.04724  [pdf, other]

    stat.ML cs.LG

    Improved Consistency Regularization for GANs

    Authors: Zhengli Zhao, Sameer Singh, Honglak Lee, Zizhao Zhang, Augustus Odena, Han Zhang

    Abstract: Recent work has increased the performance of Generative Adversarial Networks (GANs) by enforcing a consistency cost on the discriminator. We improve on this technique in several ways. We first show that consistency regularization can introduce artifacts into the GAN samples and explain how to fix this issue. We then propose several modifications to the consistency regularization procedure designed…

    Submitted 14 December, 2020; v1 submitted 11 February, 2020; originally announced February 2020.

    Comments: AAAI 2021

  9. arXiv:1911.12287  [pdf, other]

    cs.LG cs.CV stat.ML

    Your Local GAN: Designing Two Dimensional Local Attention Mechanisms for Generative Models

    Authors: Giannis Daras, Augustus Odena, Han Zhang, Alexandros G. Dimakis

    Abstract: We introduce a new local sparse attention layer that preserves two-dimensional geometry and locality. We show that by just replacing the dense attention layer of SAGAN with our construction, we obtain very significant FID, Inception score and pure visual improvements. FID score is improved from $18.65$ to $15.94$ on ImageNet, keeping all other parameters the same. The sparse attention patterns tha…

    Submitted 2 December, 2019; v1 submitted 27 November, 2019; originally announced November 2019.

    Comments: Added TFRC, tensorflow-gan acknowledgements. Changed "Ablation Study" to "Ablation Studies"

  10. arXiv:1910.13540  [pdf, ps, other]

    stat.ML cs.LG

    Small-GAN: Speeding Up GAN Training Using Core-sets

    Authors: Samarth Sinha, Han Zhang, Anirudh Goyal, Yoshua Bengio, Hugo Larochelle, Augustus Odena

    Abstract: Recent work by Brock et al. (2018) suggests that Generative Adversarial Networks (GANs) benefit disproportionately from large mini-batch sizes. Unfortunately, using large batches is slow and expensive on conventional hardware. Thus, it would be nice if we could generate batches that were effectively large though actually small. In this work, we propose a method to do this, inspired by the use of C…

    Submitted 29 October, 2019; originally announced October 2019.

  11. arXiv:1910.12027  [pdf, other]

    cs.LG cs.CV stat.ML

    Consistency Regularization for Generative Adversarial Networks

    Authors: Han Zhang, Zizhao Zhang, Augustus Odena, Honglak Lee

    Abstract: Generative Adversarial Networks (GANs) are known to be difficult to train, despite considerable research effort. Several regularization techniques for stabilizing training have been proposed, but they introduce non-trivial computational overheads and interact poorly with existing techniques like spectral normalization. In this work, we propose a simple, effective training stabilizer based on the n…

    Submitted 18 February, 2020; v1 submitted 26 October, 2019; originally announced October 2019.

    Comments: ICLR 2020

  12. arXiv:1910.01177  [pdf, other]

    stat.ML cs.LG

    Improving Differentially Private Models with Active Learning

    Authors: Zhengli Zhao, Nicolas Papernot, Sameer Singh, Neoklis Polyzotis, Augustus Odena

    Abstract: Broad adoption of machine learning techniques has increased privacy concerns for models trained on sensitive data such as medical records. Existing techniques for training differentially private (DP) models give rigorous privacy guarantees, but applying these techniques to neural networks can severely degrade model performance. This performance reduction is an obstacle to deploying private models…

    Submitted 2 October, 2019; originally announced October 2019.

  13. arXiv:1810.06758  [pdf, other]

    stat.ML cs.LG

    Discriminator Rejection Sampling

    Authors: Samaneh Azadi, Catherine Olsson, Trevor Darrell, Ian Goodfellow, Augustus Odena

    Abstract: We propose a rejection sampling scheme using the discriminator of a GAN to approximately correct errors in the GAN generator distribution. We show that under quite strict assumptions, this will allow us to recover the data distribution exactly. We then examine where those strict assumptions break down and design a practical algorithm - called Discriminator Rejection Sampling (DRS) - that can be us…

    Submitted 26 February, 2019; v1 submitted 15 October, 2018; originally announced October 2018.

    Comments: Published as a conference paper at ICLR 2019

  14. arXiv:1808.04888  [pdf, other]

    stat.ML cs.LG

    Skill Rating for Generative Models

    Authors: Catherine Olsson, Surya Bhupatiraju, Tom Brown, Augustus Odena, Ian Goodfellow

    Abstract: We explore a new way to evaluate generative models using insights from evaluation of competitive games between human players. We show experimentally that tournaments between generators and discriminators provide an effective way to evaluate generative models. We introduce two methods for summarizing tournament outcomes: tournament win rate and skill rating. Evaluations are useful in different cont…

    Submitted 14 August, 2018; originally announced August 2018.

  15. arXiv:1807.10875  [pdf, other]

    stat.ML cs.LG

    TensorFuzz: Debugging Neural Networks with Coverage-Guided Fuzzing

    Authors: Augustus Odena, Ian Goodfellow

    Abstract: Machine learning models are notoriously difficult to interpret and debug. This is particularly true of neural networks. In this work, we introduce automated software testing techniques for neural networks that are well-suited to discovering errors which occur only for rare inputs. Specifically, we develop coverage-guided fuzzing (CGF) methods for neural networks. In CGF, random mutations of inputs…

    Submitted 27 July, 2018; originally announced July 2018.

    Comments: Preprint - work in progress

  16. arXiv:1805.08318  [pdf, other]

    stat.ML cs.LG

    Self-Attention Generative Adversarial Networks

    Authors: Han Zhang, Ian Goodfellow, Dimitris Metaxas, Augustus Odena

    Abstract: In this paper, we propose the Self-Attention Generative Adversarial Network (SAGAN) which allows attention-driven, long-range dependency modeling for image generation tasks. Traditional convolutional GANs generate high-resolution details as a function of only spatially local points in lower-resolution feature maps. In SAGAN, details can be generated using cues from all feature locations. Moreover,…

    Submitted 14 June, 2019; v1 submitted 21 May, 2018; originally announced May 2018.

  17. arXiv:1804.09170  [pdf, other]

    cs.LG stat.ML

    Realistic Evaluation of Deep Semi-Supervised Learning Algorithms

    Authors: Avital Oliver, Augustus Odena, Colin Raffel, Ekin D. Cubuk, Ian J. Goodfellow

    Abstract: Semi-supervised learning (SSL) provides a powerful framework for leveraging unlabeled data when labels are limited or expensive to obtain. SSL algorithms based on deep neural networks have recently proven successful on standard benchmark tasks. However, we argue that these benchmarks fail to address many issues that these algorithms would face in real-world applications. After creating a unified r…

    Submitted 17 June, 2019; v1 submitted 24 April, 2018; originally announced April 2018.

    Journal ref: NeurIPS 2018 Proceedings

  18. arXiv:1802.08768  [pdf, other]

    stat.ML cs.LG

    Is Generator Conditioning Causally Related to GAN Performance?

    Authors: Augustus Odena, Jacob Buckman, Catherine Olsson, Tom B. Brown, Christopher Olah, Colin Raffel, Ian Goodfellow

    Abstract: Recent work (Pennington et al, 2017) suggests that controlling the entire distribution of Jacobian singular values is an important design consideration in deep learning. Motivated by this, we study the distribution of singular values of the Jacobian of the generator in Generative Adversarial Networks (GANs). We find that this Jacobian generally becomes ill-conditioned at the beginning of training.…

    Submitted 18 June, 2018; v1 submitted 23 February, 2018; originally announced February 2018.

  19. arXiv:1702.07780  [pdf, other]

    stat.ML cs.LG

    Changing Model Behavior at Test-Time Using Reinforcement Learning

    Authors: Augustus Odena, Dieterich Lawson, Christopher Olah

    Abstract: Machine learning models are often used at test-time subject to constraints and trade-offs not present at training-time. For example, a computer vision model operating on an embedded device may need to perform real-time inference, or a translation model operating on a cell phone may wish to bound its average compute time in order to be power-efficient. In this work we describe a mixture-of-experts…

    Submitted 24 February, 2017; originally announced February 2017.

    Comments: Submitted to ICLR 2017 Workshop Track

  20. arXiv:1610.09585  [pdf, other]

    stat.ML cs.CV

    Conditional Image Synthesis With Auxiliary Classifier GANs

    Authors: Augustus Odena, Christopher Olah, Jonathon Shlens

    Abstract: Synthesizing high resolution photorealistic images has been a long-standing challenge in machine learning. In this paper we introduce new methods for the improved training of generative adversarial networks (GANs) for image synthesis. We construct a variant of GANs employing label conditioning that results in 128x128 resolution image samples exhibiting global coherence. We expand on previous work…

    Submitted 20 July, 2017; v1 submitted 29 October, 2016; originally announced October 2016.

  21. arXiv:1606.01583  [pdf, other]

    stat.ML cs.LG

    Semi-Supervised Learning with Generative Adversarial Networks

    Authors: Augustus Odena

    Abstract: We extend Generative Adversarial Networks (GANs) to the semi-supervised context by forcing the discriminator network to output class labels. We train a generative model G and a discriminator D on a dataset with inputs belonging to one of N classes. At training time, D is made to predict which of N+1 classes the input belongs to, where an extra class is added to correspond to the outputs of G. We s…

    Submitted 21 October, 2016; v1 submitted 5 June, 2016; originally announced June 2016.

    Comments: Appearing in the Data Efficient Machine Learning workshop at ICML 2016

  22. arXiv:1601.04033  [pdf, other]

    stat.ML cs.LG

    Faster Asynchronous SGD

    Authors: Augustus Odena

    Abstract: Asynchronous distributed stochastic gradient descent methods have trouble converging because of stale gradients. A gradient update sent to a parameter server by a client is stale if the parameters used to calculate that gradient have since been updated on the server. Approaches have been proposed to circumvent this problem that quantify staleness in terms of the number of elapsed updates. In this…

    Submitted 15 January, 2016; originally announced January 2016.

    Comments: 10 pages