Showing 1–24 of 24 results for author: Radford, A

  1. arXiv:2406.04093  [pdf, other]

    cs.LG cs.AI

    Scaling and evaluating sparse autoencoders

    Authors: Leo Gao, Tom Dupré la Tour, Henk Tillman, Gabriel Goh, Rajan Troll, Alec Radford, Ilya Sutskever, Jan Leike, Jeffrey Wu

    Abstract: Sparse autoencoders provide a promising unsupervised approach for extracting interpretable features from a language model by reconstructing activations from a sparse bottleneck layer. Since language models learn many concepts, autoencoders need to be very large to recover all relevant features. However, studying the properties of autoencoder scaling is difficult due to the need to balance reconstr…

    Submitted 6 June, 2024; originally announced June 2024.

  2. arXiv:2308.01666  [pdf, other]

    cs.IR cs.CL

    Evaluating ChatGPT text-mining of clinical records for obesity monitoring

    Authors: Ivo S. Fins, Heather Davies, Sean Farrell, Jose R. Torres, Gina Pinchbeck, Alan D. Radford, Peter-John Noble

    Abstract: Background: Veterinary clinical narratives remain a largely untapped resource for addressing complex diseases. Here we compare the ability of a large language model (ChatGPT) and a previously developed regular expression (RegexT) to identify overweight body condition scores (BCS) in veterinary narratives. Methods: BCS values were extracted from 4,415 anonymised clinical narratives using either Reg…

    Submitted 3 August, 2023; originally announced August 2023.

    Comments: Supplementary Material: The data that support the findings of this study are available in the ancillary files of this submission. 5 pages, 2 figures (textboxes)

  3. arXiv:2303.08774  [pdf, other]

    cs.CL cs.AI

    GPT-4 Technical Report

    Authors: OpenAI, Josh Achiam, Steven Adler, Sandhini Agarwal, Lama Ahmad, Ilge Akkaya, Florencia Leoni Aleman, Diogo Almeida, Janko Altenschmidt, Sam Altman, Shyamal Anadkat, Red Avila, Igor Babuschkin, Suchir Balaji, Valerie Balcom, Paul Baltescu, Haiming Bao, Mohammad Bavarian, Jeff Belgum, Irwan Bello, Jake Berdine, Gabriel Bernadett-Shapiro, Christopher Berner, Lenny Bogdonoff, Oleg Boiko, et al. (256 additional authors not shown)

    Abstract: We report the development of GPT-4, a large-scale, multimodal model which can accept image and text inputs and produce text outputs. While less capable than humans in many real-world scenarios, GPT-4 exhibits human-level performance on various professional and academic benchmarks, including passing a simulated bar exam with a score around the top 10% of test takers. GPT-4 is a Transformer-based mo…

    Submitted 4 March, 2024; v1 submitted 15 March, 2023; originally announced March 2023.

    Comments: 100 pages; updated authors list; fixed author names and added citation

  4. arXiv:2212.04356  [pdf, other]

    eess.AS cs.CL cs.LG cs.SD

    Robust Speech Recognition via Large-Scale Weak Supervision

    Authors: Alec Radford, Jong Wook Kim, Tao Xu, Greg Brockman, Christine McLeavey, Ilya Sutskever

    Abstract: We study the capabilities of speech processing systems trained simply to predict large amounts of transcripts of audio on the internet. When scaled to 680,000 hours of multilingual and multitask supervision, the resulting models generalize well to standard benchmarks and are often competitive with prior fully supervised results but in a zero-shot transfer setting without the need for any fine-tuni…

    Submitted 6 December, 2022; originally announced December 2022.

  5. arXiv:2211.02364  [pdf, other]

    stat.AP cs.HC

    Visualising spatio-temporal health data: the importance of capturing the 4th dimension

    Authors: Alison C. Hale, Charlotte Appleton, P. -J. M. Noble, Gina L. Pinchbeck, Barry Rowlingson, Peter J. Diggle, Alan D. Radford, Christopher P. Jewell

    Abstract: Confronted by a rapidly evolving health threat, such as an infectious disease outbreak, it is essential that decision-makers are able to comprehend the complex dynamics not just in space but also in the 4th dimension, time. In this paper this is addressed by a novel visualisation tool, referred to as the Dynamic Health Atlas web app, which is designed specifically for displaying the spatial evolut…

    Submitted 4 November, 2022; originally announced November 2022.

    Comments: 4 Figures, 27 pages

  6. arXiv:2201.10005  [pdf, other]

    cs.CL cs.LG

    Text and Code Embeddings by Contrastive Pre-Training

    Authors: Arvind Neelakantan, Tao Xu, Raul Puri, Alec Radford, Jesse Michael Han, Jerry Tworek, Qiming Yuan, Nikolas Tezak, Jong Wook Kim, Chris Hallacy, Johannes Heidecke, Pranav Shyam, Boris Power, Tyna Eloundou Nekoul, Girish Sastry, Gretchen Krueger, David Schnurr, Felipe Petroski Such, Kenny Hsu, Madeleine Thompson, Tabarak Khan, Toki Sherbakov, Joanne Jang, Peter Welinder, Lilian Weng

    Abstract: Text embeddings are useful features in many applications such as semantic search and computing text similarity. Previous work typically trains models customized for different use cases, varying in dataset choice, training objective and model architecture. In this work, we show that contrastive pre-training on unsupervised data at scale leads to high quality vector representations of text and code.…

    Submitted 24 January, 2022; originally announced January 2022.

  7. arXiv:2110.05448  [pdf, other]

    cs.CL cs.AI

    Unsupervised Neural Machine Translation with Generative Language Models Only

    Authors: Jesse Michael Han, Igor Babuschkin, Harrison Edwards, Arvind Neelakantan, Tao Xu, Stanislas Polu, Alex Ray, Pranav Shyam, Aditya Ramesh, Alec Radford, Ilya Sutskever

    Abstract: We show how to derive state-of-the-art unsupervised neural machine translation systems from generatively pre-trained language models. Our method consists of three steps: few-shot amplification, distillation, and backtranslation. We first use the zero-shot translation ability of large pre-trained language models to generate translations for a small set of unlabeled sentences. We then amplify these…

    Submitted 11 October, 2021; originally announced October 2021.

    Comments: 10 pages

  8. arXiv:2108.02818  [pdf, other]

    cs.CV cs.AI cs.CY

    Evaluating CLIP: Towards Characterization of Broader Capabilities and Downstream Implications

    Authors: Sandhini Agarwal, Gretchen Krueger, Jack Clark, Alec Radford, Jong Wook Kim, Miles Brundage

    Abstract: Recently, there have been breakthroughs in computer vision ("CV") models that are more generalizable with the advent of models such as CLIP and ALIGN. In this paper, we analyze CLIP and highlight some of the challenges such models pose. CLIP reduces the need for task specific training data, potentially opening up many niche tasks to automation. CLIP also allows its users to flexibly specify image…

    Submitted 5 August, 2021; originally announced August 2021.

    Comments: arXiv admin note: substantial text overlap with arXiv:2103.00020

  9. arXiv:2107.03374  [pdf, other]

    cs.LG

    Evaluating Large Language Models Trained on Code

    Authors: Mark Chen, Jerry Tworek, Heewoo Jun, Qiming Yuan, Henrique Ponde de Oliveira Pinto, Jared Kaplan, Harri Edwards, Yuri Burda, Nicholas Joseph, Greg Brockman, Alex Ray, Raul Puri, Gretchen Krueger, Michael Petrov, Heidy Khlaaf, Girish Sastry, Pamela Mishkin, Brooke Chan, Scott Gray, Nick Ryder, Mikhail Pavlov, Alethea Power, Lukasz Kaiser, Mohammad Bavarian, Clemens Winter, et al. (33 additional authors not shown)

    Abstract: We introduce Codex, a GPT language model fine-tuned on publicly available code from GitHub, and study its Python code-writing capabilities. A distinct production version of Codex powers GitHub Copilot. On HumanEval, a new evaluation set we release to measure functional correctness for synthesizing programs from docstrings, our model solves 28.8% of the problems, while GPT-3 solves 0% and GPT-J sol…

    Submitted 14 July, 2021; v1 submitted 7 July, 2021; originally announced July 2021.

    Comments: corrected typos, added references, added authors, added acknowledgements

  10. arXiv:2103.00020  [pdf, other]

    cs.CV cs.LG

    Learning Transferable Visual Models From Natural Language Supervision

    Authors: Alec Radford, Jong Wook Kim, Chris Hallacy, Aditya Ramesh, Gabriel Goh, Sandhini Agarwal, Girish Sastry, Amanda Askell, Pamela Mishkin, Jack Clark, Gretchen Krueger, Ilya Sutskever

    Abstract: State-of-the-art computer vision systems are trained to predict a fixed set of predetermined object categories. This restricted form of supervision limits their generality and usability since additional labeled data is needed to specify any other visual concept. Learning directly from raw text about images is a promising alternative which leverages a much broader source of supervision. We demonstr…

    Submitted 26 February, 2021; originally announced March 2021.

  11. arXiv:2102.12092  [pdf, other]

    cs.CV cs.LG

    Zero-Shot Text-to-Image Generation

    Authors: Aditya Ramesh, Mikhail Pavlov, Gabriel Goh, Scott Gray, Chelsea Voss, Alec Radford, Mark Chen, Ilya Sutskever

    Abstract: Text-to-image generation has traditionally focused on finding better modeling assumptions for training on a fixed dataset. These assumptions might involve complex architectures, auxiliary losses, or side information such as object part labels or segmentation masks supplied during training. We describe a simple approach for this task based on a transformer that autoregressively models the text and…

    Submitted 26 February, 2021; v1 submitted 24 February, 2021; originally announced February 2021.

  12. arXiv:2010.14701  [pdf, other]

    cs.LG cs.CL cs.CV

    Scaling Laws for Autoregressive Generative Modeling

    Authors: Tom Henighan, Jared Kaplan, Mor Katz, Mark Chen, Christopher Hesse, Jacob Jackson, Heewoo Jun, Tom B. Brown, Prafulla Dhariwal, Scott Gray, Chris Hallacy, Benjamin Mann, Alec Radford, Aditya Ramesh, Nick Ryder, Daniel M. Ziegler, John Schulman, Dario Amodei, Sam McCandlish

    Abstract: We identify empirical scaling laws for the cross-entropy loss in four domains: generative image modeling, video modeling, multimodal image$\leftrightarrow$text models, and mathematical problem solving. In all cases autoregressive Transformers smoothly improve in performance as model size and compute budgets increase, following a power-law plus constant scaling law. The optimal model size also depe…

    Submitted 5 November, 2020; v1 submitted 27 October, 2020; originally announced October 2020.

    Comments: 20+17 pages, 33 figures; added appendix with additional language results

  13. arXiv:2009.01325  [pdf, other]

    cs.CL cs.AI cs.LG

    Learning to summarize from human feedback

    Authors: Nisan Stiennon, Long Ouyang, Jeff Wu, Daniel M. Ziegler, Ryan Lowe, Chelsea Voss, Alec Radford, Dario Amodei, Paul Christiano

    Abstract: As language models become more powerful, training and evaluation are increasingly bottlenecked by the data and metrics used for a particular task. For example, summarization models are often trained to predict human reference summaries and evaluated using ROUGE, but both of these metrics are rough proxies for what we really care about -- summary quality. In this work, we show that it is possible t…

    Submitted 15 February, 2022; v1 submitted 2 September, 2020; originally announced September 2020.

    Comments: NeurIPS 2020

  14. arXiv:2005.14165  [pdf, other]

    cs.CL

    Language Models are Few-Shot Learners

    Authors: Tom B. Brown, Benjamin Mann, Nick Ryder, Melanie Subbiah, Jared Kaplan, Prafulla Dhariwal, Arvind Neelakantan, Pranav Shyam, Girish Sastry, Amanda Askell, Sandhini Agarwal, Ariel Herbert-Voss, Gretchen Krueger, Tom Henighan, Rewon Child, Aditya Ramesh, Daniel M. Ziegler, Jeffrey Wu, Clemens Winter, Christopher Hesse, Mark Chen, Eric Sigler, Mateusz Litwin, Scott Gray, Benjamin Chess, et al. (6 additional authors not shown)

    Abstract: Recent work has demonstrated substantial gains on many NLP tasks and benchmarks by pre-training on a large corpus of text followed by fine-tuning on a specific task. While typically task-agnostic in architecture, this method still requires task-specific fine-tuning datasets of thousands or tens of thousands of examples. By contrast, humans can generally perform a new language task from only a few…

    Submitted 22 July, 2020; v1 submitted 28 May, 2020; originally announced May 2020.

    Comments: 40+32 pages

  15. arXiv:2005.00341  [pdf, other]

    eess.AS cs.LG cs.SD stat.ML

    Jukebox: A Generative Model for Music

    Authors: Prafulla Dhariwal, Heewoo Jun, Christine Payne, Jong Wook Kim, Alec Radford, Ilya Sutskever

    Abstract: We introduce Jukebox, a model that generates music with singing in the raw audio domain. We tackle the long context of raw audio using a multi-scale VQ-VAE to compress it to discrete codes, and modeling those using autoregressive Transformers. We show that the combined model at scale can generate high-fidelity and diverse songs with coherence up to multiple minutes. We can condition on artist and…

    Submitted 30 April, 2020; originally announced May 2020.

  16. arXiv:2001.08361  [pdf, other]

    cs.LG stat.ML

    Scaling Laws for Neural Language Models

    Authors: Jared Kaplan, Sam McCandlish, Tom Henighan, Tom B. Brown, Benjamin Chess, Rewon Child, Scott Gray, Alec Radford, Jeffrey Wu, Dario Amodei

    Abstract: We study empirical scaling laws for language model performance on the cross-entropy loss. The loss scales as a power-law with model size, dataset size, and the amount of compute used for training, with some trends spanning more than seven orders of magnitude. Other architectural details such as network width or depth have minimal effects within a wide range. Simple equations govern the dependence…

    Submitted 22 January, 2020; originally announced January 2020.

    Comments: 19 pages, 15 figures

  17. arXiv:1909.08593  [pdf, other]

    cs.CL cs.LG stat.ML

    Fine-Tuning Language Models from Human Preferences

    Authors: Daniel M. Ziegler, Nisan Stiennon, Jeffrey Wu, Tom B. Brown, Alec Radford, Dario Amodei, Paul Christiano, Geoffrey Irving

    Abstract: Reward learning enables the application of reinforcement learning (RL) to tasks where reward is defined by human judgment, building a model of reward by asking humans questions. Most work on reward learning has used simulated environments, but complex information about values is often expressed in natural language, and we believe reward learning for language is a key to making RL practical and saf…

    Submitted 8 January, 2020; v1 submitted 18 September, 2019; originally announced September 2019.

  18. arXiv:1908.09203  [pdf]

    cs.CL cs.AI cs.CY

    Release Strategies and the Social Impacts of Language Models

    Authors: Irene Solaiman, Miles Brundage, Jack Clark, Amanda Askell, Ariel Herbert-Voss, Jeff Wu, Alec Radford, Gretchen Krueger, Jong Wook Kim, Sarah Kreps, Miles McCain, Alex Newhouse, Jason Blazakis, Kris McGuffie, Jasmine Wang

    Abstract: Large language models have a range of beneficial uses: they can assist in prose, poetry, and programming; analyze dataset biases; and more. However, their flexibility and generative capabilities also raise misuse concerns. This report discusses OpenAI's work related to the release of its GPT-2 language model. It discusses staged release, which allows time between model releases to conduct risk and…

    Submitted 12 November, 2019; v1 submitted 24 August, 2019; originally announced August 2019.

    Comments: 71 pages, report

    ACM Class: I.2; I.2.7; K.4

  19. arXiv:1904.10509  [pdf, other]

    cs.LG stat.ML

    Generating Long Sequences with Sparse Transformers

    Authors: Rewon Child, Scott Gray, Alec Radford, Ilya Sutskever

    Abstract: Transformers are powerful sequence models, but require time and memory that grows quadratically with the sequence length. In this paper we introduce sparse factorizations of the attention matrix which reduce this to $O(n \sqrt{n})$. We also introduce a) a variation on architecture and initialization to train deeper networks, b) the recomputation of attention matrices to save memory, and c) fast at…

    Submitted 23 April, 2019; originally announced April 2019.

  20. arXiv:1803.05573  [pdf, other]

    cs.LG stat.ML

    Improving GANs Using Optimal Transport

    Authors: Tim Salimans, Han Zhang, Alec Radford, Dimitris Metaxas

    Abstract: We present Optimal Transport GAN (OT-GAN), a variant of generative adversarial nets minimizing a new metric measuring the distance between the generator distribution and the data distribution. This metric, which we call mini-batch energy distance, combines optimal transport in primal form with an energy distance defined in an adversarially learned feature space, resulting in a highly discriminativ…

    Submitted 14 March, 2018; originally announced March 2018.

  21. arXiv:1707.06347  [pdf, other]

    cs.LG

    Proximal Policy Optimization Algorithms

    Authors: John Schulman, Filip Wolski, Prafulla Dhariwal, Alec Radford, Oleg Klimov

    Abstract: We propose a new family of policy gradient methods for reinforcement learning, which alternate between sampling data through interaction with the environment, and optimizing a "surrogate" objective function using stochastic gradient ascent. Whereas standard policy gradient methods perform one gradient update per data sample, we propose a novel objective function that enables multiple epochs of min…

    Submitted 28 August, 2017; v1 submitted 19 July, 2017; originally announced July 2017.

  22. arXiv:1704.01444  [pdf, other]

    cs.LG cs.CL cs.NE

    Learning to Generate Reviews and Discovering Sentiment

    Authors: Alec Radford, Rafal Jozefowicz, Ilya Sutskever

    Abstract: We explore the properties of byte-level recurrent language models. When given sufficient amounts of capacity, training data, and compute time, the representations learned by these models include disentangled features corresponding to high-level concepts. Specifically, we find a single unit which performs sentiment analysis. These representations, learned in an unsupervised manner, achieve state of…

    Submitted 6 April, 2017; v1 submitted 5 April, 2017; originally announced April 2017.

  23. arXiv:1606.03498  [pdf, other]

    cs.LG cs.CV cs.NE

    Improved Techniques for Training GANs

    Authors: Tim Salimans, Ian Goodfellow, Wojciech Zaremba, Vicki Cheung, Alec Radford, Xi Chen

    Abstract: We present a variety of new architectural features and training procedures that we apply to the generative adversarial networks (GANs) framework. We focus on two applications of GANs: semi-supervised learning, and the generation of images that humans find visually realistic. Unlike most work on generative models, our primary goal is not to train a model that assigns high likelihood to test data, n…

    Submitted 10 June, 2016; originally announced June 2016.

  24. arXiv:1511.06434  [pdf, other]

    cs.LG cs.CV

    Unsupervised Representation Learning with Deep Convolutional Generative Adversarial Networks

    Authors: Alec Radford, Luke Metz, Soumith Chintala

    Abstract: In recent years, supervised learning with convolutional networks (CNNs) has seen huge adoption in computer vision applications. Comparatively, unsupervised learning with CNNs has received less attention. In this work we hope to help bridge the gap between the success of CNNs for supervised learning and unsupervised learning. We introduce a class of CNNs called deep convolutional generative adversa…

    Submitted 7 January, 2016; v1 submitted 19 November, 2015; originally announced November 2015.

    Comments: Under review as a conference paper at ICLR 2016