Skip to main content

Showing 1–13 of 13 results for author: Bosma, M

.
  1. arXiv:2206.07682  [pdf, other

    cs.CL

    Emergent Abilities of Large Language Models

    Authors: Jason Wei, Yi Tay, Rishi Bommasani, Colin Raffel, Barret Zoph, Sebastian Borgeaud, Dani Yogatama, Maarten Bosma, Denny Zhou, Donald Metzler, Ed H. Chi, Tatsunori Hashimoto, Oriol Vinyals, Percy Liang, Jeff Dean, William Fedus

    Abstract: Scaling up language models has been shown to predictably improve performance and sample efficiency on a wide range of downstream tasks. This paper instead discusses an unpredictable phenomenon that we refer to as emergent abilities of large language models. We consider an ability to be emergent if it is not present in smaller models but is present in larger models. Thus, emergent abilities cannot… ▽ More

    Submitted 26 October, 2022; v1 submitted 15 June, 2022; originally announced June 2022.

    Comments: Transactions on Machine Learning Research (TMLR), 2022

  2. arXiv:2206.04615  [pdf, other

    cs.CL cs.AI cs.CY cs.LG stat.ML

    Beyond the Imitation Game: Quantifying and extrapolating the capabilities of language models

    Authors: Aarohi Srivastava, Abhinav Rastogi, Abhishek Rao, Abu Awal Md Shoeb, Abubakar Abid, Adam Fisch, Adam R. Brown, Adam Santoro, Aditya Gupta, Adrià Garriga-Alonso, Agnieszka Kluska, Aitor Lewkowycz, Akshat Agarwal, Alethea Power, Alex Ray, Alex Warstadt, Alexander W. Kocurek, Ali Safaya, Ali Tazarv, Alice Xiang, Alicia Parrish, Allen Nie, Aman Hussain, Amanda Askell, Amanda Dsouza , et al. (426 additional authors not shown)

    Abstract: Language models demonstrate both quantitative improvement and new qualitative capabilities with increasing scale. Despite their potentially transformative impact, these new capabilities are as yet poorly characterized. In order to inform future research, prepare for disruptive new model capabilities, and ameliorate socially harmful effects, it is vital that we understand the present and near-futur… ▽ More

    Submitted 12 June, 2023; v1 submitted 9 June, 2022; originally announced June 2022.

    Comments: 27 pages, 17 figures + references and appendices, repo: https://github.com/google/BIG-bench

    Journal ref: Transactions on Machine Learning Research, May/2022, https://openreview.net/forum?id=uyTL5Bvosj

  3. arXiv:2204.02311  [pdf, other

    cs.CL

    PaLM: Scaling Language Modeling with Pathways

    Authors: Aakanksha Chowdhery, Sharan Narang, Jacob Devlin, Maarten Bosma, Gaurav Mishra, Adam Roberts, Paul Barham, Hyung Won Chung, Charles Sutton, Sebastian Gehrmann, Parker Schuh, Kensen Shi, Sasha Tsvyashchenko, Joshua Maynez, Abhishek Rao, Parker Barnes, Yi Tay, Noam Shazeer, Vinodkumar Prabhakaran, Emily Reif, Nan Du, Ben Hutchinson, Reiner Pope, James Bradbury, Jacob Austin , et al. (42 additional authors not shown)

    Abstract: Large language models have been shown to achieve remarkable performance across a variety of natural language tasks using few-shot learning, which drastically reduces the number of task-specific training examples needed to adapt the model to a particular application. To further our understanding of the impact of scale on few-shot learning, we trained a 540-billion parameter, densely activated, Tran… ▽ More

    Submitted 5 October, 2022; v1 submitted 5 April, 2022; originally announced April 2022.

  4. arXiv:2203.17189  [pdf, other

    cs.LG cs.CL

    Scaling Up Models and Data with $\texttt{t5x}$ and $\texttt{seqio}$

    Authors: Adam Roberts, Hyung Won Chung, Anselm Levskaya, Gaurav Mishra, James Bradbury, Daniel Andor, Sharan Narang, Brian Lester, Colin Gaffney, Afroz Mohiuddin, Curtis Hawthorne, Aitor Lewkowycz, Alex Salcianu, Marc van Zee, Jacob Austin, Sebastian Goodman, Livio Baldini Soares, Haitang Hu, Sasha Tsvyashchenko, Aakanksha Chowdhery, Jasmijn Bastings, Jannis Bulian, Xavier Garcia, Jianmo Ni, Andrew Chen , et al. (18 additional authors not shown)

    Abstract: Recent neural network-based language models have benefited greatly from scaling up the size of training datasets and the number of parameters in the models themselves. Scaling can be complicated due to various factors including the need to distribute computation on supercomputer clusters (e.g., TPUs), prevent bottlenecks when infeeding data, and ensure reproducible results. In this work, we presen… ▽ More

    Submitted 31 March, 2022; originally announced March 2022.

  5. arXiv:2202.11401  [pdf, other

    eess.IV cs.CV cs.NE

    Mixed-Block Neural Architecture Search for Medical Image Segmentation

    Authors: Martijn M. A. Bosma, Arkadiy Dushatskiy, Monika Grewal, Tanja Alderliesten, Peter A. N. Bosman

    Abstract: Deep Neural Networks (DNNs) have the potential for making various clinical procedures more time-efficient by automating medical image segmentation. Due to their strong, in some cases human-level, performance, they have become the standard approach in this field. The design of the best possible medical image segmentation DNNs, however, is task-specific. Neural Architecture Search (NAS), i.e., the a… ▽ More

    Submitted 23 February, 2022; originally announced February 2022.

  6. arXiv:2201.11903  [pdf, other

    cs.CL cs.AI

    Chain-of-Thought Prompting Elicits Reasoning in Large Language Models

    Authors: Jason Wei, Xuezhi Wang, Dale Schuurmans, Maarten Bosma, Brian Ichter, Fei Xia, Ed Chi, Quoc Le, Denny Zhou

    Abstract: We explore how generating a chain of thought -- a series of intermediate reasoning steps -- significantly improves the ability of large language models to perform complex reasoning. In particular, we show how such reasoning abilities emerge naturally in sufficiently large language models via a simple method called chain of thought prompting, where a few chain of thought demonstrations are provided… ▽ More

    Submitted 10 January, 2023; v1 submitted 27 January, 2022; originally announced January 2022.

  7. arXiv:2201.08239  [pdf, other

    cs.CL cs.AI

    LaMDA: Language Models for Dialog Applications

    Authors: Romal Thoppilan, Daniel De Freitas, Jamie Hall, Noam Shazeer, Apoorv Kulshreshtha, Heng-Tze Cheng, Alicia **, Taylor Bos, Leslie Baker, Yu Du, YaGuang Li, Hongrae Lee, Huaixiu Steven Zheng, Amin Ghafouri, Marcelo Menegali, Yan** Huang, Maxim Krikun, Dmitry Lepikhin, James Qin, Dehao Chen, Yuanzhong Xu, Zhifeng Chen, Adam Roberts, Maarten Bosma, Vincent Zhao , et al. (35 additional authors not shown)

    Abstract: We present LaMDA: Language Models for Dialog Applications. LaMDA is a family of Transformer-based neural language models specialized for dialog, which have up to 137B parameters and are pre-trained on 1.56T words of public dialog data and web text. While model scaling alone can improve quality, it shows less improvements on safety and factual grounding. We demonstrate that fine-tuning with annotat… ▽ More

    Submitted 10 February, 2022; v1 submitted 20 January, 2022; originally announced January 2022.

  8. arXiv:2112.06905  [pdf, other

    cs.CL

    GLaM: Efficient Scaling of Language Models with Mixture-of-Experts

    Authors: Nan Du, Yan** Huang, Andrew M. Dai, Simon Tong, Dmitry Lepikhin, Yuanzhong Xu, Maxim Krikun, Yanqi Zhou, Adams Wei Yu, Orhan Firat, Barret Zoph, Liam Fedus, Maarten Bosma, Zongwei Zhou, Tao Wang, Yu Emma Wang, Kellie Webster, Marie Pellat, Kevin Robinson, Kathleen Meier-Hellstern, Toju Duke, Lucas Dixon, Kun Zhang, Quoc V Le, Yonghui Wu , et al. (2 additional authors not shown)

    Abstract: Scaling language models with more data, compute and parameters has driven significant progress in natural language processing. For example, thanks to scaling, GPT-3 was able to achieve strong results on in-context learning tasks. However, training these large dense models requires significant amounts of computing resources. In this paper, we propose and develop a family of language models named GL… ▽ More

    Submitted 1 August, 2022; v1 submitted 13 December, 2021; originally announced December 2021.

    Comments: Accepted to ICML 2022

  9. arXiv:2112.00114  [pdf, other

    cs.LG cs.NE

    Show Your Work: Scratchpads for Intermediate Computation with Language Models

    Authors: Maxwell Nye, Anders Johan Andreassen, Guy Gur-Ari, Henryk Michalewski, Jacob Austin, David Bieber, David Dohan, Aitor Lewkowycz, Maarten Bosma, David Luan, Charles Sutton, Augustus Odena

    Abstract: Large pre-trained language models perform remarkably well on tasks that can be done "in one pass", such as generating realistic text or synthesizing computer programs. However, they struggle with tasks that require unbounded multi-step computation, such as adding integers or executing programs. Surprisingly, we find that these same models are able to perform complex multi-step computations -- even… ▽ More

    Submitted 30 November, 2021; originally announced December 2021.

  10. arXiv:2109.01652  [pdf, other

    cs.CL

    Finetuned Language Models Are Zero-Shot Learners

    Authors: Jason Wei, Maarten Bosma, Vincent Y. Zhao, Kelvin Guu, Adams Wei Yu, Brian Lester, Nan Du, Andrew M. Dai, Quoc V. Le

    Abstract: This paper explores a simple method for improving the zero-shot learning abilities of language models. We show that instruction tuning -- finetuning language models on a collection of tasks described via instructions -- substantially improves zero-shot performance on unseen tasks. We take a 137B parameter pretrained language model and instruction-tune it on over 60 NLP tasks verbalized via natur… ▽ More

    Submitted 8 February, 2022; v1 submitted 3 September, 2021; originally announced September 2021.

    Comments: Version 5. Find list of changes in Appendix F (page 35)

  11. arXiv:2108.07732  [pdf, other

    cs.PL cs.LG

    Program Synthesis with Large Language Models

    Authors: Jacob Austin, Augustus Odena, Maxwell Nye, Maarten Bosma, Henryk Michalewski, David Dohan, Ellen Jiang, Carrie Cai, Michael Terry, Quoc Le, Charles Sutton

    Abstract: This paper explores the limits of the current generation of large language models for program synthesis in general purpose programming languages. We evaluate a collection of such models (with between 244M and 137B parameters) on two new benchmarks, MBPP and MathQA-Python, in both the few-shot and fine-tuning regimes. Our benchmarks are designed to measure the ability of these models to synthesize… ▽ More

    Submitted 15 August, 2021; originally announced August 2021.

    Comments: Jacob and Augustus contributed equally

  12. arXiv:2106.03317  [pdf, other

    math.NA physics.comp-ph

    Smooth Implicit Hybrid Upwinding for Compositional Multiphase Flow in Porous Media

    Authors: Sebastian B. M. Bosma, Francois P. Hamon, Brad T. Mallison, Hamdi A. Tchelepi

    Abstract: In subsurface multiphase flow simulations, poor nonlinear solver performance is a significant runtime sink. The system of fully implicit mass balance equations is highly nonlinear and often difficult to solve for the nonlinear solver, generally Newton(-Raphson). Strong nonlinearities can cause Newton iterations to converge very slowly. This frequently results in time step cuts, leading to computat… ▽ More

    Submitted 6 June, 2021; originally announced June 2021.

  13. Enhanced multiscale restriction-smoothed basis (MsRSB) preconditioning with applications to porous media flow and geomechanics

    Authors: Sebastian B. M. Bosma, Sergey Klevtsov, Olav Møyner, Nicola Castelletto

    Abstract: A novel method to enable application of the Multiscale Restricted Smoothed Basis (MsRSB) method to non M-matrices is presented. The original MsRSB method is enhanced with a filtering strategy enforcing M-matrix properties to enable the robust application of MsRSB as a preconditioner. Through applications to porous media flow and linear elastic geomechanics, the method is proven to be effective for… ▽ More

    Submitted 27 July, 2020; v1 submitted 14 December, 2019; originally announced December 2019.