Showing 1–14 of 14 results for author: So, R

Searching in archive cs.
  1. arXiv:2406.04175  [pdf, other]

    cs.CL cs.AI

    Confabulation: The Surprising Value of Large Language Model Hallucinations

    Authors: Peiqi Sui, Eamon Duede, Sophie Wu, Richard Jean So

    Abstract: This paper presents a systematic defense of large language model (LLM) hallucinations or 'confabulations' as a potential resource instead of a categorically negative pitfall. The standard view is that confabulations are inherently problematic and AI research should eliminate this flaw. In this paper, we argue and empirically demonstrate that measurable semantic characteristics of LLM confabulation…

    Submitted 25 June, 2024; v1 submitted 6 June, 2024; originally announced June 2024.

    Comments: Forthcoming at the ACL 2024 main conference. 1 figure

  2. arXiv:2305.10403  [pdf, other]

    cs.CL cs.AI

    PaLM 2 Technical Report

    Authors: Rohan Anil, Andrew M. Dai, Orhan Firat, Melvin Johnson, Dmitry Lepikhin, Alexandre Passos, Siamak Shakeri, Emanuel Taropa, Paige Bailey, Zhifeng Chen, Eric Chu, Jonathan H. Clark, Laurent El Shafey, Yanping Huang, Kathy Meier-Hellstern, Gaurav Mishra, Erica Moreira, Mark Omernick, Kevin Robinson, Sebastian Ruder, Yi Tay, Kefan Xiao, Yuanzhong Xu, Yujing Zhang, Gustavo Hernandez Abrego, et al. (103 additional authors not shown)

    Abstract: We introduce PaLM 2, a new state-of-the-art language model that has better multilingual and reasoning capabilities and is more compute-efficient than its predecessor PaLM. PaLM 2 is a Transformer-based model trained using a mixture of objectives. Through extensive evaluations on English and multilingual language, and reasoning tasks, we demonstrate that PaLM 2 has significantly improved quality on…

    Submitted 13 September, 2023; v1 submitted 17 May, 2023; originally announced May 2023.

  3. arXiv:2302.14838  [pdf, other]

    cs.NE cs.AI cs.CL cs.LG

    EvoPrompting: Language Models for Code-Level Neural Architecture Search

    Authors: Angelica Chen, David M. Dohan, David R. So

    Abstract: Given the recent impressive accomplishments of language models (LMs) for code generation, we explore the use of LMs as adaptive mutation and crossover operators for an evolutionary neural architecture search (NAS) algorithm. While NAS still proves too difficult a task for LMs to succeed at solely through prompting, we find that the combination of evolutionary prompt engineering with soft prompt-tu…

    Submitted 16 November, 2023; v1 submitted 28 February, 2023; originally announced February 2023.

    Comments: NeurIPS 2023

  4. arXiv:2302.05433  [pdf, other]

    cs.LG cs.NE

    Unified Functional Hashing in Automatic Machine Learning

    Authors: Ryan Gillard, Stephen Jonany, Yingjie Miao, Michael Munn, Connal de Souza, Jonathan Dungay, Chen Liang, David R. So, Quoc V. Le, Esteban Real

    Abstract: The field of Automatic Machine Learning (AutoML) has recently attained impressive results, including the discovery of state-of-the-art machine learning solutions, such as neural image classifiers. This is often done by applying an evolutionary search method, which samples multiple candidate solutions from a large space and evaluates the quality of each candidate through a long training process. As…

    Submitted 10 February, 2023; originally announced February 2023.

    ACM Class: I.2.2; I.2.6

  5. arXiv:2210.11399  [pdf, other]

    cs.CL cs.AI cs.LG

    Transcending Scaling Laws with 0.1% Extra Compute

    Authors: Yi Tay, Jason Wei, Hyung Won Chung, Vinh Q. Tran, David R. So, Siamak Shakeri, Xavier Garcia, Huaixiu Steven Zheng, Jinfeng Rao, Aakanksha Chowdhery, Denny Zhou, Donald Metzler, Slav Petrov, Neil Houlsby, Quoc V. Le, Mostafa Dehghani

    Abstract: Scaling language models improves performance but comes with significant computational costs. This paper proposes UL2R, a method that substantially improves existing language models and their scaling curves with a relatively tiny amount of extra compute. The key idea is to continue training a state-of-the-art large language model (e.g., PaLM) on a few more steps with UL2's mixture-of-denoiser objec…

    Submitted 16 November, 2022; v1 submitted 20 October, 2022; originally announced October 2022.

    Comments: V2 has updated references/related work

  6. arXiv:2109.08668  [pdf, other]

    cs.LG cs.AI cs.CL cs.NE

    Primer: Searching for Efficient Transformers for Language Modeling

    Authors: David R. So, Wojciech Mańke, Hanxiao Liu, Zihang Dai, Noam Shazeer, Quoc V. Le

    Abstract: Large Transformer models have been central to recent advances in natural language processing. The training and inference costs of these models, however, have grown rapidly and become prohibitively expensive. Here we aim to reduce the costs of Transformers by searching for a more efficient variant. Compared to previous approaches, our search is performed at a lower level, over the primitives that d…

    Submitted 24 January, 2022; v1 submitted 17 September, 2021; originally announced September 2021.

    Comments: "Primer: Searching for Efficient Transformers for Language Modeling" NeurIPS camera ready. 34 pages

  7. arXiv:2105.08050  [pdf, other]

    cs.LG cs.CL cs.CV

    Pay Attention to MLPs

    Authors: Hanxiao Liu, Zihang Dai, David R. So, Quoc V. Le

    Abstract: Transformers have become one of the most important architectural innovations in deep learning and have enabled many breakthroughs over the past few years. Here we propose a simple network architecture, gMLP, based on MLPs with gating, and show that it can perform as well as Transformers in key language and vision applications. Our comparisons show that self-attention is not critical for Vision Tra…

    Submitted 1 June, 2021; v1 submitted 17 May, 2021; originally announced May 2021.

  8. arXiv:2102.02340  [pdf, other]

    cs.LG cs.AI cs.CL

    MUFASA: Multimodal Fusion Architecture Search for Electronic Health Records

    Authors: Zhen Xu, David R. So, Andrew M. Dai

    Abstract: One important challenge of applying deep learning to electronic health records (EHR) is the complexity of their multimodal structure. EHR usually contains a mixture of structured (codes) and unstructured (free-text) data with sparse and irregular longitudinal features, all of which doctors utilize when making decisions. In the deep learning regime, determining how different modality representati…

    Submitted 5 October, 2021; v1 submitted 3 February, 2021; originally announced February 2021.

    Comments: Accepted for publication at the Thirty-Fifth AAAI Conference on Artificial Intelligence (AAAI-21)

  9. arXiv:2003.03384  [pdf, other]

    cs.LG cs.NE stat.ML

    AutoML-Zero: Evolving Machine Learning Algorithms From Scratch

    Authors: Esteban Real, Chen Liang, David R. So, Quoc V. Le

    Abstract: Machine learning research has advanced in multiple aspects, including model structures and learning methods. The effort to automate such research, known as AutoML, has also made significant progress. However, this progress has largely focused on the architecture of neural networks, where it has relied on sophisticated expert-designed layers as building blocks, or similarly restrictive search spac…

    Submitted 30 June, 2020; v1 submitted 6 March, 2020; originally announced March 2020.

    Comments: Accepted for publication at the 37th International Conference on Machine Learning (ICML 2020). Near camera-ready version

    ACM Class: I.2.2; I.2.6

  10. arXiv:2001.09977  [pdf, other]

    cs.CL cs.LG cs.NE stat.ML

    Towards a Human-like Open-Domain Chatbot

    Authors: Daniel Adiwardana, Minh-Thang Luong, David R. So, Jamie Hall, Noah Fiedel, Romal Thoppilan, Zi Yang, Apoorv Kulshreshtha, Gaurav Nemade, Yifeng Lu, Quoc V. Le

    Abstract: We present Meena, a multi-turn open-domain chatbot trained end-to-end on data mined and filtered from public domain social media conversations. This 2.6B parameter neural network is simply trained to minimize perplexity of the next token. We also propose a human evaluation metric called Sensibleness and Specificity Average (SSA), which captures key elements of a human-like multi-turn conversation.…

    Submitted 27 February, 2020; v1 submitted 27 January, 2020; originally announced January 2020.

    Comments: 38 pages, 12 figures

  11. arXiv:1901.11117  [pdf, other]

    cs.LG cs.CL cs.NE stat.ML

    The Evolved Transformer

    Authors: David R. So, Chen Liang, Quoc V. Le

    Abstract: Recent works have highlighted the strength of the Transformer architecture on sequence tasks while, at the same time, neural architecture search (NAS) has begun to outperform human-designed models. Our goal is to apply NAS to search for a better alternative to the Transformer. We first construct a large search space inspired by the recent advances in feed-forward sequence models and then run evolu…

    Submitted 17 May, 2019; v1 submitted 30 January, 2019; originally announced January 2019.

    Comments: ICML version with SOTA results

  12. arXiv:1812.03991  [pdf, ps, other]

    cs.ET cs.AR

    Real-time Closed Loop Neural Decoding on a Neuromorphic Chip

    Authors: Shoeb Shaikh, Rosa So, Tafadzwa Sibindi, Camilo Libedinsky, Arindam Basu

    Abstract: This paper presents for the first time a real-time closed loop neuromorphic decoder chip-driven intra-cortical brain machine interface (iBMI) in a non-human primate (NHP) based experimental setup. Decoded results show trial success rates and mean times to target comparable to those obtained by hand-controlled joystick. Neural control trial success rates of approximately 96% of those obtained by ha…

    Submitted 10 December, 2018; originally announced December 2018.

    Comments: accepted at Neural Engineering Conference (NER), 2019

  13. arXiv:1803.10342  [pdf, other]

    q-bio.BM cs.LG stat.ML

    Classification of crystallization outcomes using deep convolutional neural networks

    Authors: Andrew E. Bruno, Patrick Charbonneau, Janet Newman, Edward H. Snell, David R. So, Vincent Vanhoucke, Christopher J. Watkins, Shawn Williams, Julie Wilson

    Abstract: The Machine Recognition of Crystallization Outcomes (MARCO) initiative has assembled roughly half a million annotated images of macromolecular crystallization experiments from various sources and setups. Here, state-of-the-art machine learning algorithms are trained and tested on different parts of this data set. We find that more than 94% of the test images can be correctly labeled, irrespective…

    Submitted 25 May, 2018; v1 submitted 27 March, 2018; originally announced March 2018.

    Comments: 11 pages, 4 figures, minor text and figure updates

  14. arXiv:1508.04562  [pdf, other]

    cs.CL cs.IR

    Fast, Flexible Models for Discovering Topic Correlation across Weakly-Related Collections

    Authors: Jingwei Zhang, Aaron Gerow, Jaan Altosaar, James Evans, Richard Jean So

    Abstract: Weak topic correlation across document collections with different numbers of topics in individual collections presents challenges for existing cross-collection topic models. This paper introduces two probabilistic topic models, Correlated LDA (C-LDA) and Correlated HDP (C-HDP). These address problems that can arise when analyzing large, asymmetric, and potentially weakly-related collections. Topic…

    Submitted 19 August, 2015; originally announced August 2015.

    Comments: EMNLP 2015