Showing 1–14 of 14 results for author: Snell, C

  1. arXiv:2311.18232  [pdf, other]

    cs.CL cs.AI cs.LG

    LMRL Gym: Benchmarks for Multi-Turn Reinforcement Learning with Language Models

    Authors: Marwa Abdulhai, Isadora White, Charlie Snell, Charles Sun, Joey Hong, Yuexiang Zhai, Kelvin Xu, Sergey Levine

    Abstract: Large language models (LLMs) provide excellent text-generation capabilities, but standard prompting and generation methods generally do not lead to intentional or goal-directed agents and might necessitate considerable prompt tuning. This becomes particularly apparent in multi-turn conversations: even the best current LLMs rarely ask clarifying questions, engage in explicit information gathering,…

    Submitted 29 November, 2023; originally announced November 2023.

  2. arXiv:2311.14601  [pdf, other]

    cs.LG cs.NE stat.ML

    A Metalearned Neural Circuit for Nonparametric Bayesian Inference

    Authors: Jake C. Snell, Gianluca Bencomo, Thomas L. Griffiths

    Abstract: Most applications of machine learning to classification assume a closed set of balanced classes. This is at odds with the real world, where class occurrence statistics often follow a long-tailed power-law distribution and it is unlikely that all classes are seen in a single sample. Nonparametric Bayesian models naturally capture this phenomenon, but have significant practical barriers to widesprea…

    Submitted 24 November, 2023; originally announced November 2023.

    Comments: 13 pages, 3 figures. Code available at https://github.com/jakesnell/neural-circuits

  3. arXiv:2311.13628  [pdf, other]

    cs.LG cs.AI cs.CL

    Prompt Risk Control: A Rigorous Framework for Responsible Deployment of Large Language Models

    Authors: Thomas P. Zollo, Todd Morrill, Zhun Deng, Jake C. Snell, Toniann Pitassi, Richard Zemel

    Abstract: The recent explosion in the capabilities of large language models has led to a wave of interest in how best to prompt a model to perform a given task. While it may be tempting to simply choose a prompt based on average performance on a validation set, this can lead to a deployment where unexpectedly poor responses are generated, especially for the worst-off users. To mitigate this prospect, we pro…

    Submitted 27 March, 2024; v1 submitted 22 November, 2023; originally announced November 2023.

    Comments: 34 pages, 10 figures, published as conference paper at ICLR 2024, and accepted to the Socially Responsible Language Modelling Research (SoLaR) workshop at NeurIPS 2023

  4. arXiv:2311.10580  [pdf, other]

    cs.LG eess.SY stat.ML

    Implicit Maximum a Posteriori Filtering via Adaptive Optimization

    Authors: Gianluca M. Bencomo, Jake C. Snell, Thomas L. Griffiths

    Abstract: Bayesian filtering approximates the true underlying behavior of a time-varying system by inverting an explicit generative model to convert noisy measurements into state estimates. This process typically requires either storage, inversion, and multiplication of large matrices or Monte Carlo estimation, neither of which are practical in high-dimensional state spaces such as the weight spaces of arti…

    Submitted 17 November, 2023; originally announced November 2023.

    Comments: Under review at ICLR 2024

  5. arXiv:2309.13786  [pdf, other]

    cs.LG stat.ML

    Distribution-Free Statistical Dispersion Control for Societal Applications

    Authors: Zhun Deng, Thomas P. Zollo, Jake C. Snell, Toniann Pitassi, Richard Zemel

    Abstract: Explicit finite-sample statistical guarantees on model performance are an important ingredient in responsible machine learning. Previous work has focused mainly on bounding either the expected loss of a predictor or the probability that an individual prediction will incur a loss value in a specified range. However, for many high-stakes applications, it is crucial to understand and control the disp…

    Submitted 6 March, 2024; v1 submitted 24 September, 2023; originally announced September 2023.

    Comments: Accepted by NeurIPS as spotlight (top 3% among submissions)

  6. arXiv:2305.17262  [pdf, other]

    cs.CV cs.AI

    Im-Promptu: In-Context Composition from Image Prompts

    Authors: Bhishma Dedhia, Michael Chang, Jake C. Snell, Thomas L. Griffiths, Niraj K. Jha

    Abstract: Large language models are few-shot learners that can solve diverse tasks from a handful of demonstrations. This implicit understanding of tasks suggests that the attention mechanisms over word tokens may play a role in analogical reasoning. In this work, we investigate whether analogical reasoning can enable in-context composition over composable elements of visual stimuli. First, we introduce a s…

    Submitted 22 October, 2023; v1 submitted 26 May, 2023; originally announced May 2023.

  7. arXiv:2305.15717  [pdf, other]

    cs.CL

    The False Promise of Imitating Proprietary LLMs

    Authors: Arnav Gudibande, Eric Wallace, Charlie Snell, Xinyang Geng, Hao Liu, Pieter Abbeel, Sergey Levine, Dawn Song

    Abstract: An emerging method to cheaply improve a weaker language model is to finetune it on outputs from a stronger model, such as a proprietary system like ChatGPT (e.g., Alpaca, Self-Instruct, and others). This approach looks to cheaply imitate the proprietary model's capabilities using a weaker open-source model. In this work, we critically analyze this approach. We first finetune a series of LMs that i…

    Submitted 25 May, 2023; originally announced May 2023.

  8. arXiv:2212.13629  [pdf, other]

    cs.LG stat.ML

    Quantile Risk Control: A Flexible Framework for Bounding the Probability of High-Loss Predictions

    Authors: Jake C. Snell, Thomas P. Zollo, Zhun Deng, Toniann Pitassi, Richard Zemel

    Abstract: Rigorous guarantees about the performance of predictive algorithms are necessary in order to ensure their responsible use. Previous work has largely focused on bounding the expected loss of a predictor, but this is not sufficient in many risk-sensitive applications where the distribution of errors is important. In this work, we propose a flexible framework to produce a family of bounds on quantile…

    Submitted 27 December, 2022; originally announced December 2022.

    Comments: 24 pages, 4 figures. Code is available at https://github.com/jakesnell/quantile-risk-control

  9. arXiv:2209.15189  [pdf, other]

    cs.CL cs.AI

    Learning by Distilling Context

    Authors: Charlie Snell, Dan Klein, Ruiqi Zhong

    Abstract: Language models significantly benefit from context tokens, such as prompts or scratchpads. They perform better when prompted with informative instructions, and they acquire new reasoning capabilities by generating a scratch-pad before predicting the final answers. However, they do not internalize these performance gains, which disappear when the context tokens are gone. Our work proposes…

    Submitted 29 September, 2022; originally announced September 2022.

  10. arXiv:2206.11871  [pdf, other]

    cs.CL cs.LG

    Offline RL for Natural Language Generation with Implicit Language Q Learning

    Authors: Charlie Snell, Ilya Kostrikov, Yi Su, Mengjiao Yang, Sergey Levine

    Abstract: Large language models distill broad knowledge from text corpora. However, they can be inconsistent when it comes to completing user-specified tasks. This issue can be addressed by finetuning such models via supervised learning on curated datasets, or via reinforcement learning. In this work, we propose a novel offline RL method, implicit language Q-learning (ILQL), designed for use on language mod…

    Submitted 1 May, 2023; v1 submitted 5 June, 2022; originally announced June 2022.

  11. arXiv:2205.12422  [pdf, other]

    cs.CL cs.AI cs.PL

    Non-Programmers Can Label Programs Indirectly via Active Examples: A Case Study with Text-to-SQL

    Authors: Ruiqi Zhong, Charlie Snell, Dan Klein, Jason Eisner

    Abstract: Can non-programmers annotate natural language utterances with complex programs that represent their meaning? We introduce APEL, a framework in which non-programmers select among candidate programs generated by a seed semantic parser (e.g., Codex). Since they cannot understand the candidate programs, we ask them to select indirectly by examining the programs' input-output examples. For each utteranc…

    Submitted 23 October, 2023; v1 submitted 24 May, 2022; originally announced May 2022.

  12. arXiv:2204.10198  [pdf, other]

    cs.CL cs.AI

    Context-Aware Language Modeling for Goal-Oriented Dialogue Systems

    Authors: Charlie Snell, Mengjiao Yang, Justin Fu, Yi Su, Sergey Levine

    Abstract: Goal-oriented dialogue systems face a trade-off between fluent language generation and task-specific control. While supervised learning with large language models is capable of producing realistic text, how to steer such responses towards completing a specific task without sacrificing language quality remains an open question. In this work, we formulate goal-oriented dialogue as a partially observ…

    Submitted 21 April, 2022; v1 submitted 18 April, 2022; originally announced April 2022.

  13. arXiv:2201.12323  [pdf, other]

    cs.CL cs.AI cs.LG

    Describing Differences between Text Distributions with Natural Language

    Authors: Ruiqi Zhong, Charlie Snell, Dan Klein, Jacob Steinhardt

    Abstract: How do two distributions of texts differ? Humans are slow at answering this, since discovering patterns might require tediously reading through hundreds of samples. We propose to automatically summarize the differences by "learning a natural language hypothesis": given two distributions $D_{0}$ and $D_{1}$, we search for a description that is more often true for $D_{1}$, e.g., "is military-related…

    Submitted 18 May, 2022; v1 submitted 28 January, 2022; originally announced January 2022.

    Comments: International Conference on Machine Learning, 2022

  14. arXiv:2103.07601  [pdf, other]

    cs.CL cs.AI

    Approximating How Single Head Attention Learns

    Authors: Charlie Snell, Ruiqi Zhong, Dan Klein, Jacob Steinhardt

    Abstract: Why do models often attend to salient words, and how does this evolve throughout training? We approximate model training as a two-stage process: early on in training when the attention weights are uniform, the model learns to translate individual input word `i` to `o` if they co-occur frequently. Later, the model learns to attend to `i` while the correct output is `o` because it knows `i` translat…

    Submitted 20 October, 2021; v1 submitted 12 March, 2021; originally announced March 2021.