Showing 1–26 of 26 results for author: Tamkin, A

  1. arXiv:2406.10162  [pdf, other]

    cs.AI cs.CL

    Sycophancy to Subterfuge: Investigating Reward-Tampering in Large Language Models

    Authors: Carson Denison, Monte MacDiarmid, Fazl Barez, David Duvenaud, Shauna Kravec, Samuel Marks, Nicholas Schiefer, Ryan Soklaski, Alex Tamkin, Jared Kaplan, Buck Shlegeris, Samuel R. Bowman, Ethan Perez, Evan Hubinger

    Abstract: In reinforcement learning, specification gaming occurs when AI systems learn undesired behaviors that are highly rewarded due to misspecified training goals. Specification gaming can range from simple behaviors like sycophancy to sophisticated and pernicious behaviors like reward-tampering, where a model directly modifies its own reward mechanism. However, these more pernicious behaviors may be to…

    Submitted 28 June, 2024; v1 submitted 14 June, 2024; originally announced June 2024.

    Comments: Make it easier to find samples from the model, and highlight that our operational definition of reward tampering has false positives where the model attempts to complete the task honestly but edits the reward. Add paragraph to conclusion to this effect, and add sentence to figure 1 to this effect

  2. arXiv:2406.07814  [pdf, other]

    cs.AI cs.CL cs.HC

    Collective Constitutional AI: Aligning a Language Model with Public Input

    Authors: Saffron Huang, Divya Siddarth, Liane Lovitt, Thomas I. Liao, Esin Durmus, Alex Tamkin, Deep Ganguli

    Abstract: There is growing consensus that language model (LM) developers should not be the sole deciders of LM behavior, creating a need for methods that enable the broader public to collectively shape the behavior of LM systems that affect them. To address this need, we present Collective Constitutional AI (CCAI): a multi-stage process for sourcing and integrating public input into LMs-from identifying a t…

    Submitted 11 June, 2024; originally announced June 2024.

    ACM Class: I.2.7; K.4.2

    Journal ref: Proceedings of the 2024 ACM Conference on Fairness, Accountability, and Transparency. 1395-1417

  3. arXiv:2403.05534  [pdf, other]

    cs.CL

    Bayesian Preference Elicitation with Language Models

    Authors: Kunal Handa, Yarin Gal, Ellie Pavlick, Noah Goodman, Jacob Andreas, Alex Tamkin, Belinda Z. Li

    Abstract: Aligning AI systems to users' interests requires understanding and incorporating humans' complex values and preferences. Recently, language models (LMs) have been used to gather information about the preferences of human users. This preference data can be used to fine-tune or guide other LMs and/or AI systems. However, LMs have been shown to struggle with crucial aspects of preference learning: qu…

    Submitted 8 March, 2024; originally announced March 2024.

  4. arXiv:2312.03689  [pdf, other]

    cs.CL

    Evaluating and Mitigating Discrimination in Language Model Decisions

    Authors: Alex Tamkin, Amanda Askell, Liane Lovitt, Esin Durmus, Nicholas Joseph, Shauna Kravec, Karina Nguyen, Jared Kaplan, Deep Ganguli

    Abstract: As language models (LMs) advance, interest is growing in applying them to high-stakes societal decisions, such as determining financing or housing eligibility. However, their potential for discrimination in such contexts raises ethical concerns, motivating the need for better methods to evaluate these risks. We present a method for proactively evaluating the potential discriminatory impact of LMs…

    Submitted 6 December, 2023; originally announced December 2023.

  5. arXiv:2310.17769  [pdf, other]

    cs.CL cs.AI

    Social Contract AI: Aligning AI Assistants with Implicit Group Norms

    Authors: Jan-Philipp Fränken, Sam Kwok, Peixuan Ye, Kanishk Gandhi, Dilip Arumugam, Jared Moore, Alex Tamkin, Tobias Gerstenberg, Noah D. Goodman

    Abstract: We explore the idea of aligning an AI assistant by inverting a model of users' (unknown) preferences from observed interactions. To validate our proposal, we run proof-of-concept simulations in the economic ultimatum game, formalizing user preferences as policies that guide the actions of simulated players. We find that the AI assistant accurately aligns its behavior to match standard policies fro…

    Submitted 3 December, 2023; v1 submitted 26 October, 2023; originally announced October 2023.

    Comments: SoLaR NeurIPS 2023 Workshop (https://solar-neurips.github.io/)

  6. arXiv:2310.17230  [pdf, other]

    cs.LG cs.CL

    Codebook Features: Sparse and Discrete Interpretability for Neural Networks

    Authors: Alex Tamkin, Mohammad Taufeeque, Noah D. Goodman

    Abstract: Understanding neural networks is challenging in part because of the dense, continuous nature of their hidden states. We explore whether we can train neural networks to have hidden states that are sparse, discrete, and more interpretable by quantizing their continuous features into what we call codebook features. Codebook features are produced by finetuning neural networks with vector quantization…

    Submitted 26 October, 2023; originally announced October 2023.

  7. arXiv:2310.11589  [pdf, other]

    cs.CL cs.AI cs.LG

    Eliciting Human Preferences with Language Models

    Authors: Belinda Z. Li, Alex Tamkin, Noah Goodman, Jacob Andreas

    Abstract: Language models (LMs) can be directed to perform target tasks by using labeled examples or natural language prompts. But selecting examples or writing prompts can be challenging--especially in tasks that involve unusual edge cases, demand precise articulation of nebulous preferences, or require an accurate mental model of LM behavior. We propose to use *LMs themselves* to guide the task specif…

    Submitted 17 October, 2023; originally announced October 2023.

    Comments: 26 pages, 15 figures

  8. arXiv:2309.13457  [pdf, other]

    cs.LG cs.CV physics.comp-ph physics.flu-dyn

    Turbulence in Focus: Benchmarking Scaling Behavior of 3D Volumetric Super-Resolution with BLASTNet 2.0 Data

    Authors: Wai Tong Chung, Bassem Akoush, Pushan Sharma, Alex Tamkin, Ki Sung Jung, Jacqueline H. Chen, Jack Guo, Davy Brouzet, Mohsen Talei, Bruno Savard, Alexei Y. Poludnenko, Matthias Ihme

    Abstract: Analysis of compressible turbulent flows is essential for applications related to propulsion, energy generation, and the environment. Here, we present BLASTNet 2.0, a 2.2 TB network-of-datasets containing 744 full-domain samples from 34 high-fidelity direct numerical simulations, which addresses the current limited availability of 3D high-fidelity reacting and non-reacting compressible turbulent f…

    Submitted 27 October, 2023; v1 submitted 23 September, 2023; originally announced September 2023.

    Comments: Accepted in Adv. in Neural Information Processing Systems 36 (NeurIPS 2023). Link: https://nips.cc/virtual/2023/poster/73433 . 55 pages, 21 figures. Keywords: Super-resolution, 3D, Neural Scaling, Physics-informed Loss, Computational Fluid Dynamics, Partial Differential Equations, Turbulent Reacting Flows, Direct Numerical Simulation, Fluid Mechanics, Combustion, Computer Vision

  9. arXiv:2308.03296  [pdf, other]

    cs.LG cs.CL stat.ML

    Studying Large Language Model Generalization with Influence Functions

    Authors: Roger Grosse, Juhan Bae, Cem Anil, Nelson Elhage, Alex Tamkin, Amirhossein Tajdini, Benoit Steiner, Dustin Li, Esin Durmus, Ethan Perez, Evan Hubinger, Kamilė Lukošiūtė, Karina Nguyen, Nicholas Joseph, Sam McCandlish, Jared Kaplan, Samuel R. Bowman

    Abstract: When trying to gain better visibility into a machine learning model in order to understand and mitigate the associated risks, a potentially valuable source of evidence is: which training examples most contribute to a given behavior? Influence functions aim to answer a counterfactual: how would the model's parameters (and hence its outputs) change if a given sequence were added to the training set?…

    Submitted 7 August, 2023; originally announced August 2023.

    Comments: 119 pages, 47 figures, 22 tables

  10. arXiv:2306.16388  [pdf, other]

    cs.CL cs.AI

    Towards Measuring the Representation of Subjective Global Opinions in Language Models

    Authors: Esin Durmus, Karina Nguyen, Thomas I. Liao, Nicholas Schiefer, Amanda Askell, Anton Bakhtin, Carol Chen, Zac Hatfield-Dodds, Danny Hernandez, Nicholas Joseph, Liane Lovitt, Sam McCandlish, Orowa Sikder, Alex Tamkin, Janel Thamkul, Jared Kaplan, Jack Clark, Deep Ganguli

    Abstract: Large language models (LLMs) may not equitably represent diverse global perspectives on societal issues. In this paper, we develop a quantitative framework to evaluate whose opinions model-generated responses are more similar to. We first build a dataset, GlobalOpinionQA, comprised of questions and answers from cross-national surveys designed to capture diverse opinions on global issues across dif…

    Submitted 11 April, 2024; v1 submitted 28 June, 2023; originally announced June 2023.

  11. arXiv:2306.02889  [pdf]

    cs.CY cs.AI

    Operationalising the Definition of General Purpose AI Systems: Assessing Four Approaches

    Authors: Risto Uuk, Carlos Ignacio Gutierrez, Alex Tamkin

    Abstract: The European Union's Artificial Intelligence (AI) Act is set to be a landmark legal instrument for regulating AI technology. While stakeholders have primarily focused on the governance of fixed purpose AI applications (also known as narrow AI), more attention is required to understand the nature of highly and broadly capable systems. As of the beginning of 2023, several definitions for General Pur…

    Submitted 5 June, 2023; originally announced June 2023.

  12. arXiv:2304.08486  [pdf, other]

    cs.CV

    BenchMD: A Benchmark for Unified Learning on Medical Images and Sensors

    Authors: Kathryn Wantlin, Chenwei Wu, Shih-Cheng Huang, Oishi Banerjee, Farah Dadabhoy, Veeral Vipin Mehta, Ryan Wonhee Han, Fang Cao, Raja R. Narayan, Errol Colak, Adewole Adamson, Laura Heacock, Geoffrey H. Tison, Alex Tamkin, Pranav Rajpurkar

    Abstract: Medical data poses a daunting challenge for AI algorithms: it exists in many different modalities, experiences frequent distribution shifts, and suffers from a scarcity of examples and labels. Recent advances, including transformers and self-supervised learning, promise a more universal approach that can be applied flexibly across these diverse conditions. To measure and drive progress in this dir…

    Submitted 26 June, 2023; v1 submitted 17 April, 2023; originally announced April 2023.

  13. arXiv:2302.05757  [pdf, other]

    cs.CV cs.AI

    Multispectral Contrastive Learning with Viewmaker Networks

    Authors: Jasmine Bayrooti, Noah Goodman, Alex Tamkin

    Abstract: Contrastive learning methods have been applied to a range of domains and modalities by training models to identify similar "views" of data points. However, specialized scientific modalities pose a challenge for this paradigm, as identifying good views for each scientific instrument is complex and time-intensive. In this paper, we focus on applying contrastive learning approaches to a variety of re…

    Submitted 3 June, 2023; v1 submitted 11 February, 2023; originally announced February 2023.

    Comments: Appearing in CVPR-PBVS 2023

  14. arXiv:2212.10711  [pdf, other]

    cs.CL cs.LG

    Task Ambiguity in Humans and Language Models

    Authors: Alex Tamkin, Kunal Handa, Avash Shrestha, Noah Goodman

    Abstract: Language models have recently achieved strong performance across a wide range of NLP benchmarks. However, unlike benchmarks, real world tasks are often poorly specified, and agents must deduce the user's intended behavior from a combination of context, instructions, and examples. We investigate how both humans and models behave in the face of such task ambiguity by proposing AmbiBench, a new bench…

    Submitted 20 December, 2022; originally announced December 2022.

  15. arXiv:2212.08378  [pdf, other]

    cs.LG cs.CV cs.SD eess.AS

    Feature Dropout: Revisiting the Role of Augmentations in Contrastive Learning

    Authors: Alex Tamkin, Margalit Glasgow, Xiluo He, Noah Goodman

    Abstract: What role do augmentations play in contrastive learning? Recent work suggests that good augmentations are label-preserving with respect to a specific downstream task. We complicate this picture by showing that label-destroying augmentations can be useful in the foundation model setting, where the goal is to learn diverse, general-purpose representations for multiple downstream tasks. We perform co…

    Submitted 16 December, 2022; originally announced December 2022.

  16. arXiv:2204.08491  [pdf, other]

    cs.LG cs.CL cs.CV

    Active Learning Helps Pretrained Models Learn the Intended Task

    Authors: Alex Tamkin, Dat Nguyen, Salil Deshpande, Jesse Mu, Noah Goodman

    Abstract: Models can fail in unpredictable ways during deployment due to task ambiguity, when multiple behaviors are consistent with the provided training data. An example is an object classifier trained on red squares and blue circles: when encountering blue squares, the intended behavior is undefined. We investigate whether pretrained models are better active learners, capable of disambiguating between th…

    Submitted 18 April, 2022; originally announced April 2022.

  17. arXiv:2202.12312  [pdf, other]

    cs.CL

    Oolong: Investigating What Makes Transfer Learning Hard with Controlled Studies

    Authors: Zhengxuan Wu, Alex Tamkin, Isabel Papadimitriou

    Abstract: When we transfer a pretrained language model to a new language, there are many axes of variation that change at once. To disentangle the impact of different factors like syntactic similarity and vocabulary similarity, we propose a set of controlled transfer studies: we systematically transform the language of the GLUE benchmark, altering one axis of crosslingual variation at a time, and then measu…

    Submitted 23 January, 2024; v1 submitted 24 February, 2022; originally announced February 2022.

    Comments: EMNLP 2023

  18. arXiv:2112.05340  [pdf, other]

    cs.CV cs.LG

    Tradeoffs Between Contrastive and Supervised Learning: An Empirical Study

    Authors: Ananya Karthik, Mike Wu, Noah Goodman, Alex Tamkin

    Abstract: Contrastive learning has made considerable progress in computer vision, outperforming supervised pretraining on a range of downstream datasets. However, is contrastive learning the better choice in all situations? We demonstrate two cases where it is not. First, under sufficiently small pretraining budgets, supervised pretraining on ImageNet consistently outperforms a comparable contrastive model…

    Submitted 10 December, 2021; originally announced December 2021.

    Comments: NeurIPS 2021 Workshop: Self-Supervised Learning - Theory and Practice

  19. arXiv:2111.12062  [pdf, other]

    cs.LG cs.CL cs.CV

    DABS: A Domain-Agnostic Benchmark for Self-Supervised Learning

    Authors: Alex Tamkin, Vincent Liu, Rongfei Lu, Daniel Fein, Colin Schultz, Noah Goodman

    Abstract: Self-supervised learning algorithms, including BERT and SimCLR, have enabled significant strides in fields like natural language processing, computer vision, and speech processing. However, these algorithms are domain-specific, meaning that new self-supervised learning algorithms must be developed for each new setting, including myriad healthcare, scientific, and multimodal domains. To catalyze pr…

    Submitted 5 January, 2023; v1 submitted 23 November, 2021; originally announced November 2021.

    Comments: NeurIPS 2021

  20. arXiv:2108.10307  [pdf, other]

    cs.LG

    C5T5: Controllable Generation of Organic Molecules with Transformers

    Authors: Daniel Rothchild, Alex Tamkin, Julie Yu, Ujval Misra, Joseph Gonzalez

    Abstract: Methods for designing organic materials with desired properties have high potential impact across fields such as medicine, renewable energy, petrochemical engineering, and agriculture. However, using generative modeling to design substances with desired properties is difficult because candidate compounds must satisfy multiple constraints, including synthetic accessibility and other metrics that ar…

    Submitted 23 August, 2021; originally announced August 2021.

  21. arXiv:2108.07258  [pdf, other]

    cs.LG cs.AI cs.CY

    On the Opportunities and Risks of Foundation Models

    Authors: Rishi Bommasani, Drew A. Hudson, Ehsan Adeli, Russ Altman, Simran Arora, Sydney von Arx, Michael S. Bernstein, Jeannette Bohg, Antoine Bosselut, Emma Brunskill, Erik Brynjolfsson, Shyamal Buch, Dallas Card, Rodrigo Castellon, Niladri Chatterji, Annie Chen, Kathleen Creel, Jared Quincy Davis, Dora Demszky, Chris Donahue, Moussa Doumbouya, Esin Durmus, Stefano Ermon, John Etchemendy, Kawin Ethayarajh, et al. (89 additional authors not shown)

    Abstract: AI is undergoing a paradigm shift with the rise of models (e.g., BERT, DALL-E, GPT-3) that are trained on broad data at scale and are adaptable to a wide range of downstream tasks. We call these models foundation models to underscore their critically central yet incomplete character. This report provides a thorough account of the opportunities and risks of foundation models, ranging from their cap…

    Submitted 12 July, 2022; v1 submitted 16 August, 2021; originally announced August 2021.

    Comments: Authored by the Center for Research on Foundation Models (CRFM) at the Stanford Institute for Human-Centered Artificial Intelligence (HAI). Report page with citation guidelines: https://crfm.stanford.edu/report.html

  22. arXiv:2102.02503  [pdf, ps, other]

    cs.CL cs.LG

    Understanding the Capabilities, Limitations, and Societal Impact of Large Language Models

    Authors: Alex Tamkin, Miles Brundage, Jack Clark, Deep Ganguli

    Abstract: On October 14th, 2020, researchers from OpenAI, the Stanford Institute for Human-Centered Artificial Intelligence, and other universities convened to discuss open research questions surrounding GPT-3, the largest publicly-disclosed dense language model at the time. The meeting took place under Chatham House Rules. Discussants came from a variety of research backgrounds including computer science,…

    Submitted 4 February, 2021; originally announced February 2021.

  23. arXiv:2011.04823  [pdf, other]

    cs.CL cs.LG

    Language Through a Prism: A Spectral Approach for Multiscale Language Representations

    Authors: Alex Tamkin, Dan Jurafsky, Noah Goodman

    Abstract: Language exhibits structure at different scales, ranging from subwords to words, sentences, paragraphs, and documents. To what extent do deep models capture information at these scales, and can we force them to better capture structure across this hierarchy? We approach this question by focusing on individual neurons, analyzing the behavior of their activations at different timescales. We show tha…

    Submitted 9 November, 2020; originally announced November 2020.

    Comments: NeurIPS 2020

  24. arXiv:2010.07432  [pdf, other]

    cs.LG cs.CL cs.CV

    Viewmaker Networks: Learning Views for Unsupervised Representation Learning

    Authors: Alex Tamkin, Mike Wu, Noah Goodman

    Abstract: Many recent methods for unsupervised representation learning train models to be invariant to different "views," or distorted versions of an input. However, designing these views requires considerable trial and error by human experts, hindering widespread adoption of unsupervised representation learning methods across domains and modalities. To address this, we propose viewmaker networks: generativ…

    Submitted 29 March, 2021; v1 submitted 14 October, 2020; originally announced October 2020.

    Comments: ICLR 2021

  25. arXiv:2004.14975  [pdf, other]

    cs.CL cs.AI cs.LG

    Investigating Transferability in Pretrained Language Models

    Authors: Alex Tamkin, Trisha Singh, Davide Giovanardi, Noah Goodman

    Abstract: How does language model pretraining help transfer learning? We consider a simple ablation technique for determining the impact of each pretrained layer on transfer task performance. This method, partial reinitialization, involves replacing different layers of a pretrained model with random weights, then finetuning the entire model on the transfer task and observing the change in performance. This…

    Submitted 9 November, 2020; v1 submitted 30 April, 2020; originally announced April 2020.

    Comments: Findings of EMNLP 2020

  26. arXiv:1911.01546  [pdf, other]

    cs.LG cs.AI

    Being Optimistic to Be Conservative: Quickly Learning a CVaR Policy

    Authors: Ramtin Keramati, Christoph Dann, Alex Tamkin, Emma Brunskill

    Abstract: While maximizing expected return is the goal in most reinforcement learning approaches, risk-sensitive objectives such as conditional value at risk (CVaR) are more suitable for many high-stakes applications. However, relatively little is known about how to explore to quickly learn policies with good CVaR. In this paper, we present the first algorithm for sample-efficient learning of CVaR-optimal p…

    Submitted 2 April, 2020; v1 submitted 4 November, 2019; originally announced November 2019.

    Journal ref: Thirty-Fourth AAAI Conference on Artificial Intelligence (AAAI 2020)