
Showing 1–50 of 152 results for author: Bowman, R

  1. arXiv:2407.04549  [pdf, other]

    cs.CL cs.AI

    Spontaneous Reward Hacking in Iterative Self-Refinement

    Authors: Jane Pan, He He, Samuel R. Bowman, Shi Feng

    Abstract: Language models are capable of iteratively improving their outputs based on natural language feedback, thus enabling in-context optimization of user preference. In place of human users, a second language model can be used as an evaluator, providing feedback along with numerical ratings which the generator attempts to optimize. However, because the evaluator is an imperfect proxy of user preference…

    Submitted 5 July, 2024; originally announced July 2024.

  2. arXiv:2406.15518  [pdf, other]

    cs.CL cs.LG

    Steering Without Side Effects: Improving Post-Deployment Control of Language Models

    Authors: Asa Cooper Stickland, Alexander Lyzhov, Jacob Pfau, Salsabila Mahdi, Samuel R. Bowman

    Abstract: Language models (LMs) have been shown to behave unexpectedly post-deployment. For example, new jailbreaks continually arise, allowing model misuse, despite extensive red-teaming and adversarial training from developers. Given most model queries are unproblematic and frequent retraining results in unstable user experience, methods for mitigation of worst-case behavior should be targeted. One such m…

    Submitted 20 June, 2024; originally announced June 2024.

  3. arXiv:2406.10162  [pdf, other]

    cs.AI cs.CL

    Sycophancy to Subterfuge: Investigating Reward-Tampering in Large Language Models

    Authors: Carson Denison, Monte MacDiarmid, Fazl Barez, David Duvenaud, Shauna Kravec, Samuel Marks, Nicholas Schiefer, Ryan Soklaski, Alex Tamkin, Jared Kaplan, Buck Shlegeris, Samuel R. Bowman, Ethan Perez, Evan Hubinger

    Abstract: In reinforcement learning, specification gaming occurs when AI systems learn undesired behaviors that are highly rewarded due to misspecified training goals. Specification gaming can range from simple behaviors like sycophancy to sophisticated and pernicious behaviors like reward-tampering, where a model directly modifies its own reward mechanism. However, these more pernicious behaviors may be to…

    Submitted 28 June, 2024; v1 submitted 14 June, 2024; originally announced June 2024.

    Comments: Make it easier to find samples from the model, and highlight that our operational definition of reward tampering has false positives where the model attempts to complete the task honestly but edits the reward. Add paragraph to conclusion to this effect, and add sentence to figure 1 to this effect

  4. arXiv:2404.15758  [pdf, other]

    cs.CL cs.AI

    Let's Think Dot by Dot: Hidden Computation in Transformer Language Models

    Authors: Jacob Pfau, William Merrill, Samuel R. Bowman

    Abstract: Chain-of-thought responses from language models improve performance across most benchmarks. However, it remains unclear to what extent these performance gains can be attributed to human-like task decomposition or simply the greater computation that additional tokens allow. We show that transformers can use meaningless filler tokens (e.g., '......') in place of a chain of thought to solve two hard…

    Submitted 24 April, 2024; originally announced April 2024.

    Comments: 17 pages, 10 figures

    ACM Class: I.2.6

  5. arXiv:2404.13076  [pdf, other]

    cs.CL cs.AI

    LLM Evaluators Recognize and Favor Their Own Generations

    Authors: Arjun Panickssery, Samuel R. Bowman, Shi Feng

    Abstract: Self-evaluation using large language models (LLMs) has proven valuable not only in benchmarking but also methods like reward modeling, constitutional AI, and self-refinement. But new biases are introduced due to the same LLM acting as both the evaluator and the evaluatee. One such bias is self-preference, where an LLM evaluator scores its own outputs higher than others' while human annotators cons…

    Submitted 15 April, 2024; originally announced April 2024.

  6. arXiv:2403.05518  [pdf, other]

    cs.CL cs.AI

    Bias-Augmented Consistency Training Reduces Biased Reasoning in Chain-of-Thought

    Authors: James Chua, Edward Rees, Hunar Batra, Samuel R. Bowman, Julian Michael, Ethan Perez, Miles Turpin

    Abstract: While chain-of-thought prompting (CoT) has the potential to improve the explainability of language model reasoning, it can systematically misrepresent the factors influencing models' behavior--for example, rationalizing answers in line with a user's opinion without mentioning this bias. To mitigate this biased reasoning problem, we introduce bias-augmented consistency training (BCT), an unsupervis…

    Submitted 8 March, 2024; originally announced March 2024.

  7. arXiv:2402.06782  [pdf, other]

    cs.AI cs.CL

    Debating with More Persuasive LLMs Leads to More Truthful Answers

    Authors: Akbir Khan, John Hughes, Dan Valentine, Laura Ruis, Kshitij Sachan, Ansh Radhakrishnan, Edward Grefenstette, Samuel R. Bowman, Tim Rocktäschel, Ethan Perez

    Abstract: Common methods for aligning large language models (LLMs) with desired behaviour heavily rely on human-labelled data. However, as models grow increasingly sophisticated, they will surpass human expertise, and the role of human evaluation will evolve into non-experts overseeing experts. In anticipation of this, we ask: can weaker models assess the correctness of stronger models? We investigate this…

    Submitted 30 May, 2024; v1 submitted 9 February, 2024; originally announced February 2024.

    Comments: For code please check: https://github.com/ucl-dark/llm_debate

  8. arXiv:2401.05566  [pdf, other]

    cs.CR cs.AI cs.CL cs.LG cs.SE

    Sleeper Agents: Training Deceptive LLMs that Persist Through Safety Training

    Authors: Evan Hubinger, Carson Denison, Jesse Mu, Mike Lambert, Meg Tong, Monte MacDiarmid, Tamera Lanham, Daniel M. Ziegler, Tim Maxwell, Newton Cheng, Adam Jermyn, Amanda Askell, Ansh Radhakrishnan, Cem Anil, David Duvenaud, Deep Ganguli, Fazl Barez, Jack Clark, Kamal Ndousse, Kshitij Sachan, Michael Sellitto, Mrinank Sharma, Nova DasSarma, Roger Grosse, Shauna Kravec, et al. (14 additional authors not shown)

    Abstract: Humans are capable of strategically deceptive behavior: behaving helpfully in most situations, but then behaving very differently in order to pursue alternative objectives when given the opportunity. If an AI system learned such a deceptive strategy, could we detect it and remove it using current state-of-the-art safety training techniques? To study this question, we construct proof-of-concept exa…

    Submitted 17 January, 2024; v1 submitted 10 January, 2024; originally announced January 2024.

    Comments: updated to add missing acknowledgements

  9. arXiv:2311.12022  [pdf, other]

    cs.AI cs.CL

    GPQA: A Graduate-Level Google-Proof Q&A Benchmark

    Authors: David Rein, Betty Li Hou, Asa Cooper Stickland, Jackson Petty, Richard Yuanzhe Pang, Julien Dirani, Julian Michael, Samuel R. Bowman

    Abstract: We present GPQA, a challenging dataset of 448 multiple-choice questions written by domain experts in biology, physics, and chemistry. We ensure that the questions are high-quality and extremely difficult: experts who have or are pursuing PhDs in the corresponding domains reach 65% accuracy (74% when discounting clear mistakes the experts identified in retrospect), while highly skilled non-expert v…

    Submitted 20 November, 2023; originally announced November 2023.

    Comments: 28 pages, 5 figures, 7 tables

  10. arXiv:2311.08702  [pdf, other]

    cs.AI cs.CL

    Debate Helps Supervise Unreliable Experts

    Authors: Julian Michael, Salsabila Mahdi, David Rein, Jackson Petty, Julien Dirani, Vishakh Padmakumar, Samuel R. Bowman

    Abstract: As AI systems are used to answer more difficult questions and potentially help create new knowledge, judging the truthfulness of their outputs becomes more difficult and more important. How can we supervise unreliable experts, which have access to the truth but may not accurately report it, to give answers that are systematically true and don't just superficially seem true, when the supervisor can…

    Submitted 15 November, 2023; originally announced November 2023.

    Comments: 84 pages, 13 footnotes, 5 figures, 4 tables, 28 debate transcripts; data and code at https://github.com/julianmichael/debate/tree/2023-nyu-experiments

    ACM Class: I.2.0

  11. arXiv:2311.08200  [pdf, other]

    physics.ins-det

    Using Old Laboratory Equipment with Modern Web-of-Things Standards: a Smart Laboratory with LabThings Retro

    Authors: Samuel McDermott, Jurij Kotar, Joel Collins, Leonardo Mancini, Richard Bowman, Pietro Cicuta

    Abstract: There has been an increasing, and welcome, Open Hardware trend towards science teams building and sharing their designs for new instruments. These devices, often built upon low-cost microprocessors and micro-controllers, can be readily connected to enable complex, automated, and smart experiments. When designed to use open communication web standards, devices from different laboratories and manufa…

    Submitted 14 November, 2023; originally announced November 2023.

    Comments: Supplementary material - video demonstration available at https://zenodo.org/doi/10.5281/zenodo.10123735

  12. arXiv:2310.13548  [pdf, other]

    cs.CL cs.AI cs.LG stat.ML

    Towards Understanding Sycophancy in Language Models

    Authors: Mrinank Sharma, Meg Tong, Tomasz Korbak, David Duvenaud, Amanda Askell, Samuel R. Bowman, Newton Cheng, Esin Durmus, Zac Hatfield-Dodds, Scott R. Johnston, Shauna Kravec, Timothy Maxwell, Sam McCandlish, Kamal Ndousse, Oliver Rausch, Nicholas Schiefer, Da Yan, Miranda Zhang, Ethan Perez

    Abstract: Human feedback is commonly utilized to finetune AI assistants. But human feedback may also encourage model responses that match user beliefs over truthful ones, a behaviour known as sycophancy. We investigate the prevalence of sycophancy in models whose finetuning procedure made use of human feedback, and the potential role of human preference judgments in such behavior. We first demonstrate that…

    Submitted 27 October, 2023; v1 submitted 20 October, 2023; originally announced October 2023.

    Comments: 32 pages, 20 figures

    ACM Class: I.2.6

  13. arXiv:2310.06496  [pdf]

    cond-mat.mtrl-sci physics.app-ph physics.optics

    Spatially resolved photoluminescence analysis of Se passivation and defect formation in CdSe$_{x}$Te$_{1-x}$ thin films

    Authors: Alan R Bowman, Jacob J Leaver, Kyle Frohna, Samuel D Stranks, Giulia Tagliabue, Jon D Major

    Abstract: CdTe is the most commercially successful thin-film photovoltaic technology to date. The recent development of Se-alloyed CdSe$_{x}$Te$_{1-x}$ layers in CdTe solar cells has led to higher device efficiencies, due to a lowered bandgap improving the photocurrent, improved voltage characteristics and longer carrier lifetimes. Evidence from cross-sectional electron microscopy is widely believed to indi…

    Submitted 10 October, 2023; originally announced October 2023.

  14. arXiv:2308.03296  [pdf, other]

    cs.LG cs.CL stat.ML

    Studying Large Language Model Generalization with Influence Functions

    Authors: Roger Grosse, Juhan Bae, Cem Anil, Nelson Elhage, Alex Tamkin, Amirhossein Tajdini, Benoit Steiner, Dustin Li, Esin Durmus, Ethan Perez, Evan Hubinger, Kamilė Lukošiūtė, Karina Nguyen, Nicholas Joseph, Sam McCandlish, Jared Kaplan, Samuel R. Bowman

    Abstract: When trying to gain better visibility into a machine learning model in order to understand and mitigate the associated risks, a potentially valuable source of evidence is: which training examples most contribute to a given behavior? Influence functions aim to answer a counterfactual: how would the model's parameters (and hence its outputs) change if a given sequence were added to the training set?…

    Submitted 7 August, 2023; originally announced August 2023.

    Comments: 119 pages, 47 figures, 22 tables

  15. arXiv:2307.13702  [pdf, other]

    cs.AI cs.CL cs.LG

    Measuring Faithfulness in Chain-of-Thought Reasoning

    Authors: Tamera Lanham, Anna Chen, Ansh Radhakrishnan, Benoit Steiner, Carson Denison, Danny Hernandez, Dustin Li, Esin Durmus, Evan Hubinger, Jackson Kernion, Kamilė Lukošiūtė, Karina Nguyen, Newton Cheng, Nicholas Joseph, Nicholas Schiefer, Oliver Rausch, Robin Larson, Sam McCandlish, Sandipan Kundu, Saurav Kadavath, Shannon Yang, Thomas Henighan, Timothy Maxwell, Timothy Telleen-Lawton, Tristan Hume, et al. (5 additional authors not shown)

    Abstract: Large language models (LLMs) perform better when they produce step-by-step, "Chain-of-Thought" (CoT) reasoning before answering a question, but it is unclear if the stated reasoning is a faithful explanation of the model's actual reasoning (i.e., its process for answering the question). We investigate hypotheses for how CoT reasoning may be unfaithful, by examining how the model predictions change…

    Submitted 16 July, 2023; originally announced July 2023.

  16. arXiv:2307.11768  [pdf, other]

    cs.CL cs.AI cs.LG

    Question Decomposition Improves the Faithfulness of Model-Generated Reasoning

    Authors: Ansh Radhakrishnan, Karina Nguyen, Anna Chen, Carol Chen, Carson Denison, Danny Hernandez, Esin Durmus, Evan Hubinger, Jackson Kernion, Kamilė Lukošiūtė, Newton Cheng, Nicholas Joseph, Nicholas Schiefer, Oliver Rausch, Sam McCandlish, Sheer El Showk, Tamera Lanham, Tim Maxwell, Venkatesa Chandrasekaran, Zac Hatfield-Dodds, Jared Kaplan, Jan Brauner, Samuel R. Bowman, Ethan Perez

    Abstract: As large language models (LLMs) perform more difficult tasks, it becomes harder to verify the correctness and safety of their behavior. One approach to help with this issue is to prompt LLMs to externalize their reasoning, e.g., by having them generate step-by-step reasoning as they answer a question (Chain-of-Thought; CoT). The reasoning may enable us to check the process that models use to perfo…

    Submitted 25 July, 2023; v1 submitted 16 July, 2023; originally announced July 2023.

    Comments: For few-shot examples and prompts, see https://github.com/anthropics/DecompositionFaithfulnessPaper

  17. arXiv:2307.09324  [pdf]

    physics.chem-ph cond-mat.mes-hall

    Interfacial Hot Carrier Collection Controls Plasmonic Chemistry

    Authors: Fatemeh Kiani, Alan R. Bowman, Milad Sabzehparvar, Can O. Karaman, Ravishankar Sundararaman, Giulia Tagliabue

    Abstract: Harnessing non-equilibrium hot carriers from plasmonic metal nanostructures constitutes a vibrant research field. It promises to enable control of activity and selectivity of photochemical reactions, especially for solar fuel generation. However, a comprehensive understanding of the interplay of plasmonic hot carrier-driven processes in metal/semiconducting heterostructures has remained elusive. I…

    Submitted 18 July, 2023; originally announced July 2023.

    Journal ref: ACS Energy Lett. 2023, 8, 10, 4242-4250

  18. arXiv:2307.08477  [pdf]

    cond-mat.mtrl-sci physics.optics

    Quantum-mechanical effects in photoluminescence from thin crystalline gold films

    Authors: Alan R. Bowman, Álvaro Rodríguez Echarri, Fatemeh Kiani, Fadil Iyikanat, Ted V. Tsoulos, Joel D. Cox, Ravishankar Sundararaman, F. Javier García de Abajo, Giulia Tagliabue

    Abstract: Luminescence constitutes a unique source of insight into hot carrier processes in metals, including those in plasmonic nanostructures used for sensing and energy applications. However, being weak in nature, metal luminescence remains poorly understood, its microscopic origin strongly debated, and its potential for unravelling nanoscale carrier dynamics largely unexploited. Here, we reveal quantum-…

    Submitted 25 September, 2023; v1 submitted 17 July, 2023; originally announced July 2023.

    Comments: Main text 21 pages and 4 figures. Supplemental Information 33 pages and 17 figures

    Journal ref: Light. Sci. Appl. 13, 91 (2024)

  19. arXiv:2306.09479  [pdf, other]

    cs.CL cs.AI cs.CY

    Inverse Scaling: When Bigger Isn't Better

    Authors: Ian R. McKenzie, Alexander Lyzhov, Michael Pieler, Alicia Parrish, Aaron Mueller, Ameya Prabhu, Euan McLean, Aaron Kirtland, Alexis Ross, Alisa Liu, Andrew Gritsevskiy, Daniel Wurgaft, Derik Kauffman, Gabriel Recchia, Jiacheng Liu, Joe Cavanagh, Max Weiss, Sicong Huang, The Floating Droid, Tom Tseng, Tomasz Korbak, Xudong Shen, Yuhui Zhang, Zhengping Zhou, Najoung Kim, et al. (2 additional authors not shown)

    Abstract: Work on scaling laws has found that large language models (LMs) show predictable improvements to overall loss with increased scale (model size, training data, and compute). Here, we present evidence for the claim that LMs may show inverse scaling, or worse task performance with increased scale, e.g., due to flaws in the training objective and data. We present empirical evidence of inverse scaling…

    Submitted 12 May, 2024; v1 submitted 15 June, 2023; originally announced June 2023.

    Comments: Published in TMLR (2023), 39 pages

    Journal ref: Transactions on Machine Learning Research (TMLR), 10/2023, https://openreview.net/forum?id=DwgRm72GQF

  20. ScoNe: Benchmarking Negation Reasoning in Language Models With Fine-Tuning and In-Context Learning

    Authors: Jingyuan Selena She, Christopher Potts, Samuel R. Bowman, Atticus Geiger

    Abstract: A number of recent benchmarks seek to assess how well models handle natural language negation. However, these benchmarks lack the controlled example paradigms that would allow us to infer whether a model had learned how negation morphemes semantically scope. To fill these analytical gaps, we present the Scoped Negation NLI (ScoNe-NLI) benchmark, which contains contrast sets of six examples with up…

    Submitted 30 May, 2023; originally announced May 2023.

  21. arXiv:2305.14279  [pdf, other]

    cs.CL

    Two Failures of Self-Consistency in the Multi-Step Reasoning of LLMs

    Authors: Angelica Chen, Jason Phang, Alicia Parrish, Vishakh Padmakumar, Chen Zhao, Samuel R. Bowman, Kyunghyun Cho

    Abstract: Large language models (LLMs) have achieved widespread success on a variety of in-context few-shot tasks, but this success is typically evaluated via correctness rather than consistency. We argue that self-consistency is an important criteria for valid multi-step reasoning in tasks where the solution is composed of the answers to multiple sub-steps. We propose two types of self-consistency that are…

    Submitted 2 February, 2024; v1 submitted 23 May, 2023; originally announced May 2023.

    Comments: Accepted to TMLR: https://openreview.net/forum?id=5nBqY1y96B

    Journal ref: Transactions on Machine Learning Research (2024)

  22. arXiv:2305.04388  [pdf, other]

    cs.CL cs.AI

    Language Models Don't Always Say What They Think: Unfaithful Explanations in Chain-of-Thought Prompting

    Authors: Miles Turpin, Julian Michael, Ethan Perez, Samuel R. Bowman

    Abstract: Large Language Models (LLMs) can achieve strong performance on many tasks by producing step-by-step reasoning before giving a final output, often referred to as chain-of-thought reasoning (CoT). It is tempting to interpret these CoT explanations as the LLM's process for solving a task. This level of transparency into LLMs' predictions would yield significant safety benefits. However, we find that…

    Submitted 9 December, 2023; v1 submitted 7 May, 2023; originally announced May 2023.

    Comments: NeurIPS 2023

  23. arXiv:2304.00612  [pdf, other]

    cs.CL cs.AI

    Eight Things to Know about Large Language Models

    Authors: Samuel R. Bowman

    Abstract: The widespread public deployment of large language models (LLMs) in recent months has prompted a wave of new attention and engagement from advocates, policymakers, and scholars from many fields. This attention is a timely response to the many urgent questions that this technology raises, but it can sometimes miss important considerations. This paper surveys the evidence for eight potentially surpr…

    Submitted 2 April, 2023; originally announced April 2023.

  24. arXiv:2303.16749  [pdf, other]

    cs.SE cs.AI cs.CL cs.LG

    Improving Code Generation by Training with Natural Language Feedback

    Authors: Angelica Chen, Jérémy Scheurer, Tomasz Korbak, Jon Ander Campos, Jun Shern Chan, Samuel R. Bowman, Kyunghyun Cho, Ethan Perez

    Abstract: The potential for pre-trained large language models (LLMs) to use natural language feedback at inference time has been an exciting recent development. We build upon this observation by formalizing an algorithm for learning from natural language feedback at training time instead, which we call Imitation learning from Language Feedback (ILF). ILF requires only a small amount of human-written feedbac…

    Submitted 22 February, 2024; v1 submitted 28 March, 2023; originally announced March 2023.

    Comments: Published in (and superseded by) TMLR: https://openreview.net/forum?id=xo3hI5MwvU

  25. arXiv:2303.08993  [pdf]

    q-bio.BM physics.chem-ph

    Folding@home: achievements from over twenty years of citizen science herald the exascale era

    Authors: Vincent A. Voelz, Vijay S. Pande, Gregory R. Bowman

    Abstract: Simulations of biomolecules have enormous potential to inform our understanding of biology but require extremely demanding calculations. For over twenty years, the Folding@home distributed computing project has pioneered a massively parallel approach to biomolecular simulation, harnessing the resources of citizen scientists across the globe. Here, we summarize the scientific and technical advances…

    Submitted 15 March, 2023; originally announced March 2023.

    Comments: 24 pages, 6 figures

  26. arXiv:2302.13806  [pdf, other]

    cond-mat.stat-mech physics.hist-ph

    Remembering the work of Phillip L. Geissler: A coda to his scientific trajectory

    Authors: Gregory R. Bowman, Stephen J. Cox, Christoph Dellago, Kateri H. DuBay, Joel D. Eaves, Daniel A. Fletcher, Layne B. Frechette, Michael Grünwald, Katherine Klymko, JiYeon Ku, Ahmad K. Omar, Eran Rabani, David R. Reichman, Julia R. Rogers, Andreana M. Rosnik, Grant M. Rotskoff, Anna R. Schneider, Nadine Schwierz, David A. Sivak, Suriyanarayanan Vaikuntanathan, Stephen Whitelam, Asaph Widmer-Cooper

    Abstract: Phillip L. Geissler made important contributions to the statistical mechanics of biological polymers, heterogeneous materials, and chemical dynamics in aqueous environments. He devised analytical and computational methods that revealed the underlying organization of complex systems at the frontiers of biology, chemistry, and materials science. In this retrospective, we celebrate his work at these…

    Submitted 24 February, 2023; originally announced February 2023.

    Journal ref: Ann. Rev. Phys. Chem. 74, 11.1-11.27 (2023)

  27. arXiv:2302.08582  [pdf, other]

    cs.CL cs.LG

    Pretraining Language Models with Human Preferences

    Authors: Tomasz Korbak, Kejian Shi, Angelica Chen, Rasika Bhalerao, Christopher L. Buckley, Jason Phang, Samuel R. Bowman, Ethan Perez

    Abstract: Language models (LMs) are pretrained to imitate internet text, including content that would violate human preferences if generated by an LM: falsehoods, offensive comments, personally identifiable information, low-quality or buggy code, and more. Here, we explore alternative objectives for pretraining LMs in a way that also guides them to generate text aligned with human preferences. We benchmark…

    Submitted 14 June, 2023; v1 submitted 16 February, 2023; originally announced February 2023.

    Comments: ICML 2023

  28. arXiv:2302.07459  [pdf, other]

    cs.CL

    The Capacity for Moral Self-Correction in Large Language Models

    Authors: Deep Ganguli, Amanda Askell, Nicholas Schiefer, Thomas I. Liao, Kamilė Lukošiūtė, Anna Chen, Anna Goldie, Azalia Mirhoseini, Catherine Olsson, Danny Hernandez, Dawn Drain, Dustin Li, Eli Tran-Johnson, Ethan Perez, Jackson Kernion, Jamie Kerr, Jared Mueller, Joshua Landau, Kamal Ndousse, Karina Nguyen, Liane Lovitt, Michael Sellitto, Nelson Elhage, Noemi Mercado, Nova DasSarma, et al. (24 additional authors not shown)

    Abstract: We test the hypothesis that language models trained with reinforcement learning from human feedback (RLHF) have the capability to "morally self-correct" -- to avoid producing harmful outputs -- if instructed to do so. We find strong evidence in support of this hypothesis across three different experiments, each of which reveal different facets of moral self-correction. We find that the capability…

    Submitted 18 February, 2023; v1 submitted 14 February, 2023; originally announced February 2023.

  29. arXiv:2212.13273  [pdf]

    cond-mat.soft physics.chem-ph

    Discontinuous metric programming in liquid crystalline elastomers

    Authors: Tayler S. Hebner, Riley G. A. Bowman, Daniel Duffy, Cyrus Mostajeran, Itay Griniasty, Itai Cohen, Mark Warner, Christopher N. Bowman, Timothy J. White

    Abstract: Liquid crystalline elastomers (LCEs) are shape-changing materials that exhibit large deformations in response to applied stimuli. Local control of the orientation of LCEs spatially directs the deformation of these materials to realize spontaneous shape change in response to stimuli. Prior approaches to shape programming in LCEs utilize patterning techniques that involve the detailed inscription of…

    Submitted 26 December, 2022; originally announced December 2022.

  30. arXiv:2212.10003  [pdf, other]

    cs.CL

    (QA)$^2$: Question Answering with Questionable Assumptions

    Authors: Najoung Kim, Phu Mon Htut, Samuel R. Bowman, Jackson Petty

    Abstract: Naturally occurring information-seeking questions often contain questionable assumptions -- assumptions that are false or unverifiable. Questions containing questionable assumptions are challenging because they require a distinct answer strategy that deviates from typical answers for information-seeking questions. For instance, the question "When did Marie Curie discover Uranium?" cannot be answer…

    Submitted 29 August, 2023; v1 submitted 20 December, 2022; originally announced December 2022.

    Comments: ACL 2023 camera-ready

  31. arXiv:2212.09251  [pdf, other]

    cs.CL cs.AI cs.LG

    Discovering Language Model Behaviors with Model-Written Evaluations

    Authors: Ethan Perez, Sam Ringer, Kamilė Lukošiūtė, Karina Nguyen, Edwin Chen, Scott Heiner, Craig Pettit, Catherine Olsson, Sandipan Kundu, Saurav Kadavath, Andy Jones, Anna Chen, Ben Mann, Brian Israel, Bryan Seethor, Cameron McKinnon, Christopher Olah, Da Yan, Daniela Amodei, Dario Amodei, Dawn Drain, Dustin Li, Eli Tran-Johnson, Guro Khundadze, Jackson Kernion, et al. (38 additional authors not shown)

    Abstract: As language models (LMs) scale, they develop many novel behaviors, good and bad, exacerbating the need to evaluate how they behave. Prior work creates evaluations with crowdwork (which is time-consuming and expensive) or existing data sources (which are not always available). Here, we automatically generate evaluations with LMs. We explore approaches with varying amounts of human effort, from inst…

    Submitted 19 December, 2022; originally announced December 2022.

    Comments: for associated data visualizations, see https://www.evals.anthropic.com/model-written/; for full datasets, see https://github.com/anthropics/evals

  32. arXiv:2212.08073  [pdf, other]

    cs.CL cs.AI

    Constitutional AI: Harmlessness from AI Feedback

    Authors: Yuntao Bai, Saurav Kadavath, Sandipan Kundu, Amanda Askell, Jackson Kernion, Andy Jones, Anna Chen, Anna Goldie, Azalia Mirhoseini, Cameron McKinnon, Carol Chen, Catherine Olsson, Christopher Olah, Danny Hernandez, Dawn Drain, Deep Ganguli, Dustin Li, Eli Tran-Johnson, Ethan Perez, Jamie Kerr, Jared Mueller, Jeffrey Ladish, Joshua Landau, Kamal Ndousse, Kamile Lukosuite, et al. (26 additional authors not shown)

    Abstract: As AI systems become more capable, we would like to enlist their help to supervise other AIs. We experiment with methods for training a harmless AI assistant through self-improvement, without any human labels identifying harmful outputs. The only human oversight is provided through a list of rules or principles, and so we refer to the method as 'Constitutional AI'. The process involves both a supe…

    Submitted 15 December, 2022; originally announced December 2022.

  33. arXiv:2211.03540  [pdf, other]

    cs.HC cs.AI cs.CL

    Measuring Progress on Scalable Oversight for Large Language Models

    Authors: Samuel R. Bowman, Jeeyoon Hyun, Ethan Perez, Edwin Chen, Craig Pettit, Scott Heiner, Kamilė Lukošiūtė, Amanda Askell, Andy Jones, Anna Chen, Anna Goldie, Azalia Mirhoseini, Cameron McKinnon, Christopher Olah, Daniela Amodei, Dario Amodei, Dawn Drain, Dustin Li, Eli Tran-Johnson, Jackson Kernion, Jamie Kerr, Jared Mueller, Jeffrey Ladish, Joshua Landau, Kamal Ndousse, et al. (21 additional authors not shown)

    Abstract: Developing safe and useful general-purpose AI systems will require us to make progress on scalable oversight: the problem of supervising systems that potentially outperform us on most skills relevant to the task at hand. Empirical work on this problem is not straightforward, since we do not yet have systems that broadly exceed our abilities. This paper discusses one of the major ways we think abou…

    Submitted 11 November, 2022; v1 submitted 4 November, 2022; originally announced November 2022.

    Comments: v2 fixes a few typos from v1

  34. arXiv:2210.10860  [pdf, other]

    cs.CL

    Two-Turn Debate Doesn't Help Humans Answer Hard Reading Comprehension Questions

    Authors: Alicia Parrish, Harsh Trivedi, Nikita Nangia, Vishakh Padmakumar, Jason Phang, Amanpreet Singh Saimbhi, Samuel R. Bowman

    Abstract: The use of language-model-based question-answering systems to aid humans in completing difficult tasks is limited, in part, by the unreliability of the text these systems generate. Using hard multiple-choice reading comprehension questions as a testbed, we assess whether presenting humans with arguments for two competing answer options, where one is correct and the other is incorrect, allows human…

    Submitted 19 October, 2022; originally announced October 2022.

    Comments: 12 pages, 6 figures, 7 tables

  35. arXiv:2209.14947  [pdf, ps, other]

    physics.ins-det cs.HC

    Controlling and scripting laboratory hardware with open-source, intuitive interfaces: OpenFlexure Voice Control and OpenFlexure Blockly

    Authors: Samuel McDermott, Richard Bowman, Kerrianne Harrington, William Wadsworth, Pietro Cicuta

    Abstract: Making user interaction with laboratory equipment more convenient and intuitive should promote experimental work and help researchers to complete their tasks efficiently. The most common form of interaction in current instrumentation is either direct tactile, with buttons and knobs, or interfaced through a computer, using a mouse and keyboard. Scripting is another function typical of smart and aut…

    Submitted 2 February, 2023; v1 submitted 29 September, 2022; originally announced September 2022.

  36. arXiv:2208.12852  [pdf, other]

    cs.CL cs.AI

    What Do NLP Researchers Believe? Results of the NLP Community Metasurvey

    Authors: Julian Michael, Ari Holtzman, Alicia Parrish, Aaron Mueller, Alex Wang, Angelica Chen, Divyam Madaan, Nikita Nangia, Richard Yuanzhe Pang, Jason Phang, Samuel R. Bowman

    Abstract: We present the results of the NLP Community Metasurvey. Run from May to June 2022, the survey elicited opinions on controversial issues, including industry influence in the field, concerns about AGI, and ethics. Our results put concrete numbers to several controversies: For example, respondents are split almost exactly in half on questions about the importance of artificial general intelligence, w…

    Submitted 26 August, 2022; originally announced August 2022.

    Comments: 31 pages, 19 figures, 3 tables; more information at https://nlpsurvey.net

    ACM Class: I.2.7

  37. arXiv:2208.07998  [pdf, other]

    cs.CL

    What Artificial Neural Networks Can Tell Us About Human Language Acquisition

    Authors: Alex Warstadt, Samuel R. Bowman

    Abstract: Rapid progress in machine learning for natural language processing has the potential to transform debates about how humans learn language. However, the learning environments and biases of current artificial learners and humans diverge in ways that weaken the impact of the evidence obtained from learning simulations. For example, today's most effective neural language models are trained on roughly… ▽ More

    Submitted 11 February, 2024; v1 submitted 16 August, 2022; originally announced August 2022.

    Comments: Please cite the published version with the following information: @incollection{warstadt2022artificial, title={What artificial neural networks can tell us about human language acquisition}, author={Warstadt, Alex and Bowman, Samuel R.}, booktitle={Algebraic Structures in Natural Language}, pages={17--60}, year={2022}, publisher={CRC Press} }

  38. arXiv:2206.04615  [pdf, other]

    cs.CL cs.AI cs.CY cs.LG stat.ML

    Beyond the Imitation Game: Quantifying and extrapolating the capabilities of language models

    Authors: Aarohi Srivastava, Abhinav Rastogi, Abhishek Rao, Abu Awal Md Shoeb, Abubakar Abid, Adam Fisch, Adam R. Brown, Adam Santoro, Aditya Gupta, Adrià Garriga-Alonso, Agnieszka Kluska, Aitor Lewkowycz, Akshat Agarwal, Alethea Power, Alex Ray, Alex Warstadt, Alexander W. Kocurek, Ali Safaya, Ali Tazarv, Alice Xiang, Alicia Parrish, Allen Nie, Aman Hussain, Amanda Askell, Amanda Dsouza, et al. (426 additional authors not shown)

    Abstract: Language models demonstrate both quantitative improvement and new qualitative capabilities with increasing scale. Despite their potentially transformative impact, these new capabilities are as yet poorly characterized. In order to inform future research, prepare for disruptive new model capabilities, and ameliorate socially harmful effects, it is vital that we understand the present and near-futur… ▽ More

    Submitted 12 June, 2023; v1 submitted 9 June, 2022; originally announced June 2022.

    Comments: 27 pages, 17 figures + references and appendices, repo: https://github.com/google/BIG-bench

    Journal ref: Transactions on Machine Learning Research, May/2022, https://openreview.net/forum?id=uyTL5Bvosj

  39. arXiv:2205.11465  [pdf, ps, other]

    cs.CL cs.AI

    SQuALITY: Building a Long-Document Summarization Dataset the Hard Way

    Authors: Alex Wang, Richard Yuanzhe Pang, Angelica Chen, Jason Phang, Samuel R. Bowman

    Abstract: Summarization datasets are often assembled either by scraping naturally occurring public-domain summaries -- which are nearly always in difficult-to-work-with technical domains -- or by using approximate heuristics to extract them from everyday text -- which frequently yields unfaithful summaries. In this work, we turn to a slower but more straightforward approach to developing summarization bench… ▽ More

    Submitted 23 May, 2022; originally announced May 2022.

  40. arXiv:2205.10782  [pdf, other]

    cs.CL

    Instruction Induction: From Few Examples to Natural Language Task Descriptions

    Authors: Or Honovich, Uri Shaham, Samuel R. Bowman, Omer Levy

    Abstract: Large language models are able to perform a task by conditioning on a few input-output demonstrations - a paradigm known as in-context learning. We show that language models can explicitly infer an underlying task from a few demonstrations by prompting them to generate a natural language instruction that fits the examples. To explore this ability, we introduce the instruction induction challenge,… ▽ More

    Submitted 22 May, 2022; originally announced May 2022.

  41. arXiv:2204.09997  [pdf, other]

    physics.optics cond-mat.mes-hall physics.bio-ph physics.ins-det quant-ph

    Computational ghost imaging for transmission electron microscopy

    Authors: Akhil Kallepalli, Lorenzo Viani, Daan Stellinga, Enzo Rotunno, Ming-Jie Sun, Richard Bowman, Paolo Rosi, Stefano Frabboni, Roberto Balboni, Andrea Migliori, Vincenzo Grillo, Miles Padgett

    Abstract: While transmission electron microscopes (TEM) can achieve a much higher resolution than optical microscopes, they face challenges of damage to samples during the high energy processes involved. Here, we explore using computational ghost imaging techniques in electron microscopy to reduce the total required intensity. The technological lack of the equivalent high-resolution, optical spatial light m… ▽ More

    Submitted 21 April, 2022; originally announced April 2022.

    Comments: 6 figures, 10 pages

  42. arXiv:2204.05212  [pdf, other]

    cs.CL

    Single-Turn Debate Does Not Help Humans Answer Hard Reading-Comprehension Questions

    Authors: Alicia Parrish, Harsh Trivedi, Ethan Perez, Angelica Chen, Nikita Nangia, Jason Phang, Samuel R. Bowman

    Abstract: Current QA systems can generate reasonable-sounding yet false answers without explanation or evidence for the generated answer, which is especially problematic when humans cannot readily check the model's answers. This presents a challenge for building trust in machine learning systems. We take inspiration from real-world situations where difficult questions are answered by considering opposing si… ▽ More

    Submitted 13 April, 2022; v1 submitted 11 April, 2022; originally announced April 2022.

    Comments: Accepted to the 2022 ACL Workshop on Learning with Natural Language Supervision. 12 pages total, 9 figures, 2 tables

  43. arXiv:2203.06342  [pdf, other]

    cs.CL cs.AI

    What Makes Reading Comprehension Questions Difficult?

    Authors: Saku Sugawara, Nikita Nangia, Alex Warstadt, Samuel R. Bowman

    Abstract: For a natural language understanding benchmark to be useful in research, it has to consist of examples that are diverse and difficult enough to discriminate among current and near-future state-of-the-art systems. However, we do not yet know how best to select text sources to collect a variety of challenging examples. In this study, we crowdsource multiple-choice reading comprehension questions for… ▽ More

    Submitted 11 March, 2022; originally announced March 2022.

    Comments: ACL 2022

  44. arXiv:2201.02163  [pdf]

    cond-mat.mtrl-sci physics.app-ph

    Optical properties of Au-Hf thin films

    Authors: Hugh Littlehailes, William R. Hendren, Robert M. Bowman, Fumin Huang

    Abstract: The optical properties of thin films of intermetallic Au$_{3}$Hf were experimentally investigated for the first time, which display clear plasmonic properties in the optical and near infrared region with negative permittivity. In contrast to similar alloys, such as films of Au$_{3}$Zr, the films express more negative $ε'$ values and lower $ε''$ values across most of the wavelengths (370-1570 nm) i… ▽ More

    Submitted 6 January, 2022; originally announced January 2022.

    Comments: 19 pages, including references, plus 3 pages of supplementary material. 8 figures and 1 table in main text, 1 figure and 1 table in supplementary material

  45. arXiv:2112.08608  [pdf, other]

    cs.CL

    QuALITY: Question Answering with Long Input Texts, Yes!

    Authors: Richard Yuanzhe Pang, Alicia Parrish, Nitish Joshi, Nikita Nangia, Jason Phang, Angelica Chen, Vishakh Padmakumar, Johnny Ma, Jana Thompson, He He, Samuel R. Bowman

    Abstract: To enable building and testing models on long-document comprehension, we introduce QuALITY, a multiple-choice QA dataset with context passages in English that have an average length of about 5,000 tokens, much longer than typical current models can process. Unlike in prior work with passages, our questions are written and validated by contributors who have read the entire passage, rather than rely… ▽ More

    Submitted 11 May, 2022; v1 submitted 15 December, 2021; originally announced December 2021.

    Comments: NAACL 2022

  46. arXiv:2112.05804  [pdf, other]

    physics.ins-det physics.optics

    Multi-modal microscopy imaging with the OpenFlexure Delta Stage

    Authors: Samuel McDermott, Filip Ayazi, Joel Collins, Joe Knapper, Julian Stirling, Richard Bowman, Pietro Cicuta

    Abstract: Microscopes are vital pieces of equipment in much of biological research and medical diagnostics. However, access to a microscope can represent a bottleneck in research, especially in lower-income countries. `Smart' computer controlled motorized microscopes, which can perform automated routines or acquire images in a range of modalities are even more expensive and inaccessible. Developing low-cost… ▽ More

    Submitted 30 June, 2022; v1 submitted 10 December, 2021; originally announced December 2021.

  47. arXiv:2112.05631  [pdf, other]

    physics.ins-det physics.bio-ph

    autohaem: 3D printed devices for automated preparation of blood smears

    Authors: Samuel McDermott, Jaehyeon Kim, Aikaterini Anna Leledaki, Duncan Parry, Louis Lee, Alexandre Kabla, Catherine Mkindi, Richard Bowman, Pietro Cicuta

    Abstract: The process of making blood smears is common in both research and clinical settings, for investigating the health of blood cells and the presence of blood-borne parasites. It is very often carried out manually. We focus here on smears for malaria diagnosis and research which are frequently analyzed by optical microscopy and require a high quality. Automating the smear preparation promises to incre… ▽ More

    Submitted 18 January, 2022; v1 submitted 10 December, 2021; originally announced December 2021.

  48. arXiv:2112.03078  [pdf, other]

    cond-mat.mtrl-sci

    A study of singlet fission-halide perovskite interfaces

    Authors: Alan R. Bowman, Samuel D. Stranks, Bartomeu Monserrat

    Abstract: A method for improving the efficiency of solar cells is combining a low-bandgap semiconductor with a singlet fission material (which converts one high energy singlet into two low energy triplets following photoexcitation). Here we present a study of the interface between singlet fission molecules and low-bandgap halide perovskites. We briefly show a range of experiments screening for triplet trans… ▽ More

    Submitted 6 December, 2021; originally announced December 2021.

  49. arXiv:2111.08181  [pdf, other]

    cs.CL

    Adversarially Constructed Evaluation Sets Are More Challenging, but May Not Be Fair

    Authors: Jason Phang, Angelica Chen, William Huang, Samuel R. Bowman

    Abstract: More capable language models increasingly saturate existing task benchmarks, in some cases outperforming humans. This has left little headroom with which to measure further progress. Adversarial dataset creation has been proposed as a strategy to construct more challenging datasets, and two common approaches are: (1) filtering out easy examples and (2) model-in-the-loop data collection. In this wo… ▽ More

    Submitted 15 November, 2021; originally announced November 2021.

  50. arXiv:2110.08355  [pdf, other]

    cs.CL

    Clean or Annotate: How to Spend a Limited Data Collection Budget

    Authors: Derek Chen, Zhou Yu, Samuel R. Bowman

    Abstract: Crowdsourcing platforms are often used to collect datasets for training machine learning models, despite higher levels of inaccurate labeling compared to expert labeling. There are two common strategies to manage the impact of such noise. The first involves aggregating redundant annotations, but comes at the expense of labeling substantially fewer examples. Secondly, prior works have also consider… ▽ More

    Submitted 12 June, 2022; v1 submitted 15 October, 2021; originally announced October 2021.

    Comments: 17 pages, 3 figures, 6 tables. Accepted to NAACL 2022 workshop