
Showing 1–12 of 12 results for author: Stuhlmüller, A

  1. arXiv:2310.10627 [pdf, other]

    cs.CL cs.AI

    Factored Verification: Detecting and Reducing Hallucination in Summaries of Academic Papers

    Authors: Charlie George, Andreas Stuhlmüller

    Abstract: Hallucination plagues even frontier LLMs--but how bad is it really for summarizing academic papers? We evaluate Factored Verification, a simple automated method for detecting hallucinations in abstractive summaries. This method sets a new SotA on hallucination detection in the summarization task of the HaluEval benchmark, achieving 76.2% accuracy. We then use this method to estimate how often lang…

    Submitted 16 October, 2023; originally announced October 2023.

    Comments: Second Workshop on Information Extraction from Scientific Publications (WIESP) at IJCNLP-AACL 2023

  2. arXiv:2301.01751 [pdf, other]

    cs.CL cs.AI cs.HC

    Iterated Decomposition: Improving Science Q&A by Supervising Reasoning Processes

    Authors: Justin Reppert, Ben Rachbach, Charlie George, Luke Stebbing, Jungwon Byun, Maggie Appleton, Andreas Stuhlmüller

    Abstract: Language models (LMs) can perform complex reasoning either end-to-end, with hidden latent state, or compositionally, with transparent intermediate state. Composition offers benefits for interpretability and safety, but may need workflow support and infrastructure to remain competitive. We describe iterated decomposition, a human-in-the-loop workflow for developing and refining compositional LM pro…

    Submitted 4 January, 2023; v1 submitted 4 January, 2023; originally announced January 2023.

  3. arXiv:2206.04615 [pdf, other]

    cs.CL cs.AI cs.CY cs.LG stat.ML

    Beyond the Imitation Game: Quantifying and extrapolating the capabilities of language models

    Authors: Aarohi Srivastava, Abhinav Rastogi, Abhishek Rao, Abu Awal Md Shoeb, Abubakar Abid, Adam Fisch, Adam R. Brown, Adam Santoro, Aditya Gupta, Adrià Garriga-Alonso, Agnieszka Kluska, Aitor Lewkowycz, Akshat Agarwal, Alethea Power, Alex Ray, Alex Warstadt, Alexander W. Kocurek, Ali Safaya, Ali Tazarv, Alice Xiang, Alicia Parrish, Allen Nie, Aman Hussain, Amanda Askell, Amanda Dsouza, et al. (426 additional authors not shown)

    Abstract: Language models demonstrate both quantitative improvement and new qualitative capabilities with increasing scale. Despite their potentially transformative impact, these new capabilities are as yet poorly characterized. In order to inform future research, prepare for disruptive new model capabilities, and ameliorate socially harmful effects, it is vital that we understand the present and near-futur…

    Submitted 12 June, 2023; v1 submitted 9 June, 2022; originally announced June 2022.

    Comments: 27 pages, 17 figures + references and appendices, repo: https://github.com/google/BIG-bench

    Journal ref: Transactions on Machine Learning Research, May/2022, https://openreview.net/forum?id=uyTL5Bvosj

  4. arXiv:2109.14076 [pdf, other]

    cs.CL cs.AI cs.LG

    RAFT: A Real-World Few-Shot Text Classification Benchmark

    Authors: Neel Alex, Eli Lifland, Lewis Tunstall, Abhishek Thakur, Pegah Maham, C. Jess Riedel, Emmie Hine, Carolyn Ashurst, Paul Sedille, Alexis Carlier, Michael Noetel, Andreas Stuhlmüller

    Abstract: Large pre-trained language models have shown promise for few-shot learning, completing text-based tasks given only a few task-specific examples. Will models soon solve classification tasks that have so far been reserved for human research assistants? Existing benchmarks are not designed to measure progress in applied settings, and so don't directly answer this question. The RAFT benchmark (Real-wo…

    Submitted 18 January, 2022; v1 submitted 28 September, 2021; originally announced September 2021.

    Comments: Dataset, submission instructions, code and leaderboard available at https://raft.elicit.org

  5. arXiv:1802.04302 [pdf, other]

    cs.CL stat.ML

    Evaluating Compositionality in Sentence Embeddings

    Authors: Ishita Dasgupta, Demi Guo, Andreas Stuhlmüller, Samuel J. Gershman, Noah D. Goodman

    Abstract: An important challenge for human-like AI is compositional semantics. Recent research has attempted to address this by using deep neural networks to learn vector space embeddings of sentences, which then serve as input to other tasks. We present a new dataset for one such task, 'natural language inference' (NLI), that cannot be solved using only word-level knowledge and requires some compositionali…

    Submitted 17 May, 2018; v1 submitted 12 February, 2018; originally announced February 2018.

  6. arXiv:1707.05173 [pdf, other]

    cs.AI cs.LG cs.NE

    Trial without Error: Towards Safe Reinforcement Learning via Human Intervention

    Authors: William Saunders, Girish Sastry, Andreas Stuhlmueller, Owain Evans

    Abstract: AI systems are increasingly applied to complex tasks that involve interaction with humans. During training, such systems are potentially dangerous, as they haven't yet learned to avoid actions that could cause serious harm. How can an AI system explore and learn without making a single mistake that harms humans or otherwise causes serious damage? For model-free reinforcement learning, having a hum…

    Submitted 17 July, 2017; originally announced July 2017.

  7. arXiv:1701.04079 [pdf, other]

    cs.LG cs.AI

    Agent-Agnostic Human-in-the-Loop Reinforcement Learning

    Authors: David Abel, John Salvatier, Andreas Stuhlmüller, Owain Evans

    Abstract: Providing Reinforcement Learning agents with expert advice can dramatically improve various aspects of learning. Prior work has developed teaching protocols that enable agents to learn efficiently in complex environments; many of these methods tailor the teacher's guidance to agents with a particular representation or underlying learning scheme, offering effective but specialized teaching procedur…

    Submitted 15 January, 2017; originally announced January 2017.

    Comments: Presented at the NIPS Workshop on the Future of Interactive Learning Machines, 2016

  8. arXiv:1512.05832 [pdf, other]

    cs.AI

    Learning the Preferences of Ignorant, Inconsistent Agents

    Authors: Owain Evans, Andreas Stuhlmueller, Noah D. Goodman

    Abstract: An important use of machine learning is to learn what people value. What posts or photos should a user be shown? Which jobs or activities would a person find rewarding? In each case, observations of people's past choices can inform our inferences about their likes and preferences. If we assume that choices are approximately optimal according to some utility function, we can treat preference infere…

    Submitted 17 December, 2015; originally announced December 2015.

    Comments: AAAI 2016

  9. arXiv:1509.02962 [pdf, other]

    cs.AI stat.ML

    Coarse-to-Fine Sequential Monte Carlo for Probabilistic Programs

    Authors: Andreas Stuhlmüller, Robert X. D. Hawkins, N. Siddharth, Noah D. Goodman

    Abstract: Many practical techniques for probabilistic inference require a sequence of distributions that interpolate between a tractable distribution and an intractable distribution of interest. Usually, the sequences used are simple, e.g., based on geometric averages between distributions. When models are expressed as probabilistic programs, the models themselves are highly structured objects that can be u…

    Submitted 9 September, 2015; originally announced September 2015.

  10. arXiv:1509.02151 [pdf, other]

    cs.AI cs.PL

    C3: Lightweight Incrementalized MCMC for Probabilistic Programs using Continuations and Callsite Caching

    Authors: Daniel Ritchie, Andreas Stuhlmüller, Noah D. Goodman

    Abstract: Lightweight, source-to-source transformation approaches to implementing MCMC for probabilistic programming languages are popular for their simplicity, support of existing deterministic code, and ability to execute on existing fast runtimes. However, they are also slow, requiring a complete re-execution of the program on every Metropolis Hastings proposal. We present a new extension to the lightwei…

    Submitted 8 September, 2015; v1 submitted 7 September, 2015; originally announced September 2015.

    Comments: Fix typo in author name

  11. arXiv:1206.3555 [pdf, other]

    cs.AI cs.DS

    A Dynamic Programming Algorithm for Inference in Recursive Probabilistic Programs

    Authors: Andreas Stuhlmüller, Noah D. Goodman

    Abstract: We describe a dynamic programming algorithm for computing the marginal distribution of discrete probabilistic programs. This algorithm takes a functional interpreter for an arbitrary probabilistic programming language and turns it into an efficient marginalizer. Because direct caching of sub-distributions is impossible in the presence of recursion, we build a graph of dependencies between sub-dist…

    Submitted 10 September, 2012; v1 submitted 15 June, 2012; originally announced June 2012.

    Comments: Second Statistical Relational AI workshop at UAI 2012 (StaRAI-12)

  12. arXiv:1110.5667 [pdf, other]

    cs.AI cs.LG

    Inducing Probabilistic Programs by Bayesian Program Merging

    Authors: Irvin Hwang, Andreas Stuhlmüller, Noah D. Goodman

    Abstract: This report outlines an approach to learning generative models from data. We express models as probabilistic programs, which allows us to capture abstract patterns within the examples. By choosing our language for programs to be an extension of the algebraic data type of the examples, we can begin with a program that generates all and only the examples. We then introduce greater abstraction, and h…

    Submitted 25 October, 2011; originally announced October 2011.