Showing 1–12 of 12 results for author: Burns, C

  1. arXiv:2312.09390  [pdf, other]

    cs.CL

    Weak-to-Strong Generalization: Eliciting Strong Capabilities With Weak Supervision

    Authors: Collin Burns, Pavel Izmailov, Jan Hendrik Kirchner, Bowen Baker, Leo Gao, Leopold Aschenbrenner, Yining Chen, Adrien Ecoffet, Manas Joglekar, Jan Leike, Ilya Sutskever, Jeff Wu

    Abstract: Widely used alignment techniques, such as reinforcement learning from human feedback (RLHF), rely on the ability of humans to supervise model behavior - for example, to evaluate whether a model faithfully followed instructions or generated safe outputs. However, future superhuman models will behave in complex ways too difficult for humans to reliably evaluate; humans will only be able to weakly su…

    Submitted 14 December, 2023; originally announced December 2023.

  2. arXiv:2301.00832  [pdf]

    cs.DL

    The Issues with Journal Issues: Let Journals Be Digital Libraries

    Authors: C. Sean Burns

    Abstract: Science depends on a communication system, and today that is largely provided by digital technologies such as the internet and web. Despite that digital technologies provide the infrastructure for that communication system, peer-reviewed journals continue to mimic workflows and processes from the print era. This paper focuses on one artifact from the print era, the journal issue, and describes how…

    Submitted 2 January, 2023; originally announced January 2023.

    Comments: 17 pages, submitted to Publications

  3. arXiv:2212.03827  [pdf, other]

    cs.CL cs.AI cs.LG

    Discovering Latent Knowledge in Language Models Without Supervision

    Authors: Collin Burns, Haotian Ye, Dan Klein, Jacob Steinhardt

    Abstract: Existing techniques for training language models can be misaligned with the truth: if we train models with imitation learning, they may reproduce errors that humans make; if we train them to generate text that humans rate highly, they may output errors that human evaluators can't detect. We propose circumventing this issue by directly finding latent knowledge inside the internal activations of a l…

    Submitted 2 March, 2024; v1 submitted 7 December, 2022; originally announced December 2022.

    Comments: ICLR 2023

  4. arXiv:2105.09938  [pdf, other]

    cs.SE cs.CL cs.LG

    Measuring Coding Challenge Competence With APPS

    Authors: Dan Hendrycks, Steven Basart, Saurav Kadavath, Mantas Mazeika, Akul Arora, Ethan Guo, Collin Burns, Samir Puranik, Horace He, Dawn Song, Jacob Steinhardt

    Abstract: While programming is one of the most broadly applicable skills in modern society, modern machine learning models still cannot code solutions to basic problems. Despite its importance, there has been surprisingly little work on evaluating code generation, and it can be difficult to accurately assess code generation performance rigorously. To meet this challenge, we introduce APPS, a benchmark for c…

    Submitted 8 November, 2021; v1 submitted 20 May, 2021; originally announced May 2021.

    Comments: NeurIPS 2021. Code and the APPS dataset are available at https://github.com/hendrycks/apps

  5. arXiv:2103.06268  [pdf, other]

    cs.CL cs.LG

    CUAD: An Expert-Annotated NLP Dataset for Legal Contract Review

    Authors: Dan Hendrycks, Collin Burns, Anya Chen, Spencer Ball

    Abstract: Many specialized domains remain untouched by deep learning, as large labeled datasets require expensive expert annotators. We address this bottleneck within the legal domain by introducing the Contract Understanding Atticus Dataset (CUAD), a new dataset for legal contract review. CUAD was created with dozens of legal experts from The Atticus Project and consists of over 13,000 annotations. The tas…

    Submitted 8 November, 2021; v1 submitted 10 March, 2021; originally announced March 2021.

    Comments: NeurIPS 2021. Code and the CUAD dataset are available at https://github.com/TheAtticusProject/cuad/

  6. arXiv:2103.05898  [pdf, other]

    cs.CV cs.AI cs.LG

    Limitations of Post-Hoc Feature Alignment for Robustness

    Authors: Collin Burns, Jacob Steinhardt

    Abstract: Feature alignment is an approach to improving robustness to distribution shift that matches the distribution of feature activations between the training distribution and test distribution. A particularly simple but effective approach to feature alignment involves aligning the batch normalization statistics between the two distributions in a trained neural network. This technique has received renew…

    Submitted 10 March, 2021; originally announced March 2021.

    Comments: Accepted to CVPR 2021

  7. arXiv:2103.03874  [pdf, other]

    cs.LG cs.AI cs.CL

    Measuring Mathematical Problem Solving With the MATH Dataset

    Authors: Dan Hendrycks, Collin Burns, Saurav Kadavath, Akul Arora, Steven Basart, Eric Tang, Dawn Song, Jacob Steinhardt

    Abstract: Many intellectual endeavors require mathematical problem solving, but this skill remains beyond the capabilities of computers. To measure this ability in machine learning models, we introduce MATH, a new dataset of 12,500 challenging competition mathematics problems. Each problem in MATH has a full step-by-step solution which can be used to teach models to generate answer derivations and explanati…

    Submitted 8 November, 2021; v1 submitted 5 March, 2021; originally announced March 2021.

    Comments: NeurIPS 2021. Code and the MATH dataset are available at https://github.com/hendrycks/math/

  8. arXiv:2009.03300  [pdf, other]

    cs.CY cs.AI cs.CL cs.LG

    Measuring Massive Multitask Language Understanding

    Authors: Dan Hendrycks, Collin Burns, Steven Basart, Andy Zou, Mantas Mazeika, Dawn Song, Jacob Steinhardt

    Abstract: We propose a new test to measure a text model's multitask accuracy. The test covers 57 tasks including elementary mathematics, US history, computer science, law, and more. To attain high accuracy on this test, models must possess extensive world knowledge and problem solving ability. We find that while most recent models have near random-chance accuracy, the very largest GPT-3 model improves over…

    Submitted 12 January, 2021; v1 submitted 7 September, 2020; originally announced September 2020.

    Comments: ICLR 2021; the test and code are available at https://github.com/hendrycks/test

  9. arXiv:2008.02275  [pdf, other]

    cs.CY cs.AI cs.CL cs.LG

    Aligning AI With Shared Human Values

    Authors: Dan Hendrycks, Collin Burns, Steven Basart, Andrew Critch, Jerry Li, Dawn Song, Jacob Steinhardt

    Abstract: We show how to assess a language model's knowledge of basic concepts of morality. We introduce the ETHICS dataset, a new benchmark that spans concepts in justice, well-being, duties, virtues, and commonsense morality. Models predict widespread moral judgments about diverse text scenarios. This requires connecting physical and social world knowledge to value judgements, a capability that may enable…

    Submitted 17 February, 2023; v1 submitted 5 August, 2020; originally announced August 2020.

    Comments: ICLR 2021; the ETHICS dataset is available at https://github.com/hendrycks/ethics/

  10. arXiv:2007.03633  [pdf, ps, other]

    cs.DS cs.CC cs.CG cs.LG

    Streaming Complexity of SVMs

    Authors: Alexandr Andoni, Collin Burns, Yi Li, Sepideh Mahabadi, David P. Woodruff

    Abstract: We study the space complexity of solving the bias-regularized SVM problem in the streaming model. This is a classic supervised learning problem that has drawn lots of attention, including for developing fast algorithms for solving the problem approximately. One of the most widely used algorithms for approximately optimizing the SVM objective is Stochastic Gradient Descent (SGD), which requires onl…

    Submitted 7 July, 2020; originally announced July 2020.

    Comments: APPROX 2020

  11. Interpreting Black Box Models via Hypothesis Testing

    Authors: Collin Burns, Jesse Thomason, Wesley Tansey

    Abstract: In science and medicine, model interpretations may be reported as discoveries of natural phenomena or used to guide patient treatments. In such high-stakes tasks, false discoveries may lead investigators astray. These applications would therefore benefit from control over the finite-sample error rate of interpretations. We reframe black box model interpretability as a multiple hypothesis testing p…

    Submitted 17 August, 2020; v1 submitted 29 March, 2019; originally announced April 2019.

    Comments: FODS 2020

  12. arXiv:1702.05398  [pdf, other]

    cs.CL

    Experiment Segmentation in Scientific Discourse as Clause-level Structured Prediction using Recurrent Neural Networks

    Authors: Pradeep Dasigi, Gully A. P. C. Burns, Eduard Hovy, Anita de Waard

    Abstract: We propose a deep learning model for identifying structure within experiment narratives in scientific literature. We take a sequence labeling approach to this problem, and label clauses within experiment narratives to identify the different parts of the experiment. Our dataset consists of paragraphs taken from open access PubMed papers labeled with rhetorical information as a result of our pilot a…

    Submitted 17 February, 2017; originally announced February 2017.