
Showing 1–15 of 15 results for author: Marks, S

  1. arXiv:2406.14546  [pdf, other]

    cs.CL cs.AI cs.LG

    Connecting the Dots: LLMs can Infer and Verbalize Latent Structure from Disparate Training Data

    Authors: Johannes Treutlein, Dami Choi, Jan Betley, Cem Anil, Samuel Marks, Roger Baker Grosse, Owain Evans

    Abstract: One way to address safety risks from large language models (LLMs) is to censor dangerous knowledge from their training data. While this removes the explicit information, implicit information can remain scattered across various training documents. Could an LLM infer the censored knowledge by piecing together these implicit hints? As a step towards answering this question, we study inductive out-of-…

    Submitted 20 June, 2024; originally announced June 2024.

  2. arXiv:2406.10162  [pdf, other]

    cs.AI cs.CL

    Sycophancy to Subterfuge: Investigating Reward-Tampering in Large Language Models

    Authors: Carson Denison, Monte MacDiarmid, Fazl Barez, David Duvenaud, Shauna Kravec, Samuel Marks, Nicholas Schiefer, Ryan Soklaski, Alex Tamkin, Jared Kaplan, Buck Shlegeris, Samuel R. Bowman, Ethan Perez, Evan Hubinger

    Abstract: In reinforcement learning, specification gaming occurs when AI systems learn undesired behaviors that are highly rewarded due to misspecified training goals. Specification gaming can range from simple behaviors like sycophancy to sophisticated and pernicious behaviors like reward-tampering, where a model directly modifies its own reward mechanism. However, these more pernicious behaviors may be to…

    Submitted 28 June, 2024; v1 submitted 14 June, 2024; originally announced June 2024.

    Comments: Make it easier to find samples from the model, and highlight that our operational definition of reward tampering has false positives where the model attempts to complete the task honestly but edits the reward. Add paragraph to conclusion to this effect, and add sentence to figure 1 to this effect

  3. arXiv:2403.19647  [pdf, other]

    cs.LG cs.AI cs.CL

    Sparse Feature Circuits: Discovering and Editing Interpretable Causal Graphs in Language Models

    Authors: Samuel Marks, Can Rager, Eric J. Michaud, Yonatan Belinkov, David Bau, Aaron Mueller

    Abstract: We introduce methods for discovering and applying sparse feature circuits. These are causally implicated subnetworks of human-interpretable features for explaining language model behaviors. Circuits identified in prior work consist of polysemantic and difficult-to-interpret units like attention heads or neurons, rendering them unsuitable for many downstream applications. In contrast, sparse featur…

    Submitted 31 March, 2024; v1 submitted 28 March, 2024; originally announced March 2024.

    Comments: Code and data at https://github.com/saprmarks/feature-circuits. Demonstration at https://feature-circuits.xyz

  4. arXiv:2403.03218  [pdf, other]

    cs.LG cs.AI cs.CL cs.CY

    The WMDP Benchmark: Measuring and Reducing Malicious Use With Unlearning

    Authors: Nathaniel Li, Alexander Pan, Anjali Gopal, Summer Yue, Daniel Berrios, Alice Gatti, Justin D. Li, Ann-Kathrin Dombrowski, Shashwat Goel, Long Phan, Gabriel Mukobi, Nathan Helm-Burger, Rassin Lababidi, Lennart Justen, Andrew B. Liu, Michael Chen, Isabelle Barrass, Oliver Zhang, Xiaoyuan Zhu, Rishub Tamirisa, Bhrugu Bharathi, Adam Khoja, Zhenqi Zhao, Ariel Herbert-Voss, Cort B. Breuer, et al. (32 additional authors not shown)

    Abstract: The White House Executive Order on Artificial Intelligence highlights the risks of large language models (LLMs) empowering malicious actors in developing biological, cyber, and chemical weapons. To measure these risks of malicious use, government institutions and major AI labs are developing evaluations for hazardous capabilities in LLMs. However, current evaluations are private, preventing furthe…

    Submitted 15 May, 2024; v1 submitted 5 March, 2024; originally announced March 2024.

    Comments: See the project page at https://wmdp.ai

  5. arXiv:2310.06824  [pdf, other]

    cs.AI

    The Geometry of Truth: Emergent Linear Structure in Large Language Model Representations of True/False Datasets

    Authors: Samuel Marks, Max Tegmark

    Abstract: Large Language Models (LLMs) have impressive capabilities, but are also prone to outputting falsehoods. Recent work has developed techniques for inferring whether an LLM is telling the truth by training probes on the LLM's internal activations. However, this line of work is controversial, with some authors pointing out failures of these probes to generalize in basic ways, among other conceptual iss…

    Submitted 8 December, 2023; v1 submitted 10 October, 2023; originally announced October 2023.

  6. arXiv:2307.15217  [pdf, other]

    cs.AI cs.CL cs.LG

    Open Problems and Fundamental Limitations of Reinforcement Learning from Human Feedback

    Authors: Stephen Casper, Xander Davies, Claudia Shi, Thomas Krendl Gilbert, Jérémy Scheurer, Javier Rando, Rachel Freedman, Tomasz Korbak, David Lindner, Pedro Freire, Tony Wang, Samuel Marks, Charbel-Raphaël Segerie, Micah Carroll, Andi Peng, Phillip Christoffersen, Mehul Damani, Stewart Slocum, Usman Anwar, Anand Siththaranjan, Max Nadeau, Eric J. Michaud, Jacob Pfau, Dmitrii Krasheninnikov, Xin Chen, et al. (7 additional authors not shown)

    Abstract: Reinforcement learning from human feedback (RLHF) is a technique for training AI systems to align with human goals. RLHF has emerged as the central method used to finetune state-of-the-art large language models (LLMs). Despite this popularity, there has been relatively little public work systematizing its flaws. In this paper, we (1) survey open problems and fundamental limitations of RLHF and rel…

    Submitted 11 September, 2023; v1 submitted 27 July, 2023; originally announced July 2023.

  7. arXiv:2304.03775  [pdf, other]

    stat.ML cs.LG q-bio.QM

    Biological Sequence Kernels with Guaranteed Flexibility

    Authors: Alan Nawzad Amin, Eli Nathan Weinstein, Debora Susan Marks

    Abstract: Applying machine learning to biological sequences - DNA, RNA and protein - has enormous potential to advance human health, environmental sustainability, and fundamental biological understanding. However, many existing machine learning methods are ineffective or unreliable in this problem domain. We study these challenges theoretically, through the lens of kernels. Methods based on kernels are ubiq…

    Submitted 6 April, 2023; originally announced April 2023.

  8. arXiv:2209.02126  [pdf, other]

    eess.IV cs.CV

    Domain Generalization for Prostate Segmentation in Transrectal Ultrasound Images: A Multi-center Study

    Authors: Sulaiman Vesal, Iani Gayo, Indrani Bhattacharya, Shyam Natarajan, Leonard S. Marks, Dean C Barratt, Richard E. Fan, Yipeng Hu, Geoffrey A. Sonn, Mirabela Rusu

    Abstract: Prostate biopsy and image-guided treatment procedures are often performed under the guidance of ultrasound fused with magnetic resonance images (MRI). Accurate image fusion relies on accurate segmentation of the prostate on ultrasound images. Yet, the reduced signal-to-noise ratio and artifacts (e.g., speckle and shadowing) in ultrasound images limit the performance of automated prostate segmentat…

    Submitted 5 September, 2022; originally announced September 2022.

    Comments: Accepted to the journal of Medical Image Analysis (MedIA)

  9. arXiv:2205.13760  [pdf, other]

    cs.LG

    Tranception: protein fitness prediction with autoregressive transformers and inference-time retrieval

    Authors: Pascal Notin, Mafalda Dias, Jonathan Frazer, Javier Marchena-Hurtado, Aidan Gomez, Debora S. Marks, Yarin Gal

    Abstract: The ability to accurately model the fitness landscape of protein sequences is critical to a wide range of applications, from quantifying the effects of human variants on disease likelihood, to predicting immune-escape mutations in viruses and designing novel biotherapeutic proteins. Deep generative models of protein sequences trained on multiple sequence alignments have been the most successful ap…

    Submitted 27 May, 2022; originally announced May 2022.

    Comments: ICML 2022

  10. arXiv:2001.08383  [pdf, other]

    eess.IV cs.CV cs.LG

    A Multi-site Study of a Breast Density Deep Learning Model for Full-field Digital Mammography Images and Synthetic Mammography Images

    Authors: Thomas P. Matthews, Sadanand Singh, Brent Mombourquette, Jason Su, Meet P. Shah, Stefano Pedemonte, Aaron Long, David Maffit, Jenny Gurney, Rodrigo Morales Hoil, Nikita Ghare, Douglas Smith, Stephen M. Moore, Susan C. Marks, Richard L. Wahl

    Abstract: Purpose: To develop a Breast Imaging Reporting and Data System (BI-RADS) breast density deep learning (DL) model in a multi-site setting for synthetic two-dimensional mammography (SM) images derived from digital breast tomosynthesis exams using full-field digital mammography (FFDM) images and limited SM data. Materials and Methods: A DL model was trained to predict BI-RADS breast density using F…

    Submitted 2 October, 2020; v1 submitted 23 January, 2020; originally announced January 2020.

    MSC Class: 68T45

    ACM Class: I.5.4; J.3; I.2.10; I.4.8

  11. Development of a wearable haptic game interface

    Authors: Jacques Foottit, Dave Brown, Stefan Marks, Andy M. Connor

    Abstract: This paper outlines the development and evaluation of a wearable haptic game interface. The device differs from many traditional haptic feedback implementations in that it combines vibrotactile feedback with gesture-based input, thus becoming a two-way conduit between the user and the virtual environment. The device is intended to challenge what is considered an "interface" and sets out to purposef…

    Submitted 28 April, 2016; originally announced April 2016.

    Comments: arXiv admin note: substantial text overlap with arXiv:1604.05479

    Journal ref: EAI Endorsed Transactions on Creative Technologies, 3(6), e5 (2016)

  12. An Intuitive Tangible Game Controller

    Authors: Jacques Foottit, Dave Brown, Stefan Marks, Andy M. Connor

    Abstract: This paper outlines the development of a sensory feedback device providing a tangible interface for controlling digital environments, in this example a flight simulator, where the intention for the device is that it is relatively low cost, versatile and intuitive. Gesture based input allows for a more immersive experience, so rather than making the user feel like they are controlling an aircraft t…

    Submitted 20 April, 2016; originally announced April 2016.

    Comments: in Proceedings of the 2014 Conference on Interactive Entertainment

  13. Towards the Holodeck: Fully Immersive Virtual Reality Visualisation of Scientific and Engineering Data

    Authors: Stefan Marks, Javier E. Estevez, Andy M. Connor

    Abstract: In this paper, we describe the development and operating principles of an immersive virtual reality (VR) visualisation environment that is designed around the use of consumer VR headsets in an existing wide area motion capture suite. We present two case studies in the application areas of visualisation of scientific and engineering data. Each of these case studies utilises a different render engine…

    Submitted 19 April, 2016; originally announced April 2016.

  14. A wearable haptic game controller

    Authors: Jacques Foottit, Dave Brown, Stefan Marks, Andy M. Connor

    Abstract: This paper outlines the development of a wearable game controller incorporating vibrotactile haptic feedback that provides a low-cost, versatile and intuitive interface for controlling digital games. The device differs from many traditional haptic feedback implementations in that it combines vibrotactile-based haptic feedback with gesture-based input, thus becoming a two-way conduit between the us…

    Submitted 19 April, 2016; originally announced April 2016.

    Journal ref: International Journal of Game Theory & Technology, 2(1), 1-19 (2016)

  15. arXiv:1110.5091  [pdf, other]

    q-bio.BM cs.CE physics.bio-ph physics.data-an

    3D Protein Structure Predicted from Sequence

    Authors: Debora S. Marks, Lucy J. Colwell, Robert Sheridan, Thomas A. Hopf, Andrea Pagnani, Riccardo Zecchina, Chris Sander

    Abstract: The evolutionary trajectory of a protein through sequence space is constrained by function and three-dimensional (3D) structure. Residues in spatial proximity tend to co-evolve, yet attempts to invert the evolutionary record to identify these constraints and use them to computationally fold proteins have so far been unsuccessful. Here, we show that co-variation of residue pairs, observed in a larg…

    Submitted 25 October, 2011; v1 submitted 23 October, 2011; originally announced October 2011.

    Comments: Debora S Marks and Lucy J Colwell are joint first authors. Supplement and Appendices at: http://cbio.mskcc.org/foldingproteins. Updated version 25-Oct-2011 with '3D' added to the title and corrections of details in the methods section to make it compatible with derivation of equations in the main text and in the supplement