Showing 1–3 of 3 results for author: van Merwijk, C

  1. arXiv:2403.05540

    cs.CY cs.LG

    Extinction Risks from AI: Invisible to Science?

    Authors: Vojtech Kovarik, Christian van Merwijk, Ida Mattsson

    Abstract: In an effort to inform the discussion surrounding existential risks from AI, we formulate Extinction-level Goodhart's Law as "Virtually any goal specification, pursued to the extreme, will result in the extinction of humanity", and we aim to understand which formal models are suitable for investigating this hypothesis. Note that we remain agnostic as to whether Extinction-level Goodhart's Law hold…

    Submitted 2 February, 2024; originally announced March 2024.

  2. arXiv:2202.11629

    cs.AI stat.ML

    A Complete Criterion for Value of Information in Soluble Influence Diagrams

    Authors: Chris van Merwijk, Ryan Carey, Tom Everitt

    Abstract: Influence diagrams have recently been used to analyse the safety and fairness properties of AI systems. A key building block for this analysis is a graphical criterion for value of information (VoI). This paper establishes the first complete graphical criterion for VoI in influence diagrams with multiple decisions. Along the way, we establish two important techniques for proving properties of mult…

    Submitted 23 February, 2022; originally announced February 2022.

    Comments: In Proceedings of the AAAI 2022 Conference

  3. arXiv:1906.01820

    cs.AI

    Risks from Learned Optimization in Advanced Machine Learning Systems

    Authors: Evan Hubinger, Chris van Merwijk, Vladimir Mikulik, Joar Skalse, Scott Garrabrant

    Abstract: We analyze the type of learned optimization that occurs when a learned model (such as a neural network) is itself an optimizer - a situation we refer to as mesa-optimization, a neologism we introduce in this paper. We believe that the possibility of mesa-optimization raises two important questions for the safety and transparency of advanced machine learning systems. First, under what circumstances…

    Submitted 1 December, 2021; v1 submitted 5 June, 2019; originally announced June 2019.