
Showing 1–2 of 2 results for author: Thornley, E

  1. arXiv:2407.00805  [pdf, other]

    cs.AI

    Towards shutdownable agents via stochastic choice

    Authors: Elliott Thornley, Alexander Roman, Christos Ziakas, Leyton Ho, Louis Thomson

    Abstract: Some worry that advanced artificial agents may resist being shut down. The Incomplete Preferences Proposal (IPP) is an idea for ensuring that doesn't happen. A key part of the IPP is using a novel 'Discounted REward for Same-Length Trajectories (DREST)' reward function to train agents to (1) pursue goals effectively conditional on each trajectory-length (be 'USEFUL'), and (2) choose stochastically…

    Submitted 30 June, 2024; originally announced July 2024.

  2. arXiv:2403.04471  [pdf]

    cs.AI

    The Shutdown Problem: An AI Engineering Puzzle for Decision Theorists

    Authors: Elliott Thornley

    Abstract: I explain the shutdown problem: the problem of designing artificial agents that (1) shut down when a shutdown button is pressed, (2) don't try to prevent or cause the pressing of the shutdown button, and (3) otherwise pursue goals competently. I prove three theorems that make the difficulty precise. These theorems show that agents satisfying some innocuous-seeming conditions will often try to prev…

    Submitted 9 April, 2024; v1 submitted 7 March, 2024; originally announced March 2024.