Showing 1–1 of 1 results for author: Grietzer, P

  1. arXiv:2310.08043  [pdf, other]

    cs.AI

    Understanding and Controlling a Maze-Solving Policy Network

    Authors: Ulisse Mini, Peli Grietzer, Mrinank Sharma, Austin Meek, Monte MacDiarmid, Alexander Matt Turner

    Abstract: To understand the goals and goal representations of AI systems, we carefully study a pretrained reinforcement learning policy that solves mazes by navigating to a range of target squares. We find this network pursues multiple context-dependent goals, and we further identify circuits within the network that correspond to one of these goals. In particular, we identified eleven channels that track th…

    Submitted 12 October, 2023; originally announced October 2023.

    Comments: 46 pages