Showing 1–13 of 13 results for author: Jenner, E

  1. arXiv:2406.00877  [pdf, other]

    cs.LG cs.AI

    Evidence of Learned Look-Ahead in a Chess-Playing Neural Network

    Authors: Erik Jenner, Shreyas Kapur, Vasil Georgiev, Cameron Allen, Scott Emmons, Stuart Russell

    Abstract: Do neural networks learn to implement algorithms such as look-ahead or search "in the wild"? Or do they rely purely on collections of simple heuristics? We present evidence of learned look-ahead in the policy network of Leela Chess Zero, the currently strongest neural chess engine. We find that Leela internally represents future optimal moves and that these representations are crucial for its fina…

    Submitted 2 June, 2024; originally announced June 2024.

    Comments: Project page: https://leela-interp.github.io/

  2. arXiv:2405.20519  [pdf, other]

    cs.AI

    Diffusion On Syntax Trees For Program Synthesis

    Authors: Shreyas Kapur, Erik Jenner, Stuart Russell

    Abstract: Large language models generate code one token at a time. Their autoregressive generation process lacks the feedback of observing the program's output. Training LLMs to suggest edits directly can be challenging due to the scarcity of rich edit data. To address these problems, we propose neural diffusion models that operate on syntax trees of any context-free grammar. Similar to image diffusion mode…

    Submitted 30 May, 2024; originally announced May 2024.

    Comments: https://tree-diffusion.github.io

  3. arXiv:2404.09932  [pdf, other]

    cs.LG cs.AI cs.CL cs.CY

    Foundational Challenges in Assuring Alignment and Safety of Large Language Models

    Authors: Usman Anwar, Abulhair Saparov, Javier Rando, Daniel Paleka, Miles Turpin, Peter Hase, Ekdeep Singh Lubana, Erik Jenner, Stephen Casper, Oliver Sourbut, Benjamin L. Edelman, Zhaowei Zhang, Mario Günther, Anton Korinek, Jose Hernandez-Orallo, Lewis Hammond, Eric Bigelow, Alexander Pan, Lauro Langosco, Tomasz Korbak, Heidi Zhang, Ruiqi Zhong, Seán Ó hÉigeartaigh, Gabriel Recchia, Giulio Corsi, et al. (13 additional authors not shown)

    Abstract: This work identifies 18 foundational challenges in assuring the alignment and safety of large language models (LLMs). These challenges are organized into three different categories: scientific understanding of LLMs, development and deployment methods, and sociotechnical challenges. Based on the identified challenges, we pose $200+$ concrete research questions.

    Submitted 15 April, 2024; originally announced April 2024.

  4. arXiv:2402.17747  [pdf, other]

    cs.LG cs.AI stat.ML

    When Your AIs Deceive You: Challenges of Partial Observability in Reinforcement Learning from Human Feedback

    Authors: Leon Lang, Davis Foote, Stuart Russell, Anca Dragan, Erik Jenner, Scott Emmons

    Abstract: Past analyses of reinforcement learning from human feedback (RLHF) assume that the human evaluators fully observe the environment. What happens when human feedback is based only on partial observations? We formally define two failure cases: deceptive inflation and overjustification. Modeling the human as Boltzmann-rational w.r.t. a belief over trajectories, we prove conditions under which RLHF is…

    Submitted 8 June, 2024; v1 submitted 27 February, 2024; originally announced February 2024.

  5. arXiv:2309.15257  [pdf, other]

    cs.LG cs.AI

    STARC: A General Framework For Quantifying Differences Between Reward Functions

    Authors: Joar Skalse, Lucy Farnik, Sumeet Ramesh Motwani, Erik Jenner, Adam Gleave, Alessandro Abate

    Abstract: In order to solve a task using reinforcement learning, it is necessary to first formalise the goal of that task as a reward function. However, for many real-world tasks, it is very difficult to manually specify a reward function that never incentivises undesirable behaviour. As a result, it is increasingly popular to use \emph{reward learning algorithms}, which attempt to \emph{learn} a reward fun…

    Submitted 11 March, 2024; v1 submitted 26 September, 2023; originally announced September 2023.

  6. arXiv:2211.11972  [pdf, other]

    cs.LG cs.AI

    imitation: Clean Imitation Learning Implementations

    Authors: Adam Gleave, Mohammad Taufeeque, Juan Rocamonde, Erik Jenner, Steven H. Wang, Sam Toyer, Maximilian Ernestus, Nora Belrose, Scott Emmons, Stuart Russell

    Abstract: imitation provides open-source implementations of imitation and reward learning algorithms in PyTorch. We include three inverse reinforcement learning (IRL) algorithms, three imitation learning algorithms and a preference comparison algorithm. The implementations have been benchmarked against previous results, and automated tests cover 98% of the code. Moreover, the algorithms are implemented in a…

    Submitted 21 November, 2022; originally announced November 2022.

  7. arXiv:2208.09570  [pdf, ps, other]

    cs.LG

    Calculus on MDPs: Potential Shaping as a Gradient

    Authors: Erik Jenner, Herke van Hoof, Adam Gleave

    Abstract: In reinforcement learning, different reward functions can be equivalent in terms of the optimal policies they induce. A particularly well-known and important example is potential shaping, a class of functions that can be added to any reward function without changing the optimal policy set under arbitrary transition dynamics. Potential shaping is conceptually similar to potentials, conservative vec…

    Submitted 2 December, 2022; v1 submitted 19 August, 2022; originally announced August 2022.

    Comments: Fixed mistake in proof that affected several results

  8. arXiv:2203.13553  [pdf, other]

    cs.LG

    Preprocessing Reward Functions for Interpretability

    Authors: Erik Jenner, Adam Gleave

    Abstract: In many real-world applications, the reward function is too complex to be manually specified. In such cases, reward functions must instead be learned from human feedback. Since the learned reward may fail to represent user preferences, it is important to be able to validate the learned reward function prior to deployment. One promising approach is to apply interpretability tools to the reward func…

    Submitted 25 March, 2022; originally announced March 2022.

    Comments: Presented at the NeurIPS 2021 Cooperative AI workshop. Code available at https://github.com/HumanCompatibleAI/reward-preprocessing

  9. arXiv:2110.02750  [pdf, other]

    cs.DS cs.CV cs.LG stat.ML

    Extensions of Karger's Algorithm: Why They Fail in Theory and How They Are Useful in Practice

    Authors: Erik Jenner, Enrique Fita Sanmartín, Fred A. Hamprecht

    Abstract: The minimum graph cut and minimum $s$-$t$-cut problems are important primitives in the modeling of combinatorial problems in computer science, including in computer vision and machine learning. Some of the most efficient algorithms for finding global minimum cuts are randomized algorithms based on Karger's groundbreaking contraction algorithm. Here, we study whether Karger's algorithm can be succe…

    Submitted 16 December, 2021; v1 submitted 5 October, 2021; originally announced October 2021.

    Comments: Oral at ICCV 2021; added acknowledgements

  10. arXiv:2106.10163  [pdf, other]

    cs.LG cs.CV

    Steerable Partial Differential Operators for Equivariant Neural Networks

    Authors: Erik Jenner, Maurice Weiler

    Abstract: Recent work in equivariant deep learning bears strong similarities to physics. Fields over a base space are fundamental entities in both subjects, as are equivariant maps between these fields. In deep learning, however, these maps are usually defined by convolutions with a kernel, whereas they are partial differential operators (PDOs) in physics. Developing the theory of equivariant PDOs in the co…

    Submitted 23 April, 2022; v1 submitted 18 June, 2021; originally announced June 2021.

    Comments: Published at ICLR 2022, code available at https://github.com/ejnnr/steerable_pdos

  11. arXiv:1506.07344  [pdf, ps, other]

    cond-mat.soft

    Stability of Superhydrophobic Ring & Axle Liquid Bearings

    Authors: Elliot Jenner, Brian D'Urso

    Abstract: Friction between contacting solid surfaces is a dominant force on the micro-scale and a major consideration in the design of MEMS. Non-contact fluid bearings have been investigated as a way to mitigate this issue. Here we discuss a new design for surface tension-supported thrust bearings utilizing patterned superhydrophobic surfaces to achieve improved drag reduction. We examine sources of instabi…

    Submitted 24 June, 2015; originally announced June 2015.

    Comments: 6 Pages

  12. Absolute Measurement Of Laminar Shear Rate Using Photon Correlation Spectroscopy

    Authors: Elliot Jenner, Brian D'Urso

    Abstract: An absolute measurement of the components of the shear rate tensor $\mathcal{S}$ in a fluid can be found by measuring the photon correlation function of light scattered from particles in the fluid. Previous methods of measuring $\mathcal{S}$ involve reading the velocity at various points and extrapolating the shear, which can be time consuming and is limited in its ability to examine small spatial…

    Submitted 11 May, 2015; v1 submitted 8 May, 2015; originally announced May 2015.

    Comments: 9 page main article + 6 page appendices containing detailed theoretical derivations (previous submission failed to provide access to separate supplemental material files, so these are inserted into the main article as appendices. Figure 2a title and axis labels changed to improve readability.)

  13. arXiv:1406.0787  [pdf, ps, other]

    physics.flu-dyn

    Large Drag Reduction over Superhydrophobic Riblets

    Authors: Charlotte Barbier, Elliot Jenner, Brian D'Urso

    Abstract: Riblets and superhydrophobic surfaces are two demonstrated passive drag reduction techniques. We describe a method to fabricate surfaces that combine both of these techniques in order to increase drag reduction properties. Samples have been tested with a cone-and-plate rheometer system, and have demonstrated significant drag reduction even in the transitional-turbulent regime. Direct Numerical Sim…

    Submitted 2 June, 2014; originally announced June 2014.