
Showing 1–18 of 18 results for author: Kenton, Z

  1. arXiv:2404.16244 [pdf, other]

    cs.CY

    The Ethics of Advanced AI Assistants

    Authors: Iason Gabriel, Arianna Manzini, Geoff Keeling, Lisa Anne Hendricks, Verena Rieser, Hasan Iqbal, Nenad Tomašev, Ira Ktena, Zachary Kenton, Mikel Rodriguez, Seliem El-Sayed, Sasha Brown, Canfer Akbulut, Andrew Trask, Edward Hughes, A. Stevie Bergman, Renee Shelby, Nahema Marchal, Conor Griffin, Juan Mateos-Garcia, Laura Weidinger, Winnie Street, Benjamin Lange, Alex Ingerman, Alison Lentz, et al. (32 additional authors not shown)

    Abstract: This paper focuses on the opportunities and the ethical and societal risks posed by advanced AI assistants. We define advanced AI assistants as artificial agents with natural language interfaces, whose function is to plan and execute sequences of actions on behalf of a user, across one or more domains, in line with the user's expectations. The paper starts by considering the technology itself, pro…

    Submitted 28 April, 2024; v1 submitted 24 April, 2024; originally announced April 2024.

  2. arXiv:2404.15058 [pdf, other]

    cs.CY cs.AI

    A Mechanism-Based Approach to Mitigating Harms from Persuasive Generative AI

    Authors: Seliem El-Sayed, Canfer Akbulut, Amanda McCroskery, Geoff Keeling, Zachary Kenton, Zaria Jalan, Nahema Marchal, Arianna Manzini, Toby Shevlane, Shannon Vallor, Daniel Susser, Matija Franklin, Sophie Bridgers, Harry Law, Matthew Rahtz, Murray Shanahan, Michael Henry Tessler, Arthur Douillard, Tom Everitt, Sasha Brown

    Abstract: Recent generative AI systems have demonstrated more advanced persuasive capabilities and are increasingly permeating areas of life where they can influence decision-making. Generative AI presents a new risk profile of persuasion due to the opportunity for reciprocal exchange and prolonged interactions. This has led to growing concerns about harms from AI persuasion and how they can be mitigated, high…

    Submitted 23 April, 2024; originally announced April 2024.

  3. arXiv:2312.10029 [pdf, other]

    cs.LG cs.AI

    Challenges with unsupervised LLM knowledge discovery

    Authors: Sebastian Farquhar, Vikrant Varma, Zachary Kenton, Johannes Gasteiger, Vladimir Mikulik, Rohin Shah

    Abstract: We show that existing unsupervised methods on large language model (LLM) activations do not discover knowledge -- instead they seem to discover whatever feature of the activations is most prominent. The idea behind unsupervised knowledge elicitation is that knowledge satisfies a consistency structure, which can be used to discover knowledge. We first prove theoretically that arbitrary features (no…

    Submitted 18 December, 2023; v1 submitted 15 December, 2023; originally announced December 2023.

    Comments: 12 pages (38 including references and appendices). First three authors contributed equally; author order randomised

  4. arXiv:2309.02390 [pdf, other]

    cs.LG

    Explaining grokking through circuit efficiency

    Authors: Vikrant Varma, Rohin Shah, Zachary Kenton, János Kramár, Ramana Kumar

    Abstract: One of the most surprising puzzles in neural network generalisation is grokking: a network with perfect training accuracy but poor generalisation will, upon further training, transition to perfect generalisation. We propose that grokking occurs when the task admits a generalising solution and a memorising solution, where the generalising solution is slower to learn but more efficient, producing la…

    Submitted 5 September, 2023; originally announced September 2023.

  5. arXiv:2210.01790 [pdf, other]

    cs.LG

    Goal Misgeneralization: Why Correct Specifications Aren't Enough For Correct Goals

    Authors: Rohin Shah, Vikrant Varma, Ramana Kumar, Mary Phuong, Victoria Krakovna, Jonathan Uesato, Zac Kenton

    Abstract: The field of AI alignment is concerned with AI systems that pursue unintended goals. One commonly studied mechanism by which an unintended goal might arise is specification gaming, in which the designer-provided specification is flawed in a way that the designers did not foresee. However, an AI system may pursue an undesired goal even when the specification is correct, in the case of goal misgener…

    Submitted 2 November, 2022; v1 submitted 4 October, 2022; originally announced October 2022.

  6. arXiv:2208.08345 [pdf, other]

    cs.AI cs.LG

    Discovering Agents

    Authors: Zachary Kenton, Ramana Kumar, Sebastian Farquhar, Jonathan Richens, Matt MacDermott, Tom Everitt

    Abstract: Causal models of agents have been used to analyse the safety aspects of machine learning systems. But identifying agents is non-trivial -- often the causal model is just assumed by the modeler without much justification -- and modelling failures can lead to mistakes in the safety analysis. This paper proposes the first formal causal definition of agents -- roughly that agents are systems that woul…

    Submitted 24 August, 2022; v1 submitted 17 August, 2022; originally announced August 2022.

    Comments: Some typos corrected

  7. arXiv:2201.08102 [pdf, other]

    cs.LG

    Safe Deep RL in 3D Environments using Human Feedback

    Authors: Matthew Rahtz, Vikrant Varma, Ramana Kumar, Zachary Kenton, Shane Legg, Jan Leike

    Abstract: Agents should avoid unsafe behaviour during both training and deployment. This typically requires a simulator and a procedural specification of unsafe behaviour. Unfortunately, a simulator is not always available, and procedurally specifying constraints can be difficult or impossible for many real-world tasks. A recently introduced technique, ReQueST, aims to solve this problem by learning a neura…

    Submitted 21 January, 2022; v1 submitted 20 January, 2022; originally announced January 2022.

  8. arXiv:2112.04359 [pdf, other]

    cs.CL cs.AI cs.CY

    Ethical and social risks of harm from Language Models

    Authors: Laura Weidinger, John Mellor, Maribeth Rauh, Conor Griffin, Jonathan Uesato, Po-Sen Huang, Myra Cheng, Mia Glaese, Borja Balle, Atoosa Kasirzadeh, Zac Kenton, Sasha Brown, Will Hawkins, Tom Stepleton, Courtney Biles, Abeba Birhane, Julia Haas, Laura Rimell, Lisa Anne Hendricks, William Isaac, Sean Legassick, Geoffrey Irving, Iason Gabriel

    Abstract: This paper aims to help structure the risk landscape associated with large-scale Language Models (LMs). In order to foster advances in responsible innovation, an in-depth understanding of the potential risks posed by these models is needed. A wide range of established and anticipated risks are analysed in detail, drawing on multidisciplinary expertise and literature from computer science, linguist…

    Submitted 8 December, 2021; originally announced December 2021.

  9. arXiv:2103.14659 [pdf, other]

    cs.AI cs.LG

    Alignment of Language Agents

    Authors: Zachary Kenton, Tom Everitt, Laura Weidinger, Iason Gabriel, Vladimir Mikulik, Geoffrey Irving

    Abstract: For artificial intelligence to be beneficial to humans the behaviour of AI agents needs to be aligned with what humans want. In this paper we discuss some behavioural issues for language agents, arising from accidental misspecification by the system designer. We highlight some ways that misspecification can occur and discuss some behavioural issues that could arise from misspecification, including…

    Submitted 26 March, 2021; originally announced March 2021.

  10. arXiv:2012.05672 [pdf, other]

    cs.LG cs.AI cs.MA

    Imitating Interactive Intelligence

    Authors: Josh Abramson, Arun Ahuja, Iain Barr, Arthur Brussee, Federico Carnevale, Mary Cassin, Rachita Chhaparia, Stephen Clark, Bogdan Damoc, Andrew Dudzik, Petko Georgiev, Aurelia Guy, Tim Harley, Felix Hill, Alden Hung, Zachary Kenton, Jessica Landon, Timothy Lillicrap, Kory Mathewson, Soňa Mokrá, Alistair Muldal, Adam Santoro, Nikolay Savinov, Vikrant Varma, Greg Wayne, et al. (4 additional authors not shown)

    Abstract: A common vision from science fiction is that robots will one day inhabit our physical spaces, sense the world as we do, assist our physical labours, and communicate with us through natural language. Here we study how to design artificial agents that can interact naturally with humans using the simplification of a virtual environment. This setting nevertheless integrates a number of the central cha…

    Submitted 20 January, 2021; v1 submitted 10 December, 2020; originally announced December 2020.

  11. arXiv:1912.10481 [pdf, other]

    stat.ML cs.LG eess.IV

    A Systematic Comparison of Bayesian Deep Learning Robustness in Diabetic Retinopathy Tasks

    Authors: Angelos Filos, Sebastian Farquhar, Aidan N. Gomez, Tim G. J. Rudner, Zachary Kenton, Lewis Smith, Milad Alizadeh, Arnoud de Kroon, Yarin Gal

    Abstract: Evaluation of Bayesian deep learning (BDL) methods is challenging. We often seek to evaluate the methods' robustness and scalability, assessing whether new tools give 'better' uncertainty estimates than old ones. These evaluations are paramount for practitioners when choosing BDL tools on top of which they build their applications. Current popular evaluations of BDL methods, such as the UCI experi…

    Submitted 22 December, 2019; originally announced December 2019.

  12. arXiv:1907.01475 [pdf, other]

    cs.LG cs.AI stat.ML

    Generalizing from a few environments in safety-critical reinforcement learning

    Authors: Zachary Kenton, Angelos Filos, Owain Evans, Yarin Gal

    Abstract: Before deploying autonomous agents in the real world, we need to be confident they will perform safely in novel situations. Ideally, we would expose agents to a very wide range of situations during training, allowing them to learn about every possible danger, but this is often impractical. This paper investigates safety and generalization from a limited number of training environments in deep rein…

    Submitted 2 July, 2019; originally announced July 2019.

  13. arXiv:1807.05031 [pdf, other]

    stat.ML cs.LG

    On the Relation Between the Sharpest Directions of DNN Loss and the SGD Step Length

    Authors: Stanisław Jastrzębski, Zachary Kenton, Nicolas Ballas, Asja Fischer, Yoshua Bengio, Amos Storkey

    Abstract: Stochastic Gradient Descent (SGD) based training of neural networks with a large learning rate or a small batch-size typically ends in well-generalizing, flat regions of the weight space, as indicated by small eigenvalues of the Hessian of the training loss. However, the curvature along the SGD trajectory is poorly understood. An empirical investigation shows that initially SGD visits increasingly…

    Submitted 23 December, 2019; v1 submitted 13 July, 2018; originally announced July 2018.

    Journal ref: International Conference on Learning Representations (ICLR) 2019

  14. arXiv:1711.04623 [pdf, other]

    cs.LG cs.AI cs.CV stat.ML

    Three Factors Influencing Minima in SGD

    Authors: Stanisław Jastrzębski, Zachary Kenton, Devansh Arpit, Nicolas Ballas, Asja Fischer, Yoshua Bengio, Amos Storkey

    Abstract: We investigate the dynamical and convergent properties of stochastic gradient descent (SGD) applied to Deep Neural Networks (DNNs). Characterizing the relation between learning rate, batch size and the properties of the final minima, such as width or generalization, remains an open question. In order to tackle this problem we investigate the previously proposed approximation of SGD by a stochastic…

    Submitted 13 September, 2018; v1 submitted 13 November, 2017; originally announced November 2017.

    Comments: First two authors contributed equally. Short version accepted to an ICLR workshop. Accepted to Artificial Neural Networks and Machine Learning, ICANN 2018

  15. arXiv:1605.03435 [pdf, other]

    astro-ph.CO gr-qc hep-th

    The Separate Universe Approach to Soft Limits

    Authors: Zachary Kenton, David J. Mulryne

    Abstract: We develop a formalism for calculating soft limits of $n$-point inflationary correlation functions using separate universe techniques. Our method naturally allows for multiple fields and leads to an elegant diagrammatic approach. As an application we focus on the trispectrum produced by inflation with multiple light fields, giving explicit formulae for all possible single- and double-soft limits.…

    Submitted 9 December, 2016; v1 submitted 11 May, 2016; originally announced May 2016.

    Comments: 28 pages, 7 figures. This is an author-created, un-copyedited version of an article published in JCAP. IOP Publishing Ltd is not responsible for any errors or omissions in this version of the manuscript or any version derived from it. The Version of Record is available online at the DOI below. v3: Updated to match version published in JCAP

  16. The squeezed limit of the bispectrum in multi-field inflation

    Authors: Zachary Kenton, David J. Mulryne

    Abstract: We calculate the squeezed limit of the bispectrum produced by inflation with multiple light fields. To achieve this we allow for different horizon exit times for each mode and calculate the intrinsic field-space three-point function in the squeezed limit using soft-limit techniques. We then use the $\delta N$ formalism from the time the last mode exits the horizon to calculate the bispectrum of the prim…

    Submitted 6 April, 2016; v1 submitted 30 July, 2015; originally announced July 2015.

    Comments: 34 pages, 7 figures. Updated to match published version (corrected typos)

  17. arXiv:1504.05736 [pdf, other]

    astro-ph.CO hep-th

    Generating the cosmic microwave background power asymmetry with $g_{NL}$

    Authors: Zachary Kenton, David J. Mulryne, Steven Thomas

    Abstract: We consider a higher order term in the $\delta N$ expansion for the CMB power asymmetry generated by a superhorizon isocurvature field fluctuation. The term can generate the asymmetry without requiring a large value of $f_{NL}$. Instead it produces a non-zero value of $g_{NL}$. A combination of constraints leads to an allowed region in $f_{NL}$–$g_{NL}$ space. To produce the asymmetry with this term withou…

    Submitted 7 July, 2015; v1 submitted 22 April, 2015; originally announced April 2015.

    Comments: 6 pages, 1 figure. Updated to match published version. Minor typographical corrections

    Journal ref: Phys. Rev. D 92, 023505 (2015)

  18. arXiv:1409.1221 [pdf, other]

    hep-th astro-ph.CO gr-qc

    D-brane Potentials in the Warped Resolved Conifold and Natural Inflation

    Authors: Zachary Kenton, Steven Thomas

    Abstract: In this paper we obtain a model of Natural Inflation from string theory with a Planckian decay constant. We investigate D-brane dynamics in the background of the warped resolved conifold (WRC) throat approximation of Type IIB string compactifications on Calabi-Yau manifolds. When we glue the throat to a compact bulk Calabi-Yau, we generate a D-brane potential which is a solution to the Laplace equ…

    Submitted 27 February, 2015; v1 submitted 3 September, 2014; originally announced September 2014.

    Comments: 41 pages, 3 appendices, 1 figure, PDFLaTeX; various clarifications added, along with a new appendix on b-axions and wrapped D5 branes; version matches the one published in JHEP

    Journal ref: JHEP 02 (2015) 127