Search | arXiv e-print repository

Steering Llama 2 via Contrastive Activation Addition

Authors: Nina Rimsky, Nick Gabrieli, Julian Schulz, Meg Tong, Evan Hubinger, Alexander Matt Turner

Abstract: We introduce Contrastive Activation Addition (CAA), an innovative method for steering language models by modifying their activations during forward passes. CAA computes "steering vectors" by averaging the difference in residual stream activations between pairs of positive and negative examples of a particular behavior, such as factual versus hallucinatory responses. During inference, these steering vectors are added at all token positions after the user's prompt with either a positive or negative coefficient, allowing precise control over the degree of the targeted behavior. We evaluate CAA's effectiveness on Llama 2 Chat using multiple-choice behavioral question datasets and open-ended generation tasks. We demonstrate that CAA significantly alters model behavior, is effective over and on top of traditional methods like finetuning and system prompt design, and minimally reduces capabilities. Moreover, we gain deeper insights into CAA's mechanisms by employing various activation space interpretation methods. CAA accurately steers model outputs and sheds light on how high-level concepts are represented in Large Language Models (LLMs).

Submitted 6 March, 2024; v1 submitted 8 December, 2023; originally announced December 2023.

arXiv:2310.08043 [pdf, other]

Understanding and Controlling a Maze-Solving Policy Network

Authors: Ulisse Mini, Peli Grietzer, Mrinank Sharma, Austin Meek, Monte MacDiarmid, Alexander Matt Turner

Abstract: To understand the goals and goal representations of AI systems, we carefully study a pretrained reinforcement learning policy that solves mazes by navigating to a range of target squares. We find this network pursues multiple context-dependent goals, and we further identify circuits within the network that correspond to one of these goals. In particular, we identified eleven channels that track the location of the goal. By modifying these channels, either with hand-designed interventions or by combining forward passes, we can partially control the policy. We show that this network contains redundant, distributed, and retargetable goal representations, shedding light on the nature of goal-direction in trained policy networks.

Submitted 12 October, 2023; originally announced October 2023.

Comments: 46 pages

arXiv:2309.05440 [pdf]

Emissions and energy efficiency on large-scale high performance computing facilities: ARCHER2 UK national supercomputing service case study

Authors: Adrian Jackson, Alan Simpson, Andrew Turner

Abstract: Large supercomputing facilities are critical to research in many areas that impact on decisions such as how to address the current climate emergency. For example, climate modelling, renewable energy facility design and new battery technologies. However, these systems themselves are a source of large amounts of emissions due to the embodied emissions associated with their construction, transport, and decommissioning; and the power consumption associated with running the facility. Recently, the UK National Supercomputing Service, ARCHER2, has been analysing the impact of the facility in terms of energy and emissions. Based on this work, we have made changes to the operation of the service that give a cumulative saving of more than 20% in power draw of the computational resources with all application benchmarks showing reduced power to solution. In this paper, we describe our analysis and the changes made to the operation of the service to improve its energy efficiency, and thereby reduce its climate impacts.

Submitted 11 September, 2023; originally announced September 2023.

arXiv:2308.13561 [pdf, other]

Project Aria: A New Tool for Egocentric Multi-Modal AI Research

Authors: Jakob Engel, Kiran Somasundaram, Michael Goesele, Albert Sun, Alexander Gamino, Andrew Turner, Arjang Talattof, Arnie Yuan, Bilal Souti, Brighid Meredith, Cheng Peng, Chris Sweeney, Cole Wilson, Dan Barnes, Daniel DeTone, David Caruso, Derek Valleroy, Dinesh Ginjupalli, Duncan Frost, Edward Miller, Elias Mueggler, Evgeniy Oleinik, Fan Zhang, Guruprasad Somasundaram, Gustavo Solaira , et al. (49 additional authors not shown)

Abstract: Egocentric, multi-modal data as available on future augmented reality (AR) devices provides unique challenges and opportunities for machine perception. These future devices will need to be all-day wearable in a socially acceptable form-factor to support always available, context-aware and personalized AI applications. Our team at Meta Reality Labs Research built the Aria device, an egocentric, multi-modal data recording and streaming device with the goal to foster and accelerate research in this area. In this paper, we describe the Aria device hardware including its sensor configuration and the corresponding software tools that enable recording and processing of such data.

Submitted 1 October, 2023; v1 submitted 24 August, 2023; originally announced August 2023.

arXiv:2308.10248 [pdf, other]

Activation Addition: Steering Language Models Without Optimization

Authors: Alexander Matt Turner, Lisa Thiergart, Gavin Leech, David Udell, Juan J. Vazquez, Ulisse Mini, Monte MacDiarmid

Abstract: Reliably controlling the behavior of large language models is a pressing open problem. Existing methods include supervised finetuning, reinforcement learning from human feedback, prompt engineering and guided decoding. We instead investigate activation engineering: modifying activations at inference-time to predictably alter model behavior. We bias the forward pass with a 'steering vector' implicitly specified through natural language. Past work learned these steering vectors; our Activation Addition (ActAdd) method instead computes them by taking activation differences resulting from pairs of prompts. We demonstrate ActAdd on a range of LLMs (LLaMA-3, OPT, GPT-2, and GPT-J), obtaining SOTA on detoxification and negative-to-positive sentiment control. Our approach yields inference-time control over high-level properties of output like topic and sentiment while preserving performance on off-target tasks. ActAdd takes far less compute and implementation effort than finetuning or RLHF, allows users control through natural language, and its computational overhead (as a fraction of inference time) appears stable or improving over increasing model size.

Submitted 4 June, 2024; v1 submitted 20 August, 2023; originally announced August 2023.

arXiv:2303.11731 [pdf, other]

Automated service monitoring in the deployment of ARCHER2

Authors: Kieran Leach, Philip Cass, Steven Robson, Eimantas Kazakevicius, Martin Lafferty, Andrew Turner, Alan Simpson

Abstract: The ARCHER2 service, a CPU based HPE Cray EX system with 750,080 cores (5,860 nodes), has been deployed throughout 2020 and 2021, going into full service in December of 2021. A key part of the work during this deployment was the integration of ARCHER2 into our local monitoring systems. As ARCHER2 was one of the very first large-scale EX deployments, this involved close collaboration and development work with the HPE team through a global pandemic situation where collaboration and co-working was significantly more challenging than usual. The deployment included the creation of automated checks and visual representations of system status which needed to be made available to external parties for diagnosis and interpretation. We will describe how these checks have been deployed and how data gathered played a key role in the deployment of ARCHER2, the commissioning of the plant infrastructure, the conduct of HPL runs for submission to the Top500 and contractual monitoring of the availability of the ARCHER2 service during its commissioning and early life.

Submitted 21 March, 2023; originally announced March 2023.

Comments: 7 pages

ACM Class: C.5.1; C.4

arXiv:2302.02477 [pdf, other]

Offline Learning of Closed-Loop Deep Brain Stimulation Controllers for Parkinson Disease Treatment

Authors: Qitong Gao, Stephen L. Schmidt, Afsana Chowdhury, Guangyu Feng, Jennifer J. Peters, Katherine Genty, Warren M. Grill, Dennis A. Turner, Miroslav Pajic

Abstract: Deep brain stimulation (DBS) has shown great promise toward treating motor symptoms caused by Parkinson's disease (PD), by delivering electrical pulses to the Basal Ganglia (BG) region of the brain. However, DBS devices approved by the U.S. Food and Drug Administration (FDA) can only deliver continuous DBS (cDBS) stimuli at a fixed amplitude; this energy inefficient operation reduces battery lifetime of the device, cannot adapt treatment dynamically for activity, and may cause significant side-effects (e.g., gait impairment). In this work, we introduce an offline reinforcement learning (RL) framework, allowing the use of past clinical data to train an RL policy to adjust the stimulation amplitude in real time, with the goal of reducing energy use while maintaining the same level of treatment (i.e., control) efficacy as cDBS. Moreover, clinical protocols require the safety and performance of such RL controllers to be demonstrated ahead of deployments in patients. Thus, we also introduce an offline policy evaluation (OPE) method to estimate the performance of RL policies using historical data, before deploying them on patients. We evaluated our framework on four PD patients equipped with the RC+S DBS system, employing the RL controllers during monthly clinical visits, with the overall control efficacy evaluated by severity of symptoms (i.e., bradykinesia and tremor), changes in PD biomarkers (i.e., local field potentials), and patient ratings. The results from clinical experiments show that our RL-based controller maintains the same level of control efficacy as cDBS, but with significantly reduced stimulation energy. Further, the OPE method is shown effective in accurately estimating and ranking the expected returns of RL controllers.

Submitted 15 March, 2023; v1 submitted 5 February, 2023; originally announced February 2023.

Comments: Accepted to International Conference on Cyber Physical Systems (ICCPS) 2023

arXiv:2206.13477 [pdf, other]

Parametrically Retargetable Decision-Makers Tend To Seek Power

Authors: Alexander Matt Turner, Prasad Tadepalli

Abstract: If capable AI agents are generally incentivized to seek power in service of the objectives we specify for them, then these systems will pose enormous risks, in addition to enormous benefits. In fully observable environments, most reward functions have an optimal policy which seeks power by keeping options open and staying alive. However, the real world is neither fully observable, nor must trained agents be even approximately reward-optimal. We consider a range of models of AI decision-making, from optimal, to random, to choices informed by learning and interacting with an environment. We discover that many decision-making functions are retargetable, and that retargetability is sufficient to cause power-seeking tendencies. Our functional criterion is simple and broad. We show that a range of qualitatively dissimilar decision-making procedures incentivize agents to seek power. We demonstrate the flexibility of our results by reasoning about learned policy incentives in Montezuma's Revenge. These results suggest a safety risk: Eventually, retargetable training procedures may train real-world agents which seek power over humans.

Submitted 11 October, 2022; v1 submitted 27 June, 2022; originally announced June 2022.

Comments: 10-page main paper, 36 pages total, poster at NeurIPS 2022

arXiv:2206.11831 [pdf, other]

On Avoiding Power-Seeking by Artificial Intelligence

Authors: Alexander Matt Turner

Abstract: We do not know how to align a very intelligent AI agent's behavior with human interests. I investigate whether -- absent a full solution to this AI alignment problem -- we can build smart AI agents which have limited impact on the world, and which do not autonomously seek power. In this thesis, I introduce the attainable utility preservation (AUP) method. I demonstrate that AUP produces conservative, option-preserving behavior within toy gridworlds and within complex environments based off of Conway's Game of Life. I formalize the problem of side effect avoidance, which provides a way to quantify the side effects an agent had on the world. I also give a formal definition of power-seeking in the context of AI agents and show that optimal policies tend to seek power. In particular, most reward functions have optimal policies which avoid deactivation. This is a problem if we want to deactivate or correct an intelligent agent after we have deployed it. My theorems suggest that since most agent goals conflict with ours, the agent would very probably resist correction. I extend these theorems to show that power-seeking incentives occur not just for optimal decision-makers, but under a wide range of decision-making procedures.

Submitted 23 June, 2022; originally announced June 2022.

Comments: 287 pages, PhD thesis

arXiv:2206.11812 [pdf, other]

Formalizing the Problem of Side Effect Regularization

Authors: Alexander Matt Turner, Aseem Saxena, Prasad Tadepalli

Abstract: AI objectives are often hard to specify properly. Some approaches tackle this problem by regularizing the AI's side effects: Agents must weigh off "how much of a mess they make" with an imperfectly specified proxy objective. We propose a formal criterion for side effect regularization via the assistance game framework. In these games, the agent solves a partially observable Markov decision process (POMDP) representing its uncertainty about the objective function it should optimize. We consider the setting where the true objective is revealed to the agent at a later time step. We show that this POMDP is solved by trading off the proxy reward with the agent's ability to achieve a range of future tasks. We empirically demonstrate the reasonableness of our problem formalization via ground-truth evaluation in two gridworld environments.

Submitted 8 November, 2022; v1 submitted 23 June, 2022; originally announced June 2022.

Comments: 14 pages, accepted to ML Safety Workshop at NeurIPS 2022. Alexander Turner and Aseem Saxena contributed equally

arXiv:2206.11061 [pdf, other]

An Ontological Approach to Analysing Social Service Provisioning

Authors: Mark S. Fox, Bart Gajderowicz, Daniela Rosu, Alina Turner, Lester Lyu

Abstract: This paper introduces ontological concepts required to evaluate and manage the coverage of social services in a Smart City context. Here, we focus on the perspective of key stakeholders, namely social purpose organizations and the clients they serve. The Compass ontology presented here extends the Common Impact Data Standard by introducing new concepts related to key dimensions: the who (Stakeholder), the what (Need, Need Satisfier, Outcome), the how (Service, Event), and the contributions (tracking resources). The paper first introduces key stakeholders, services, outcomes, events, needs and need satisfiers, along with their definitions. Second, a subset of competency questions are presented to illustrate the types of questions key stakeholders have posed. Third, the extension's ability to answer questions is evaluated by presenting SPARQL queries executed on a Compass-based knowledge graph and analysing their results.

Submitted 24 June, 2022; v1 submitted 20 June, 2022; originally announced June 2022.

Comments: Update: corrected email, header text

arXiv:2203.07989 [pdf, ps, other]

Approximability and Generalisation

Authors: Andrew J. Turner, Ata Kabán

Abstract: Approximate learning machines have become popular in the era of small devices, including quantised, factorised, hashed, or otherwise compressed predictors, and the quest to explain and guarantee good generalisation abilities for such methods has just begun. In this paper we study the role of approximability in learning, both in the full precision and the approximated settings of the predictor that is learned from the data, through a notion of sensitivity of predictors to the action of the approximation operator at hand. We prove upper bounds on the generalisation of such predictors, yielding the following main findings, for any PAC-learnable class and any given approximation operator. 1) We show that under mild conditions, approximable target concepts are learnable from a smaller labelled sample, provided sufficient unlabelled data. 2) We give algorithms that guarantee a good predictor whose approximation also enjoys the same generalisation guarantees. 3) We highlight natural examples of structure in the class of sensitivities, which reduce, and possibly even eliminate the otherwise abundant requirement of additional unlabelled data, and henceforth shed new light onto what makes one problem instance easier to learn than another. These results embed the scope of modern model compression approaches into the general goal of statistical learning theory, which in return suggests appropriate algorithms through minimising uniform bounds.

Submitted 15 March, 2022; originally announced March 2022.

Comments: 25 pages

arXiv:2202.07590 [pdf, other]

Identifying equivalent Calabi--Yau topologies: A discrete challenge from math and physics for machine learning

Authors: Vishnu Jejjala, Washington Taylor, Andrew Turner

Abstract: We review briefly the characteristic topological data of Calabi--Yau threefolds and focus on the question of when two threefolds are equivalent through related topological data. This provides an interesting test case for machine learning methodology in discrete mathematics problems motivated by physics.

Submitted 15 February, 2022; originally announced February 2022.

Comments: 6 pages, 3 figures; Contribution to proceedings of 2021 Nankai symposium on Mathematical Dialogues in celebration of S. S. Chern's 110th anniversary

Report number: MIT-CTP-5406

arXiv:2009.11806 [pdf, other]

Investigating Applications on the A64FX

Authors: Adrian Jackson, Michèle Weiland, Nick Brown, Andrew Turner, Mark Parsons

Abstract: The A64FX processor from Fujitsu, being designed for computational simulation and machine learning applications, has the potential for unprecedented performance in HPC systems. In this paper, we evaluate the A64FX by benchmarking against a range of production HPC platforms that cover a number of processor technologies. We investigate the performance of complex scientific applications across multiple nodes, as well as single node and mini-kernel benchmarks. This paper finds that the performance of the A64FX processor across our chosen benchmarks often significantly exceeds other platforms, even without specific application optimisations for the processor instruction set or hardware. However, this is not true for all the benchmarks we have undertaken. Furthermore, the specific configuration of applications can have an impact on the runtime and performance experienced.

Submitted 24 September, 2020; originally announced September 2020.

arXiv:2006.06547 [pdf, other]

Avoiding Side Effects in Complex Environments

Authors: Alexander Matt Turner, Neale Ratzlaff, Prasad Tadepalli

Abstract: Reward function specification can be difficult. Rewarding the agent for making a widget may be easy, but penalizing the multitude of possible negative side effects is hard. In toy environments, Attainable Utility Preservation (AUP) avoided side effects by penalizing shifts in the ability to achieve randomly generated goals. We scale this approach to large, randomly generated environments based on Conway's Game of Life. By preserving optimal value for a single randomly generated reward function, AUP incurs modest overhead while leading the agent to complete the specified task and avoid many side effects. Videos and code are available at https://avoiding-side-effects.github.io/.

Submitted 22 October, 2020; v1 submitted 11 June, 2020; originally announced June 2020.

Comments: Accepted as spotlight paper at NeurIPS 2020. 10 pages main paper; 19 pages with appendices

arXiv:2001.01707 [pdf]

doi 10.1109/TBME.2020.2964724

Meta-modal Information Flow: A Method for Capturing Multimodal Modular Disconnectivity in Schizophrenia

Authors: Haleh Falakshahi, Victor M. Vergara, Jingyu Liu, Daniel H. Mathalon, Judith M. Ford, James Voyvodic, Bryon A. Mueller, Aysenil Belger, Sarah McEwen, Steven G. Potkin, Adrian Preda, Hooman Rokham, Jing Sui, Jessica A. Turner, Sergey Plis, Vince D. Calhoun

Abstract: Objective: Multimodal measurements of the same phenomena provide complementary information and highlight different perspectives, albeit each with their own limitations. A focus on a single modality may lead to incorrect inferences, which is especially important when a studied phenomenon is a disease. In this paper, we introduce a method that takes advantage of multimodal data in addressing the hypotheses of disconnectivity and dysfunction within schizophrenia (SZ). Methods: We start with estimating and visualizing links within and among extracted multimodal data features using a Gaussian graphical model (GGM). We then propose a modularity-based method that can be applied to the GGM to identify links that are associated with mental illness across a multimodal data set. Through simulation and real data, we show our approach reveals important information about disease-related network disruptions that are missed with a focus on a single modality. We use functional MRI (fMRI), diffusion MRI (dMRI), and structural MRI (sMRI) to compute the fractional amplitude of low frequency fluctuations (fALFF), fractional anisotropy (FA), and gray matter (GM) concentration maps. These three modalities are analyzed using our modularity method. Results: Our results show missing links that are only captured by the cross-modal information that may play an important role in disconnectivity between the components. Conclusion: We identified multimodal (fALFF, FA and GM) disconnectivity in the default mode network area in patients with SZ, which would not have been detectable in a single modality. Significance: The proposed approach provides an important new tool for capturing information that is distributed among multiple imaging modalities.

Submitted 6 January, 2020; originally announced January 2020.

Journal ref: IEEE Transactions on Biomedical Engineering, 2019

arXiv:1912.02771 [pdf, other]

Label-Consistent Backdoor Attacks

Authors: Alexander Turner, Dimitris Tsipras, Aleksander Madry

Abstract: Deep neural networks have been demonstrated to be vulnerable to backdoor attacks. Specifically, by injecting a small number of maliciously constructed inputs into the training set, an adversary is able to plant a backdoor into the trained model. This backdoor can then be activated during inference by a backdoor trigger to fully control the model's behavior. While such attacks are very effective, they crucially rely on the adversary injecting arbitrary inputs that are---often blatantly---mislabeled. Such samples would raise suspicion upon human inspection, potentially revealing the attack. Thus, for backdoor attacks to remain undetected, it is crucial that they maintain label-consistency---the condition that injected inputs are consistent with their labels. In this work, we leverage adversarial perturbations and generative models to execute efficient, yet label-consistent, backdoor attacks. Our approach is based on injecting inputs that appear plausible, yet are hard to classify, hence causing the model to rely on the (easier-to-learn) backdoor trigger.

Submitted 6 December, 2019; v1 submitted 5 December, 2019; originally announced December 2019.

arXiv:1912.01683 [pdf, other]

Optimal Policies Tend to Seek Power

Authors: Alexander Matt Turner, Logan Smith, Rohin Shah, Andrew Critch, Prasad Tadepalli

Abstract: Some researchers speculate that intelligent reinforcement learning (RL) agents would be incentivized to seek resources and power in pursuit of their objectives. Other researchers point out that RL agents need not have human-like power-seeking instincts. To clarify this discussion, we develop the first formal theory of the statistical tendencies of optimal policies. In the context of Markov decision processes, we prove that certain environmental symmetries are sufficient for optimal policies to tend to seek power over the environment. These symmetries exist in many environments in which the agent can be shut down or destroyed. We prove that in these environments, most reward functions make it optimal to seek power by keeping a range of options available and, when maximizing average reward, by navigating towards larger sets of potential terminal states.

Submitted 28 January, 2023; v1 submitted 3 December, 2019; originally announced December 2019.

Comments: Accepted to NeurIPS 2021 as spotlight paper. 12 pages, 44 pages with appendices. Since the 2021 acceptance, we updated the paper to point out that optimal policies can be qualitatively divorced from real-world learned policies

arXiv:1906.03891 [pdf, other]

Analysis of parallel I/O use on the UK national supercomputing service, ARCHER using Cray LASSi and EPCC SAFE

Authors: Andrew Turner, Dominic Sloan-Murphy, Karthee Sivalingam, Harvey Richardson, Julian Kunkel

Abstract: In this paper, we describe how we have used a combination of the LASSi tool (developed by Cray) and the SAFE software (developed by EPCC) to collect and analyse Lustre I/O performance data for all jobs running on the UK national supercomputing service, ARCHER; and to provide reports on I/O usage for users in our standard reporting framework. We also present results from analysis of parallel I/O use on ARCHER and analysis on the potential impact of different applications on file system performance using metrics we have derived from the LASSi data. We show that the performance data from LASSi reveals how the same application can stress different components of the file system depending on how it is run, and how the LASSi risk metrics allow us to identify use cases that could potentially cause issues for global I/O performance and work with users to improve their I/O use. We use the IO-500 benchmark to help us understand how LASSi risk metrics correspond to observed performance on the ARCHER file systems. We also use LASSi data imported into SAFE to identify I/O use patterns associated with different research areas, understand how the research workflow gives rise to the observed patterns and project how this will affect I/O requirements in the future. Finally, we provide an overview of likely future directions for the continuation of this work.

Submitted 10 June, 2019; originally announced June 2019.

Comments: 15 pages, 19 figures, 5 tables, 2019 Cray User Group Meeting (CUG), Montreal, Canada

arXiv:1904.04250 [pdf, other]

Evaluating the Arm Ecosystem for High Performance Computing

Authors: Adrian Jackson, Andrew Turner, Michele Weiland, Nick Johnson, Olly Perks, Mark Parsons

Abstract: In recent years, Arm-based processors have arrived on the HPC scene, offering an alternative to the existing status quo, which was largely dominated by x86 processors. In this paper, we evaluate the Arm ecosystem, both the hardware offering and the software stack that is available to users, by benchmarking a production HPC platform that uses Marvell's ThunderX2 processors. We investigate the performance of complex scientific applications across multiple nodes, and we also assess the maturity of the software stack and the ease of use from a user's perspective. This paper finds that the performance across our benchmarking applications is generally as good as, or better than, that of well-established platforms, and we can conclude from our experience that there are no major hurdles that might hinder wider adoption of this ecosystem within the HPC community.

Submitted 8 April, 2019; originally announced April 2019.

Comments: 18 pages, accepted at PASC19, 1 figure

arXiv:1902.09725 [pdf, other]

doi 10.1145/3375627.3375851

Conservative Agency via Attainable Utility Preservation

Authors: Alexander Matt Turner, Dylan Hadfield-Menell, Prasad Tadepalli

Abstract: Reward functions are easy to misspecify; although designers can make corrections after observing mistakes, an agent pursuing a misspecified reward function can irreversibly change the state of its environment. If that change precludes optimization of the correctly specified reward function, then correction is futile. For example, a robotic factory assistant could break expensive equipment due to a reward misspecification; even if the designers immediately correct the reward function, the damage is done. To mitigate this risk, we introduce an approach that balances optimization of the primary reward function with preservation of the ability to optimize auxiliary reward functions. Surprisingly, even when the auxiliary reward functions are randomly generated and therefore uninformative about the correctly specified reward function, this approach induces conservative, effective behavior.

Submitted 10 June, 2020; v1 submitted 25 February, 2019; originally announced February 2019.

Comments: Published in AI, Ethics, and Society 2020

arXiv:1810.08677 [pdf, other]

A neural network to classify metaphorical violence on cable news

Authors: Matthew A. Turner

Abstract: I present here an experimental system for identifying and annotating metaphor in corpora. It is designed to plug in to Metacorps, an experimental web app for annotating metaphor. As Metacorps users annotate metaphors, the system will use user annotations as training data. When the system is confident, it will suggest an identification and an annotation. Once approved by the user, this becomes more training data. This naturally allows for transfer learning, where the system can, with some known degree of reliability, classify one class of metaphor after only being trained on another class of metaphor. For example, in our metaphorical violence project, metaphors may be classified by the network they were observed on, the grammatical subject or object of the violence metaphor, or the violent word used (hit, attack, beat, etc.).

Submitted 19 October, 2018; originally announced October 2018.

Comments: 6 pages, 1 figure, 1 table

arXiv:1805.12152 [pdf, other]

Robustness May Be at Odds with Accuracy

Authors: Dimitris Tsipras, Shibani Santurkar, Logan Engstrom, Alexander Turner, Aleksander Madry

Abstract: We show that there may exist an inherent tension between the goal of adversarial robustness and that of standard generalization. Specifically, training robust models may not only be more resource-consuming, but also lead to a reduction of standard accuracy. We demonstrate that this trade-off between the standard accuracy of a model and its robustness to adversarial perturbations provably exists in a fairly simple and natural setting. These findings also corroborate a similar phenomenon observed empirically in more complex settings. Further, we argue that this phenomenon is a consequence of robust classifiers learning fundamentally different feature representations than standard classifiers. These differences, in particular, seem to result in unexpected benefits: the representations learned by robust models tend to align better with salient data characteristics and human perception.

Submitted 9 September, 2019; v1 submitted 30 May, 2018; originally announced May 2018.

Comments: ICLR'19

arXiv:1407.2206 [pdf]

doi 10.1109/ICNIT.2010.5508469

Sequential Data Mining using Correlation Matrix Memory

Authors: Sanil Shanker KP, Aaron Turner, Elizabeth Sherly, Jim Austin

Abstract: This paper proposes a method for sequential data mining using correlation matrix memory. Here, we use the concept of the Logical Match to mine the indices of the sequential pattern. We demonstrate the uniqueness of the method with both artificial and real data taken from the NCBI databank.

Submitted 6 July, 2014; originally announced July 2014.

Comments: Networking and Information Technology (ICNIT), 2010 International Conference on

arXiv:1309.1101 [pdf, ps, other]

Simplifying the Development, Use and Sustainability of HPC Software

Authors: Jeremy Cohen, Chris Cantwell, Neil Chue Hong, David Moxey, Malcolm Illingworth, Andrew Turner, John Darlington, Spencer Sherwin

Abstract: Developing software to undertake complex, compute-intensive scientific processes requires a challenging combination of both specialist domain knowledge and software development skills to convert this knowledge into efficient code. As computational platforms become increasingly heterogeneous and newer types of platform such as Infrastructure-as-a-Service (IaaS) cloud computing become more widely accepted for HPC computations, scientists require more support from computer scientists and resource providers to develop efficient code and make optimal use of the resources available to them. As part of the libhpc stage 1 and 2 projects we are developing a framework to provide a richer means of job specification and efficient execution of complex scientific software on heterogeneous infrastructure. The use of such frameworks has implications for the sustainability of scientific software. In this paper we set out our developing understanding of these challenges based on work carried out in the libhpc project.

Submitted 4 September, 2013; originally announced September 2013.

Comments: 4 page position paper, submission to WSSSPE13 workshop

arXiv:1209.5922 [pdf]

Towards structured sharing of raw and derived neuroimaging data across existing resources

Authors: D. B. Keator, K. Helmer, J. Steffener, J. A. Turner, T. G. M. Van Erp, S. Gadde, N. Ashish, G. A. Burns, B. N. Nichols, S. S. Ghosh

Abstract: Data sharing efforts increasingly contribute to the acceleration of scientific discovery. Neuroimaging data is accumulating in distributed domain-specific databases and there is currently no integrated access mechanism nor an accepted format for the critically important meta-data that is necessary for making use of the combined, available neuroimaging data. In this manuscript, we present work from the Derived Data Working Group, an open-access group sponsored by the Biomedical Informatics Research Network (BIRN) and the International Neuroimaging Coordinating Facility (INCF) focused on practical tools for distributed access to neuroimaging data. The working group develops models and tools facilitating the structured interchange of neuroimaging meta-data and is making progress towards a unified set of tools for such data and meta-data exchange. We report on the key components required for integrated access to raw and derived neuroimaging data as well as associated meta-data and provenance across neuroimaging resources. The components include (1) a structured terminology that provides semantic context to data, (2) a formal data model for neuroimaging with robust tracking of data provenance, (3) a web service-based application programming interface (API) that provides a consistent mechanism to access and query the data model, and (4) a provenance library that can be used for the extraction of provenance data by image analysts and imaging software developers.
We believe that the framework and set of tools outlined in this manuscript have great potential for solving many of the issues the neuroimaging community faces when sharing raw and derived neuroimaging data across the various existing database systems for the purpose of accelerating scientific discovery.

Submitted 6 March, 2013; v1 submitted 26 September, 2012; originally announced September 2012.

arXiv:cs/0607072 [pdf]

Effect of Interface Style in Peer Review Comments for UML Designs

Authors: Scott A. Turner, Manuel A. Perez-Quinones, Stephen H. Edwards

Abstract: This paper presents our evaluation of using a Tablet-PC to provide peer-review comments in the first year Computer Science course. Our exploration consisted of an evaluation of how students write comments on other students' assignments using three different methods: pen and paper, a Tablet-PC, and a desktop computer. Our ultimate goal is to explore the effect that interface style (Tablet vs. Desktop) has on the quality and quantity of the comments provided.

Submitted 14 July, 2006; originally announced July 2006.

Comments: 8 pages, 7 figures

ACM Class: H.1; H.4; H.5

Showing 1–27 of 27 results for author: Turner, A