Search | arXiv e-print repository

Transformers in Reinforcement Learning: A Survey

Authors: Pranav Agarwal, Aamer Abdul Rahman, Pierre-Luc St-Charles, Simon J. D. Prince, Samira Ebrahimi Kahou

Abstract: Transformers have significantly impacted domains like natural language processing, computer vision, and robotics, where they improve performance compared to other neural networks. This survey explores how transformers are used in reinforcement learning (RL), where they are seen as a promising solution for addressing challenges such as unstable training, credit assignment, lack of interpretability, and partial observability. We begin by providing a brief domain overview of RL, followed by a discussion on the challenges of classical RL algorithms. Next, we delve into the properties of the transformer and its variants and discuss the characteristics that make them well-suited to address the challenges inherent in RL. We examine the application of transformers to various aspects of RL, including representation learning, transition and reward function modeling, and policy optimization. We also discuss recent research that aims to enhance the interpretability and efficiency of transformers in RL, using visualization techniques and efficient training strategies. Often, the transformer architecture must be tailored to the specific needs of a given application. We present a broad overview of how transformers have been adapted for several applications, including robotics, medicine, language modeling, cloud computing, and combinatorial optimization. We conclude by discussing the limitations of using transformers in RL and assess their potential for catalyzing future breakthroughs in this field.

Submitted 12 July, 2023; originally announced July 2023.

Comments: 35 pages, 11 figures

arXiv:2305.15239 [pdf, other]

Deep Learning and Ethics

Authors: Travis LaCroix, Simon J. D. Prince

Abstract: This article appears as chapter 21 of Prince (2023, Understanding Deep Learning); a complete draft of the textbook is available here: http://udlbook.com. This chapter considers potential harms arising from the design and use of AI systems. These include algorithmic bias, lack of explainability, data privacy violations, militarization, fraud, and environmental concerns. The aim is not to provide advice on being more ethical. Instead, the goal is to express ideas and start conversations in key areas that have received attention in philosophy, political science, and the broader social sciences.

Submitted 20 June, 2023; v1 submitted 24 May, 2023; originally announced May 2023.

Comments: Copyright in this Work has been licensed exclusively to The MIT Press, https://mitpress.mit.edu, which will be releasing the final version to the public in 2023. All inquiries regarding rights should be addressed to The MIT Press, Rights and Permissions Department

arXiv:2102.10049 [pdf, other]

PCaaD: Towards Automated Determination and Exploitation of Industrial Processes

Authors: B. Green, W. Knowles, M. Krotofil, R. Derbyshire, D. Prince, N. Suri

Abstract: Over the last decade, Programmable Logic Controllers (PLCs) have been increasingly targeted by attackers to obtain control over industrial processes that support critical services. Such targeted attacks typically require detailed knowledge of system-specific attributes, including hardware configurations, adopted protocols, and PLC control-logic, i.e. process comprehension. The consensus from both academics and practitioners suggests stealthy process comprehension obtained from a PLC alone, to conduct targeted attacks, is impractical. In contrast, we assert that current PLC programming practices open the door to a new vulnerability class based on control-logic constructs. To support this, we propose the concept of Process Comprehension at a Distance (PCaaD), as a novel methodological and automatable approach for system-agnostic exploitation of PLC library functions, leading to the targeted exfiltration of operational data, manipulation of control-logic behavior, and establishment of covert command and control channels through unused memory. We validate PCaaD on widely used PLCs, by identification of practical attacks.

Submitted 19 February, 2021; originally announced February 2021.

Comments: 17 pages, 10 figures, 2 tables

arXiv:2012.15355 [pdf, other]

Optimizing Deeper Transformers on Small Datasets

Authors: Peng Xu, Dhruv Kumar, Wei Yang, Wenjie Zi, Keyi Tang, Chenyang Huang, Jackie Chi Kit Cheung, Simon J. D. Prince, Yanshuai Cao

Abstract: It is a common belief that training deep transformers from scratch requires large datasets. Consequently, for small datasets, people usually use shallow and simple additional layers on top of pre-trained models during fine-tuning. This work shows that this does not always need to be the case: with proper initialization and optimization, the benefits of very deep transformers can carry over to challenging tasks with small datasets, including Text-to-SQL semantic parsing and logical reading comprehension. In particular, we successfully train $48$ layers of transformers, comprising $24$ fine-tuned layers from pre-trained RoBERTa and $24$ relation-aware layers trained from scratch. With fewer training steps and no task-specific pre-training, we obtain the state-of-the-art performance on the challenging cross-domain Text-to-SQL parsing benchmark Spider. We achieve this by deriving a novel Data-dependent Transformer Fixed-update initialization scheme (DT-Fixup), inspired by the prior T-Fixup work. Further error analysis shows that increasing depth can help improve generalization on small datasets for hard cases that require reasoning and structural understanding.

Submitted 31 May, 2021; v1 submitted 30 December, 2020; originally announced December 2020.

Comments: Accepted at ACL 2021 main conference

arXiv:1908.09257 [pdf, other]

doi 10.1109/TPAMI.2020.2992934

Normalizing Flows: An Introduction and Review of Current Methods

Authors: Ivan Kobyzev, Simon J. D. Prince, Marcus A. Brubaker

Abstract: Normalizing Flows are generative models which produce tractable distributions where both sampling and density evaluation can be efficient and exact. The goal of this survey article is to give a coherent and comprehensive review of the literature around the construction and use of Normalizing Flows for distribution learning. We aim to provide context and explanation of the models, review current state-of-the-art literature, and identify open questions and promising future directions.

Submitted 5 June, 2020; v1 submitted 25 August, 2019; originally announced August 2019.

Comments: This paper appears in: IEEE Transactions on Pattern Analysis and Machine Intelligence. On page(s): 1-16. Print ISSN: 0162-8828. Online ISSN: 0162-8828.

arXiv:1702.08478 [pdf, ps, other]

Risks and Transaction Costs of Distributed-Ledger Fintech: Boundary Effects and Consequences

Authors: Kim Kaivanto, Daniel Prince

Abstract: Fintech business models based on distributed ledgers -- and their smart-contract variants in particular -- offer the prospect of democratizing access to faster, anywhere-accessible, lower cost, reliable-and-secure high-quality financial services. In addition to holding great, economically transformative promise, these business models pose new, little-studied risks and transaction costs. However, these risks and transaction costs are not evident during the demonstration and testing phases of development, when adopters and users are drawn from the community of developers themselves, as well as from among non-programmer fintech evangelists. Hence, when the new risks and transaction costs become manifest -- as the fintech business models are rolled out across the wider economy -- the consequences may also appear to be new and surprising. The present study represents an effort to get ahead of these developments by delineating risks and transaction costs inherent in distributed-ledger- and smart-contracts-based fintech business models. The analysis focuses on code risk and moral-hazard risk, as well as on mixed-economy risks and the unintended consequences of replicating bricks-and-mortar-generation contract forms within the ultra-low transaction-cost environment of fintech.

Submitted 27 February, 2017; originally announced February 2017.

Comments: 12 pages, 1 figure

Showing 1–6 of 6 results for author: Prince, D