-
Dealing With Non-stationarity in Decentralized Cooperative Multi-Agent Deep Reinforcement Learning via Multi-Timescale Learning
Authors:
Hadi Nekoei,
Akilesh Badrinaaraayanan,
Amit Sinha,
Mohammad Amini,
Janarthanan Rajendran,
Aditya Mahajan,
Sarath Chandar
Abstract:
Decentralized cooperative multi-agent deep reinforcement learning (MARL) can be a versatile learning framework, particularly in scenarios where centralized training is either not possible or not practical. One of the critical challenges in decentralized deep MARL is the non-stationarity of the learning environment when multiple agents are learning concurrently. A commonly used and efficient scheme for decentralized MARL is independent learning, in which agents concurrently update their policies independently of each other. We first show that independent learning does not always converge, while sequential learning, in which agents update their policies one after another in a sequence, is guaranteed to converge to an agent-by-agent optimal solution. In sequential learning, when one agent updates its policy, all other agents' policies are kept fixed, alleviating the non-stationarity caused by simultaneous updates to other agents' policies. However, sequential learning can be slow because only one agent is learning at any time, so it may not always be practical. In this work, we propose a decentralized cooperative MARL algorithm based on multi-timescale learning. In multi-timescale learning, all agents learn simultaneously, but at different learning rates. In our proposed method, when one agent updates its policy, the other agents are allowed to update their policies as well, but at a slower rate. This speeds up sequential learning while also minimizing the non-stationarity caused by other agents updating concurrently. Multi-timescale learning outperforms state-of-the-art decentralized learning methods on a set of challenging multi-agent cooperative tasks in the epymarl (Papoudakis et al., 2020) benchmark. This can be seen as a first step towards more general decentralized cooperative deep MARL methods based on multi-timescale learning.
Submitted 17 August, 2023; v1 submitted 6 February, 2023;
originally announced February 2023.
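The difference between independent, sequential, and multi-timescale updates described in the abstract can be illustrated with a toy example. This is a minimal sketch, not the authors' deep MARL algorithm. It assumes two agents, each with a single scalar policy parameter, a hypothetical shared concave team reward J(x, y) = 3x + 3y - x^2 - y^2 - xy with exact gradients, and hypothetical names (`shared_reward`, `shared_reward_grad`, `multi_timescale_train`, `fast_lr`, `slow_lr`, `switch_every`). At every step one designated agent uses the fast learning rate, the other uses the slow one, and the fast role rotates every `switch_every` steps.

```python
def shared_reward(params):
    """Hypothetical shared team reward J(x, y) = 3x + 3y - x^2 - y^2 - xy (maximized at x = y = 1)."""
    x, y = params
    return 3 * x + 3 * y - x * x - y * y - x * y


def shared_reward_grad(params):
    """Exact partial derivatives of the toy reward with respect to each agent's own parameter."""
    x, y = params
    return [3 - 2 * x - y, 3 - 2 * y - x]


def multi_timescale_train(steps=2000, fast_lr=0.1, slow_lr=0.01, switch_every=100):
    """Gradient ascent where one agent learns fast while the others learn slowly.

    slow_lr=0.0 behaves like sequential learning (only the fast agent moves);
    slow_lr=fast_lr behaves like independent learning (all agents move at one rate).
    """
    params = [0.0, 0.0]
    n_agents = len(params)
    for t in range(steps):
        # Rotate which agent holds the fast-learning role.
        fast_agent = (t // switch_every) % n_agents
        # All gradients are taken at the same joint policy, i.e. simultaneous updates.
        grads = shared_reward_grad(params)
        for i in range(n_agents):
            lr = fast_lr if i == fast_agent else slow_lr
            params[i] += lr * grads[i]
    return params


x, y = multi_timescale_train()
print(round(x, 4), round(y, 4), round(shared_reward([x, y]), 4))
```

In this concave toy problem, the multi-timescale, sequential (`slow_lr=0.0`), and independent (`slow_lr=0.1`) settings all reach the joint optimum (1, 1); the non-convergence of independent learning discussed in the abstract arises in harder settings that this sketch does not model.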
-
Continuous Coordination As a Realistic Scenario for Lifelong Learning
Authors:
Hadi Nekoei,
Akilesh Badrinaaraayanan,
Aaron Courville,
Sarath Chandar
Abstract:
Current deep reinforcement learning (RL) algorithms are still highly task-specific and lack the ability to generalize to new environments. Lifelong learning (LLL), however, aims to solve multiple tasks sequentially by efficiently transferring and using knowledge between tasks. Despite a surge of interest in lifelong RL in recent years, the lack of a realistic testbed makes robust evaluation of LLL algorithms difficult. Multi-agent RL (MARL), on the other hand, can be seen as a natural scenario for lifelong RL due to its inherent non-stationarity, since the agents' policies change over time. In this work, we introduce a multi-agent lifelong learning testbed that supports both zero-shot and few-shot settings. Our setup is based on Hanabi, a partially observable, fully cooperative multi-agent game that has been shown to be challenging for zero-shot coordination. Its large strategy space makes it a desirable environment for lifelong RL tasks. We evaluate several recent MARL methods and benchmark state-of-the-art LLL algorithms in limited memory and computation regimes to shed light on their strengths and weaknesses. This continual learning paradigm also provides a pragmatic way of going beyond centralized training, which is the most commonly used training protocol in MARL. We empirically show that the agents trained in our setup are able to coordinate well with unseen agents, without the additional assumptions made by previous works. The code and all pre-trained models are available at https://github.com/chandar-lab/Lifelong-Hanabi.
Submitted 14 June, 2021; v1 submitted 4 March, 2021;
originally announced March 2021.
-
PatchUp: A Feature-Space Block-Level Regularization Technique for Convolutional Neural Networks
Authors:
Mojtaba Faramarzi,
Mohammad Amini,
Akilesh Badrinaaraayanan,
Vikas Verma,
Sarath Chandar
Abstract:
Large-capacity deep learning models are often prone to a large generalization gap when trained with a limited amount of labeled training data. A recent class of methods addresses this problem by constructing new training samples through mixing a pair (or more) of training samples. We propose PatchUp, a hidden-state block-level regularization technique for Convolutional Neural Networks (CNNs) that is applied to selected contiguous blocks of feature maps from a random pair of samples. Our approach improves the robustness of CNN models against the manifold intrusion problem that may occur in other state-of-the-art mixing approaches. Moreover, since we mix contiguous blocks of features in the hidden space, which has more dimensions than the input space, we obtain more diverse samples for training towards different dimensions. Our experiments on CIFAR10/100, SVHN, Tiny-ImageNet, and ImageNet using ResNet architectures, including PreActResnet18/34, WRN-28-10, and ResNet101/152, show that PatchUp improves upon, or equals, the performance of current state-of-the-art regularizers for CNNs. We also show that PatchUp can provide better generalization to deformed samples and is more robust against adversarial attacks.
Submitted 7 January, 2023; v1 submitted 14 June, 2020;
originally announced June 2020.
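The block-level feature mixing described in the PatchUp abstract can be pictured with a simplified, hypothetical sketch. Feature maps are nested Python lists of shape [channels][height][width]; a single square block at a random location is copied from sample b into sample a across all channels (a "hard" swap), and the one-hot targets are mixed in proportion to the swapped area. The single-block mask, the mask shared across channels, the area-proportional label mixing, and the names `block_mask` and `patchup_hard` are assumptions for illustration. PatchUp's actual mask construction, any interpolating variant, and the choice of layer at which mixing is applied are not reproduced here.

```python
import random


def block_mask(height, width, block, rng):
    """Binary mask with one contiguous block x block square set to 1 (hypothetical helper)."""
    top = rng.randrange(height - block + 1)
    left = rng.randrange(width - block + 1)
    return [[1 if top <= r < top + block and left <= c < left + block else 0
             for c in range(width)] for r in range(height)]


def patchup_hard(feat_a, feat_b, y_a, y_b, block, rng):
    """Swap one feature-map block from sample b into sample a and mix the targets.

    feat_a, feat_b: hidden feature maps as [channels][height][width] nested lists.
    y_a, y_b: one-hot (or soft) target vectors of equal length.
    """
    channels, height, width = len(feat_a), len(feat_a[0]), len(feat_a[0][0])
    # The same spatial block is used for every channel.
    mask = block_mask(height, width, block, rng)
    mixed = [[[feat_b[ch][r][c] if mask[r][c] else feat_a[ch][r][c]
               for c in range(width)] for r in range(height)]
             for ch in range(channels)]
    # Fraction of spatial positions still coming from sample a.
    lam = 1.0 - (block * block) / (height * width)
    target = [lam * pa + (1.0 - lam) * pb for pa, pb in zip(y_a, y_b)]
    return mixed, target


rng = random.Random(0)
a = [[[0.0] * 4 for _ in range(4)] for _ in range(2)]
b = [[[1.0] * 4 for _ in range(4)] for _ in range(2)]
mixed, target = patchup_hard(a, b, [1.0, 0.0], [0.0, 1.0], block=2, rng=rng)
print(target)
```

Here a 2x2 block out of a 4x4 map is swapped, so 25% of each channel comes from sample b and the mixed target weights the two labels 0.75 and 0.25.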
-
Using Fluorescence Recovery After Photobleaching (FRAP) to study dynamics of the Structural Maintenance of Chromosome (SMC) complex in vivo
Authors:
Anjana Badrinarayanan,
Mark C. Leake
Abstract:
The SMC complex MukBEF is important for chromosome organization and segregation in Escherichia coli. Fluorescently tagged MukBEF forms distinct spots (or 'foci') in the cell, where it is thought to carry out most of its chromosome-associated activities. This chapter outlines Fluorescence Recovery After Photobleaching (FRAP) as a method to study the properties of YFP-tagged MukB in fluorescent foci. This method can provide important insight into the dynamics of MukB on DNA and can be used to study its biochemical properties in vivo.
Submitted 25 May, 2016;
originally announced May 2016.
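A FRAP experiment like the one described above is commonly summarized by normalizing the recovery curve of the bleached focus and fitting a simple kinetic model. The sketch below assumes full normalization F_norm(t) = (F(t) - F_post) / (F_pre - F_post), where F_pre is the pre-bleach intensity of the focus and F_post the first post-bleach intensity, and a single-exponential recovery F_norm(t) = A(1 - e^(-kt)), where A is read as the mobile fraction and t_1/2 = ln 2 / k as the recovery half-time. The model, the grid-search fit, and the names `normalize` and `fit_single_exponential` are assumptions for illustration, not the protocol given in the chapter. Real data would also need corrections, for example for acquisition photobleaching, which are omitted here, and the usage example uses synthetic values only.

```python
import math


def normalize(raw, pre_bleach, post_bleach):
    """Scale intensities so the first post-bleach frame is 0 and the pre-bleach level is 1."""
    return [(f - post_bleach) / (pre_bleach - post_bleach) for f in raw]


def fit_single_exponential(times, values, k_min=1e-2, k_max=10.0, n_grid=2000):
    """Least-squares fit of values ~ A * (1 - exp(-k t)).

    k is searched on a log-spaced grid; for each k the best amplitude A has a
    closed form (linear least squares on the basis 1 - exp(-k t)).
    Returns (mobile_fraction A, rate k, half-time ln2 / k).
    """
    best = None
    for i in range(n_grid):
        k = k_min * (k_max / k_min) ** (i / (n_grid - 1))
        basis = [1.0 - math.exp(-k * t) for t in times]
        denom = sum(b * b for b in basis)
        amp = sum(b * v for b, v in zip(basis, values)) / denom
        sse = sum((v - amp * b) ** 2 for b, v in zip(basis, values))
        if best is None or sse < best[0]:
            best = (sse, amp, k)
    _, amp, k = best
    return amp, k, math.log(2) / k


# Synthetic, noise-free recovery: 60% mobile fraction, k = 0.5 per second.
times = [0.5 * i for i in range(41)]
raw = [200.0 + 800.0 * 0.6 * (1.0 - math.exp(-0.5 * t)) for t in times]
mobile, k, t_half = fit_single_exponential(times, normalize(raw, 1000.0, 200.0))
print(round(mobile, 3), round(k, 3), round(t_half, 2))
```

Fixing k on a grid keeps the example dependency-free; with real, noisy measurements a nonlinear least-squares routine and an explicit immobile-fraction check would be the more usual choice.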