
Showing 1–7 of 7 results for author: Jafferjee, T

Searching in archive cs.
  1. arXiv:2302.05910

    cs.MA

    MANSA: Learning Fast and Slow in Multi-Agent Systems

    Authors: David Mguni, Haojun Chen, Taher Jafferjee, Jianhong Wang, Long Fei, Xidong Feng, Stephen McAleer, Feifei Tong, Jun Wang, Yaodong Yang

    Abstract: In multi-agent reinforcement learning (MARL), independent learning (IL) often shows remarkable performance and easily scales with the number of agents. Yet, using IL can be inefficient and runs the risk of failing to successfully train, particularly in scenarios that require agents to coordinate their actions. Using centralised learning (CL) enables MARL agents to quickly learn how to coordinate t…

    Submitted 4 June, 2023; v1 submitted 12 February, 2023; originally announced February 2023.

  2. arXiv:2209.01054

    cs.MA cs.LG

    Taming Multi-Agent Reinforcement Learning with Estimator Variance Reduction

    Authors: Taher Jafferjee, Juliusz Ziomek, Tianpei Yang, Zipeng Dai, Jianhong Wang, Matthew Taylor, Kun Shao, Jun Wang, David Mguni

    Abstract: Centralised training with decentralised execution (CT-DE) serves as the foundation of many leading multi-agent reinforcement learning (MARL) algorithms. Despite its popularity, it suffers from a critical drawback due to its reliance on learning from a single sample of the joint-action at a given state. As agents explore and update their policies during training, these single samples may poorly rep…

    Submitted 22 June, 2023; v1 submitted 2 September, 2022; originally announced September 2022.

  3. arXiv:2202.06558

    cs.LG cs.AI

    Saute RL: Almost Surely Safe Reinforcement Learning Using State Augmentation

    Authors: Aivar Sootla, Alexander I. Cowen-Rivers, Taher Jafferjee, Ziyan Wang, David Mguni, Jun Wang, Haitham Bou-Ammar

    Abstract: Satisfying safety constraints almost surely (or with probability one) can be critical for the deployment of Reinforcement Learning (RL) in real-life applications. For example, plane landing and take-off should ideally occur with probability one. We address the problem by introducing Safety Augmented (Saute) Markov Decision Processes (MDPs), where the safety constraints are eliminated by augmenting…

    Submitted 22 June, 2022; v1 submitted 14 February, 2022; originally announced February 2022.

    Comments: ICML 2022

  4. arXiv:2202.06557

    cs.LG cs.AI

    Reinforcement Learning in Presence of Discrete Markovian Context Evolution

    Authors: Hang Ren, Aivar Sootla, Taher Jafferjee, Junxiao Shen, Jun Wang, Haitham Bou-Ammar

    Abstract: We consider a context-dependent Reinforcement Learning (RL) setting, which is characterized by: a) an unknown finite number of not directly observable contexts; b) abrupt (discontinuous) context changes occurring during an episode; and c) Markovian context evolution. We argue that this challenging case is often met in applications and we tackle it using a Bayesian approach and variational inferenc…

    Submitted 14 February, 2022; originally announced February 2022.

    Comments: Accepted to ICLR 2022

  5. arXiv:2112.02618

    cs.MA

    LIGS: Learnable Intrinsic-Reward Generation Selection for Multi-Agent Learning

    Authors: David Henry Mguni, Taher Jafferjee, Jianhong Wang, Oliver Slumbers, Nicolas Perez-Nieves, Feifei Tong, Li Yang, Jiangcheng Zhu, Yaodong Yang, Jun Wang

    Abstract: Efficient exploration is important for reinforcement learners to achieve high rewards. In multi-agent systems, coordinated exploration and behaviour is critical for agents to jointly achieve optimal outcomes. In this paper, we introduce a new general framework for improving coordination and performance of multi-agent reinforcement learners (MARL). Our framework, named Learnable Intrinsic-Reward Ge…

    Submitted 16 March, 2022; v1 submitted 5 December, 2021; originally announced December 2021.

    Comments: arXiv admin note: text overlap with arXiv:2103.09159

  6. arXiv:2103.09159

    cs.LG cs.AI cs.GT

    Learning to Shape Rewards using a Game of Two Partners

    Authors: David Mguni, Taher Jafferjee, Jianhong Wang, Nicolas Perez-Nieves, Tianpei Yang, Matthew Taylor, Wenbin Song, Feifei Tong, Hui Chen, Jiangcheng Zhu, Jun Wang, Yaodong Yang

    Abstract: Reward shaping (RS) is a powerful method in reinforcement learning (RL) for overcoming the problem of sparse or uninformative rewards. However, RS typically relies on manually engineered shaping-reward functions whose construction is time-consuming and error-prone. It also requires domain knowledge which runs contrary to the goal of autonomous learning. We introduce Reinforcement Learning Optimisi…

    Submitted 6 February, 2023; v1 submitted 16 March, 2021; originally announced March 2021.

  7. arXiv:2006.04363

    cs.LG cs.AI stat.ML

    Hallucinating Value: A Pitfall of Dyna-style Planning with Imperfect Environment Models

    Authors: Taher Jafferjee, Ehsan Imani, Erin Talvitie, Martha White, Michael Bowling

    Abstract: Dyna-style reinforcement learning (RL) agents improve sample efficiency over model-free RL agents by updating the value function with simulated experience generated by an environment model. However, it is often difficult to learn accurate models of environment dynamics, and even small errors may result in failure of Dyna agents. In this paper, we investigate one type of model error: hallucinated s…

    Submitted 8 June, 2020; originally announced June 2020.

    Comments: 9 pages, 7 figures
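The Dyna loop summarized in the last abstract (update the value function from real experience, record transitions in an environment model, then keep updating from simulated experience drawn from that model) can be sketched as textbook tabular Dyna-Q. This is a minimal illustration under stated assumptions, not the paper's method or experimental setup: the `Chain` environment, the `dyna_q` function, and all hyperparameters are hypothetical. Here the model is an exact lookup table of observed transitions; the planning loop is the point where an imperfect learned model would feed erroneous simulated experience into `Q`.

```python
import random
from collections import defaultdict


class Chain:
    """Hypothetical toy environment (not from the paper): states 0..n-1 on a line.

    Action 0 moves left, action 1 moves right; reaching state n-1 gives
    reward 1 and ends the episode.
    """

    def __init__(self, n=6):
        self.n = n

    def reset(self):
        return 0

    def step(self, s, a):
        s2 = max(0, s - 1) if a == 0 else min(self.n - 1, s + 1)
        done = s2 == self.n - 1
        return s2, (1.0 if done else 0.0), done


def dyna_q(env, episodes=50, planning_steps=10, alpha=0.5, gamma=0.95, eps=0.1, seed=0):
    """Generic tabular Dyna-Q sketch; hyperparameters are illustrative assumptions."""
    rng = random.Random(seed)
    actions = (0, 1)
    Q = defaultdict(float)  # Q[(s, a)] -> action-value estimate, initialised to 0
    model = {}              # deterministic table model: (s, a) -> (r, s2, done)

    def greedy(s):
        # Break ties randomly so early episodes are not biased toward action 0.
        best = max(Q[(s, a)] for a in actions)
        return rng.choice([a for a in actions if Q[(s, a)] == best])

    def update(s, a, r, s2, done):
        # One-step Q-learning update; terminal transitions do not bootstrap.
        target = r if done else r + gamma * max(Q[(s2, b)] for b in actions)
        Q[(s, a)] += alpha * (target - Q[(s, a)])

    for _ in range(episodes):
        s, done = env.reset(), False
        while not done:
            a = rng.choice(actions) if rng.random() < eps else greedy(s)
            s2, r, done = env.step(s, a)
            update(s, a, r, s2, done)        # direct RL from real experience
            model[(s, a)] = (r, s2, done)    # record the transition in the model
            for _ in range(planning_steps):  # planning: replay simulated experience
                ps, pa = rng.choice(list(model))
                pr, ps2, pdone = model[(ps, pa)]
                update(ps, pa, pr, ps2, pdone)
            s = s2
    return Q


Q = dyna_q(Chain())
policy = {s: max((0, 1), key=lambda a: Q[(s, a)]) for s in range(5)}
```

With these settings the greedy `policy` should choose "right" (action 1) in every non-terminal state. Replacing the exact table lookup in the planning loop with a noisy or biased model is one way to observe how model errors can propagate into the value estimates.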