
Showing 1–25 of 25 results for author: Mguni, D

  1. arXiv:2402.12061  [pdf, other]

    cs.LG cs.AI cs.CL

    All Language Models Large and Small

    Authors: Zhixun Chen, Yali Du, David Mguni

    Abstract: Many leading language models (LMs) use high-intensity computational resources both during training and execution. This poses the challenge of lowering resource costs for deployment and faster execution of decision-making tasks among others. We introduce a novel plug-and-play LM framework named Language Optimising Network Distribution (LONDI) framework. LONDI learns to selectively employ large LMs…

    Submitted 5 June, 2024; v1 submitted 19 February, 2024; originally announced February 2024.

  2. arXiv:2312.11063  [pdf, ps, other]

    cs.GT cs.AI cs.DS cs.LG econ.TH

    A survey on algorithms for Nash equilibria in finite normal-form games

    Authors: Hanyu Li, Wenhan Huang, Zhijian Duan, David Henry Mguni, Kun Shao, Jun Wang, Xiaotie Deng

    Abstract: Nash equilibrium is one of the most influential solution concepts in game theory. With the development of computer science and artificial intelligence, there is an increasing demand on Nash equilibrium computation, especially for Internet economics and multi-agent learning. This paper reviews various algorithms computing the Nash equilibrium and its approximation solutions in finite normal-form ga…

    Submitted 18 December, 2023; originally announced December 2023.

    Comments: The published version is in Computer Science Review

  3. arXiv:2310.18127  [pdf, other]

    cs.LG cs.AI cs.CL

    Ask more, know better: Reinforce-Learned Prompt Questions for Decision Making with Large Language Models

    Authors: Xue Yan, Yan Song, Xinyu Cui, Filippos Christianos, Haifeng Zhang, David Henry Mguni, Jun Wang

    Abstract: Large language models (LLMs) demonstrate their promise in tackling complicated practical challenges by combining action-based policies with chain of thought (CoT) reasoning. Having high-quality prompts on hand, however, is vital to the framework's effectiveness. Currently, these prompts are handcrafted utilising extensive human labor, resulting in CoT policies that frequently fail to generalise. H…

    Submitted 28 February, 2024; v1 submitted 27 October, 2023; originally announced October 2023.

  4. arXiv:2306.09200  [pdf, other]

    cs.LG cs.AI

    ChessGPT: Bridging Policy Learning and Language Modeling

    Authors: Xidong Feng, Yicheng Luo, Ziyan Wang, Hongrui Tang, Mengyue Yang, Kun Shao, David Mguni, Yali Du, Jun Wang

    Abstract: When solving decision-making tasks, humans typically depend on information from two key sources: (1) Historical policy data, which provides interaction replay from the environment, and (2) Analytical insights in natural language form, exposing the invaluable thought process or strategic considerations. Despite this, the majority of preceding research focuses on only one source: they either use his…

    Submitted 21 December, 2023; v1 submitted 15 June, 2023; originally announced June 2023.

    Comments: Published as a conference article in NeurIPS 2023

  5. arXiv:2302.05910  [pdf, other]

    cs.MA

    MANSA: Learning Fast and Slow in Multi-Agent Systems

    Authors: David Mguni, Haojun Chen, Taher Jafferjee, Jianhong Wang, Long Fei, Xidong Feng, Stephen McAleer, Feifei Tong, Jun Wang, Yaodong Yang

    Abstract: In multi-agent reinforcement learning (MARL), independent learning (IL) often shows remarkable performance and easily scales with the number of agents. Yet, using IL can be inefficient and runs the risk of failing to successfully train, particularly in scenarios that require agents to coordinate their actions. Using centralised learning (CL) enables MARL agents to quickly learn how to coordinate t…

    Submitted 4 June, 2023; v1 submitted 12 February, 2023; originally announced February 2023.

  6. arXiv:2302.03439  [pdf, other]

    cs.MA cs.LG

    Ensemble Value Functions for Efficient Exploration in Multi-Agent Reinforcement Learning

    Authors: Lukas Schäfer, Oliver Slumbers, Stephen McAleer, Yali Du, Stefano V. Albrecht, David Mguni

    Abstract: Existing value-based algorithms for cooperative multi-agent reinforcement learning (MARL) commonly rely on random exploration, such as $ε$-greedy, to explore the environment. However, such exploration is inefficient at finding effective joint actions in states that require cooperation of multiple agents. In this work, we propose ensemble value functions for multi-agent exploration (EMAX), a genera…

    Submitted 16 April, 2024; v1 submitted 7 February, 2023; originally announced February 2023.

    Comments: Preprint. Previously presented at the Adaptive and Learning Agents Workshop (ALA) at the AAMAS conference 2023

  7. arXiv:2209.01054  [pdf, other]

    cs.MA cs.LG

    Taming Multi-Agent Reinforcement Learning with Estimator Variance Reduction

    Authors: Taher Jafferjee, Juliusz Ziomek, Tianpei Yang, Zipeng Dai, Jianhong Wang, Matthew Taylor, Kun Shao, Jun Wang, David Mguni

    Abstract: Centralised training with decentralised execution (CT-DE) serves as the foundation of many leading multi-agent reinforcement learning (MARL) algorithms. Despite its popularity, it suffers from a critical drawback due to its reliance on learning from a single sample of the joint-action at a given state. As agents explore and update their policies during training, these single samples may poorly rep…

    Submitted 22 June, 2023; v1 submitted 2 September, 2022; originally announced September 2022.

  8. arXiv:2205.15953  [pdf, other]

    cs.LG

    Timing is Everything: Learning to Act Selectively with Costly Actions and Budgetary Constraints

    Authors: David Mguni, Aivar Sootla, Juliusz Ziomek, Oliver Slumbers, Zipeng Dai, Kun Shao, Jun Wang

    Abstract: Many real-world settings involve costs for performing actions; transaction costs in financial systems and fuel costs being common examples. In these settings, performing actions at each time step quickly accumulates costs leading to vastly suboptimal outcomes. Additionally, repeatedly acting produces wear and tear and ultimately, damage. Determining \textit{when to act} is crucial for achieving su…

    Submitted 4 June, 2023; v1 submitted 31 May, 2022; originally announced May 2022.

  9. arXiv:2205.15434  [pdf, other]

    cs.LG cs.AI cs.GT cs.MA

    A Game-Theoretic Framework for Managing Risk in Multi-Agent Systems

    Authors: Oliver Slumbers, David Henry Mguni, Stephen Marcus McAleer, Stefano B. Blumberg, Jun Wang, Yaodong Yang

    Abstract: In order for agents in multi-agent systems (MAS) to be safe, they need to take into account the risks posed by the actions of other agents. However, the dominant paradigm in game theory (GT) assumes that agents are not affected by risk from other agents and only strive to maximise their expected utility. For example, in hybrid human-AI driving systems, it is necessary to limit large deviations in…

    Submitted 2 March, 2023; v1 submitted 30 May, 2022; originally announced May 2022.

  10. arXiv:2205.15064  [pdf, other]

    cs.LG

    SEREN: Knowing When to Explore and When to Exploit

    Authors: Changmin Yu, David Mguni, Dong Li, Aivar Sootla, Jun Wang, Neil Burgess

    Abstract: Efficient reinforcement learning (RL) involves a trade-off between "exploitative" actions that maximise expected reward and "explorative" ones that sample unvisited states. To encourage exploration, recent approaches proposed adding stochasticity to actions, separating exploration and exploitation phases, or equating reduction in uncertainty with reward. However, these techniques do not necessari…

    Submitted 30 May, 2022; originally announced May 2022.

    Comments: arXiv admin note: text overlap with arXiv:2112.02618, arXiv:2103.09159, arXiv:2110.14468

  11. arXiv:2205.01469  [pdf, other]

    cs.GT cs.AI cs.LG cs.MA

    On the Convergence of Fictitious Play: A Decomposition Approach

    Authors: Yurong Chen, Xiaotie Deng, Chenchen Li, David Mguni, Jun Wang, Xiang Yan, Yaodong Yang

    Abstract: Fictitious play (FP) is one of the most fundamental game-theoretical learning frameworks for computing Nash equilibrium in $n$-player games, which builds the foundation for modern multi-agent learning algorithms. Although FP has provable convergence guarantees on zero-sum games and potential games, many real-world problems are often a mixture of both and the convergence property of FP has not been…

    Submitted 3 May, 2022; originally announced May 2022.

  12. arXiv:2202.06558  [pdf, other]

    cs.LG cs.AI

    Saute RL: Almost Surely Safe Reinforcement Learning Using State Augmentation

    Authors: Aivar Sootla, Alexander I. Cowen-Rivers, Taher Jafferjee, Ziyan Wang, David Mguni, Jun Wang, Haitham Bou-Ammar

    Abstract: Satisfying safety constraints almost surely (or with probability one) can be critical for the deployment of Reinforcement Learning (RL) in real-life applications. For example, plane landing and take-off should ideally occur with probability one. We address the problem by introducing Safety Augmented (Saute) Markov Decision Processes (MDPs), where the safety constraints are eliminated by augmenting…

    Submitted 22 June, 2022; v1 submitted 14 February, 2022; originally announced February 2022.

    Comments: ICML 2022

  13. arXiv:2112.02618  [pdf, other]

    cs.MA

    LIGS: Learnable Intrinsic-Reward Generation Selection for Multi-Agent Learning

    Authors: David Henry Mguni, Taher Jafferjee, Jianhong Wang, Oliver Slumbers, Nicolas Perez-Nieves, Feifei Tong, Li Yang, Jiangcheng Zhu, Yaodong Yang, Jun Wang

    Abstract: Efficient exploration is important for reinforcement learners to achieve high rewards. In multi-agent systems, coordinated exploration and behaviour is critical for agents to jointly achieve optimal outcomes. In this paper, we introduce a new general framework for improving coordination and performance of multi-agent reinforcement learners (MARL). Our framework, named Learnable Intrinsic-Reward Ge…

    Submitted 16 March, 2022; v1 submitted 5 December, 2021; originally announced December 2021.

    Comments: arXiv admin note: text overlap with arXiv:2103.09159

  14. arXiv:2110.14468  [pdf, other]

    cs.LG

    DESTA: A Framework for Safe Reinforcement Learning with Markov Games of Intervention

    Authors: David Mguni, Usman Islam, Yaqi Sun, Xiuling Zhang, Joel Jennings, Aivar Sootla, Changmin Yu, Ziyan Wang, Jun Wang, Yaodong Yang

    Abstract: Reinforcement learning (RL) involves performing exploratory actions in an unknown system. This can place a learning agent in dangerous and potentially catastrophic system states. Current approaches for tackling safe learning in RL simultaneously trade-off safe exploration and task fulfillment. In this paper, we introduce a new generation of RL solvers that learn to minimise safety violations while…

    Submitted 1 March, 2023; v1 submitted 27 October, 2021; originally announced October 2021.

    Comments: arXiv admin note: text overlap with arXiv:2103.09159

  15. arXiv:2110.03604  [pdf, ps, other]

    cs.LG cs.AI cs.GT cs.MA

    Online Markov Decision Processes with Non-oblivious Strategic Adversary

    Authors: Le Cong Dinh, David Henry Mguni, Long Tran-Thanh, Jun Wang, Yaodong Yang

    Abstract: We study a novel setting in Online Markov Decision Processes (OMDPs) where the loss function is chosen by a non-oblivious strategic adversary who follows a no-external regret algorithm. In this setting, we first demonstrate that MDP-Expert, an existing algorithm that works well with oblivious adversaries can still apply and achieve a policy regret bound of…

    Submitted 27 January, 2023; v1 submitted 7 October, 2021; originally announced October 2021.

    Comments: Accepted at Autonomous Agents and Multi-Agent Systems (2023)

    Report number: 15

  16. arXiv:2109.01795  [pdf, ps, other]

    cs.GT cs.CC cs.LG cs.MA

    On the Complexity of Computing Markov Perfect Equilibrium in General-Sum Stochastic Games

    Authors: Xiaotie Deng, Ningyuan Li, David Mguni, Jun Wang, Yaodong Yang

    Abstract: Similar to the role of Markov decision processes in reinforcement learning, Stochastic Games (SGs) lay the foundation for the study of multi-agent reinforcement learning (MARL) and sequential agent interactions. In this paper, we derive that computing an approximate Markov Perfect Equilibrium (MPE) in a finite-state discounted Stochastic Game within the exponential precision is \textbf{PPAD}-compl…

    Submitted 11 January, 2023; v1 submitted 4 September, 2021; originally announced September 2021.

    Comments: This paper has been fully published at National Science Review. Please refer to https://doi.org/10.1093/nsr/nwac256 for the official version

  17. arXiv:2108.08612  [pdf, other]

    cs.LG cs.AI cs.MA

    Settling the Variance of Multi-Agent Policy Gradients

    Authors: Jakub Grudzien Kuba, Muning Wen, Yaodong Yang, Linghui Meng, Shangding Gu, Haifeng Zhang, David Henry Mguni, Jun Wang

    Abstract: Policy gradient (PG) methods are popular reinforcement learning (RL) methods where a baseline is often applied to reduce the variance of gradient estimates. In multi-agent RL (MARL), although the PG theorem can be naturally extended, the effectiveness of multi-agent PG (MAPG) methods degrades as the variance of gradient estimates increases rapidly with the number of agents. In this paper, we offer…

    Submitted 4 April, 2022; v1 submitted 19 August, 2021; originally announced August 2021.

  18. arXiv:2103.09284  [pdf, other]

    cs.MA

    Learning in Nonzero-Sum Stochastic Games with Potentials

    Authors: David Mguni, Yutong Wu, Yali Du, Yaodong Yang, Ziyi Wang, Minne Li, Ying Wen, Joel Jennings, Jun Wang

    Abstract: Multi-agent reinforcement learning (MARL) has become effective in tackling discrete cooperative game scenarios. However, MARL has yet to penetrate settings beyond those modelled by team and zero-sum games, confining it to a small subset of multi-agent systems. In this paper, we introduce a new generation of MARL learners that can handle nonzero-sum payoff structures and continuous settings. In par…

    Submitted 15 June, 2021; v1 submitted 16 March, 2021; originally announced March 2021.

    Comments: ICML 2021

  19. arXiv:2103.09159  [pdf, other]

    cs.LG cs.AI cs.GT

    Learning to Shape Rewards using a Game of Two Partners

    Authors: David Mguni, Taher Jafferjee, Jianhong Wang, Nicolas Perez-Nieves, Tianpei Yang, Matthew Taylor, Wenbin Song, Feifei Tong, Hui Chen, Jiangcheng Zhu, Jun Wang, Yaodong Yang

    Abstract: Reward shaping (RS) is a powerful method in reinforcement learning (RL) for overcoming the problem of sparse or uninformative rewards. However, RS typically relies on manually engineered shaping-reward functions whose construction is time-consuming and error-prone. It also requires domain knowledge which runs contrary to the goal of autonomous learning. We introduce Reinforcement Learning Optimisi…

    Submitted 6 February, 2023; v1 submitted 16 March, 2021; originally announced March 2021.

  20. arXiv:2103.07927  [pdf, other]

    cs.AI cs.GT cs.MA

    Modelling Behavioural Diversity for Learning in Open-Ended Games

    Authors: Nicolas Perez Nieves, Yaodong Yang, Oliver Slumbers, David Henry Mguni, Ying Wen, Jun Wang

    Abstract: Promoting behavioural diversity is critical for solving games with non-transitive dynamics where strategic cycles exist, and there is no consistent winner (e.g., Rock-Paper-Scissors). Yet, there is a lack of rigorous treatment for defining diversity and constructing diversity-aware learning dynamics. In this work, we offer a geometric interpretation of behavioural diversity in games and introduce…

    Submitted 10 June, 2021; v1 submitted 14 March, 2021; originally announced March 2021.

    Comments: corresponds to <[email protected]>

  21. arXiv:2103.07780  [pdf, other]

    cs.AI cs.GT

    Online Double Oracle

    Authors: Le Cong Dinh, Yaodong Yang, Stephen McAleer, Zheng Tian, Nicolas Perez Nieves, Oliver Slumbers, David Henry Mguni, Haitham Bou Ammar, Jun Wang

    Abstract: Solving strategic games with huge action space is a critical yet under-explored topic in economics, operations research and artificial intelligence. This paper proposes new learning algorithms for solving two-player zero-sum normal-form games where the number of pure strategies is prohibitively large. Specifically, we combine no-regret analysis from online learning with Double Oracle (DO) methods…

    Submitted 15 February, 2023; v1 submitted 13 March, 2021; originally announced March 2021.

    Comments: Accepted at Transactions on Machine Learning Research (TMLR)

    Journal ref: Transactions on Machine Learning Research 2022

  22. arXiv:2006.01482  [pdf, other]

    cs.LG cs.MA

    Multi-Agent Determinantal Q-Learning

    Authors: Yaodong Yang, Ying Wen, Liheng Chen, Jun Wang, Kun Shao, David Mguni, Weinan Zhang

    Abstract: Centralized training with decentralized execution has become an important paradigm in multi-agent learning. Though practical, current methods rely on restrictive assumptions to decompose the centralized value function across agents for execution. In this paper, we eliminate this restriction by proposing multi-agent determinantal Q-learning. Our method is established on Q-DPP, an extension of deter…

    Submitted 9 June, 2020; v1 submitted 2 June, 2020; originally announced June 2020.

    Comments: ICML 2020

  23. arXiv:2005.13527   

    math.OC cs.MA

    Stochastic Potential Games

    Authors: David Mguni

    Abstract: Computing the Nash equilibrium (NE) for N-player non-zerosum stochastic games is a formidable challenge. Currently, algorithmic methods in stochastic game theory are unable to compute NE for stochastic games (SGs) for settings in all but extreme cases in which the players either play as a team or have diametrically opposed objectives in a two-player setting. This greatly impedes the application of…

    Submitted 24 March, 2021; v1 submitted 27 May, 2020; originally announced May 2020.

    Comments: The submission contains an overlap with and has been superseded by arXiv:2103.09284

  24. arXiv:1901.10923  [pdf, other]

    cs.MA cs.GT

    Coordinating the Crowd: Inducing Desirable Equilibria in Non-Cooperative Systems

    Authors: David Mguni, Joel Jennings, Sergio Valcarcel Macua, Emilio Sison, Sofia Ceppi, Enrique Munoz de Cote

    Abstract: Many real-world systems such as taxi systems, traffic networks and smart grids involve self-interested actors that perform individual tasks in a shared environment. However, in such systems, the self-interested behaviour of agents produces welfare inefficient and globally suboptimal outcomes that are detrimental to all - some common examples are congestion in traffic networks, demand spikes for re…

    Submitted 30 January, 2019; originally announced January 2019.

  25. arXiv:1803.05028  [pdf, other]

    cs.MA

    Decentralised Learning in Systems with Many, Many Strategic Agents

    Authors: David Mguni, Joel Jennings, Enrique Munoz de Cote

    Abstract: Although multi-agent reinforcement learning can tackle systems of strategically interacting entities, it currently fails in scalability and lacks rigorous convergence guarantees. Crucially, learning in multi-agent systems can become intractable due to the explosion in the size of the state-action space as the number of agents increases. In this paper, we propose a method for computing closed-loop…

    Submitted 13 March, 2018; originally announced March 2018.