Search | arXiv e-print repository

arXiv:2406.06500 [pdf, ps, other]

Adaptive Opponent Policy Detection in Multi-Agent MDPs: Real-Time Strategy Switch Identification Using Running Error Estimation

Authors: Mohidul Haque Mridul, Mohammad Foysal Khan, Redwan Ahmed Rizvee, Md Mosaddek Khan

Abstract: In Multi-agent Reinforcement Learning (MARL), accurately perceiving opponents' strategies is essential for both cooperative and adversarial contexts, particularly within dynamic environments. While Proximal Policy Optimization (PPO) and related algorithms such as Actor-Critic with Experience Replay (ACER), Trust Region Policy Optimization (TRPO), and Deep Deterministic Policy Gradient (DDPG) perfo… ▽ More In Multi-agent Reinforcement Learning (MARL), accurately perceiving opponents' strategies is essential for both cooperative and adversarial contexts, particularly within dynamic environments. While Proximal Policy Optimization (PPO) and related algorithms such as Actor-Critic with Experience Replay (ACER), Trust Region Policy Optimization (TRPO), and Deep Deterministic Policy Gradient (DDPG) perform well in single-agent, stationary environments, they suffer from high variance in MARL due to non-stationary and hidden policies of opponents, leading to diminished reward performance. Additionally, existing methods in MARL face significant challenges, including the need for inter-agent communication, reliance on explicit reward information, high computational demands, and sampling inefficiencies. These issues render them less effective in continuous environments where opponents may abruptly change their policies without prior notice. Against this background, we present OPS-DeMo (Online Policy Switch-Detection Model), an online algorithm that employs dynamic error decay to detect changes in opponents' policies. OPS-DeMo continuously updates its beliefs using an Assumed Opponent Policy (AOP) Bank and selects corresponding responses from a pre-trained Response Policy Bank. Each response policy is trained against consistently strategizing opponents, reducing training uncertainty and enabling the effective use of algorithms like PPO in multi-agent environments. Comparative assessments show that our approach outperforms PPO-trained models in dynamic scenarios like the Predator-Prey setting, providing greater robustness to sudden policy shifts and enabling more informed decision-making through precise opponent policy insights. △ Less

Submitted 10 June, 2024; originally announced June 2024.

arXiv:2311.16277 [pdf, other]

A Graph Neural Network-Based QUBO-Formulated Hamiltonian-Inspired Loss Function for Combinatorial Optimization using Reinforcement Learning

Authors: Redwan Ahmed Rizvee, Raheeb Hassan, Md. Mosaddek Khan

Abstract: Quadratic Unconstrained Binary Optimization (QUBO) is a generic technique to model various NP-hard Combinatorial Optimization problems (CO) in the form of binary variables. Ising Hamiltonian is used to model the energy function of a system. QUBO to Ising Hamiltonian is regarded as a technique to solve various canonical optimization problems through quantum optimization algorithms. Recently, PI-GNN… ▽ More Quadratic Unconstrained Binary Optimization (QUBO) is a generic technique to model various NP-hard Combinatorial Optimization problems (CO) in the form of binary variables. Ising Hamiltonian is used to model the energy function of a system. QUBO to Ising Hamiltonian is regarded as a technique to solve various canonical optimization problems through quantum optimization algorithms. Recently, PI-GNN, a generic framework, has been proposed to address CO problems over graphs based on Graph Neural Network (GNN) architecture. They introduced a generic QUBO-formulated Hamiltonian-inspired loss function that was directly optimized using GNN. PI-GNN is highly scalable but there lies a noticeable decrease in the number of satisfied constraints when compared to problem-specific algorithms and becomes more pronounced with increased graph densities. Here, We identify a behavioral pattern related to it and devise strategies to improve its performance. Another group of literature uses Reinforcement learning (RL) to solve the aforementioned NP-hard problems using problem-specific reward functions. In this work, we also focus on creating a bridge between the RL-based solutions and the QUBO-formulated Hamiltonian. We formulate and empirically evaluate the compatibility of the QUBO-formulated Hamiltonian as the generic reward function in the RL-based paradigm in the form of rewards. Furthermore, we also introduce a novel Monty Carlo Tree Search-based strategy with GNN where we apply a guided search through manual perturbation of node labels during training. We empirically evaluated our methods and observed up to 44% improvement in the number of constraint violations compared to the PI-GNN. △ Less

Submitted 27 November, 2023; originally announced November 2023.

arXiv:2308.13978 [pdf, other]

A Graph Neural Network-Based QUBO-Formulated Hamiltonian-Inspired Loss Function for Combinatorial Optimization using Reinforcement Learning

Authors: Redwan Ahmed Rizvee, Md. Mosaddek Khan

Abstract: Quadratic Unconstrained Binary Optimization (QUBO) is a generic technique to model various NP-hard combinatorial optimization problems in the form of binary variables. The Hamiltonian function is often used to formulate QUBO problems where it is used as the objective function in the context of optimization. Recently, PI-GNN, a generic scalable framework, has been proposed to address the Combinator… ▽ More Quadratic Unconstrained Binary Optimization (QUBO) is a generic technique to model various NP-hard combinatorial optimization problems in the form of binary variables. The Hamiltonian function is often used to formulate QUBO problems where it is used as the objective function in the context of optimization. Recently, PI-GNN, a generic scalable framework, has been proposed to address the Combinatorial Optimization (CO) problems over graphs based on a simple Graph Neural Network (GNN) architecture. Their novel contribution was a generic QUBO-formulated Hamiltonian-inspired loss function that was optimized using GNN. In this study, we address a crucial issue related to the aforementioned setup especially observed in denser graphs. The reinforcement learning-based paradigm has also been widely used to address numerous CO problems. Here we also formulate and empirically evaluate the compatibility of the QUBO-formulated Hamiltonian as the generic reward function in the Reinforcement Learning paradigm to directly integrate the actual node projection status during training as the form of rewards. In our experiments, we observed up to 44% improvement in the RL-based setup compared to the PI-GNN algorithm. Our implementation can be found in https://github.com/rizveeredwan/learning-graph-structure. △ Less

Submitted 28 September, 2023; v1 submitted 26 August, 2023; originally announced August 2023.

arXiv:2308.04453 [pdf, other]

Towards Immutability: A Secure and Efficient Auditing Framework for Cloud Supporting Data Integrity and File Version Control

Authors: Faisal Haque Bappy, Saklain Zaman, Tariqul Islam, Redwan Ahmed Rizvee, Joon S. Park, Kamrul Hasan

Abstract: Although wide-scale integration of cloud services with myriad applications increases quality of services (QoS) for enterprise users, verifying the existence and manipulation of stored cloud information remains an open research problem. Decentralized blockchain-based solutions are becoming more appealing for cloud auditing environments because of the immutable nature of blockchain. However, the dec… ▽ More Although wide-scale integration of cloud services with myriad applications increases quality of services (QoS) for enterprise users, verifying the existence and manipulation of stored cloud information remains an open research problem. Decentralized blockchain-based solutions are becoming more appealing for cloud auditing environments because of the immutable nature of blockchain. However, the decentralized structure of blockchain results in considerable synchronization and communication overhead, which increases maintenance costs for cloud service providers (CSP). This paper proposes a Merkle Hash Tree based architecture named Entangled Merkle Forest to support version control and dynamic auditing of information in centralized cloud environments. We utilized a semi-trusted third-party auditor to conduct the auditing tasks with minimal privacy-preserving file metadata. To the best of our knowledge, we are the first to design a node sharing Merkle Forest to offer a cost-effective auditing framework for centralized cloud infrastructures while achieving the immutable feature of blockchain, mitigating the synchronization and performance challenges of the decentralized architectures. Our proposed scheme outperforms it's equivalent Blockchain-based schemes by ensuring time and storage efficiency with minimum overhead as evidenced by performance analysis. △ Less

Submitted 4 August, 2023; originally announced August 2023.

Showing 1–4 of 4 results for author: Rizvee, R A