Showing 1–19 of 19 results for author: Mondal, W U

  1. arXiv:2406.11481  [pdf, other]

    cs.LG cs.AI

    Constrained Reinforcement Learning with Average Reward Objective: Model-Based and Model-Free Algorithms

    Authors: Vaneet Aggarwal, Washim Uddin Mondal, Qinbo Bai

    Abstract: Reinforcement Learning (RL) serves as a versatile framework for sequential decision-making, finding applications across diverse domains such as robotics, autonomous driving, recommendation systems, supply chain optimization, biology, mechanics, and finance. The primary objective in these applications is to maximize the average reward. Real-world scenarios often necessitate adherence to specific co…

    Submitted 21 June, 2024; v1 submitted 17 June, 2024; originally announced June 2024.

    Comments: arXiv admin note: text overlap with arXiv:2402.02042; text overlap with arXiv:2202.00150 by other authors

  2. arXiv:2405.10624  [pdf, ps, other]

    cs.LG cs.AI

    Sample-Efficient Constrained Reinforcement Learning with General Parameterization

    Authors: Washim Uddin Mondal, Vaneet Aggarwal

    Abstract: We consider a constrained Markov Decision Problem (CMDP) where the goal of an agent is to maximize the expected discounted sum of rewards over an infinite horizon while ensuring that the expected discounted sum of costs exceeds a certain threshold. Building on the idea of momentum-based acceleration, we develop the Primal-Dual Accelerated Natural Policy Gradient (PD-ANPG) algorithm that guarantees…

    Submitted 17 May, 2024; originally announced May 2024.

  3. arXiv:2404.02108  [pdf, ps, other]

    cs.LG

    Variance-Reduced Policy Gradient Approaches for Infinite Horizon Average Reward Markov Decision Processes

    Authors: Swetha Ganesh, Washim Uddin Mondal, Vaneet Aggarwal

    Abstract: We present two Policy Gradient-based methods with general parameterization in the context of infinite horizon average reward Markov Decision Processes. The first approach employs Implicit Gradient Transport for variance reduction, ensuring an expected regret of the order $\tilde{\mathcal{O}}(T^{3/5})$. The second approach, rooted in Hessian-based techniques, ensures an expected regret of the order…

    Submitted 2 April, 2024; originally announced April 2024.

    Comments: 34 pages

  4. arXiv:2402.06901  [pdf, other]

    cs.NI

    Near-perfect Coverage Manifold Estimation in Cellular Networks via conditional GAN

    Authors: Washim Uddin Mondal, Veni Goyal, Satish V. Ukkusuri, Goutam Das, Di Wang, Mohamed-Slim Alouini, Vaneet Aggarwal

    Abstract: This paper presents a conditional generative adversarial network (cGAN) that translates base station location (BSL) information of any Region-of-Interest (RoI) to location-dependent coverage probability values within a subset of that region, called the region-of-evaluation (RoE). We train our network utilizing the BSL data of India, the USA, Germany, and Brazil. In comparison to the state-of-the-a…

    Submitted 10 February, 2024; originally announced February 2024.

    Journal ref: IEEE Networking Letters, 2024

  5. arXiv:2402.02042  [pdf, ps, other]

    cs.LG cs.AI

    Learning General Parameterized Policies for Infinite Horizon Average Reward Constrained MDPs via Primal-Dual Policy Gradient Algorithm

    Authors: Qinbo Bai, Washim Uddin Mondal, Vaneet Aggarwal

    Abstract: This paper explores the realm of infinite horizon average reward Constrained Markov Decision Processes (CMDP). To the best of our knowledge, this work is the first to delve into the regret and constraint violation analysis of average reward CMDPs with a general policy parametrization. To address this challenge, we propose a primal dual based policy gradient algorithm that adeptly manages the const…

    Submitted 3 March, 2024; v1 submitted 3 February, 2024; originally announced February 2024.

    Comments: We fixed Lemma 6 in v2 which changed the final result

  6. arXiv:2312.01826  [pdf, other]

    cs.NI

    Terrain-based Coverage Manifold Estimation: Machine Learning, Stochastic Geometry, or Simulation?

    Authors: Ruibo Wang, Washim Uddin Mondal, Mustafa A. Kishk, Vaneet Aggarwal, Mohamed-Slim Alouini

    Abstract: Given the necessity of connecting the unconnected, covering blind spots has emerged as a critical task in the next-generation wireless communication network. A direct solution involves obtaining a coverage manifold that visually showcases network coverage performance at each position. Our goal is to devise different methods that minimize the absolute error between the estimated coverage manifold a…

    Submitted 11 December, 2023; v1 submitted 4 December, 2023; originally announced December 2023.

  7. arXiv:2310.11677  [pdf, ps, other]

    cs.LG cs.AI

    Improved Sample Complexity Analysis of Natural Policy Gradient Algorithm with General Parameterization for Infinite Horizon Discounted Reward Markov Decision Processes

    Authors: Washim Uddin Mondal, Vaneet Aggarwal

    Abstract: We consider the problem of designing sample efficient learning algorithms for infinite horizon discounted reward Markov Decision Process. Specifically, we propose the Accelerated Natural Policy Gradient (ANPG) algorithm that utilizes an accelerated stochastic gradient descent process to obtain the natural policy gradient. ANPG achieves $\mathcal{O}({ε^{-2}})$ sample complexity and…

    Submitted 5 February, 2024; v1 submitted 17 October, 2023; originally announced October 2023.

    Journal ref: AISTATS 2024

  8. arXiv:2309.01922  [pdf, ps, other]

    cs.LG cs.AI

    Regret Analysis of Policy Gradient Algorithm for Infinite Horizon Average Reward Markov Decision Processes

    Authors: Qinbo Bai, Washim Uddin Mondal, Vaneet Aggarwal

    Abstract: In this paper, we consider an infinite horizon average reward Markov Decision Process (MDP). Distinguishing itself from existing works within this context, our approach harnesses the power of the general policy gradient-based algorithm, liberating it from the constraints of assuming a linear MDP structure. We propose a policy gradient-based algorithm and show its global convergence property. We th…

    Submitted 2 February, 2024; v1 submitted 4 September, 2023; originally announced September 2023.

    Journal ref: AAAI 2024

  9. arXiv:2307.11464  [pdf, other]

    cs.CY

    Supporting Post-disaster Recovery with Agent-based Modeling in Multilayer Socio-physical Networks

    Authors: Jiawei Xue, Sangung Park, Washim Uddin Mondal, Sandro Martinelli Reia, Tong Yao, Satish V. Ukkusuri

    Abstract: The examination of post-disaster recovery (PDR) in a socio-physical system enables us to elucidate the complex relationships between humans and infrastructures. Although existing studies have identified many patterns in the PDR process, they fall short of describing how individual recoveries contribute to the overall recovery of the system. To enhance the understanding of individual return behavio…

    Submitted 21 July, 2023; originally announced July 2023.

    Comments: 28 pages, 10 figures

  10. arXiv:2305.05163  [pdf, other]

    q-bio.PE cs.AI cs.LG

    Cooperating Graph Neural Networks with Deep Reinforcement Learning for Vaccine Prioritization

    Authors: Lu Ling, Washim Uddin Mondal, Satish V. Ukkusuri

    Abstract: This study explores the vaccine prioritization strategy to reduce the overall burden of the pandemic when the supply is limited. Existing methods conduct macro-level or simplified micro-level vaccine distribution by assuming the homogeneous behavior within subgroup populations and lacking mobility dynamics integration. Directly applying these models for micro-level vaccine allocation leads to sub-…

    Submitted 9 May, 2023; originally announced May 2023.

  11. arXiv:2305.02527  [pdf, ps, other]

    cs.LG cs.AI

    Reinforcement Learning with Delayed, Composite, and Partially Anonymous Reward

    Authors: Washim Uddin Mondal, Vaneet Aggarwal

    Abstract: We investigate an infinite-horizon average reward Markov Decision Process (MDP) with delayed, composite, and partially anonymous reward feedback. The delay and compositeness of rewards mean that rewards generated as a result of taking an action at a given state are fragmented into different components, and they are sequentially realized at delayed time instances. The partial anonymity attribute im…

    Submitted 28 August, 2023; v1 submitted 3 May, 2023; originally announced May 2023.

  12. arXiv:2301.06889  [pdf, other]

    cs.LG cs.AI cs.MA

    Mean-Field Control based Approximation of Multi-Agent Reinforcement Learning in Presence of a Non-decomposable Shared Global State

    Authors: Washim Uddin Mondal, Vaneet Aggarwal, Satish V. Ukkusuri

    Abstract: Mean Field Control (MFC) is a powerful approximation tool to solve large-scale Multi-Agent Reinforcement Learning (MARL) problems. However, the success of MFC relies on the presumption that given the local states and actions of all the agents, the next (local) states of the agents evolve conditionally independent of each other. Here we demonstrate that even in a MARL setting where agents share a c…

    Submitted 26 May, 2023; v1 submitted 13 January, 2023; originally announced January 2023.

    Journal ref: Transactions on Machine Learning Research, May 2023

  13. arXiv:2209.07437  [pdf, other]

    cs.LG cs.MA

    Mean-Field Approximation of Cooperative Constrained Multi-Agent Reinforcement Learning (CMARL)

    Authors: Washim Uddin Mondal, Vaneet Aggarwal, Satish V. Ukkusuri

    Abstract: Mean-Field Control (MFC) has recently been proven to be a scalable tool to approximately solve large-scale multi-agent reinforcement learning (MARL) problems. However, these studies are typically limited to unconstrained cumulative reward maximization framework. In this paper, we show that one can use the MFC approach to approximate the MARL problem even in the presence of constraints. Specificall…

    Submitted 15 September, 2022; originally announced September 2022.

  14. arXiv:2209.03491  [pdf, other]

    cs.LG cs.MA

    On the Near-Optimality of Local Policies in Large Cooperative Multi-Agent Reinforcement Learning

    Authors: Washim Uddin Mondal, Vaneet Aggarwal, Satish V. Ukkusuri

    Abstract: We show that in a cooperative $N$-agent network, one can design locally executable policies for the agents such that the resulting discounted sum of average rewards (value) well approximates the optimal value computed over all (including non-local) policies. Specifically, we prove that, if $|\mathcal{X}|, |\mathcal{U}|$ denote the size of state, and action spaces of individual agents, then for suf…

    Submitted 7 September, 2022; originally announced September 2022.

    Journal ref: Transactions on Machine Learning Research, 2022

  15. arXiv:2203.00035  [pdf, other]

    cs.LG cs.MA

    Can Mean Field Control (MFC) Approximate Cooperative Multi Agent Reinforcement Learning (MARL) with Non-Uniform Interaction?

    Authors: Washim Uddin Mondal, Vaneet Aggarwal, Satish V. Ukkusuri

    Abstract: Mean-Field Control (MFC) is a powerful tool to solve Multi-Agent Reinforcement Learning (MARL) problems. Recent studies have shown that MFC can well-approximate MARL when the population size is large and the agents are exchangeable. Unfortunately, the presumption of exchangeability implies that all agents uniformly interact with one another which is not true in many practical scenarios. In this ar…

    Submitted 1 June, 2022; v1 submitted 28 February, 2022; originally announced March 2022.

    Journal ref: UAI 2022

  16. Deep Learning based Coverage and Rate Manifold Estimation in Cellular Networks

    Authors: Washim Uddin Mondal, Praful D. Mankar, Goutam Das, Vaneet Aggarwal, Satish V. Ukkusuri

    Abstract: This article proposes Convolutional Neural Network-based Auto Encoder (CNN-AE) to predict location-dependent rate and coverage probability of a network from its topology. We train the CNN utilising BS location data of India, Brazil, Germany, and the USA and compare its performance with stochastic geometry (SG) based analytical models. In comparison to the best-fitted SG-based model, CNN-AE improve…

    Submitted 21 August, 2022; v1 submitted 13 February, 2022; originally announced February 2022.

    Journal ref: IEEE Transactions on Cognitive Communications and Networking, 2022

  17. arXiv:2109.04024  [pdf, ps, other]

    cs.LG cs.AI cs.GT cs.MA

    On the Approximation of Cooperative Heterogeneous Multi-Agent Reinforcement Learning (MARL) using Mean Field Control (MFC)

    Authors: Washim Uddin Mondal, Mridul Agarwal, Vaneet Aggarwal, Satish V. Ukkusuri

    Abstract: Mean field control (MFC) is an effective way to mitigate the curse of dimensionality of cooperative multi-agent reinforcement learning (MARL) problems. This work considers a collection of $N_{\mathrm{pop}}$ heterogeneous agents that can be segregated into $K$ classes such that the $k$-th class contains $N_k$ homogeneous agents. We aim to prove approximation guarantees of the MARL problem for this…

    Submitted 8 May, 2022; v1 submitted 8 September, 2021; originally announced September 2021.

    Comments: 46 pages

    Journal ref: Journal of Machine Learning Research 23(129): 1-46, 2022

  18. arXiv:2103.08875  [pdf, other]

    cs.PF cs.NI

    Queuing Analysis of Opportunistic Cognitive Radio IoT Network with Imperfect Sensing

    Authors: Asif Ahmed Sardar, Dibbendu Roy, Washim Uddin Mondal, Goutam Das

    Abstract: In this paper, we analyze a Cognitive Radio-based Internet-of-Things (CR-IoT) network comprising a Primary Network Provider (PNP) and an IoT operator. The PNP uses its licensed spectrum to serve its users. The IoT operator identifies the white-space in the licensed band at regular intervals and opportunistically exploits them to serve the IoT nodes under its coverage. IoT nodes are battery-operate…

    Submitted 16 March, 2021; originally announced March 2021.

  19. On Exact Distribution of Poisson-Voronoi Area in $K$-tier HetNets with Generalized Association Rule

    Authors: Washim Uddin Mondal, Goutam Das

    Abstract: This letter characterizes the exact distribution function of a typical Voronoi area in a $K$-tier Poisson network. The users obey a generalized association (GA) rule, which is a superset of nearest base station association and maximum received power based association (with arbitrary fading) rules that are commonly adopted in the literature. Combining the Robbins' theorem and the probability genera…

    Submitted 1 July, 2020; originally announced July 2020.

    Journal ref: IEEE Communications Letters, 2020