-
Asymptotically Optimal Policies for Weakly Coupled Markov Decision Processes
Authors:
Diego Goldsztajn,
Konstantin Avrachenkov
Abstract:
We consider the problem of maximizing the expected average reward obtained over an infinite time horizon by $n$ weakly coupled Markov decision processes. Our setup is a substantial generalization of the multi-armed restless bandit problem that allows for multiple actions and constraints. We establish a connection with a deterministic and continuous-variable control problem where the objective is to maximize the average reward derived from an occupancy measure that represents the empirical distribution of the processes when $n \to \infty$. We show that a solution of this fluid problem can be used to construct policies for the weakly coupled processes that achieve the maximum expected average reward as $n \to \infty$, and we give sufficient conditions for the existence of solutions. Under certain assumptions on the constraints, we prove that these conditions are automatically satisfied if the unconstrained single-process problem admits a suitable unichain and aperiodic policy. In particular, the assumptions include multi-armed restless bandits and a broad class of problems with multiple actions and inequality constraints. Moreover, the policies can be constructed explicitly in these cases. Our theoretical results are complemented by several concrete examples and numerical experiments, which include multichain setups that are covered by the theoretical results.
Submitted 7 June, 2024;
originally announced June 2024.
-
Server saturation in skewed networks
Authors:
Diego Goldsztajn,
Sem C. Borst,
Johan S. H. van Leeuwaarden
Abstract:
We consider a model inspired by compatibility constraints that arise between tasks and servers in data centers, cloud computing systems and content delivery networks. The constraints are represented by a bipartite graph or network that interconnects dispatchers with compatible servers. Each dispatcher receives tasks over time and sends every task to a compatible server with the least number of tasks, or to a server with the least number of tasks among $d$ compatible servers selected uniformly at random. We focus on networks where the neighborhood of at least one server is skewed in a limiting regime. This means that the neighborhood contains a diverging number of dispatchers, each of which is compatible with a uniformly bounded number of servers; thus, the degree of the central server approaches infinity while the degrees of many neighboring dispatchers remain bounded. We prove that each server with a skewed neighborhood saturates, in the sense that the mean number of tasks queueing in front of it in steady state approaches infinity. Paradoxically, this pathological behavior can even arise in random networks where nearly all the servers have at most one task in the limit.
Submitted 9 April, 2024;
originally announced April 2024.
-
Fluid limits for interacting queues in sparse dynamic graphs
Authors:
Diego Goldsztajn,
Sem C. Borst,
Johan S. H. van Leeuwaarden
Abstract:
Consider a network of $n$ single-server queues where tasks arrive independently at each of the servers at rate $λ_n$. The servers are interconnected by a graph that is resampled at rate $μ_n$ in a way that is symmetric with respect to the servers, and each task is dispatched to the shortest queue in the graph neighborhood where it appears. The so-called occupancy process describes the empirical distribution of the number of tasks across the servers. This stochastic process evolves on the underlying dynamic graph, and its dynamics depend on the number of tasks at each individual server and the neighborhood structure of the graph. We prove that this dependency disappears in the limit as $n \to \infty$ when $λ_n / n \to λ$ and $μ_n \to \infty$, and establish that the limit of the occupancy process is given by a system of differential equations that depends solely on $λ$ and the limiting degree distribution of the graph. We further show that the stationary distribution of the occupancy process converges to an equilibrium point of the differential equations, and derive properties of this equilibrium that reflect the impact of the degree distribution. Our focus is on truly sparse graphs where the maximum degree is uniformly bounded across $n$, making neighboring servers strongly correlated.
Submitted 23 May, 2024; v1 submitted 22 May, 2023;
originally announced May 2023.
-
Utility maximizing load balancing policies
Authors:
Diego Goldsztajn,
Sem C. Borst,
Johan S. H. van Leeuwaarden
Abstract:
Consider a service system where incoming tasks are instantaneously dispatched to one out of many heterogeneous server pools. Associated with each server pool is a concave utility function which depends on the class of the server pool and its current occupancy. We derive an upper bound for the mean normalized aggregate utility in stationarity and introduce two load balancing policies that achieve this upper bound in a large-scale regime. Furthermore, the transient and stationary behavior of these asymptotically optimal load balancing policies is characterized on the scale of the number of server pools, in the same large-scale regime.
Submitted 10 February, 2024; v1 submitted 16 December, 2021;
originally announced December 2021.
-
Learning and balancing unknown loads in large-scale systems
Authors:
Diego Goldsztajn,
Sem C. Borst,
Johan S. H. van Leeuwaarden
Abstract:
Consider a system of identical server pools where tasks with exponentially distributed service times arrive as a time-inhomogeneous Poisson process. An admission threshold is used in an inner control loop to assign incoming tasks to server pools while, in an outer control loop, a learning scheme adjusts this threshold over time to keep it aligned with the unknown offered load of the system. In a many-server regime, we prove that the learning scheme reaches an equilibrium along intervals of time where the normalized offered load per server pool is suitably bounded, and that this results in a balanced distribution of the load. Furthermore, we establish a similar result when tasks with Coxian distributed service times arrive at a constant rate and the threshold is adjusted using only the total number of tasks in the system. The novel proof technique developed in this paper, which differs from a traditional fluid limit analysis, makes it possible to handle rapid variations of the first learning scheme, triggered by excursions of the occupancy process that have vanishing size. Moreover, our approach allows us to characterize the asymptotic behavior of the system with Coxian distributed service times without relying on a fluid limit of a detailed state descriptor.
Submitted 5 April, 2024; v1 submitted 18 December, 2020;
originally announced December 2020.
-
Self-Learning Threshold-Based Load Balancing
Authors:
Diego Goldsztajn,
Sem C. Borst,
Johan S. H. van Leeuwaarden,
Debankur Mukherjee,
Philip A. Whiting
Abstract:
We consider a large-scale service system where incoming tasks have to be instantaneously dispatched to one out of many parallel server pools. The user-perceived performance degrades with the number of concurrent tasks and the dispatcher aims at maximizing the overall quality-of-service by balancing the load through a simple threshold policy. We demonstrate that such a policy is optimal on the fluid and diffusion scales, while only involving a small communication overhead, which is crucial for large-scale deployments. In order to set the threshold optimally, it is important, however, to learn the load of the system, which may be unknown. For that purpose, we design a control rule for tuning the threshold in an online manner. We derive conditions which guarantee that this adaptive threshold settles at the optimal value, along with estimates for the time until this happens. In addition, we provide numerical experiments which support the theoretical results and further indicate that our policy copes effectively with time-varying demand patterns.
Submitted 11 September, 2023; v1 submitted 29 October, 2020;
originally announced October 2020.