Online Matching on $3$-Uniform Hypergraphs

Authors: Sander Borst, Danish Kashaev, Zhuan Khye Koh

Abstract: The online matching problem was introduced by Karp, Vazirani and Vazirani (STOC 1990) on bipartite graphs with vertex arrivals. It is well-known that the optimal competitive ratio is $1-1/e$ for both integral and fractional versions of the problem. Since then, there has been considerable effort to find optimal competitive ratios for other related settings. In this work, we go beyond the graph case and study the online matching problem on $k$-uniform hypergraphs. For $k=3$, we provide an optimal primal-dual fractional algorithm, which achieves a competitive ratio of $(e-1)/(e+1)\approx 0.4621$. As our main technical contribution, we present a carefully constructed adversarial instance, which shows that this ratio is in fact optimal. It combines ideas from known hard instances for bipartite graphs under the edge-arrival and vertex-arrival models. For $k\geq 3$, we give a simple integral algorithm which performs better than greedy when the online nodes have bounded degree. As a corollary, it achieves the optimal competitive ratio of 1/2 on 3-uniform hypergraphs when every online node has degree at most 2. This is because the special case where every online node has degree 1 is equivalent to the edge-arrival model on graphs, for which an upper bound of 1/2 is known.

Submitted 20 February, 2024; originally announced February 2024.
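
To put the ratios quoted in this abstract side by side, the following minimal Python sketch evaluates the three closed-form constants. It is not taken from the paper; the variable names are illustrative assumptions, and the code only reproduces arithmetic that the abstract already states.

```python
import math

# Closed-form competitive ratios quoted in the abstract.
# Variable names are illustrative, not the authors' notation.
bipartite_ratio = 1 - 1 / math.e                 # vertex arrivals on bipartite graphs
hypergraph_k3_ratio = (math.e - 1) / (math.e + 1)  # fractional, 3-uniform hypergraphs
bounded_degree_ratio = 1 / 2                     # integral, online degree at most 2

for name, value in [
    ("1 - 1/e", bipartite_ratio),
    ("(e-1)/(e+1)", hypergraph_k3_ratio),
    ("1/2", bounded_degree_ratio),
]:
    print(f"{name:>12} = {value:.4f}")
```

The printed values make the ordering visible: the optimal fractional ratio for $k=3$ lies below the integral ratio of 1/2 that holds under the bounded-degree restriction, and both lie below the bipartite bound.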

arXiv:2302.03669 [pdf, other]

Deep Reinforcement Learning for Traffic Light Control in Intelligent Transportation Systems

Authors: Xiao-Yang Liu, Ming Zhu, Sem Borst, Anwar Walid

Abstract: Smart traffic lights in intelligent transportation systems (ITSs) are envisioned to greatly increase traffic efficiency and reduce congestion. Deep reinforcement learning (DRL) is a promising approach to adaptively control traffic lights based on the real-time traffic situation in a road network. However, conventional methods may suffer from poor scalability. In this paper, we investigate deep reinforcement learning to control traffic lights, and both theoretical analysis and numerical experiments show that the intelligent behavior "greenwave" (i.e., a vehicle will see a progressive cascade of green lights, and not have to brake at any intersection) emerges naturally in a grid road network, which is proved to be the optimal policy in an avenue with multiple cross streets. As a first step, we use two DRL algorithms for the traffic light control problems in two scenarios. In a single road intersection, we verify that the deep Q-network (DQN) algorithm delivers a thresholding policy; and in a grid road network, we adopt the deep deterministic policy gradient (DDPG) algorithm. Secondly, numerical experiments show that the DQN algorithm delivers the optimal control, and the DDPG algorithm with passive observations has the capability to produce on its own a high-level intelligent behavior in a grid road network, namely, the "greenwave" policy emerges. We also verify the "greenwave" patterns in a $5 \times 10$ grid road network. Thirdly, the "greenwave" patterns demonstrate that DRL algorithms produce favorable solutions since the "greenwave" policy shown in experiment results is proved to be optimal in a specified traffic model (an avenue with multiple cross streets). The delivered policies both in a single road intersection and a grid road network demonstrate the scalability of DRL algorithms.

Submitted 5 March, 2023; v1 submitted 3 February, 2023; originally announced February 2023.

Comments: 17 pages

Journal ref: IEEE Transactions on Network Science and Engineering, 2023

arXiv:2210.05982 [pdf, other]

A nearly optimal randomized algorithm for explorable heap selection

Authors: Sander Borst, Daniel Dadush, Sophie Huiberts, Danish Kashaev

Abstract: Explorable heap selection is the problem of selecting the $n$th smallest value in a binary heap. The key values can only be accessed by traversing through the underlying infinite binary tree, and the complexity of the algorithm is measured by the total distance traveled in the tree (each edge has unit cost). This problem was originally proposed as a model to study search strategies for the branch-and-bound algorithm with storage restrictions by Karp, Saks and Wigderson (FOCS '86), who gave deterministic and randomized $n\cdot \exp(O(\sqrt{\log{n}}))$ time algorithms using $O(\log(n)^{2.5})$ and $O(\sqrt{\log n})$ space respectively. We present a new randomized algorithm with running time $O(n\log(n)^3)$ using $O(\log n)$ space, substantially improving the previous best randomized running time at the expense of slightly increased space usage. We also show an $\Omega(\log(n)n/\log(\log(n)))$ lower bound for any algorithm that solves the problem in the same amount of space, indicating that our algorithm is nearly optimal.

Submitted 12 October, 2022; originally announced October 2022.

arXiv:2112.08958 [pdf, other]

doi 10.1287/stsy.2022.0103

Utility maximizing load balancing policies

Authors: Diego Goldsztajn, Sem C. Borst, Johan S. H. van Leeuwaarden

Abstract: Consider a service system where incoming tasks are instantaneously dispatched to one out of many heterogeneous server pools. Associated with each server pool is a concave utility function which depends on the class of the server pool and its current occupancy. We derive an upper bound for the mean normalized aggregate utility in stationarity and introduce two load balancing policies that achieve this upper bound in a large-scale regime. Furthermore, the transient and stationary behavior of these asymptotically optimal load balancing policies is characterized on the scale of the number of server pools, in the same large-scale regime.

Submitted 10 February, 2024; v1 submitted 16 December, 2021; originally announced December 2021.

Comments: 73 pages, 6 figures

MSC Class: 60K25 (Primary) 60F15; 60F17 (Secondary)

ACM Class: G.3

Journal ref: Stochastic systems, 13(2):211-246, 2023

arXiv:2111.05777 [pdf, other]

Power-of-two sampling in redundancy systems: the impact of assignment constraints

Authors: Ellen Cardinaels, Sem Borst, Johan S. H. van Leeuwaarden

Abstract: A classical sampling strategy for load balancing policies is power-of-two, where any server pair is sampled with equal probability. This does not cover practical settings with assignment constraints which force non-uniform sampling. While intuition suggests that non-uniform sampling adversely impacts performance, this was only supported through simulations, and rigorous statements have remained elusive. Building on product-form distributions for redundancy systems, we prove the stochastic dominance of uniform sampling for a four-server system as well as arbitrary-size systems in light traffic.

Submitted 15 July, 2022; v1 submitted 10 November, 2021; originally announced November 2021.

arXiv:2105.13738 [pdf, ps, other]

Fork-join and redundancy systems with heavy-tailed job sizes

Authors: Youri Raaijmakers, Sem Borst, Onno Boxma

Abstract: We investigate the tail asymptotics of the response time distribution for the cancel-on-start (c.o.s.) and cancel-on-completion (c.o.c.) variants of redundancy-$d$ scheduling and the fork-join model with heavy-tailed job sizes. We present bounds, which only differ in the pre-factor, for the tail probability of the response time in the case of the first-come first-served (FCFS) discipline. For the c.o.s. variant we restrict ourselves to redundancy-$d$ scheduling, which is a special case of the fork-join model. In particular, for regularly varying job sizes with tail index $-\nu$ the tail index of the response time for the c.o.s. variant of redundancy-$d$ equals $-\min\{d_{\mathrm{cap}}(\nu-1),\nu\}$, where $d_{\mathrm{cap}} = \min\{d,N-k\}$, $N$ is the number of servers and $k$ is the integer part of the load. This result indicates that for $d_{\mathrm{cap}} < \frac{\nu}{\nu-1}$ the waiting time component is dominant, whereas for $d_{\mathrm{cap}} > \frac{\nu}{\nu-1}$ the job size component is dominant. Thus, having $d = \lceil \min\{\frac{\nu}{\nu-1},N-k\} \rceil$ replicas is sufficient to achieve the optimal asymptotic tail behavior of the response time. For the c.o.c. variant of the fork-join($n_{\mathrm{F}},n_{\mathrm{J}}$) model the tail index of the response time, under some assumptions on the load, equals $1-\nu$ and $1-(n_{\mathrm{F}}+1-n_{\mathrm{J}})\nu$, for identical and i.i.d. replicas, respectively; here the waiting time component is always dominant.

Submitted 28 May, 2021; originally announced May 2021.
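
The c.o.s. tail-index formula in this abstract can be evaluated directly. The Python sketch below is a worked example only, not the authors' code: the function names `cos_tail_index` and `sufficient_replicas` are hypothetical, and it assumes $\nu > 1$ so that $\nu/(\nu-1)$ is well defined.

```python
import math

def cos_tail_index(nu: float, d: int, n_servers: int, k: int) -> float:
    """Tail index -min{d_cap (nu - 1), nu} with d_cap = min{d, N - k}.

    Hypothetical helper; assumes nu > 1, as the formula in the abstract suggests.
    """
    d_cap = min(d, n_servers - k)
    return -min(d_cap * (nu - 1), nu)

def sufficient_replicas(nu: float, n_servers: int, k: int) -> int:
    """Replica count d = ceil(min{nu / (nu - 1), N - k}) quoted in the abstract."""
    return math.ceil(min(nu / (nu - 1), n_servers - k))

# Example values (chosen for illustration): nu = 1.5, N = 10 servers, k = 3.
for d in (1, 2, 3, 4):
    print(d, cos_tail_index(1.5, d, 10, 3))
print("sufficient d:", sufficient_replicas(1.5, 10, 3))
```

With these example values the tail index stops improving once $d$ reaches 3, which matches the threshold $\nu/(\nu-1) = 3$ at which the job size component takes over from the waiting time component.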

arXiv:2012.13306 [pdf, ps, other]

Majorizing Measures for the Optimizer

Authors: Sander Borst, Daniel Dadush, Neil Olver, Makrand Sinha

Abstract: The theory of majorizing measures, extensively developed by Fernique, Talagrand and many others, provides one of the most general frameworks for controlling the behavior of stochastic processes. In particular, it can be applied to derive quantitative bounds on the expected suprema and the degree of continuity of sample paths for many processes. One of the crowning achievements of the theory is Talagrand's tight alternative characterization of the suprema of Gaussian processes in terms of majorizing measures. The proof of this theorem was difficult, and thus considerable effort was put into the task of developing both shorter and easier to understand proofs. A major reason for this difficulty was considered to be the theory of majorizing measures itself, which had the reputation of being opaque and mysterious. As a consequence, most recent treatments of the theory (including by Talagrand himself) have eschewed the use of majorizing measures in favor of a purely combinatorial approach (the generic chaining) where objects based on sequences of partitions provide roughly matching upper and lower bounds on the desired expected supremum. In this paper, we return to majorizing measures as a primary object of study, and give a viewpoint that we think is natural and clarifying from an optimization perspective. As our main contribution, we give an algorithmic proof of the majorizing measures theorem based on two parts: (1) We make the simple (but apparently new) observation that finding the best majorizing measure can be cast as a convex program. This also allows for efficiently computing the measure using off-the-shelf methods from convex optimization. (2) We obtain tree-based upper and lower bound certificates by rounding, in a series of steps, the primal and dual solutions to this convex program. [...]

Submitted 24 December, 2020; originally announced December 2020.

Comments: 37 pages. Extended Abstract to appear in ITCS 2021

MSC Class: 60G15; 68Q87

ACM Class: G.3

arXiv:2012.10142 [pdf, other]

Learning and balancing unknown loads in large-scale systems

Authors: Diego Goldsztajn, Sem C. Borst, Johan S. H. van Leeuwaarden

Abstract: Consider a system of identical server pools where tasks with exponentially distributed service times arrive as a time-inhomogeneous Poisson process. An admission threshold is used in an inner control loop to assign incoming tasks to server pools while, in an outer control loop, a learning scheme adjusts this threshold over time to keep it aligned with the unknown offered load of the system. In a many-server regime, we prove that the learning scheme reaches an equilibrium along intervals of time where the normalized offered load per server pool is suitably bounded, and that this results in a balanced distribution of the load. Furthermore, we establish a similar result when tasks with Coxian distributed service times arrive at a constant rate and the threshold is adjusted using only the total number of tasks in the system. The novel proof technique developed in this paper, which differs from a traditional fluid limit analysis, allows us to handle rapid variations of the first learning scheme, triggered by excursions of the occupancy process that have vanishing size. Moreover, our approach allows us to characterize the asymptotic behavior of the system with Coxian distributed service times without relying on a fluid limit of a detailed state descriptor.

Submitted 5 April, 2024; v1 submitted 18 December, 2020; originally announced December 2020.

Comments: 56 pages, 3 figures

MSC Class: 60K25 (Primary) 60F15; 60F17 (Secondary)

ACM Class: G.3

arXiv:2012.08357 [pdf, other]

Optimal Hyper-Scalable Load Balancing with a Strict Queue Limit

Authors: Mark van der Boor, Sem Borst, Johan van Leeuwaarden

Abstract: Load balancing plays a critical role in efficiently dispatching jobs in parallel-server systems such as cloud networks and data centers. A fundamental challenge in the design of load balancing algorithms is to achieve an optimal trade-off between delay performance and implementation overhead (e.g. communication or memory usage). This trade-off has primarily been studied so far from the angle of the amount of overhead required to achieve asymptotically optimal performance, particularly vanishing delay in large-scale systems. In contrast, in the present paper, we focus on an arbitrarily sparse communication budget, possibly well below the minimum requirement for vanishing delay, referred to as the hyper-scalable operating region. Furthermore, jobs may only be admitted when a specific limit on the queue position of the job can be guaranteed. The centerpiece of our analysis is a universal upper bound for the achievable throughput of any dispatcher-driven algorithm for a given communication budget and queue limit. We also propose a specific hyper-scalable scheme which can operate at any given message rate and enforce any given queue limit, while allowing the server states to be captured via a closed product-form network, in which servers act as customers traversing various nodes. The product-form distribution is leveraged to prove that the bound is tight and that the proposed hyper-scalable scheme is throughput-optimal in a many-server regime given the communication and queue limit constraints. Extensive simulation experiments are conducted to illustrate the results.

Submitted 14 December, 2020; originally announced December 2020.

arXiv:2012.08346 [pdf, ps, other]

On the Integrality Gap of Binary Integer Programs with Gaussian Data

Authors: Sander Borst, Daniel Dadush, Sophie Huiberts, Samarth Tiwari

Abstract: For a binary integer program (IP) ${\rm max} ~ c^\mathsf{T} x, Ax \leq b, x \in \{0,1\}^n$, where $A \in \mathbb{R}^{m \times n}$ and $c \in \mathbb{R}^n$ have independent Gaussian entries and the right-hand side $b \in \mathbb{R}^m$ satisfies that its negative coordinates have $\ell_2$ norm at most $n/10$, we prove that the gap between the value of the linear programming relaxation and the IP is upper bounded by $\operatorname{poly}(m)(\log n)^2 / n$ with probability at least $1-2/n^7-2^{-\operatorname{poly}(m)}$. Our results give a Gaussian analogue of the classical integrality gap result of Dyer and Frieze (Math. of O.R., 1989) in the case of random packing IPs. In contrast to the packing case, our integrality gap depends only polynomially on $m$ instead of exponentially. Building upon recent breakthrough work of Dey, Dubey and Molinaro (SODA, 2021), we show that the integrality gap implies that branch-and-bound requires $n^{\operatorname{poly}(m)}$ time on random Gaussian IPs with good probability, which is polynomial when the number of constraints $m$ is fixed. We derive this result via a novel meta-theorem, which relates the size of branch-and-bound trees and the integrality gap for random logconcave IPs.

Submitted 2 June, 2021; v1 submitted 15 December, 2020; originally announced December 2020.

arXiv:2010.15525 [pdf, other]

doi 10.1287/ijoc.2021.1100

Self-Learning Threshold-Based Load Balancing

Authors: Diego Goldsztajn, Sem C. Borst, Johan S. H. van Leeuwaarden, Debankur Mukherjee, Philip A. Whiting

Abstract: We consider a large-scale service system where incoming tasks have to be instantaneously dispatched to one out of many parallel server pools. The user-perceived performance degrades with the number of concurrent tasks and the dispatcher aims at maximizing the overall quality-of-service by balancing the load through a simple threshold policy. We demonstrate that such a policy is optimal on the fluid and diffusion scales, while only involving a small communication overhead, which is crucial for large-scale deployments. In order to set the threshold optimally, it is important, however, to learn the load of the system, which may be unknown. For that purpose, we design a control rule for tuning the threshold in an online manner. We derive conditions which guarantee that this adaptive threshold settles at the optimal value, along with estimates for the time until this happens. In addition, we provide numerical experiments which support the theoretical results and further indicate that our policy copes effectively with time-varying demand patterns.

Submitted 11 September, 2023; v1 submitted 29 October, 2020; originally announced October 2020.

Comments: 52 pages, 6 figures

MSC Class: 60F17; 60K25 (Primary) 68M20 (Secondary)

ACM Class: C.4; G.3

Journal ref: INFORMS Journal on Computing, 34(1):39-54, 2022

arXiv:2008.03478 [pdf, ps, other]

Achievable Stability in Redundancy Systems

Authors: Youri Raaijmakers, Sem Borst

Abstract: We consider a system with $N$ parallel servers where incoming jobs are immediately replicated to, say, $d$ servers. Each of the $N$ servers has its own queue and follows an FCFS discipline. As soon as the first job replica is completed, the remaining replicas are abandoned. We investigate the achievable stability region for a quite general workload model with different job types and heterogeneous servers, reflecting job-server affinity relations which may arise from data locality issues and soft compatibility constraints. Under the assumption that job types are known beforehand we show for New-Better-than-Used (NBU) distributed speed variations that no replication $(d=1)$ gives a strictly larger stability region than replication $(d>1)$. Strikingly, this does not depend on the underlying distribution of the intrinsic job sizes, but observing the job types is essential for this statement to hold. In case of non-observable job types we show that for New-Worse-than-Used (NWU) distributed speed variations full replication $(d=N)$ gives a larger stability region than no replication $(d=1)$.

Submitted 8 August, 2020; originally announced August 2020.

arXiv:2007.13615 [pdf, other]

doi 10.1007/s00453-022-00946-8

New FPT algorithms for finding the temporal hybridization number for sets of phylogenetic trees

Authors: Sander Borst, Leo van Iersel, Mark Jones, Steven Kelk

Abstract: We study the problem of finding a temporal hybridization network for a set of phylogenetic trees that minimizes the number of reticulations. First, we introduce an FPT algorithm for this problem on an arbitrary set of $m$ binary trees with $n$ leaves each with a running time of $O(5^k\cdot n\cdot m)$, where $k$ is the minimum temporal hybridization number. We also present the concept of temporal distance, which is a measure for how close a tree-child network is to being temporal. Then we introduce an algorithm for computing a tree-child network with temporal distance at most $d$ and at most $k$ reticulations in $O((8k)^d 5^k\cdot n\cdot m)$ time. Lastly, we introduce an $O(6^k k!\cdot k\cdot n^2)$ time algorithm for computing a minimum temporal hybridization network for a set of two nonbinary trees. We also provide an implementation of all algorithms and an experimental analysis of their performance.

Submitted 27 July, 2020; originally announced July 2020.

arXiv:2005.13353 [pdf, other]

Threshold-based rerouting and replication for resolving job-server affinity relations

Authors: Youri Raaijmakers, Sem Borst, Onno Boxma

Abstract: We consider a system with several job types and two parallel server pools. Within the pools the servers are homogeneous, but across pools possibly not, in the sense that the service speed of a job may depend on its type as well as the server pool. Immediately upon arrival, jobs are assigned to a server pool. This could be based on (partial) knowledge of their type, but such knowledge might not be available. Information about the job type can however be obtained while the job is in service; as the service progresses, the likelihood that the service speed of this job type is low increases, creating an incentive to execute the job on different, possibly faster, server(s). Two policies are considered: reroute the job to the other server pool, or replicate it there. We determine the effective load per server under both the rerouting and replication policy for completely unknown as well as partly known job types. We also examine the impact of these policies on the stability bound, and find that the uncertainty in job types may significantly degrade the performance. For (highly) unbalanced service speeds full replication achieves the largest stability bound while for (nearly) balanced service speeds no replication maximizes the stability bound. Finally, we discuss how the use of threshold-based policies can help improve the expected latency for completely or partly unknown job types.

Submitted 27 May, 2020; originally announced May 2020.

arXiv:1812.00979 [pdf, other]

Deep Reinforcement Learning for Intelligent Transportation Systems

Authors: Xiao-Yang Liu, Zihan Ding, Sem Borst, Anwar Walid

Abstract: Intelligent Transportation Systems (ITSs) are envisioned to play a critical role in improving traffic flow and reducing congestion, which is a pervasive issue impacting urban areas around the globe. Rapidly advancing vehicular communication and edge cloud computation technologies provide key enablers for smart traffic management. However, operating viable real-time actuation mechanisms on a practically relevant scale involves formidable challenges, e.g., policy iteration and conventional Reinforcement Learning (RL) techniques suffer from poor scalability due to state space explosion. Motivated by these issues, we explore the potential for Deep Q-Networks (DQN) to optimize traffic light control policies. As an initial benchmark, we establish that the DQN algorithms yield the "thresholding" policy in a single intersection. Next, we examine the scalability properties of DQN algorithms and their performance in a linear network topology with several intersections along a main artery. We demonstrate that DQN algorithms produce intelligent behavior, such as the emergence of "greenwave" patterns, reflecting their ability to learn favorable traffic light actuations.

Submitted 3 December, 2018; originally announced December 2018.

arXiv:1806.05444 [pdf, other]

doi 10.1137/20M1323746

Scalable load balancing in networked systems: A survey of recent advances

Authors: Mark van der Boor, Sem C. Borst, Johan S. H. van Leeuwaarden, Debankur Mukherjee

Abstract: The basic load balancing scenario involves a single dispatcher where tasks arrive that must immediately be forwarded to one of $N$ single-server queues. We discuss recent advances on scalable load balancing schemes which provide favorable delay performance when $N$ grows large, and yet only require minimal implementation overhead. Join-the-Shortest-Queue (JSQ) yields vanishing delays as $N$ grows large, as in a centralized queueing arrangement, but involves a prohibitive communication burden. In contrast, power-of-$d$ or JSQ($d$) schemes that assign an incoming task to a server with the shortest queue among $d$ servers selected uniformly at random require little communication, but lead to constant delays. In order to examine this fundamental trade-off between delay performance and implementation overhead, we consider JSQ($d(N)$) schemes where the diversity parameter $d(N)$ depends on $N$ and investigate what growth rate of $d(N)$ is required to asymptotically match the optimal JSQ performance on fluid and diffusion scale. Stochastic coupling techniques and stochastic-process limits play an instrumental role in establishing the asymptotic optimality. We demonstrate how this methodology carries over to infinite-server settings, finite buffers, multiple dispatchers, servers arranged on graph topologies, and token-based load balancing including the popular Join-the-Idle-Queue (JIQ) scheme. In this way we provide a broad overview of the many recent advances in the field. This survey extends the short review presented at ICM 2018 (arXiv:1712.08555).

Submitted 4 November, 2021; v1 submitted 14 June, 2018; originally announced June 2018.

Comments: To appear in SIAM Review. arXiv admin note: substantial text overlap with arXiv:1712.08555

Journal ref: SIAM Rev. 64 3 (2022) 554-622

arXiv:1712.08555 [pdf, other]

Scalable Load Balancing in Networked Systems: Universality Properties and Stochastic Coupling Methods

Authors: Mark van der Boor, Sem C. Borst, Johan S. H. van Leeuwaarden, Debankur Mukherjee

Abstract: We present an overview of scalable load balancing algorithms which provide favorable delay performance in large-scale systems, and yet only require minimal implementation overhead. Aimed at a broad audience, the paper starts with an introduction to the basic load balancing scenario, consisting of a single dispatcher where tasks arrive that must immediately be forwarded to one of $N$ single-server queues. A popular class of load balancing algorithms are so-called power-of-$d$ or JSQ($d$) policies, where an incoming task is assigned to a server with the shortest queue among $d$ servers selected uniformly at random. This class includes the Join-the-Shortest-Queue (JSQ) policy as a special case ($d = N$), which has strong stochastic optimality properties and yields a mean waiting time that vanishes as $N$ grows large for any fixed subcritical load. However, a nominal implementation of the JSQ policy involves a prohibitive communication burden in large-scale deployments. In contrast, a random assignment policy ($d = 1$) does not entail any communication overhead, but the mean waiting time remains constant as $N$ grows large for any fixed positive load. In order to examine the fundamental trade-off between performance and implementation overhead, we consider an asymptotic regime where $d(N)$ depends on $N$. We investigate what growth rate of $d(N)$ is required to match the performance of the JSQ policy on fluid and diffusion scale. The results demonstrate that the asymptotics for the JSQ($d(N)$) policy are insensitive to the exact growth rate of $d(N)$, as long as the latter is sufficiently fast, implying that the optimality of the JSQ policy can asymptotically be preserved while dramatically reducing the communication overhead. We additionally show how the communication overhead can be reduced yet further by the so-called Join-the-Idle-Queue scheme, leveraging memory at the dispatcher.

Submitted 22 December, 2017; originally announced December 2017.

Comments: Survey paper. Contribution to the Proceedings of the ICM 2018
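
Illustrative sketch (not taken from the paper): the JSQ($d$) rule described in the abstract above can be written in a few lines of Python. The function names `jsq_d_assign` and `dispatch`, the choice to sample the $d$ servers without replacement, the tie-breaking rule, and the absence of service completions are all assumptions made here for illustration only.

```python
import random


def jsq_d_assign(queue_lengths, d, rng=random):
    """Return the index of the server chosen by a JSQ(d) rule."""
    n = len(queue_lengths)
    if not 1 <= d <= n:
        raise ValueError("d must satisfy 1 <= d <= N")
    # Sample d distinct servers uniformly at random (sampling without
    # replacement is an assumption of this sketch).
    candidates = rng.sample(range(n), d)
    # Join the shortest queue among the sampled servers; ties go to the
    # candidate that was sampled first.
    return min(candidates, key=lambda i: queue_lengths[i])


def dispatch(num_servers, num_tasks, d, seed=0):
    """Assign num_tasks tasks with no departures; return the queue lengths."""
    rng = random.Random(seed)
    queues = [0] * num_servers
    for _ in range(num_tasks):
        queues[jsq_d_assign(queues, d, rng)] += 1
    return queues
```

With `d` equal to the number of servers the rule reduces to plain JSQ, and with `d = 1` it reduces to random assignment, matching the two special cases named in the abstract.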

arXiv:1707.05866 [pdf, other]

doi 10.1145/3179417

Asymptotically Optimal Load Balancing Topologies

Authors: Debankur Mukherjee, Sem C. Borst, Johan S. H. van Leeuwaarden

Abstract: We consider a system of $N$ servers inter-connected by some underlying graph topology $G_N$. Tasks arrive at the various servers as independent Poisson processes of rate $λ$. Each incoming task is irrevocably assigned to whichever server has the smallest number of tasks among the one where it appears and its neighbors in $G_N$. Tasks have unit-mean exponential service times and leave the system upon service completion. The above model has been extensively investigated in the case $G_N$ is a clique. Since the servers are exchangeable in that case, the queue length process is quite tractable, and it has been proved that for any $λ< 1$, the fraction of servers with two or more tasks vanishes in the limit as $N \to \infty$. For an arbitrary graph $G_N$, the lack of exchangeability severely complicates the analysis, and the queue length process tends to be worse than for a clique. Accordingly, a graph $G_N$ is said to be $N$-optimal or $\sqrt{N}$-optimal when the occupancy process on $G_N$ is equivalent to that on a clique on an $N$-scale or $\sqrt{N}$-scale, respectively. We prove that if $G_N$ is an Erdős-Rényi random graph with average degree $d(N)$, then it is with high probability $N$-optimal and $\sqrt{N}$-optimal if $d(N) \to \infty$ and $d(N) / (\sqrt{N} \log(N)) \to \infty$ as $N \to \infty$, respectively. This demonstrates that optimality can be maintained at $N$-scale and $\sqrt{N}$-scale while reducing the number of connections by nearly a factor $N$ and $\sqrt{N} / \log(N)$ compared to a clique, provided the topology is suitably random.
It is further shown that if $G_N$ contains $Θ(N)$ bounded-degree nodes, then it cannot be $N$-optimal. In addition, we establish that an arbitrary graph $G_N$ is $N$-optimal when its minimum degree is $N - o(N)$, and may not be $N$-optimal even when its minimum degree is $c N + o(N)$ for any $0 < c < 1/2$.

Submitted 6 April, 2019; v1 submitted 18 July, 2017; originally announced July 2017.

Comments: A few relevant results from arXiv:1612.00723 are included for convenience

Journal ref: Proc. ACM Meas. Anal. Comput. Syst. 2 1 (2018)

arXiv:1706.01059 [pdf, other]

Load Balancing in Large-Scale Systems with Multiple Dispatchers

Authors: Mark van der Boor, Sem Borst, Johan van Leeuwaarden

Abstract: Load balancing algorithms play a crucial role in delivering robust application performance in data centers and cloud networks. Recently, strong interest has emerged in Join-the-Idle-Queue (JIQ) algorithms, which rely on tokens issued by idle servers in dispatching tasks and outperform power-of-$d$ policies. Specifically, JIQ strategies involve minimal information exchange, and yet achieve zero blocking and wait in the many-server limit. The latter property prevails in a multiple-dispatcher scenario when the loads are strictly equal among dispatchers. For various reasons it is not uncommon however for skewed load patterns to occur. We leverage product-form representations and fluid limits to establish that the blocking and wait then no longer vanish, even for arbitrarily low overall load. Remarkably, it is the least-loaded dispatcher that throttles tokens and leaves idle servers stranded, thus acting as bottleneck. Motivated by the above issues, we introduce two enhancements of the ordinary JIQ scheme where tokens are either distributed non-uniformly or occasionally exchanged among the various dispatchers. We prove that these extensions can achieve zero blocking and wait in the many-server limit, for any subcritical overall load and arbitrarily skewed load profiles. Extensive simulation experiments demonstrate that the asymptotic results are highly accurate, even for moderately sized systems.

Submitted 4 June, 2017; originally announced June 2017.

arXiv:1703.10575 [pdf, other]

Delay versus Stickiness Violation Trade-offs for Load Balancing in Large-Scale Data Centers

Authors: Qingkai Liang, Sem Borst

Abstract: Most load balancing techniques implemented in current data centers tend to rely on a mapping from packets to server IP addresses through a hash value calculated from the flow five-tuple. The hash calculation allows extremely fast packet forwarding and provides flow `stickiness', meaning that all packets belonging to the same flow get dispatched to the same server. Unfortunately, such static hashing may not yield an optimal degree of load balancing, e.g., due to variations in server processing speeds or traffic patterns. On the other hand, dynamic schemes, such as the Join-the-Shortest-Queue (JSQ) scheme, provide a natural way to mitigate load imbalances, but at the expense of stickiness violation. In the present paper we examine the fundamental trade-off between stickiness violation and packet-level latency performance in large-scale data centers. We establish that stringent flow stickiness carries a significant performance penalty in terms of packet-level delay. Moreover, relaxing the stickiness requirement by a minuscule amount is highly effective in clipping the tail of the latency distribution. We further propose a bin-based load balancing scheme that achieves a good balance among scalability, stickiness violation and packet-level delay performance. Extensive simulation experiments corroborate the analytical results and validate the effectiveness of the bin-based load balancing scheme.

Submitted 8 July, 2017; v1 submitted 30 March, 2017; originally announced March 2017.

arXiv:1703.08373 [pdf, other]

doi 10.1145/3084463

Optimal Service Elasticity in Large-Scale Distributed Systems

Authors: Debankur Mukherjee, Souvik Dhara, Sem Borst, Johan S. H. van Leeuwaarden

Abstract: A fundamental challenge in large-scale cloud networks and data centers is to achieve highly efficient server utilization and limit energy consumption, while providing excellent user-perceived performance in the presence of uncertain and time-varying demand patterns. Auto-scaling provides a popular paradigm for automatically adjusting service capacity in response to demand while meeting performance targets, and queue-driven auto-scaling techniques have been widely investigated in the literature. In typical data center architectures and cloud environments however, no centralized queue is maintained, and load balancing algorithms immediately distribute incoming tasks among parallel queues. In these distributed settings with vast numbers of servers, centralized queue-driven auto-scaling techniques involve a substantial communication overhead and major implementation burden, or may not even be viable at all. Motivated by the above issues, we propose a joint auto-scaling and load balancing scheme which does not require any global queue length information or explicit knowledge of system parameters, and yet provides provably near-optimal service elasticity. We establish the fluid-level dynamics for the proposed scheme in a regime where the total traffic volume and nominal service capacity grow large in proportion. The fluid-limit results show that the proposed scheme achieves asymptotic optimality in terms of user-perceived delay performance as well as energy consumption.
Specifically, we prove that both the waiting time of tasks and the relative energy portion consumed by idle servers vanish in the limit. At the same time, the proposed scheme operates in a distributed fashion and involves only constant communication overhead per task, thus ensuring scalability in massive data center operations.

Submitted 24 March, 2017; originally announced March 2017.

Comments: Accepted in ACM SIGMETRICS, Urbana-Champaign, Illinois, USA, 2017

Journal ref: Proc. ACM Meas. Anal. Comput. Syst. 1 1 (2017)

arXiv:1611.05070 [pdf, ps, other]

doi 10.1016/j.dam.2016.10.009

Scaling Laws for Maximum Coloring of Random Geometric Graphs

Authors: Sem Borst, Milan Bradonjić

Abstract: We examine maximum vertex coloring of random geometric graphs, in an arbitrary but fixed dimension, with a constant number of colors. Since this problem is neither scale-invariant nor smooth, the usual methodology to obtain limit laws cannot be applied. We therefore leverage different concepts based on subadditivity to establish convergence laws for the maximum number of vertices that can be colored. For the constants that appear in these results, we provide the exact value in dimension one, and upper and lower bounds in higher dimensions.

Submitted 15 November, 2016; originally announced November 2016.

MSC Class: 60C05; 60D05; 60G55; 05C15; 05C80; 68R05; 68R10

arXiv:1509.08665 [pdf]

doi 10.1007/s11134-015-9438-x

On the Scalability and Message Count of Trickle-based Broadcasting Schemes

Authors: Thomas M. M. Meyfroyt, Sem C. Borst, Onno J. Boxma, Dee Denteneer

Abstract: As the use of wireless sensor networks increases, the need for efficient and reliable broadcasting algorithms grows. Ideally, a broadcasting algorithm should have the ability to quickly disseminate data, while keeping the number of transmissions low. In this paper, we analyze the popular Trickle algorithm, which has been proposed as a suitable communication protocol for code maintenance and propagation in wireless sensor networks. We show that the broadcasting process of a network using Trickle can be modeled by a Markov chain and that this chain falls under a class of Markov chains, closely related to residual lifetime distributions. It is then shown that this class of Markov chains admits a stationary distribution of a special form. These results are used to analyze the Trickle algorithm and its message count. Our results prove conjectures made in the literature concerning the effect of a listen-only period. Besides providing a mathematical analysis of the algorithm, we propose a generalized version of Trickle, with an additional parameter defining the length of a listen-only period.

Submitted 29 September, 2015; originally announced September 2015.

Comments: arXiv admin note: substantial text overlap with arXiv:1407.6034

MSC Class: 60J05 60J20 90B18

Journal ref: Queueing Systems: Volume 81, Issue 2 (2015), Page 203-230

arXiv:1407.6396 [pdf, other]

doi 10.1016/j.peva.2015.01.001

A Data Propagation Model for Wireless Gossiping

Authors: Thomas M. M. Meyfroyt, Sem C. Borst, Onno J. Boxma, Dee Denteneer

Abstract: Wireless sensor networks require communication protocols for efficiently propagating data in a distributed fashion. The Trickle algorithm is a popular protocol serving as the basis for many of the current standard communication protocols. In this paper we develop a mathematical model describing how Trickle propagates new data across a network consisting of nodes placed on a line. The model is analyzed and asymptotic results on the hop count and end-to-end delay distributions in terms of the Trickle parameters and network density are given. Additionally, we show that by only a small modification of the Trickle algorithm the expected end-to-end delay can be greatly decreased. Lastly, we demonstrate how one can derive the exact hop count and end-to-end delay distributions for small network sizes.

Submitted 4 November, 2014; v1 submitted 23 July, 2014; originally announced July 2014.

MSC Class: 60K20 ACM Class: C.2.1

Journal ref: Performance Evaluation 85-86 (2015) 19-32

arXiv:1407.6034 [pdf, ps, other]

doi 10.1145/2637364.2591981

Data Dissemination Performance in Large-Scale Sensor Networks

Authors: Thomas M. M. Meyfroyt, Sem C. Borst, Onno J. Boxma, Dee Denteneer

Abstract: As the use of wireless sensor networks increases, the need for (energy-)efficient and reliable broadcasting algorithms grows. Ideally, a broadcasting algorithm should have the ability to quickly disseminate data, while keeping the number of transmissions low. In this paper we develop a model describing the message count in large-scale wireless sensor networks. We focus our attention on the popular Trickle algorithm, which has been proposed as a suitable communication protocol for code maintenance and propagation in wireless sensor networks. Besides providing a mathematical analysis of the algorithm, we propose a generalized version of Trickle, with an additional parameter defining the length of a listen-only period. This generalization proves to be useful for optimizing the design and usage of the algorithm. For single-cell networks we show how the message count increases with the size of the network and how this depends on the Trickle parameters. Furthermore, we derive distributions of inter-broadcasting times and investigate their asymptotic behavior. Our results prove conjectures made in the literature concerning the effect of a listen-only period. Additionally, we develop an approximation for the expected number of transmissions in multi-cell networks. All results are validated by simulations.

Submitted 22 July, 2014; originally announced July 2014.

MSC Class: 90B18 ACM Class: C.2.1

Journal ref: ACM SIGMETRICS Performance Evaluation Review, Volume 42 Issue 1, June 2014, Pages 395-406

arXiv:1403.3325 [pdf, ps, other]

Slow transitions, slow mixing and starvation in dense random-access networks

Authors: Alessandro Zocca, Sem C. Borst, Johan S. H. van Leeuwaarden

Abstract: We consider dense wireless random-access networks, modeled as systems of particles with hard-core interaction. The particles represent the network users that try to become active after an exponential back-off time, and stay active for an exponential transmission time. Due to wireless interference, active users prevent other nearby users from simultaneous activity, which we describe as hard-core interaction on a conflict graph. We show that dense networks with aggressive back-off schemes lead to extremely slow transitions between dominant states, and inevitably cause long mixing times and starvation effects.

Submitted 13 March, 2014; originally announced March 2014.

Comments: 29 pages, 5 figures

arXiv:1307.1532 [pdf, ps, other]

Delay performance in random-access grid networks

Authors: Alessandro Zocca, Sem C. Borst, Johan S. H. van Leeuwaarden, Francesca R. Nardi

Abstract: We examine the impact of torpid mixing and meta-stability issues on the delay performance in wireless random-access networks. Focusing on regular meshes as prototypical scenarios, we show that the mean delays in an $L\times L$ toric grid with normalized load $ρ$ are of the order $(\frac{1}{1-ρ})^L$. This superlinear delay scaling is to be contrasted with the usual linear growth of the order $\frac{1}{1-ρ}$ in conventional queueing networks. The intuitive explanation for the poor delay characteristics is that (i) high load requires a high activity factor, (ii) a high activity factor implies extremely slow transitions between dominant activity states, and (iii) slow transitions cause starvation and hence excessively long queues and delays. Our proof method combines both renewal and conductance arguments. A critical ingredient in quantifying the long transition times is the derivation of the communication height of the uniformized Markov chain associated with the activity process. We also discuss connections with Glauber dynamics, conductance and mixing times. Our proof framework can be applied to other topologies as well, and is also relevant for the hard-core model in statistical physics and the sampling from independent sets using single-site update Markov chains.

Submitted 5 July, 2013; originally announced July 2013.

arXiv:1305.3774 [pdf, ps, other]

Delay Performance and Mixing Times in Random-Access Networks

Authors: Niek Bouman, Sem Borst, Johan van Leeuwaarden

Abstract: We explore the achievable delay performance in wireless random-access networks. While relatively simple and inherently distributed in nature, suitably designed queue-based random-access schemes provide the striking capability to match the optimal throughput performance of centralized scheduling mechanisms in a wide range of scenarios. The specific type of activation rules for which throughput optimality has been established, may however yield excessive queues and delays. Motivated by that issue, we examine whether the poor delay performance is inherent to the basic operation of these schemes, or caused by the specific kind of activation rules. We derive delay lower bounds for queue-based activation rules, which offer fundamental insight in the cause of the excessive delays. For fixed activation rates we obtain lower bounds indicating that delays and mixing times can grow dramatically with the load in certain topologies as well.

Submitted 16 May, 2013; originally announced May 2013.

arXiv:1302.5945 [pdf, other]

Queue-Based Random-Access Algorithms: Fluid Limits and Stability Issues

Authors: Javad Ghaderi, Sem Borst, Phil Whiting

Abstract: We use fluid limits to explore the (in)stability properties of wireless networks with queue-based random-access algorithms. Queue-based random-access schemes are simple and inherently distributed in nature, yet provide the capability to match the optimal throughput performance of centralized scheduling mechanisms in a wide range of scenarios. Unfortunately, the type of activation rules for which throughput optimality has been established, may result in excessive queue lengths and delays. The use of more aggressive/persistent access schemes can improve the delay performance, but does not offer any universal maximum-stability guarantees. In order to gain qualitative insight and investigate the (in)stability properties of more aggressive/persistent activation rules, we examine fluid limits where the dynamics are scaled in space and time. In some situations, the fluid limits have smooth deterministic features and maximum stability is maintained, while in other scenarios they exhibit random oscillatory characteristics, giving rise to major technical challenges. In the latter regime, more aggressive access schemes continue to provide maximum stability in some networks, but may cause instability in others. Simulation experiments are conducted to illustrate and validate the analytical results.

Submitted 24 February, 2013; originally announced February 2013.

arXiv:1302.2824 [pdf, ps, other]

Lingering Issues in Distributed Scheduling

Authors: Florian Simatos, Niek Bouman, Sem Borst

Abstract: Recent advances have resulted in queue-based algorithms for medium access control which operate in a distributed fashion, and yet achieve the optimal throughput performance of centralized scheduling algorithms. However, fundamental performance bounds reveal that the "cautious" activation rules involved in establishing throughput optimality tend to produce extremely large delays, typically growing exponentially in 1/(1-r), with r the load of the system, in contrast to the usual linear growth. Motivated by that issue, we explore to what extent more "aggressive" schemes can improve the delay performance. Our main finding is that aggressive activation rules induce a lingering effect, where individual nodes retain possession of a shared resource for excessive lengths of time even while a majority of other nodes idle. Using central limit theorem type arguments, we prove that the idleness induced by the lingering effect may cause the delays to grow with 1/(1-r) at a quadratic rate. To the best of our knowledge, these are the first mathematical results illuminating the lingering effect and quantifying the performance impact. In addition extensive simulation experiments are conducted to illustrate and validate the various analytical results.

Submitted 23 May, 2013; v1 submitted 12 February, 2013; originally announced February 2013.

arXiv:1209.2859 [pdf, other]

Mixing Properties of CSMA Networks on Partite Graphs

Authors: Alessandro Zocca, Sem C. Borst, Johan S. H. van Leeuwaarden

Abstract: We consider a stylized stochastic model for a wireless CSMA network. Experimental results in prior studies indicate that the model provides remarkably accurate throughput estimates for IEEE 802.11 systems. In particular, the model offers an explanation for the severe spatial unfairness in throughputs observed in such networks with asymmetric interference conditions. Even in symmetric scenarios, however, it may take a long time for the activity process to move between dominant states, giving rise to potential starvation issues. In order to gain insight in the transient throughput characteristics and associated starvation effects, we examine in the present paper the behavior of the transition time between dominant activity states. We focus on partite interference graphs, and establish how the magnitude of the transition time scales with the activation rate and the sizes of the various network components. We also prove that in several cases the scaled transition time has an asymptotically exponential distribution as the activation rate grows large, and point out interesting connections with related exponentiality results for rare events and meta-stability phenomena in statistical physics. In addition, we investigate the convergence rate to equilibrium of the activity process in terms of mixing times.

Submitted 13 September, 2012; originally announced September 2012.

Comments: Valuetools, 6th International Conference on Performance Evaluation Methodologies and Tools, October 9-12, 2012, Cargèse, France

arXiv:1201.2292 [pdf, ps, other]

Network iso-elasticity and weighted $α$-fairness

Authors: S. C. Borst, N. S. Walton, A. P. Zwart

Abstract: When a communication network's capacity increases, it is natural to want the bandwidth allocated to increase to exploit this capacity. But, if the same relative capacity increase occurs at each network resource, it is also natural to want each user to see the same relative benefit, so the bandwidth allocated to each route should remain proportional. We will be interested in bandwidth allocations which scale in this \textit{iso-elastic} manner and, also, maximize a utility function. Utility optimizing bandwidth allocations have been frequently studied, and a popular choice of utility function are the weighted $α$-fair utility functions introduced by Mo and Walrand \cite{MoWa00}. Because weighted $α$-fair utility functions possess this iso-elastic property, they are frequently used to form fluid models of bandwidth sharing networks. In this paper, we present results that show, in many settings, the only utility functions which are iso-elastic are weighted $α$-fair utility functions. Thus, if bandwidth is allocated according to a network utility function which scales with relative network changes then that utility function must be a weighted $α$-fair utility function, and hence, a control protocol that is robust to the future relative changes in network capacity and usage ought to allocate bandwidth in order to maximize a weighted $α$-fair utility function.

Submitted 11 January, 2012; originally announced January 2012.
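
Illustrative note (not taken from the paper): the standard form of the weighted $α$-fair utility makes the iso-elastic scaling mentioned in the abstract above easy to check. The symbols $w_r$ (route weight), $x_r$ (rate allocated to route $r$) and $c$ (common scaling factor) are notation assumed here, and the capacity constraints are assumed to be linear in the rates.

$$U_r(x_r) = \begin{cases} w_r \dfrac{x_r^{1-α}}{1-α}, & α \neq 1, \\ w_r \log x_r, & α = 1, \end{cases} \qquad w_r > 0,\ x_r > 0.$$

$$\sum_r U_r(c\,x_r) = \begin{cases} c^{1-α} \sum_r U_r(x_r), & α \neq 1, \\ \sum_r U_r(x_r) + \log c \sum_r w_r, & α = 1. \end{cases}$$

Scaling every rate by $c > 0$ therefore multiplies the total utility by a positive constant or shifts it by a constant, neither of which changes the maximizer. Under the linearity assumption, scaling every capacity by $c$ scales the feasible set by $c$, so the optimal allocation scales by $c$ as well and the rates on all routes stay proportional.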

arXiv:1012.3364 [pdf, ps, other]

Stability of Random Admissible-Set Scheduling in Spatially Continuous Wireless Systems

Authors: Niek Bouman, Sem Borst, Johan van Leeuwaarden

Abstract: We examine the stability of wireless networks whose users are distributed over a compact space. A subset of users is called {\it admissible} when their simultaneous activity obeys the prevailing interference constraints and, in each time slot, an admissible subset of users is selected uniformly at random to transmit one packet. We show that, under a mild condition, this random admissible-set scheduling mechanism achieves maximum stability in a broad set of scenarios, and in particular in symmetric cases. The proof relies on a description of the system as a measure-valued process and the identification of a Lyapunov function.

Submitted 4 September, 2013; v1 submitted 15 December, 2010; originally announced December 2010.

Showing 1–33 of 33 results for author: Borst, S