-
Fork-join and redundancy systems with heavy-tailed job sizes
Authors:
Youri Raaijmakers,
Sem Borst,
Onno Boxma
Abstract:
We investigate the tail asymptotics of the response time distribution for the cancel-on-start (c.o.s.) and cancel-on-completion (c.o.c.) variants of redundancy-$d$ scheduling and the fork-join model with heavy-tailed job sizes. We present bounds, which only differ in the pre-factor, for the tail probability of the response time in the case of the first-come first-served (FCFS) discipline. For the c.o.s. variant we restrict ourselves to redundancy-$d$ scheduling, which is a special case of the fork-join model. In particular, for regularly varying job sizes with tail index $-ν$ the tail index of the response time for the c.o.s. variant of redundancy-$d$ equals $-\min\{d_{\mathrm{cap}}(ν-1),ν\}$, where $d_{\mathrm{cap}} = \min\{d,N-k\}$, $N$ is the number of servers and $k$ is the integer part of the load. This result indicates that for $d_{\mathrm{cap}} < \frac{ν}{ν-1}$ the waiting time component is dominant, whereas for $d_{\mathrm{cap}} > \frac{ν}{ν-1}$ the job size component is dominant. Thus, having $d = \lceil \min\{\frac{ν}{ν-1},N-k\} \rceil$ replicas is sufficient to achieve the optimal asymptotic tail behavior of the response time. For the c.o.c. variant of the fork-join($n_{\mathrm{F}},n_{\mathrm{J}}$) model the tail index of the response time, under some assumptions on the load, equals $1-ν$ and $1-(n_{\mathrm{F}}+1-n_{\mathrm{J}})ν$, for identical and i.i.d. replicas, respectively; here the waiting time component is always dominant.
Submitted 28 May, 2021;
originally announced May 2021.
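The c.o.s. tail-index formula in the abstract above can be evaluated numerically. The following is a minimal sketch, not code from the paper: the function names and the example parameters are hypothetical, and it assumes $ν > 1$ and that `load` is the quantity whose integer part the abstract calls $k$.

```python
import math


def cos_tail_index_magnitude(nu: float, d: int, n_servers: int, load: float) -> float:
    """Return min{d_cap * (nu - 1), nu}; the tail index itself is minus this value.

    Assumes nu > 1 and that `load` is the load whose integer part is k.
    """
    k = math.floor(load)             # integer part of the load
    d_cap = min(d, n_servers - k)    # d_cap = min{d, N - k}
    return min(d_cap * (nu - 1), nu)


def sufficient_replicas(nu: float, n_servers: int, load: float) -> int:
    """Return ceil(min{nu / (nu - 1), N - k}), the replica count named in the abstract."""
    k = math.floor(load)
    return math.ceil(min(nu / (nu - 1), n_servers - k))


if __name__ == "__main__":
    # Hypothetical parameters: nu = 1.5, N = 10 servers, load 3.2 (so k = 3).
    print(cos_tail_index_magnitude(1.5, 2, 10, 3.2))  # waiting time dominates
    print(sufficient_replicas(1.5, 10, 3.2))
    print(cos_tail_index_magnitude(1.5, 3, 10, 3.2))  # reaches the job-size index nu
```

With these example values, two replicas give a magnitude below $ν$, while the replica count returned by `sufficient_replicas` lifts it to $ν$, which is the job-size-dominated regime described in the abstract.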
-
Reinforcement learning for Admission Control in 5G Wireless Networks
Authors:
Youri Raaijmakers,
Silvio Mandelli,
Mark Doll
Abstract:
The key challenge in admission control in wireless networks is to strike an optimal trade-off between the blocking probability for new requests and the dropping probability of ongoing requests. We consider two approaches for solving the admission control problem: i) the typically adopted threshold policy and ii) our proposed policy relying on reinforcement learning with neural networks. Extensive simulation experiments are conducted to analyze the performance of both policies. The results show that the reinforcement learning policy outperforms the threshold-based policies in the scenario with heterogeneous time-varying arrival rates and multiple user equipment types, proving its applicability in realistic wireless network scenarios.
Submitted 13 April, 2021;
originally announced April 2021.
-
Comparison of the FCFS and PS discipline in Redundancy Systems
Authors:
Youri Raaijmakers
Abstract:
We consider the c.o.c. redundancy system with $N$ parallel servers where incoming jobs are immediately replicated to $d$ servers chosen uniformly at random (without replacement). A job finishes service as soon as the first replica is completed, after which all the remaining replicas are abandoned. We compare the performance of the first-come first-served (FCFS) and processor-sharing (PS) disciplines based on the stability condition, the tail behavior of the latency, and the expected latency.
Submitted 24 March, 2021;
originally announced April 2021.
-
Achievable Stability in Redundancy Systems
Authors:
Youri Raaijmakers,
Sem Borst
Abstract:
We consider a system with $N$ parallel servers where incoming jobs are immediately replicated to, say, $d$ servers. Each of the $N$ servers has its own queue and follows a FCFS discipline. As soon as the first job replica is completed, the remaining replicas are abandoned. We investigate the achievable stability region for a quite general workload model with different job types and heterogeneous servers, reflecting job-server affinity relations which may arise from data locality issues and soft compatibility constraints. Under the assumption that job types are known beforehand we show for New-Better-than-Used (NBU) distributed speed variations that no replication $(d=1)$ gives a strictly larger stability region than replication $(d>1)$. Strikingly, this does not depend on the underlying distribution of the intrinsic job sizes, but observing the job types is essential for this statement to hold. In case of non-observable job types we show that for New-Worse-than-Used (NWU) distributed speed variations full replication ($d=N$) gives a larger stability region than no replication $(d=1)$.
Submitted 8 August, 2020;
originally announced August 2020.
-
Threshold-based rerouting and replication for resolving job-server affinity relations
Authors:
Youri Raaijmakers,
Sem Borst,
Onno Boxma
Abstract:
We consider a system with several job types and two parallel server pools. Within the pools the servers are homogeneous, but across pools possibly not in the sense that the service speed of a job may depend on its type as well as the server pool. Immediately upon arrival, jobs are assigned to a server pool. This could be based on (partial) knowledge of their type, but such knowledge might not be available. Information about the job type can however be obtained while the job is in service; as the service progresses, the likelihood that the service speed of this job type is low increases, creating an incentive to execute the job on different, possibly faster, server(s). Two policies are considered: reroute the job to the other server pool, or replicate it there.
We determine the effective load per server under both the rerouting and replication policy for completely unknown as well as partly known job types. We also examine the impact of these policies on the stability bound, and find that the uncertainty in job types may significantly degrade the performance. For (highly) unbalanced service speeds full replication achieves the largest stability bound while for (nearly) balanced service speeds no replication maximizes the stability bound. Finally, we discuss how the use of threshold-based policies can help improve the expected latency for completely or partly unknown job types.
Submitted 27 May, 2020;
originally announced May 2020.
-
Stability of Redundancy Systems with Processor Sharing
Authors:
Youri Raaijmakers,
Sem Borst,
Onno Boxma
Abstract:
We investigate the stability condition for redundancy-d systems where each of the servers follows a processor-sharing (PS) discipline. We allow for generally distributed job sizes, with possible dependence among the d replica sizes being governed by an arbitrary joint distribution. We establish that the stability condition is characterized by the expectation of the minimum of d replica sizes being less than the mean interarrival time per server. In the special case of identical replicas, the stability condition is insensitive to the job size distribution given its mean, and the stability condition is inversely proportional to the number of replicas. In the special case of i.i.d. replicas, the stability threshold decreases (increases) in the number of replicas for job size distributions that are NBU (NWU). We also discuss extensions to scenarios with heterogeneous servers.
Submitted 6 March, 2020; v1 submitted 2 December, 2019;
originally announced December 2019.
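The stability condition in the abstract above compares the expected minimum of the $d$ replica sizes with the mean interarrival time per server. The sketch below estimates that expectation by Monte Carlo and is not code from the paper. It rests on a labeled assumption: jobs arrive at total rate `arrival_rate` and each job places replicas on $d$ of the $N$ servers chosen uniformly at random, so each server sees arrivals at rate `arrival_rate * d / n_servers`. All function names and parameter values are hypothetical.

```python
import random


def estimate_expected_min(sample_replicas, n_samples=100_000, seed=0):
    """Monte Carlo estimate of E[min(X_1, ..., X_d)].

    `sample_replicas(rng)` must return the d replica sizes of one job.
    """
    rng = random.Random(seed)
    total = 0.0
    for _ in range(n_samples):
        total += min(sample_replicas(rng))
    return total / n_samples


def is_stable(expected_min, arrival_rate, n_servers, d):
    """Check E[min] < mean interarrival time per server.

    Assumption: each server receives replicas at rate arrival_rate * d / n_servers.
    """
    return expected_min < n_servers / (arrival_rate * d)


def iid_exponential(d, mean=1.0):
    """Sampler for d independent exponential replica sizes."""
    return lambda rng: [rng.expovariate(1.0 / mean) for _ in range(d)]


def identical_exponential(d, mean=1.0):
    """Sampler for d identical copies of one exponential job size."""
    def sample(rng):
        x = rng.expovariate(1.0 / mean)
        return [x] * d
    return sample


if __name__ == "__main__":
    d, n_servers, arrival_rate = 2, 10, 8.0  # hypothetical parameters
    for name, sampler in [("i.i.d.", iid_exponential(d)),
                          ("identical", identical_exponential(d))]:
        e_min = estimate_expected_min(sampler)
        print(name, round(e_min, 3), is_stable(e_min, arrival_rate, n_servers, d))
```

Under this reading, identical replicas leave the expected minimum equal to the mean job size, so the admissible arrival rate shrinks in proportion to $1/d$, matching the abstract's statement for that special case.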
-
Redundancy scheduling with scaled Bernoulli service requirements
Authors:
Youri Raaijmakers,
Sem Borst,
Onno Boxma
Abstract:
Redundancy scheduling has emerged as a powerful strategy for improving response times in parallel-server systems. The key feature in redundancy scheduling is replication of a job upon arrival by dispatching replicas to different servers. Redundant copies are abandoned as soon as the first of these replicas finishes service. By creating multiple service opportunities, redundancy scheduling increases the chance of a fast response from a server that is quick to provide service, and mitigates the risk of a long delay incurred when a single selected server turns out to be slow.
The diversity enabled by redundant requests has been found to strongly improve the response time performance, especially in case of highly variable service requirements. Analytical results for redundancy scheduling are unfortunately scarce however, and even the stability condition has largely remained elusive so far, except for exponentially distributed service requirements. In order to gain further insight in the role of the service requirement distribution, we explore the behavior of redundancy scheduling for scaled Bernoulli service requirements. We establish a sufficient stability condition for generally distributed service requirements and we show that, for scaled Bernoulli service requirements, this condition is also asymptotically nearly necessary. This stability condition differs drastically from the exponential case, indicating that the stability condition depends on the service requirements in a sensitive and intricate manner.
Submitted 15 November, 2018;
originally announced November 2018.