-
Balanced Nonadaptive Redundancy Scheduling
Authors:
Amir Behrouzi-Far,
Emina Soljanin
Abstract:
Distributed computing systems implement redundancy to reduce job completion time and its variability. Despite a large body of work on computing redundancy, the analytical performance evaluation of redundancy techniques in queuing systems is still an open problem. In this work, we take a step toward analyzing the performance of scheduling policies in systems with redundancy. In particular, we study the pattern of shared servers among replicas of different jobs. To this end, we employ combinatorics and graph theory, and we define and derive performance indicators from the statistics of these overlaps. We consider two classical nonadaptive scheduling policies: random and round-robin. We then propose a scheduling policy based on combinatorial block designs, which improves the performance indicators compared with conventional scheduling. We study the expansion properties of the graphs associated with the round-robin and block design-based policies, and find that the superior performance of the block design-based policy results from the better expansion of its associated graph. Consistent with the performance indicators, simulation results show that the block design-based policy outperforms random and round-robin scheduling in different scenarios. Specifically, it reduces the average waiting time in the queue to up to 25% compared to the random policy and up to 100% compared to the round-robin policy.
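To make "the pattern of shared servers among replicas" concrete, the sketch below counts, for every pair of jobs, how many servers their replica sets share. It assumes n = 7 servers, d = 3 replicas per job, a round-robin placement in which job j uses servers j, j+1, j+2 (mod 7), and a block design built from the cyclic shifts of the difference set {0, 1, 3}, which gives the Fano plane. These parameters and helper names are illustrative assumptions, not the construction from the paper.

```python
from itertools import combinations

def round_robin_sets(n, d):
    """Assumed round-robin placement: job j goes to servers j, j+1, ..., j+d-1 (mod n)."""
    return [frozenset((j + i) % n for i in range(d)) for j in range(n)]

def cyclic_design_sets(base, n):
    """Blocks of a cyclic design: every shift of the base block modulo n."""
    return [frozenset((b + j) % n for b in base) for j in range(n)]

def overlap_counts(server_sets):
    """Number of servers shared by each pair of jobs."""
    return [len(a & b) for a, b in combinations(server_sets, 2)]

n, d = 7, 3
rr = overlap_counts(round_robin_sets(n, d))
# {0, 1, 3} is a (7, 3, 1) difference set; its 7 shifts form the Fano plane.
bd = overlap_counts(cyclic_design_sets((0, 1, 3), n))

print("round-robin :", sorted(rr))
print("block design:", sorted(bd))
```

Both placements have the same total of 21 shared-server incidences across the 21 job pairs, but round-robin concentrates them: seven pairs share two servers and seven share none, whereas any two blocks of the design share exactly one server. Placing job j at servers starting from 3j (mod 7) yields the same collection of sets here, because 3 and 7 are coprime.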
Submitted 3 January, 2022;
originally announced January 2022.
-
Efficient Replication for Straggler Mitigation in Distributed Computing
Authors:
Amir Behrouzi-Far,
Emina Soljanin
Abstract:
Master-worker distributed computing systems use task replication in order to mitigate the effect of slow workers, known as stragglers. Tasks are grouped into batches and assigned to one or more workers for execution. We first consider the case when the batches do not overlap and, using the results from majorization theory, show that, for a general class of workers' service time distributions, a balanced assignment of batches to workers minimizes the average job compute time. We next show that this balanced assignment of non-overlapping batches achieves lower average job compute time compared to the overlapping schemes proposed in the literature. Furthermore, we derive the optimum redundancy level as a function of the service time distribution at workers. We show that the redundancy level that minimizes average job compute time is not necessarily the same as the redundancy level that maximizes the predictability of job compute time, and thus there exists a trade-off between optimizing the two metrics. Finally, by running experiments on Google cluster traces, we observe that redundancy can reduce the compute time of the jobs in Google clusters by an order of magnitude, and that the optimum level of redundancy depends on the distribution of tasks' service time.
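The abstract invokes majorization theory. One standard route from majorization to "balanced is optimal" is this: a Schur-convex function f satisfies f(x) ≤ f(y) whenever x is majorized by y, and the constant (balanced) vector is majorized by every vector of the same length and sum. The sketch below only checks the majorization order; whether the expected job compute time is Schur-convex in the replication vector is an assumption here, and the vectors and the helper name `is_majorized` are hypothetical.

```python
def is_majorized(x, y, tol=1e-9):
    """Return True if x is majorized by y: same total, and every partial sum of
    the k largest entries of x is at most the corresponding partial sum of y."""
    if len(x) != len(y):
        raise ValueError("vectors must have the same length")
    px = py = 0.0
    for a, b in zip(sorted(x, reverse=True), sorted(y, reverse=True)):
        px += a
        py += b
        if px > py + tol:
            return False
    return abs(px - py) <= tol

# Hypothetical replication vectors: 8 workers shared among 4 disjoint batches.
balanced = [2, 2, 2, 2]
others = ([5, 1, 1, 1], [3, 3, 1, 1], [4, 2, 1, 1])
for other in others:
    print(other, is_majorized(balanced, other), is_majorized(other, balanced))
```

Each line prints `True False`: the balanced vector is majorized by every unbalanced alternative, never the other way around.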
Submitted 27 December, 2020; v1 submitted 3 June, 2020;
originally announced June 2020.
-
Data Freshness in Leader-Based Replicated Storage
Authors:
Amir Behrouzi-Far,
Emina Soljanin,
Roy D. Yates
Abstract:
Leader-based data replication improves consistency in highly available distributed storage systems via sequential writes to the leader nodes. After a write has been committed by the leaders, follower nodes are written by a multicast mechanism and are only guaranteed to be eventually consistent. With Age of Information (AoI) as the freshness metric, we characterize how the number of leaders affects the freshness of the data retrieved by an instantaneous read query. In particular, we derive the average age of a read query for a deterministic model for the leader writing time and a probabilistic model for the follower writing time. We obtain a closed-form expression for the average age for exponentially distributed follower writing time. Our numerical results show that, depending on the relative speed of the write operation to the two groups of nodes, there exists an optimal number of leaders which minimizes the average age of the retrieved data, and that this number increases as the relative speed of writing on leaders increases.
Submitted 15 May, 2020;
originally announced May 2020.
-
Data Replication for Reducing Computing Time in Distributed Systems with Stragglers
Authors:
Amir Behrouzi-Far,
Emina Soljanin
Abstract:
In distributed computing systems with stragglers, various forms of redundancy can improve the average delay performance. We study the optimal replication of data in systems where the job execution time is a stochastically decreasing and convex random variable. We show that in such systems, the optimum assignment policy is the balanced replication of disjoint batches of data. Furthermore, for Exponential and Shifted-Exponential service times, we derive the optimum redundancy levels for minimizing both the expected value and the variance of the job completion time. Our analysis shows that the optimum redundancy level may not be the same for the two metrics, and thus there is a trade-off between reducing the expected value of the completion time and reducing its variance.
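The mean/variance trade-off can be seen in a small closed-form calculation. This sketch assumes one particular shifted-exponential model, which may differ from the paper's parametrization. There are N tasks and n workers. The data are split into k disjoint batches of b = N/k tasks, and each batch is replicated on r = n/k workers. A worker needs b·delta plus an Exp(mu/b) amount of time to process its batch. The fastest replica of a batch then finishes after b·delta plus an Exp(r·mu/b) time. The job ends at the maximum over the k batches, and the Rényi representation of exponential order statistics gives its mean and variance.

```python
def completion_stats(N, n, k, mu, delta):
    """Mean and variance of job completion time under an assumed model:
    N tasks in k disjoint batches of b = N/k tasks, each batch on r = n/k workers,
    and a worker needs b*delta + Exp(rate mu/b) time for its batch."""
    if N % k or n % k:
        raise ValueError("k must divide both N and n")
    b = N / k
    r = n // k
    lam = r * mu / b  # the fastest of r replicas: min of r i.i.d. Exp(mu/b) is Exp(lam)
    harmonic = sum(1 / i for i in range(1, k + 1))
    harmonic2 = sum(1 / i**2 for i in range(1, k + 1))
    # Max of k i.i.d. Exp(lam): mean H_k / lam, variance (sum_{i<=k} 1/i^2) / lam^2.
    mean = b * delta + harmonic / lam
    var = harmonic2 / lam**2
    return mean, var

N, n, mu, delta = 12, 12, 1.0, 0.1  # hypothetical parameters
ks = [k for k in range(1, n + 1) if N % k == 0 and n % k == 0]
stats = {k: completion_stats(N, n, k, mu, delta) for k in ks}
best_mean_k = min(ks, key=lambda k: stats[k][0])
best_var_k = min(ks, key=lambda k: stats[k][1])
print("mean-optimal k:", best_mean_k, "redundancy r =", n // best_mean_k)
print("variance-optimal k:", best_var_k, "redundancy r =", n // best_var_k)
```

With these numbers, the mean is minimized at k = 2 batches (redundancy r = 6). The variance is minimized at k = 1 (full replication, r = 12), because under this model the variance grows with k while the shift term b·delta shrinks.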
Submitted 31 December, 2019; v1 submitted 6 December, 2019;
originally announced December 2019.
-
Scheduling in the Presence of Data Intensive Compute Jobs
Authors:
Amir Behrouzi-Far,
Emina Soljanin
Abstract:
We study the performance of non-adaptive scheduling policies in computing systems with multiple servers. Compute jobs are mostly regular, with modest service requirements. However, there are sporadic data-intensive jobs, whose expected service time is much higher than that of the regular jobs. For this model, we are interested in the effect of scheduling policies on the average time a job spends in the system. To this end, we introduce two performance indicators in a simplified, only-arrival system. We believe that these performance indicators are good predictors of the relative performance of the policies in the queuing system, a belief supported by simulation results.
Submitted 31 December, 2019; v1 submitted 6 December, 2019;
originally announced December 2019.
-
Cooperative Beamforming in Cognitive Radio Relay Networks Using Amplify-and-Forward Relaying Technique
Authors:
Amir Behrouzi-Far,
Saeideh Mohammadkhani
Abstract:
In this work, we study the problem of relay beamforming for an underlay cognitive radio relay network using the amplify-and-forward (AF) relaying technique. We consider a cognitive radio network consisting of a single primary and multiple secondary transmitters/receivers. In addition, there are several relay nodes that help both the primary and secondary networks. We propose a new beamforming method for the relays' transmission, in which the beamforming weights at the relays are obtained as the solution of an optimization problem. The objective of the optimization problem is to maximize the worst-case signal-to-interference-plus-noise ratio (SINR) at the secondary receivers (SU-RX), subject to constraints on the interference power at the primary receiver (PU-RX) and on the total transmitted power of the relay nodes. We show that the beamforming problem can be reformulated as a second-order cone programming (SOCP) problem and solved by the bisection search method. Our simulation results show how the performance improves with the size and the power of the relay network.
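The bisection step works because a max-min SINR problem is quasi-convex. For a fixed SINR target t, checking whether some weight vector meets every constraint is a convex (here, SOCP) feasibility problem, and feasibility is monotone in t. The sketch below shows only the outer bisection loop. The SOCP feasibility check is replaced by a hypothetical one-line oracle, and the function name and bounds are illustrative assumptions.

```python
from typing import Callable

def bisection_max_min(feasible: Callable[[float], bool], lo: float, hi: float,
                      tol: float = 1e-6) -> float:
    """Largest t in [lo, hi] (to within tol) with feasible(t) True, assuming
    feasibility is monotone: if target t is achievable, so is any smaller target."""
    if not feasible(lo):
        raise ValueError("lower bound is infeasible")
    while hi - lo > tol:
        mid = (lo + hi) / 2
        if feasible(mid):
            lo = mid  # target mid is achievable: search higher
        else:
            hi = mid  # target mid is not achievable: search lower
    return lo

# Hypothetical stand-in for the SOCP feasibility check at SINR target t.
toy_oracle = lambda t: t <= 3.7
print(bisection_max_min(toy_oracle, 0.0, 10.0))
```

In a real implementation, `feasible(t)` would solve the SOCP with the worst-case SINR constraint at level t together with the interference and total-power constraints. Each iteration halves the interval, so about log2((hi - lo)/tol) solver calls are needed.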
Submitted 17 October, 2019;
originally announced October 2019.
-
Dynamic Resource Allocation and Activity Management for Energy Efficiency and Fairness in Heterogeneous Networks
Authors:
Amir Behrouzi-Far,
Ezhan Karasan
Abstract:
The higher energy consumption of Heterogeneous Networks (HetNets), compared to Macro Only Networks (MONETs), raises serious concern about the energy efficiency of HetNets. In this work, we study a dynamic activation strategy, which switches small cells between the Active and Idle states according to the dynamically changing user traffic, in order to increase the energy efficiency of HetNets. Moreover, we incorporate dynamic inter-tier bandwidth allocation into our model. The proposed Dynamic Bandwidth Allocation and Dynamic Activation (DBADA) strategy is applied in a cell-edge deployment of small cells, where HotSpot regions are located far from the master base station. Our objective is to maximize the sum utility of the network with minimum energy consumption. To ensure proportional fairness among users, we use a logarithmic utility function. To evaluate the performance of the proposed strategy, the median, the 10th percentile, and the sum of users' data rates, as well as the network energy consumption, are evaluated by simulation. Our simulation results show that the DBADA strategy improves the energy consumed per unit of users' data rate by up to $25\%$. It also reduces energy consumption by at least $25\%$ compared to the scenario in which small cells are always active.
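Why a logarithmic utility yields proportional fairness can be seen in a toy bandwidth split. The setup here is hypothetical and much simpler than the DBADA model. Two users share one channel, user 1 gets fraction x of it, and the spectral efficiencies are 4 and 1 bit/s/Hz. Maximizing the sum rate hands nearly everything to the stronger user. Maximizing the sum of log-rates splits the channel equally, because log(x·c1) + log((1 - x)·c2) differs from log(x) + log(1 - x) only by a constant.

```python
import math

def sum_log_utility(shares, efficiencies):
    """Proportional-fair objective: sum of log data rates."""
    return sum(math.log(s * c) for s, c in zip(shares, efficiencies))

def sum_rate(shares, efficiencies):
    """Throughput objective: total data rate."""
    return sum(s * c for s, c in zip(shares, efficiencies))

eff = (4.0, 1.0)  # hypothetical spectral efficiencies (bit/s/Hz)
grid = [i / 1000 for i in range(1, 1000)]  # share of user 1, excluding 0 and 1
pf_share = max(grid, key=lambda x: sum_log_utility((x, 1 - x), eff))
rate_share = max(grid, key=lambda x: sum_rate((x, 1 - x), eff))
print("log-utility share of user 1:", pf_share)
print("sum-rate share of user 1:  ", rate_share)
```

The log-utility optimum is 0.5 regardless of the efficiencies, while the sum-rate optimum sits at the edge of the grid (0.999).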
Submitted 17 October, 2019;
originally announced October 2019.
-
Evaluating Load Balancing Performance in Distributed Storage with Redundancy
Authors:
Mehmet Fatih Aktas,
Amir Behrouzi-Far,
Emina Soljanin,
Philip Whiting
Abstract:
To facilitate load balancing, distributed systems store data redundantly. We evaluate the load balancing performance of storage schemes in which each object is stored at $d$ different nodes, and each node stores the same number of objects. In our model, the load offered for the objects is sampled uniformly at random from all the load vectors with a fixed cumulative value. We find that the load balance in a system of $n$ nodes improves multiplicatively with $d$ as long as $d = o\left(\log(n)\right)$, and improves exponentially once $d = \Theta\left(\log(n)\right)$. We show that the load balance improves in the same way with $d$ when the service choices are created with XORs of $r$ objects rather than object replicas. In such redundancy schemes, storage overhead is reduced multiplicatively by $r$. However, recovery of an object requires downloading content from $r$ nodes. At the same time, the load balance increases additively by $r$. We express the system's load balance in terms of the maximal spacing or maximum of $d$ consecutive spacings between the order statistics of uniform random variables. Using this connection and the limit results on the maximal $d$-spacings, we derive our main results.
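Two standard facts link this model to uniform spacings. First, the m spacings cut by m - 1 sorted uniform points on [0, 1] are uniformly distributed on the simplex, so scaling them by the total gives a load vector drawn uniformly from all load vectors with that fixed sum. Second, the maximal d-spacing is the largest sum of d consecutive spacings. The sketch below computes both. How the system's load balance maps onto the maximal d-spacing is the paper's result and is not reproduced here, and the helper names are hypothetical.

```python
import random

def uniform_load_vector(m, total, rng):
    """Sample a load vector of length m uniformly from {x >= 0 : sum(x) = total},
    using the spacings of m - 1 sorted uniform points on [0, 1]."""
    cuts = sorted(rng.random() for _ in range(m - 1))
    edges = [0.0] + cuts + [1.0]
    return [total * (b - a) for a, b in zip(edges, edges[1:])]

def max_d_spacing(points, d):
    """Largest sum of d consecutive spacings of points in [0, 1], counting the
    gaps to the endpoints 0 and 1."""
    xs = [0.0] + sorted(points) + [1.0]
    num_gaps = len(xs) - 1
    if not 1 <= d <= num_gaps:
        raise ValueError("d must be between 1 and the number of spacings")
    # The d consecutive gaps starting at gap i add up to xs[i + d] - xs[i].
    return max(xs[i + d] - xs[i] for i in range(num_gaps - d + 1))

rng = random.Random(0)
pts = [rng.random() for _ in range(999)]  # 999 points give 1000 spacings
for d in (1, 2, 4, 8):
    print(d, max_d_spacing(pts, d))
```
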
Submitted 22 January, 2021; v1 submitted 13 October, 2019;
originally announced October 2019.
-
Redundancy Scheduling in Systems with Bi-Modal Job Service Time Distribution
Authors:
Amir Behrouzi-Far,
Emina Soljanin
Abstract:
Queuing systems with redundant requests have drawn great attention because of their promise to reduce the job completion time and variability. Despite a large body of work on the topic, we are still far from fully understanding the benefits of redundancy in practice. We here take one step towards practical systems by studying queuing systems with bi-modal job service time distribution. Such distributions have been observed in practice, as can be seen in, e.g., Google cluster traces. We develop an analogy to a classical urns and balls problem, and use it to study the queuing time performance of two non-adaptive classical scheduling policies: random and round-robin. We introduce new performance indicators in the analogous model, and argue that they are good predictors of the queuing time in non-adaptive scheduling policies. We then propose a non-adaptive scheduling policy that is based on combinatorial designs, and show that it has better performance indicators. Simulations confirm that the proposed scheduling policy, as the performance indicators suggest, reduces the queuing times compared to random and round-robin scheduling.
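For the random policy, assume that each job's d replicas go to a uniformly random set of d out of n servers. Under that assumption, the number of servers shared by two jobs follows a hypergeometric distribution. The sketch below computes that distribution exactly. The definition of the random policy and the parameters n = 7, d = 3 are illustrative assumptions.

```python
from fractions import Fraction
from math import comb

def overlap_pmf(n, d):
    """Distribution of |A ∩ B| for independent, uniformly random d-subsets A, B of n servers."""
    total = comb(n, d)
    return {k: Fraction(comb(d, k) * comb(n - d, d - k), total) for k in range(d + 1)}

pmf = overlap_pmf(7, 3)
mean = sum(k * p for k, p in pmf.items())
print({k: str(p) for k, p in pmf.items()})
print("mean overlap:", mean)
```

The mean overlap is d²/n = 9/7. Two jobs share at least two servers with probability 13/35. By contrast, any two blocks of a (7, 3, 1) block design share exactly one server.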
Submitted 4 October, 2019; v1 submitted 6 August, 2019;
originally announced August 2019.
-
On the Average Maximal Number of Balls in a Bin Resulting from Throwing r Balls into n Bins T times
Authors:
Amir Behrouzi-Far,
Doron Zeilberger
Abstract:
We use the holonomic ansatz to estimate the asymptotic behavior, in $T$, of the average maximal number of balls in a bin that is obtained when one throws uniformly at random (without replacement) $r$ balls into $n$ bins, $T$ times. Our approach works, in principle, for any fixed $n$ and $r$. We were able to do the cases $(n,r)$ = $(2,1),(3,1),(4,1), (4,2)$, but things get too complicated for larger values of $n$ and $r$. We are pledging a \$150 donation to the OEIS for an explicit expression (in terms of $n$, $r$, and $\pi$) for the constant $C_{n,r}$ such that the average equals $\frac{r}{n}\,T+C_{n,r} \sqrt{T}+O(1/\sqrt{T})$. In this version, we announce that the problem has been solved (to the extent possible) by Marcus Michelen.
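The simplest case, (n, r) = (2, 1), can be checked directly. If X ~ Binomial(T, 1/2) is the load of the first bin, then the maximum load is max(X, T - X) = T/2 + |X - T/2|. A normal approximation gives E|X - T/2| ≈ sqrt(T/(2π)), so C_{2,1} = 1/sqrt(2π). The sketch below computes the exact average for n = 2, r = 1, and gives a Monte Carlo estimator for general (n, r). The function names and trial counts are illustrative choices, not the holonomic-ansatz method of the paper.

```python
import math
import random
from fractions import Fraction
from math import comb

def exact_mean_max_2_1(T):
    """Exact average maximal load for n = 2 bins, r = 1 ball per round, T rounds."""
    num = sum(comb(T, k) * max(k, T - k) for k in range(T + 1))
    return Fraction(num, 2**T)

def mc_mean_max(n, r, T, trials=2000, seed=0):
    """Monte Carlo estimate: each round puts one ball into each of r distinct bins."""
    rng = random.Random(seed)
    total = 0
    for _ in range(trials):
        loads = [0] * n
        for _ in range(T):
            for b in rng.sample(range(n), r):  # r distinct bins: without replacement
                loads[b] += 1
        total += max(loads)
    return total / trials

T = 1000
print(float(exact_mean_max_2_1(T)), T / 2 + math.sqrt(T / (2 * math.pi)))
print(mc_mean_max(4, 2, 200))  # compare with (r/n) T = 100 plus a sqrt(T) term
```

For T = 1000, the exact value and T/2 + sqrt(T/(2π)) agree to within a few thousandths. This is consistent with the O(1/sqrt(T)) error term.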
Submitted 23 May, 2019; v1 submitted 19 May, 2019;
originally announced May 2019.
-
On the Effect of Task-to-Worker Assignment in Distributed Computing Systems with Stragglers
Authors:
Amir Behrouzi-Far,
Emina Soljanin
Abstract:
We study the expected completion time of some recently proposed algorithms for distributed computing that redundantly assign computing tasks to multiple machines in order to tolerate a certain number of machine failures. We analytically show that not only the amount of redundancy but also the task-to-machine assignment affects the latency in a distributed system. We study systems with a fixed number of computing tasks that are split into possibly overlapping batches, and with independent, exponentially distributed machine service times. We show that, for such systems, the uniform replication of non-overlapping (disjoint) batches of computing tasks achieves the minimum expected computing time.
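A tiny instance makes the claim concrete. Assume 4 tasks and 4 workers, with every worker assigned a batch of 2 tasks and i.i.d. Exp(1) service times. The job finishes once the finished workers' batches cover every task. With disjoint batches replicated twice, the completion time is the maximum of two Exp(2) variables, with mean 3/4. With cyclically overlapping batches, the job needs workers {0, 2} or workers {1, 3} to finish, which gives a mean of 4/2 - 4/3 + 1/4 = 11/12. The sketch below estimates both by simulation. The instance and the helper names are hypothetical.

```python
import random

def completion_time(assignment, times):
    """Earliest time at which the finished workers' batches cover every task."""
    all_tasks = set().union(*assignment)
    covered = set()
    for t, w in sorted(zip(times, range(len(times)))):
        covered |= assignment[w]
        if covered == all_tasks:
            return t
    raise AssertionError("unreachable: all workers together cover every task")

def mean_completion(assignment, trials=100_000, seed=1):
    """Monte Carlo estimate with i.i.d. Exp(1) worker service times."""
    rng = random.Random(seed)
    total = 0.0
    for _ in range(trials):
        times = [rng.expovariate(1.0) for _ in assignment]
        total += completion_time(assignment, times)
    return total / trials

# Hypothetical instance: 4 tasks, 4 workers, every worker gets a batch of 2 tasks.
disjoint = [{0, 1}, {0, 1}, {2, 3}, {2, 3}]     # two disjoint batches, each replicated twice
overlapping = [{0, 1}, {1, 2}, {2, 3}, {3, 0}]  # cyclically overlapping batches
print("disjoint   :", mean_completion(disjoint))     # close to the exact mean 3/4
print("overlapping:", mean_completion(overlapping))  # close to the exact mean 11/12
```
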
Submitted 8 August, 2018;
originally announced August 2018.