Search | arXiv e-print repository

Data Structures for Task-based Priority Scheduling

Authors: Martin Wimmer, Daniel Cederman, Francesco Versaci, Jesper Larsson Träff, Philippas Tsigas

Abstract: Many task-parallel applications can benefit from attempting to execute tasks in a specific order, as for instance indicated by priorities associated with the tasks. We present three lock-free data structures for priority scheduling with different trade-offs on scalability and ordering guarantees. First we propose a basic extension to work-stealing that provides good scalability, but cannot provide… ▽ More Many task-parallel applications can benefit from attempting to execute tasks in a specific order, as for instance indicated by priorities associated with the tasks. We present three lock-free data structures for priority scheduling with different trade-offs on scalability and ordering guarantees. First we propose a basic extension to work-stealing that provides good scalability, but cannot provide any guarantees for task-ordering in-between threads. Next, we present a centralized priority data structure based on $k$-fifo queues, which provides strong (but still relaxed with regard to a sequential specification) guarantees. The parameter $k$ allows to dynamically configure the trade-off between scalability and the required ordering guarantee. Third, and finally, we combine both data structures into a hybrid, $k$-priority data structure, which provides scalability similar to the work-stealing based approach for larger $k$, while giving strong ordering guarantees for smaller $k$. We argue for using the hybrid data structure as the best compromise for generic, priority-based task-scheduling. We analyze the behavior and trade-offs of our data structures in the context of a simple parallelization of Dijkstra's single-source shortest path algorithm. Our theoretical analysis and simulations show that both the centralized and the hybrid $k$-priority based data structures can give strong guarantees on the useful work performed by the parallel Dijkstra algorithm. We support our results with experimental evidence on an 80-core Intel Xeon system. △ Less

Submitted 9 December, 2013; originally announced December 2013.

arXiv:1310.7453 [pdf, other]

doi 10.1109/ICPADS.2013.40

OutFlank Routing: Increasing Throughput in Toroidal Interconnection Networks

Authors: Francesco Versaci

Abstract: We present a new, deadlock-free, routing scheme for toroidal interconnection networks, called OutFlank Routing (OFR). OFR is an adaptive strategy which exploits non-minimal links, both in the source and in the destination nodes. When minimal links are congested, OFR deroutes packets to carefully chosen intermediate destinations, in order to obtain travel paths which are only an additive constant l… ▽ More We present a new, deadlock-free, routing scheme for toroidal interconnection networks, called OutFlank Routing (OFR). OFR is an adaptive strategy which exploits non-minimal links, both in the source and in the destination nodes. When minimal links are congested, OFR deroutes packets to carefully chosen intermediate destinations, in order to obtain travel paths which are only an additive constant longer than the shortest ones. Since routing performance is very sensitive to changes in the traffic model or in the router parameters, an accurate discrete-event simulator of the toroidal network has been developed to empirically validate OFR, by comparing it against other relevant routing strategies, over a range of typical real-world traffic patterns. On the 16x16x16 (4096 nodes) simulated network OFR exhibits improvements of the maximum sustained throughput between 14% and 114%, with respect to Adaptive Bubble Routing. △ Less

Submitted 28 October, 2013; originally announced October 2013.

Comments: 9 pages, 5 figures, to be presented at ICPADS 2013

arXiv:1201.3778 [pdf, other]

doi 10.1007/978-3-642-31125-3_1

Processor Allocation for Optimistic Parallelization of Irregular Programs

Authors: Francesco Versaci, Keshav **ali

Abstract: Optimistic parallelization is a promising approach for the parallelization of irregular algorithms: potentially interfering tasks are launched dynamically, and the runtime system detects conflicts between concurrent activities, aborting and rolling back conflicting tasks. However, parallelism in irregular algorithms is very complex. In a regular algorithm like dense matrix multiplication, the amou… ▽ More Optimistic parallelization is a promising approach for the parallelization of irregular algorithms: potentially interfering tasks are launched dynamically, and the runtime system detects conflicts between concurrent activities, aborting and rolling back conflicting tasks. However, parallelism in irregular algorithms is very complex. In a regular algorithm like dense matrix multiplication, the amount of parallelism can usually be expressed as a function of the problem size, so it is reasonably straightforward to determine how many processors should be allocated to execute a regular algorithm of a certain size (this is called the processor allocation problem). In contrast, parallelism in irregular algorithms can be a function of input parameters, and the amount of parallelism can vary dramatically during the execution of the irregular algorithm. Therefore, the processor allocation problem for irregular algorithms is very difficult. In this paper, we describe the first systematic strategy for addressing this problem. Our approach is based on a construct called the conflict graph, which (i) provides insight into the amount of parallelism that can be extracted from an irregular algorithm, and (ii) can be used to address the processor allocation problem for irregular algorithms. We show that this problem is related to a generalization of the unfriendly seating problem and, by extending Turán's theorem, we obtain a worst-case class of problems for optimistic parallelization, which we use to derive a lower bound on the exploitable parallelism. Finally, using some theoretically derived properties and some experimental facts, we design a quick and stable control strategy for solving the processor allocation problem heuristically. △ Less

Submitted 27 June, 2012; v1 submitted 18 January, 2012; originally announced January 2012.

Comments: 12 pages, 3 figures, extended version of SPAA 2011 brief announcement

Journal ref: LNCS 7333/2012

arXiv:1109.6643 [pdf, other]

doi 10.1016/j.tcs.2013.01.016

Optimal Eviction Policies for Stochastic Address Traces

Authors: Gianfranco Bilardi, Francesco Versaci

Abstract: The eviction problem for memory hierarchies is studied for the Hidden Markov Reference Model (HMRM) of the memory trace, showing how miss minimization can be naturally formulated in the optimal control setting. In addition to the traditional version assuming a buffer of fixed capacity, a relaxed version is also considered, in which buffer occupancy can vary and its average is constrained. Resortin… ▽ More The eviction problem for memory hierarchies is studied for the Hidden Markov Reference Model (HMRM) of the memory trace, showing how miss minimization can be naturally formulated in the optimal control setting. In addition to the traditional version assuming a buffer of fixed capacity, a relaxed version is also considered, in which buffer occupancy can vary and its average is constrained. Resorting to multiobjective optimization, viewing occupancy as a cost rather than as a constraint, the optimal eviction policy is obtained by composing solutions for the individual addressable items. This approach is then specialized to the Least Recently Used Stack Model (LRUSM), a type of HMRM often considered for traces, which includes V-1 parameters, where V is the size of the virtual space. A gain optimal policy for any target average occupancy is obtained which (i) is computable in time O(V) from the model parameters, (ii) is optimal also for the fixed capacity case, and (iii) is characterized in terms of priorities, with the name of Least Profit Rate (LPR) policy. An O(log C) upper bound (being C the buffer capacity) is derived for the ratio between the expected miss rate of LPR and that of OPT, the optimal off-line policy; the upper bound is tightened to O(1), under reasonable constraints on the LRUSM parameters. Using the stack-distance framework, an algorithm is developed to compute the number of misses incurred by LPR on a given input trace, simultaneously for all buffer capacities, in time O(log V) per access. Finally, some results are provided for miss minimization over a finite horizon and over an infinite horizon under bias optimality, a criterion more stringent than gain optimality. △ Less

Submitted 25 October, 2013; v1 submitted 29 September, 2011; originally announced September 2011.

Comments: 37 pages, 3 figures

Journal ref: Theoretical Computer Science, Volume 514, 25 November 2013, Pages 36-60

Showing 1–4 of 4 results for author: Versaci, F