-
Efficient data streaming multiway aggregation through concurrent algorithmic designs and new abstract data types
Authors:
Vincenzo Gulisano,
Yiannis Nikolakopoulos,
Daniel Cederman,
Marina Papatriantafilou,
Philippas Tsigas
Abstract:
Data streaming relies on continuous queries to process unbounded streams of data in a real-time fashion. It is commonly demanding in computation capacity, given that the relevant applications involve very large volumes of data. Data structures act as articulation points and maintain the state of data streaming operators, potentially supporting high parallelism and balancing the work between them.…
▽ More
Data streaming relies on continuous queries to process unbounded streams of data in a real-time fashion. It is commonly demanding in computation capacity, given that the relevant applications involve very large volumes of data. Data structures act as articulation points and maintain the state of data streaming operators, potentially supporting high parallelism and balancing the work between them. Prompted by this fact, in this work we study and analyze parallelization needs of these articulation points, focusing on the problem of streaming multiway aggregation, where large data volumes are received from multiple input streams. The analysis of the parallelization needs, as well as of the use and limitations of existing aggregate designs and their data structures, leads us to identify needs for proper shared objects that can achieve low-latency and high throughput multiway aggregation. We present the requirements of such objects as abstract data types and we provide efficient lock-free linearizable algorithmic implementations of them, along with new multiway aggregate algorithmic designs that leverage them, supporting both deterministic order-sensitive and order-insensitive aggregate functions. Furthermore, we point out future directions that open through these contributions. The paper includes an extensive experimental study, based on a variety of aggregation continuous queries on two large datasets extracted from SoundCloud, a music social network, and from a Smart Grid network. In all the experiments, the proposed data structures and the enhanced aggregate operators improved the processing performance significantly, up to one order of magnitude, in terms of both throughput and latency, over the commonly-used techniques based on queues.
△ Less
Submitted 15 June, 2016;
originally announced June 2016.
-
Data Structures for Task-based Priority Scheduling
Authors:
Martin Wimmer,
Daniel Cederman,
Francesco Versaci,
Jesper Larsson Träff,
Philippas Tsigas
Abstract:
Many task-parallel applications can benefit from attempting to execute tasks in a specific order, as for instance indicated by priorities associated with the tasks. We present three lock-free data structures for priority scheduling with different trade-offs on scalability and ordering guarantees. First we propose a basic extension to work-stealing that provides good scalability, but cannot provide…
▽ More
Many task-parallel applications can benefit from attempting to execute tasks in a specific order, as for instance indicated by priorities associated with the tasks. We present three lock-free data structures for priority scheduling with different trade-offs on scalability and ordering guarantees. First we propose a basic extension to work-stealing that provides good scalability, but cannot provide any guarantees for task-ordering in-between threads. Next, we present a centralized priority data structure based on $k$-fifo queues, which provides strong (but still relaxed with regard to a sequential specification) guarantees. The parameter $k$ allows to dynamically configure the trade-off between scalability and the required ordering guarantee. Third, and finally, we combine both data structures into a hybrid, $k$-priority data structure, which provides scalability similar to the work-stealing based approach for larger $k$, while giving strong ordering guarantees for smaller $k$. We argue for using the hybrid data structure as the best compromise for generic, priority-based task-scheduling.
We analyze the behavior and trade-offs of our data structures in the context of a simple parallelization of Dijkstra's single-source shortest path algorithm. Our theoretical analysis and simulations show that both the centralized and the hybrid $k$-priority based data structures can give strong guarantees on the useful work performed by the parallel Dijkstra algorithm. We support our results with experimental evidence on an 80-core Intel Xeon system.
△ Less
Submitted 9 December, 2013;
originally announced December 2013.
-
Configurable Strategies for Work-stealing
Authors:
Martin Wimmer,
Daniel Cederman,
Jesper Larsson Träff,
Philippas Tsigas
Abstract:
Work-stealing systems are typically oblivious to the nature of the tasks they are scheduling. For instance, they do not know or take into account how long a task will take to execute or how many subtasks it will spawn. Moreover, the actual task execution order is typically determined by the underlying task storage data structure, and cannot be changed. There are thus possibilities for optimizing t…
▽ More
Work-stealing systems are typically oblivious to the nature of the tasks they are scheduling. For instance, they do not know or take into account how long a task will take to execute or how many subtasks it will spawn. Moreover, the actual task execution order is typically determined by the underlying task storage data structure, and cannot be changed. There are thus possibilities for optimizing task parallel executions by providing information on specific tasks and their preferred execution order to the scheduling system.
We introduce scheduling strategies to enable applications to dynamically provide hints to the task-scheduling system on the nature of specific tasks. Scheduling strategies can be used to independently control both local task execution order as well as steal order. In contrast to conventional scheduling policies that are normally global in scope, strategies allow the scheduler to apply optimizations on individual tasks. This flexibility greatly improves composability as it allows the scheduler to apply different, specific scheduling choices for different parts of applications simultaneously. We present a number of benchmarks that highlight diverse, beneficial effects that can be achieved with scheduling strategies. Some benchmarks (branch-and-bound, single-source shortest path) show that prioritization of tasks can reduce the total amount of work compared to standard work-stealing execution order. For other benchmarks (triangle strip generation) qualitatively better results can be achieved in shorter time. Other optimizations, such as dynamic merging of tasks or stealing of half the work, instead of half the tasks, are also shown to improve performance. Composability is demonstrated by examples that combine different strategies, both within the same kernel (prefix sum) as well as when scheduling multiple kernels (prefix sum and unbalanced tree search).
△ Less
Submitted 28 May, 2013;
originally announced May 2013.
-
Lock-free Concurrent Data Structures
Authors:
Daniel Cederman,
Anders Gidenstam,
Phuong Ha,
Håkan Sundell,
Marina Papatriantafilou,
Philippas Tsigas
Abstract:
Concurrent data structures are the data sharing side of parallel programming. Data structures give the means to the program to store data, but also provide operations to the program to access and manipulate these data. These operations are implemented through algorithms that have to be efficient. In the sequential setting, data structures are crucially important for the performance of the respecti…
▽ More
Concurrent data structures are the data sharing side of parallel programming. Data structures give the means to the program to store data, but also provide operations to the program to access and manipulate these data. These operations are implemented through algorithms that have to be efficient. In the sequential setting, data structures are crucially important for the performance of the respective computation. In the parallel programming setting, their importance becomes more crucial because of the increased use of data and resource sharing for utilizing parallelism.
The first and main goal of this chapter is to provide a sufficient background and intuition to help the interested reader to navigate in the complex research area of lock-free data structures. The second goal is to offer the programmer familiarity to the subject that will allow her to use truly concurrent methods.
△ Less
Submitted 12 February, 2013;
originally announced February 2013.
-
Supporting Lock-Free Composition of Concurrent Data Objects
Authors:
Daniel Cederman,
Philippas Tsigas
Abstract:
Lock-free data objects offer several advantages over their blocking counterparts, such as being immune to deadlocks and convoying and, more importantly, being highly concurrent. But they share a common disadvantage in that the operations they provide are difficult to compose into larger atomic operations while still guaranteeing lock-freedom. We present a lock-free methodology for composing high…
▽ More
Lock-free data objects offer several advantages over their blocking counterparts, such as being immune to deadlocks and convoying and, more importantly, being highly concurrent. But they share a common disadvantage in that the operations they provide are difficult to compose into larger atomic operations while still guaranteeing lock-freedom. We present a lock-free methodology for composing highly concurrent linearizable objects together by unifying their linearization points. This makes it possible to relatively easily introduce atomic lock-free move operations to a wide range of concurrent objects. Experimental evaluation has shown that the operations originally supported by the data objects keep their performance behavior under our methodology.
△ Less
Submitted 2 October, 2009;
originally announced October 2009.