Search | arXiv e-print repository

arXiv:2009.13569 [pdf, other]

Engineering In-place (Shared-memory) Sorting Algorithms

Authors: Michael Axtmann, Sascha Witt, Daniel Ferizovic, Peter Sanders

Abstract: We present sorting algorithms that represent the fastest known techniques for a wide range of input sizes, input distributions, data types, and machines. A part of the speed advantage is due to the feature to work in-place. Previously, the in-place feature often implied performance penalties. Our main algorithmic contribution is a blockwise approach to in-place data distribution that is provably c… ▽ More We present sorting algorithms that represent the fastest known techniques for a wide range of input sizes, input distributions, data types, and machines. A part of the speed advantage is due to the feature to work in-place. Previously, the in-place feature often implied performance penalties. Our main algorithmic contribution is a blockwise approach to in-place data distribution that is provably cache-efficient. We also parallelize this approach taking dynamic load balancing and memory locality into account. Our comparison-based algorithm, In-place Superscalar Samplesort (IPS$^4$o), combines this technique with branchless decision trees. By taking cases with many equal elements into account and by adapting the distribution degree dynamically, we obtain a highly robust algorithm that outperforms the best in-place parallel comparison-based competitor by almost a factor of three. IPS$^4$o also outperforms the best comparison-based competitors in the in-place or not in-place, parallel or sequential settings. IPS$^4$o even outperforms the best integer sorting algorithms in a wide range of situations. In many of the remaining cases (often involving near-uniform input distributions, small keys, or a sequential setting), our new in-place radix sorter turns out to be the best algorithm. Claims to have the, in some sense, "best" sorting algorithm can be found in many papers which cannot all be true. Therefore, we base our conclusions on extensive experiments involving a large part of the cross product of 21 state-of-the-art sorting codes, 6 data types, 10 input distributions, 4 machines, 4 memory allocation strategies, and input sizes varying over 7 orders of magnitude. This confirms the robust performance of our algorithms while revealing major performance problems in many competitors outside the concrete set of measurements reported in the associated publications. △ Less

Submitted 3 February, 2021; v1 submitted 28 September, 2020; originally announced September 2020.

ACM Class: F.2.2

arXiv:1905.10902 [pdf, other]

Engineering Kernelization for Maximum Cut

Authors: Damir Ferizovic, Demian Hespe, Sebastian Lamm, Matthias Mnich, Christian Schulz, Darren Strash

Abstract: Kernelization is a general theoretical framework for preprocessing instances of NP-hard problems into (generally smaller) instances with bounded size, via the repeated application of data reduction rules. For the fundamental Max Cut problem, kernelization algorithms are theoretically highly efficient for various parameterizations. However, the efficacy of these reduction rules in practice---to aid… ▽ More Kernelization is a general theoretical framework for preprocessing instances of NP-hard problems into (generally smaller) instances with bounded size, via the repeated application of data reduction rules. For the fundamental Max Cut problem, kernelization algorithms are theoretically highly efficient for various parameterizations. However, the efficacy of these reduction rules in practice---to aid solving highly challenging benchmark instances to optimality---remains entirely unexplored. We engineer a new suite of efficient data reduction rules that subsume most of the previously published rules, and demonstrate their significant impact on benchmark data sets, including synthetic instances, and data sets from the VLSI and image segmentation application domains. Our experiments reveal that current state-of-the-art solvers can be sped up by up to multiple orders of magnitude when combined with our data reduction rules. On social and biological networks in particular, kernelization enables us to solve four instances that were previously unsolved in a ten-hour time limit with state-of-the-art solvers; three of these instances are now solved in less than two seconds. △ Less

Submitted 26 May, 2019; originally announced May 2019.

Comments: 16 pages, 4 tables, 2 figures

ACM Class: F.2.2; G.2.2

arXiv:1705.02257 [pdf, other]

In-place Parallel Super Scalar Samplesort (IPS$^4$o)

Authors: Michael Axtmann, Sascha Witt, Daniel Ferizovic, Peter Sanders

Abstract: We present a sorting algorithm that works in-place, executes in parallel, is cache-efficient, avoids branch-mispredictions, and performs work O(n log n) for arbitrary inputs with high probability. The main algorithmic contributions are new ways to make distribution-based algorithms in-place: On the practical side, by using coarse-grained block-based permutations, and on the theoretical side, we sh… ▽ More We present a sorting algorithm that works in-place, executes in parallel, is cache-efficient, avoids branch-mispredictions, and performs work O(n log n) for arbitrary inputs with high probability. The main algorithmic contributions are new ways to make distribution-based algorithms in-place: On the practical side, by using coarse-grained block-based permutations, and on the theoretical side, we show how to eliminate the recursion stack. Extensive experiments show that our algorithm IPS$^4$o scales well on a variety of multi-core machines. We outperform our closest in-place competitor by a factor of up to 3. Even as a sequential algorithm, we are up to 1.5 times faster than the closest sequential competitor, BlockQuicksort. △ Less

Submitted 29 June, 2017; v1 submitted 5 May, 2017; originally announced May 2017.

ACM Class: F.2.2

arXiv:1612.05665 [pdf, other]

doi 10.1145/3178487.3178509

PAM: Parallel Augmented Maps

Authors: Yihan Sun, Daniel Ferizovic, Guy E. Blelloch

Abstract: Ordered (key-value) maps are an important and widely-used data type for large-scale data processing frameworks. Beyond simple search, insertion and deletion, more advanced operations such as range extraction, filtering, and bulk updates form a critical part of these frameworks. We describe an interface for ordered maps that is augmented to support fast range queries and sums, and introduce a par… ▽ More Ordered (key-value) maps are an important and widely-used data type for large-scale data processing frameworks. Beyond simple search, insertion and deletion, more advanced operations such as range extraction, filtering, and bulk updates form a critical part of these frameworks. We describe an interface for ordered maps that is augmented to support fast range queries and sums, and introduce a parallel and concurrent library called PAM (Parallel Augmented Maps) that implements the interface. The interface includes a wide variety of functions on augmented maps ranging from basic insertion and deletion to more interesting functions such as union, intersection, filtering, extracting ranges, splitting, and range-sums. We describe algorithms for these functions that are efficient both in theory and practice. As examples of the use of the interface and the performance of PAM, we apply the library to four applications: simple range sums, interval trees, 2D range trees, and ranked word index searching. The interface greatly simplifies the implementation of these data structures over direct implementations. Sequentially the code achieves performance that matches or exceeds existing libraries designed specially for a single application, and in parallel our implementation gets speedups ranging from 40 to 90 on 72 cores with 2-way hyperthreading. △ Less

Submitted 26 March, 2018; v1 submitted 16 December, 2016; originally announced December 2016.

Journal ref: Yihan Sun, Daniel Ferizovic, and Guy E. Belloch. 2018. PAM: parallel augmented maps. In Proceedings of the 23rd ACM SIGPLAN Symposium on Principles and Practice of Parallel Programming (PPoPP '18). ACM, New York, NY, USA, 290-304

arXiv:1602.02120 [pdf, other]

doi 10.1145/2935764.2935768

Parallel Ordered Sets Using Join

Authors: Guy Blelloch, Daniel Ferizovic, Yihan Sun

Abstract: The ordered set is one of the most important data type in both theoretical algorithm design and analysis and practical programming. In this paper we study the set operations on two ordered sets, including Union, Intersect and Difference, based on four types of balanced Binary Search Trees (BST) including AVL trees, red-black trees, weight balanced trees and treaps. We introduced only one subroutin… ▽ More The ordered set is one of the most important data type in both theoretical algorithm design and analysis and practical programming. In this paper we study the set operations on two ordered sets, including Union, Intersect and Difference, based on four types of balanced Binary Search Trees (BST) including AVL trees, red-black trees, weight balanced trees and treaps. We introduced only one subroutine Join that needs to be implemented differently for each balanced BST, and on top of which we can implement generic, simple and efficient parallel functions for ordered sets. We first prove the work-efficiency of these Join-based set functions using a generic proof working for all the four types of balanced BSTs. We also implemented and tested our algorithm on all the four balancing schemes. Interestingly the implementations on all four data structures and three set functions perform similarly in time and speedup (more than 45x on 64 cores). We also compare the performance of our implementation to other existing libraries and algorithms. △ Less

Submitted 12 November, 2016; v1 submitted 5 February, 2016; originally announced February 2016.

Showing 1–5 of 5 results for author: Ferizovic, D