
HPX -- An open source C++ Standard Library for Parallelism and Concurrency

Authors: Thomas Heller, Patrick Diehl, Zachary Byerly, John Biddiscombe, Hartmut Kaiser

Abstract: To achieve scalability with today's heterogeneous HPC resources, we need a dramatic shift in our thinking; MPI+X is not enough. Asynchronous Many Task (AMT) runtime systems break down the global barriers imposed by the Bulk Synchronous Programming model. HPX is an open-source, C++ Standards-compliant AMT runtime system that is developed by a diverse international community of collaborators called The Ste||ar Group. HPX provides features which allow application developers to naturally use key design patterns, such as overlapping communication and computation, decentralizing control flow, oversubscribing execution resources, and sending work to data instead of data to work. The Ste||ar Group comprises physicists, engineers, and computer scientists; men and women from many different institutions and affiliations, and over a dozen different countries. We are committed to advancing the development of scalable parallel applications by providing a platform for collaborating and exchanging ideas. In this paper, we give a detailed description of the features HPX provides and how they help achieve scalability and programmability, a list of applications of HPX including two large NSF-funded collaborations (STORM, for storm surge forecasting; and STAR (OctoTiger), an astrophysics project which runs at 96.8% parallel efficiency on 643,280 cores), and we end with a description of how HPX and the Ste||ar Group fit into the open source community.

Submitted 11 August, 2023; originally announced January 2024.

Journal ref: Proceedings of OpenSuCo 2017, Denver, Colorado USA, November 2017 (OpenSuCo 17)
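The abstract credits HPX with letting developers overlap communication and computation through asynchronous tasks. HPX exposes futures-based facilities such as `hpx::async` and `hpx::future` for this; the minimal sketch below deliberately uses only standard C++ (`std::async`, `std::future`) so that it stands alone, and it assumes the same launch-then-wait shape carries over. The function `fetch_remote_halo` is a hypothetical stand-in for a communication step and does no real networking.

```cpp
#include <cstddef>
#include <future>
#include <numeric>
#include <vector>

// Hypothetical stand-in for a communication step (e.g. receiving halo data
// from a neighbouring node). Here it simply produces a vector locally.
std::vector<double> fetch_remote_halo(std::size_t n) {
    return std::vector<double>(n, 1.0);
}

// Overlap pattern: start the "communication" asynchronously, perform local
// work that does not depend on it, and wait on the future only when the
// remote data is actually consumed.
double overlapped_step(const std::vector<double>& interior, std::size_t halo_size) {
    std::future<std::vector<double>> halo =
        std::async(std::launch::async, fetch_remote_halo, halo_size);

    // Independent local computation runs while the halo is being produced.
    double interior_sum = std::accumulate(interior.begin(), interior.end(), 0.0);

    // Synchronize here, where the halo values are needed.
    std::vector<double> h = halo.get();
    return interior_sum + std::accumulate(h.begin(), h.end(), 0.0);
}
```

For example, `overlapped_step({1.0, 2.0, 3.0}, 4)` sums the interior (6) while the halo of four ones is produced, then returns 10.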

arXiv:2301.13723 [pdf, ps, other]

p-median location interdiction on trees

Authors: Lena Leiß, Till Heller, Luca E. Schäfer, Manuel Streicher, Stefan Ruzika

Abstract: In p-median location interdiction, the aim is to find a subset of edges in a graph such that the objective value of the p-median problem in the same graph without the selected edges is as large as possible. We prove that this problem is NP-hard even on acyclic graphs. Restricting the problem to trees with unit lengths on the edges, unit interdiction costs, and a single edge interdiction, we provide an algorithm which solves the problem in polynomial time. Furthermore, we investigate path graphs with unit and arbitrary lengths. For the former case, we present an algorithm in which multiple edges can be interdicted. For the latter case, we present a method to compute an optimal solution for one interdiction step, which can also be extended to multiple interdicted edges.

Submitted 31 January, 2023; originally announced January 2023.

Comments: 24 pages, 8 figures

MSC Class: 90B80; 05C05; 68Q25
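To make the interdiction objective concrete, here is a brute-force sketch on a path graph with edge lengths. It is not the paper's polynomial-time algorithm: it enumerates every facility set and every single-edge removal, so it is only usable for tiny instances. It assumes that an interdicted edge is deleted outright, so a vertex separated from all facilities has infinite distance. The names `path_distance`, `p_median_cost`, and `best_single_interdiction` are hypothetical.

```cpp
#include <algorithm>
#include <cstddef>
#include <limits>
#include <utility>
#include <vector>

// Vertices 0..n-1 lie on a path; len[e] is the length of edge (e, e+1).
// cut == len.size() means "no edge removed"; otherwise edge `cut` is deleted.
constexpr double kInf = std::numeric_limits<double>::infinity();

double path_distance(const std::vector<double>& len, std::size_t cut,
                     std::size_t u, std::size_t v) {
    if (u > v) std::swap(u, v);
    double d = 0.0;
    for (std::size_t e = u; e < v; ++e) {
        if (e == cut) return kInf;  // the removed edge separates u and v
        d += len[e];
    }
    return d;
}

// Optimal p-median cost by exhaustive search: choose p facility vertices
// minimizing the sum of distances from every vertex to its nearest facility.
double p_median_cost(const std::vector<double>& len, std::size_t cut, int p) {
    const std::size_t n = len.size() + 1;
    double best = kInf;
    for (unsigned long mask = 0; mask < (1UL << n); ++mask) {
        int chosen = 0;
        for (std::size_t f = 0; f < n; ++f)
            chosen += static_cast<int>((mask >> f) & 1UL);
        if (chosen != p) continue;
        double total = 0.0;
        for (std::size_t c = 0; c < n; ++c) {
            double nearest = kInf;
            for (std::size_t f = 0; f < n; ++f)
                if ((mask >> f) & 1UL)
                    nearest = std::min(nearest, path_distance(len, cut, c, f));
            total += nearest;
        }
        best = std::min(best, total);
    }
    return best;
}

// Single-edge interdiction: the edge whose removal maximizes the optimal
// p-median cost (ties go to the lowest edge index).
std::size_t best_single_interdiction(const std::vector<double>& len, int p) {
    std::size_t best_edge = 0;
    double best_cost = -kInf;
    for (std::size_t e = 0; e < len.size(); ++e) {
        const double cost = p_median_cost(len, e, p);
        if (cost > best_cost) { best_cost = cost; best_edge = e; }
    }
    return best_edge;
}
```

On a five-vertex unit-length path with p = 2, the uninterdicted cost is 3; removing an end edge raises it to 4, because one facility is forced onto the isolated end vertex.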

arXiv:2206.06302 [pdf, other]

doi 10.1007/978-3-319-46079-6_2

Closing the Performance Gap with Modern C++

Authors: Thomas Heller, Hartmut Kaiser, Patrick Diehl, Dietmar Fey, Marc Alexander Schweitzer

Abstract: On the way to Exascale, programmers face the increasing challenge of having to support multiple hardware architectures from the same code base. At the same time, portability of code and performance are increasingly difficult to achieve as hardware architectures are becoming more and more diverse. Today's heterogeneous systems often include two or more completely distinct and incompatible hardware execution models, such as GPGPUs, SIMD vector units, and general purpose cores, which conventionally have to be programmed using separate tool chains representing non-overlapping programming models. The recent revival of interest in the C++ language, both in industry and in the wider community, has spurred a remarkable number of standardization proposals and technical specifications in the arena of concurrency and parallelism. Recently, this has included a growing amount of discussion around the need for a uniform, higher-level abstraction and programming model for parallelism in the C++ standard targeting heterogeneous and distributed computing. Such an abstraction should blend perfectly with existing, already standardized language and library features, but should also be generic enough to support future hardware developments. In this paper, we present the results from developing such a higher-level programming abstraction for parallelism in C++, which aims at enabling code and performance portability over a wide range of architectures and for various types of parallelism.
We present and compare performance data obtained from running the well-known STREAM benchmark ported to our higher-level C++ abstraction with the corresponding results from running it natively. We show that our abstractions enable performance at least as good as the comparable baseline benchmarks while providing a uniform programming API on all compared target architectures.

Submitted 30 May, 2022; originally announced June 2022.
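For readers unfamiliar with STREAM, its "triad" kernel computes `a[i] = b[i] + q * c[i]` over large arrays and is limited by memory bandwidth. The sketch below expresses it with a standard algorithm, as an illustration of the kind of kernel being ported; it is not the paper's abstraction. It runs sequentially; C++17 also provides an overload of `std::transform` taking an execution policy such as `std::execution::par` as its first argument, which is one standard way to request parallel execution.

```cpp
#include <algorithm>
#include <cassert>
#include <vector>

// STREAM triad: a[i] = b[i] + q * c[i] for every index i.
void stream_triad(std::vector<double>& a, const std::vector<double>& b,
                  const std::vector<double>& c, double q) {
    assert(a.size() == b.size() && b.size() == c.size());
    std::transform(b.begin(), b.end(), c.begin(), a.begin(),
                   [q](double bi, double ci) { return bi + q * ci; });
}
```

With `b = {1, 2, 3}`, `c = {1, 1, 1}` and `q = 2`, the result is `a = {3, 4, 5}`.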

arXiv:2201.05515 [pdf, ps, other]

On Reward-Penalty-Selection Games

Authors: Niklas Gräf, Till Heller, Sven O. Krumke

Abstract: The Reward-Penalty-Selection Problem (RPSP) can be seen as a combination of the Set Cover Problem (SCP) and the Hitting Set Problem (HSP). Given a set of elements, a set of reward sets, and a set of penalty sets, one tries to find a subset of elements such that as many reward sets as possible are covered, i.e., all of their elements are contained in the subset, and at the same time as few penalty sets as possible are hit, i.e., the intersection of the subset with the penalty set is non-empty. In this paper, we define a cooperative game based on the RPSP where the elements of the RPSP are the players. We prove structural results and show that RPS games are convex, superadditive, and totally balanced. Furthermore, the Shapley value can be computed in polynomial time. In addition, we provide a characterization of the core elements as a feasible flow in a network graph depending on the instance of the underlying RPSP. Using this characterization, a core element can be computed efficiently.

Submitted 14 January, 2022; originally announced January 2022.

Comments: 15 pages, 1 figure

MSC Class: 91A68; 90C27
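The claim that the Shapley value is polynomial-time computable can be illustrated under a simplifying assumption that is not taken from the paper: that a coalition's value is the number of reward sets it contains minus the number of penalty sets it meets. Under that assumption the game is a sum of unanimity games and "hit" games, and linearity plus symmetry of the Shapley value give a closed form. The function name `shapley_rps` is hypothetical, and the paper's actual characteristic function may differ.

```cpp
#include <bitset>
#include <cstdint>
#include <vector>

using Coalition = std::uint32_t;  // bitmask of players (elements), n <= 32

// Assumed characteristic function (illustration only):
//   v(S) = #{reward sets R contained in S} - #{penalty sets P meeting S}.
// Each reward set is a unanimity game and each penalty set a "hit" game; in
// both, the value 1 is split equally among the set's members, so
//   phi_i = sum_{R containing i} 1/|R| - sum_{P containing i} 1/|P|.
std::vector<double> shapley_rps(int n, const std::vector<Coalition>& rewards,
                                const std::vector<Coalition>& penalties) {
    std::vector<double> phi(n, 0.0);
    auto distribute = [&](Coalition set, double sign) {
        if (set == 0) return;  // empty sets are skipped
        const double share = sign / static_cast<double>(std::bitset<32>(set).count());
        for (int i = 0; i < n; ++i)
            if ((set >> i) & 1U) phi[i] += share;
    };
    for (Coalition r : rewards) distribute(r, +1.0);
    for (Coalition p : penalties) distribute(p, -1.0);
    return phi;
}
```

The running time is O((number of sets) · n), and the values sum to v(N) as efficiency requires.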

arXiv:2106.15389 [pdf, ps, other]

Computing the egalitarian allocation with network flows

Authors: T. Heller, S. O. Krumke

Abstract: In a combinatorial exchange setting, players place sell (resp. buy) bids on combinations of traded goods. Besides the question of finding an optimal selection of winning bids, the question of how to share the obtained profit is of high importance. The egalitarian allocation is a well-known solution concept for profit sharing games which tries to distribute profit among players in the most equal way while respecting individual contributions to the obtained profit. Given a set of winning bids, we construct a special network graph and show that every flow in said graph corresponds to a core payment. Furthermore, we show that the egalitarian allocation can be characterized as an almost equal maximum flow, which is a maximum flow with the additional property that the difference of flow values on given edge sets is bounded by a constant. With this, we are able to compute the egalitarian allocation in polynomial time.

Submitted 29 June, 2021; originally announced June 2021.

Comments: 7 pages, 1 figure

MSC Class: 05C21; 91A12
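The characterization above rests on maximum flows. As background only, here is a standard Edmonds-Karp maximum-flow routine (breadth-first augmenting paths on a dense capacity matrix). It does not enforce the additional "almost equal" constraints on edge sets that the paper requires, and it does not construct the paper's network graph; `max_flow` is a hypothetical name.

```cpp
#include <algorithm>
#include <cassert>
#include <limits>
#include <queue>
#include <vector>

// Edmonds-Karp: repeatedly augment along a shortest (fewest-edge) s-t path
// in the residual graph. cap is taken by value and used as residual capacity.
long long max_flow(std::vector<std::vector<long long>> cap, int s, int t) {
    assert(s != t);
    const int n = static_cast<int>(cap.size());
    long long flow = 0;
    while (true) {
        // Breadth-first search for an augmenting path.
        std::vector<int> parent(n, -1);
        parent[s] = s;
        std::queue<int> q;
        q.push(s);
        while (!q.empty() && parent[t] == -1) {
            const int u = q.front();
            q.pop();
            for (int v = 0; v < n; ++v)
                if (parent[v] == -1 && cap[u][v] > 0) { parent[v] = u; q.push(v); }
        }
        if (parent[t] == -1) return flow;  // no augmenting path remains

        // Bottleneck residual capacity along the path.
        long long bottleneck = std::numeric_limits<long long>::max();
        for (int v = t; v != s; v = parent[v])
            bottleneck = std::min(bottleneck, cap[parent[v]][v]);

        // Push flow: decrease forward, increase backward residual capacity.
        for (int v = t; v != s; v = parent[v]) {
            cap[parent[v]][v] -= bottleneck;
            cap[v][parent[v]] += bottleneck;
        }
        flow += bottleneck;
    }
}
```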

arXiv:2106.14601 [pdf, ps, other]

The Reward-Penalty-Selection Problem

Authors: T. Heller, S. O. Krumke, K. -H. Küfer

Abstract: The Set Cover Problem (SCP) and the Hitting Set Problem (HSP) are well-studied optimization problems. In this paper, we introduce the Reward-Penalty-Selection Problem (RPSP), which can be understood as a combination of the SCP and the HSP in which the objectives of the two problems are contrary to each other. Applications of the RPSP can be found in the context of combinatorial exchanges, where it solves the corresponding winner determination problem. We give complexity results for the minimization and the maximization problem as well as for several variants with additional restrictions. Further, we provide a polynomial-time algorithm for the special case of laminar sets and a dynamic programming approach for the case where the instance can be represented by a tree or a graph with bounded tree-width. We further present a graph-theoretical generalization of this problem and results regarding its complexity.

Submitted 28 June, 2021; originally announced June 2021.

Comments: 24 pages, 8 figures, 1 table

MSC Class: 90C27; 68Q25
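A small exhaustive solver makes the problem statement concrete. It assumes an unweighted objective (each covered reward set counts +1, each hit penalty set counts -1), whereas the paper may use weighted sets; it is exponential in the number of elements and is not the paper's laminar or tree-width algorithm. The names `rps_value` and `rpsp_brute_force` are hypothetical.

```cpp
#include <algorithm>
#include <cassert>
#include <cstdint>
#include <vector>

using ElementSet = std::uint32_t;  // bitmask over the elements, n <= 31 here

// Assumed unweighted objective for a chosen subset s of elements.
int rps_value(ElementSet s, const std::vector<ElementSet>& rewards,
              const std::vector<ElementSet>& penalties) {
    int value = 0;
    for (ElementSet r : rewards)
        if ((r & s) == r) ++value;   // reward set covered: all its elements chosen
    for (ElementSet p : penalties)
        if ((p & s) != 0) --value;   // penalty set hit: some element chosen
    return value;
}

// Maximization by enumerating all 2^n subsets (small n only).
int rpsp_brute_force(int n, const std::vector<ElementSet>& rewards,
                     const std::vector<ElementSet>& penalties) {
    assert(n >= 0 && n < 32);
    int best = rps_value(0, rewards, penalties);
    for (std::uint64_t s = 1; s < (std::uint64_t{1} << n); ++s)
        best = std::max(best, rps_value(static_cast<ElementSet>(s), rewards, penalties));
    return best;
}
```

With elements {0, 1, 2}, reward sets {0, 1} and {2}, and penalty set {1, 2}, choosing all three elements covers both reward sets and hits one penalty set, for the optimal value 1.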

arXiv:2106.14442 [pdf, ps, other]

On the Connection between Individual Scaled Vickrey Payments and the Egalitarian Allocation

Authors: N. Gräf, T. Heller, S. O. Krumke

Abstract: The Egalitarian Allocation (EA) is a well-known profit sharing method for cooperative games which attempts to distribute profit among participants in the most equal way while respecting the individual contributions to the obtained profit. Despite having desirable properties from the viewpoint of game theory, such as being contained in the core, the EA is in general hard to compute. Another well-known method is given by Vickrey Payments (VP). Although the VP also have desirable properties, such as coalitional rationality, they do not fulfill budget balance in general and are thus, in general, not contained in the core. One attempt to overcome this shortcoming is to scale down the VP, either by a single scaling factor or by individual scaling factors. The individual scaled Vickrey Payments (ISV) are then computed by maximizing the scaling factors lexicographically. In this paper, we show that the ISV payments are in fact identical to a weighted EA, thus exhibiting an interesting connection between EA and VP. From this, we conclude the uniqueness of the ISV payments, and we provide a polynomial-time algorithm for computing a special weighted EA.

Submitted 2 September, 2021; v1 submitted 28 June, 2021; originally announced June 2021.

Comments: 12 pages

MSC Class: 91A12
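The budget-balance problem and the single-factor remedy mentioned above can be sketched for a generic cooperative game. This assumes that a player's Vickrey payment is its marginal contribution v(N) - v(N \ {i}) and that the single scaling factor is chosen so the payments sum to at most v(N). It does not implement the lexicographic ISV scheme or the weighted EA from the paper; `vickrey_payments` and `uniformly_scaled_vp` are hypothetical names.

```cpp
#include <cassert>
#include <cstdint>
#include <functional>
#include <vector>

using Coalition = std::uint32_t;                 // bitmask of players, n < 32
using Game = std::function<double(Coalition)>;   // characteristic function v

// Assumed Vickrey payment of player i: v(N) - v(N \ {i}).
std::vector<double> vickrey_payments(int n, const Game& v) {
    assert(n > 0 && n < 32);
    const Coalition all = (Coalition{1} << n) - 1;
    std::vector<double> vp(n);
    for (int i = 0; i < n; ++i)
        vp[i] = v(all) - v(all & ~(Coalition{1} << i));
    return vp;
}

// Single-factor scaling: if the payments exceed v(N) (budget balance fails),
// multiply all of them by alpha = v(N) / sum so that they sum to exactly v(N).
std::vector<double> uniformly_scaled_vp(int n, const Game& v) {
    std::vector<double> vp = vickrey_payments(n, v);
    double total = 0.0;
    for (double x : vp) total += x;
    const double grand = v((Coalition{1} << n) - 1);
    const double alpha = (total > grand) ? grand / total : 1.0;
    for (double& x : vp) x *= alpha;
    return vp;
}
```

In a three-player game with v = 6 for the grand coalition and v = 2 for every pair, each player's Vickrey payment is 4. The payments total 12 > 6, so alpha = 0.5 scales each payment down to 2.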

arXiv:2104.05288 [pdf, ps, other]

Algorithms and Complexity for the Almost Equal Maximum Flow Problem

Authors: Rebekka Haese, Till Heller, Sven O. Krumke

Abstract: In the Equal Maximum Flow Problem (EMFP), we aim for a maximum flow where we require the same flow value on all edges in some given subsets of the edge set. In this paper, we study the closely related Almost Equal Maximum Flow Problems (AEMFP), where the flow values on edges of one homologous edge set differ by at most the value of a so-called deviation function $\Delta$. We prove that the integer almost equal maximum flow problem (integer AEMFP) is in general $\mathcal{NP}$-complete, and show that even the problem of finding a fractional maximum flow in the case of convex deviation functions is $\mathcal{NP}$-complete. This is in contrast to the EMFP, which is solvable in polynomial time in the fractional case. We provide inapproximability results for the integral AEMFP. For the integer AEMFP, we state a polynomial-time algorithm for the cases of constant and concave deviation functions when the number of homologous sets is fixed.

Submitted 12 April, 2021; originally announced April 2021.

Journal ref: Operations Research Proceedings 2019. Springer, Cham, 2020. 323-329
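The "almost equal" side constraint is easy to state for the constant-deviation case. The sketch below only checks, for given flow values, that within each homologous edge set the largest and smallest flow values differ by at most a constant delta. It does not check capacity or flow conservation, and it does not solve the AEMFP. The function name `satisfies_almost_equal` and the index-based edge representation are assumptions for illustration.

```cpp
#include <algorithm>
#include <vector>

// flow[e] is the flow value on edge e; each homologous set lists edge indices.
// Constant-deviation case: max - min within every set must not exceed delta.
bool satisfies_almost_equal(const std::vector<double>& flow,
                            const std::vector<std::vector<int>>& homologous_sets,
                            double delta) {
    for (const auto& edges : homologous_sets) {
        if (edges.empty()) continue;
        const auto [lo, hi] = std::minmax_element(
            edges.begin(), edges.end(),
            [&flow](int a, int b) { return flow[a] < flow[b]; });
        if (flow[*hi] - flow[*lo] > delta) return false;
    }
    return true;
}
```

With delta = 0 this reduces to the equal-flow requirement of the EMFP.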

arXiv:1810.11482 [pdf, other]

doi 10.1109/ESPM2.2018.00006

Integration of CUDA Processing within the C++ library for parallelism and concurrency (HPX)

Authors: Patrick Diehl, Madhavan Seshadri, Thomas Heller, Hartmut Kaiser

Abstract: Experience shows that on today's high performance systems, it is difficult to utilize different acceleration cards while also keeping all other parts of the system highly utilized. Future architectures, like exascale clusters, are expected to aggravate this issue, as the number of cores is expected to increase and memory hierarchies are expected to become deeper. One big aspect for distributed applications is to guarantee high utilization of all available resources, including local or remote acceleration cards on a cluster, while fully using all the available CPU resources and integrating the GPU work into the overall programming model. For the integration of CUDA code, we extended HPX, a general-purpose C++ runtime system for parallel and distributed applications of any scale, and enabled asynchronous data transfers from and to the GPU device and the asynchronous invocation of CUDA kernels on this data. Both operations are well integrated into the general programming model of HPX, which allows any GPU operation to be seamlessly overlapped with work on the main cores. Any user-defined CUDA kernel can be launched on any (local or remote) GPU device available to the distributed application. We present asynchronous implementations for the data transfers and kernel launches for CUDA code as part of an HPX asynchronous execution graph. Using this approach, we can combine all remotely and locally available acceleration cards on a cluster to utilize its full performance capabilities.
Overhead measurements show that the integration of the asynchronous operations (data transfer + launches of the kernels) as part of the HPX execution graph imposes no additional computational overhead and significantly eases orchestrating coordinated and concurrent work on the main cores and the used GPU devices.

Submitted 26 October, 2018; originally announced October 2018.

arXiv:1705.10208 [pdf, ps, other]

Dependency-Aware Rollback and Checkpoint-Restart for Distributed Task-Based Runtimes

Authors: Kiril Dichev, Herbert Jordan, Konstantinos Tovletoglou, Thomas Heller, Dimitrios S. Nikolopoulos, Georgios Karakonstantis, Charles Gillan

Abstract: With the increase in compute nodes in large compute platforms, a proportional increase in node failures will follow. Many application-based checkpoint/restart (C/R) techniques have been proposed for MPI applications to target the reduced mean time between failures. However, rollback as part of the recovery remains a dominant cost even in highly optimised MPI applications employing C/R techniques. Continuing execution past a checkpoint (that is, reducing rollback) is possible in message-passing runtimes, but extremely complex to design and implement. Our work focuses on task-based runtimes, where task dependencies are explicit and message passing is implicit. We see an opportunity for reducing rollback for such runtimes: we explore task dependencies in the rollback, which we call dependency-aware rollback. We also design a new C/R technique, which is influenced by recursive decomposition of tasks, and combine it with dependency-aware rollback. We expect the dependency-aware rollback to cancel and recompute fewer tasks in the presence of node failures. We describe, implement, and validate the proposed protocol in a simulator, which confirms these expectations. In addition, we consistently observe faster overall execution time for dependency-aware rollback in the presence of faults, despite the fact that reduced task cancellation does not guarantee reduced overall execution time.

Submitted 29 May, 2017; originally announced May 2017.
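The intuition behind dependency-aware rollback can be sketched on a task DAG. This is a deliberately simplified model, not the authors' protocol: it assumes that only the tasks whose results were lost, plus every task that transitively consumes their output, must be cancelled and recomputed, whereas a global rollback would redo all tasks since the last checkpoint. The function name `tasks_to_recompute` is hypothetical.

```cpp
#include <cstddef>
#include <queue>
#include <vector>

// succ[t] lists the tasks that consume task t's output (edges of the DAG).
// Returns, per task, whether it must be cancelled/recomputed because its
// result was lost or it transitively depends on a lost result.
std::vector<bool> tasks_to_recompute(const std::vector<std::vector<int>>& succ,
                                     const std::vector<int>& lost) {
    std::vector<bool> affected(succ.size(), false);
    std::queue<int> frontier;
    for (int t : lost)
        if (!affected[t]) { affected[t] = true; frontier.push(t); }
    // Breadth-first propagation along dependency edges.
    while (!frontier.empty()) {
        const int t = frontier.front();
        frontier.pop();
        for (int s : succ[t])
            if (!affected[s]) { affected[s] = true; frontier.push(s); }
    }
    return affected;
}
```

In a diamond 0 → {1, 2} → 3 plus an independent task 4, losing task 1 affects only tasks 1 and 3, leaving 0, 2, and 4 untouched.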

Showing 1–10 of 10 results for author: Heller, T