
Agent-based Constraint Solving for Resource Allocation in Manycore Systems

Authors: Volker Wenzel, Lars Bauer, Wolfgang Schröder-Preikschat, Jörg Henkel

Abstract: For efficiency reasons, manycore systems are increasingly heterogeneous, which makes the mapping of complex workloads a key problem with high optimization potential. Constraints express application requirements such as which core type to choose, how many cores to choose, whether to use them exclusively or non-exclusively, or whether to use a certain core. In this work, we propose a decentralized solution for solving application resource constraints by means of an agent-based approach in order to obtain scalability. We translate the constraints into a Distributed Constraint Optimization Problem (DCOP) and propose a local search algorithm, RESMGM, to solve them. For the first time, we demonstrate the viability and efficiency of the DCOP approach for heterogeneous manycore systems. Our RESMGM algorithm supports a far wider range of constraints than the state of the art, leading to superior results, but still has comparable overheads w.r.t. computation and communication.

Submitted 13 April, 2022; originally announced April 2022.

arXiv:2202.09365 [pdf, other]

doi 10.1109/SBESC53686.2021.9628358

Migration-Based Synchronization

Authors: Stefan Reif, Phillip Raffeck, Luis Gerhorst, Wolfgang Schröder-Preikschat, Timo Hönig

Abstract: A fundamental challenge in multi- and many-core systems is the correct execution of concurrent accesses to shared data. A common drawback of existing synchronization mechanisms is the loss of data locality as the shared data is transferred between the accessing cores. In real-time systems, this is especially important as knowledge about data access times is crucial to establish bounds on execution times and guarantee the meeting of deadlines. We propose in this paper a refinement of our previously sketched approach of Migration-Based Synchronization (MBS) as well as its first practical implementation. The core concept of MBS is the replacement of data migration with control-flow migration to achieve synchronized memory accesses with guaranteed data locality. This leads to both shorter and more predictable execution times for critical sections. As MBS can be used as a substitute for classical locks, it can be employed in legacy applications without code alterations. We further examine how the gained data locality improves the results of worst-case timing analyses and results in tighter bounds on execution and response time. We reason about the similarity of MBS to existing synchronization approaches and how it enables us to reuse existing analysis techniques. Finally, we evaluate our prototype implementation, showing that MBS can exploit data locality with overheads similar to those of traditional locking mechanisms.

Submitted 18 February, 2022; originally announced February 2022.

Journal ref: SBESC'21: Proceedings of the XI Brazilian Symposium on Computing Systems Engineering. 2021. IEEE, Pages 1-8

arXiv:2201.13160 [pdf, other]

doi 10.1145/3477113.3487267

AnyCall: Fast and Flexible System-Call Aggregation

Authors: Luis Gerhorst, Benedict Herzog, Stefan Reif, Wolfgang Schröder-Preikschat, Timo Hönig

Abstract: Operating systems rely on system calls to allow the controlled communication of isolated processes with the kernel and other processes. Every system call includes a processor mode switch from the unprivileged user mode to the privileged kernel mode. Although processor mode switches are the essential isolation mechanism to guarantee the system's integrity, they induce direct and indirect performance costs as they invalidate parts of the processor state. In recent years, high-performance networks and storage hardware have made the user/kernel transition overhead the bottleneck for IO-heavy applications. To make matters worse, security vulnerabilities in modern processors (e.g., Meltdown) have prompted kernel mitigations that further increase the transition overhead. To decouple system calls from user/kernel transitions, we propose AnyCall, which uses an in-kernel compiler to execute safety-checked user bytecode in kernel mode. This allows for very fast system calls interleaved with error checking and processing logic using only a single user/kernel transition. We have implemented AnyCall based on the Linux kernel's eBPF subsystem. Our evaluation demonstrates that system call bursts are up to 55 times faster using AnyCall and that real-world applications can be sped up by 24% even if only a minimal part of their code is run by AnyCall.

Submitted 31 January, 2022; originally announced January 2022.

Journal ref: PLOS'21: Proceedings of the 11th Workshop on Programming Languages and Operating Systems. 2021. Association for Computing Machinery (ACM), New York, NY, USA, Pages 1-8

arXiv:1905.11788 [pdf, other]

doi 10.1145/3314206.3314211

$Δ$elta: Differential Energy-Efficiency, Latency, and Timing Analysis for Real-Time Networks

Authors: Stefan Reif, Andreas Schmidt, Timo Hönig, Thorsten Herfet, Wolfgang Schröder-Preikschat

Abstract: The continuously increasing degree of automation in many areas (e.g. manufacturing engineering, public infrastructure) leads to the construction of cyber-physical systems and cyber-physical networks. For both, time and energy are the most critical operating resources. Considering for instance the Tactile Internet specification, end-to-end latencies in these systems must be below 1 ms, which means that both communication and system latencies are in the same order of magnitude and must be predictably low. As control loops are commonly handled over different variants of network infrastructure (e.g. mobile and fibre links), particular attention must be paid to the design of reliable, yet fast and energy-efficient data-transmission channels that are robust towards unexpected transmission failures. As design goals are often conflicting (e.g. high performance vs. low energy), it is necessary to analyze and investigate trade-offs with regard to design decisions during the construction of cyber-physical networks. In this paper, we present $Δ$elta, an approach towards a tool-supported construction process for cyber-physical networks. $Δ$elta extends the previously presented X-Lap tool by new analysis features, but keeps the original measurement facilities unchanged. $Δ$elta jointly analyzes and correlates the runtime behavior (i.e. performance, latency) and energy demand of individual system components. It provides an automated analysis with precise thread-local time interpolation, control-flow extraction, and examination of latency criticality. We further demonstrate the applicability of $Δ$elta with an evaluation of a prototypical implementation.

Submitted 28 May, 2019; originally announced May 2019.

arXiv:1808.06434 [pdf, other]

doi 10.1145/3267419.3267422

X-Lap: A Systems Approach for Cross-Layer Profiling and Latency Analysis for Cyber-Physical Networks

Authors: Stefan Reif, Andreas Schmidt, Timo Hönig, Thorsten Herfet, Wolfgang Schröder-Preikschat

Abstract: Networked control applications for cyber-physical networks demand predictable and reliable real-time communication. Applications of this domain have to cooperate with network protocols, the operating system, and the hardware to improve safety properties and increase resource efficiency. In consequence, a cross-layer approach is necessary for the design and holistic optimisation of cyber-physical systems and networks. This paper presents X-Lap, a cross-layer, inter-host timing analysis tool tailored to the needs of real-time communication. We use X-Lap to evaluate the timing behaviour of a reliable real-time communication protocol. Our analysis identifies parts of the protocol which are responsible for unwanted jitter. To system designers, X-Lap provides useful support for the design and evaluation of networked real-time systems.

Submitted 20 August, 2018; originally announced August 2018.

arXiv:1502.07451 [pdf]

A Graph-Partition-Based Scheduling Policy for Heterogeneous Architectures

Authors: Hao Wu, Daniel Lohmann, Wolfgang Schröder-Preikschat

Abstract: In order to improve system performance efficiently, a number of systems are equipped with multi-core and many-core processors (such as GPUs). Due to their discrete memory, these heterogeneous architectures comprise a distributed system within a computer. A data-flow programming model is attractive in this setting for its ease of expressing concurrency. Programmers only need to define task dependencies without considering how to schedule them on the hardware. However, mapping the resulting task graph onto hardware efficiently remains a challenge. In this paper, we propose a graph-partition scheduling policy for mapping data-flow workloads to heterogeneous hardware. According to our experiments, our graph-partition-based scheduling achieves comparable performance to conventional queue-based approaches.

Submitted 26 February, 2015; originally announced February 2015.

Comments: Presented at DATE Friday Workshop on Heterogeneous Architectures and Design Methods for Embedded Image Systems (HIS 2015) (arXiv:1502.07241)

Report number: DATEHIS/2015/05

arXiv:1304.6067 [pdf, ps, other]

Invasive Computing - Common Terms and Granularity of Invasion

Authors: Jürgen Teich, Wolfgang Schröder-Preikschat, Andreas Herkersdorf

Abstract: Future MPSoCs with 1000 or more processor cores on a chip require new means for resource-aware programming in order to deal with increasing imperfections such as process variation, fault rates, aging effects, and power as well as thermal problems. On the other hand, predictable program execution is threatened, if not impossible, if no proper means of resource isolation and exclusive use can be established on demand. In view of these problems and menaces, invasive computing enables an application programmer to claim processing resources and spread computations to the claimed processors dynamically at certain points of program execution. Such decisions may depend on the degree of application parallelism and the state of the underlying resources, such as utilization, load, and temperature. They may also serve the goal of providing predictable program execution on MPSoCs by claiming processing resources exclusively by default, thus eliminating interference and creating the necessary isolation between multiple concurrently running applications. To achieve this goal, invasive computing introduces new programming constructs for resource-aware programming that, for testing purposes, have meanwhile been embedded into the parallel computing language X10, developed by IBM, using a library-based approach. This paper presents major ideas and common terms of invasive computing as investigated by the DFG Transregional Collaborative Research Centre TR89. Moreover, a reflection is given on the granularity of resources that may be requested by invasive programs.

Submitted 22 April, 2013; originally announced April 2013.

Report number: DPA-13052

Showing 1–7 of 7 results for author: Schröder-Preikschat, W