Search | arXiv e-print repository

Performance bottlenecks detection through microarchitectural sensitivity

Authors: Hugo Pompougnac, Alban Dutilleul, Christophe Guillon, Nicolas Derumigny, Fabrice Rastello

Abstract: Modern Out-of-Order (OoO) CPUs are complex systems with many components interleaved in non-trivial ways. Pinpointing performance bottlenecks and understanding the underlying causes of program performance issues are critical tasks to make the most of hardware resources. We provide an in-depth overview of performance bottlenecks in recent OoO microarchitectures and describe the difficulties of det… ▽ More Modern Out-of-Order (OoO) CPUs are complex systems with many components interleaved in non-trivial ways. Pinpointing performance bottlenecks and understanding the underlying causes of program performance issues are critical tasks to make the most of hardware resources. We provide an in-depth overview of performance bottlenecks in recent OoO microarchitectures and describe the difficulties of detecting them. Techniques that measure resources utilization can offer a good understanding of a program's execution, but, due to the constraints inherent to Performance Monitoring Units (PMU) of CPUs, do not provide the relevant metrics for each use case. Another approach is to rely on a performance model to simulate the CPU behavior. Such a model makes it possible to implement any new microarchitecture-related metric. Within this framework, we advocate for implementing modeled resources as parameters that can be varied at will to reveal performance bottlenecks. This allows a generalization of bottleneck analysis that we call sensitivity analysis. We present Gus, a novel performance analysis tool that combines the advantages of sensitivity analysis and dynamic binary instrumentation within a resource-centric CPU model. We evaluate the impact of sensitivity on bottleneck analysis over a set of high-performance computing kernels. △ Less

Submitted 24 February, 2024; originally announced February 2024.

arXiv:2402.14567 [pdf, other]

CesASMe and Staticdeps: static detection of memory-carried dependencies for code analyzers

Authors: Théophile Bastian, Hugo Pompougnac, Alban Dutilleul, Fabrice Rastello

Abstract: A variety of code analyzers, such as IACA, uiCA, llvm-mca or Ithemal, strive to statically predict the throughput of a computation kernel. Each analyzer is based on its own simplified CPU model reasoning at the scale of a basic block. Facing this diversity, evaluating their strengths and weaknesses is important to guide both their usage and their enhancement. We present CesASMe, a fully-tooled s… ▽ More A variety of code analyzers, such as IACA, uiCA, llvm-mca or Ithemal, strive to statically predict the throughput of a computation kernel. Each analyzer is based on its own simplified CPU model reasoning at the scale of a basic block. Facing this diversity, evaluating their strengths and weaknesses is important to guide both their usage and their enhancement. We present CesASMe, a fully-tooled solution to evaluate code analyzers on C-level benchmarks composed of a benchmark derivation procedure that feeds an evaluation harness. We conclude that memory-carried data dependencies are a major source of imprecision for these tools. We tackle this issue with staticdeps, a static analyzer extracting memory-carried data dependencies, including across loop iterations, from an assembly basic block. We integrate its output to uiCA, a state-of-the-art code analyzer, to evaluate staticdeps' impact on a code analyzer's precision through CesASMe. △ Less

Submitted 22 February, 2024; originally announced February 2024.

arXiv:2311.16977 [pdf, ps, other]

Bidirectional Reactive Programming for Machine Learning

Authors: Dumitru Potop Butucaru, Albert Cohen, Gordon Plotkin, Hugo Pompougnac

Abstract: Reactive languages are dedicated to the programming of systems which interact continuously and concurrently with their environment. Values take the form of unbounded streams modeling the (discrete) passing of time or the sequence of concurrent interactions. While conventional reactivity models recurrences forward in time, we introduce a symmetric reactive construct enabling backward recurrences. C… ▽ More Reactive languages are dedicated to the programming of systems which interact continuously and concurrently with their environment. Values take the form of unbounded streams modeling the (discrete) passing of time or the sequence of concurrent interactions. While conventional reactivity models recurrences forward in time, we introduce a symmetric reactive construct enabling backward recurrences. Constraints on the latter allow to make the implementation practical. Machine Learning (ML) systems provide numerous motivations for all of this: we demonstrate that reverse-mode automatic differentiation, backpropagation, batch normalization, bidirectional recurrent neural networks, training and reinforcement learning algorithms, are all naturally captured as bidirectional reactive programs. △ Less

Submitted 28 November, 2023; originally announced November 2023.

ACM Class: D.3; D.3.1; I.2; I.2.5

Showing 1–3 of 3 results for author: Pompougnac, H