-
PolyTOPS: Reconfigurable and Flexible Polyhedral Scheduler
Authors:
Gianpietro Consolaro,
Zhen Zhang,
Harenome Razanajato,
Nelson Lossing,
Nassim Tchoulak,
Adilla Susungi,
Artur Cesar Araujo Alves,
Renwei Zhang,
Denis Barthou,
Corinne Ancourt,
Cedric Bastoul
Abstract:
Polyhedral techniques have been widely used for automatic code optimization in low-level compilers and higher-level processes. Loop optimization is central to this technique, and several polyhedral schedulers like Feautrier, Pluto, isl and Tensor Scheduler have been proposed, each of them targeting a different architecture, parallelism model, or application scenario. The need for scenario-specific…
▽ More
Polyhedral techniques have been widely used for automatic code optimization in low-level compilers and higher-level processes. Loop optimization is central to this technique, and several polyhedral schedulers like Feautrier, Pluto, isl and Tensor Scheduler have been proposed, each of them targeting a different architecture, parallelism model, or application scenario. The need for scenario-specific optimization is growing due to the heterogeneity of architectures. One of the most critical cases is represented by NPUs (Neural Processing Units) used for AI, which may require loop optimization with different objectives. Another factor to be considered is the framework or compiler in which polyhedral optimization takes place. Different scenarios, depending on the target architecture, compilation environment, and application domain, may require different kinds of optimization to best exploit the architecture feature set.
We introduce a new configurable polyhedral scheduler, PolyTOPS, that can be adjusted to various scenarios with straightforward, high-level configurations. This scheduler allows the creation of diverse scheduling strategies that can be both scenario-specific (like state-of-the-art schedulers) and kernel-specific, breaking the concept of a one-size-fits-all scheduler approach. PolyTOPS has been used with isl and CLooG as code generators and has been integrated in MindSpore AKG deep learning compiler. Experimental results in different scenarios show good performance: a geomean speedup of 7.66x on MindSpore (for the NPU Ascend architecture) hybrid custom operators over isl scheduling, a geomean speedup up to 1.80x on PolyBench on different multicore architectures over Pluto scheduling. Finally, some comparisons with different state-of-the-art tools are presented in the PolyMage scenario.
△ Less
Submitted 12 January, 2024;
originally announced January 2024.
-
A DSEL for High Throughput and Low Latency Software-Defined Radio on Multicore CPUs
Authors:
Adrien Cassagne,
Romain Tajan,
Olivier Aumage,
Camille Leroux,
Denis Barthou,
Christophe Jégo
Abstract:
This article presents a new Domain Specific Embedded Language (DSEL) dedicated to Software-Defined Radio (SDR). From a set of carefully designed components, it enables to build efficient software digital communication systems, able to take advantage of the parallelism of modern processor architectures, in a straightforward and safe manner for the programmer. In particular, proposed DSEL enables th…
▽ More
This article presents a new Domain Specific Embedded Language (DSEL) dedicated to Software-Defined Radio (SDR). From a set of carefully designed components, it enables to build efficient software digital communication systems, able to take advantage of the parallelism of modern processor architectures, in a straightforward and safe manner for the programmer. In particular, proposed DSEL enables the combination of pipelining and sequence duplication techniques to extract both temporal and spatial parallelism from digital communication systems. We leverage the DSEL capabilities on a real use case: a fully digital transceiver for the widely used DVB-S2 standard designed entirely in software. Through evaluation, we show how proposed software DVB-S2 transceiver is able to get the most from modern, high-end multicore CPU targets.
△ Less
Submitted 3 August, 2022; v1 submitted 13 June, 2022;
originally announced June 2022.
-
Automated Code Generation for Lattice Quantum Chromodynamics and beyond
Authors:
Denis Barthou,
Olivier Brand-Foissac,
Romain Dolbeau,
Gilbert Grosdidier,
Christina Eisenbeis,
Michael Kruse,
Olivier Pene,
Konstantin Petrov,
Claude Tadonki
Abstract:
We present here our ongoing work on a Domain Specific Language which aims to simplify Monte-Carlo simulations and measurements in the domain of Lattice Quantum Chromodynamics. The tool-chain, called Qiral, is used to produce high-performance OpenMP C code from LaTeX sources. We discuss conceptual issues and details of implementation and optimization. The comparison of the performance of the genera…
▽ More
We present here our ongoing work on a Domain Specific Language which aims to simplify Monte-Carlo simulations and measurements in the domain of Lattice Quantum Chromodynamics. The tool-chain, called Qiral, is used to produce high-performance OpenMP C code from LaTeX sources. We discuss conceptual issues and details of implementation and optimization. The comparison of the performance of the generated code to the well-established simulation software is also made.
△ Less
Submitted 9 January, 2014;
originally announced January 2014.
-
QIRAL: A High Level Language for Lattice QCD Code Generation
Authors:
Denis Barthou,
Gilbert Grosdidier,
Michael Kruse,
Olivier Pène,
Claude Tadonki
Abstract:
Quantum chromodynamics (QCD) is the theory of subnuclear physics, aiming at mod- eling the strong nuclear force, which is responsible for the interactions of nuclear particles. Lattice QCD (LQCD) is the corresponding discrete formulation, widely used for simula- tions. The computational demand for the LQCD is tremendous. It has played a role in the history of supercomputers, and has also helped de…
▽ More
Quantum chromodynamics (QCD) is the theory of subnuclear physics, aiming at mod- eling the strong nuclear force, which is responsible for the interactions of nuclear particles. Lattice QCD (LQCD) is the corresponding discrete formulation, widely used for simula- tions. The computational demand for the LQCD is tremendous. It has played a role in the history of supercomputers, and has also helped defining their future. Designing efficient LQCD codes that scale well on large (probably hybrid) supercomputers requires to express many levels of parallelism, and then to explore different algorithmic solutions. While al- gorithmic exploration is the key for efficient parallel codes, the process is hampered by the necessary coding effort. We present in this paper a domain-specific language, QIRAL, for a high level expression of parallel algorithms in LQCD. Parallelism is expressed through the mathematical struc- ture of the sparse matrices defining the problem. We show that from these expressions and from algorithmic and preconditioning formulations, a parallel code can be automatically generated. This separates algorithms and mathematical formulations for LQCD (that be- long to the field of physics) from the effective orchestration of parallelism, mainly related to compilation and optimization for parallel architectures.
△ Less
Submitted 16 August, 2012;
originally announced August 2012.