Showing 1–11 of 11 results for author: Rastello, F

  1. arXiv:2404.16443  [pdf, ps, other]

    cs.CC cs.DC

    Tightening I/O Lower Bounds through the Hourglass Dependency Pattern

    Authors: Lionel Eyraud-Dubois, Guillaume Iooss, Julien Langou, Fabrice Rastello

    Abstract: When designing an algorithm, one cares about arithmetic/computational complexity, but data movement (I/O) complexity plays an increasingly important role that highly impacts performance and energy consumption. For a given algorithm and a given I/O model, scheduling strategies such as loop tiling can reduce the required I/O down to a limit, called the I/O complexity, inherent to the algorithm itsel…

    Submitted 25 April, 2024; originally announced April 2024.

    Journal ref: 36th ACM Symposium on Parallelism in Algorithms and Architectures (SPAA '24), Jun 2024, Nantes, France

  2. arXiv:2402.15773  [pdf, other]

    cs.PF

    Performance bottlenecks detection through microarchitectural sensitivity

    Authors: Hugo Pompougnac, Alban Dutilleul, Christophe Guillon, Nicolas Derumigny, Fabrice Rastello

    Abstract: Modern Out-of-Order (OoO) CPUs are complex systems with many components interleaved in non-trivial ways. Pinpointing performance bottlenecks and understanding the underlying causes of program performance issues are critical tasks to make the most of hardware resources. We provide an in-depth overview of performance bottlenecks in recent OoO microarchitectures and describe the difficulties of det…

    Submitted 24 February, 2024; originally announced February 2024.

  3. arXiv:2402.14567  [pdf, other]

    cs.PF

    CesASMe and Staticdeps: static detection of memory-carried dependencies for code analyzers

    Authors: Théophile Bastian, Hugo Pompougnac, Alban Dutilleul, Fabrice Rastello

    Abstract: A variety of code analyzers, such as IACA, uiCA, llvm-mca or Ithemal, strive to statically predict the throughput of a computation kernel. Each analyzer is based on its own simplified CPU model reasoning at the scale of a basic block. Facing this diversity, evaluating their strengths and weaknesses is important to guide both their usage and their enhancement. We present CesASMe, a fully-tooled s…

    Submitted 22 February, 2024; originally announced February 2024.

  4. arXiv:2012.11473  [pdf, other]

    cs.AR cs.PF

    PALMED: Throughput Characterization for Superscalar Architectures -- Extended Version

    Authors: Nicolas Derumigny, Fabian Gruber, Théophile Bastian, Guillaume Iooss, Christophe Guillon, Louis-Noël Pouchet, Fabrice Rastello

    Abstract: In a super-scalar architecture, the scheduler dynamically assigns micro-operations (μOPs) to execution ports. The port mapping of an architecture describes how an instruction decomposes into μOPs and lists for each μOP the set of ports it can be mapped to. It is used by compilers and performance debugging tools to characterize the performance throughput of a sequence of instructions repeated…

    Submitted 18 January, 2022; v1 submitted 21 December, 2020; originally announced December 2020.

  5. arXiv:1911.06664  [pdf, other]

    cs.CC

    Automated Derivation of Parametric Data Movement Lower Bounds for Affine Programs

    Authors: Auguste Olivry, Julien Langou, Louis-Noël Pouchet, P. Sadayappan, Fabrice Rastello

    Abstract: For most relevant computation, the energy and time needed for data movement dominates that for performing arithmetic operations on all computing systems today. Hence it is of critical importance to understand the minimal total data movement achievable during the execution of an algorithm. The achieved total data movement for different schedules of an algorithm can vary widely depending on how effi…

    Submitted 15 November, 2019; originally announced November 2019.

  6. On Characterizing the Data Access Complexity of Programs

    Authors: Venmugil Elango, Fabrice Rastello, Louis-Noel Pouchet, J. Ramanujam, P. Sadayappan

    Abstract: Technology trends will cause data movement to account for the majority of energy expenditure and execution time on emerging computers. Therefore, computational complexity will no longer be a sufficient metric for comparing algorithms, and a fundamental characterization of data access complexity will be increasingly important. The problem of developing lower bounds for data access complexity has be…

    Submitted 9 November, 2014; originally announced November 2014.

    ACM Class: F.2; D.2.8

  7. arXiv:1406.0582  [pdf, other]

    cs.PL

    A Tiling Perspective for Register Optimization

    Authors: Lukasz Domagala, Fabrice Rastello, Sadayappan Ponnuswany, Duco Van Amstel

    Abstract: Register allocation is a much studied problem. A particularly important context for optimizing register allocation is within loops, since a significant fraction of the execution time of programs is often inside loop code. A variety of algorithms have been proposed in the past for register allocation, but the complexity of the problem has resulted in a decoupling of several important aspects, inclu…

    Submitted 3 June, 2014; originally announced June 2014.

    Report number: RR-8541

    Journal ref: N° RR-8541 (2014)

  8. arXiv:1404.4767  [pdf, other]

    cs.DC cs.DS

    On Characterizing the Data Movement Complexity of Computational DAGs for Parallel Execution

    Authors: Venmugil Elango, Fabrice Rastello, Louis-Noël Pouchet, J. Ramanujam, P. Sadayappan

    Abstract: Technology trends are making the cost of data movement increasingly dominant, both in terms of energy and time, over the cost of performing arithmetic operations in computer systems. The fundamental ratio of aggregate data movement bandwidth to the total computational power (also referred to as the machine balance parameter) in parallel computer systems is decreasing. It is therefore of considerabl…

    Submitted 18 April, 2014; originally announced April 2014.

    Report number: RR-8522

  9. arXiv:1403.5952  [pdf, other]

    cs.PL

    Parameterized Construction of Program Representations for Sparse Dataflow Analyses

    Authors: André Tavares, Benoit Boissinot, Fernando Pereira, Fabrice Rastello

    Abstract: Data-flow analyses usually associate information with control flow regions. Informally, if these regions are too small, like a point between two consecutive statements, we call the analysis dense. On the other hand, if these regions include many such points, then we call it sparse. This paper presents a systematic method to build program representations that support sparse analyses. To pave the wa…

    Submitted 21 March, 2014; originally announced March 2014.

  10. arXiv:1401.5024  [pdf, other]

    cs.OH

    Beyond Reuse Distance Analysis: Dynamic Analysis for Characterization of Data Locality Potential

    Authors: Naznin Fauzia, Venmugil Elango, Mahesh Ravishankar, J. Ramanujam, Fabrice Rastello, Atanas Rountev, Louis-Noël Pouchet, P. Sadayappan

    Abstract: Emerging computer architectures will feature drastically decreased flops/byte (ratio of peak processing rate to memory bandwidth) as highlighted by recent studies on Exascale architectural trends. Further, flops are getting cheaper while the energy cost of data movement is increasingly dominant. The understanding and characterization of data locality properties of computations is critical in order…

    Submitted 21 December, 2013; originally announced January 2014.

    Comments: Transactions on Architecture and Code Optimization (2014)

  11. On the Complexity of Spill Everywhere under SSA Form

    Authors: Florent Bouchez, Alain Darte, Fabrice Rastello

    Abstract: Compilation for embedded processors can be either aggressive (time consuming cross-compilation) or just in time (embedded and usually dynamic). The heuristics used in dynamic compilation are highly constrained by limited resources, time and memory in particular. Recent results on the SSA form open promising directions for the design of new register allocation heuristics for embedded systems and…

    Submitted 19 October, 2007; originally announced October 2007.

    Comments: 10 pages

    Journal ref: ACM SIGPLAN Notices, Volume 42, Issue 7 (2007), pp. 103-112
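The PALMED abstract (item 4) describes a port mapping: each instruction decomposes into μOPs, and each μOP lists the execution ports it may use. The sketch below shows one way such a mapping bounds the steady-state throughput of a repeated kernel. It assumes a simple model in which every μOP occupies one port for one cycle and work may be split fractionally across ports. Under that assumption, for any set S of ports, the μOPs whose allowed ports all lie in S must execute on S. One iteration therefore needs at least (their count)/|S| cycles, and the maximum over all S is a lower bound on cycles per iteration. This is a generic port-mapping bound, not PALMED's own inference method, and the port names and μOP classes are hypothetical.

```python
from fractions import Fraction
from itertools import combinations

# Hypothetical port mapping: μOP class -> set of ports it may issue to.
PORT_MAP = {
    "ALU": {"p0", "p1", "p5", "p6"},
    "MUL": {"p1"},
    "SHUF": {"p5"},
}

def throughput_bound(kernel, port_map):
    """Lower bound on cycles per iteration of `kernel`.

    kernel: dict mapping μOP class -> number of such μOPs per iteration.
    Assumes each μOP uses one port for one cycle (simplified model).
    """
    ports = sorted(set().union(*port_map.values()))
    best = Fraction(0)
    # Enumerate every non-empty subset S of ports (exponential, fine for ~10 ports).
    for r in range(1, len(ports) + 1):
        for subset in combinations(ports, r):
            s = set(subset)
            # μOPs that can only run inside S must share S's |S| ports.
            load = sum(n for uop, n in kernel.items() if port_map[uop] <= s)
            best = max(best, Fraction(load, r))
    return best

if __name__ == "__main__":
    print(throughput_bound({"MUL": 2, "SHUF": 1, "ALU": 3}, PORT_MAP))  # prints 2
    print(throughput_bound({"ALU": 6}, PORT_MAP))  # prints 3/2
```

In the first call the two MUL μOPs can only use p1, so p1 alone forces at least 2 cycles per iteration even though the total of six μOPs would fit in 1.5 cycles across all four ports.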
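The abstract of item 1 notes that scheduling strategies such as loop tiling can reduce the I/O an algorithm requires. The sketch below compares a naive i-j-k matrix multiplication with a tiled version whose three b×b tiles fit in fast memory. It assumes a fully associative LRU cache holding `capacity` words and counts only loads from slow memory, ignoring write-backs. The cache model and function names are illustrative assumptions. The code counts the I/O of two specific schedules only and does not compute the lower bounds derived in the listed papers.

```python
from collections import OrderedDict

class LRUCache:
    """Fully associative LRU cache of `capacity` words; counts misses (loads)."""

    def __init__(self, capacity):
        self.capacity = capacity
        self.lines = OrderedDict()
        self.misses = 0

    def access(self, addr):
        if addr in self.lines:
            self.lines.move_to_end(addr)  # mark as most recently used
        else:
            self.misses += 1  # load from slow memory
            self.lines[addr] = None
            if len(self.lines) > self.capacity:
                self.lines.popitem(last=False)  # evict least recently used

def naive_matmul_misses(n, capacity):
    """C += A * B with the untiled i-j-k loop order."""
    cache = LRUCache(capacity)
    for i in range(n):
        for j in range(n):
            for k in range(n):
                cache.access(("A", i, k))
                cache.access(("B", k, j))
                cache.access(("C", i, j))
    return cache.misses

def tiled_matmul_misses(n, b, capacity):
    """Same computation, iterated over b x b tiles of A, B and C."""
    cache = LRUCache(capacity)
    for ii in range(0, n, b):
        for jj in range(0, n, b):
            for kk in range(0, n, b):
                for i in range(ii, min(ii + b, n)):
                    for j in range(jj, min(jj + b, n)):
                        for k in range(kk, min(kk + b, n)):
                            cache.access(("A", i, k))
                            cache.access(("B", k, j))
                            cache.access(("C", i, j))
    return cache.misses

if __name__ == "__main__":
    n, b = 32, 8
    capacity = 3 * b * b  # room for one tile each of A, B and C
    print("naive:", naive_matmul_misses(n, capacity))
    print("tiled:", tiled_matmul_misses(n, b, capacity))
```

With n = 32 and b = 8, essentially all of B is touched between consecutive reuses of any B element in the naive loop, so every B access misses. In the tiled loop, each tile step touches at most 3·b² = 192 distinct words, which fit in the cache, so each word is loaded at most once per tile step.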