
Showing 1–6 of 6 results for author: Agullo, E

  1. arXiv:2010.13342  [pdf, other]

    cs.DC

    Resiliency in Numerical Algorithm Design for Extreme Scale Simulations

    Authors: Emmanuel Agullo, Mirco Altenbernd, Hartwig Anzt, Leonardo Bautista-Gomez, Tommaso Benacchio, Luca Bonaventura, Hans-Joachim Bungartz, Sanjay Chatterjee, Florina M. Ciorba, Nathan DeBardeleben, Daniel Drzisga, Sebastian Eibl, Christian Engelmann, Wilfried N. Gansterer, Luc Giraud, Dominik Goeddeke, Marco Heisig, Fabienne Jezequel, Nils Kohl, Xiaoye Sherry Li, Romain Lion, Miriam Mehl, Paul Mycek, Michael Obersteiner, Enrique S. Quintana-Orti, et al. (11 additional authors not shown)

    Abstract: This work is based on the seminar titled ``Resiliency in Numerical Algorithm Design for Extreme Scale Simulations'' held March 1-6, 2020 at Schloss Dagstuhl, that was attended by all the authors. Naive versions of conventional resilience techniques will not scale to the exascale regime: with a main memory footprint of tens of Petabytes, synchronously writing checkpoint data all the way to backgr…

    Submitted 26 October, 2020; originally announced October 2020.

    Comments: 45 pages, 3 figures, submitted to The International Journal of High Performance Computing Applications

    ACM Class: D.4.5; G.4; G.1; D.4.4

  2. arXiv:1601.07068  [pdf, other]

    math.NA

    Analyzing the effect of local rounding error propagation on the maximal attainable accuracy of the pipelined Conjugate Gradient method

    Authors: Siegfried Cools, Emrullah Fatih Yetkin, Emmanuel Agullo, Luc Giraud, Wim Vanroose

    Abstract: Pipelined Krylov subspace methods typically offer improved strong scaling on parallel HPC hardware compared to standard Krylov subspace methods for large and sparse linear systems. In pipelined methods the traditional synchronization bottleneck is mitigated by overlapping time-consuming global communications with useful computations. However, to achieve this communication hiding strategy, pipeline…

    Submitted 29 November, 2017; v1 submitted 26 January, 2016; originally announced January 2016.

    Comments: 26 pages, 6 figures, 2 tables, 4 algorithms

    MSC Class: 65F10; 65F50; 65G50; 65Y05; 65Y20

    Journal ref: SIAM Journal on Matrix Analysis and Applications, 2017

  3. arXiv:1206.0115  [pdf, other]

    cs.DC

    Pipelining the Fast Multipole Method over a Runtime System

    Authors: Emmanuel Agullo, Béranger Bramas, Olivier Coulaud, Eric Darve, Matthias Messner, Takahashi Toru

    Abstract: Fast Multipole Methods (FMM) are a fundamental operation for the simulation of many physical problems. The high performance design of such methods usually requires to carefully tune the algorithm for both the targeted physics and the hardware. In this paper, we propose a new approach that achieves high performance across architectures. Our method consists of expressing the FMM algorithm as a task…

    Submitted 1 June, 2012; originally announced June 2012.

    Comments: No. RR-7981 (2012)

  4. arXiv:1102.5328  [pdf, ps, other]

    cs.DC

    Fully Empirical Autotuned QR Factorization For Multicore Architectures

    Authors: Emmanuel Agullo, Jack Dongarra, Rajib Nath, Stanimire Tomov

    Abstract: Tuning numerical libraries has become more difficult over time, as systems get more sophisticated. In particular, modern multicore machines make the behaviour of algorithms hard to forecast and model. In this paper, we tackle the issue of tuning a dense QR factorization on multicore architectures. We show that it is hard to rely on a model, which motivates us to design a fully empirical approach.…

    Submitted 25 February, 2011; originally announced February 2011.

    Report number: RR-7526

  5. arXiv:1002.4057  [pdf, ps, other]

    cs.MS math.NA

    Towards an Efficient Tile Matrix Inversion of Symmetric Positive Definite Matrices on Multicore Architectures

    Authors: Emmanuel Agullo, Henricus Bouwmeester, Jack Dongarra, Jakub Kurzak, Julien Langou, Lee Rosenberg

    Abstract: The algorithms in the current sequential numerical linear algebra libraries (e.g. LAPACK) do not parallelize well on multicore architectures. A new family of algorithms, the tile algorithms, has recently been introduced. Previous research has shown that it is possible to write efficient and scalable tile algorithms for performing a Cholesky factorization, a (pseudo) LU factorization, and a QR fa…

    Submitted 22 February, 2010; originally announced February 2010.

    Comments: 8 pages, extended abstract submitted to VecPar10 on 12/11/09, notification of acceptance received on 02/05/10. See: http://vecpar.fe.up.pt/2010/

  6. QR Factorization of Tall and Skinny Matrices in a Grid Computing Environment

    Authors: Emmanuel Agullo, Camille Coti, Jack Dongarra, Thomas Herault, Julien Langou

    Abstract: Previous studies have reported that common dense linear algebra operations do not achieve speed up by using multiple geographical sites of a computational grid. Because such operations are the building blocks of most scientific applications, conventional supercomputers are still strongly predominant in high-performance computing and the use of grids for speeding up large-scale scientific problem…

    Submitted 13 December, 2009; originally announced December 2009.

    Comments: Accepted at IPDPS10. (IEEE International Parallel & Distributed Processing Symposium 2010 in Atlanta, GA, USA.)