Showing 1–3 of 3 results for author: Vatai, E

  1. arXiv:2204.02235  [pdf, other]

    cs.DC

    At the Locus of Performance: Quantifying the Effects of Copious 3D-Stacked Cache on HPC Workloads

    Authors: Jens Domke, Emil Vatai, Balazs Gerofi, Yuetsu Kodama, Mohamed Wahib, Artur Podobas, Sparsh Mittal, Miquel Pericàs, Lingqi Zhang, Peng Chen, Aleksandr Drozd, Satoshi Matsuoka

    Abstract: Over the last three decades, innovations in the memory subsystem were primarily targeted at overcoming the data movement bottleneck. In this paper, we focus on a specific market trend in memory technology: 3D-stacked memory and caches. We investigate the impact of extending the on-chip memory capabilities in future HPC-focused processors, particularly by 3D-stacked SRAM. First, we propose a method…

    Submitted 16 October, 2023; v1 submitted 5 April, 2022; originally announced April 2022.

  2. arXiv:2010.14373  [pdf, other]

    cs.DC

    Matrix Engines for High Performance Computing: A Paragon of Performance or Grasping at Straws?

    Authors: Jens Domke, Emil Vatai, Aleksandr Drozd, Peng Chen, Yosuke Oyama, Lingqi Zhang, Shweta Salaria, Daichi Mukunoki, Artur Podobas, Mohamed Wahib, Satoshi Matsuoka

    Abstract: Matrix engines or units, in different forms and affinities, are becoming a reality in modern processors; CPUs and otherwise. The current and dominant algorithmic approach to Deep Learning merits the commercial investments in these units, and deduced from the No.1 benchmark in supercomputing, namely High Performance Linpack, one would expect an awakened enthusiasm by the HPC community, too. Hence…

    Submitted 27 February, 2021; v1 submitted 27 October, 2020; originally announced October 2020.

    Comments: IEEE International Parallel and Distributed Processing Symposium 2021 (IPDPS'21)

  3. arXiv:1111.3297  [pdf, other]

    cs.DS

    Cache optimized linear sieve

    Authors: A. Járai, E. Vatai

    Abstract: Sieving is essential in different number theoretical algorithms. Sieving with large primes violates locality of memory access, thus degrading performance. Our suggestion on how to tackle this problem is to use cyclic data structures in combination with in-place bucket-sort. We present our results on the implementation of the sieve of Eratosthenes, using these ideas, which show that this approach i…

    Submitted 14 November, 2011; originally announced November 2011.

    MSC Class: 11Y11; 68W99

    ACM Class: F.2.1

    Journal ref: Acta Univ. Sapientiae, Inform. 3,2 (2011) 205--223