
Showing 1–28 of 28 results for author: Podobas, A

  1. arXiv:2405.02019  [pdf, other]

    cs.NE cs.AR cs.PF

    Fast Algorithms for Spiking Neural Network Simulation with FPGAs

    Authors: Björn A. Lindqvist, Artur Podobas

    Abstract: Using OpenCL-based high-level synthesis, we create a number of spiking neural network (SNN) simulators for the Potjans-Diesmann cortical microcircuit for a high-end Field-Programmable Gate Array (FPGA). Our best simulators simulate the circuit 25% faster than real-time, require less than 21 nJ per synaptic event, and are bottle-necked by the device's on-chip memory. Speed-wise they compare favora…

    Submitted 3 May, 2024; originally announced May 2024.

    Comments: 34 pages

  2. arXiv:2401.14576  [pdf]

    cs.DC cs.PF

    Accelerating Scientific Application through Transparent I/O Interposition

    Authors: Steven W. D. Chien, Kento Sato, Artur Podobas, Niclas Jansson, Stefano Markidis, Michio Honda

    Abstract: The ability to handle a large volume of data generated by scientific applications is crucial. We have seen an increase in the heterogeneity of storage technologies available to scientific applications, such as burst buffers, local temporary block storage, managed cloud parallel file systems (PFS), and non-POSIX object stores. However, scientific applications designed for traditional HPC systems ca…

    Submitted 25 January, 2024; originally announced January 2024.

    Comments: Submitted to HPDC 2024

  3. arXiv:2308.00497  [pdf, other]

    cs.MS

    Leveraging MLIR for Loop Vectorization and GPU Porting of FFT Libraries

    Authors: Yifei He, Artur Podobas, Stefano Markidis

    Abstract: FFTc is a Domain-Specific Language (DSL) for designing and generating Fast Fourier Transform (FFT) libraries. FFTc's uniqueness is that it leverages and extends Multi-Level Intermediate Representation (MLIR) dialects to optimize FFT code generation. In this work, we present FFTc extensions and improvements such as the possibility of using different data layouts for complex-value arrays, and spars…

    Submitted 1 August, 2023; originally announced August 2023.

  4. arXiv:2303.01606  [pdf, other]

    cs.AR

    Q2Logic: An Coarse-Grained Architecture targeting Schrödinger Quantum Circuit Simulations

    Authors: Artur Podobas

    Abstract: Quantum computing is emerging as an important (but radical) technology that might take us beyond Moore's law for certain applications. Today, in parallel with improving quantum computers, computer scientists are relying heavily on quantum circuit simulators to develop algorithms. Most existing quantum circuit simulators run on general-purpose CPUs or GPUs. However, at the same time, quantum circui…

    Submitted 2 March, 2023; originally announced March 2023.

  5. arXiv:2208.13658  [pdf, other]

    physics.comp-ph cs.PF

    Breaking Down the Parallel Performance of GROMACS, a High-Performance Molecular Dynamics Software

    Authors: Måns I. Andersson, N. Arul Murugan, Artur Podobas, Stefano Markidis

    Abstract: GROMACS is one of the most widely used HPC software packages using the Molecular Dynamics (MD) simulation technique. In this work, we quantify GROMACS parallel performance using different configurations, HPC systems, and FFT libraries (FFTW, Intel MKL FFT, and FFTPACK). We break down the cost of each GROMACS computational phase and identify non-scalable stages, such as MPI communication during th…

    Submitted 29 August, 2022; originally announced August 2022.

  6. arXiv:2207.06803  [pdf, other]

    cs.MS cs.CL

    FFTc: An MLIR Dialect for Developing HPC Fast Fourier Transform Libraries

    Authors: Yifei He, Artur Podobas, Måns I. Andersson, Stefano Markidis

    Abstract: Discrete Fourier Transform (DFT) libraries are one of the most critical software components for scientific computing. Inspired by FFTW, a widely used library for DFT HPC calculations, we apply compiler technologies for the development of HPC Fourier transform libraries. In this work, we introduce FFTc, a domain-specific language, based on Multi-Level Intermediate Representation (MLIR), for express…

    Submitted 26 July, 2022; v1 submitted 14 July, 2022; originally announced July 2022.

  7. arXiv:2206.14103  [pdf, other]

    cs.DC

    Workflows to driving high-performance interactive supercomputing for urgent decision making

    Authors: Nick Brown, Rupert Nash, Gordon Gibb, Evgenij Belikov, Artur Podobas, Wei Der Chien, Stefano Markidis, Markus Flatken, Andreas Gerndt

    Abstract: Interactive urgent computing is a small but growing user of supercomputing resources. However, there are numerous technical challenges that must be overcome to make supercomputers fully suited to the wide range of urgent workloads which could benefit from the computational power delivered by such instruments. An important question is how to connect the different components of an urgent workload; na…

    Submitted 28 June, 2022; originally announced June 2022.

    Comments: Pre-print of paper accepted to the InteractiveHPC workshop of ISC2022

  8. arXiv:2204.02235  [pdf, other]

    cs.DC

    At the Locus of Performance: Quantifying the Effects of Copious 3D-Stacked Cache on HPC Workloads

    Authors: Jens Domke, Emil Vatai, Balazs Gerofi, Yuetsu Kodama, Mohamed Wahib, Artur Podobas, Sparsh Mittal, Miquel Pericàs, Lingqi Zhang, Peng Chen, Aleksandr Drozd, Satoshi Matsuoka

    Abstract: Over the last three decades, innovations in the memory subsystem were primarily targeted at overcoming the data movement bottleneck. In this paper, we focus on a specific market trend in memory technology: 3D-stacked memory and caches. We investigate the impact of extending the on-chip memory capabilities in future HPC-focused processors, particularly by 3D-stacked SRAM. First, we propose a method…

    Submitted 16 October, 2023; v1 submitted 5 April, 2022; originally announced April 2022.

  9. arXiv:2112.00116  [pdf, ps, other]

    cs.DC

    A Review on Parallel Virtual Screening Softwares for High Performance Computers

    Authors: Natarajan Arul Murugan, Artur Podobas, Davide Gadioli, Emanuele Vitali, Gianluca Palermo, Stefano Markidis

    Abstract: Drug discovery is the most expensive, time demanding and challenging project in biopharmaceutical companies which aims at the identification and optimization of lead compounds from large-sized chemical libraries. The lead compounds should have high affinity binding and specificity for a target associated with a disease and in addition they should have favorable pharmacodynamic and pharmacokinetic…

    Submitted 30 November, 2021; originally announced December 2021.

    Comments: Submitted to Pharmaceuticals, MDPI journal

  10. arXiv:2111.05654  [pdf, other]

    cs.DC

    Utilising urgent computing to tackle the spread of mosquito-borne diseases

    Authors: Nick Brown, Rupert Nash, Piero Poletti, Giorgio Guzzetta, Mattia Manica, Agnese Zardini, Markus Flatken, Jules Vidal, Charles Gueunet, Evgenij Belikov, Julien Tierny, Artur Podobas, Wei Der Chien, Stefano Markidis, Andreas Gerndt

    Abstract: It is estimated that around 80% of the world's population live in areas susceptible to at least one major vector-borne disease, and approximately 20% of global communicable diseases are spread by mosquitoes. Furthermore, the outbreaks of such diseases are becoming more common and widespread, with much of this driven in recent years by socio-demographic and climatic factors. These trends are causi…

    Submitted 10 November, 2021; originally announced November 2021.

    Comments: Preprint of paper in 2021 IEEE/ACM HPC for Urgent Decision Making (UrgentHPC)

  11. arXiv:2109.03592  [pdf, ps, other]

    cs.DC

    Strong Scaling of OpenACC enabled Nek5000 on several GPU based HPC systems

    Authors: Jonathan Vincent, Jing Gong, Martin Karp, Adam Peplinski, Niclas Jansson, Artur Podobas, Andreas Jocksch, Jie Yao, Fazle Hussain, Stefano Markidis, Matts Karlsson, Dirk Pleiter, Erwin Laure, Philipp Schlatter

    Abstract: We present new results on the strong parallel scaling for the OpenACC-accelerated implementation of the high-order spectral element fluid dynamics solver Nek5000. The test case considered consists of a direct numerical simulation of fully-developed turbulent flow in a straight pipe, at two different Reynolds numbers $Re_τ=360$ and $Re_τ=550$, based on friction velocity and pipe radius. The strong…

    Submitted 4 November, 2021; v1 submitted 8 September, 2021; originally announced September 2021.

    Comments: 9 pages, 8 figures. Submitted to HPC-Asia 2022 conference, updated to address reviewers comments

    ACM Class: G.4; J.2; C.1

  12. A High-Fidelity Flow Solver for Unstructured Meshes on Field-Programmable Gate Arrays

    Authors: Martin Karp, Artur Podobas, Tobias Kenter, Niclas Jansson, Christian Plessl, Philipp Schlatter, Stefano Markidis

    Abstract: The impending termination of Moore's law motivates the search for new forms of computing to continue the performance scaling we have grown accustomed to. Among the many emerging Post-Moore computing candidates, perhaps none is as salient as the Field-Programmable Gate Array (FPGA), which offers the means of specializing and customizing the hardware to the computation at hand. In this work, we de…

    Submitted 2 November, 2021; v1 submitted 27 August, 2021; originally announced August 2021.

    Comments: 12 pages, 3 figures, 3 tables, Accepted to HPC Asia 2022

    ACM Class: G.4; J.2; C.1

  13. arXiv:2107.06676  [pdf, other]

    cs.LG cs.CE cs.DC cs.NE

    Higgs Boson Classification: Brain-inspired BCPNN Learning with StreamBrain

    Authors: Martin Svedin, Artur Podobas, Steven W. D. Chien, Stefano Markidis

    Abstract: One of the most promising approaches for data analysis and exploration of large data sets is Machine Learning techniques that are inspired by brain models. Such methods use alternative learning rules potentially more efficiently than established learning rules. In this work, we focus on the potential of brain-inspired ML for exploiting High-Performance Computing (HPC) resources to solve ML problem…

    Submitted 17 August, 2021; v1 submitted 14 July, 2021; originally announced July 2021.

    Comments: Accepted for publication at The 2nd Workshop on Artificial Intelligence and Machine Learning for Scientific Applications (AI4S 2021)

  14. arXiv:2107.01243  [pdf]

    cs.MS

    Neko: A Modern, Portable, and Scalable Framework for High-Fidelity Computational Fluid Dynamics

    Authors: Niclas Jansson, Martin Karp, Artur Podobas, Stefano Markidis, Philipp Schlatter

    Abstract: Recent trends and advancement in including more diverse and heterogeneous hardware in High-Performance Computing is challenging software developers in their pursuit for good performance and numerical stability. The well-known maxim "software outlives hardware" may no longer necessarily hold true, and developers are today forced to re-factor their codebases to leverage these powerful new systems. C…

    Submitted 2 July, 2021; originally announced July 2021.

  15. arXiv:2106.05373  [pdf, other]

    cs.DC cs.LG cs.NE

    StreamBrain: An HPC Framework for Brain-like Neural Networks on CPUs, GPUs and FPGAs

    Authors: Artur Podobas, Martin Svedin, Steven W. D. Chien, Ivy B. Peng, Naresh Balaji Ravichandran, Pawel Herman, Anders Lansner, Stefano Markidis

    Abstract: The modern deep learning method based on backpropagation has surged in popularity and has been used in multiple domains and application areas. At the same time, there are other -- less-known -- machine learning algorithms with a mature and solid theoretical foundation whose performance remains unexplored. One such example is the brain-like Bayesian Confidence Propagation Neural Network (BCPNN). In…

    Submitted 9 June, 2021; originally announced June 2021.

    Comments: Accepted for publication at the International Symposium on Highly Efficient Accelerators and Reconfigurable Technologies (HEART 2021)

  16. arXiv:2106.04979  [pdf]

    cs.DC

    Benchmarking the Nvidia GPU Lineage: From Early K80 to Modern A100 with Asynchronous Memory Transfers

    Authors: Martin Svedin, Steven W. D. Chien, Gibson Chikafa, Niclas Jansson, Artur Podobas

    Abstract: For many, Graphics Processing Units (GPUs) provide a source of reliable computing power. Recently, Nvidia introduced its 9th generation HPC-grade GPUs, the Ampere 100, claiming significant performance improvements over previous generations, particularly for AI-workloads, as well as introducing new architectural features such as asynchronous data movement. But how well does the A100 perform on non…

    Submitted 3 July, 2021; v1 submitted 9 June, 2021; originally announced June 2021.

    Comments: 7 pages

  17. arXiv:2103.09683  [pdf, other]

    cs.DC

    Accelerating Radiation Therapy Dose Calculation with Nvidia GPUs

    Authors: Felix Liu, Niclas Jansson, Artur Podobas, Albin Fredriksson, Stefano Markidis

    Abstract: Radiation Treatment Planning (RTP) is the process of planning the appropriate external beam radiotherapy to combat cancer in human patients. RTP is a complex and compute-intensive task, which often takes a long time (several hours) to compute. Reducing this time allows for higher productivity at clinics and more sophisticated treatment planning, which can materialize in better treatments. The stat…

    Submitted 19 September, 2021; v1 submitted 17 March, 2021; originally announced March 2021.

  18. arXiv:2010.14373  [pdf, other]

    cs.DC

    Matrix Engines for High Performance Computing: A Paragon of Performance or Grasping at Straws?

    Authors: Jens Domke, Emil Vatai, Aleksandr Drozd, Peng Chen, Yosuke Oyama, Lingqi Zhang, Shweta Salaria, Daichi Mukunoki, Artur Podobas, Mohamed Wahib, Satoshi Matsuoka

    Abstract: Matrix engines or units, in different forms and affinities, are becoming a reality in modern processors; CPUs and otherwise. The current and dominant algorithmic approach to Deep Learning merits the commercial investments in these units, and deduced from the No.1 benchmark in supercomputing, namely High Performance Linpack, one would expect an awakened enthusiasm by the HPC community, too. Hence…

    Submitted 27 February, 2021; v1 submitted 27 October, 2020; originally announced October 2020.

    Comments: IEEE International Parallel and Distributed Processing Symposium 2021 (IPDPS'21)

  19. arXiv:2010.13463  [pdf]

    cs.DC

    High-Performance Spectral Element Methods on Field-Programmable Gate Arrays

    Authors: Martin Karp, Artur Podobas, Niclas Jansson, Tobias Kenter, Christian Plessl, Philipp Schlatter, Stefano Markidis

    Abstract: Improvements in computer systems have historically relied on two well-known observations: Moore's law and Dennard's scaling. Today, both these observations are ending, forcing computer users, researchers, and practitioners to abandon the general-purpose architectures' comforts in favor of emerging post-Moore systems. Among the most salient of these post-Moore systems is the Field-Programmable Gate…

    Submitted 4 May, 2021; v1 submitted 26 October, 2020; originally announced October 2020.

    Comments: 10 pages, IEEE International Parallel and Distributed Processing Symposium 2021 (IPDPS'21)

    ACM Class: G.4; J.2; C.1

  20. arXiv:2010.05348  [pdf, other]

    physics.comp-ph cs.LG

    Automatic Particle Trajectory Classification in Plasma Simulations

    Authors: Stefano Markidis, Ivy Peng, Artur Podobas, Itthinat Jongsuebchoke, Gabriel Bengtsson, Pawel Herman

    Abstract: Numerical simulations of plasma flows are crucial for advancing our understanding of microscopic processes that drive the global plasma dynamics in fusion devices, space, and astrophysical systems. Identifying and classifying particle trajectories allows us to determine specific on-going acceleration mechanisms, shedding light on essential plasma processes. Our overall goal is to provide a gener…

    Submitted 11 October, 2020; originally announced October 2020.

    Comments: Accepted for publication at AI4S: Workshop on Artificial Intelligence and Machine Learning for Scientific Applications

  21. sputniPIC: an Implicit Particle-in-Cell Code for Multi-GPU Systems

    Authors: Steven W. D. Chien, Jonas Nylund, Gabriel Bengtsson, Ivy B. Peng, Artur Podobas, Stefano Markidis

    Abstract: Large-scale simulations of plasmas are essential for advancing our understanding of fusion devices, space, and astrophysical systems. Particle-in-Cell (PIC) codes have demonstrated their success in simulating numerous plasma phenomena on HPC systems. Today, flagship supercomputers feature multiple GPUs per compute node to achieve unprecedented computing power at high power efficiency. PIC codes re…

    Submitted 10 August, 2020; originally announced August 2020.

    Comments: Accepted for publication at the 32nd International Symposium on Computer Architecture and High Performance Computing (SBAC-PAD 2020)

  22. tf-Darshan: Understanding Fine-grained I/O Performance in Machine Learning Workloads

    Authors: Steven W. D. Chien, Artur Podobas, Ivy B. Peng, Stefano Markidis

    Abstract: Machine Learning applications on HPC systems have been gaining popularity in recent years. The upcoming large scale systems will offer tremendous parallelism for training through GPUs. However, another heavy aspect of Machine Learning is I/O, and this can potentially be a performance bottleneck. TensorFlow, one of the most popular Deep-Learning platforms, now offers a new profiler interface and al…

    Submitted 11 August, 2020; v1 submitted 10 August, 2020; originally announced August 2020.

    Comments: Accepted for publication at the 2020 International Conference on Cluster Computing (CLUSTER 2020)

  23. arXiv:2005.13425  [pdf]

    cs.DC

    Optimization of Tensor-product Operations in Nekbone on GPUs

    Authors: Martin Karp, Niclas Jansson, Artur Podobas, Philipp Schlatter, Stefano Markidis

    Abstract: In the CFD solver Nek5000, the computation is dominated by the evaluation of small tensor operations. Nekbone is a proxy app for Nek5000 and has previously been ported to GPUs with a mixed OpenACC and CUDA approach. In this work, we continue this effort and optimize the main tensor-product operation in Nekbone further. Our optimization is done in CUDA and uses a different, 2D, thread structure to…

    Submitted 27 May, 2020; originally announced May 2020.

    Comments: 4 pages, 4 figures

    ACM Class: G.4; J.2

  24. arXiv:2004.04628  [pdf, other]

    cs.DC

    White Paper from Workshop on Large-scale Parallel Numerical Computing Technology (LSPANC 2020): HPC and Computer Arithmetic toward Minimal-Precision Computing

    Authors: Roman Iakymchuk, Daichi Mukunoki, Artur Podobas, Fabienne Jézéquel, Toshiyuki Imamura, Norihisa Fujita, Jens Huthmann, Shuhei Kudo, Yiyu Tan, Jens Domke, Kai Torben Ohlhus, Takeshi Fukaya, Takeo Hoshi, Yuki Murakami, Maho Nakata, Takeshi Ogita, Kentaro Sano, Taisuke Boku

    Abstract: In numerical computations, precision of floating-point computations is a key factor to determine the performance (speed and energy-efficiency) as well as the reliability (accuracy and reproducibility). However, precision generally plays a contrary role for both. Therefore, the ultimate concept for maximizing both at the same time is the minimal-precision computing through precision-tuning, which a…

    Submitted 11 April, 2020; v1 submitted 9 April, 2020; originally announced April 2020.

    Report number: hal-02536316

  25. A Survey on Coarse-Grained Reconfigurable Architectures from a Performance Perspective

    Authors: Artur Podobas, Kentaro Sano, Satoshi Matsuoka

    Abstract: With the end of both Dennard's scaling and Moore's law, computer users and researchers are aggressively exploring alternative forms of computing in order to continue the performance scaling that we have come to enjoy. Among the more salient and practical of the post-Moore alternatives are reconfigurable systems, with Coarse-Grained Reconfigurable Architectures (CGRAs) seemingly capable of striking…

    Submitted 15 September, 2020; v1 submitted 9 April, 2020; originally announced April 2020.

    ACM Class: A.1; B.0; C.1; C.3

    Journal ref: IEEE Access, 2020 (https://ieeexplore.ieee.org/document/9149601)

  26. High-Performance High-Order Stencil Computation on FPGAs Using OpenCL

    Authors: Hamid Reza Zohouri, Artur Podobas, Satoshi Matsuoka

    Abstract: In this paper we evaluate the performance of FPGAs for high-order stencil computation using High-Level Synthesis. We show that despite the higher computation intensity and on-chip memory requirement of such stencils compared to first-order ones, our design technique with combined spatial and temporal blocking remains effective. This allows us to reach similar, or even higher, compute performance c…

    Submitted 14 February, 2020; originally announced February 2020.

    Comments: Published at RAW'18: 25th Anniversary of Reconfigurable Architectures Workshop held in conjunction with IPDPS'18

  27. arXiv:1810.09330  [pdf, ps, other]

    cs.DC

    Double-precision FPUs in High-Performance Computing: an Embarrassment of Riches?

    Authors: Jens Domke, Kazuaki Matsumura, Mohamed Wahib, Haoyu Zhang, Keita Yashima, Toshiki Tsuchikawa, Yohei Tsuji, Artur Podobas, Satoshi Matsuoka

    Abstract: Among the (uncontended) common wisdom in High-Performance Computing (HPC) is the applications' need for large amount of double-precision support in hardware. Hardware manufacturers, the TOP500 list, and (rarely revisited) legacy software have without doubt followed and contributed to this view. In this paper, we challenge that wisdom, and we do so by exhaustively comparing a large number of HPC…

    Submitted 25 March, 2019; v1 submitted 22 October, 2018; originally announced October 2018.

    Comments: IEEE International Parallel and Distributed Processing Symposium 2019

  28. Combined Spatial and Temporal Blocking for High-Performance Stencil Computation on FPGAs Using OpenCL

    Authors: Hamid Reza Zohouri, Artur Podobas, Satoshi Matsuoka

    Abstract: Recent developments in High Level Synthesis tools have attracted software programmers to accelerate their high-performance computing applications on FPGAs. Even though it has been shown that FPGAs can compete with GPUs in terms of performance for stencil computation, most previous work achieves this by avoiding spatial blocking and restricting input dimensions relative to FPGA on-chip memory. In th…

    Submitted 1 February, 2018; originally announced February 2018.

    Comments: FPGA '18: 2018 ACM/SIGDA International Symposium on Field-Programmable Gate Arrays