
Porting HPC Applications to AMD Instinct™ MI300A Using Unified Memory and OpenMP

Authors: Suyash Tandon, Leopold Grinberg, Gheorghe-Teodor Bercea, Carlo Bertolli, Mark Olesen, Simone Bnà, Nicholas Malaya

Abstract: AMD Instinct™ MI300A is the world's first data center accelerated processing unit (APU) with memory shared between the AMD "Zen 4" EPYC™ cores and third generation CDNA™ compute units. A single memory space offers several advantages: i) it eliminates the need for data replication and costly data transfers, ii) it substantially simplifies application development and allows an incremental acceleration of applications, iii) it is easy to maintain, and iv) its potential can be well realized via the abstractions in the OpenMP 5.2 standard, where the host and the device data environments can be unified in a more performant way. In this article, we provide a blueprint of the APU programming model leveraging unified memory and highlight key distinctions compared to the conventional approach with discrete GPUs. OpenFOAM, an open-source C++ library for computational fluid dynamics, is presented as a case study to emphasize the flexibility and ease of offloading a full-scale production-ready application on MI300 APUs using directive-based OpenMP programming.

Submitted 1 May, 2024; originally announced May 2024.

Comments: Accepted paper at ISC High Performance 2024

arXiv:2310.01586

Experiences Readying Applications for Exascale

Authors: Paul T. Bauman, Reuben D. Budiardja, Dmytro Bykov, Noel Chalmers, Jacqueline Chen, Nicholas Curtis, Marc Day, Markus Eisenbach, Lucas Esclapez, Alessandro Fanfarillo, William Freitag, Nicholas Frontiere, Antigoni Georgiadou, Joseph Glenski, Kalyana Gottiparthi, Marc T. Henry de Frahan, Gustav R. Jansen, Wayne Joubert, Justin G. Lietz, Jakub Kurzak, Nicholas Malaya, Bronson Messer, Damon McDougall, Paul Mullowney, Stephen Nichols, et al. (7 additional authors not shown)

Abstract: The advent of exascale computing invites an assessment of existing best practices for developing application readiness on the world's largest supercomputers. This work details observations from the last four years in preparing scientific applications to run on the Oak Ridge Leadership Computing Facility's (OLCF) Frontier system. This paper addresses a range of topics in software, including programmability, tuning, and portability considerations that are key to moving applications from existing systems to future installations. A set of representative workloads provides case studies for general system and software testing. We evaluate the use of early access systems for development across several generations of hardware. Finally, we discuss how best practices were identified and disseminated to the community through a wide range of activities, including user guides and training. We conclude with recommendations for ensuring application readiness on future leadership computing systems.

Submitted 2 October, 2023; originally announced October 2023.

Comments: Accepted at SC23

arXiv:2307.12679

An Estimator for the Sensitivity to Perturbations of Deep Neural Networks

Authors: Naman Maheshwari, Nicholas Malaya, Scott Moe, Jaydeep P. Kulkarni, Sudhanva Gurumurthi

Abstract: For Deep Neural Networks (DNNs) to become useful in safety-critical applications, such as self-driving cars and disease diagnosis, they must be stable to perturbations in input and model parameters. Characterizing the sensitivity of a DNN to perturbations is necessary to determine the minimal bit-width precision that may be used to safely represent the network. However, no general result exists that is capable of predicting the sensitivity of a given DNN to round-off error, noise, or other perturbations in input. This paper derives an estimator that can predict such quantities. The estimator is derived via inequalities and matrix norms, and the resulting quantity is roughly analogous to a condition number for the entire neural network. An approximation of the estimator is tested on two Convolutional Neural Networks, AlexNet and VGG-19, using the ImageNet dataset. For each of these networks, the tightness of the estimator is explored via random perturbations and adversarial attacks.

Submitted 24 July, 2023; originally announced July 2023.

Comments: Actual work and paper concluded in January 2019

arXiv:2203.14154

NUNet: Deep Learning for Non-Uniform Super-Resolution of Turbulent Flows

Authors: Octavi Obiols-Sales, Abhinav Vishnu, Nicholas Malaya, Aparna Chandramowlishwaran

Abstract: Deep Learning (DL) algorithms are becoming increasingly popular for the reconstruction of high-resolution turbulent flows (aka super-resolution). However, current DL approaches perform spatially uniform super-resolution, a key performance limiter for the scalability of DL-based surrogates for Computational Fluid Dynamics (CFD). To address this challenge, we introduce NUNet, a deep learning-based adaptive mesh refinement (AMR) framework for non-uniform super-resolution of turbulent flows. NUNet divides the input low-resolution flow field into patches, scores each patch, and predicts their target resolution. As a result, it outputs a spatially non-uniform flow field, adaptively refining regions of the fluid domain to achieve the target accuracy. We train NUNet with Reynolds-Averaged Navier-Stokes (RANS) solutions from three different canonical flows, namely turbulent channel flow, flat plate, and flow around ellipses. NUNet shows remarkable discerning properties, refining areas with complex flow features, such as near-wall domains and the wake region in flow around solid bodies, while leaving areas with smooth variations (such as the freestream) in the low-precision range. Hence, NUNet demonstrates an excellent qualitative and quantitative alignment with the traditional OpenFOAM AMR solver. Moreover, it reaches the same convergence guarantees as the AMR solver while accelerating it by 3.2-5.5x, including on geometries and boundary conditions unseen during training, demonstrating its generalization capacity.
Due to NUNet's ability to super-resolve only regions of interest, it predicts the same target 1024x1024 spatial resolution 7-28.5x faster than state-of-the-art DL methods and reduces the memory usage by 4.4-7.65x, showcasing improved scalability.

Submitted 26 March, 2022; originally announced March 2022.

arXiv:2108.07667

SURFNet: Super-resolution of Turbulent Flows with Transfer Learning using Small Datasets

Authors: Octavi Obiols-Sales, Abhinav Vishnu, Nicholas Malaya, Aparna Chandramowlishwaran

Abstract: Deep Learning (DL) algorithms are emerging as a key alternative to computationally expensive CFD simulations. However, state-of-the-art DL approaches require large and high-resolution training data to learn accurate models. The size and availability of such datasets are a major limitation for the development of next-generation data-driven surrogate models for turbulent flows. This paper introduces SURFNet, a transfer learning-based super-resolution flow network. SURFNet primarily trains the DL model on low-resolution datasets and then applies transfer learning to the model using a handful of high-resolution flow problems, accelerating the traditional numerical solver independent of the input size. We propose two approaches to transfer learning for the task of super-resolution, namely one-shot and incremental learning. Both approaches entail transfer learning on only one geometry to account for fine-grid flow fields, requiring 15x less training data on high-resolution inputs compared to the tiny resolution (64x256) of the coarse model, significantly reducing the time for both data collection and training. We empirically evaluate SURFNet's performance by solving the Navier-Stokes equations in the turbulent regime on input resolutions up to 256x larger than the coarse model. On four test geometries and eight flow configurations unseen during training, we observe a consistent 2-2.1x speedup over the OpenFOAM physics solver independent of the test geometry and the resolution size (up to 2048x2048), demonstrating both resolution-invariance and generalization capabilities.
Our approach addresses the challenge of reconstructing high-resolution solutions from coarse grid models trained using low-resolution inputs (super-resolution) without loss of accuracy and while requiring limited computational resources.

Submitted 17 August, 2021; originally announced August 2021.

arXiv:2011.15103

Automating Artifact Detection in Video Games

Authors: Parmida Davarmanesh, Kuanhao Jiang, Tingting Ou, Artem Vysogorets, Stanislav Ivashkevich, Max Kiehn, Shantanu H. Joshi, Nicholas Malaya

Abstract: In spite of advances in gaming hardware and software, gameplay is often tainted with graphics errors, glitches, and screen artifacts. This proof of concept study presents a machine learning approach for automated detection of graphics corruptions in video games. Based on a sample of representative screen corruption examples, the model was able to identify 10 of the most commonly occurring screen artifacts with reasonable accuracy. Feature representation of the data included discrete Fourier transforms, histograms of oriented gradients, and graph Laplacians. Various combinations of these features were used to train machine learning models that identify individual classes of graphics corruptions and that were later assembled into a single mixed experts "ensemble" classifier. The ensemble classifier was tested on held-out test sets, and produced an accuracy of 84% on the games it had seen before, and 69% on games it had never seen before.

Submitted 30 November, 2020; originally announced November 2020.

arXiv:2005.04485

doi: 10.1145/3392717.3392772

CFDNet: a deep learning-based accelerator for fluid simulations

Authors: Octavi Obiols-Sales, Abhinav Vishnu, Nicholas Malaya, Aparna Chandramowlishwaran

Abstract: CFD is widely used in physical system design and optimization, where it is used to predict engineering quantities of interest, such as the lift on a plane wing or the drag on a motor vehicle. However, many systems of interest are prohibitively expensive for design optimization, due to the expense of evaluating CFD simulations. To render the computation tractable, reduced-order or surrogate models are used to accelerate simulations while respecting the convergence constraints provided by the higher-fidelity solution. This paper introduces CFDNet, a coupled physical simulation and deep learning framework for accelerating the convergence of Reynolds Averaged Navier-Stokes simulations. CFDNet is designed to predict the primary physical properties of the fluid, including velocity, pressure, and eddy viscosity, using a single convolutional neural network at its core. We evaluate CFDNet on a variety of use-cases, both extrapolative and interpolative, where test geometries are either observed or not observed during training. Our results show that CFDNet meets the convergence constraints of the domain-specific physics solver while outperforming it by 1.9 - 7.4x on both steady laminar and turbulent flows. Moreover, we demonstrate the generalization capacity of CFDNet by testing its prediction on new geometries unseen during training. In this case, the approach meets the CFD convergence criterion while still providing significant speedups over traditional domain-only models.

Submitted 9 May, 2020; originally announced May 2020.

Comments: Accepted for publication at the International Conference on Supercomputing (ICS) 2020

arXiv:1611.07521

The QUESO Library, User's Manual

Authors: Kemelli C. Estacio-Hiroms, Ernesto E. Prudencio, Nicholas P. Malaya, Manav Vohra, Damon McDougall

Abstract: QUESO stands for Quantification of Uncertainty for Estimation, Simulation and Optimization and consists of algorithms and C++ classes intended for research in uncertainty quantification, including the solution of statistical inverse problems and statistical forward problems, the validation of mathematical models under uncertainty, and the prediction of quantities of interest from such models along with quantification of their uncertainties. QUESO is designed for flexibility, portability, ease of use and ease of extension. Its software design follows an object-oriented approach, and its code is written in C++ and uses MPI. It can run in uniprocessor and multiprocessor environments. QUESO contains two forms of documentation: a user's manual available in PDF format and a lower-level code documentation available in web-based/HTML format. The present document is a user's manual which provides an overview of the capabilities of QUESO, procedures for software execution, and includes example studies.

Submitted 21 November, 2016; originally announced November 2016.

arXiv:1507.00398

The Parallel C++ Statistical Library for Bayesian Inference: QUESO

Authors: Damon McDougall, Nicholas Malaya, Robert D. Moser

Abstract: The Parallel C++ Statistical Library for the Quantification of Uncertainty for Estimation, Simulation and Optimization, Queso, is a collection of statistical algorithms and programming constructs supporting research into the quantification of uncertainty of models and their predictions. Queso is primarily focused on solving statistical inverse problems using Bayes's theorem, which expresses a distribution of possible values for a set of uncertain parameters (the posterior distribution) in terms of the existing knowledge of the system (the prior) and noisy observations of a physical process, represented by a likelihood distribution. The posterior distribution is rarely known analytically, and so requires computational methods. It is typical to compute probabilities and moments from the posterior distribution, but this is often a high-dimensional object and standard Riemann-type methods for quadrature become prohibitively expensive. The approach Queso takes in this regard is to rely on Markov chain Monte Carlo (MCMC) methods, which are well suited to evaluating quantities such as probabilities and moments of high-dimensional probability distributions. Queso's intended use is as a tool to assist and facilitate coupling uncertainty quantification to a specific application called a forward problem. While many libraries presently exist that solve Bayesian inference problems, Queso is a specialized piece of software primarily designed to solve such problems by utilizing the parallel environments demanded by large-scale forward problems.
Queso is written in C++, uses MPI, and utilizes libraries already available to the scientific community.

Submitted 1 July, 2015; originally announced July 2015.

arXiv:1311.0828

doi: 10.1063/1.4866813

Estimating Uncertainties in Statistics Computed from DNS

Authors: Todd A. Oliver, Nicholas Malaya, Rhys Ulerich, Robert D. Moser

Abstract: Rigorous assessment of uncertainty is crucial to the utility of DNS results. Uncertainties in the computed statistics arise from two sources: finite statistical sampling and the discretization of the Navier-Stokes equations. Due to the presence of non-trivial sampling error, standard techniques for estimating discretization error (such as Richardson extrapolation) fail or are unreliable. This work provides a systematic and unified approach for estimating these errors. First, a sampling error estimator that accounts for correlation in the input data is developed. Then, this sampling error estimate is used as part of a Bayesian extension of Richardson extrapolation in order to characterize the discretization error. These methods are tested using the Lorenz equations and are shown to perform well. These techniques are then used to investigate the sampling and discretization errors in the DNS of a wall-bounded turbulent flow. For both cases, it is found that while the sampling uncertainty is large enough to make the order of accuracy difficult to determine, the estimated discretization errors are quite small. This indicates that the commonly used heuristics provide adequate resolution for this class of problems. However, it is also found that, for some quantities, the discretization error is not small relative to sampling error, indicating that the conventional wisdom that sampling error dominates discretization error for this class of simulations needs to be reevaluated.

Submitted 4 November, 2013; originally announced November 2013.

Comments: 42 Pages, 17 Figures, Submitted to Physics of Fluids

Showing 1–10 of 10 results for author: Malaya, N