Search | arXiv e-print repository

PETSc/TAO Developments for Early Exascale Systems

Authors: Richard Tran Mills, Mark Adams, Satish Balay, Jed Brown, Jacob Faibussowitsch, Toby Isaac, Matthew Knepley, Todd Munson, Hansol Suh, Stefano Zampini, Hong Zhang, Junchao Zhang

Abstract: The Portable Extensible Toolkit for Scientific Computation (PETSc) library provides scalable solvers for nonlinear time-dependent differential and algebraic equations and for numerical optimization via the Toolkit for Advanced Optimization (TAO). PETSc is used in dozens of scientific fields and is an important building block for many simulation codes. During the U.S. Department of Energy's Exascale Computing Project, the PETSc team has made substantial efforts to enable efficient utilization of the massive fine-grain parallelism present within exascale compute nodes and to enable performance portability across exascale architectures. We recap some of the challenges that designers of numerical libraries face in such an endeavor, and then discuss the many developments we have made, which include the addition of new GPU backends, features supporting efficient on-device matrix assembly, better support for asynchronicity and GPU kernel concurrency, and new communication infrastructure. We evaluate the performance of these developments on some pre-exascale systems as well as the early exascale systems Frontier and Aurora, using compute kernel, communication layer, solver, and mini-application benchmark studies, and then close with a few observations drawn from our experiences on the tension between portable performance and other goals of numerical libraries.
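One common design for efficient matrix assembly is coordinate (COO) assembly split into two phases: a one-time phase that fixes the sparsity pattern from the (row, column) index lists, repeats included, and a value phase that can be repeated cheaply. The NumPy sketch below illustrates that split on the CPU. The class and method names are hypothetical and are not PETSc's interface (PETSc's own routines for this are MatSetPreallocationCOO and MatSetValuesCOO); a GPU implementation would replace the scatter-add with a device kernel.

```python
import numpy as np


class CooAssemblySketch:
    """Hypothetical two-phase COO assembly (not PETSc's API).

    Pattern phase: map every (row, col) contribution, including repeats,
    to a slot in a CSR pattern. Value phase: scatter-add values into it.
    """

    def __init__(self, shape, coo_i, coo_j):
        n_rows, n_cols = shape
        coo_i = np.asarray(coo_i, dtype=np.int64)
        coo_j = np.asarray(coo_j, dtype=np.int64)
        flat = coo_i * n_cols + coo_j  # row-major key for each contribution
        unique_keys, self.slot = np.unique(flat, return_inverse=True)
        rows = unique_keys // n_cols
        self.indices = unique_keys % n_cols  # CSR column indices
        counts = np.bincount(rows, minlength=n_rows)
        self.indptr = np.concatenate(([0], np.cumsum(counts)))
        self.shape = (n_rows, n_cols)

    def set_values(self, coo_v):
        """Sum repeated contributions into the CSR value array."""
        data = np.zeros(len(self.indices))
        np.add.at(data, self.slot, np.asarray(coo_v, dtype=float))
        return data

    def to_dense(self, data):
        dense = np.zeros(self.shape)
        for r in range(self.shape[0]):
            for k in range(self.indptr[r], self.indptr[r + 1]):
                dense[r, self.indices[k]] = data[k]
        return dense


# Usage: 1D Laplacian on 4 nodes assembled from three 2x2 element matrices.
elements = [(0, 1), (1, 2), (2, 3)]
ke = [[1.0, -1.0], [-1.0, 1.0]]
coo_i, coo_j, coo_v = [], [], []
for a, b in elements:
    for p, gi in enumerate((a, b)):
        for q, gj in enumerate((a, b)):
            coo_i.append(gi)
            coo_j.append(gj)
            coo_v.append(ke[p][q])
asm = CooAssemblySketch((4, 4), coo_i, coo_j)
K = asm.to_dense(asm.set_values(coo_v))
```

Repeated (row, column) pairs, such as the diagonal entries shared by neighboring elements, are summed, and `set_values` can be called again with new element values without recomputing the pattern.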

Submitted 12 June, 2024; originally announced June 2024.

Comments: 15 pages, submitted to IJHPCA

MSC Class: 00A69

arXiv:2302.03772

A note on the standard diffusion curve of TAP analysis

Authors: Toby Isaac

Abstract: The standard diffusion curve used in models of TAP reactors, as it is usually defined, is numerically unstable for small values. We use a functional equation satisfied by the curve to define a numerically stable way of computing it for all values.

Submitted 7 February, 2023; originally announced February 2023.

Comments: 3 pages, 2 figures, 2 code listings

arXiv:2205.09914

Robust Expected Information Gain for Optimal Bayesian Experimental Design Using Ambiguity Sets

Authors: **woo Go, Tobin Isaac

Abstract: The ranking of experiments by expected information gain (EIG) in Bayesian experimental design is sensitive to changes in the model's prior distribution, and the approximation of EIG yielded by sampling will have errors similar to the use of a perturbed prior. We define and analyze robust expected information gain (REIG), which modifies the objective in EIG maximization by minimizing an affine relaxation of EIG over an ambiguity set of distributions that are close to the original prior in KL-divergence. We show that, when combined with a sampling-based approach to estimating EIG, REIG corresponds to a 'log-sum-exp' stabilization of the samples used to estimate EIG, meaning that it can be efficiently implemented in practice. Numerical tests combining REIG with variational nested Monte Carlo (VNMC), adaptive contrastive estimation (ACE), and mutual information neural estimation (MINE) suggest that in practice REIG also compensates for the variability of under-sampled estimators.
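A minimal sketch of a log-sum-exp stabilization of samples, assuming the standard Donsker-Varadhan (KL) duality: for a fixed multiplier $\lambda > 0$, the worst case of a mean over distributions near the sampling distribution is $-\lambda \log \frac{1}{N}\sum_i e^{-f_i/\lambda}$, up to a constant shift that does not change rankings. Here `samples` stands for hypothetical per-sample EIG contributions and `lam` for the multiplier; the exact form used in the paper may differ.

```python
import math


def log_mean_exp(values):
    """Numerically stable log((1/N) * sum(exp(v)))."""
    m = max(values)
    return m + math.log(sum(math.exp(v - m) for v in values) / len(values))


def robust_mean(samples, lam):
    """KL-robust (pessimistic) version of a sample mean for fixed lam > 0.

    Assumes the Donsker-Varadhan dual; the constant -lam * radius term is
    omitted because it shifts every candidate design equally.
    """
    if lam <= 0:
        raise ValueError("lam must be positive")
    return -lam * log_mean_exp([-f / lam for f in samples])
```

A large `lam` recovers the plain sample mean, while a small `lam` moves the estimate toward the smallest sample, so a few unusually large samples cannot dominate the estimate.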

Submitted 19 May, 2022; originally announced May 2022.

Comments: The 38th Conference on Uncertainty in Artificial Intelligence, 2022

arXiv:2204.09748

Inferring ice sheet damage models from limited observations using CRIKit: the Constitutive Relation Inference Toolkit

Authors: Grant Bruer, Tobin Isaac

Abstract: We examine the prospect of learning ice sheet damage models from observational data. Our approach, implemented in CRIKit (the Constitutive Relation Inference Toolkit), is to model the material time derivative of damage as a frame-invariant neural network, and to optimize the parameters of the model from simulations of the flow of an ice dome. Using the model of Albrecht and Levermann as the ground truth to generate synthetic observations, we measure the difference of optimized neural network models from that model to try to understand how well this process generates models that can then transfer to other ice sheet simulations. The use of so-called "deep-learning" models for constitutive equations, equations of state, sub-grid-scale processes, and other pointwise relations that appear in systems of PDEs has been successful in other disciplines, yet our inference setting has some confounding factors. The first is the type of observations that are available: we compare the quality of the inferred models when the loss of the numerical simulations includes misfits to observations throughout the ice (which are unobtainable in real settings) to losses that include only combinations of surface and borehole observations. The second confounding factor is the evolution of damage in an ice sheet, which is advection-dominated. The non-local effect of perturbations in a damage model results in loss functions that have both many local minima and many parameter configurations for which the system is unsolvable.
Our experience suggests that basic neural networks have several deficiencies that affect the quality of the optimized models. We suggest several approaches to incorporating additional inductive biases into neural networks that may lead to better performance in future work.
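One simple way to make a learned constitutive relation frame-invariant, shown here as an assumption rather than CRIKit's actual architecture, is to feed the network only quantities that do not change when the frame is rotated: for example, the principal invariants of the strain-rate tensor and the scalar damage itself. The weights below are random placeholders, not a trained model, and `InvariantDamageRate` is a hypothetical name.

```python
import numpy as np


def principal_invariants(t):
    """I1, I2, I3 of a 3x3 tensor; unchanged under t -> Q t Q^T for orthogonal Q."""
    i1 = np.trace(t)
    i2 = 0.5 * (i1 ** 2 - np.trace(t @ t))
    i3 = np.linalg.det(t)
    return np.array([i1, i2, i3])


class InvariantDamageRate:
    """Hypothetical frame-invariant model of a damage rate: a one-hidden-layer
    MLP whose inputs are tensor invariants plus the (scalar) damage."""

    def __init__(self, hidden=8, seed=0):
        rng = np.random.default_rng(seed)
        self.w1 = rng.normal(size=(hidden, 4))
        self.b1 = rng.normal(size=hidden)
        self.w2 = rng.normal(size=hidden)

    def __call__(self, strain_rate, damage):
        features = np.append(principal_invariants(strain_rate), damage)
        return float(self.w2 @ np.tanh(self.w1 @ features + self.b1))
```

Because the network never sees tensor components directly, rotating the strain-rate tensor leaves the output unchanged by construction rather than by training.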

Submitted 20 April, 2022; originally announced April 2022.

Comments: 23 pages with 13 figures, 1 table, and 4 listings

arXiv:2112.02174

Unifying the geometric decompositions of full and trimmed polynomial spaces in finite element exterior calculus

Authors: Toby Isaac

Abstract: Arnold, Falk, & Winther, in "Finite element exterior calculus, homological techniques, and applications" (2006), show how to geometrically decompose the full and trimmed polynomial spaces on simplicial elements into direct sums of trace-free subspaces, and in "Geometric decompositions and local bases for finite element differential forms" (2009) the same authors give direct constructions of extension operators for the same spaces. The two families, full and trimmed, are treated separately, using differently defined isomorphisms between each and the other's trace-free subspaces and mutually incompatible extension operators. This work describes a single operator $\mathring{\star}_T$ that unifies the two isomorphisms and also defines a weighted-$L^2$ norm appropriate for defining well-conditioned basis functions and dual-basis functionals for geometric decomposition. This work also describes a single extension operator $\dot{E}_{\sigma,T}$ that implements geometric decompositions of all differential forms as well as of the full and trimmed polynomial spaces separately.
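In the Arnold, Falk, & Winther decompositions, each face $f$ of dimension $d \ge k$ contributes a trace-free block isomorphic to a trimmed space (for the full family) or a full space (for the trimmed family) of $(d-k)$-forms on $f$. A quick sanity check of that bookkeeping is a dimension count, using the standard formulas $\dim \mathcal{P}_r\Lambda^k = \binom{r+n}{r+k}\binom{r+k}{k}$ and $\dim \mathcal{P}^-_r\Lambda^k = \binom{r+n}{r+k}\binom{r+k-1}{k}$ on an $n$-simplex. This sketch checks dimensions only; it does not construct $\mathring{\star}_T$ or $\dot{E}_{\sigma,T}$.

```python
from math import comb


def dim_full(r, k, n):
    """dim P_r Lambda^k on an n-simplex (zero for r < 0)."""
    if r < 0:
        return 0
    return comb(r + n, r + k) * comb(r + k, k)


def dim_trimmed(r, k, n):
    """dim P_r^- Lambda^k on an n-simplex (zero for r < 1)."""
    if r < 1:
        return 0
    return comb(r + n, r + k) * comb(r + k - 1, k)


def decomposed_dims(r, k, n):
    """Sum the face blocks of both geometric decompositions.

    Full:    face of dim d contributes P^-_{r+k-d} Lambda^{d-k}(f).
    Trimmed: face of dim d contributes P_{r+k-d-1} Lambda^{d-k}(f).
    An n-simplex has comb(n+1, d+1) faces of dimension d.
    """
    full = trimmed = 0
    for d in range(k, n + 1):
        faces = comb(n + 1, d + 1)
        full += faces * dim_trimmed(r + k - d, d - k, d)
        trimmed += faces * dim_full(r + k - d - 1, d - k, d)
    return full, trimmed
```

For example, `decomposed_dims(2, 1, 3)` gives `(30, 20)`: the second-kind and first-kind Nédélec spaces of degree 2 on a tetrahedron.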

Submitted 16 February, 2022; v1 submitted 3 December, 2021; originally announced December 2021.

Comments: 21 pages

arXiv:2002.09421

Recursive, parameter-free, explicitly defined interpolation nodes for simplices

Authors: Tobin Isaac

Abstract: A rule for constructing interpolation nodes for $n$th degree polynomials on the simplex is presented. These nodes are simple to define recursively from families of 1D node sets, such as the Lobatto-Gauss-Legendre (LGL) nodes. The resulting nodes have attractive properties: they are fully symmetric, they match the 1D family used in construction on the edges of the simplex, and the nodes constructed for the $(d-1)$-simplex are the boundary traces of the nodes constructed for the $d$-simplex. When compared using the Lebesgue constant to other explicit rules for defining interpolation nodes, the nodes recursively constructed from LGL nodes are nearly as good as the "warp & blend" nodes of Warburton in 2D (which, though defined differently, are very similar), and in 3D are better than other known explicit rules by increasing margins for $n > 6$. By that same measure, these recursively defined nodes are not as good as implicitly defined nodes found by optimizing the Lebesgue constant or related functions, but such optimal node sets have yet to be computed for the tetrahedron. A reference python implementation has been distributed as the `recursivenodes` package, but the simplicity of the recursive construction makes them easy to implement.
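The recursive construction starts from a 1D node family, and node sets are compared by their Lebesgue constant $\Lambda = \max_x \sum_j |\ell_j(x)|$. The sketch below covers only these 1D ingredients: it computes LGL nodes (the endpoints plus the roots of $P_n'$) and estimates $\Lambda$ on a fine grid. It does not implement the simplex rule itself, for which the `recursivenodes` package is the reference implementation.

```python
import numpy as np
from numpy.polynomial import legendre


def lgl_nodes(n):
    """LGL nodes for degree n >= 1 on [-1, 1]: endpoints plus roots of P_n'."""
    if n < 1:
        raise ValueError("degree must be at least 1")
    if n == 1:
        interior = np.array([])
    else:
        interior = np.sort(legendre.Legendre.basis(n).deriv().roots().real)
    return np.concatenate(([-1.0], interior, [1.0]))


def lebesgue_constant(nodes, samples=4001):
    """Estimate max_x sum_j |l_j(x)| by sampling x on a uniform grid."""
    x = np.linspace(-1.0, 1.0, samples)
    total = np.zeros_like(x)
    for j, xj in enumerate(nodes):
        others = np.delete(nodes, j)
        # Lagrange basis polynomial l_j evaluated at every sample point.
        lj = np.prod((x[:, None] - others) / (xj - others), axis=1)
        total += np.abs(lj)
    return total.max()
```

For degree 10, the estimate for LGL nodes is far smaller than for equispaced nodes, which is why LGL is a natural 1D family to build from.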

Submitted 7 August, 2020; v1 submitted 21 February, 2020; originally announced February 2020.

arXiv:1802.02976

A mixed finite element for weakly-symmetric elasticity

Authors: Tobin Isaac

Abstract: We develop a finite element discretization for the weakly symmetric equations of linear elasticity on tetrahedral meshes. The finite element combines, for $r \geq 0$, discontinuous polynomials of order $r$ for the displacement, $H(\mathrm{div})$-conforming polynomials of order $r+1$ for the stress, and $H(\mathrm{curl})$-conforming polynomials of order $r+1$ for the vector representation of the multiplier. We prove that this triplet is stable and has optimal approximation properties. The lowest order case can be combined with inexact quadrature to eliminate the stress and multiplier variables, leaving a compact cell-centered finite volume scheme for the displacement.

Submitted 8 February, 2018; originally announced February 2018.

Comments: 15 pages

arXiv:1702.08880

doi 10.1137/17M1118828

Landau Collision Integral Solver with Adaptive Mesh Refinement on Emerging Architectures

Authors: M. F. Adams, E. Hirvijoki, M. G. Knepley, J. Brown, T. Isaac, R. Mills

Abstract: The Landau collision integral is an accurate model for the small-angle dominated Coulomb collisions in fusion plasmas. We investigate a high order accurate, fully conservative, finite element discretization of the nonlinear multi-species Landau integral with adaptive mesh refinement using the PETSc library (www.mcs.anl.gov/petsc). We develop algorithms and techniques to efficiently utilize emerging architectures with an approach that minimizes memory usage and movement and is suitable for vector processing. The Landau collision integral is vectorized with Intel AVX-512 intrinsics and the solver sustains as much as 22% of the theoretical peak flop rate of the Second Generation Intel Xeon Phi, Knights Landing, processor.

Submitted 28 February, 2017; v1 submitted 27 February, 2017; originally announced February 2017.

Journal ref: SIAM Journal on Scientific Computing, 39 (6), 2017

arXiv:1511.01561

Strong Scaling for Numerical Weather Prediction at Petascale with the Atmospheric Model NUMA

Authors: Andreas Müller, Michal A. Kopera, Simone Marras, Lucas C. Wilcox, Tobin Isaac, Francis X. Giraldo

Abstract: Numerical weather prediction (NWP) has proven to be computationally challenging due to its inherent multiscale nature. Currently, the highest resolution NWP models use a horizontal resolution of about 10 km. In order to increase the resolution of NWP models, highly scalable atmospheric models are needed. The Non-hydrostatic Unified Model of the Atmosphere (NUMA), developed by the authors at the Naval Postgraduate School, was designed to achieve this purpose. NUMA is used by the Naval Research Laboratory, Monterey, as the engine inside its next-generation weather prediction system NEPTUNE. NUMA solves the fully compressible Navier-Stokes equations by means of high-order Galerkin methods (both spectral element and discontinuous Galerkin methods can be used). Mesh generation is done using the p4est library. NUMA is capable of running middle and upper atmosphere simulations since it does not make use of the shallow-atmosphere approximation. This paper presents the performance analysis and optimization of the spectral element version of NUMA. The performance at different optimization stages is analyzed using a theoretical performance model as well as measurements via hardware counters. Machine-independent optimization is compared to machine-specific optimization using BG/Q vector intrinsics. By using vector intrinsics, the main computations reach 1.2 PFlops on the entire machine Mira (12% of the theoretical peak performance). The paper also presents scalability studies for two idealized test cases that are relevant for NWP applications.
The atmospheric model NUMA delivers an excellent strong scaling efficiency of 99% on the entire supercomputer Mira using a mesh with 1.8 billion grid points. This makes it possible to run a global forecast of a baroclinic wave test case at 3 km uniform horizontal resolution and double precision within the time frame required for operational weather prediction.
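The two headline figures in this abstract are a fraction of peak and a strong-scaling efficiency. For a fixed problem size, strong-scaling efficiency relative to a reference run is $E = (t_{\mathrm{ref}}\, p_{\mathrm{ref}})/(t\, p)$, where $t$ is wall-clock time and $p$ the number of cores or nodes. The helpers below compute both; the timings in the usage lines are made up for illustration, and a Mira peak of about 10 PFlop/s is an assumption consistent with the 12% quoted above.

```python
def strong_scaling_efficiency(t_ref, p_ref, t, p):
    """Efficiency of a fixed-size run on p units relative to a reference run
    on p_ref units; 1.0 means time dropped exactly in proportion to p."""
    return (t_ref * p_ref) / (t * p)


def fraction_of_peak(sustained, peak):
    """Sustained rate divided by theoretical peak (same units)."""
    return sustained / peak


# Hypothetical timings: 100 s on 1024 nodes, then 51 s on 2048 nodes.
eff = strong_scaling_efficiency(100.0, 1024, 51.0, 2048)
# 1.2 PFlop/s sustained against an assumed peak of about 10 PFlop/s.
frac = fraction_of_peak(1.2, 10.0)
```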

Submitted 8 September, 2016; v1 submitted 4 November, 2015; originally announced November 2015.

Comments: 33 pages, 12 figures, submitted to the International Journal of High-Performance Computing Applications

ACM Class: D.2.8; G.1.8; G.4

arXiv:1508.02470

Support for Non-conformal Meshes in PETSc's DMPlex Interface

Authors: Tobin Isaac, Matthew G. Knepley

Abstract: PETSc's DMPlex interface for unstructured meshes has been extended to support non-conformal meshes. The topological construct that DMPlex implements, the CW-complex, is by definition conformal, so representing non-conformal meshes in a way that hides complexity requires careful attention to the interface between DMPlex and numerical methods such as the finite element method. Our approach, which combines a tree structure for subset-superset relationships and a "reference tree" describing the types of non-conformal interfaces, allows finite element code written for conformal meshes to extend automatically: in particular, all "hanging-node" constraint calculations are handled behind the scenes. We give example code demonstrating the use of this extension, and use it to convert forests of quadtrees and forests of octrees from the p4est library to DMPlex meshes.
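The smallest instance of a hanging-node constraint: a coarse Q1 cell shares an edge with two refined cells, so the midpoint node on that edge must take the value of the coarse cell's trace, which is the average of the edge's two endpoint values. The sketch expresses this as a matrix $P$ with $u_{\text{all}} = P\, u_{\text{indep}}$, so an operator $K$ assembled as if every node were free reduces to $P^T K P$. Node names, coordinates, and weights are hypothetical and written by hand; this is not DMPlex's API, which derives such constraints from its reference tree.

```python
import numpy as np

# Hypothetical mesh: coarse Q1 cell [0,1]x[0,1] (corners a, b, c, d) next to
# fine cells [1,1.5]x[0,0.5] and [1,1.5]x[0.5,1]; node h at (1, 0.5) hangs
# on the coarse edge b-d.
NODES = {
    "a": (0.0, 0.0), "b": (1.0, 0.0), "c": (0.0, 1.0), "d": (1.0, 1.0),
    "e": (1.5, 0.0), "f": (1.5, 0.5), "g": (1.5, 1.0), "h": (1.0, 0.5),
}
# The coarse cell's trace on edge b-d is linear in y, so h = (b + d) / 2.
HANGING = {"h": {"b": 0.5, "d": 0.5}}


def constraint_matrix(nodes, hanging):
    """Return (P, all_names, indep_names) such that u_all = P @ u_indep."""
    all_names = list(nodes)
    indep = [name for name in all_names if name not in hanging]
    col = {name: j for j, name in enumerate(indep)}
    P = np.zeros((len(all_names), len(indep)))
    for i, name in enumerate(all_names):
        if name in hanging:
            for parent, weight in hanging[name].items():
                P[i, col[parent]] = weight
        else:
            P[i, col[name]] = 1.0
    return P, all_names, indep


def reduce_operator(K, P):
    """Operator on independent unknowns from one assembled over all nodes."""
    return P.T @ K @ P
```

Any function that is linear along the shared edge is reproduced exactly at the hanging node, which is what keeps the discrete solution continuous across the non-conformal interface.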

Submitted 10 August, 2015; originally announced August 2015.

Comments: 16 pages, 13 figures, 5 code examples

arXiv:1505.05055

Bounds on the number of discontinuities of Morton-type space-filling curves

Authors: Carsten Burstedde, Johannes Holke, Tobin Isaac

Abstract: The Morton curve, or z-curve, is one example of a space-filling curve: given a level of refinement $L$, it maps the interval $[0, 2^{dL})$ one-to-one to a set of $d$-dimensional cubes of edge length $2^{-L}$ that form a subdivision of the unit cube. Similar curves have been proposed for triangular and tetrahedral unit domains. In contrast to the Hilbert curve, which is continuous, the Morton-type curves produce jumps. We prove that any contiguous subinterval of the curve divides the domain into a bounded number of face-connected subdomains. For the hypercube case and arbitrary dimension, the subdomains are star-shaped and the bound is indeed two. For the simplicial case in dimensions 2 and 3, the bound is proportional to the depth of refinement $L$. We supplement the paper with theoretical and computational studies on the frequency of jumps for a quantitative assessment.
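The Morton curve orders cells by interleaving the bits of their integer coordinates. The sketch below decodes Morton indices on the hypercube at level $L$ and counts, by brute force over every contiguous subinterval, the face-connected pieces a subinterval produces; for small levels this is consistent with the bound of two stated above. The bit ordering (coordinate $j$ stored in bit $b d + j$) is one of several equivalent conventions and is an assumption here.

```python
from collections import deque


def morton_decode(index, level, dim):
    """Integer cell coordinates of a Morton index: bit b of coordinate j
    is bit b*dim + j of the index."""
    coords = [0] * dim
    for b in range(level):
        for j in range(dim):
            coords[j] |= ((index >> (b * dim + j)) & 1) << b
    return tuple(coords)


def face_components(cells):
    """Number of face-connected components of a set of integer cells (BFS)."""
    cells = set(cells)
    seen, count = set(), 0
    for start in cells:
        if start in seen:
            continue
        count += 1
        seen.add(start)
        queue = deque([start])
        while queue:
            c = queue.popleft()
            for j in range(len(c)):
                for step in (-1, 1):
                    nb = c[:j] + (c[j] + step,) + c[j + 1:]
                    if nb in cells and nb not in seen:
                        seen.add(nb)
                        queue.append(nb)
    return count


def max_components(level, dim):
    """Worst case over all contiguous subintervals [lo, hi] of the curve."""
    cells = [morton_decode(i, level, dim) for i in range(2 ** (dim * level))]
    worst = 0
    for lo in range(len(cells)):
        for hi in range(lo, len(cells)):
            worst = max(worst, face_components(cells[lo:hi + 1]))
    return worst
```

Already at level 1 in 2D, the subinterval of indices 1 to 2 consists of two diagonally touching cells, so the bound of two is attained.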

Submitted 20 April, 2017; v1 submitted 24 March, 2015; originally announced May 2015.

Comments: 25 pages, 16 figures, 2 tables: added proofs for triangles and tetrahedra; moved appendices into main document

ACM Class: F.2.2

arXiv:1410.1221

doi 10.1016/j.jcp.2015.04.047

Scalable and efficient algorithms for the propagation of uncertainty from data through inference to prediction for large-scale problems, with application to flow of the Antarctic ice sheet

Authors: Tobin Isaac, Noemi Petra, Georg Stadler, Omar Ghattas

Abstract: The majority of research on efficient and scalable algorithms in computational science and engineering has focused on the forward problem: given parameter inputs, solve the governing equations to determine output quantities of interest. In contrast, here we consider the broader question: given a (large-scale) model containing uncertain parameters, (possibly) noisy observational data, and a prediction quantity of interest, how do we construct efficient and scalable algorithms to (1) infer the model parameters from the data (the deterministic inverse problem), (2) quantify the uncertainty in the inferred parameters (the Bayesian inference problem), and (3) propagate the resulting uncertain parameters through the model to issue predictions with quantified uncertainties (the forward uncertainty propagation problem)? We present efficient and scalable algorithms for this end-to-end, data-to-prediction process under the Gaussian approximation and in the context of modeling the flow of the Antarctic ice sheet and its effect on sea level. The ice is modeled as a viscous, incompressible, creeping, shear-thinning fluid. The observational data come from InSAR satellite measurements of surface ice flow velocity, and the uncertain parameter field to be inferred is the basal sliding parameter. The prediction quantity of interest is the present-day ice mass flux from the Antarctic continent to the ocean.
We show that the work required for executing this data-to-prediction process is independent of the state dimension, parameter dimension, data dimension, and number of processor cores. The key to achieving this dimension independence is to exploit the fact that the observational data typically provide only sparse information on model parameters. This property can be exploited to construct a low-rank approximation of the linearized parameter-to-observable map.
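The low-rank idea can be illustrated on a small linear Gaussian problem $d = F m + \text{noise}$ with noise covariance $\sigma^2 I$ and prior covariance $\Gamma_{\text{pr}} = L L^T$. If the prior-preconditioned data-misfit Hessian $\tilde H = \sigma^{-2} L^T F^T F L$ has eigenpairs $(\lambda_i, v_i)$, then the posterior covariance is $L\,(I - \sum_i \tfrac{\lambda_i}{1+\lambda_i} v_i v_i^T)\,L^T$, and eigenpairs with small $\lambda_i$ can be dropped. This dense NumPy version is a toy under those assumptions; at scale the dominant eigenpairs would be computed matrix-free (for example with randomized or Lanczos methods) from actions of the linearized parameter-to-observable map.

```python
import numpy as np


def lowrank_posterior_cov(F, noise_var, prior_cov, rank):
    """Posterior covariance for d = F m + noise, keeping only the `rank`
    dominant eigenpairs of the prior-preconditioned data-misfit Hessian."""
    L = np.linalg.cholesky(prior_cov)              # prior_cov = L L^T
    FL = F @ L
    Ht = FL.T @ FL / noise_var                     # L^T F^T F L / sigma^2
    lam, V = np.linalg.eigh(Ht)                    # ascending eigenvalues
    lam, V = lam[::-1][:rank], V[:, ::-1][:, :rank]  # keep the largest
    D = np.diag(lam / (1.0 + lam))
    n = prior_cov.shape[0]
    return L @ (np.eye(n) - V @ D @ V.T) @ L.T


def exact_posterior_cov(F, noise_var, prior_cov):
    """Reference: (F^T F / sigma^2 + prior_cov^{-1})^{-1}."""
    H = F.T @ F / noise_var + np.linalg.inv(prior_cov)
    return np.linalg.inv(H)
```

Keeping as many eigenpairs as there are data points reproduces the exact posterior covariance, because $\tilde H$ has rank at most the data dimension; this mirrors the observation above that the data inform only a few directions in parameter space.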

Submitted 1 September, 2015; v1 submitted 5 October, 2014; originally announced October 2014.

MSC Class: 35Q62; 62F15; 35R30; 35Q93; 65C60; 49M15; 86A40

arXiv:1406.6573

doi 10.1137/140974407

Solution of nonlinear Stokes equations discretized by high-order finite elements on nonconforming and anisotropic meshes, with application to ice sheet dynamics

Authors: Tobin Isaac, Georg Stadler, Omar Ghattas

Abstract: Motivated by the need for efficient and accurate simulation of the dynamics of the polar ice sheets, we design high-order finite element discretizations and scalable solvers for the solution of nonlinear incompressible Stokes equations. We focus on power-law, shear thinning rheologies used in modeling ice dynamics and other geophysical flows. We use nonconforming hexahedral meshes and the conforming inf-sup stable finite element velocity-pressure pairings $\mathbb{Q}_k\times \mathbb{Q}^\text{disc}_{k-2}$ or $\mathbb{Q}_k \times \mathbb{P}^\text{disc}_{k-1}$. To solve the nonlinear equations, we propose a Newton-Krylov method with a block upper triangular preconditioner for the linearized Stokes systems. The diagonal blocks of this preconditioner are sparse approximations of the (1,1)-block and of its Schur complement. The (1,1)-block is approximated using linear finite elements based on the nodes of the high-order discretization, and the application of its inverse is approximated using algebraic multigrid with an incomplete factorization smoother. This preconditioner is designed to be efficient on anisotropic meshes, which are necessary to match the high aspect ratio domains typical for ice sheets. We develop and make available extensions to two libraries (a hybrid meshing scheme for the p4est parallel AMR library, and a modified smoothed aggregation scheme for PETSc) to improve their support for solving PDEs in high aspect ratio domains.
In a numerical study, we find that our solver yields fast convergence that is independent of the element aspect ratio, the occurrence of nonconforming interfaces, and of mesh refinement, and that depends only weakly on the polynomial finite element order. We simulate the ice flow in a realistic description of the Antarctic ice sheet derived from field data, and study the parallel scalability of our solver for problems with up to 383M unknowns.
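Why a block upper triangular preconditioner works well can be seen in the idealized case with exact blocks. For $K = \begin{pmatrix} A & B^T \\ B & 0 \end{pmatrix}$ and $P = \begin{pmatrix} A & B^T \\ 0 & -S \end{pmatrix}$ with $S = B A^{-1} B^T$ (the sign convention is an assumption here), one finds $K P^{-1} = \begin{pmatrix} I & 0 \\ B A^{-1} & I \end{pmatrix}$, so $(K P^{-1} - I)^2 = 0$ and GMRES converges in at most two iterations. The paper replaces $A^{-1}$ and $S^{-1}$ by sparse approximations, so the NumPy check below only demonstrates this limiting case on random matrices.

```python
import numpy as np


def saddle_point(A, B):
    """Linearized Stokes-like system [[A, B^T], [B, 0]]."""
    m = B.shape[0]
    return np.block([[A, B.T], [B, np.zeros((m, m))]])


def upper_block_preconditioner(A, B):
    """[[A, B^T], [0, -S]] with the exact Schur complement S = B A^{-1} B^T."""
    S = B @ np.linalg.solve(A, B.T)
    m = B.shape[0]
    return np.block([[A, B.T], [np.zeros((m, A.shape[0])), -S]])
```

In practice the quality of the approximations to $A^{-1}$ and $S^{-1}$ determines how far the iteration count moves away from this ideal of two.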

Submitted 9 July, 2015; v1 submitted 25 June, 2014; originally announced June 2014.

Comments: 31 pages

arXiv:1406.0089

doi 10.1137/140970963

Recursive Algorithms for Distributed Forests of Octrees

Authors: Tobin Isaac, Carsten Burstedde, Lucas C. Wilcox, Omar Ghattas

Abstract: The forest-of-octrees approach to parallel adaptive mesh refinement and coarsening (AMR) has recently been demonstrated in the context of a number of large-scale PDE-based applications. Although linear octrees, which store only leaf octants, have an underlying tree structure by definition, it is not often exploited in previously published mesh-related algorithms. This is because the branches are not explicitly stored, and because the topological relationships in meshes, such as the adjacency between cells, introduce dependencies that do not respect the octree hierarchy. In this work we combine hierarchical and topological relationships between octree branches to design efficient recursive algorithms. We present three important algorithms with recursive implementations. The first is a parallel search for leaves matching any of a set of multiple search criteria. The second is a ghost layer construction algorithm that handles arbitrarily refined octrees that are not covered by previous algorithms, which require a 2:1 condition between neighboring leaves. The third is a universal mesh topology iterator. This iterator visits every cell in a domain partition, as well as every interface (face, edge and corner) between these cells. The iterator calculates the local topological information for every interface that it visits, taking into account the nonconforming interfaces that increase the complexity of describing the local topology.
To demonstrate the utility of the topology iterator, we use it to compute the numbering and encoding of higher-order $C^0$ nodal basis functions. We analyze the complexity of the new recursive algorithms theoretically, and assess their performance, both in terms of single-processor efficiency and in terms of parallel scalability, demonstrating good weak and strong scaling up to 458k cores of the JUQUEEN supercomputer.
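The recursive search over a linear octree can be sketched serially in 2D. Only the leaves are stored, in a Morton-sorted array, yet the leaves of any branch form a contiguous slice of it, so the search can descend from the root, split both the leaf slice and the set of queries among the four children, and prune any branch with no remaining queries. In this hypothetical sketch the queries are points and the match criterion is containment; the names and the level `LMAX` are illustrative and are not p4est's interface.

```python
import bisect

LMAX = 4  # hypothetical maximum level; cells use integer coords in [0, 2**LMAX)


def morton_key(x, y):
    """Interleave bits of integer coordinates (x in even bits, y in odd)."""
    key = 0
    for b in range(LMAX):
        key |= ((x >> b) & 1) << (2 * b)
        key |= ((y >> b) & 1) << (2 * b + 1)
    return key


def build_leaves(refine):
    """Linear quadtree as leaves (x, y, level), emitted in Morton order."""
    def rec(x, y, level):
        if level < LMAX and refine(x, y, level):
            half = 1 << (LMAX - level - 1)
            out = []
            for c in range(4):
                out += rec(x + (c & 1) * half, y + (c >> 1) * half, level + 1)
            return out
        return [(x, y, level)]
    return rec(0, 0, 0)


def search_points(leaves, points):
    """Top-down recursive search: at each branch, split the sorted leaf slice
    and the remaining points among the children; skip empty branches."""
    keys = [morton_key(x, y) for x, y, _ in leaves]
    found = {}

    def rec(x, y, level, lo, hi, pts):
        if not pts or lo >= hi:
            return
        if hi - lo == 1 and leaves[lo][2] == level:  # this branch is a leaf
            for p in pts:
                found[p] = leaves[lo]
            return
        half = 1 << (LMAX - level - 1)
        for c in range(4):
            cx, cy = x + (c & 1) * half, y + (c >> 1) * half
            k0 = morton_key(cx, cy)  # first max-level cell of this child
            clo = bisect.bisect_left(keys, k0, lo, hi)
            chi = bisect.bisect_left(keys, k0 + half * half, lo, hi)
            inside = [p for p in pts
                      if cx <= p[0] < cx + half and cy <= p[1] < cy + half]
            rec(cx, cy, level + 1, clo, chi, inside)

    rec(0, 0, 0, 0, len(leaves), list(points))
    return found
```

For example, `build_leaves(lambda x, y, l: l < 2 or (x == 0 and y == 0))` produces an adapted quadtree with 22 leaves that can be passed to `search_points` with any list of `(x, y)` tuples in the domain.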

Submitted 19 August, 2015; v1 submitted 31 May, 2014; originally announced June 2014.

Comments: 35 pages, 15 figures, 3 tables

Showing 1–14 of 14 results for author: Isaac, T