Search | arXiv e-print repository

Efficient N-to-M Checkpointing Algorithm for Finite Element Simulations

Authors: David A. Ham, Vaclav Hapla, Matthew G. Knepley, Lawrence Mitchell, Koki Sagiyama

Abstract: In this work, we introduce a new algorithm for N-to-M checkpointing in finite element simulations. This new algorithm allows efficient saving/loading of functions representing physical quantities associated with the mesh representing the physical domain. Specifically, the algorithm allows different numbers of parallel processes to be used for saving and loading, so that restarting and post-processing can run on the process count appropriate to the given phase of the simulation and other conditions. For demonstration, we implemented this algorithm in PETSc, the Portable, Extensible Toolkit for Scientific Computation, and added a convenient high-level interface into Firedrake, a system for solving partial differential equations using finite element methods. We evaluated our new implementation by saving and loading data involving 8.2 billion finite element degrees of freedom using 8,192 parallel processes on ARCHER2, the UK National Supercomputing Service.

Submitted 11 January, 2024; originally announced January 2024.
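The core bookkeeping problem in an N-to-M restart can be illustrated on a flat global vector. The sketch below is a minimal illustration under stated assumptions, not the paper's PETSc/Firedrake algorithm (which must also handle mesh topology and DoF layouts): it assumes the saved data is stored in a single global order and that both the saving run and the loading run use a contiguous block distribution. For each of the M loading ranks it lists which pieces of which of the N saved blocks must be read. `block_ranges` and `n_to_m_plan` are hypothetical helper names.

```python
def block_ranges(n_global, n_ranks):
    """Contiguous block distribution: rank r owns the half-open range [start, end)."""
    base, extra = divmod(n_global, n_ranks)
    ranges = []
    start = 0
    for r in range(n_ranks):
        size = base + (1 if r < extra else 0)  # first `extra` ranks get one more entry
        ranges.append((start, start + size))
        start += size
    return ranges


def n_to_m_plan(n_global, n_save, n_load):
    """For each loading rank, list the (saving_rank, lo, hi) pieces it must read."""
    saved = block_ranges(n_global, n_save)
    loaded = block_ranges(n_global, n_load)
    plan = []
    for a, b in loaded:
        pieces = []
        for s, (c, d) in enumerate(saved):
            lo, hi = max(a, c), min(b, d)  # overlap of the two index ranges
            if lo < hi:
                pieces.append((s, lo, hi))
        plan.append(pieces)
    return plan
```

For 10 entries saved on 3 processes and reloaded on 2, loading rank 0 reads all of saved block 0 plus the first entry of block 1: `n_to_m_plan(10, 3, 2)[0]` is `[(0, 0, 4), (1, 4, 5)]`.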

arXiv:2303.12620

A Numerical Study of Landau Damping with PETSc-PIC

Authors: Daniel S. Finn, Matthew G. Knepley, Joseph V. Pusztay, Mark F. Adams

Abstract: We present a study of the standard plasma physics test, Landau damping, using the Particle-In-Cell (PIC) algorithm. The Landau damping phenomenon consists of the damping of small oscillations in plasmas without collisions. In the PIC method, a hybrid discretization is constructed with a grid of finitely supported basis functions to represent the electric, magnetic and/or gravitational fields, and a distribution of delta functions to represent the particle field. Approximations to the dispersion relation are found to be inadequate for accurately calculating the electric field frequency and damping rate when parameters of the physical system, such as the plasma frequency or thermal velocity, are varied. We present a full derivation and numerical solution for the dispersion relation, and verify the PETSc-PIC numerical solutions to the Vlasov-Poisson system for a large range of wave numbers and charge densities.

Submitted 22 March, 2023; originally announced March 2023.

Comments: 14 pages, 7 figures
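For orientation, the standard linear dispersion relation for electrostatic (Langmuir) waves in a collisionless plasma with a Maxwellian electron background and immobile ions is recalled below in normalized units (plasma frequency, thermal velocity, and Debye length all set to 1). This is the textbook form written with the plasma dispersion function $Z$, not necessarily the exact formulation derived in the paper. A complex root $\omega = \omega_r + i\gamma$ gives the oscillation frequency $\omega_r$ and the Landau damping rate $\gamma < 0$.

$$ \varepsilon(k,\omega) = 1 + \frac{1}{k^2}\left[ 1 + \zeta Z(\zeta) \right] = 0, \qquad \zeta = \frac{\omega}{\sqrt{2}\,k}, \qquad Z(\zeta) = \frac{1}{\sqrt{\pi}} \int_{-\infty}^{\infty} \frac{e^{-t^2}}{t - \zeta}\, dt, $$

where the integral defining $Z$ for $\operatorname{Im}\zeta > 0$ is analytically continued (the Landau contour) to $\operatorname{Im}\zeta \le 0$.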

arXiv:2209.03228

A performance portable, fully implicit Landau collision operator with batched linear solvers

Authors: Mark F. Adams, Peng Wang, Jacob Merson, Kevin Huck, Matthew G. Knepley

Abstract: Modern accelerators use hierarchical parallel programming models that enable massive multithreading within a processing element (PE), with multiple PEs per device driven by traditional processes. Batching is a technique for exposing PE-level parallelism in algorithms that have traditionally run on MPI processes or multiple threads within a single process. Opportunities for batching arise in, for example, kinetic discretizations of magnetized plasmas where collisions are advanced, in velocity space, at each spatial point independently. This paper builds on previous work on a high-performance, fully nonlinear Landau collision operator by batching the linear solver, as well as batching the spatial point problems and adding new support for multiple grids for highly multiscale, multi-species problems. An anisotropic relaxation verification test that agrees well with previously published results and analytical solutions is presented. Performance results from an NVIDIA A100 node and an AMD MI250X node are presented, with hardware utilization analysis for each architecture. The entire Landau operator time advance is implemented in the Kokkos language for performance portability, runs entirely on the device, and is available in the PETSc numerical library.

Submitted 12 February, 2024; v1 submitted 7 September, 2022; originally announced September 2022.

arXiv:2208.07128

Tetrahedralization of a Hexahedral Mesh

Authors: Aman Timalsina, Matthew G. Knepley

Abstract: Two important classes of three-dimensional elements in computational meshes are hexahedra and tetrahedra. While several efficient methods exist that convert a hexahedral element into tetrahedral elements, the existing algorithm for tetrahedralization of a hexahedral complex is the marching tetrahedron algorithm, which limits pre-selection of face divisions. We generalize a procedure for tetrahedralizing triangular prisms to tetrahedralizing cubes, and combine it with certain heuristics to design an algorithm that can triangulate any hexahedra.

Submitted 19 January, 2023; v1 submitted 15 August, 2022; originally announced August 2022.

Comments: The previous version had an error in the proof of Observation 2.1, which has been rectified in this version. Formatting and title changed.
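As a point of reference for the problem addressed here, the sketch below shows the simplest fixed-pattern tetrahedralization of a single cube: six tetrahedra that all share the main diagonal from (0,0,0) to (1,1,1) (the Kuhn, or Freudenthal, subdivision). It is not the paper's algorithm. Because the pattern dictates the diagonal along which every cube face is split, it illustrates the kind of restriction on choosing face divisions that the abstract describes. Function names are illustrative.

```python
from fractions import Fraction
from itertools import permutations


def cube_to_tets():
    """Split the unit cube into 6 tetrahedra sharing the diagonal (0,0,0)-(1,1,1).

    Each permutation of the axes (x, y, z) gives a monotone vertex path from the
    origin to the opposite corner; the 4 vertices on that path form one tetrahedron.
    """
    tets = []
    for order in permutations(range(3)):
        vertex = [0, 0, 0]
        path = [tuple(vertex)]
        for axis in order:
            vertex[axis] = 1  # step along one axis at a time
            path.append(tuple(vertex))
        tets.append(path)
    return tets


def tet_volume(tet):
    """Exact volume of a tetrahedron: |det(edge vectors from the first vertex)| / 6."""
    p0 = tet[0]
    u, v, w = ([q[i] - p0[i] for i in range(3)] for q in tet[1:])
    det = (u[0] * (v[1] * w[2] - v[2] * w[1])
           - u[1] * (v[0] * w[2] - v[2] * w[0])
           + u[2] * (v[0] * w[1] - v[1] * w[0]))
    return Fraction(abs(det), 6)
```

All six volumes come out as exactly 1/6 and sum to the volume of the cube.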

arXiv:2205.06402

doi 10.1137/21M1454079

Conservative Projection Between Finite Element and Particle Bases

Authors: Joseph V. Pusztay, Matthew G. Knepley, Mark F. Adams

Abstract: Particle-in-Cell (PIC) methods employ particle representations of unknown fields, but also employ continuum fields for other parts of the problem. Thus projection between particle and continuum bases is required. Moreover, we often need to enforce conservation constraints on this projection. We derive a mechanism for enforcement based on weak equality, and implement it in the PETSc libraries. Scalability is demonstrated to more than 1B particles.

Submitted 12 May, 2022; originally announced May 2022.
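The weak-equality idea can be seen in a one-dimensional toy version. The sketch below assumes piecewise-linear (P1) hat functions on a uniform mesh of [0, L] and weighted point particles; it is an illustration, not the PETSc implementation. Requiring $\int \phi_i f_h \, dx = \sum_p w_p \phi_i(x_p)$ for every basis function $\phi_i$ gives the linear system $M f = b$ with the P1 mass matrix $M$. Because the constant function and $x$ both lie in the P1 space, summing these equations (with weights 1 or the nodal coordinates $x_i$) shows that the projection reproduces the particles' total mass $\sum_p w_p$ and first moment $\sum_p w_p x_p$ exactly. `solve_tridiagonal`, `p1_project`, and `moment` are hypothetical names.

```python
def solve_tridiagonal(sub, diag, sup, rhs):
    """Thomas algorithm for a tridiagonal system; sub[0] and sup[-1] are ignored."""
    n = len(diag)
    c = [0.0] * n
    d = [0.0] * n
    c[0] = sup[0] / diag[0]
    d[0] = rhs[0] / diag[0]
    for i in range(1, n):
        m = diag[i] - sub[i] * c[i - 1]
        c[i] = sup[i] / m if i < n - 1 else 0.0
        d[i] = (rhs[i] - sub[i] * d[i - 1]) / m
    x = [0.0] * n
    x[-1] = d[-1]
    for i in range(n - 2, -1, -1):
        x[i] = d[i] - c[i] * x[i + 1]
    return x


def p1_project(particles, n_cells, length=1.0):
    """Weak-equality projection of (position, weight) particles onto P1 hats.

    Solves M f = b with M_ij = integral(phi_i * phi_j) and b_i = sum_p w_p phi_i(x_p).
    Returns the nodal values f and the mesh spacing h.
    """
    h = length / n_cells
    n = n_cells + 1
    sub = [h / 6.0] * n
    sup = [h / 6.0] * n
    diag = [2.0 * h / 3.0] * n
    diag[0] = diag[-1] = h / 3.0  # boundary hats have half support
    rhs = [0.0] * n
    for x, w in particles:
        j = min(int(x / h), n_cells - 1)  # cell containing the particle
        s = x / h - j                     # local coordinate in [0, 1]
        rhs[j] += w * (1.0 - s)
        rhs[j + 1] += w * s
    return solve_tridiagonal(sub, diag, sup, rhs), h


def moment(f, h, k):
    """Integral of x**k * f_h for k in {0, 1}; Simpson's rule is exact on each cell."""
    total = 0.0
    for j in range(len(f) - 1):
        x0, x1 = j * h, (j + 1) * h
        xm, fm = 0.5 * (x0 + x1), 0.5 * (f[j] + f[j + 1])
        total += h / 6.0 * (x0**k * f[j] + 4.0 * xm**k * fm + x1**k * f[j + 1])
    return total
```

With four particles of total weight 4.5 on five cells, `moment(f, h, 0)` returns 4.5 up to rounding, and `moment(f, h, 1)` matches the particles' first moment.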

arXiv:2205.03354

On the order of accuracy for finite difference approximations of partial differential equations using stencil composition

Authors: Abhishek Mishra, David Salac, Matthew G. Knepley

Abstract: Stencil composition uses the idea of function composition, wherein two stencils with arbitrary orders of derivative are composed to obtain a stencil with a derivative order equal to the sum of the orders of the composing stencils. In this paper, we show how stencil composition can be applied to form finite difference stencils in order to numerically solve partial differential equations (PDEs). We present various properties of stencil composition and investigate the relationship between the order of accuracy of the composed stencil and that of the composing stencils. We also compare the stability restrictions of composed stencils for higher-order PDEs with those of their compact versions, and present numerical experiments in which we verify the order of accuracy by convergence tests. To demonstrate an application to PDEs, a boundary value problem involving the two-dimensional biharmonic equation is numerically solved using stencil composition and the order of accuracy is verified by performing a convergence test. The method is then applied to the Cahn-Hilliard phase-field model. In addition to sample results in 2D and 3D for this benchmark problem, the scalability, spectral properties, and sparsity are explored.

Submitted 15 August, 2023; v1 submitted 6 May, 2022; originally announced May 2022.
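On a uniform grid, composing two stencils amounts to a discrete convolution of their coefficient lists, with the powers of the grid spacing multiplying. The sketch below assumes centered stencils whose factor $h^{-\text{order}}$ is kept separate from the coefficients; `compose` and `apply_stencil` are illustrative names, not taken from the paper. Composing the second-difference stencil $[1, -2, 1]$ with itself yields the standard five-point fourth-difference stencil $[1, -4, 6, -4, 1]$, and composing the central first difference $[-\tfrac12, 0, \tfrac12]$ with itself yields a wide second-difference stencil.

```python
from fractions import Fraction


def compose(s1, s2):
    """Compose two centered stencils by discrete convolution of their coefficients."""
    out = [0] * (len(s1) + len(s2) - 1)
    for i, a in enumerate(s1):
        for j, b in enumerate(s2):
            out[i + j] += a * b
    return out


def apply_stencil(stencil, f, x, h, order):
    """Apply a centered (odd-length) stencil to f at x, scaled by h**(-order)."""
    m = len(stencil) // 2
    return sum(c * f(x + (k - m) * h) for k, c in enumerate(stencil)) / h ** order


D1 = [Fraction(-1, 2), Fraction(0), Fraction(1, 2)]  # central first difference
D2 = [1, -2, 1]                                     # standard second difference
D4 = compose(D2, D2)        # [1, -4, 6, -4, 1]
D1D1 = compose(D1, D1)      # [1/4, 0, -1/2, 0, 1/4], a wide second difference
```

Applied to $x^4$ with $h = 1$, the composed fourth-difference stencil returns exactly $24 = 4!$: `apply_stencil(D4, lambda x: x**4, 0, 1, 4)`.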

arXiv:2201.02806

Parallel Metric-Based Mesh Adaptation in PETSc using ParMmg

Authors: Joseph G. Wallwork, Matthew G. Knepley, Nicolas Barral, Matthew D. Piggott

Abstract: This research note documents the integration of the MPI-parallel metric-based mesh adaptation toolkit ParMmg into the solver library PETSc. This coupling brings robust, scalable anisotropic mesh adaptation to a wide community of PETSc users, as well as users of downstream packages. We demonstrate the new functionality via the solution of Poisson problems in three dimensions, with both uniform and spatially-varying right-hand sides.

Submitted 27 July, 2022; v1 submitted 8 January, 2022; originally announced January 2022.

Comments: 5 pages, 2 figures. Appeared as a research note in the 30th International Meshing Roundtable

MSC Class: 35-04 ACM Class: G.4

arXiv:2104.10000

doi 10.1109/ipdps53621.2022.00020

Exascale Landau collision operator in the Cuda programming model applied to thermal quench plasmas

Authors: M. F. Adams, D. P. Brennan, M. G. Knepley, P. Wang

Abstract: Collisional processes are critical in the understanding of non-Maxwellian plasmas. The Landau form of the Fokker-Planck equation is the gold standard for modeling collisions in most plasmas; however, its O(N^2) work complexity inhibits its widespread use. We show that with advanced numerical methods and GPU hardware this cost can be effectively mitigated. This paper extends previous work on a conservative, high order accurate, finite element discretization with adaptive mesh refinement of the Landau operator, with extensions to GPU hardware and implementations in both the CUDA and Kokkos programming languages. This work focuses on the Landau kernels and on NVIDIA hardware; however, preliminary results on AMD and Fujitsu/ARM hardware, as well as end-to-end performance of a velocity space model of a plasma thermal quench, are also presented. Both the fully implicit Landau time integrator and the plasma thermal quench model are publicly available in PETSc (Portable, Extensible Toolkit for Scientific Computation).

Submitted 18 May, 2022; v1 submitted 7 April, 2021; originally announced April 2021.

Journal ref: IEEE International Parallel and Distributed Processing Symposium (IPDPS), 2022

arXiv:2103.12067

doi 10.1177/1094342020966835

Understanding performance variability in standard and pipelined parallel Krylov solvers

Authors: Hannah Morgan, Patrick Sanan, Matthew G. Knepley, Richard Tran Mills

Abstract: In this work, we collect data from runs of Krylov subspace methods and pipelined Krylov algorithms in an effort to understand and model the impact of machine noise and other sources of variability on performance. We find large variability of Krylov iterations between compute nodes for standard methods that is reduced in pipelined algorithms, directly supporting conjecture, as well as large variation between statistical distributions of runtimes across iterations. Based on these results, we improve upon a previously introduced nondeterministic performance model by allowing iterations to fluctuate over time. We present our data from runs of various Krylov algorithms across multiple platforms as well as our updated non-stationary model that provides good agreement with observations. We also suggest how it can be used as a predictive tool.

Submitted 21 March, 2021; originally announced March 2021.

Comments: 18 pages, 12 figures

Journal ref: IJHPCA, 35(1), 2020

arXiv:2012.11764

doi 10.1017/S0022377821000441

Implementation of higher-order velocity mapping between marker particles and grid in the particle-in-cell code XGC

Authors: Albert Mollén, M. F. Adams, M. G. Knepley, R. Hager, C. S. Chang

Abstract: The global total-$f$ gyrokinetic particle-in-cell code XGC, used to study transport in magnetic fusion plasmas, implements a continuum grid to perform the dissipative operations, such as plasma collisions. To transfer the distribution function between marker particles and a rectangular velocity-space grid, XGC employs a bilinear mapping. The conservation of particle density and momentum is accurate enough in this bilinear operation, but the error in the particle energy conservation can become undesirably large in special conditions. In the present work we update XGC to use a novel mapping technique, based on the calculation of a pseudo-inverse, to exactly preserve moments up to the order of the discretization space. We describe the details of the implementation and we demonstrate the reduced interpolation error for a neoclassical tokamak test case by using $1^{\mathrm{st}}$- and $2^{\mathrm{nd}}$-order elements with the pseudo-inverse method and comparing to the bilinear mapping.

Submitted 21 December, 2020; originally announced December 2020.

Comments: 21 pages, 7 figures

arXiv:2004.08729

doi 10.1137/20M1332748

Fully Parallel Mesh I/O using PETSc DMPlex with an Application to Waveform Modeling

Authors: Vaclav Hapla, Matthew G. Knepley, Michael Afanasiev, Christian Boehm, Martin van Driel, Lion Krischer, Andreas Fichtner

Abstract: Large-scale PDE simulations using high-order finite-element methods on unstructured meshes are an indispensable tool in science and engineering. The widely used open-source PETSc library offers an efficient representation of generic unstructured meshes within its DMPlex module. This paper details our recent implementation of parallel mesh reading and topological interpolation (computation of edges and faces from a cell-vertex mesh) into DMPlex. We apply these developments to seismic wave propagation scenarios on Mars as an example application. The principal motivation is to overcome single-node memory limits and reach mesh sizes which were impossible before. Moreover, we demonstrate that scalability of I/O and topological interpolation goes beyond 12,000 cores, and memory-imposed limits on mesh size vanish.

Submitted 15 September, 2020; v1 submitted 18 April, 2020; originally announced April 2020.

Comments: 23 pages, 11 figures

MSC Class: 65-04; 65Y05; 65M50; 05C90; 35L05

Journal ref: SIAM J. Sci. Comput. 43 (2021) C127-C153

arXiv:1912.08516

doi 10.1145/3445791

PCPATCH: software for the topological construction of multigrid relaxation methods

Authors: Patrick E. Farrell, Matthew G. Knepley, Lawrence Mitchell, Florian Wechsung

Abstract: Effective relaxation methods are necessary for good multigrid convergence. For many equations, standard Jacobi and Gauß-Seidel are inadequate, and more sophisticated space decompositions are required; examples include problems with semidefinite terms or saddle point structure. In this paper we present a unifying software abstraction, PCPATCH, for the topological construction of space decompositions for multigrid relaxation methods. Space decompositions are specified by collecting topological entities in a mesh (such as all vertices or faces) and applying a construction rule (such as taking all degrees of freedom in the cells around each entity). The software is implemented in PETSc and facilitates the elegant expression of a wide range of schemes merely by varying solver options at runtime. In turn, this allows for the very rapid development of fast solvers for difficult problems.

Submitted 5 July, 2021; v1 submitted 18 December, 2019; originally announced December 2019.

Comments: 22 pages, minor fixes in bibliography

Journal ref: ACM Transactions on Mathematical Software 47(3):25 (2021)
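The construction rule described in this abstract (collect topological entities, then gather the degrees of freedom in the cells around each one) can be sketched for the common vertex-star case. This is an illustrative reimplementation of the idea in plain Python, not PCPATCH's interface; the mesh is assumed to be given as hypothetical `cells` (the vertices of each cell) and `cell_dofs` (the DoFs of each cell) lists.

```python
from collections import defaultdict


def vertex_star_patches(cells, cell_dofs):
    """Build one patch per vertex: all DoFs of the cells incident to that vertex.

    cells[c] lists the vertices of cell c; cell_dofs[c] lists the DoFs of cell c.
    Returns a dict mapping each vertex to the sorted DoFs of its patch.
    """
    star = defaultdict(set)  # vertex -> set of incident cells
    for c, verts in enumerate(cells):
        for v in verts:
            star[v].add(c)
    return {
        v: sorted({d for c in incident for d in cell_dofs[c]})
        for v, incident in sorted(star.items())
    }


# 1D mesh of three P2 cells: vertex DoFs 0..3, cell-midpoint DoFs 4..6
cells = [(0, 1), (1, 2), (2, 3)]
cell_dofs = [[0, 1, 4], [1, 2, 5], [2, 3, 6]]
patches = vertex_star_patches(cells, cell_dofs)
```

Here the patch of interior vertex 1 collects the DoFs of both neighboring cells, `[0, 1, 2, 4, 5]`, while the boundary vertex 0 only sees its single cell. A relaxation method would then solve a small local problem on each such patch.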

arXiv:1809.00747

A high order hybridizable discontinuous Galerkin method for incompressible miscible displacement in heterogeneous media

Authors: Maurice S. Fabien, Matthew G. Knepley, Beatrice M. Riviere

Abstract: We present a new method for approximating solutions to the incompressible miscible displacement problem in porous media. At the discrete level, the coupled nonlinear system is split into two linear systems that are solved sequentially. The method is based on a hybridizable discontinuous Galerkin method for the Darcy flow, which produces a mass-conservative flux approximation, and a hybridizable discontinuous Galerkin method for the transport equation. The resulting method is high order accurate. Due to the implicit treatment of the system of partial differential equations, we observe computationally that no slope limiters are needed. Numerical experiments are provided that show that the method converges optimally and is robust for highly heterogeneous porous media in 2D and 3D.

Submitted 16 September, 2018; v1 submitted 3 September, 2018; originally announced September 2018.

arXiv:1808.08328

doi 10.1016/j.jcp.2019.02.020

Composable block solvers for the four-field double porosity/permeability model

Authors: M. S. Joshaghani, J. Chang, K. B. Nakshatrala, M. G. Knepley

Abstract: The objective of this paper is twofold. First, we propose two composable block solver methodologies to solve the discrete systems that arise from finite element discretizations of the double porosity/permeability (DPP) model. The DPP model, which is a four-field mathematical model, describes the flow of a single-phase incompressible fluid in a porous medium with two distinct pore-networks and with a possibility of mass transfer between them. Using the composable solvers feature available in PETSc and the finite element libraries available under the Firedrake Project, we illustrate two different ways by which one can effectively precondition these large systems of equations. Second, we employ the recently developed performance model called the Time-Accuracy-Size (TAS) spectrum to demonstrate that the proposed composable block solvers are scalable in both the parallel and algorithmic sense. Moreover, we utilize this spectrum analysis to compare the performance of three different finite element discretizations (classical mixed formulation with H(div) elements, stabilized continuous Galerkin mixed formulation, and stabilized discontinuous Galerkin mixed formulation) for the DPP model. Our performance spectrum analysis demonstrates that the composable block solvers are fine choices for any of these three finite element discretizations. Sample computer codes are provided to illustrate how one can easily implement the proposed block solver methodologies through PETSc command line options.

Submitted 24 August, 2018; originally announced August 2018.

arXiv:1802.07832

Comparative study of finite element methods using the Time-Accuracy-Size (TAS) spectrum analysis

Authors: Justin Chang, Maurice S. Fabien, Matthew G. Knepley, Richard T. Mills

Abstract: We present a performance analysis appropriate for comparing algorithms using different numerical discretizations. By taking into account the total time-to-solution, numerical accuracy with respect to an error norm, and the computation rate, a cost-benefit analysis can be performed to determine which algorithm and discretization are particularly suited for an application. This work extends the performance spectrum model in Chang et al. 2017 for interpretation of hardware and algorithmic tradeoffs in numerical PDE simulation. As a proof-of-concept, popular finite element software packages are used to illustrate this analysis for Poisson's equation.

Submitted 21 February, 2018; originally announced February 2018.

MSC Class: 65Y05; 65Y20; 68N99

arXiv:1802.06013

A hybridizable discontinuous Galerkin method for two-phase flow in heterogeneous porous media

Authors: Maurice S. Fabien, Matthew G. Knepley, Beatrice M. Riviere

Abstract: We present a new method for simulating incompressible immiscible two-phase flow in porous media. The semi-implicit method decouples the wetting phase pressure and saturation equations. The equations are discretized using a hybridizable discontinuous Galerkin (HDG) method. The proposed method is of high order, conserves global/local mass balance, and the number of globally coupled degrees of freedom is significantly reduced compared to standard interior penalty discontinuous Galerkin methods. Several numerical examples illustrate the accuracy and robustness of the method. These examples include verification of convergence rates by manufactured solutions, common 1D benchmarks and realistic discontinuous permeability fields.

Submitted 16 February, 2018; originally announced February 2018.

Comments: 20 pages, 39 figures, 2 tables

arXiv:1712.08286

An Algorithm for Computing Lipschitz Inner Functions in Kolmogorov's Superposition Theorem

Authors: Jonas Actor, Matthew G. Knepley

Abstract: Kolmogorov famously proved that multivariate continuous functions can be represented as a superposition of a small number of univariate continuous functions, $$ f(x_1,\dots,x_n) = \sum_{q=0}^{2n+1} \chi^q \left( \sum_{p=1}^n \psi^{pq}(x_p) \right).$$ Fridman posed the best smoothness bound for the functions $\psi^{pq}$, that such functions can be constructed to be Lipschitz continuous with constant 1. Previous algorithms have produced inner functions that are only Hölder continuous, such as those proposed by Köppen and by Braun and Griebel. This is problematic, as pointed out by Griebel, in that non-smooth functions have very high storage/evaluation complexity, and this makes Kolmogorov's representation (KR) impractical using the standard definition of the inner functions. To date, no one has presented a method to compute a Lipschitz continuous inner function. In this paper, we revisit Kolmogorov's theorem along with Fridman's result. We examine a simple Lipschitz function which appears to satisfy the necessary criteria for Kolmogorov's representation, but fails in the limit. We then present a full solution to the problem, including an algorithm that computes such a Lipschitz function.

Submitted 21 December, 2017; originally announced December 2017.

Comments: 18 pages, 5 figures

arXiv:1708.06028

Efficient Evaluation of Ellipsoidal Harmonics for Potential Modeling

Authors: Thomas S. Klotz, Jaydeep P. Bardhan, Matthew G. Knepley

Abstract: Ellipsoidal harmonics are a useful generalization of spherical harmonics but present additional numerical challenges. One such challenge is in computing ellipsoidal normalization constants which require approximating a singular integral. In this paper, we present results for approximating normalization constants using a well-known decomposition and applying tanh-sinh quadrature to the resulting integrals. Tanh-sinh has been shown to be an effective quadrature scheme for a certain subset of singular integrands. To support our numerical results, we prove that the decomposed integrands lie in the space of functions where tanh-sinh is optimal and compare our results to a variety of similar change-of-variable quadratures.

Submitted 1 September, 2017; v1 submitted 20 August, 2017; originally announced August 2017.
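Tanh-sinh (double-exponential) quadrature changes variables so that the transformed integrand decays double-exponentially; the plain trapezoidal rule then converges very quickly, even for integrable endpoint singularities. The sketch below is a generic tanh-sinh rule on (0, 1), not the paper's code for ellipsoidal normalization constants. The step `h` and truncation `t_max` are assumed values chosen for double precision, and the abscissae are computed in a form that stays accurate near the endpoint x = 0, where the test singularities sit.

```python
import math


def tanh_sinh(f, h=1.0 / 16, t_max=4.0):
    """Approximate the integral of f over (0, 1) with the tanh-sinh rule.

    Substitutes x = (1 + tanh(u)) / 2 with u = (pi/2) sinh(t), then applies the
    trapezoidal rule in t on [-t_max, t_max]. The endpoints themselves are never sampled.
    """
    n = int(round(t_max / h))
    total = 0.0
    for k in range(-n, n + 1):
        t = k * h
        u = 0.5 * math.pi * math.sinh(t)
        # (1 + tanh u) / 2 == 1 / (1 + e^{-2u}); this form is accurate near x = 0
        x = 1.0 / (1.0 + math.exp(-2.0 * u))
        if x <= 0.0 or x >= 1.0:
            continue  # abscissa rounded onto an endpoint; its weight is negligible
        # dx/dt = (pi/4) cosh(t) sech^2(u), with sech^2 written to avoid overflow
        e = math.exp(-2.0 * abs(u))
        sech2 = 4.0 * e / (1.0 + e) ** 2
        w = 0.25 * math.pi * math.cosh(t) * sech2
        total += w * f(x)
    return h * total
```

For example, `tanh_sinh(lambda x: x ** -0.5)` approximates $\int_0^1 x^{-1/2}\,dx = 2$ and `tanh_sinh(math.log)` approximates $\int_0^1 \ln x \,dx = -1$, despite the singularity at $x = 0$.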

arXiv:1705.09907

Manycore parallel computing for a hybridizable discontinuous Galerkin nested multigrid method

Authors: M. S. Fabien, M. G. Knepley, R. T. Mills, B. M. Riviere

Abstract: We present a parallel computing strategy for a hybridizable discontinuous Galerkin (HDG) nested geometric multigrid (GMG) solver. Parallel GMG solvers require a combination of coarse-grain and fine-grain parallelism to improve time to solution performance. In this work we focus on fine-grain parallelism. We use Intel's second generation Xeon Phi (Knights Landing) many-core processor. The GMG method achieves ideal convergence rates of $0.2$ or less, for high polynomial orders. A matrix free (assembly free) technique is exploited to save considerable memory usage and increase arithmetic intensity. HDG enables static condensation, and due to the discontinuous nature of the discretization, we developed a matrix vector multiply routine that does not require any costly synchronizations or barriers. Our algorithm is able to attain 80% of peak bandwidth performance for higher order polynomials. This is possible due to the data locality inherent in the HDG method. Very high performance is realized for high order schemes, due to good arithmetic intensity, which declines as the order is reduced.

Submitted 17 July, 2019; v1 submitted 28 May, 2017; originally announced May 2017.

Comments: 23 pages, 10 figures

arXiv:1705.03625

A performance spectrum for parallel computational frameworks that solve PDEs

Authors: J. Chang, K. B. Nakshatrala, M. G. Knepley, L. Johnsson

Abstract: Important computational physics problems are often large-scale in nature, and it is highly desirable to have robust and high performing computational frameworks that can quickly address these problems. However, it is no trivial task to determine whether a computational framework is performing efficiently or is scalable. The aim of this paper is to present various strategies for better understanding the performance of any parallel computational framework for solving PDEs. Important performance issues that negatively impact time-to-solution are discussed, and we propose a performance spectrum analysis that can enhance one's understanding of these critical performance issues. As proof of concept, we examine commonly used finite element simulation packages and software and apply the performance spectrum to quickly analyze the performance and scalability across various hardware platforms, software implementations, and numerical discretizations. It is shown that the proposed performance spectrum is a versatile performance model that is not only extendable to more complex PDEs such as hydrostatic ice sheet flow equations, but also useful for understanding hardware performance in a massively parallel computing environment. Potential applications and future extensions of this work are also discussed.

Submitted 14 September, 2017; v1 submitted 10 May, 2017; originally announced May 2017.

arXiv:1702.08880 [pdf, other]

doi 10.1137/17M1118828

Landau Collision Integral Solver with Adaptive Mesh Refinement on Emerging Architectures

Authors: M. F. Adams, E. Hirvijoki, M. G. Knepley, J. Brown, T. Isaac, R. Mills

Abstract: The Landau collision integral is an accurate model for the small-angle dominated Coulomb collisions in fusion plasmas. We investigate a high order accurate, fully conservative, finite element discretization of the nonlinear multi-species Landau integral with adaptive mesh refinement using the PETSc library (www.mcs.anl.gov/petsc). We develop algorithms and techniques to efficiently utilize emerging architectures with an approach that minimizes memory usage and movement and is suitable for vector processing. The Landau collision integral is vectorized with Intel AVX-512 intrinsics and the solver sustains as much as 22% of the theoretical peak flop rate of the Second Generation Intel Xeon Phi, Knights Landing, processor.

Submitted 28 February, 2017; v1 submitted 27 February, 2017; originally announced February 2017.

Journal ref: SIAM Journal on Scientific Computing, 39 (6), 2017

arXiv:1612.02208 [pdf, ps, other]

Scalable smoothing strategies for a geometric multigrid method for the immersed boundary equations

Authors: Amneet Pal Singh Bhalla, Matthew G. Knepley, Mark F. Adams, Robert D. Guy, Boyce E. Griffith

Abstract: The immersed boundary (IB) method is a widely used approach to simulating fluid-structure interaction (FSI). Although explicit versions of the IB method can suffer from severe time step size restrictions, these methods remain popular because of their simplicity and generality. In prior work (Guy et al., Adv Comput Math, 2015), some of us developed a geometric multigrid preconditioner for a stable semi-implicit IB method under Stokes flow conditions; however, this solver methodology used a Vanka-type smoother that presented limited opportunities for parallelization. This work extends this Stokes-IB solver methodology by developing smoothing techniques that are suitable for parallel implementation. Specifically, we demonstrate that an additive version of the Vanka smoother can yield an effective multigrid preconditioner for the Stokes-IB equations, and we introduce an efficient Schur complement-based smoother that is also shown to be effective for the Stokes-IB equations. We investigate the performance of these solvers for a broad range of material stiffnesses, both for Stokes flows and flows at nonzero Reynolds numbers, and for thick and thin structural models. We show here that linear solver performance degrades with increasing Reynolds number and material stiffness, especially for thin interface cases. Nonetheless, the proposed approaches promise to yield effective solution algorithms, especially at lower Reynolds numbers and at modest-to-high elastic stiffnesses.

Submitted 7 December, 2016; originally announced December 2016.
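What makes a patch smoother "additive" is that every local correction is computed from the same residual, so all patch solves are independent and can run concurrently; a multiplicative smoother updates the residual after each patch. The Python sketch below shows only that additive structure on a small 1D Laplacian, using overlapping patches of two neighbouring unknowns and a damping factor omega = 0.5. The function names, patch choice, and damping value are assumptions for illustration; the sketch does not reproduce the cell-based saddle-point patches of a Vanka smoother for the Stokes-IB equations.

```python
from typing import List

Matrix = List[List[float]]
Vector = List[float]


def solve_dense(a: Matrix, b: Vector) -> Vector:
    """Solve a small dense system by Gaussian elimination with partial pivoting."""
    n = len(b)
    m = [row[:] + [b[i]] for i, row in enumerate(a)]  # augmented copy
    for k in range(n):
        p = max(range(k, n), key=lambda i: abs(m[i][k]))
        m[k], m[p] = m[p], m[k]
        for i in range(k + 1, n):
            f = m[i][k] / m[k][k]
            for j in range(k, n + 1):
                m[i][j] -= f * m[k][j]
    x = [0.0] * n
    for i in reversed(range(n)):
        s = sum(m[i][j] * x[j] for j in range(i + 1, n))
        x[i] = (m[i][n] - s) / m[i][i]
    return x


def additive_patch_smooth(a: Matrix, b: Vector, x: Vector,
                          patches: List[List[int]], omega: float) -> Vector:
    """One damped additive sweep: every patch correction uses the same residual."""
    n = len(b)
    r = [b[i] - sum(a[i][j] * x[j] for j in range(n)) for i in range(n)]
    dx = [0.0] * n
    for patch in patches:  # independent local solves, so they could run in parallel
        a_loc = [[a[i][j] for j in patch] for i in patch]
        e_loc = solve_dense(a_loc, [r[i] for i in patch])
        for k, i in enumerate(patch):
            dx[i] += e_loc[k]  # overlapping corrections are summed
    return [x[i] + omega * dx[i] for i in range(n)]
```

Because interior unknowns belong to two patches, their corrections are added twice, which is why the sketch damps the summed update.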

arXiv:1611.02150 [pdf, other]

doi 10.1063/1.4977037

Predicting Solvation Free Energies and Thermodynamics in Polar Solvents and Mixtures Using a Solvation-Layer Interface Condition

Authors: Amirhossein Molavi Tabrizi, Spencer Goossens, Ali Mehdizadeh Rahimi, Matthew G. Knepley, Jaydeep P. Bardhan

Abstract: We demonstrate that with two small modifications, the popular dielectric continuum model is capable of predicting, with high accuracy, ion solvation thermodynamics in numerous polar solvents, and ion solvation free energies in water--co-solvent mixtures. The first modification involves perturbing the macroscopic dielectric-flux interface condition at the solute--solvent interface with a nonlinear function of the local electric field, giving what we have called a solvation-layer interface condition (SLIC). The second modification is a simple treatment of the microscopic interface potential (static potential). We show that the resulting model exhibits high accuracy without the need for fitting solute atom radii in a state-dependent fashion. Compared to experimental results in nine water--co-solvent mixtures, SLIC predicts transfer free energies to within 2.5 kJ/mol. The co-solvents include both protic and aprotic species, as well as biologically relevant denaturants such as urea and dimethylformamide. Furthermore, our results indicate that the interface potential is essential to reproduce entropies and heat capacities. The present work, together with previous studies of SLIC illustrating its accuracy for biomolecules in water, identifies it as a promising dielectric continuum model for accurate predictions of molecular solvation in a wide range of conditions.

Submitted 14 November, 2016; v1 submitted 3 November, 2016; originally announced November 2016.

Comments: 30 pages, 6 figures

arXiv:1610.09874 [pdf, other]

Anisotropic mesh adaptation in Firedrake with PETSc DMPlex

Authors: Nicolas Barral, Matthew G. Knepley, Michael Lange, Matthew D. Piggott, Gerard J. Gorman

Abstract: Despite decades of research in this area, mesh adaptation capabilities are still rarely found in numerical simulation software. We postulate that the primary reason for this is lack of usability. Integrating mesh adaptation into existing software is difficult as non-trivial operators, such as error metrics and interpolation operators, are required, and integrating available adaptive remeshers is not straightforward. Our approach presented here is to first integrate Pragmatic, an anisotropic mesh adaptation library, into DMPlex, a PETSc object that manages unstructured meshes and their interactions with PETSc's solvers and I/O routines. As PETSc is already widely used, this will make anisotropic mesh adaptation available to a much larger community. As a demonstration of this we describe the integration of anisotropic mesh adaptation into Firedrake, an automated Finite Element based system for the portable solution of partial differential equations which already uses PETSc solvers and I/O via DMPlex. We present a proof of concept of this integration with a three-dimensional advection test case.

Submitted 31 October, 2016; originally announced October 2016.

Comments: 5 pages, 2 figures, Proceedings of the 25th International Meshing Roundtable, ed. Steve Owen and Hang Si, 2016
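Metric-based anisotropic remeshers generally work with a symmetric positive-definite metric tensor M and aim for edges whose length measured in that metric, sqrt(e^T M e), is close to 1. The Python sketch below computes this metric length for a constant 2D metric. The helper names and the diagonal metric built from target sizes hx and hy are assumptions for illustration; this is not the Pragmatic, DMPlex, or Firedrake API.

```python
import math
from typing import Tuple

Point = Tuple[float, float]
Metric = Tuple[Tuple[float, float], Tuple[float, float]]


def anisotropic_metric(hx: float, hy: float) -> Metric:
    """Diagonal metric requesting target edge sizes hx along x and hy along y."""
    return ((1.0 / hx ** 2, 0.0), (0.0, 1.0 / hy ** 2))


def metric_length(p: Point, q: Point, metric: Metric) -> float:
    """Length of edge pq measured in a constant SPD metric M: sqrt(e^T M e)."""
    ex, ey = q[0] - p[0], q[1] - p[1]
    (m11, m12), (m21, m22) = metric
    return math.sqrt(ex * (m11 * ex + m12 * ey) + ey * (m21 * ex + m22 * ey))
```

With hx = 0.1 and hy = 0.01, a horizontal edge of length 0.1 already has unit metric length, while a vertical edge of the same Euclidean length measures 10 and would be a candidate for splitting.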

arXiv:1607.04257 [pdf, ps, other]

doi 10.1080/00268976.2016.1198503

Generalizing The Mean Spherical Approximation as a Multiscale, Nonlinear Boundary Condition at the Solute--Solvent Interface

Authors: Amirhossein Molavi Tabrizi, Matthew G. Knepley, Jaydeep P. Bardhan

Abstract: In this paper we extend the familiar continuum electrostatic model with a perturbation to the usual macroscopic boundary condition. We base the perturbation on the mean spherical approximation (MSA) to derive a multiscale hydration-shell boundary condition (HSBC). We show that the HSBC/MSA model reproduces MSA predictions for Born ions in a variety of polar solvents, including both protic and aprotic solvents. Importantly, the HSBC/MSA model accurately predicts not only solvation free energies but also solvation entropies, which standard continuum electrostatic models fail to predict. The HSBC/MSA model depends only on the normal electric field at the dielectric boundary, similar to our recent development of an HSBC model for charge-sign hydration asymmetry, and the reformulation of the MSA as a boundary condition enables its straightforward application to complex molecules such as proteins.

Submitted 14 July, 2016; originally announced July 2016.

Comments: 14 pages, 2 figures

Journal ref: Molecular Physics, 2016
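For reference, the standard continuum baseline for a Born ion, a point charge q at the centre of a sphere of radius a (vacuum inside) in a solvent of relative permittivity eps_r, is ΔG = -(q^2 / (8 π ε0 a)) (1 - 1/eps_r). The Python sketch below evaluates that textbook formula in kJ/mol using SI constants. It is not the HSBC/MSA model, which modifies the boundary condition, and the function and constant names are assumptions.

```python
import math

# SI values of the elementary charge, vacuum permittivity and Avogadro constant.
ELEMENTARY_CHARGE = 1.602176634e-19    # C
VACUUM_PERMITTIVITY = 8.8541878128e-12  # F/m
AVOGADRO = 6.02214076e23               # 1/mol


def born_solvation_energy(charge: float, radius_angstrom: float, eps_r: float) -> float:
    """Born solvation free energy in kJ/mol.

    charge: ion charge in units of the elementary charge.
    radius_angstrom: Born radius in angstroms.
    eps_r: relative permittivity of the solvent (ion interior taken as vacuum).
    """
    q = charge * ELEMENTARY_CHARGE
    a = radius_angstrom * 1e-10  # angstrom -> metre
    energy_per_ion = -(q ** 2) / (8.0 * math.pi * VACUUM_PERMITTIVITY * a) * (1.0 - 1.0 / eps_r)
    return energy_per_ion * AVOGADRO / 1000.0  # J per ion -> kJ/mol
```

Because the charge enters only as q^2, this baseline gives identical energies for +q and -q.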

arXiv:1607.04254 [pdf, other]

doi 10.1137/130936725

Composing Scalable Nonlinear Algebraic Solvers

Authors: Peter R. Brune, Matthew G. Knepley, Barry F. Smith, Xuemin Tu

Abstract: Most efficient linear solvers use composable algorithmic components, with the most common model being the combination of a Krylov accelerator and one or more preconditioners. A similar set of concepts may be used for nonlinear algebraic systems, where nonlinear composition of different nonlinear solvers may significantly improve the time to solution. We describe the basic concepts of nonlinear composition and preconditioning and present a number of solvers applicable to nonlinear partial differential equations. We have developed a software framework in order to easily explore the possible combinations of solvers. We show that the performance gains from using composed solvers can be substantial compared with gains from standard Newton-Krylov methods.

Submitted 14 July, 2016; originally announced July 2016.

Comments: 29 pages, 14 figures, 13 tables

MSC Class: 65F08; 65Y05; 65Y20; 68W10

Journal ref: SIAM Review 57(4), 535-565, 2015
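Multiplicative composition applies one nonlinear solver and hands its iterate to the next within each outer iteration. The Python sketch below composes a fixed-point sweep with a Newton step on the scalar problem F(x) = x - cos(x). The problem and all names are illustrative assumptions, and the sketch does not reproduce the paper's framework; in PETSc, composition of this kind is exposed through the SNES composite solver type.

```python
import math
from typing import Callable, Sequence, Tuple


def residual(x: float) -> float:
    """F(x) = x - cos(x); its root is the fixed point of cos."""
    return x - math.cos(x)


def fixed_point_step(x: float) -> float:
    """One nonlinear Richardson (fixed-point) sweep for x = cos(x)."""
    return math.cos(x)


def newton_step(x: float) -> float:
    """One Newton step for F, using F'(x) = 1 + sin(x)."""
    return x - residual(x) / (1.0 + math.sin(x))


def multiplicative_composite(x0: float, inner: Sequence[Callable[[float], float]],
                             tol: float = 1e-12, max_it: int = 50) -> Tuple[float, int]:
    """Apply the inner solvers in sequence each outer iteration until |F(x)| < tol."""
    x = x0
    for it in range(1, max_it + 1):
        for step in inner:
            x = step(x)
        if abs(residual(x)) < tol:
            return x, it
    return x, max_it
```

Swapping the order of the inner steps, or replacing one of them, changes the composite solver without touching the residual code.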

arXiv:1607.04245 [pdf, other]

Finite Element Integration with Quadrature on the GPU

Authors: Matthew G. Knepley, Karl Rupp, Andy R. Terrel

Abstract: We present a novel, quadrature-based finite element integration method for low-order elements on GPUs, using a pattern we call "thread transposition" to avoid reductions while vectorizing aggressively. On the NVIDIA GTX580, which has a nominal single precision peak flop rate of 1.5 TF/s and a memory bandwidth of 192 GB/s, we achieve close to 300 GF/s for element integration on first-order discretization of the Laplacian operator with variable coefficients in two dimensions, and over 400 GF/s in three dimensions. From our performance model we find that this corresponds to 90% of our measured achievable bandwidth peak of 310 GF/s. Further experimental results also match the predicted performance when used with double precision (120 GF/s in two dimensions, 150 GF/s in three dimensions). Results obtained for the linear elasticity equations (220 GF/s and 70 GF/s in two dimensions, 180 GF/s and 60 GF/s in three dimensions) also demonstrate the applicability of our method to vector-valued partial differential equations.

Submitted 14 July, 2016; originally announced July 2016.

Comments: 14 pages, 6 figures

ACM Class: G.4; G.1.8
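For a first-order discretization of the Laplacian with variable coefficient κ, the element matrix is K_ij = Σ_q w_q κ(x_q) ∇φ_i(x_q)·∇φ_j(x_q) |det J|. The Python sketch below evaluates it for one P1 triangle with a three-point quadrature rule that is exact for quadratics. It deliberately ignores the GPU side (batching elements and the thread transposition pattern), and the function name and coefficient callback are assumptions.

```python
from typing import Callable, List, Tuple


def p1_laplacian_element(verts: List[Tuple[float, float]],
                         kappa: Callable[[float, float], float]) -> List[List[float]]:
    """Element stiffness K_ij = sum_q w_q kappa(x_q) grad(phi_i).grad(phi_j) |det J|."""
    (x0, y0), (x1, y1), (x2, y2) = verts
    # Jacobian of the affine map from the reference triangle (0,0), (1,0), (0,1).
    j11, j12 = x1 - x0, x2 - x0
    j21, j22 = y1 - y0, y2 - y0
    det = j11 * j22 - j12 * j21
    # Reference gradients of the P1 basis, mapped by J^{-T} (constant per element).
    ref_grads = [(-1.0, -1.0), (1.0, 0.0), (0.0, 1.0)]
    grads = [((j22 * gx - j21 * gy) / det, (-j12 * gx + j11 * gy) / det)
             for gx, gy in ref_grads]
    # Three-point rule on the reference triangle (weights sum to its area, 1/2).
    quad = [((1 / 6, 1 / 6), 1 / 6), ((2 / 3, 1 / 6), 1 / 6), ((1 / 6, 2 / 3), 1 / 6)]
    k = [[0.0] * 3 for _ in range(3)]
    for (xi, eta), w in quad:
        xq = x0 + j11 * xi + j12 * eta  # physical quadrature point
        yq = y0 + j21 * xi + j22 * eta
        c = w * kappa(xq, yq) * abs(det)
        for i in range(3):
            for j in range(3):
                k[i][j] += c * (grads[i][0] * grads[j][0] + grads[i][1] * grads[j][1])
    return k
```

On the reference triangle with κ = 1 this returns, up to rounding, [[1, -0.5, -0.5], [-0.5, 0.5, 0], [-0.5, 0, 0.5]]; every row sums to zero for any κ because the basis gradients sum to zero.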

arXiv:1604.07163 [pdf, other]

Extreme-scale Multigrid Components within PETSc

Authors: Dave A. May, Patrick Sanan, Karl Rupp, Matthew G. Knepley, Barry F. Smith

Abstract: Elliptic partial differential equations (PDEs) frequently arise in continuum descriptions of physical processes relevant to science and engineering. Multilevel preconditioners represent a family of scalable techniques for solving discrete PDEs of this type and thus are the method of choice for high-resolution simulations. The scalability and time-to-solution of massively parallel multilevel preconditioners can be adversely affected by using a coarse-level solver with sub-optimal algorithmic complexity. To maintain scalability, agglomeration techniques applied to the coarse level have been shown to be necessary. In this work, we present a new software component introduced within the Portable Extensible Toolkit for Scientific computation (PETSc) which permits agglomeration. We provide an overview of the design and implementation of this functionality, together with several use cases highlighting the benefits of agglomeration. Lastly, we demonstrate, via numerical experiments employing geometric multigrid with structured meshes, the flexibility and performance gains possible using our MPI-rank agglomeration implementation.

Submitted 25 April, 2016; originally announced April 2016.
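Agglomeration gathers the coarse-level problem onto fewer MPI ranks so that the coarse solve is not spread thinly over many processes. The Python sketch below does only the bookkeeping for a contiguous grouping, in which ranks r sharing the same r // reduction_factor are merged. The grouping rule and function names are assumptions for illustration; PETSc provides this kind of reduction through its PCTELESCOPE preconditioner, whose rank layout need not match this sketch.

```python
from typing import List


def parent_rank(rank: int, reduction_factor: int) -> int:
    """Rank on the reduced communicator that receives this rank's coarse rows."""
    return rank // reduction_factor


def agglomerate_sizes(local_sizes: List[int], reduction_factor: int) -> List[int]:
    """Number of coarse-level rows each reduced rank owns after agglomeration."""
    if reduction_factor < 1:
        raise ValueError("reduction_factor must be at least 1")
    n_reduced = (len(local_sizes) + reduction_factor - 1) // reduction_factor
    sizes = [0] * n_reduced
    for rank, n_rows in enumerate(local_sizes):
        sizes[parent_rank(rank, reduction_factor)] += n_rows
    return sizes
```

With four ranks of 10 coarse rows each and a reduction factor of 2, two ranks end up owning 20 rows each, so each participating rank has more local work relative to its communication.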

arXiv:1602.04873 [pdf, other]

A Stochastic Performance Model for Pipelined Krylov Methods

Authors: Hannah Morgan, Matthew G. Knepley, Patrick Sanan, L. Ridgway Scott

Abstract: Pipelined Krylov methods seek to ameliorate the latency due to inner products necessary for projection by overlapping it with the computation associated with sparse matrix-vector multiplication. We clarify a folk theorem that this can only result in a speedup of 2× over the naive implementation. Examining many repeated runs, we show that stochastic noise also contributes to the latency, and we model this using an analytical probability distribution. Our analysis shows that speedups greater than 2× are possible with these algorithms.

Submitted 15 February, 2016; originally announced February 2016.
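The 2× folk theorem follows from a deterministic model: if an iteration spends t_matvec on the sparse matrix-vector product and t_reduction waiting on the global inner product, the blocking method takes t_matvec + t_reduction, perfect overlap takes max(t_matvec, t_reduction), and the ratio cannot exceed 2 because a + b ≤ 2 max(a, b). The Python sketch below encodes only this deterministic bound, with assumed names; it leaves out the stochastic noise that the paper shows can push speedups beyond 2×.

```python
def pipelined_speedup(t_matvec: float, t_reduction: float) -> float:
    """Ideal per-iteration speedup when the reduction is fully overlapped.

    Blocking iteration time:  t_matvec + t_reduction
    Pipelined iteration time: max(t_matvec, t_reduction)
    """
    longest = max(t_matvec, t_reduction)
    if longest <= 0.0:
        raise ValueError("at least one phase must take positive time")
    return (t_matvec + t_reduction) / longest
```

The bound is reached only when the two phases take equal time; if either dominates, the overlap hides less and the ideal speedup falls toward 1.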

arXiv:1512.08408 [pdf, ps, other]

Multiscale models and approximation algorithms for protein electrostatics

Authors: Jaydeep P. Bardhan, Matthew G. Knepley

Abstract: Electrostatic forces play many important roles in molecular biology, but are hard to model due to the complicated interactions between biomolecules and the surrounding solvent, a fluid composed of water and dissolved ions. Continuum models have been surprisingly successful for simple biological questions, but fail for important problems such as understanding the effects of protein mutations. In this paper we highlight the advantages of boundary-integral methods for these problems, and our use of boundary integrals to design and test more accurate theories. Examples include a multiscale model based on nonlocal continuum theory, and a nonlinear boundary condition that captures atomic-scale effects at biomolecular surfaces.

Submitted 28 December, 2015; originally announced December 2015.

Comments: 12 pages, 6 figures

arXiv:1512.08406 [pdf, other]

Work/Precision Tradeoffs in Continuum Models of Biomolecular Electrostatics

Authors: Matthew G. Knepley, Jaydeep P. Bardhan

Abstract: The structure and function of biological molecules are strongly influenced by the water and dissolved ions that surround them. This aqueous solution (solvent) exerts significant electrostatic forces in response to the biomolecule's ubiquitous atomic charges and polar chemical groups. In this work, we investigate a simple approach to numerical calculation of the continuum electrostatic model using boundary-integral equation (BIE) methods and boundary-element methods (BEM). Traditional BEM discretizes the protein--solvent boundary into a set of boundary elements, or panels, and the approximate solution is defined as a weighted combination of basis functions with compact support. Computing the resulting BEM matrix then requires integrating singular or near-singular functions, which can be slow and challenging. Here we investigate the accuracy and convergence of a simpler representation, namely modeling the unknown surface charge distribution as a set of discrete point charges on the surface. We find that at low resolution, point-based BEM is more accurate than panel-based methods because the protein surface is sampled directly, and that it can be of significant value for numerous important calculations that require only moderate accuracy, such as the preliminary stages of rational drug design and protein engineering.

Submitted 28 December, 2015; originally announced December 2015.

Comments: 10 pages, 8 figures, in Proceedings of ASME 2015 International Mechanical Engineering Congress & Exposition, 2015

arXiv:1508.02470 [pdf, other]

Support for Non-conformal Meshes in PETSc's DMPlex Interface

Authors: Tobin Isaac, Matthew G. Knepley

Abstract: PETSc's DMPlex interface for unstructured meshes has been extended to support non-conformal meshes. The topological construct that DMPlex implements, the CW-complex, is by definition conformal, so representing non-conformal meshes in a way that hides complexity requires careful attention to the interface between DMPlex and numerical methods such as the finite element method. Our approach, which combines a tree structure for subset-superset relationships and a "reference tree" describing the types of non-conformal interfaces, allows finite element code written for conformal meshes to extend automatically: in particular, all "hanging-node" constraint calculations are handled behind the scenes. We give example code demonstrating the use of this extension, and use it to convert forests of quadtrees and forests of octrees from the p4est library to DMPlex meshes.

Submitted 10 August, 2015; originally announced August 2015.

Comments: 16 pages, 13 figures, 5 code examples
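For first-order elements, the typical hanging-node constraint on a 2:1 refined edge sets the value at the hanging vertex to the average of the two endpoints of the coarse edge, because the coarse-side field is linear along that edge. The Python sketch below applies such constraints to a dictionary of vertex values. The data layout and names are assumptions; it handles only this midpoint case without chained constraints, and it does not reflect how DMPlex's reference tree computes constraints.

```python
from typing import Dict, Tuple


def apply_hanging_constraints(free_values: Dict[int, float],
                              hanging: Dict[int, Tuple[int, int]]) -> Dict[int, float]:
    """Return vertex values with each hanging vertex set to the mean of its coarse-edge endpoints.

    free_values: vertex -> value for unconstrained vertices.
    hanging: hanging vertex -> (endpoint_a, endpoint_b) of the coarse edge it splits.
    Assumes the endpoints are themselves unconstrained.
    """
    values = dict(free_values)
    for vertex, (a, b) in hanging.items():
        values[vertex] = 0.5 * (free_values[a] + free_values[b])
    return values
```

For example, sampling f(x) = 2x + 1 at the coarse endpoints x = 0 and x = 1 gives 1 and 3, and the constrained value at the hanging midpoint is 2, which equals f(0.5), so continuity is preserved for P1 fields.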

arXiv:1506.07749 [pdf, other]

doi 10.1137/15M1026092

Efficient mesh management in Firedrake using PETSc-DMPlex

Authors: Michael Lange, Lawrence Mitchell, Matthew G. Knepley, Gerard J. Gorman

Abstract: The use of composable abstractions allows the application of new and established algorithms to a wide range of problems while automatically inheriting the benefits of well-known performance optimisations. This work highlights the composition of the PETSc DMPlex domain topology abstraction with the Firedrake automated finite element system to create a PDE solving environment that combines expressiveness, flexibility and high performance. We describe how Firedrake utilises DMPlex to provide the indirection maps required for finite element assembly, while supporting various mesh input formats and runtime domain decomposition. In particular, we describe how DMPlex and its accompanying data structures allow the generic creation of user-defined discretisations, while utilising data layout optimisations that improve cache coherency and ensure overlapped communication during assembly computation.

Submitted 25 June, 2015; originally announced June 2015.

Comments: 12 pages, 6 figures, submitted to SISC CSE Special Issue

Journal ref: SIAM Journal on Scientific Computing 38(5):S143-S155 (2016)
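An indirection map lists, for each cell, the global degrees of freedom it touches; assembly is then a loop over cells that scatters each local element matrix through that map. The Python sketch below assembles a 1D P1 stiffness matrix this way. The map is built by hand here, whereas Firedrake derives its maps from DMPlex, and the function name is an assumption.

```python
from typing import List


def assemble_1d_p1_stiffness(n_cells: int, length: float = 1.0) -> List[List[float]]:
    """Assemble the global P1 stiffness matrix on a uniform 1D mesh via a cell-node map."""
    h = length / n_cells
    # Indirection map: cell c touches global nodes c and c + 1.
    cell_node_map = [(c, c + 1) for c in range(n_cells)]
    n_nodes = n_cells + 1
    k_global = [[0.0] * n_nodes for _ in range(n_nodes)]
    # The element kernel is identical for every cell on a uniform mesh.
    k_elem = [[1.0 / h, -1.0 / h], [-1.0 / h, 1.0 / h]]
    for nodes in cell_node_map:
        for a, ga in enumerate(nodes):
            for b, gb in enumerate(nodes):
                k_global[ga][gb] += k_elem[a][b]  # scatter-add through the map
    return k_global
```

Only cell_node_map changes for a different mesh or numbering; the element kernel and the scatter loop stay the same.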

arXiv:1506.06194 [pdf, other]

Unstructured Overlapping Mesh Distribution in Parallel

Authors: Matthew G. Knepley, Michael Lange, Gerard J. Gorman

Abstract: We present a simple mathematical framework and API for parallel mesh and data distribution, load balancing, and overlap generation. It relies on viewing the mesh as a Hasse diagram, abstracting away information such as cell shape, dimension, and coordinates. The high level of abstraction makes our interface both concise and powerful, as the same algorithm applies to any representable mesh, such as hybrid meshes, meshes embedded in higher dimension, and overlapped meshes in parallel. We present evidence, both theoretical and experimental, that the algorithms are scalable and efficient. A working implementation can be found in the latest release of the PETSc libraries.

Submitted 19 June, 2015; originally announced June 2015.

Comments: 14 pages, 6 figures, submitted to TOMS
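In a Hasse-diagram view, every mesh entity (cell, edge, vertex) is a point, the cone of a point lists the points it covers one dimension down, and the support is the inverse relation. The Python sketch below stores the cones of two triangles sharing an edge and computes transitive closures over either relation. The point numbering and function names are assumptions, chosen to mirror DMPlex's cone/support/closure vocabulary rather than its C API.

```python
from typing import Dict, List, Set

# Points: cells 0-1, edges 2-6, vertices 7-10 (hypothetical numbering).
CONE: Dict[int, List[int]] = {
    0: [2, 3, 4],   # triangle 0 is bounded by edges 2, 3, 4
    1: [3, 5, 6],   # triangle 1 shares edge 3 with triangle 0
    2: [7, 8], 3: [8, 9], 4: [9, 7],
    5: [8, 10], 6: [10, 9],
}


def invert(relation: Dict[int, List[int]]) -> Dict[int, List[int]]:
    """Build the support relation (which points cover me) from the cone relation."""
    inverse: Dict[int, List[int]] = {}
    for p, covered in relation.items():
        for q in covered:
            inverse.setdefault(q, []).append(p)
    return inverse


def transitive_closure(point: int, relation: Dict[int, List[int]]) -> Set[int]:
    """All points reachable from `point` by repeatedly following `relation`, including itself."""
    result: Set[int] = set()
    stack = [point]
    while stack:
        p = stack.pop()
        if p not in result:
            result.add(p)
            stack.extend(relation.get(p, []))
    return result


SUPPORT = invert(CONE)
```

The closure of a cell (following cones) collects the edges and vertices it touches, while the star of a point (the closure over supports) collects the cells around it, the kind of adjacency query involved in generating overlap. Nothing in either function depends on cell shape or dimension.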

arXiv:1505.06968 [pdf, ps, other]

A Nonlinear Boundary Condition for Continuum Models of Biomolecular Electrostatics

Authors: J. P. Bardhan, D. A. Tejani, N. S. Wieckowski, A. Ramaswamy, M. G. Knepley

Abstract: Understanding the behavior of biomolecules such as proteins requires understanding the critical influence of the surrounding fluid (solvent) environment: water with mobile salt ions such as sodium. Unfortunately, for many studies, fully atomistic simulations of biomolecules surrounded by thousands of water molecules and ions are too computationally slow. Continuum solvent models based on macroscopic dielectric theory (e.g. the Poisson equation) are popular alternatives, but their simplicity fails to capture well-known phenomena of functional significance. For example, standard theories predict that electrostatic response is symmetric with respect to the sign of an atomic charge, even though response is in fact strongly asymmetric if the charge is near the biomolecule surface. In this work, we present an asymmetric continuum theory that captures the essential physical mechanism, the finite size of solvent atoms, using a nonlinear boundary condition (NLBC) at the dielectric interface between the biomolecule and solvent. Numerical calculations using boundary-integral methods demonstrate that the new NLBC model reproduces a wide range of results computed by more realistic, and expensive, all-atom molecular-dynamics (MD) simulations in explicit water. We discuss model extensions such as modeling dilute-electrolyte solvents with Debye-Hückel theory (the linearized Poisson-Boltzmann equation) and opportunities for the electromagnetics community to contribute to research in this important area of molecular nanoscience and engineering.

Submitted 24 May, 2015; originally announced May 2015.

Comments: 7 pages, 1 figure, to appear in PIERS 2015

arXiv:1505.04633 [pdf, other]

Flexible, Scalable Mesh and Data Management using PETSc DMPlex

Authors: Michael Lange, Matthew G. Knepley, Gerard J. Gorman

Abstract: Designing a scientific software stack to meet the needs of next-generation mesh-based simulation demands not only scalable and efficient mesh and data management on a wide range of platforms, but also an abstraction layer that makes it useful for a wide range of application codes. Common utility tasks, such as file I/O, mesh distribution, and work partitioning, should be delegated to external libraries in order to promote code re-use, extensibility and software interoperability. In this paper we demonstrate the use of PETSc's DMPlex data management API to perform mesh input and domain partitioning in Fluidity, a large-scale CFD application. We demonstrate that raising the level of abstraction adds new functionality to the application code, such as support for additional mesh file formats and mesh reordering, while improving simulation startup cost through more efficient mesh distribution. Moreover, the separation of concerns accomplished through this interface shifts critical performance and interoperability issues, such as scalable I/O and file format support, to a widely used and supported open source community library, improving the sustainability, performance, and functionality of Fluidity.

Submitted 18 May, 2015; originally announced May 2015.

Comments: 6 pages, 6 figures, to appear in EASC 2015

arXiv:1409.7418 [pdf, other]

doi 10.1063/1.4897324

Modeling Charge-Sign Asymmetric Solvation Free Energies With Nonlinear Boundary Conditions

Authors: Jaydeep P. Bardhan, Matthew G. Knepley

Abstract: We show that charge-sign-dependent asymmetric hydration can be modeled accurately using linear Poisson theory but replacing the standard electric-displacement boundary condition with a simple nonlinear boundary condition. Using a single multiplicative scaling factor to determine atomic radii from molecular dynamics Lennard-Jones parameters, the new model accurately reproduces MD free-energy calculations of hydration asymmetries for (i) monatomic ions, (ii) titratable amino acids in both their protonated and unprotonated states, and (iii) the Mobley "bracelet" and "rod" test problems [J. Phys. Chem. B, v. 112:2408, 2008]. Remarkably, the model also justifies the use of linear response expressions for charging free energies. Our boundary-element method implementation demonstrates the ease with which other continuum-electrostatic solvers can be extended to include asymmetry.

Submitted 25 September, 2014; originally announced September 2014.

Comments: 7 pages, 2 figures, accepted to Journal of Chemical Physics

arXiv:1407.2905 [pdf, ps, other]

Run-time extensibility and librarization of simulation software

Authors: Jed Brown, Matthew G. Knepley, Barry F. Smith

Abstract: Build-time configuration and environment assumptions are hampering progress and usability in scientific software. That which would be utterly unacceptable in non-scientific software somehow passes for the norm in scientific packages. The community needs reusable software packages that are easy to use and flexible enough to accommodate next-generation simulation and analysis demands.

Submitted 10 July, 2014; originally announced July 2014.

Comments: 6 pages
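One common way to obtain run-time rather than build-time extensibility is a registry that maps type names to constructors, with the type chosen from options at run time; PETSc follows this pattern with its register and set-from-options functions (for example PCRegister and PCSetFromOptions). The Python sketch below is a toy version of the pattern. The registry, the option key pc_type, and the two preconditioners are assumptions for illustration.

```python
from typing import Callable, Dict, List, Sequence

Apply = Callable[[Sequence[float]], List[float]]
Factory = Callable[[Sequence[float]], Apply]

_REGISTRY: Dict[str, Factory] = {}


def register(name: str) -> Callable[[Factory], Factory]:
    """Decorator that adds a factory to the registry under `name`."""
    def decorator(factory: Factory) -> Factory:
        _REGISTRY[name] = factory
        return factory
    return decorator


@register("none")
def make_identity(diag: Sequence[float]) -> Apply:
    return lambda r: list(r)


@register("jacobi")
def make_jacobi(diag: Sequence[float]) -> Apply:
    return lambda r: [ri / di for ri, di in zip(r, diag)]


def create_from_options(options: Dict[str, str], diag: Sequence[float]) -> Apply:
    """Build the preconditioner named by options['pc_type'] (default 'none')."""
    name = options.get("pc_type", "none")
    if name not in _REGISTRY:
        raise ValueError(f"unknown pc_type '{name}'; registered: {sorted(_REGISTRY)}")
    return _REGISTRY[name](diag)
```

A plugin module can call register("mytype") on import, and users then select it by changing the options rather than rebuilding the core code.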

arXiv:1309.1204 [pdf, other]

Achieving High Performance with Unified Residual Evaluation

Authors: Matthew G. Knepley, Jed Brown, Karl Rupp, Barry F. Smith

Abstract: We examine residual evaluation, perhaps the most basic operation in numerical simulation. By raising the level of abstraction in this operation, we can eliminate specialized code, enable optimization, and greatly increase the extensibility of existing code.

Submitted 6 September, 2013; v1 submitted 4 September, 2013; originally announced September 2013.

Comments: 4 pages, 1 figure
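One way to raise the level of abstraction in residual evaluation is to have the user supply only pointwise functions f0 and f1, so that F_i = ∫ (φ_i f0(u, ∇u, x) + ∇φ_i · f1(u, ∇u, x)) dx while the library owns the loops over cells and quadrature points; PETSc's PetscDS residual callbacks use this f0/f1 split. The Python sketch below applies the idea to 1D P1 elements with a one-point midpoint rule. The function names and quadrature choice are assumptions.

```python
from typing import Callable, List

Pointwise = Callable[[float, float, float], float]  # (u, u_x, x) -> value


def residual_1d_p1(u: List[float], nodes: List[float],
                   f0: Pointwise, f1: Pointwise) -> List[float]:
    """F_i = sum_e int_e [phi_i f0(u, u_x, x) + phi_i' f1(u, u_x, x)] dx, midpoint rule."""
    res = [0.0] * len(nodes)
    for e in range(len(nodes) - 1):
        h = nodes[e + 1] - nodes[e]
        xm = 0.5 * (nodes[e] + nodes[e + 1])
        um = 0.5 * (u[e] + u[e + 1])    # u at the midpoint
        ux = (u[e + 1] - u[e]) / h      # u_x is constant on a P1 element
        g0, g1 = f0(um, ux, xm), f1(um, ux, xm)
        # At the midpoint both basis functions equal 1/2; their slopes are -1/h and +1/h.
        res[e] += h * (0.5 * g0 - g1 / h)
        res[e + 1] += h * (0.5 * g0 + g1 / h)
    return res
```

For -u'' = 1, take f0 = -1 and f1 = u_x. With the nodal interpolant of u = x(1 - x)/2 on a uniform mesh the interior residuals vanish up to rounding, and the two boundary entries are the ones Dirichlet conditions would overwrite.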

arXiv:1308.5846 [pdf, other]

doi 10.1002/jgrb.50217

A Domain Decomposition Approach to Implementing Fault Slip in Finite-Element Models of Quasi-static and Dynamic Crustal Deformation

Authors: Brad T. Aagaard, Matthew G. Knepley, Charles A. Williams

Abstract: We employ a domain decomposition approach with Lagrange multipliers to implement fault slip in a finite-element code, PyLith, for use in both quasi-static and dynamic crustal deformation applications. This integrated approach to solving both quasi-static and dynamic simulations leverages common finite-element data structures and implementations of various boundary conditions, discretization schemes, and bulk and fault rheologies. We have developed a custom preconditioner for the Lagrange multiplier portion of the system of equations that provides excellent scalability with problem size compared to conventional additive Schwarz methods. We demonstrate application of this approach using benchmarks for both quasi-static viscoelastic deformation and dynamic spontaneous rupture propagation that verify the numerical implementation in PyLith.

Submitted 27 August, 2013; originally announced August 2013.

Comments: 14 pages, 15 figures

Journal ref: Journal of Geophysical Research, 118(6), pp.3059-3079, 2013
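The Lagrange-multiplier treatment of fault slip described above produces a saddle-point system. In this system, the fault is represented by duplicated nodes and the multiplier acts as the fault traction. A toy sketch follows. It assumes two 1D springs with clamped ends that meet at a fault and a prescribed slip, and it uses a dense direct solve. It is not PyLith's code and does not include the paper's custom preconditioner.

```python
import numpy as np


def solve_with_fault(k: float, slip: float):
    """Two springs of stiffness k meet at a fault. Nodes 1 (negative side) and
    2 (positive side) share a location but are split; nodes 0 and 3 are clamped.
    The slip constraint u2 - u1 = slip is enforced with a Lagrange multiplier lam:
        [K  C^T] [u  ]   [f   ]
        [C  0  ] [lam] = [slip]
    """
    K = k * np.array([[1, -1, 0, 0],
                      [-1, 1, 0, 0],
                      [0, 0, 1, -1],
                      [0, 0, -1, 1]], dtype=float)
    C = np.array([[0.0, -1.0, 1.0, 0.0]])  # constraint row: u2 - u1
    f = np.zeros(4)                        # no body or external forces
    free = [1, 2]                          # clamped nodes 0 and 3 are eliminated
    Kf = K[np.ix_(free, free)]
    Cf = C[:, free]
    A = np.block([[Kf, Cf.T], [Cf, np.zeros((1, 1))]])
    b = np.concatenate([f[free], [slip]])
    sol = np.linalg.solve(A, b)
    u = np.zeros(4)
    u[free] = sol[:2]
    return u, sol[2]                       # displacements and fault multiplier
```

With `k = 2.0` and `slip = 1.0`, the slip is split symmetrically, giving `u = [0, -0.5, 0.5, 0]`, and the multiplier is `-1.0`. The multiplier's sign depends on the orientation chosen for the constraint row.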

arXiv:1209.1711 [pdf, ps, other]

doi 10.1007/978-3-540-70529-1

Programming Languages for Scientific Computing

Authors: Matthew G. Knepley

Abstract: Scientific computation is a discipline that combines numerical analysis, physical understanding, algorithm development, and structured programming. Several yottacycles per year on the world's largest computers are spent simulating problems as diverse as weather prediction, the properties of material composites, the behavior of biomolecules in solution, and the quantum nature of chemical compounds. This article reviews specific language features and their use in computational science. We review the strengths and weaknesses of different programming styles, with examples taken from widely used scientific codes.

Submitted 9 January, 2018; v1 submitted 8 September, 2012; originally announced September 2012.

Comments: 21 pages

Journal ref: Encyclopedia of Applied and Computational Mathematics, Springer, 2012

arXiv:1208.3866 [pdf, ps, other]

Analytical Nonlocal Electrostatics Using Eigenfunction Expansions of Boundary-Integral Operators

Authors: Jaydeep P. Bardhan, Matthew G. Knepley, Peter R. Brune

Abstract: In this paper, we present an analytical solution to nonlocal continuum electrostatics for an arbitrary charge distribution in a spherical solute. Our approach relies on two key steps: (1) reformulating the PDE problem using boundary-integral equations, and (2) diagonalizing the boundary-integral operators using the fact that their eigenfunctions are the surface spherical harmonics. To introduce this uncommon approach for analytical calculations in separable geometries, we rederive Kirkwood's classic results for a protein surrounded concentrically by a pure-water ion-exclusion layer and then a dilute electrolyte (modeled with the linearized Poisson-Boltzmann equation). Our main result, however, is an analytical method for calculating the reaction potential in a protein embedded in a nonlocal-dielectric solvent, the Lorentz model studied by Dogonadze and Kornyshev. The analytical method enables biophysicists to study the new nonlocal theory in a simple, computationally fast way; an open-source MATLAB implementation is included as supplemental information.

Submitted 20 August, 2012; v1 submitted 19 August, 2012; originally announced August 2012.

Comments: 19 pages, 7 figures
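The paper above uses Kirkwood's classic series for a spherical solute as its starting point. That series expands the reaction potential in Legendre polynomials, one term per harmonic degree n. The sketch below covers only the simplest local case. It assumes a sphere of radius `a` centered at the origin, permittivity `eps_in` inside and `eps_out` outside, no ion-exclusion layer, no salt, Gaussian units (Coulomb constant 1), and truncation at `nmax`. It does not implement the nonlocal Lorentz model, and the function name is illustrative.

```python
import numpy as np


def kirkwood_energy(q, pos, a, eps_in, eps_out, nmax=40):
    """Solvation energy of point charges inside a dielectric sphere:
        dG = 1/2 sum_ij q_i q_j phi_reac(r_i; r_j),
        phi_reac = (1/a) sum_n c_n (r_i r_j / a^2)^n P_n(cos gamma_ij),
        c_n = (n+1)(eps_in - eps_out) / (eps_in (n eps_in + (n+1) eps_out)).
    """
    q = np.asarray(q, dtype=float)
    pos = np.atleast_2d(np.asarray(pos, dtype=float))
    r = np.linalg.norm(pos, axis=1)
    energy = 0.0
    for i in range(len(q)):
        for j in range(len(q)):
            rr = r[i] * r[j]
            # The angle is irrelevant when a charge is at the center (only n = 0 survives).
            cosg = pos[i] @ pos[j] / rr if rr > 0 else 1.0
            p_prev, p = 0.0, 1.0  # P_{n-1} and P_n, starting at n = 0
            s = 0.0
            for n in range(nmax + 1):
                c_n = (n + 1) * (eps_in - eps_out) / (eps_in * (n * eps_in + (n + 1) * eps_out))
                s += c_n * (rr / a**2) ** n * p
                # Bonnet recurrence: (n+1) P_{n+1} = (2n+1) x P_n - n P_{n-1}
                p_prev, p = p, ((2 * n + 1) * cosg * p - n * p_prev) / (n + 1)
            energy += 0.5 * q[i] * q[j] * s / a
    return energy
```

For a single central charge, only the n = 0 term survives, and the result reduces to the Born energy q^2/(2a) (1/eps_out - 1/eps_in). Moving the charge toward the boundary makes the energy more negative.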

arXiv:1204.0267 [pdf, ps, other]

doi 10.1088/1749-4699/5/1/014006

Computational science and re-discovery: open-source implementations of ellipsoidal harmonics for problems in potential theory

Authors: Jaydeep P. Bardhan, Matthew G. Knepley

Abstract: We present two open-source (BSD) implementations of ellipsoidal harmonic expansions for solving problems of potential theory using separation of variables. Ellipsoidal harmonics are used surprisingly infrequently, considering their substantial value for problems ranging in scale from molecules to the entire solar system. In this article, we suggest two possible reasons for the paucity relative to spherical harmonics. The first is essentially historical: ellipsoidal harmonics were developed during the late 19th and early 20th centuries, when it was found that only the lowest-order harmonics are expressible in closed form. Each higher-order term requires the solution of an eigenvalue problem, and tedious manual computation seems to have discouraged applications and theoretical studies. The second explanation is practical: even with modern computers and accurate eigenvalue algorithms, expansions in ellipsoidal harmonics are significantly more challenging to compute than those in Cartesian or spherical coordinates. The present implementations reduce the "barrier to entry" by providing an easy and free way for the community to begin using ellipsoidal harmonics in actual research. We demonstrate our implementation using the specific and physiologically crucial problem of how charged proteins interact with their environment, and ask: what other analytical tools await re-discovery in an era of inexpensive computation?

Submitted 3 April, 2012; v1 submitted 1 April, 2012; originally announced April 2012.

Comments: 25 pages, 3 figures

Journal ref: Computational Science & Discovery, 5:014006, 2012

arXiv:1111.6583 [pdf, other]

doi 10.1137/110856976

PyClaw: Accessible, Extensible, Scalable Tools for Wave Propagation Problems

Authors: David I. Ketcheson, Kyle T. Mandli, Aron Ahmadia, Amal Alghamdi, Manuel Quezada, Matteo Parsani, Matthew G. Knepley, Matthew Emmett

Abstract: Development of scientific software involves tradeoffs between ease of use, generality, and performance. We describe the design of a general hyperbolic PDE solver that can be operated with the convenience of MATLAB yet achieves efficiency near that of hand-coded Fortran and scales to the largest supercomputers. This is achieved by using Python for most of the code while employing automatically wrapped Fortran kernels for computationally intensive routines, and using Python bindings to interface with a parallel computing library and other numerical packages. The software described here is PyClaw, a Python-based structured grid solver for general systems of hyperbolic PDEs. PyClaw provides a powerful and intuitive interface to the algorithms of the existing Fortran codes Clawpack and SharpClaw, simplifying code development and use while providing massive parallelism and scalable solvers via the PETSc library. The package is further augmented by use of PyWENO for generation of efficient high-order weighted essentially non-oscillatory reconstruction code. The simplicity, capability, and performance of this approach are demonstrated through application to example problems in shallow water flow, compressible flow and elasticity.

Submitted 12 May, 2012; v1 submitted 27 November, 2011; originally announced November 2011.

Journal ref: SISC 34(4):C210-C231 (2012)
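PyClaw solves hyperbolic PDEs with finite-volume methods. The sketch below shows the simplest member of that family: a first-order upwind (Godunov) update for linear advection q_t + u q_x = 0 on a periodic grid. It is written as a jump across each cell interface, in the wave-propagation spirit of Clawpack. The sketch assumes a constant wave speed and has no high-resolution limiters or WENO reconstruction, so it is not PyClaw's API, and `advect_upwind` is a hypothetical name.

```python
import numpy as np


def advect_upwind(q0, u, dx, dt, nsteps):
    """First-order upwind finite-volume update for q_t + u q_x = 0, periodic BCs.
    Each cell average changes by the CFL number times the jump at its upwind interface."""
    nu = u * dt / dx
    if abs(nu) > 1.0:
        raise ValueError(f"CFL number {nu:.3f} exceeds 1")
    q = np.array(q0, dtype=float)
    for _ in range(nsteps):
        dq = q - np.roll(q, 1)            # jump at left interface: q_i - q_{i-1}
        if u >= 0:
            q = q - nu * dq               # right-going wave enters from the left
        else:
            q = q - nu * np.roll(dq, -1)  # left-going wave: jump q_{i+1} - q_i
    return q
```

At CFL number exactly 1, the update shifts the data by one cell per step. The flux-difference form also conserves the total of the cell averages, which is the property that makes finite-volume methods suitable for shocks.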

arXiv:1109.0651 [pdf, ps, other]

doi 10.1063/1.3641485

Mathematical Analysis of the BIBEE Approximation for Molecular Solvation: Exact Results for Spherical Inclusions

Authors: Jaydeep P. Bardhan, Matthew G. Knepley

Abstract: We analyze the mathematically rigorous BIBEE (boundary-integral based electrostatics estimation) approximation of the mixed-dielectric continuum model of molecular electrostatics, using the analytically solvable case of a spherical solute containing an arbitrary charge distribution. Our analysis, which builds on Kirkwood's solution using spherical harmonics, clarifies important aspects of the approximation and its relationship to Generalized Born models. First, our results suggest a new perspective for analyzing fast electrostatic models: the separation of variables between material properties (the dielectric constants) and geometry (the solute dielectric boundary and charge distribution). Second, we find that the eigenfunctions of the reaction-potential operator are exactly preserved in the BIBEE model for the sphere, which supports the use of this approximation for analyzing charge-charge interactions in molecular binding. Third, a comparison of BIBEE to the recent GBε theory suggests a modified BIBEE model capable of predicting electrostatic solvation free energies to within 4% of a full numerical Poisson calculation. This modified model leads to a projection-framework understanding of BIBEE and suggests opportunities for future improvements.

Submitted 3 September, 2011; originally announced September 2011.

Comments: 33 pages, 5 figures

Journal ref: Journal of Chemical Physics, 135(12):124107-124117, 2011

arXiv:1107.5951 [pdf, other]

doi 10.1111/j.1365-246X.2011.05167.x

Optimal, scalable forward models for computing gravity anomalies

Authors: Dave A. May, Matthew G. Knepley

Abstract: We describe three approaches for computing a gravity signal from a density anomaly. The first approach consists of the classical "summation" technique, whilst the remaining two methods solve the Poisson problem for the gravitational potential using either a Finite Element (FE) discretization employing a multilevel preconditioner, or a Green's function evaluated with the Fast Multipole Method (FMM). The methods utilizing the PDE formulation described here differ from previously published approaches used in gravity modeling in that they are optimal, implying that both the memory and computational time required scale linearly with respect to the number of unknowns in the potential field. Additionally, all of the implementations presented here are developed such that the computations can be performed in a massively parallel, distributed memory computing environment. Through numerical experiments, we compare the methods on the basis of their discretization error, CPU time and parallel scalability. We demonstrate the parallel scalability of all these techniques by running forward models with up to 10^8 voxels on thousands of cores.

Submitted 29 July, 2011; originally announced July 2011.

Comments: 38 pages, 13 figures; accepted by Geophysical Journal International

Journal ref: Geophysical Journal International, 187(1):161-177, 2011
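The classical "summation" technique mentioned above computes the gravity signal by summing the contribution of every anomalous voxel at every observation point. This costs O(N_obs × N_voxels), which is the cost the optimal FE and FMM approaches avoid. The sketch below assumes that each voxel acts as a point mass at its center, with the z axis pointing up and the anomaly reported as positive downward. The point-mass assumption is a simplification; exact prism formulas are more accurate near the source. `gravity_summation` is a hypothetical name.

```python
import numpy as np

G = 6.674e-11  # gravitational constant, m^3 kg^-1 s^-2


def gravity_summation(obs, centers, drho, voxel_volume):
    """Vertical gravity anomaly (positive downward, z axis up) at each observation
    point, summing point-mass contributions of all voxels.
    obs: (N_obs, 3), centers: (N_vox, 3), drho: (N_vox,) density contrast in kg/m^3."""
    obs = np.atleast_2d(np.asarray(obs, dtype=float))
    centers = np.atleast_2d(np.asarray(centers, dtype=float))
    m = np.asarray(drho, dtype=float) * voxel_volume   # anomalous mass per voxel
    d = obs[:, None, :] - centers[None, :, :]          # (N_obs, N_vox, 3) separations
    r = np.linalg.norm(d, axis=2)
    return G * np.sum(m[None, :] * d[:, :, 2] / r**3, axis=1)
```

The intermediate array `d` holds N_obs × N_vox × 3 values. This makes the quadratic memory and time growth of the summation approach explicit.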

arXiv:1104.0261 [pdf, other]

Unstructured Geometric Multigrid in Two and Three Dimensions on Complex and Graded Meshes

Authors: Peter R. Brune, Matthew G. Knepley, L. Ridgway Scott

Abstract: The use of multigrid and related preconditioners with the finite element method is often limited by the difficulty of applying the algorithm effectively to a problem, especially when the domain has a complex shape or adaptive refinement. We introduce a simplification of a general topologically-motivated mesh coarsening algorithm for use in creating hierarchies of meshes for geometric unstructured multigrid methods. The connections between the guarantees of this technique and the quality criteria necessary for multigrid methods for non-quasi-uniform problems are noted. The implementation details, in particular those related to coarsening, remeshing, and interpolation, are discussed. Computational tests on pathological test cases from adaptive finite element methods show the performance of the technique.

Submitted 5 April, 2011; v1 submitted 1 April, 2011; originally announced April 2011.

Comments: 17 pages, 5 figures, 4 tables

MSC Class: 65N30; 65M50; 65M55

Journal ref: SIAM Journal on Scientific Computing, 35(1), A173-A191, 2013
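The geometric multigrid cycle that the mesh hierarchies above feed into consists of four steps: smooth, restrict the residual, correct on the coarser level, and interpolate back. The sketch below swaps in the simplest possible setting: a uniform 1D grid with factor-2 coarsening for -u'' = f with zero Dirichlet boundaries. It does not implement the paper's unstructured, topologically motivated coarsening. It uses weighted Jacobi smoothing, full-weighting restriction, and linear interpolation, and it assumes that the number of intervals is a power of two. All function names are hypothetical.

```python
import numpy as np


def smooth(u, f, h, iters=3, omega=2.0 / 3.0):
    """Weighted Jacobi for -u'' = f; the RHS is evaluated before assignment."""
    for _ in range(iters):
        u[1:-1] = (1 - omega) * u[1:-1] + omega * 0.5 * (u[:-2] + u[2:] + h * h * f[1:-1])
    return u


def residual(u, f, h):
    r = np.zeros_like(u)
    r[1:-1] = f[1:-1] - (2 * u[1:-1] - u[:-2] - u[2:]) / (h * h)
    return r


def v_cycle(u, f, h):
    n = len(u) - 1  # number of intervals, assumed to be a power of two
    if n <= 2:
        # Coarsest grid: a single interior unknown, solved exactly.
        u[1] = 0.5 * (u[0] + u[2] + h * h * f[1])
        return u
    u = smooth(u, f, h)
    r = residual(u, f, h)
    # Full-weighting restriction to the grid with spacing 2h.
    rc = np.zeros(n // 2 + 1)
    rc[1:-1] = 0.25 * r[1:-2:2] + 0.5 * r[2:-1:2] + 0.25 * r[3::2]
    ec = v_cycle(np.zeros_like(rc), rc, 2 * h)
    # Linear interpolation of the coarse correction back to the fine grid.
    e = np.zeros_like(u)
    e[::2] = ec
    e[1::2] = 0.5 * (ec[:-1] + ec[1:])
    u += e
    return smooth(u, f, h)
```

On an unstructured hierarchy, the same four steps apply, but restriction and interpolation come from the coarsening and remeshing machinery the paper describes, rather than from fixed stencils.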

arXiv:1103.0066 [pdf, other]

doi 10.1145/2427023.2427027

Finite Element Integration on GPUs

Authors: Matthew G. Knepley, Andy R. Terrel

Abstract: We present a novel finite element integration method for low-order elements on GPUs. We achieve more than 100 GFlop/s for element integration on first-order discretizations of both the Laplacian and elasticity operators.

Submitted 28 February, 2011; originally announced March 2011.

Comments: 16 pages, 3 figures

ACM Class: G.4; G.1.8

Journal ref: ACM Transactions on Mathematical Software, 39(2), 2013
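The paper above targets GPU element integration. That work is batched: each element runs the same small, branch-free kernel. The sketch below shows that batch structure for the P1 Laplacian on triangles, vectorized over elements with NumPy on the CPU. It is an illustration of the batching idea only, not the paper's GPU kernels, and `p1_laplacian_batch` is a hypothetical name.

```python
import numpy as np


def p1_laplacian_batch(coords):
    """Element stiffness matrices of the P1 Laplacian for a batch of triangles.
    coords: (E, 3, 2) vertex coordinates; returns (E, 3, 3).
    Every element uses identical arithmetic, the structure a GPU kernel exploits."""
    coords = np.asarray(coords, dtype=float)
    # Constant reference gradients of the three P1 basis functions.
    dphi_ref = np.array([[-1.0, -1.0], [1.0, 0.0], [0.0, 1.0]])        # (3, 2)
    # Jacobian of x = x0 + J xi, with columns x1 - x0 and x2 - x0.
    J = np.stack([coords[:, 1] - coords[:, 0],
                  coords[:, 2] - coords[:, 0]], axis=2)                  # (E, 2, 2)
    detJ = np.linalg.det(J)                                              # (E,)
    Jinv = np.linalg.inv(J)                                              # (E, 2, 2)
    # Physical gradients: grad phi_a = J^{-T} grad_ref phi_a.
    grad = np.einsum('eki,ak->eai', Jinv, dphi_ref)                      # (E, 3, 2)
    # K_ab = |T| grad phi_a . grad phi_b, where the triangle area |T| = |det J| / 2.
    return 0.5 * np.abs(detJ)[:, None, None] * np.einsum('eai,ebi->eab', grad, grad)
```

On the reference triangle, this gives [[1, -1/2, -1/2], [-1/2, 1/2, 0], [-1/2, 0, 1/2]]. Because the 2D Laplacian stiffness is invariant under uniform scaling and translation, a scaled copy of that triangle yields the same matrix.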

arXiv:1008.2410 [pdf, other]

Removing the Barrier to Scalability in Parallel FMM

Authors: Matthew G. Knepley

Abstract: The Fast Multipole Method (FMM) is well known to possess a bottleneck arising from decreasing workload on higher levels of the FMM tree [Greengard and Gropp, Comp. Math. Appl., 20(7), 1990]. We show that this potential bottleneck can be eliminated by overlapping multipole and local expansion computations with direct kernel evaluations on the finest level grid.

Submitted 13 August, 2010; originally announced August 2010.

Comments: 11 pages, 2 figures

arXiv:1007.4591 [pdf, other]

doi 10.1016/j.cpc.2011.02.013

Biomolecular electrostatics using a fast multipole BEM on up to 512 GPUs and a billion unknowns

Authors: Rio Yokota, Jaydeep P. Bardhan, Matthew G. Knepley, L. A. Barba, Tsuyoshi Hamada

Abstract: We present teraflop-scale calculations of biomolecular electrostatics enabled by the combination of algorithmic and hardware acceleration. The algorithmic acceleration is achieved with the fast multipole method (FMM) in conjunction with a boundary element method (BEM) formulation of the continuum electrostatic model, as well as the BIBEE approximation to BEM. The hardware acceleration is achieved through graphics processors, GPUs. We demonstrate the power of our algorithms and software for the calculation of the electrostatic interactions between biological molecules in solution. The applications demonstrated include the electrostatics of protein-drug binding and several multi-million atom systems consisting of hundreds to thousands of copies of lysozyme molecules. The parallel scalability of the software was studied on a cluster at the Nagasaki Advanced Computing Center, using 128 nodes, each with 4 GPUs. Careful tuning has resulted in strong scaling with parallel efficiency of 0.8 for 256 and 0.5 for 512 GPUs. The largest application run, with over 20 million atoms and one billion unknowns, required only one minute on 512 GPUs. We are currently adapting our BEM software to solve the linearized Poisson-Boltzmann equation for dilute ionic solutions, and it is also designed to be flexible enough to be extended for a variety of integral equation problems, ranging from Poisson problems to Helmholtz problems in electromagnetics and acoustics to high Reynolds number flow.

Submitted 10 February, 2011; v1 submitted 26 July, 2010; originally announced July 2010.

Journal ref: Comput. Phys. Commun., 182(6):1271-1283 (2011)
