-
Automatic Synthesis of Low-Complexity Translation Operators for the Fast Multipole Method
Authors:
Isuru Fernando,
Andreas Klöckner
Abstract:
We demonstrate a new, hybrid symbolic-numerical method for the automatic synthesis of all families of translation operators required for the execution of the Fast Multipole Method (FMM). Our method is applicable in any dimensionality and to any translation-invariant kernel. The Fast Multipole Method, of course, is the leading approach for attaining linear complexity in the evaluation of long-range (e.g., Coulomb) many-body interactions. Low complexity in FMM translation operators is usually achieved by algorithms specialized for a potential obeying a specific partial differential equation (PDE). Absent a PDE or specialized algorithms, Taylor-series-based FMMs or the kernel-independent FMM have been used, at asymptotically higher expense.
When symbolically provided with a constant-coefficient elliptic PDE obeyed by the potential, our algorithm can automatically synthesize translation operators requiring $\mathrm{O}(p^d)$ operations, where $p$ is the expansion order and $d$ is the dimension, compared with $\mathrm{O}(p^{2d})$ operations in a naive approach carried out on (Cartesian) Taylor expansions. This is achieved by using a compression scheme that asymptotically reduces the number of terms in the Taylor expansion and then operating directly on this "compressed" representation. Judicious exploitation of shared subexpressions permits formation, translation, and evaluation of local and multipole expansions to be performed in $\mathrm{O}(p^{d})$ operations, while an FFT-based scheme permits multipole-to-local translations in $\mathrm{O}(p^{d-1}\log(p))$ operations. We demonstrate computational scaling of code generation and evaluation as well as numerical accuracy through numerical experiments on a number of potentials from classical physics.
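Counting coefficients gives a rough sense of why compression helps. The sketch below only counts terms and assumes the Laplace equation. It is not the authors' synthesis algorithm, and the function names are hypothetical. A Cartesian Taylor expansion of total order $p$ in $d$ dimensions has $\binom{p+d}{d} = \mathrm{O}(p^d)$ coefficients, so a dense translation between two such expansions costs $\mathrm{O}(p^{2d})$. For a potential satisfying Laplace's equation, the order-$n$ derivatives that remain independent match the linearly independent harmonic polynomials of degree $n$ in number. This leaves $(p+1)^2$ terms in three dimensions.

```python
from math import comb


def n_taylor_terms(p: int, d: int) -> int:
    """Cartesian Taylor coefficients of total degree <= p in d dimensions."""
    return comb(p + d, d)


def n_monomials(n: int, d: int) -> int:
    """Monomials of total degree exactly n in d variables (zero for n < 0)."""
    return comb(n + d - 1, d - 1) if n >= 0 else 0


def n_laplace_compressed_terms(p: int, d: int) -> int:
    """Coefficients that remain independent when the potential is harmonic.

    The Laplacian maps degree-n polynomials onto degree-(n - 2) polynomials, so the
    degree-n harmonic polynomials, and hence the independent order-n derivatives,
    number n_monomials(n) - n_monomials(n - 2).
    """
    return sum(n_monomials(n, d) - n_monomials(n - 2, d) for n in range(p + 1))


for p in (4, 8, 16):
    full, compressed = n_taylor_terms(p, 3), n_laplace_compressed_terms(p, 3)
    # A dense translation between two expansions costs roughly (number of terms)**2.
    print(f"p={p:2d}: Taylor {full:4d} terms ({full**2} per dense translation), "
          f"compressed {compressed:3d} terms")
```

In three dimensions the compressed count grows like $p^2$ rather than $p^3$. The $\mathrm{O}(p^d)$ translation cost stated above depends on further structure that this count does not capture.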
Submitted 28 May, 2023;
originally announced May 2023.
-
Exact domain truncation for the Morse-Ingard equations
Authors:
Robert C. Kirby,
Xiaoyu Wei,
Andreas Kloeckner
Abstract:
Morse and Ingard give a coupled system of time-harmonic equations for the temperature and pressure of an excited gas. These equations form a critical aspect of modeling trace gas sensors. Like other wave propagation problems, the computational problem must be closed with suitable far-field boundary conditions. Working in a scattered-field formulation, we adapt a nonlocal boundary condition proposed earlier for the Helmholtz equation to this coupled system. This boundary condition uses a Green's formula for the true solution on the boundary, giving rise to a nonlocal perturbation of standard transmission boundary conditions. However, the boundary condition is exact and so Galerkin discretization of the resulting problem converges to the restriction of the exact solution to the computational domain. Numerical results demonstrate that accuracy can be obtained on relatively coarse meshes on small computational domains, and the resulting algebraic systems may be solved by GMRES using the local part of the operator as an effective preconditioner.
Submitted 25 October, 2022;
originally announced October 2022.
-
Integral Equation Methods for the Morse-Ingard Equations
Authors:
Xiaoyu Wei,
Andreas Klöckner,
Robert C. Kirby
Abstract:
We present two integral-equation-based methods, one decoupled and one coupled, for the Morse-Ingard equations subject to Neumann boundary conditions on the exterior domain. Both methods are based on second-kind integral equation (SKIE) formulations. The coupled method is well-conditioned and can achieve high accuracy. The decoupled method has lower computational cost and more flexibility in dealing with the boundary layer; however, it is susceptible to ill-conditioning of the decoupling transform and cannot achieve accuracy as high as the coupled method. We show numerical examples using a Nyström method based on quadrature by expansion (QBX) with fast-multipole acceleration. We demonstrate the accuracy and efficiency of the solvers in both two and three dimensions with complex geometry.
Submitted 21 April, 2023; v1 submitted 22 October, 2022;
originally announced October 2022.
-
Finite elements for Helmholtz equations with a nonlocal boundary condition
Authors:
Robert C. Kirby,
Andreas Klöckner,
Ben Sepanski
Abstract:
Numerical resolution of exterior Helmholtz problems requires some approach to domain truncation. As an alternative to approximate nonreflecting boundary conditions and invocation of the Dirichlet-to-Neumann map, we introduce a new, nonlocal boundary condition. This condition is exact and requires the evaluation of layer potentials involving the free space Green's function. However, it seems to work in general unstructured geometry, and Galerkin finite element discretization leads to convergence under the usual mesh constraints imposed by Gårding-type inequalities. The nonlocal boundary conditions are readily approximated by fast multipole methods, and the resulting linear system can be preconditioned by the purely local operator involving transmission boundary conditions.
Submitted 2 March, 2021; v1 submitted 17 September, 2020;
originally announced September 2020.
-
On the Approximation of Local Expansions of Laplace Potentials by the Fast Multipole Method
Authors:
Matt Wala,
Andreas Klöckner
Abstract:
In this paper, we present a generalization of the classical error bounds of Greengard-Rokhlin for the Fast Multipole Method (FMM) for Laplace potentials in three dimensions, extended to the case of local expansion (instead of point) targets. We also present a complementary, less sharp error bound proven via approximation theory whose applicability is not restricted to Laplace potentials. Our study is motivated by the GIGAQBX FMM, an algorithm for the fast, high-order accurate evaluation of layer potentials near and on the source layer. GIGAQBX is based on the FMM, but unlike a conventional FMM, which is designed to evaluate potentials at point-shaped targets, GIGAQBX evaluates local expansions of potentials at ball-shaped targets. Although the accuracy (or the acceleration error, i.e., error due to the approximation of the potential by the fast algorithm) of the conventional FMM is well understood, the acceleration error of FMM-based algorithms applied to the evaluation of local expansions has not been as well studied. The main contribution of this paper is a proof of a set of hypotheses first demonstrated numerically in the paper "A Fast Algorithm for Quadrature by Expansion in Three Dimensions," which pertain to the accuracy of FMM approximation of local expansions of Laplace potentials in three dimensions. These hypotheses are also essential to the three-dimensional error bound for GIGAQBX, which was previously stated conditionally on their truth and can now be stated unconditionally.
Submitted 3 August, 2020;
originally announced August 2020.
-
An Integral Equation Method for the Cahn-Hilliard Equation in the Wetting Problem
Authors:
Xiaoyu Wei,
Shidong Jiang,
Andreas Kloeckner,
Xiao-Ping Wang
Abstract:
We present an integral equation approach to solving the Cahn-Hilliard equation equipped with boundary conditions that model solid surfaces with prescribed Young's angles. The discretization of the system in time using convex splitting leads to a modified biharmonic equation at each time step. To solve it, we split the solution into a volume potential computed with free-space kernels, plus the solution to a second-kind integral equation (SKIE). The volume potential is evaluated with the help of a box-based volume FMM. For non-box domains, the source density is extended by solving a biharmonic Dirichlet problem. The near-singular boundary integrals are computed using quadrature by expansion (QBX) with FMM acceleration. Our method has linear complexity in the number of surface/volume degrees of freedom and can achieve high-order convergence with adaptive refinement to manage error from function extension.
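The derivation below shows one way a time-discretized Cahn-Hilliard equation becomes a modified biharmonic problem. It assumes the double-well potential $F(\phi) = (\phi^2-1)^2/4$, an interface parameter $\epsilon$, and a linear stabilization with parameter $s$ in which part of $F'$ is treated implicitly. The paper's actual convex splitting and boundary conditions may differ.

```latex
\begin{aligned}
\partial_t \phi &= \Delta \mu, \qquad
\mu = -\epsilon^2 \Delta \phi + F'(\phi), \qquad F'(\phi) = \phi^3 - \phi, \\
\frac{\phi^{n+1} - \phi^{n}}{\Delta t}
  &= \Delta\bigl(-\epsilon^2 \Delta \phi^{n+1} + s\,\phi^{n+1}\bigr)
   + \Delta\bigl(F'(\phi^{n}) - s\,\phi^{n}\bigr), \\
\Delta^2 \phi^{n+1} - \frac{s}{\epsilon^2}\,\Delta \phi^{n+1}
  + \frac{1}{\epsilon^2 \Delta t}\,\phi^{n+1}
  &= \frac{1}{\epsilon^2}\Bigl[\frac{\phi^{n}}{\Delta t}
   + \Delta\bigl(F'(\phi^{n}) - s\,\phi^{n}\bigr)\Bigr].
\end{aligned}
```

At each step, $\phi^{n+1}$ therefore solves $(\Delta^2 - \beta\Delta + \gamma)\phi = f$ with $\beta = s/\epsilon^2$ and $\gamma = 1/(\epsilon^2\Delta t)$. All nonlinear terms have moved to the known right-hand side.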
Submitted 3 March, 2020; v1 submitted 15 April, 2019;
originally announced April 2019.
-
Optimization of Fast Algorithms for Global Quadrature by Expansion Using Target-Specific Expansions
Authors:
Matt Wala,
Andreas Klöckner
Abstract:
We develop an algorithm for the asymptotically fast evaluation of layer potentials close to and on the source geometry, combining Geometric Global Accelerated QBX ("GIGAQBX") and target-specific expansions. GIGAQBX is a fast high-order scheme for evaluation of layer potentials based on Quadrature by Expansion ("QBX") using local expansions formed via the Fast Multipole Method (FMM). Target-specific expansions serve to lower the cost of the formation and evaluation of QBX local expansions, reducing the associated computational effort from $O((p+1)^{2})$ to $O(p+1)$ in three dimensions, without any accuracy loss compared with conventional expansions, but with the loss of source/target separation in the expansion coefficients. GIGAQBX is a "global" QBX scheme, meaning that the potential is mediated entirely through expansions for points close to or on the boundary. In our scheme, this single global expansion is decomposed into two parts that are evaluated separately: one part incorporating near-field contributions using target-specific expansions, and one part using conventional spherical harmonic expansions of far-field contributions; convergence guarantees exist only for the sum of the two sub-expansions. By contrast, target-specific expansions were originally introduced as an acceleration mechanism for "local" QBX schemes, in which the far-field does not contribute to the QBX expansion. Compared with the unmodified GIGAQBX algorithm, we show through a reproducible, time-calibrated cost model that the combined scheme yields a considerable cost reduction for the near-field evaluation part of the computation. We support the effectiveness of our scheme through numerical results demonstrating performance improvements for Laplace and Helmholtz kernels.
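The sketch below shows why a target-specific expansion costs $O(p+1)$ per source/target pair. It evaluates the local expansion of the 3D Laplace kernel $1/|x-y|$ about a center $c$ directly at one target. The Legendre addition theorem collapses the $(p+1)^2$ spherical-harmonic terms into $p+1$ Legendre terms. This is a minimal NumPy sketch assuming a single source and the unnormalized kernel. The function names are hypothetical, and this is not the paper's implementation. It also shows the trade-off mentioned above: source and target enter the same sum, so no target-independent coefficients are formed.

```python
import numpy as np


def legendre_values(p: int, t: float) -> np.ndarray:
    """Return [P_0(t), ..., P_p(t)] via (n + 1) P_{n+1} = (2n + 1) t P_n - n P_{n-1}."""
    vals = np.empty(p + 1)
    vals[0] = 1.0
    if p >= 1:
        vals[1] = t
    for n in range(1, p):
        vals[n + 1] = ((2 * n + 1) * t * vals[n] - n * vals[n - 1]) / (n + 1)
    return vals


def target_specific_laplace3d(source, center, target, p: int) -> float:
    """Order-p local expansion of 1/|target - source| about `center`, evaluated at `target`.

    Valid for |target - center| < |source - center|. The sum has p + 1 terms, so the
    work per source/target pair is O(p), versus (p + 1)**2 spherical-harmonic terms.
    """
    ys = np.asarray(source, dtype=float) - np.asarray(center, dtype=float)
    xs = np.asarray(target, dtype=float) - np.asarray(center, dtype=float)
    r_src, r_tgt = np.linalg.norm(ys), np.linalg.norm(xs)
    if r_tgt == 0.0:
        return 1.0 / r_src  # only the l = 0 term survives at the center
    cos_theta = float(xs @ ys) / (r_src * r_tgt)
    ls = np.arange(p + 1)
    terms = r_tgt**ls / r_src ** (ls + 1) * legendre_values(p, cos_theta)
    return float(np.sum(terms))


src, ctr, tgt = [1.0, 0.0, 0.0], [0.0, 0.0, 0.0], [0.2, 0.3, -0.1]
exact = 1.0 / np.linalg.norm(np.subtract(tgt, src))
for p in (2, 8, 32):
    print(p, abs(target_specific_laplace3d(src, ctr, tgt, p) - exact))
```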
Submitted 15 November, 2019; v1 submitted 2 November, 2018;
originally announced November 2018.
-
Multiscale Hydrophobic Lipid Dynamics Simulated by Efficient Integral Equation Methods
Authors:
Szu-Pei P. Fu,
Rolf J. Ryham,
Andreas Klöckner,
Matt Wala,
Shidong Jiang,
Yuan-Nan Young
Abstract:
In this paper, we first develop a mathematical model for long-range, hydrophobic attraction between amphiphilic particles. The non-pairwise interactions follow from the first variation of a hydrophobic attraction domain functional. The variation yields a hydrophobic stress that is used to numerically calculate trajectories for a collection of two-dimensional particles. The functional minimizer that accounts for hydrophobicity at molecular-aqueous interfaces is a solution to a boundary value problem of the screened Laplace equation. We reformulate the boundary value problem as a second-kind integral equation (SKIE), discretize the SKIE using a Nyström discretization and Quadrature by Expansion (QBX), and solve the resulting linear system iteratively using GMRES. We evaluate the required layer potentials using the GIGAQBX fast algorithm, a variant of the Fast Multipole Method (FMM), yielding the required particle interactions with asymptotically optimal cost. The entire scheme is adaptive, high-order, and capable of handling close-to-touching geometry. The simulated particle systems exhibit a variety of multiscale behaviors over both time and length: Over short time scales, the numerical results show self-assembly for model lipid particles. For large system simulations, the particles form realistic configurations like micelles and bilayers. Over long time scales, the bilayer shapes emerging from the simulation appear to minimize a form of bending energy.
Submitted 17 July, 2019; v1 submitted 9 October, 2018;
originally announced October 2018.
-
Multi-Rate Time Integration on Overset Meshes
Authors:
Cory Mikida,
Andreas Klöckner,
Daniel Bodony
Abstract:
Overset meshes are an effective tool for the computational fluid dynamic simulation of problems with complex geometries or multiscale spatio-temporal features. When the maximum allowable timestep on one or more meshes is significantly smaller than on the remaining meshes, standard explicit time integrators impose inefficiencies for time-accurate calculations by requiring that all meshes advance with the smallest timestep. With the targeted use of multi-rate time integrators, separate meshes can be time-marched at independent rates to avoid wasteful computation while maintaining accuracy and stability. This work applies time-explicit multi-rate integrators to the simulation of the compressible Navier-Stokes equations discretized on overset meshes using summation-by-parts (SBP) operators and simultaneous approximation term (SAT) boundary conditions. We introduce a class of multi-rate Adams-Bashforth (MRAB) schemes that offer significant stability improvements and computational efficiencies for SBP-SAT methods. We present numerical results that confirm the efficacy of MRAB integrators, outline a number of implementation challenges, and demonstrate a reduction in computational cost enabled by MRAB. We also investigate the use of our method in the setting of a large-scale distributed-memory parallel implementation where we discuss concerns involving load balancing and communication efficiency.
Submitted 10 July, 2019; v1 submitted 17 May, 2018;
originally announced May 2018.
-
A Fast Algorithm for Quadrature by Expansion in Three Dimensions
Authors:
Matt Wala,
Andreas Klöckner
Abstract:
This paper presents an accelerated quadrature scheme for the evaluation of layer potentials in three dimensions. Our scheme combines a generic, high-order quadrature method for singular kernels called Quadrature by Expansion (QBX) with a modified version of the Fast Multipole Method (FMM). Our scheme extends a recently developed formulation of the FMM for QBX in two dimensions, which, in that setting, achieves mathematically rigorous error and running time bounds. In addition to generalization to three dimensions, we highlight some algorithmic and mathematical opportunities for improved performance and stability. Lastly, we give numerical evidence supporting the accuracy, performance, and scalability of the algorithm through a series of experiments involving the Laplace and Helmholtz equations.
Submitted 29 March, 2019; v1 submitted 15 May, 2018;
originally announced May 2018.
-
High-order Finite Element--Integral Equation Coupling on Embedded Meshes
Authors:
Natalie N. Beams,
Andreas Klöckner,
Luke N. Olson
Abstract:
This paper presents a high-order method for solving an interface problem for the Poisson equation on embedded meshes through a coupled finite element and integral equation approach. The method is capable of handling homogeneous or inhomogeneous jump conditions without modification and retains high-order convergence close to the embedded interface. We present finite element-integral equation (FE-IE) formulations for interior, exterior, and interface problems. The treatments of the exterior and interface problems are new. The resulting linear systems are solved through an iterative approach exploiting the second-kind nature of the IE operator combined with algebraic multigrid preconditioning for the FE part. Assuming smooth continuations of coefficients and right-hand-side data, we show error analysis supporting high-order accuracy. Numerical evidence further supports our claims of efficiency and high-order accuracy for smooth data.
Submitted 16 August, 2018; v1 submitted 8 April, 2018;
originally announced April 2018.
-
A Fast Algorithm with Error Bounds for Quadrature by Expansion
Authors:
Matt Wala,
Andreas Klöckner
Abstract:
Quadrature by Expansion (QBX) is a quadrature method for approximating the value of the singular integrals encountered in the evaluation of layer potentials. It exploits the smoothness of the layer potential by forming locally valid expansions, which are then evaluated to compute the near- or on-surface value of the integral. Recent work on coupling a Fast Multipole Method (FMM) to QBX yielded a first step towards the rapid evaluation of such integrals (and the solution of related integral equations), albeit with only empirically understood error behavior. In this paper, we improve upon this approach with a modified algorithm for which we give a comprehensive analysis of error and cost in the case of the Laplace equation in two dimensions. For the same levels of (user-specified) accuracy, the new algorithm empirically has cost-per-accuracy comparable to prior approaches. We provide experimental results to demonstrate scalability and numerical accuracy.
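The toy sketch below makes the QBX idea concrete for a 2D Laplace single-layer potential $\int \log|x-y|\,\sigma(y)\,ds(y)$ on the unit circle. It forms a local expansion about an off-surface center and evaluates it at an on-surface target. It assumes a unit-circle geometry, the unnormalized kernel $\log|x-y|$, and the trapezoidal rule, with no FMM acceleration or error control. It is therefore not the paper's algorithm, and the function name and parameters are hypothetical. For $\sigma = \cos\theta$ the exact on-surface value is $-\pi\cos\varphi$, which serves as a check.

```python
import numpy as np


def qbx_single_layer_unit_circle(density, phi_target, n_nodes=400, order=8, radius=0.2):
    """Toy QBX evaluation of S[sigma](x) = int log|x - y| sigma(y) ds(y) on the unit circle.

    A local expansion of order `order` is formed about the off-surface center
    c = (1 - radius) e^{i phi_target} with the trapezoidal rule (accurate there, since c
    is a distance `radius` from the curve) and evaluated at the on-surface target
    x = e^{i phi_target}. Points of the plane are represented as complex numbers.
    """
    theta = 2 * np.pi * np.arange(n_nodes) / n_nodes
    y = np.exp(1j * theta)                                 # source nodes
    weights = (2 * np.pi / n_nodes) * density(theta)       # ds = d(theta) here
    x = np.exp(1j * phi_target)                            # on-surface target
    c = (1 - radius) * x                                   # expansion center
    # log|x - y| = Re[log(y - c) - sum_{n >= 1} ((x - c) / (y - c))**n / n]
    value = np.sum(weights * np.log(np.abs(y - c)))
    for n in range(1, order + 1):
        coeff = -np.sum(weights / (y - c) ** n) / n         # n-th local coefficient
        value += np.real(coeff * (x - c) ** n)
    return float(value)


# For sigma = cos(theta), the exact on-surface value is -pi * cos(phi).
phi = 0.7
print(qbx_single_layer_unit_circle(np.cos, phi), -np.pi * np.cos(phi))
```

The quadrature is only applied at the center $c$, where the integrand is smooth. The singularity at the target is never sampled directly.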
Submitted 4 March, 2020; v1 submitted 12 January, 2018;
originally announced January 2018.
-
Array Program Transformation with Loo.py by Example: High-Order Finite Elements
Authors:
Andreas Klöckner,
Lucas C. Wilcox,
T. Warburton
Abstract:
To concisely and effectively demonstrate the capabilities of our program transformation system Loo.py, we examine a transformation path from two real-world Fortran subroutines as found in a weather model to a single high-performance computational kernel suitable for execution on modern GPU hardware. Along the transformation path, we encounter kernel fusion, vectorization, prefetching, parallelization, and algorithmic changes achieved by mechanized conversion between imperative and functional/substitution-based code, among others. We conclude with performance results that demonstrate the effects and support the effectiveness of the applied transformations.
Submitted 13 April, 2016;
originally announced April 2016.
-
Fast algorithms for Quadrature by Expansion I: Globally valid expansions
Authors:
Manas Rachh,
Andreas Klöckner,
Michael O'Neil
Abstract:
The use of integral equation methods for the efficient numerical solution of PDE boundary value problems requires two main tools: quadrature rules for the evaluation of layer potential integral operators with singular kernels, and fast algorithms for solving the resulting dense linear systems. Classically, these tools were developed separately. In this work, we present a unified numerical scheme based on coupling Quadrature by Expansion, a recent quadrature method, to a customized Fast Multipole Method (FMM) for the Helmholtz equation in two dimensions. The method allows the evaluation of layer potentials in linear-time complexity, anywhere in space, with a uniform, user-chosen level of accuracy as a black-box computational method.
Providing this capability requires geometric and algorithmic considerations beyond the needs of standard FMMs as well as careful consideration of the accuracy of multipole translations. We illustrate the speed and accuracy of our method with various numerical examples.
Keywords: Layer Potentials; Singular Integrals; Quadrature; High-order accuracy; Integral equations; Helmholtz equation; Fast multipole method.
Submitted 21 February, 2017; v1 submitted 17 February, 2016;
originally announced February 2016.
-
Conformal Mapping via a Density Correspondence for the Double-Layer Potential
Authors:
Matt Wala,
Andreas Klöckner
Abstract:
We derive a representation formula for harmonic polynomials and Laurent polynomials in terms of densities of the double-layer potential on bounded piecewise smooth and simply connected domains. From this result, we obtain a method for the numerical computation of conformal maps that applies to both exterior and interior regions. We present analysis and numerical experiments supporting the accuracy and broad applicability of the method.
Submitted 2 November, 2018; v1 submitted 15 February, 2016;
originally announced February 2016.
-
Loo.py: transformation-based code generation for GPUs and CPUs
Authors:
Andreas Klöckner
Abstract:
Today's highly heterogeneous computing landscape places a burden on programmers wanting to achieve high performance on a reasonably broad cross-section of machines. To do so, computations need to be expressed in many different but mathematically equivalent ways, with, in the worst case, one variant per target machine.
Loo.py, a programming system embedded in Python, meets this challenge by defining a data model for array-style computations and a library of transformations that operate on this model. Offering transformations such as loop tiling, vectorization, storage management, unrolling, instruction-level parallelism, change of data layout, and many more, it provides a convenient way to capture, parametrize, and re-unify the growth among code variants. Optional, deep integration with numpy and PyOpenCL provides a convenient computing environment where the transition from prototype to high-performance implementation can occur in a gradual, machine-assisted form.
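The plain-Python sketch below illustrates one such transformation: loop tiling, which splits a loop index $i$ into an outer and an inner index and guards the final partial tile. It applies the split to a simple array computation and checks that the result is unchanged. It does not use Loo.py's API, and the function names and tile size are hypothetical.

```python
def saxpy_reference(a, x, y):
    """Untransformed loop: out[i] = a * x[i] + y[i] for 0 <= i < n."""
    return [a * xi + yi for xi, yi in zip(x, y)]


def saxpy_tiled(a, x, y, tile=4):
    """The same loop after splitting i into i = tile * i_outer + i_inner."""
    n = len(x)
    out = [0.0] * n
    for i_outer in range((n + tile - 1) // tile):  # ceil(n / tile) tiles
        for i_inner in range(tile):
            i = tile * i_outer + i_inner
            if i < n:  # guard for the last, possibly partial, tile
                out[i] = a * x[i] + y[i]
    return out


x = [float(i) for i in range(10)]
y = [1.0] * 10
print(saxpy_tiled(2.0, x, y, tile=4) == saxpy_reference(2.0, x, y))
```

In GPU code generated from such a split, the outer and inner indices would typically be mapped to work-group and work-item indices.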
Submitted 29 May, 2014;
originally announced May 2014.
-
Visualizing Skin Effects in Conductors with MRI: ${}^7$Li MRI Experiments and Calculations
Authors:
Andrew J. Ilott,
S. Chandrashekar,
Andreas Klöckner,
Hee Jung Chang,
Nicole M. Trease,
Clare P. Grey,
Leslie Greengard,
Alexej Jerschow
Abstract:
While experiments on metals have been performed since the early days of NMR (and DNP), the use of bulk metal is normally avoided. Instead, powders have often been used in combination with low fields, so that skin depth effects could be neglected. Another complicating factor in acquiring NMR spectra or MRI images of bulk metal is the strong signal dependence on the orientation between the sample and the radio frequency (RF) coil, leading to non-intuitive image distortions and inaccurate quantification. Such factors are particularly important for NMR and MRI of batteries and other electrochemical devices. Here, we show results from a systematic study combining RF field calculations with experimental MRI of $^7$Li metal to visualize skin depth effects directly and to analyze the RF field orientation effect on MRI of bulk metal. It is shown that a certain degree of selectivity can be achieved for particular faces of the metal, simply based on the orientation of the sample. By combining RF field calculations with bulk magnetic susceptibility calculations, accurate NMR spectra can be obtained from first principles. Such analyses will become valuable in many applications involving battery systems, and for metals in general.
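For a good conductor, the classical skin depth formula is $\delta = \sqrt{2\rho/(\omega\mu)}$, with resistivity $\rho$, angular frequency $\omega$, and permeability $\mu$. The sketch below evaluates it. The inputs are assumptions, not values taken from the paper: a lithium resistivity of about $9.3\times10^{-8}\,\Omega\,\mathrm{m}$ at room temperature, and a ${}^7$Li frequency of about 155.5 MHz, corresponding to a 9.4 T magnet.

```python
import math

MU_0 = 4e-7 * math.pi  # vacuum permeability in H/m; lithium is treated as non-magnetic


def skin_depth(resistivity_ohm_m: float, frequency_hz: float, mu_r: float = 1.0) -> float:
    """Classical skin depth delta = sqrt(2 rho / (omega mu)) of a good conductor, in meters."""
    omega = 2 * math.pi * frequency_hz
    return math.sqrt(2 * resistivity_ohm_m / (omega * mu_r * MU_0))


# Assumed inputs (not from the paper): Li resistivity ~9.3e-8 ohm m, 7Li at ~155.5 MHz.
delta = skin_depth(9.3e-8, 155.5e6)
print(f"{delta * 1e6:.1f} micrometers")
```

With these inputs the skin depth is roughly 12 micrometers, so the RF field penetrates only a thin surface layer of bulk metal. Because $\delta \propto 1/\sqrt{f}$, moving to a higher field shrinks this layer further.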
Submitted 24 March, 2014;
originally announced March 2014.
-
Solving Wave Equations on Unstructured Geometries
Authors:
Andreas Klöckner,
Timothy Warburton,
Jan S. Hesthaven
Abstract:
Waves are all around us--be it in the form of sound, electromagnetic radiation, water waves, or earthquakes. Their study is an important basic tool across engineering and science disciplines. Every wave solver serving the computational study of waves faces a trade-off between two figures of merit--its computational speed and its accuracy. Discontinuous Galerkin (DG) methods fall on the high-accuracy end of this spectrum. Fortuitously, their computational structure is so ideally suited to GPUs that they also achieve very high computational speeds. In other words, the use of DG methods on GPUs significantly lowers the cost of obtaining accurate solutions. This article aims to give the reader an easy on-ramp to the use of this technology, based on a sample implementation which demonstrates a highly accurate, GPU-capable, real-time visualizing finite element solver in about 1500 lines of code.
Submitted 19 April, 2013;
originally announced April 2013.
-
On the convergence of local expansions of layer potentials
Authors:
Charles L. Epstein,
Leslie Greengard,
Andreas Klöckner
Abstract:
In a recently developed quadrature method (quadrature by expansion or QBX), it was demonstrated that weakly singular or singular layer potentials can be evaluated rapidly and accurately on surface by making use of local expansions about carefully chosen off-surface points. In this paper, we derive estimates for the rate of convergence of these local expansions, providing the analytic foundation for the QBX method. The estimates may also be of mathematical interest, particularly for microlocal or asymptotic analysis in potential theory.
Submitted 22 April, 2013; v1 submitted 16 December, 2012;
originally announced December 2012.
-
High-Order Discontinuous Galerkin Methods by GPU Metaprogramming
Authors:
Andreas Klöckner,
Timothy Warburton,
Jan S. Hesthaven
Abstract:
Discontinuous Galerkin (DG) methods for the numerical solution of partial differential equations have enjoyed considerable success because they are both flexible and robust: They allow arbitrary unstructured geometries and easy control of accuracy without compromising simulation stability. In a recent publication, we have shown that DG methods also adapt readily to execution on modern, massively parallel graphics processors (GPUs). A number of qualities of the method contribute to this suitability, ranging from locality of reference, through regularity of access patterns, to high arithmetic intensity. In this article, we illuminate a few of the more practical aspects of bringing DG onto a GPU, including the use of a Python-based metaprogramming infrastructure that was created specifically to support DG, but has found many uses across all disciplines of computational science.
Submitted 2 November, 2012;
originally announced November 2012.
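The metaprogramming idea mentioned in the abstract above, generating code specialized to fixed problem parameters at run time, can be illustrated with a toy sketch: a small dense matrix-vector product is specialized to one fixed matrix by emitting source text and compiling it on the fly. This is only an analogy under stated assumptions. Plain Python source and `exec` stand in for the GPU kernels and compiler the article describes, and `generate_matvec_source` and `compile_generated` are hypothetical names, not part of that infrastructure.

```python
import numpy as np


def generate_matvec_source(matrix, name="matvec_unrolled"):
    """Emit Python source for y = matrix @ x with the matrix entries baked in
    as literals, zero entries skipped, and all loops unrolled.
    (Hypothetical helper for illustration only.)"""
    lines = [f"def {name}(x):"]
    for i, row in enumerate(matrix):
        # float(...) turns NumPy scalars into plain floats with a round-trip repr.
        terms = [f"{float(v)!r} * x[{j}]" for j, v in enumerate(row) if v != 0]
        lines.append(f"    y{i} = {' + '.join(terms) or '0.0'}")
    outputs = ", ".join(f"y{i}" for i in range(len(matrix)))
    lines.append(f"    return [{outputs}]")
    return "\n".join(lines)


def compile_generated(source, name="matvec_unrolled"):
    """Compile generated source at run time and return the new function."""
    namespace = {}
    exec(source, namespace)
    return namespace[name]


m = np.array([[1.0, 0.0, -2.5], [0.25, 3.0, 0.0]])
source = generate_matvec_source(m)
print(source)
kernel = compile_generated(source)
print(kernel([1.0, 2.0, 4.0]))   # [-9.0, 6.25], the same as m @ [1, 2, 4]
```

Baking the entries and sizes into the generated code makes them compile-time constants for whatever compiler consumes the source. In a GPU setting, the same approach would fix problem parameters such as the polynomial order before compilation.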
-
Quadrature by Expansion: A New Method for the Evaluation of Layer Potentials
Authors:
Andreas Klöckner,
Alexander Barnett,
Leslie Greengard,
Michael O'Neil
Abstract:
Integral equation methods for the solution of partial differential equations, when coupled with suitable fast algorithms, yield geometrically flexible, asymptotically optimal and well-conditioned schemes in either interior or exterior domains. The practical application of these methods, however, requires the accurate evaluation of boundary integrals with singular, weakly singular or nearly singular kernels. Historically, these issues have been handled either by low-order product integration rules (computed semi-analytically), by singularity subtraction/cancellation, by kernel regularization and asymptotic analysis, or by the construction of special purpose "generalized Gaussian quadrature" rules. In this paper, we present a systematic, high-order approach that works for any singularity (including hypersingular kernels), based only on the assumption that the field induced by the integral operator is locally smooth when restricted to either the interior or the exterior. Discontinuities in the field across the boundary are permitted. The scheme, denoted QBX (quadrature by expansion), is easy to implement and compatible with fast hierarchical algorithms such as the fast multipole method. We include accuracy tests for a variety of integral operators in two dimensions on smooth and corner domains.
Submitted 17 March, 2013; v1 submitted 18 July, 2012;
originally announced July 2012.
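The core QBX idea summarized in the abstract above can be sketched as follows. A local expansion is formed about a center placed off the surface, its coefficients are computed with an ordinary smooth quadrature, and the truncated expansion is evaluated back at the on-surface target. This is a minimal sketch under stated assumptions, not the paper's implementation. It uses the 2D Laplace single-layer potential on the unit circle, written through the complex logarithm. A single global trapezoidal rule stands in for the paper's quadrature. The center distance `r`, expansion `order`, and node count `n_nodes` are hypothetical choices. For the density cos θ, the exact on-surface value is cos(θ₀)/2, which provides a check.

```python
import numpy as np


def qbx_single_layer_circle(density, theta_target, r=0.1, order=6, n_nodes=1000):
    """Approximate the 2D Laplace single-layer potential
        S[sigma](x) = -(1/(2*pi)) * integral of log|x - y| sigma(y) ds_y
    at the on-surface point x = exp(1j*theta_target) of the unit circle,
    using one QBX-style local expansion (illustrative sketch)."""
    # Global periodic trapezoidal rule on the unit circle (|dw/dtheta| = 1).
    theta = 2 * np.pi * np.arange(n_nodes) / n_nodes
    w = np.exp(1j * theta)                 # source points as complex numbers
    weight = 2 * np.pi / n_nodes
    sigma = density(theta)

    # Target on the curve; the expansion center sits a distance r along the
    # inward normal (for the unit circle, the inward normal at z is -z).
    z = np.exp(1j * theta_target)
    c = (1 - r) * z

    # For |z - c| < |w - c|:
    #   log(w - z) = log(w - c) - sum_{l>=1} ((z - c)/(w - c))**l / l.
    # Each coefficient is a smooth integral because c stays off the curve.
    # Truncating after integration, rather than expanding the kernel
    # pointwise, is what permits evaluation at the on-surface target.
    coeffs = np.empty(order + 1, dtype=complex)
    coeffs[0] = -weight * np.sum(sigma * np.log(w - c)) / (2 * np.pi)
    for l in range(1, order + 1):
        coeffs[l] = weight * np.sum(sigma * (w - c) ** (-l)) / (2 * np.pi * l)

    # Evaluate the truncated expansion at the target. Only the real part is
    # used, and it does not depend on the branch chosen for the complex log.
    powers = (z - c) ** np.arange(order + 1)
    return float(np.real(np.dot(coeffs, powers)))


# For sigma = cos(theta), the exact on-surface value is cos(theta_target) / 2.
approx = qbx_single_layer_circle(np.cos, 0.7)
print(approx, np.cos(0.7) / 2)
```

Because the center lies a distance `r` off the curve, the coefficient integrands remain smooth, and the trapezoidal rule resolves them even though the target itself lies on the curve.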
-
A consistency condition for the vector potential in multiply-connected domains
Authors:
Charles L. Epstein,
Zydrunas Gimbutas,
Leslie Greengard,
Andreas Klöckner,
Michael O'Neil
Abstract:
A classical problem in electromagnetics concerns the representation of the electric and magnetic fields in the low-frequency or static regime, where topology plays a fundamental role. For multiply connected conductors, at zero frequency the standard boundary conditions on the tangential components of the magnetic field do not uniquely determine the vector potential. We describe a (gauge-invariant) consistency condition that overcomes this non-uniqueness and resolves a longstanding difficulty in inverting the magnetic field integral equation.
Submitted 18 March, 2012;
originally announced March 2012.
-
Viscous Shock Capturing in a Time-Explicit Discontinuous Galerkin Method
Authors:
Andreas Klöckner,
Tim Warburton,
Jan S. Hesthaven
Abstract:
We present a novel, cell-local shock detector for use with discontinuous Galerkin (DG) methods. The output of this detector is a reliably scaled, element-wise smoothness estimate which is suited as a control input to a shock capture mechanism. Using an artificial viscosity in the latter role, we obtain a DG scheme for the numerical solution of nonlinear systems of conservation laws. Building on work by Persson and Peraire, we thoroughly justify the detector's design and analyze its performance on a number of benchmark problems. We further explain the scaling and smoothing steps necessary to turn the output of the detector into a local, artificial viscosity. We close by providing an extensive array of numerical tests of the detector in use.
Submitted 18 March, 2011; v1 submitted 15 February, 2011;
originally announced February 2011.
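The abstract above builds on work by Persson and Peraire. Their baseline element-wise smoothness estimate is the fraction of an element's L2 energy carried by its highest modal coefficient, taken on a log10 scale. The sketch below computes this baseline quantity in 1D. It is not the detector developed in the paper, which adds its own scaling and smoothing steps. The orthonormal Legendre basis, the Gauss-Legendre nodes, and the names `modal_coefficients` and `smoothness_indicator` are assumptions made for illustration.

```python
import numpy as np
from numpy.polynomial import legendre


def modal_coefficients(f, order, a, b):
    """Coefficients, in the orthonormal Legendre basis, of the degree-`order`
    polynomial interpolating f at the Gauss-Legendre nodes of [a, b].

    (order + 1)-point Gauss quadrature is exact up to degree 2*order + 1,
    so the discrete inner products below are exact for the interpolant."""
    r, wq = legendre.leggauss(order + 1)          # reference element [-1, 1]
    x = a + (r + 1) * (b - a) / 2                 # physical nodes
    fx = f(x)
    coeffs = np.empty(order + 1)
    for n in range(order + 1):
        unit = np.zeros(n + 1)
        unit[n] = 1.0
        p_n = legendre.legval(r, unit) * np.sqrt((2 * n + 1) / 2)  # orthonormal P_n
        coeffs[n] = np.sum(wq * fx * p_n)
    return coeffs


def smoothness_indicator(coeffs):
    """Persson-Peraire style estimate: log10 of the fraction of the element's
    L2 energy carried by the highest mode (more negative means smoother)."""
    return np.log10(coeffs[-1] ** 2 / np.sum(coeffs ** 2))


smooth = modal_coefficients(np.sin, 4, 0.0, 0.5)
jump = modal_coefficients(lambda x: np.where(x > 0.1, 1.0, 0.0), 4, -0.5, 0.5)
print(smoothness_indicator(smooth), smoothness_indicator(jump))
```

For a smooth function, the highest mode carries almost no energy. An element containing a jump places a sizable share there, so the indicator rises by several orders of magnitude. This contrast is what a shock-capture mechanism can key on.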
-
Deterministic Numerical Schemes for the Boltzmann Equation
Authors:
Akil Narayan,
Andreas Klöckner
Abstract:
This article describes methods for the deterministic simulation of the collisional Boltzmann equation. It presumes that the transport and collision parts of the equation are to be simulated separately in the time domain. Time-stepping schemes to achieve the splitting as well as numerical methods for each part of the operator are reviewed, with an emphasis on clearly exposing the challenges posed by the equation as well as their resolution by various schemes.
Submitted 18 November, 2009;
originally announced November 2009.
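The abstract above assumes that the transport and collision parts are advanced separately. The effect of the splitting scheme on accuracy can be seen on a toy linear ODE u' = (A + B)u, where each part is solved exactly. Two standard compositions are compared: Lie splitting (first order) and Strang splitting (second order). This is a minimal sketch, not a Boltzmann solver. The 2x2 matrices `A` ("transport", a rotation) and `B` ("collision", a decay) are hypothetical stand-ins chosen so that they do not commute, and all function names are illustrative.

```python
import numpy as np


def expm_eig(m, t):
    """exp(t*m) for a diagonalizable matrix m, via its eigendecomposition."""
    vals, vecs = np.linalg.eig(m)
    return np.real(vecs @ np.diag(np.exp(t * vals)) @ np.linalg.inv(vecs))


def split_solve(u0, a, b, t_final, n_steps, scheme):
    """Integrate u' = (a + b) u by solving the a- and b-subproblems exactly
    and composing them with Lie or Strang splitting."""
    dt = t_final / n_steps
    ea, ea_half, eb = expm_eig(a, dt), expm_eig(a, dt / 2), expm_eig(b, dt)
    u = u0.copy()
    for _ in range(n_steps):
        if scheme == "lie":          # a for dt, then b for dt
            u = eb @ (ea @ u)
        elif scheme == "strang":     # a for dt/2, b for dt, a for dt/2
            u = ea_half @ (eb @ (ea_half @ u))
        else:
            raise ValueError(scheme)
    return u


# Hypothetical stand-ins: a rotation for "transport", decay for "collision".
A = np.array([[0.0, 1.0], [-1.0, 0.0]])
B = np.diag([-1.0, -2.0])
u0 = np.array([1.0, 0.5])
exact = expm_eig(A + B, 1.0) @ u0

ratios = {}
for scheme in ("lie", "strang"):
    errs = [np.linalg.norm(split_solve(u0, A, B, 1.0, n, scheme) - exact)
            for n in (50, 100)]
    ratios[scheme] = errs[0] / errs[1]   # roughly 2**order when dt is halved
print(ratios)
```

Halving the step roughly halves the Lie error and quarters the Strang error, even though each subproblem is solved exactly. The error comes entirely from the non-commuting parts of the operator.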
-
Nodal Discontinuous Galerkin Methods on Graphics Processors
Authors:
Andreas Klöckner,
Tim Warburton,
Jeffrey Bridge,
Jan S. Hesthaven
Abstract:
Discontinuous Galerkin (DG) methods for the numerical solution of partial differential equations have enjoyed considerable success because they are both flexible and robust: They allow arbitrary unstructured geometries and easy control of accuracy without compromising simulation stability. Lately, another property of DG has been growing in importance: The majority of a DG operator is applied in an element-local way, with weak penalty-based element-to-element coupling.
The resulting locality in memory access is one of the factors that enables DG to run on off-the-shelf, massively parallel graphics processors (GPUs). In addition, DG's high-order nature lets it require fewer data points per represented wavelength and hence fewer memory accesses, in exchange for higher arithmetic intensity. Both of these factors work significantly in favor of a GPU implementation of DG.
Using a single US$400 Nvidia GTX 280 GPU, we accelerate a solver for Maxwell's equations on a general 3D unstructured grid by a factor of 40 to 60 relative to a serial computation on a current-generation CPU. In many cases, our algorithms exhibit full use of the device's available memory bandwidth. Example computations achieve and surpass 200 GFLOP/s of net application-level floating-point work.
In this article, we describe and derive the techniques used to reach this level of performance. In addition, we present comprehensive data on the accuracy and runtime behavior of the method.
Submitted 3 April, 2009; v1 submitted 8 January, 2009;
originally announced January 2009.
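The element-local part of a DG operator, described in the abstract above, has a very regular structure. Every element applies the same small reference matrix to its own nodal values, so all elements together reduce to one dense batched product. The sketch below shows this for a 1D nodal derivative. It is a minimal illustration under stated assumptions, not the paper's GPU kernels. It omits the weak, penalty-based element-to-element flux coupling entirely and works in 1D instead of 3D. The Chebyshev-Gauss-Lobatto nodes and the name `differentiation_matrix` are hypothetical choices.

```python
import numpy as np


def differentiation_matrix(r):
    """Nodal differentiation matrix D[i, j] = l_j'(r_i) for the Lagrange basis
    on nodes r, built from barycentric weights."""
    diffs = r[:, None] - r[None, :]
    np.fill_diagonal(diffs, 1.0)                  # avoid dividing by zero
    w = 1.0 / np.prod(diffs, axis=1)              # barycentric weights
    d = (w[None, :] / w[:, None]) / diffs         # off-diagonal entries
    np.fill_diagonal(d, 0.0)
    np.fill_diagonal(d, -d.sum(axis=1))           # rows annihilate constants
    return d


order = 4
r = -np.cos(np.pi * np.arange(order + 1) / order)   # Chebyshev-Gauss-Lobatto nodes
D = differentiation_matrix(r)

n_elements = 8
vertices = np.linspace(0.0, 1.0, n_elements + 1)
h = np.diff(vertices)
# Column k holds the nodal coordinates of element k.
x = vertices[:-1] + (r[:, None] + 1) / 2 * h
u = x ** 3

# Element-local derivative of all elements at once: a single dense product
# with the shared reference matrix, scaled by each element's dr/dx = 2/h.
du_dx = (D @ u) * (2 / h)
print(np.max(np.abs(du_dx - 3 * x ** 2)))
```

All elements share one small matrix and touch only their own column of data. This locality of memory access is the property the abstract credits for DG's fit to GPUs. The inter-element coupling, which this sketch leaves out, involves only face values.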