Search | arXiv e-print repository

A stable decoupled perfectly matched layer for the 3D wave equation using the nodal discontinuous Galerkin method

Authors: Sophia Julia Feriani, Matthias Cosnefroy, Allan Peter Engsig-Karup, Tim Warburton, Finnur Pind, Cheol-Ho Jeong

Abstract: In outdoor acoustics, the calculations of sound propagating in air can be computationally heavy if the domain is chosen large enough to fulfil the Sommerfeld radiation condition. By strategically truncating the computational domain with an efficient boundary treatment, the computational cost is lowered. One commonly used boundary treatment is the perfectly matched layer (PML) that dampens outgoing waves without polluting the computed solution in the inner domain. The purpose of this study is to propose and assess a new perfectly matched layer formulation for the 3D acoustic wave equation, using the nodal discontinuous Galerkin finite element method. The formulation is based on an efficient PML formulation that can be decoupled to further increase the computational efficiency and guarantee stability without sacrificing accuracy. This decoupled PML formulation is demonstrated to be long-time stable and an optimization procedure of the damping functions is proposed to enhance the performance of the formulation.

Submitted 12 April, 2024; originally announced April 2024.

arXiv:2305.10965 [pdf, other]

Stopping Criteria for the Conjugate Gradient Algorithm in High-Order Finite Element Methods

Authors: Yichen Guo, Eric de Sturler, Tim Warburton

Abstract: We introduce three new stopping criteria that balance algebraic and discretization errors for the conjugate gradient algorithm applied to high-order finite element discretizations of Poisson problems. The current state of the art stopping criteria compare a posteriori estimates of discretization error against estimates of the algebraic error. Firstly, we propose a new error indicator derived from a recovery-based error estimator that is less computationally expensive and more reliable. Secondly, we introduce a new stopping criterion that suggests stopping when the norm of the linear residual is less than a small fraction of an error indicator derived directly from the residual. This indicator shares the same mesh size and polynomial degree scaling as the norm of the residual, resulting in a robust criterion regardless of the mesh size, the polynomial degree, and the shape regularity of the mesh. Thirdly, in solving Poisson problems with highly variable piecewise constant coefficients, we introduce a subdomain-based criterion that recommends stopping when the norm of the linear residual restricted to each subdomain is smaller than the corresponding indicator also restricted to that subdomain. Numerical experiments, including tests with anisotropic meshes and highly variable piecewise constant coefficients, demonstrate that the proposed criteria efficiently avoid both premature termination and over-solving.

Submitted 18 May, 2023; originally announced May 2023.

Comments: 22 pages, 11 figures

MSC Class: 65N30; 65N22; 65F10

arXiv:2203.10238 [pdf, other]

On the entropy projection and the robustness of high order entropy stable discontinuous Galerkin schemes for under-resolved flows

Authors: Jesse Chan, Hendrik Ranocha, Andres Rueda-Ramirez, Gregor Gassner, Tim Warburton

Abstract: High order entropy stable schemes provide improved robustness for computational simulations of fluid flows. However, additional stabilization and positivity preserving limiting can still be required for variable-density flows with under-resolved features. We demonstrate numerically that entropy stable DG methods which incorporate an "entropy projection" are less likely to require additional limiting to retain positivity for certain types of flows. We conclude by investigating potential explanations for this observed improvement in robustness.

Submitted 19 March, 2022; originally announced March 2022.

arXiv:2109.05072 [pdf, other]

GPU Algorithms for Efficient Exascale Discretizations

Authors: Ahmad Abdelfattah, Valeria Barra, Natalie Beams, Ryan Bleile, Jed Brown, Jean-Sylvain Camier, Robert Carson, Noel Chalmers, Veselin Dobrev, Yohann Dudouit, Paul Fischer, Ali Karakus, Stefan Kerkemeier, Tzanio Kolev, Yu-Hsiang Lan, Elia Merzari, Misun Min, Malachi Phillips, Thilina Rathnayake, Robert Rieben, Thomas Stitt, Ananias Tomboulides, Stanimire Tomov, Vladimir Tomov, Arturo Vargas, et al. (2 additional authors not shown)

Abstract: In this paper we describe the research and development activities in the Center for Efficient Exascale Discretization within the US Exascale Computing Project, targeting state-of-the-art high-order finite-element algorithms for high-order applications on GPU-accelerated platforms. We discuss the GPU developments in several components of the CEED software stack, including the libCEED, MAGMA, MFEM, libParanumal, and Nek projects. We report performance and capability improvements in several CEED-enabled applications on both NVIDIA and AMD GPU systems.

Submitted 10 September, 2021; originally announced September 2021.

arXiv:2109.04996 [pdf, other]

doi 10.1177/10943420211020803

Efficient Exascale Discretizations: High-Order Finite Element Methods

Authors: Tzanio Kolev, Paul Fischer, Misun Min, Jack Dongarra, Jed Brown, Veselin Dobrev, Tim Warburton, Stanimire Tomov, Mark S. Shephard, Ahmad Abdelfattah, Valeria Barra, Natalie Beams, Jean-Sylvain Camier, Noel Chalmers, Yohann Dudouit, Ali Karakus, Ian Karlin, Stefan Kerkemeier, Yu-Hsiang Lan, David Medina, Elia Merzari, Aleksandr Obabko, Will Pazner, Thilina Rathnayake, Cameron W. Smith, et al. (5 additional authors not shown)

Abstract: Efficient exploitation of exascale architectures requires rethinking of the numerical algorithms used in many large-scale applications. These architectures favor algorithms that expose ultra fine-grain parallelism and maximize the ratio of floating point operations to energy intensive data movement. One of the few viable approaches to achieve high efficiency in the area of PDE discretizations on unstructured grids is to use matrix-free/partially-assembled high-order finite element methods, since these methods can increase the accuracy and/or lower the computational time due to reduced data motion. In this paper we provide an overview of the research and development activities in the Center for Efficient Exascale Discretizations (CEED), a co-design center in the Exascale Computing Project that is focused on the development of next-generation discretization software and algorithms to enable a wide range of finite element applications to run efficiently on future hardware. CEED is a research partnership involving more than 30 computational scientists from two US national labs and five universities, including members of the Nek5000, MFEM, MAGMA and PETSc projects. We discuss the CEED co-design activities based on targeted benchmarks, miniapps and discretization libraries and our work on performance optimizations for large-scale GPU architectures. We also provide a broad overview of research and development activities in areas such as unstructured adaptive mesh refinement algorithms, matrix-free linear solvers, high-order data visualization, and list examples of collaborations with several ECP and external applications.

Submitted 10 September, 2021; originally announced September 2021.

Comments: 22 pages, 18 figures

arXiv:2108.08381 [pdf, other]

A Local Discontinuous Galerkin Level Set Reinitialization with Subcell Stabilization on Unstructured Meshes

Authors: Ali Karakus, Noel Chalmers, Tim Warburton

Abstract: In this paper we consider a level set reinitialization technique based on a high-order, local discontinuous Galerkin method on unstructured triangular meshes. A finite volume based subcell stabilization is used to improve the nonlinear stability of the method. Instead of the standard hyperbolic level set reinitialization, the flow of time Eikonal equation is discretized to construct an approximate signed distance function. Using the Eikonal equation removes the regularization parameter in the standard approach which allows more predictable behavior and faster convergence speeds around the interface. This makes our approach very efficient especially for banded level set formulations. A set of numerical experiments including both smooth and non-smooth interfaces indicate that the method experimentally achieves design order accuracy.

Submitted 18 August, 2021; originally announced August 2021.

Comments: 19 pages, 10 figures

arXiv:2011.11089 [pdf, other]

Entropy stable modal discontinuous Galerkin schemes and wall boundary conditions for the compressible Navier-Stokes equations

Authors: Jesse Chan, Yimin Lin, Tim Warburton

Abstract: Entropy stable schemes ensure that physically meaningful numerical solutions also satisfy a semi-discrete entropy inequality under appropriate boundary conditions. In this work, we describe a discretization of viscous terms in the compressible Navier-Stokes equations which enables a simple and explicit imposition of entropy stable no-slip (adiabatic and isothermal) and reflective (symmetry) wall boundary conditions for discontinuous Galerkin (DG) discretizations. Numerical results confirm the robustness and accuracy of the proposed approaches.

Submitted 22 November, 2020; originally announced November 2020.

arXiv:2009.10917 [pdf, ps, other]

Portable high-order finite element kernels I: Streaming Operations

Authors: Noel Chalmers, Tim Warburton

Abstract: This paper is devoted to the development of highly efficient kernels performing vector operations relevant in linear system solvers. In particular, we focus on the low arithmetic intensity operations (i.e., streaming operations) performed within the conjugate gradient iterative method, using the parameters specified in the CEED benchmark problems for high-order hexahedral finite elements. We propose a suite of new Benchmark Streaming tests to focus on the distinct streaming operations which must be performed. We implemented these new tests using the OCCA abstraction framework to demonstrate portability of these streaming operations on different GPU architectures, and propose a simple performance model for such kernels which can accurately capture data movement rates as well as kernel launch costs.

Submitted 22 September, 2020; originally announced September 2020.

arXiv:2009.10863 [pdf, other]

Initial Guesses for Sequences of Linear Systems in a GPU-Accelerated Incompressible Flow Solver

Authors: Anthony P. Austin, Noel Chalmers, Tim Warburton

Abstract: We consider several methods for generating initial guesses when iteratively solving sequences of linear systems, showing that they can be implemented efficiently in GPU-accelerated PDE solvers, specifically solvers for incompressible flow. We propose new initial guess methods based on stabilized polynomial extrapolation and compare them to the projection method of Fischer [15], showing that they are generally competitive with projection schemes despite requiring only half the storage and performing considerably less data movement and communication. Our implementations of these algorithms are freely available as part of the libParanumal collection of GPU-accelerated flow solvers.

Submitted 22 September, 2020; originally announced September 2020.

Comments: 28 pages, 5 figures

MSC Class: 65F10; 65M22

arXiv:1808.10481 [pdf, other]

Leapfrog time-stepping for Hermite methods

Authors: Arturo Vargas, Thomas Hagstrom, Jesse Chan, Tim Warburton

Abstract: We introduce Hermite-leapfrog methods for first order wave systems. The new Hermite-leapfrog methods pair leapfrog time-stepping with the Hermite methods of Goodrich and co-authors. The new schemes stagger field variables in both time and space and are high-order accurate. We provide a detailed description of the method and demonstrate that the method conserves variable quantities in one space dimension. Higher dimensional versions of the method are constructed via a tensor product construction. Numerical evidence and rigorous analysis in one space dimension establish stability and high-order convergence. Experiments demonstrating efficient implementations on a graphics processing unit are also presented.

Submitted 30 August, 2018; originally announced August 2018.

Comments: Submitted to Journal of Scientific Computing

arXiv:1805.02082 [pdf, other]

doi 10.1016/j.jcp.2019.03.050

Discontinuous Galerkin Discretizations of the Boltzmann Equations in 2D: semi-analytic time stepping and absorbing boundary layers

Authors: A. Karakus, N. Chalmers, J. S. Hesthaven, T. Warburton

Abstract: We present an efficient nodal discontinuous Galerkin method for approximating nearly incompressible flows using the Boltzmann equations. The equations are discretized with Hermite polynomials in velocity space yielding a first order conservation law. A stabilized unsplit perfectly matching layer (PML) formulation is introduced for the resulting nonlinear flow equations. The proposed PML equations exponentially absorb the difference between the nonlinear fluctuation and the prescribed mean flow. We introduce semi-analytic time discretization methods to improve the time step restrictions in small relaxation times. We also introduce a multirate semi-analytic Adams-Bashforth method which preserves efficiency in stiff regimes. Accuracy and performance of the method are tested using distinct cases including isothermal vortex, flow around square cylinder, and wall mounted square cylinder test cases.

Submitted 5 May, 2018; originally announced May 2018.

Comments: 37 pages, 11 figures

arXiv:1804.02221 [pdf, other]

doi 10.1016/j.jcp.2018.08.038

An entropy stable discontinuous Galerkin method for the shallow water equations on curvilinear meshes with wet/dry fronts accelerated by GPUs

Authors: Niklas Wintermeyer, Andrew R. Winters, Gregor J. Gassner, Timothy Warburton

Abstract: We extend the entropy stable high order nodal discontinuous Galerkin spectral element approximation for the non-linear two dimensional shallow water equations presented by Wintermeyer et al. [N. Wintermeyer, A. R. Winters, G. J. Gassner, and D. A. Kopriva. An entropy stable nodal discontinuous Galerkin method for the two dimensional shallow water equations on unstructured curvilinear meshes with discontinuous bathymetry. Journal of Computational Physics, 340:200-242, 2017] with a shock capturing technique and a positivity preservation capability to handle dry areas. The scheme preserves the entropy inequality, is well-balanced and works on unstructured, possibly curved, quadrilateral meshes. For the shock capturing, we introduce an artificial viscosity to the equations and prove that the numerical scheme remains entropy stable. We add a positivity preserving limiter to guarantee non-negative water heights as long as the mean water height is non-negative. We prove that non-negative mean water heights are guaranteed under a certain additional time step restriction for the entropy stable numerical interface flux. We implement the method on GPU architectures using the abstract language OCCA, a unified approach to multi-threading languages. We show that the entropy stable scheme is well suited to GPUs as the necessary extra calculations do not negatively impact the runtime up to reasonably high polynomial degrees (around $N=7$). We provide numerical examples that challenge the shock capturing and positivity properties of our scheme to verify our theoretical findings.

Submitted 6 April, 2018; originally announced April 2018.

arXiv:1801.00246 [pdf, other]

A GPU Accelerated Discontinuous Galerkin Incompressible Flow Solver

Authors: Ali Karakus, Noel Chalmers, Kasia Swirydowicz, Timothy Warburton

Abstract: We present a GPU-accelerated version of a high-order discontinuous Galerkin discretization of the unsteady incompressible Navier-Stokes equations. The equations are discretized in time using a semi-implicit scheme with explicit treatment of the nonlinear term and implicit treatment of the split Stokes operators. The pressure system is solved with a conjugate gradient method together with a fully GPU-accelerated multigrid preconditioner which is designed to minimize memory requirements and to increase overall performance. A semi-Lagrangian subcycling advection algorithm is used to shift the computational load per timestep away from the pressure Poisson solve by allowing larger timestep sizes in exchange for an increased number of advection steps. Numerical results confirm we achieve the design order accuracy in time and space. We optimize the performance of the most time-consuming kernels by tuning the fine-grain parallelism, memory utilization, and maximizing bandwidth. To assess overall performance we present an empirically calibrated roofline performance model for a target GPU to explain the achieved efficiency. We demonstrate that, in most cases, the kernels used in the solver are close to their empirically predicted roofline performance.

Submitted 7 May, 2018; v1 submitted 31 December, 2017; originally announced January 2018.

Comments: 33 pages, 10 figures

arXiv:1711.00903 [pdf, other]

Acceleration of tensor-product operations for high-order finite element methods

Authors: Kasia Świrydowicz, Noel Chalmers, Ali Karakus, Timothy Warburton

Abstract: This paper is devoted to GPU kernel optimization and performance analysis of three tensor-product operators arising in finite element methods. We provide a mathematical background to these operations and implementation details. Achieving close-to-the-peak performance for these operators requires extensive optimization because of the operators' properties: low arithmetic intensity, tiered structure, and the need to store intermediate results inside the kernel. We give a guided overview of optimization strategies and we present a performance model that allows us to compare the efficacy of these optimizations against an empirically calibrated roofline.

Submitted 13 November, 2017; v1 submitted 2 November, 2017; originally announced November 2017.

Comments: 31 pages, 11 figures

arXiv:1702.04316 [pdf, ps, other]

Acceleration of the Implicit-Explicit Non-hydrostatic Unified Model of the Atmosphere (NUMA) on Manycore Processors

Authors: Daniel S. Abdi, Francis X. Giraldo, Emil M. Constantinescu, Lester E. Carr III, Lucas C. Wilcox, Timothy C. Warburton

Abstract: We present the acceleration of an IMplicit-EXplicit (IMEX) non-hydrostatic atmospheric model on manycore processors such as GPUs and Intel's MIC architecture. IMEX time integration methods sidestep the constraint imposed by the Courant-Friedrichs-Lewy condition on explicit methods through corrective implicit solves within each time step. In this work, we implement and evaluate the performance of IMEX on manycore processors relative to explicit methods. Using 3D-IMEX at Courant number C=15, we obtained a speedup of about 4X relative to an explicit time stepping method run with the maximum allowable C=1. In addition, we demonstrate a much larger speedup of 100X at C=150 using 1D-IMEX due to the unconditional stability of the method in the vertical direction. Several improvements on the IMEX procedure were necessary in order to outperform our results with explicit methods: a) reducing the number of degrees of freedom of the IMEX formulation by forming the Schur complement; b) formulating a horizontally-explicit vertically-implicit (HEVI) 1D-IMEX scheme that has a lower workload and potentially better scalability than 3D-IMEX; c) using high-order polynomial preconditioners to reduce the condition number of the resulting system; d) using a direct solver for the 1D-IMEX method by performing and storing LU factorizations once to obtain a constant cost for any Courant number. Without all of these improvements, explicit time integration methods turned out to be difficult to beat. We discuss in detail the IMEX infrastructure required for formulating and implementing efficient methods on manycore processors. Finally, we validate our results with standard benchmark problems in NWP and evaluate the performance and scalability of the IMEX method using up to 4192 GPUs and 16 Knights Landing processors.

Submitted 13 February, 2017; originally announced February 2017.

arXiv:1611.00102 [pdf, other]

On the penalty stabilization mechanism for upwind discontinuous Galerkin formulations of first order hyperbolic systems

Authors: Jesse Chan, T. Warburton

Abstract: Penalty fluxes are dissipative numerical fluxes for high order discontinuous Galerkin (DG) methods which depend on a penalization parameter. We investigate the dependence of the spectra of high order DG discretizations on this parameter, and show that as its value increases, the spectra of the DG discretization split into two disjoint sets of eigenvalues. One set converges to the eigenvalues of a conforming discretization, while the other set corresponds to spurious eigenvalues which are damped proportionally to the parameter. Numerical experiments also demonstrate that undamped spurious modes present in the limits of both zero and large penalization parameters are damped for moderate values of the upwind parameter.

Submitted 20 October, 2017; v1 submitted 31 October, 2016; originally announced November 2016.

Comments: In CAMWA

arXiv:1609.09841 [pdf, ps, other]

GPU Acceleration of Hermite Methods for the Simulation of Wave Propagation

Authors: Arturo Vargas, Jesse Chan, Thomas Hagstrom, Timothy Warburton

Abstract: The Hermite methods of Goodrich, Hagstrom, and Lorenz (2006) use Hermite interpolation to construct high order numerical methods for hyperbolic initial value problems. The structure of the method has several favorable features for parallel computing. In this work, we propose algorithms that take advantage of the many-core architecture of Graphics Processing Units. The algorithm exploits the compact stencil of Hermite methods and uses data structures that allow for efficient data loads and stores. Additionally the highly localized evolution operator of Hermite methods allows us to combine multi-stage time-stepping methods within the new algorithms incurring minimal accesses of global memory. Using a scalar linear wave equation, we study the algorithm by considering Hermite interpolation and evolution as individual kernels and alternatively combining them into a monolithic kernel. For both approaches we demonstrate strategies to increase performance. Our numerical experiments show that although a two kernel approach allows for better performance on the hardware, a monolithic kernel can offer a comparable time to solution with less global memory usage.

Submitted 30 September, 2016; originally announced September 2016.

Comments: 12 pages. Submitted to ICOSAHOM 2016 proceedings

arXiv:1608.03836 [pdf, other]

Weight-adjusted discontinuous Galerkin methods: curvilinear meshes

Authors: Jesse Chan, Russell J. Hewett, T. Warburton

Abstract: Traditional time-domain discontinuous Galerkin (DG) methods result in large storage costs at high orders of approximation due to the storage of dense elemental matrices. In this work, we propose weight-adjusted DG (WADG) methods for curvilinear meshes which reduce storage costs while retaining energy stability. A priori error estimates show that high order accuracy is preserved under sufficient conditions on the mesh, which are illustrated through convergence tests with different sequences of meshes. Numerical and computational experiments verify the accuracy and performance of WADG for a model problem on curved domains.

Submitted 12 August, 2016; originally announced August 2016.

Comments: Submitted to SISC

arXiv:1608.01944 [pdf, other]

Weight-adjusted discontinuous Galerkin methods: wave propagation in heterogeneous media

Authors: Jesse Chan, Russell J. Hewett, T. Warburton

Abstract: Time-domain discontinuous Galerkin (DG) methods for wave propagation require accounting for the inversion of dense elemental mass matrices, where each mass matrix is computed with respect to a parameter-weighted L2 inner product. In applications where the wavespeed varies spatially at a sub-element scale, these matrices are distinct over each element, necessitating additional storage. In this work, we propose a weight-adjusted DG (WADG) method which reduces storage costs by replacing the weighted L2 inner product with a weight-adjusted inner product. This equivalent inner product results in an energy stable method, but does not increase storage costs for locally varying weights. A-priori error estimates are derived, and numerical examples are given illustrating the application of this method to the acoustic wave equation with heterogeneous wavespeed.

Submitted 1 January, 2017; v1 submitted 5 August, 2016; originally announced August 2016.

Comments: Submitted to SISC

arXiv:1607.03399 [pdf, other]

Reduced storage nodal discontinuous Galerkin methods on semi-structured prismatic meshes

Authors: Jesse Chan, Zheng Wang, Russell J. Hewett, T. Warburton

Abstract: We present a high order time-domain nodal discontinuous Galerkin method for wave problems on hybrid meshes consisting of both wedge and tetrahedral elements. We allow for vertically mapped wedges which can be deformed along the extruded coordinate, and present a simple method for producing quasi-uniform wedge meshes for layered domains. We show that standard mass lumping techniques result in a loss of energy stability on meshes of vertically mapped wedges, and propose an alternative which is both energy stable and efficient. High order convergence is demonstrated, and comparisons are made with existing low-storage methods on wedges. Finally, the computational performance of the method on Graphics Processing Units is evaluated.

Submitted 31 October, 2016; v1 submitted 12 July, 2016; originally announced July 2016.

Comments: Submitted to CAMWA

arXiv:1604.08501 [pdf, ps, other]

doi 10.1145/2935323.2935325

Array Program Transformation with Loo.py by Example: High-Order Finite Elements

Authors: Andreas Klöckner, Lucas C. Wilcox, T. Warburton

Abstract: To concisely and effectively demonstrate the capabilities of our program transformation system Loo.py, we examine a transformation path from two real-world Fortran subroutines as found in a weather model to a single high-performance computational kernel suitable for execution on modern GPU hardware. Along the transformation path, we encounter kernel fusion, vectorization, prefetching, parallelization, and algorithmic changes achieved by mechanized conversion between imperative and functional/substitution-based code, among a number more. We conclude with performance results that demonstrate the effects and support the effectiveness of the applied transformations.

Submitted 13 April, 2016; originally announced April 2016.

ACM Class: D.3.4; D.1.3; G.4

Journal ref: ARRAY 2016 Proceedings of the 3rd ACM SIGPLAN International Workshop on Libraries, Languages, and Compilers for Array Programming Pages 9-16

arXiv:1512.06025 [pdf, other]

GPU-accelerated Bernstein-Bezier discontinuous Galerkin methods for wave problems

Authors: Jesse Chan, T. Warburton

Abstract: We evaluate the computational performance of the Bernstein-Bezier basis for discontinuous Galerkin (DG) discretizations and show how to exploit properties of derivative and lift operators specific to Bernstein polynomials for an optimal complexity quadrature-free evaluation of the DG formulation. Issues of efficiency and numerical stability are discussed in the context of a model wave propagation problem. We compare the performance of Bernstein-Bezier kernels to both a straightforward and a block-partitioned implementation of nodal DG kernels in a time-explicit GPU-accelerated DG solver. Computational experiments confirm the advantage of Bernstein-Bezier DG kernels over both straightforward and block-partitioned nodal DG kernels at high orders of approximation.

Submitted 20 August, 2016; v1 submitted 18 December, 2015; originally announced December 2015.

arXiv:1509.08012 [pdf, other]

Variations on Hermite methods for wave propagation

Authors: Arturo Vargas, Jesse Chan, Thomas Hagstrom, Tim Warburton

Abstract: Hermite methods, as introduced by Goodrich et al., combine Hermite interpolation and staggered (dual) grids to produce stable high order accurate schemes for the solution of hyperbolic PDEs. We introduce three variations of this Hermite method which do not involve time evolution on dual grids. Computational evidence is presented regarding stability, high order convergence, and dispersion/dissipation properties for each new method. Hermite methods may also be coupled to discontinuous Galerkin (DG) methods for additional geometric flexibility. An example illustrates the simplification of this coupling for the Hermite methods.

Submitted 26 September, 2015; originally announced September 2015.

Comments: Submitted to CICP

arXiv:1508.05609 [pdf, other]

A short note on a Bernstein-Bezier basis for the pyramid

Authors: Jesse Chan, T. Warburton

Abstract: We introduce a Bernstein-Bezier basis for the pyramid, whose restriction to the face reduces to the Bernstein-Bezier basis on the triangle or quadrilateral. The basis satisfies the standard positivity and partition of unity properties common to Bernstein polynomials, and spans the same space as non-polynomial pyramid bases in the literature.

Submitted 23 August, 2015; originally announced August 2015.

Comments: Submitted

arXiv:1507.02557 [pdf, other]

doi 10.1016/j.jcp.2016.04.003

GPU-accelerated discontinuous Galerkin methods on hybrid meshes

Authors: Jesse Chan, Zheng Wang, Axel Modave, Jean-Francois Remacle, T. Warburton

Abstract: We present a time-explicit discontinuous Galerkin (DG) solver for the time-domain acoustic wave equation on hybrid meshes containing vertex-mapped hexahedral, wedge, pyramidal and tetrahedral elements. Discretely energy-stable formulations are presented for both Gauss-Legendre and Gauss-Legendre-Lobatto (Spectral Element) nodal bases for the hexahedron. Stable timestep restrictions for hybrid meshes are derived by bounding the spectral radius of the DG operator using order-dependent constants in trace and Markov inequalities. Computational efficiency is achieved under a combination of element-specific kernels (including new quadrature-free operators for the pyramid), multi-rate timestepping, and acceleration using Graphics Processing Units.

Submitted 9 July, 2015; v1 submitted 9 July, 2015; originally announced July 2015.

Comments: Submitted to CMAME

arXiv:1506.05996 [pdf, other]

doi 10.1016/j.jcp.2016.08.005

GPU accelerated spectral finite elements on all-hex meshes

Authors: J. -F. Remacle, R. Gandham, T. Warburton

Abstract: This paper presents a spectral element finite element scheme that efficiently solves elliptic problems on unstructured hexahedral meshes. The discrete equations are solved using a matrix-free preconditioned conjugate gradient algorithm. An additive Schwarz two-scale preconditioner is employed that allows h-independent convergence. An extensible multi-threading programming API is used as a common kernel language that allows runtime selection of different computing devices (GPU and CPU) and different threading interfaces (CUDA, OpenCL and OpenMP). Performance tests demonstrate that problems with over 50 million degrees of freedom can be solved in a few seconds on an off-the-shelf GPU.

Submitted 19 June, 2015; originally announced June 2015.

Comments: 23 pages, 7 figures

MSC Class: 65Y05; 65Y10; 65Y20

arXiv:1502.07703 [pdf, other]

Orthogonal bases for vertex-mapped pyramids

Authors: Jesse Chan, T. Warburton

Abstract: Discontinuous Galerkin (DG) methods discretized under the method of lines must handle the inverse of a block diagonal mass matrix at each time step. Efficient implementations of the DG method hinge upon inexpensive and low-memory techniques for the inversion of each dense mass matrix block. We propose an efficient time-explicit DG method on meshes of pyramidal elements based on the construction of a semi-nodal high order basis, which is orthogonal for a class of transformations of the reference pyramid, despite the non-affine nature of the mapping. We give numerical results confirming expected convergence rates and discuss the efficiency of DG methods under such a basis.

Submitted 5 March, 2015; v1 submitted 26 February, 2015; originally announced February 2015.

Comments: Submitted to SIAM:SISC

arXiv:1501.02900 [pdf, ps, other]

doi 10.4208/cicp.191114.140715a

Patch-recovery filters for curvature in discontinuous Galerkin-based level-set methods

Authors: Florian Kummer, Tim Warburton

Abstract: In two-phase flow simulations, a difficult issue is usually the treatment of surface tension effects. These cause a pressure jump that is proportional to the curvature of the interface separating the two fluids. Since the evaluation of the curvature incorporates second derivatives, it is prone to numerical instabilities. Within this work, the interface is described by a level-set method based on a discontinuous Galerkin discretization. In order to stabilize the evaluation of the curvature, a patch-recovery operation is employed. There are numerous ways in which this filtering operation can be applied in the whole process of curvature computation. Therefore, an extensive numerical study is performed to identify optimal settings for the patch-recovery operations with respect to computational cost and accuracy.

Submitted 13 January, 2015; originally announced January 2015.

Comments: 25 pages, 8 figures, submitted to Communications in Computational Physics

arXiv:1412.4138 [pdf, other]

A Comparison of High Order Interpolation Nodes for the Pyramid

Authors: Jesse Chan, T. Warburton

Abstract: The use of pyramid elements is crucial to the construction of efficient hex-dominant meshes. For conforming nodal finite element methods with mixed element types, it is advantageous for nodal distributions on the faces of the pyramid to match those on the faces and edges of hexahedra and tetrahedra. We adapt existing procedures for constructing optimized tetrahedral nodal sets for high order interpolation to the pyramid with constrained face nodes, including two generalizations of the explicit Warp and Blend construction of nodes on the tetrahedron.

Submitted 15 December, 2014; v1 submitted 12 December, 2014; originally announced December 2014.

Comments: Submitted to SIAM:SISC

arXiv:1410.1387 [pdf, other]

High-Order Finite-differences on multi-threaded architectures using OCCA

Authors: David S. Medina, Amik St-Cyr, Timothy Warburton

Abstract: High-order finite-difference methods are commonly used in wave propagators for industrial subsurface imaging algorithms. Computational aspects of the reduced linear elastic vertical transversely isotropic propagator are considered. Thread parallel algorithms suitable for implementing this propagator on multi-core and many-core processing devices are introduced. Portability is addressed through the use of the OCCA runtime programming interface. Finally, performance results are shown for various architectures on a representative synthetic test case.

Submitted 2 October, 2014; originally announced October 2014.

Comments: ICOSAHOM 2014 conference paper, 9 pages, 2 figures, 3 tables

arXiv:1405.1957 [pdf, other]

Residual based adaptivity and PWDG methods for the Helmholtz equation

Authors: Shelvean Kapita, Peter Monk, T. Warburton

Abstract: We present a study of two residual a posteriori error indicators for the Plane Wave Discontinuous Galerkin (PWDG) method for the Helmholtz equation. In particular we study the h-version of PWDG in which the number of plane wave directions per element is kept fixed. First we use a slight modification of the appropriate a priori analysis to determine a residual indicator. Numerical tests show that this is reliable but pessimistic in that the ratio between the true error and the indicator increases as the mesh is refined. We therefore introduce a new analysis based on the observation that sufficiently many plane waves can approximate piecewise linear functions as the mesh is refined. Numerical results demonstrate an improvement in the efficiency of the indicators.

Submitted 8 May, 2014; originally announced May 2014.

arXiv:1403.1661 [pdf, other]

doi 10.4208/cicp.070114.271114a

GPU Accelerated Discontinuous Galerkin Methods for Shallow Water Equations

Authors: R Gandham, D S Medina, T Warburton

Abstract: We discuss the development, verification, and performance of a GPU accelerated discontinuous Galerkin method for the solutions of two dimensional nonlinear shallow water equations. The shallow water equations are hyperbolic partial differential equations and are widely used in the simulation of tsunami wave propagations. Our algorithms are tailored to take advantage of the single instruction multiple data (SIMD) architecture of graphic processing units. The time integration is accelerated by local time stepping based on a multi-rate Adams-Bashforth scheme. A total variational bounded limiter is adopted for nonlinear stability of the numerical scheme. This limiter is coupled with a mass and momentum conserving positivity preserving limiter for the special treatment of a dry or partially wet element in the triangulation. Accuracy, robustness and performance are demonstrated with the aid of test cases. We compare the performance of the kernels expressed in a portable threading language OCCA, when cross compiled with OpenCL, CUDA, and OpenMP at runtime.

Submitted 7 March, 2014; originally announced March 2014.

Comments: 26 pages, 51 figures

arXiv:1304.5546 [pdf, other]

Solving Wave Equations on Unstructured Geometries

Authors: Andreas Klöckner, Timothy Warburton, Jan S. Hesthaven

Abstract: Waves are all around us--be it in the form of sound, electromagnetic radiation, water waves, or earthquakes. Their study is an important basic tool across engineering and science disciplines. Every wave solver serving the computational study of waves meets a trade-off of two figures of merit--its computational speed and its accuracy. Discontinuous Galerkin (DG) methods fall on the high-accuracy end of this spectrum. Fortuitously, their computational structure is so ideally suited to GPUs that they also achieve very high computational speeds. In other words, the use of DG methods on GPUs significantly lowers the cost of obtaining accurate solutions. This article aims to give the reader an easy on-ramp to the use of this technology, based on a sample implementation which demonstrates a highly accurate, GPU-capable, real-time visualizing finite element solver in about 1500 lines of code.

Submitted 19 April, 2013; originally announced April 2013.

Comments: GPU Computing Gems, edited by Wen-mei Hwu, Elsevier (2011), ISBN 9780123859631, Chapter 18

arXiv:1211.0582 [pdf, other]

High-Order Discontinuous Galerkin Methods by GPU Metaprogramming

Authors: Andreas Klöckner, Timothy Warburton, Jan S. Hesthaven

Abstract: Discontinuous Galerkin (DG) methods for the numerical solution of partial differential equations have enjoyed considerable success because they are both flexible and robust: They allow arbitrary unstructured geometries and easy control of accuracy without compromising simulation stability. In a recent publication, we have shown that DG methods also adapt readily to execution on modern, massively parallel graphics processors (GPUs). A number of qualities of the method contribute to this suitability, reaching from locality of reference, through regularity of access patterns, to high arithmetic intensity. In this article, we illuminate a few of the more practical aspects of bringing DG onto a GPU, including the use of a Python-based metaprogramming infrastructure that was created specifically to support DG, but has found many uses across all disciplines of computational science.

Submitted 2 November, 2012; originally announced November 2012.

Comments: To appear as part of "GPU Solutions to Multi-scale Problems in Science and Engineering", http://books.google.com/books?vid=9783642164040

Journal ref: ISBN 9783642164040, Springer, 2012

arXiv:1102.3190 [pdf, other]

doi 10.1051/mmnp/20116303

Viscous Shock Capturing in a Time-Explicit Discontinuous Galerkin Method

Authors: Andreas Klöckner, Tim Warburton, Jan S. Hesthaven

Abstract: We present a novel, cell-local shock detector for use with discontinuous Galerkin (DG) methods. The output of this detector is a reliably scaled, element-wise smoothness estimate which is suited as a control input to a shock capture mechanism. Using an artificial viscosity in the latter role, we obtain a DG scheme for the numerical solution of nonlinear systems of conservation laws. Building on work by Persson and Peraire, we thoroughly justify the detector's design and analyze its performance on a number of benchmark problems. We further explain the scaling and smoothing steps necessary to turn the output of the detector into a local, artificial viscosity. We close by providing an extensive array of numerical tests of the detector in use.

Submitted 18 March, 2011; v1 submitted 15 February, 2011; originally announced February 2011.

Comments: 26 pages, 21 figures

Report number: Brown Scientific Computing Group Report 2010-24

MSC Class: 65N30; 65N35; 65N40; 35F61

arXiv:0901.1024 [pdf, other]

doi 10.1016/j.jcp.2009.06.041

Nodal Discontinuous Galerkin Methods on Graphics Processors

Authors: Andreas Klöckner, Tim Warburton, Jeffrey Bridge, Jan S. Hesthaven

Abstract: Discontinuous Galerkin (DG) methods for the numerical solution of partial differential equations have enjoyed considerable success because they are both flexible and robust: They allow arbitrary unstructured geometries and easy control of accuracy without compromising simulation stability. Lately, another property of DG has been growing in importance: The majority of a DG operator is applied in an element-local way, with weak penalty-based element-to-element coupling. The resulting locality in memory access is one of the factors that enables DG to run on off-the-shelf, massively parallel graphics processors (GPUs). In addition, DG's high-order nature lets it require fewer data points per represented wavelength and hence fewer memory accesses, in exchange for higher arithmetic intensity. Both of these factors work significantly in favor of a GPU implementation of DG. Using a single US$400 Nvidia GTX 280 GPU, we accelerate a solver for Maxwell's equations on a general 3D unstructured grid by a factor of 40 to 60 relative to a serial computation on a current-generation CPU. In many cases, our algorithms exhibit full use of the device's available memory bandwidth. Example computations achieve and surpass 200 gigaflops/s of net application-level floating point work. In this article, we describe and derive the techniques used to reach this level of performance. In addition, we present comprehensive data on the accuracy and runtime behavior of the method.

Submitted 3 April, 2009; v1 submitted 8 January, 2009; originally announced January 2009.

Comments: 33 pages, 12 figures, 4 tables

MSC Class: 65M60; 65Y05; 65Y10

Showing 1–36 of 36 results for author: Warburton, T