-
Distributed Ranges: A Model for Distributed Data Structures, Algorithms, and Views
Authors:
Benjamin Brock,
Robert Cohn,
Suyash Bakshi,
Tuomas Karna,
Jeongnim Kim,
Mateusz Nowak,
Łukasz Ślusarczyk,
Kacper Stefanski,
Timothy G. Mattson
Abstract:
Data structures and algorithms are essential building blocks for programs, and distributed data structures, which automatically partition data across multiple memory locales, are essential to writing high-level parallel programs. While many projects have designed and implemented C++ distributed data structures and algorithms, there has not been widespread adoption of an interoperable model allowing algorithms and data structures from different libraries to work together. This paper introduces distributed ranges, which is a model for building generic data structures, views, and algorithms. A distributed range extends a C++ range, which is an iterable sequence of values, with a concept of segmentation, thus exposing how the distributed range is partitioned over multiple memory locales. Distributed data structures provide this distributed range interface, which allows them to be used with a collection of generic algorithms implemented using the distributed range interface. The modular nature of the model allows for the straightforward implementation of distributed views, which are lightweight objects that provide a lazily evaluated view of another range. Views can be composed together recursively and combined with algorithms to implement computational kernels using efficient, flexible, and high-level standard C++ primitives. We evaluate the distributed ranges model by implementing a set of standard concepts and views as well as two execution runtimes, a multi-node, MPI-based runtime and a single-process, multi-GPU runtime. We demonstrate that high-level algorithms implemented using generic, high-level distributed ranges can achieve performance competitive with highly-tuned, expert-written code.
Submitted 31 May, 2024;
originally announced June 2024.
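The segmentation idea described in the abstract above can be illustrated with a minimal sketch. This is a single-process Python illustration, not the paper's C++ interface: the names `Segment`, `SegmentedRange`, `segments()`, `rank`, and `segmented_sum` are hypothetical, and the memory locales are simulated with plain lists. It only shows how a generic algorithm might work on per-locale segments and then combine the partial results.

```python
from dataclasses import dataclass
from typing import Iterator, List


@dataclass
class Segment:
    """A contiguous piece of the data owned by one simulated memory locale."""
    rank: int            # hypothetical locale identifier
    values: List[float]  # data stored on that locale


class SegmentedRange:
    """An iterable sequence that also exposes how it is partitioned."""

    def __init__(self, data: List[float], num_locales: int) -> None:
        # Block partition: each locale receives one contiguous chunk.
        chunk = -(-len(data) // num_locales)  # ceiling division
        self._segments = [
            Segment(rank, data[rank * chunk:(rank + 1) * chunk])
            for rank in range(num_locales)
        ]

    def segments(self) -> List[Segment]:
        """Expose the partitioning so algorithms can work locale by locale."""
        return self._segments

    def __iter__(self) -> Iterator[float]:
        # Plain iteration hides the partitioning, like an ordinary range.
        for seg in self._segments:
            yield from seg.values


def segmented_sum(rng: SegmentedRange) -> float:
    """Generic reduction: one partial sum per segment, then a final combine."""
    partials = [sum(seg.values) for seg in rng.segments()]
    return sum(partials)
```

In a real runtime each partial sum would be computed where the segment lives and only the partial results would be communicated; here everything runs in one process.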
-
Efficient optimization of a regional water elevation model with an automatically generated adjoint
Authors:
Tuomas Kärnä,
Joseph G. Wallwork,
Stephan C. Kramer
Abstract:
Calibration of unknown model parameters is a common task in many ocean model applications. We present an adjoint-based optimization of an unstructured mesh shallow water model for the Baltic Sea. A spatially varying bottom friction parameter is tuned to minimize the misfit with respect to tide gauge sea surface height (SSH) observations. A key benefit of adjoint-based optimization is that computational cost does not depend on the number of unknown variables. Adjoint models are, however, typically very laborious to implement. In this work, we leverage a domain-specific language framework in which the discrete adjoint model can be obtained automatically. The adjoint model is both exactly compatible with the discrete forward model and computationally efficient. A gradient-based quasi-Newton method is used to minimize the misfit. Optimizing spatially variable parameters is typically an under-determined problem and can lead to over-fitting. We employ Hessian-based regularization to penalize the spatial curvature of the friction field to overcome this problem. The SSH dynamics in the Baltic Sea are simulated for a 3-month period. Optimization of the bottom friction parameter results in significant improvement of the model performance. The results are especially encouraging in the complex Danish Straits region, highlighting the benefit of unstructured meshes. Domain-specific language frameworks enable automated model analysis and provide easy access to adjoint modeling. Our application shows that this capability can be enabled with little effort, and the optimization procedure is robust and computationally efficient.
Submitted 6 October, 2023; v1 submitted 3 May, 2022;
originally announced May 2022.
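The curvature penalty mentioned in the abstract above can be sketched in one dimension. This is an assumed finite-difference analogue on a uniform 1D grid, not the paper's finite element formulation: the function names, the grid spacing `dx`, and the weight `gamma` are hypothetical and chosen only to show how a smooth field is left unpenalized while an oscillatory one raises the cost.

```python
from typing import Sequence


def curvature_penalty(field: Sequence[float], dx: float) -> float:
    """Integrated squared second difference, a 1D stand-in for a Hessian norm."""
    total = 0.0
    for i in range(1, len(field) - 1):
        # Central second difference approximates the second derivative.
        second_diff = (field[i - 1] - 2.0 * field[i] + field[i + 1]) / dx**2
        total += second_diff**2 * dx
    return total


def regularized_cost(misfit: float, field: Sequence[float],
                     dx: float, gamma: float) -> float:
    """Total cost: data misfit plus weighted curvature penalty (gamma assumed)."""
    return misfit + gamma * curvature_penalty(field, dx)
```

A linear friction field has zero penalty under this measure, so the regularization discourages small-scale wiggles without biasing the mean value or a constant gradient.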
-
Discontinuous Galerkin discretization for two-equation turbulence closure model
Authors:
Tuomas Kärnä
Abstract:
Accurate representation of vertical turbulent fluxes is crucial for numerical ocean modelling, both in global and coastal applications. The state-of-the-art approach is to use two-equation turbulence closure models, which introduce two dynamic equations to the system. Solving these equations numerically, however, is challenging due to the strict requirement of positivity of the turbulent quantities (e.g., turbulence kinetic energy and its dissipation rate), and the non-linear source terms that may render the numerical system unstable. In this paper, we present a Discontinuous Galerkin (DG) finite element discretization of the Generic Length Scale (GLS) equations designed to be incorporated in a DG coastal ocean model, Thetis. To ensure numerical stability, the function space for turbulent quantities must be chosen carefully. In this work, we propose to use zeroth degree elements for the turbulent quantities and linear discontinuous elements for the tracers and velocity. The spatial discretization is completed with a positivity-preserving semi-implicit time integration scheme. We validate the implementation with standard turbulence closure model benchmarks and an idealized estuary simulation. Finally, we use the full three-dimensional model to simulate the Columbia River plume. The results confirm that the coupled model generates realistic vertical mixing, and remains stable under strongly stratified conditions and strong tidal forcing. River plume characteristics are well captured.
Submitted 30 March, 2020; v1 submitted 10 July, 2019;
originally announced July 2019.
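One common way to obtain a positivity-preserving semi-implicit update is a Patankar-type treatment of the sink term. The sketch below assumes that technique for a scalar equation dk/dt = P - S with P >= 0 and S >= 0; the abstract does not state which scheme the paper uses, and the function and parameter names are hypothetical.

```python
def explicit_euler_step(k: float, production: float, sink: float, dt: float) -> float:
    """Fully explicit update; can become negative when dt * sink exceeds k."""
    return k + dt * (production - sink)


def patankar_step(k: float, production: float, sink: float, dt: float) -> float:
    """Semi-implicit update with the sink scaled by k_new / k_old.

    Solving k_new = k + dt * production - dt * sink * (k_new / k) for k_new
    gives a ratio of positive numbers, so k_new > 0 whenever k > 0,
    production >= 0, and sink >= 0.
    """
    if k <= 0.0:
        raise ValueError("k must be positive")
    return (k + dt * production) / (1.0 + dt * sink / k)
```

With `k = 1.0`, no production, `sink = 100.0`, and `dt = 1.0`, the explicit step goes negative while the Patankar-type step stays positive, which is the property the turbulent quantities require.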
-
CosmoFlow: Using Deep Learning to Learn the Universe at Scale
Authors:
Amrita Mathuriya,
Deborah Bard,
Peter Mendygral,
Lawrence Meadows,
James Arnemann,
Lei Shao,
Siyu He,
Tuomas Karna,
Daina Moise,
Simon J. Pennycook,
Kristyn Maschoff,
Jason Sewall,
Nalini Kumar,
Shirley Ho,
Mike Ringenburg,
Prabhat,
Victor Lee
Abstract:
Deep learning is a promising tool to determine the physical model that describes our universe. To handle the considerable computational cost of this problem, we present CosmoFlow: a highly scalable deep learning application built on top of the TensorFlow framework. CosmoFlow uses efficient implementations of 3D convolution and pooling primitives, together with improvements in threading for many element-wise operations, to improve training performance on Intel(C) Xeon Phi(TM) processors. We also utilize the Cray PE Machine Learning Plugin for efficient scaling to multiple nodes. We demonstrate fully synchronous data-parallel training on 8192 nodes of Cori with 77% parallel efficiency, achieving 3.5 Pflop/s sustained performance. To our knowledge, this is the first large-scale science application of the TensorFlow framework at supercomputer scale with fully synchronous training. These enhancements enable us to process large 3D dark matter distributions and predict the cosmological parameters $\Omega_M$, $\sigma_8$, and $n_s$ with unprecedented accuracy.
Submitted 9 November, 2018; v1 submitted 14 August, 2018;
originally announced August 2018.
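The scaling figures quoted in the abstract above can be put in per-node terms with simple arithmetic. The sketch uses only the three numbers stated there (8192 nodes, 77% parallel efficiency, 3.5 Pflop/s sustained); the function names are hypothetical, and the second function assumes the usual definition of parallel efficiency as achieved speedup divided by ideal speedup, with the baseline configuration left unspecified because the abstract does not give it.

```python
def per_node_flops(sustained_flops: float, nodes: int) -> float:
    """Average sustained floating-point rate per node."""
    return sustained_flops / nodes


def equivalent_ideal_nodes(parallel_efficiency: float, nodes: int) -> float:
    """Number of perfectly scaling nodes that would deliver the same throughput."""
    return parallel_efficiency * nodes


# Figures taken from the abstract.
NODES = 8192
PARALLEL_EFFICIENCY = 0.77
SUSTAINED_FLOPS = 3.5e15  # 3.5 Pflop/s

rate = per_node_flops(SUSTAINED_FLOPS, NODES)
ideal = equivalent_ideal_nodes(PARALLEL_EFFICIENCY, NODES)
print(f"{rate / 1e9:.0f} Gflop/s per node, ~{ideal:.0f} ideal-node equivalents")
```

Under these assumptions the run sustains roughly 427 Gflop/s per node and delivers the throughput of about 6308 perfectly scaling nodes.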
-
Thetis coastal ocean model: discontinuous Galerkin discretization for the three-dimensional hydrostatic equations
Authors:
Tuomas Kärnä,
Stephan C. Kramer,
Lawrence Mitchell,
David A. Ham,
Matthew D. Piggott,
António M. Baptista
Abstract:
Unstructured grid ocean models are advantageous for simulating the coastal ocean and river-estuary-plume systems. However, unstructured grid models tend to be diffusive and/or computationally expensive, which limits their applicability to real-life problems. In this paper, we describe a novel discontinuous Galerkin (DG) finite element discretization for the hydrostatic equations. The formulation is fully conservative and second-order accurate in space and time. Monotonicity of the advection scheme is ensured by using a strong stability preserving time integration method and slope limiters. Compared to previous DG models, advantages include a more accurate mode splitting method, a revised viscosity formulation, and a new second-order time integration scheme. We demonstrate with a suite of test cases that the model is capable of simulating baroclinic flows in the eddying regime. Numerical dissipation is well controlled, being comparable to or lower than in existing state-of-the-art structured grid models.
Submitted 18 October, 2018; v1 submitted 22 November, 2017;
originally announced November 2017.
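The role of a slope limiter in keeping advection monotone can be illustrated with the classic minmod limiter on a 1D grid of cell averages. This is an assumed, generic example: the abstract does not say which limiter the model uses, and the function names are hypothetical.

```python
def minmod(a: float, b: float) -> float:
    """Return the smaller-magnitude argument if signs agree, else zero."""
    if a * b <= 0.0:
        return 0.0
    return a if abs(a) < abs(b) else b


def limited_slope(left: float, center: float, right: float) -> float:
    """Limited slope in a cell from its own and its neighbours' averages.

    At a local extremum the one-sided differences have opposite signs,
    so the slope is set to zero and no new over- or undershoot is created.
    """
    return minmod(center - left, right - center)
```

For cell averages `0, 1, 3` the limited slope is the smaller one-sided difference, while for `0, 2, 1` (a local maximum) it is zero, which flattens the reconstruction where an overshoot could otherwise appear.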