-
Nek5000/RS Performance on Advanced GPU Architectures
Authors:
Misun Min,
Yu-Hsiang Lan,
Paul Fischer,
Thilina Rathnayake,
John Holmen
Abstract:
We demonstrate NekRS performance results on various advanced GPU architectures. NekRS is a GPU-accelerated version of Nek5000 that targets high performance on exascale platforms. It is being developed in DOE's Center for Efficient Exascale Discretizations, which is one of the co-design centers under the Exascale Computing Project. In this paper, we consider Frontier, Crusher, Spock, Polaris, Perlmutter, ThetaGPU, and Summit. Simulations are performed with 17x17 rod-bundle geometries from small modular reactor applications. We discuss strong-scaling performance and analysis.
Submitted 28 September, 2023;
originally announced September 2023.
-
A Benchmark for Cycling Close Pass Near Miss Event Detection from Video Streams
Authors:
Mingjie Li,
Tharindu Rathnayake,
Ben Beck,
Lingheng Meng,
Zijue Chen,
Akansel Cosgun,
Xiaojun Chang,
Dana Kulić
Abstract:
Cycling is a healthy and sustainable mode of transport. However, interactions with motor vehicles remain a key barrier to increased cycling participation. The ability to detect potentially dangerous interactions from on-bike sensing could provide important information to riders and policy makers. Thus, automated detection of conflict between cyclists and drivers has attracted researchers from both computer vision and road safety communities. In this paper, we introduce a novel benchmark, called Cyc-CP, towards cycling close pass near miss event detection from video streams. We first divide this task into scene-level and instance-level problems. Scene-level detection asks an algorithm to predict whether there is a close pass near miss event in the input video clip. Instance-level detection aims to detect which vehicle in the scene gives rise to a close pass near miss. We propose two benchmark models based on deep learning techniques for these two problems. For training and testing those models, we construct a synthetic dataset and also collect a real-world dataset. Our models can achieve 88.13% and 84.60% accuracy on the real-world dataset, respectively. We envision this benchmark as a test-bed to accelerate cycling close pass near miss detection and facilitate interaction between the fields of road safety, intelligent transportation systems and artificial intelligence. Both the benchmark datasets and detection models will be available at https://github.com/SustainableMobility/cyc-cp to facilitate experimental reproducibility and encourage more in-depth research in the field.
Submitted 24 April, 2023;
originally announced April 2023.
-
Highly Optimized Full-Core Reactor Simulations on Summit
Authors:
Paul Fischer,
Elia Merzari,
Misun Min,
Stefan Kerkemeier,
Yu-Hsiang Lan,
Malachi Phillips,
Thilina Rathnayake,
April Novak,
Derek Gaston,
Noel Chalmers,
Tim Warburton
Abstract:
Nek5000/RS is a highly performant open-source spectral element code for simulation of incompressible and low-Mach fluid flow, heat transfer, and combustion with a particular focus on turbulent flows in complex domains. It is based on high-order discretizations that realize the same (or lower) cost per gridpoint as traditional low-order methods. State-of-the-art multilevel preconditioners, efficient high-order time-splitting methods, and runtime-adaptive communication strategies are built on a fast OCCA-based kernel library, libParanumal, to provide scalability and portability across the spectrum of current and future high-performance computing platforms. On Summit, Nek5000/RS has recently achieved a milestone in the simulation of nuclear reactors: the first full-core computational fluid dynamics simulations of reactor cores, including pebble beds with > 350,000 pebbles and 98M elements advanced in less than 0.25 seconds per Navier-Stokes timestep. With carefully tuned algorithms, it is possible to simulate a single flow-through time for a full reactor core in less than six hours on all of Summit.
Submitted 1 October, 2021;
originally announced October 2021.
-
GPU Algorithms for Efficient Exascale Discretizations
Authors:
Ahmad Abdelfattah,
Valeria Barra,
Natalie Beams,
Ryan Bleile,
Jed Brown,
Jean-Sylvain Camier,
Robert Carson,
Noel Chalmers,
Veselin Dobrev,
Yohann Dudouit,
Paul Fischer,
Ali Karakus,
Stefan Kerkemeier,
Tzanio Kolev,
Yu-Hsiang Lan,
Elia Merzari,
Misun Min,
Malachi Phillips,
Thilina Rathnayake,
Robert Rieben,
Thomas Stitt,
Ananias Tomboulides,
Stanimire Tomov,
Vladimir Tomov,
Arturo Vargas
, et al. (2 additional authors not shown)
Abstract:
In this paper we describe the research and development activities in the Center for Efficient Exascale Discretizations within the US Exascale Computing Project, targeting state-of-the-art high-order finite-element algorithms for high-order applications on GPU-accelerated platforms. We discuss the GPU developments in several components of the CEED software stack, including the libCEED, MAGMA, MFEM, libParanumal, and Nek projects. We report performance and capability improvements in several CEED-enabled applications on both NVIDIA and AMD GPU systems.
Submitted 10 September, 2021;
originally announced September 2021.
-
Efficient Exascale Discretizations: High-Order Finite Element Methods
Authors:
Tzanio Kolev,
Paul Fischer,
Misun Min,
Jack Dongarra,
Jed Brown,
Veselin Dobrev,
Tim Warburton,
Stanimire Tomov,
Mark S. Shephard,
Ahmad Abdelfattah,
Valeria Barra,
Natalie Beams,
Jean-Sylvain Camier,
Noel Chalmers,
Yohann Dudouit,
Ali Karakus,
Ian Karlin,
Stefan Kerkemeier,
Yu-Hsiang Lan,
David Medina,
Elia Merzari,
Aleksandr Obabko,
Will Pazner,
Thilina Rathnayake,
Cameron W. Smith
, et al. (5 additional authors not shown)
Abstract:
Efficient exploitation of exascale architectures requires rethinking of the numerical algorithms used in many large-scale applications. These architectures favor algorithms that expose ultra-fine-grain parallelism and maximize the ratio of floating point operations to energy-intensive data movement. One of the few viable approaches to achieve high efficiency in the area of PDE discretizations on unstructured grids is to use matrix-free/partially-assembled high-order finite element methods, since these methods can increase the accuracy and/or lower the computational time due to reduced data motion. In this paper we provide an overview of the research and development activities in the Center for Efficient Exascale Discretizations (CEED), a co-design center in the Exascale Computing Project that is focused on the development of next-generation discretization software and algorithms to enable a wide range of finite element applications to run efficiently on future hardware. CEED is a research partnership involving more than 30 computational scientists from two US national labs and five universities, including members of the Nek5000, MFEM, MAGMA and PETSc projects. We discuss the CEED co-design activities based on targeted benchmarks, miniapps and discretization libraries and our work on performance optimizations for large-scale GPU architectures. We also provide a broad overview of research and development activities in areas such as unstructured adaptive mesh refinement algorithms, matrix-free linear solvers, high-order data visualization, and list examples of collaborations with several ECP and external applications.
Submitted 10 September, 2021;
originally announced September 2021.
-
NekRS, a GPU-Accelerated Spectral Element Navier-Stokes Solver
Authors:
Paul Fischer,
Stefan Kerkemeier,
Misun Min,
Yu-Hsiang Lan,
Malachi Phillips,
Thilina Rathnayake,
Elia Merzari,
Ananias Tomboulides,
Ali Karakus,
Noel Chalmers,
Tim Warburton
Abstract:
The development of NekRS, a GPU-oriented thermal-fluids simulation code based on the spectral element method (SEM), is described. For performance portability, the code is based on the open concurrent compute abstraction (OCCA) and leverages scalable developments in the SEM code Nek5000 and in libParanumal, which is a library of high-performance kernels for high-order discretizations and PDE-based miniapps. Critical performance sections of the Navier-Stokes time advancement are addressed. Performance results on several platforms are presented, including scaling to 27,648 V100s on OLCF Summit, for calculations of up to 60B gridpoints.
Submitted 12 April, 2021;
originally announced April 2021.
-
Scalability of High-Performance PDE Solvers
Authors:
Paul Fischer,
Misun Min,
Thilina Rathnayake,
Som Dutta,
Tzanio Kolev,
Veselin Dobrev,
Jean-Sylvain Camier,
Martin Kronbichler,
Tim Warburton,
Kasia Swirydowicz,
Jed Brown
Abstract:
Performance tests and analyses are critical to effective HPC software development and are central components in the design and implementation of computational algorithms for achieving faster simulations on existing and future computing architectures for large-scale application problems. In this paper, we explore performance and space-time trade-offs for important compute-intensive kernels of large-scale numerical solvers for PDEs that govern a wide range of physical applications. We consider a sequence of PDE-motivated bake-off problems designed to establish best practices for efficient high-order simulations across a variety of codes and platforms. We measure peak performance (degrees of freedom per second) on a fixed number of nodes and identify effective code optimization strategies for each architecture. In addition to peak performance, we identify the minimum time to solution at 80% parallel efficiency. The performance analysis is based on spectral and p-type finite elements but is equally applicable to a broad spectrum of numerical PDE discretizations, including finite difference, finite volume, and h-type finite elements.
Submitted 14 April, 2020;
originally announced April 2020.
-
Statistical Information Fusion for Multiple-View Sensor Data in Multi-Object Tracking
Authors:
Xiaoying Wang,
Reza Hoseinnezhad,
Amirali K. Gostar,
Tharindu Rathnayake,
Benlian Xu,
Alireza Bab-Hadiashar
Abstract:
This paper presents a novel statistical information fusion method to integrate multiple-view sensor data in multi-object tracking applications. The proposed method overcomes the drawbacks of the commonly used Generalized Covariance Intersection method, which considers constant weights allocated for sensors. Our method is based on enhancing the Generalized Covariance Intersection with adaptive weights that are automatically tuned based on the amount of information carried by the measurements from each sensor. To quantify information content, Cauchy-Schwarz divergence is used. Another distinguishing characteristic of our method lies in the usage of the Labeled Multi-Bernoulli filter for multi-object tracking, in which the weight of each sensor can be separately adapted for each Bernoulli component of the filter. The results of numerical experiments show that our proposed method can successfully integrate information provided by multiple sensors with different fields of view. In such scenarios, our method significantly outperforms the state of the art in terms of inclusion of all existing objects and tracking accuracy.
Submitted 27 February, 2017;
originally announced February 2017.
-
Multi-Sensor Control for Multi-Object Bayes Filters
Authors:
Xiaoying Wang,
Reza Hoseinnezhad,
Amirali K. Gostar,
Tharindu Rathnayake,
Benlian Xu,
Alireza Bab-Hadiashar
Abstract:
Sensor management in multi-object stochastic systems is a theoretically and computationally challenging problem. This paper presents a novel approach to the multi-target multi-sensor control problem within the partially observed Markov decision process (POMDP) framework. We model the multi-object state as a labeled multi-Bernoulli random finite set (RFS), and use the labeled multi-Bernoulli filter in conjunction with minimizing a task-driven control objective function: posterior expected error of cardinality and state (PEECS). A major contribution is a guided search for multi-dimensional optimization in the multi-sensor control command space, using a coordinate descent method. In conjunction with the Generalized Covariance Intersection method for multi-sensor fusion, a fast multi-sensor algorithm is achieved. Numerical studies are presented in several scenarios where numerous controllable (mobile) sensors track multiple moving targets with different levels of observability. The results show that our method works significantly faster than the approach taken by a state-of-the-art method, with similar tracking errors.
Submitted 20 February, 2017;
originally announced February 2017.
-
Labeled Multi-Bernoulli Tracking for Industrial Mobile Platform Safety
Authors:
Tharindu Rathnayake,
Reza Hoseinnezhad,
Ruwan Tennakoon,
Alireza Bab-Hadiashar
Abstract:
This paper presents a track-before-detect labeled multi-Bernoulli filter tailored for industrial mobile platform safety applications. We derive two application-specific separable likelihood functions that capture the geometric shape and colour information of the human targets who are wearing a high-visibility vest. These likelihoods are then used in a labeled multi-Bernoulli filter with a novel two-step Bayesian update. Preliminary simulation results show that the proposed solution can successfully track human workers wearing a luminous yellow colour vest in an industrial environment.
Submitted 10 May, 2016; v1 submitted 20 April, 2016;
originally announced April 2016.