
Showing 1–5 of 5 results for author: Roy, C J

Searching in archive cs.
  1. arXiv:2306.14011  [pdf, other]

    cs.PF physics.comp-ph

    Machine Learning-driven Autotuning of Graphics Processing Unit Accelerated Computational Fluid Dynamics for Enhanced Performance

    Authors: Weicheng Xue, Christopher John Roy

    Abstract: Optimizing the performance of computational fluid dynamics (CFD) applications accelerated by graphics processing units (GPUs) is crucial for efficient simulations. In this study, we employed a machine learning-based autotuning technique to optimize 14 key parameters related to GPU kernel scheduling, including the number of thread blocks and threads within a block. Our approach utilizes fully conne…

    Submitted 20 February, 2024; v1 submitted 24 June, 2023; originally announced June 2023.

  2. arXiv:2305.18057  [pdf, other]

    cs.DC cs.PF

    CPU-GPU Heterogeneous Code Acceleration of a Finite Volume Computational Fluid Dynamics Solver

    Authors: Weicheng Xue, Hongyu Wang, Christopher J. Roy

    Abstract: This work deals with the CPU-GPU heterogeneous code acceleration of a finite-volume CFD solver utilizing multiple CPUs and GPUs at the same time. First, a high-level description of the CFD solver called SENSEI, the discretization of SENSEI, and the CPU-GPU heterogeneous computing workflow in SENSEI leveraging MPI and OpenACC are given. Then, a performance model for CPU-GPU heterogeneous computing…

    Submitted 29 May, 2023; originally announced May 2023.

  3. An Improved Framework of GPU Computing for CFD Applications on Structured Grids using OpenACC

    Authors: Weicheng Xue, Charles W. Jackson, Christopher J. Roy

    Abstract: This paper is focused on improving multi-GPU performance of a research CFD code on structured grids. MPI and OpenACC directives are used to scale the code up to 16 GPUs. This paper shows that using 16 P100 GPUs and 16 V100 GPUs can be 30$\times$ and 70$\times$ faster than 16 Xeon CPU E5-2680v4 cores for three different test cases, respectively. A series of performance issues related to the scaling…

    Submitted 4 December, 2020; originally announced December 2020.

    Comments: 43 pages, 27 figures

  4. Multi-GPU Performance Optimization of a CFD Code using OpenACC on Different Platforms

    Authors: Weicheng Xue, Christopher J. Roy

    Abstract: This paper investigates the multi-GPU performance of a 3D buoyancy driven cavity solver using MPI and OpenACC directives on different platforms. The paper shows that decomposing the total problem in different dimensions affects the strong scaling performance significantly for the GPU. Without proper performance optimizations, it is shown that 1D domain decomposition scales poorly on multiple GPUs…

    Submitted 3 June, 2020; originally announced June 2020.

  5. A Numerical Investigation of Matrix-Free Implicit Time-Stepping Methods for Large CFD Simulations

    Authors: Arash Sarshar, Paul Tranquilli, Brent Pickering, Andrew McCall, Adrian Sandu, Christopher J. Roy

    Abstract: This paper is concerned with the development and testing of advanced time-stepping methods suited for the integration of time-accurate, real-world applications of computational fluid dynamics (CFD). The performance of several time discretization methods is studied numerically with regards to computational efficiency, order of accuracy, and stability, as well as the ability to treat effectively sti…

    Submitted 30 September, 2017; v1 submitted 22 July, 2016; originally announced July 2016.

    Report number: Computational Science Lab CSL-TR-16-6

    MSC Class: 65L05; 65L06; 65L20

    Journal ref: Computers & Fluids, Volume 159, 15 Dec. 2017, pp. 53-63