-
Parallel Adaptive Anisotropic Meshing on cc-NUMA Machines
Authors:
Christos Tsolakis,
Nikos Chrisochoides
Abstract:
Efficient and robust anisotropic mesh adaptation is crucial for Computational Fluid Dynamics (CFD) simulations. The CFD Vision 2030 Study highlights the pressing need for this technology, particularly for simulations targeting supercomputers. This work applies a fine-grained speculative approach to anisotropic mesh operations. Our implementation exhibits more than 90% parallel efficiency on a multi-core node. Additionally, we evaluate our method within an adaptive pipeline for a spectrum of publicly available test-cases that includes both analytically derived and error-based fields. For all test-cases, our results are in accordance with published results in the literature. Support for CAD-based data is introduced, and its effectiveness is demonstrated on one of NASA's High-Lift prediction workshop cases.
Submitted 5 May, 2024; v1 submitted 27 April, 2024;
originally announced April 2024.
-
Tasking framework for Adaptive Speculative Parallel Mesh Generation
Authors:
Christos Tsolakis,
Polykarpos Thomadakis,
Nikos Chrisochoides
Abstract:
Handling the ever-increasing complexity of mesh generation codes along with the intricacies of newer hardware often results in codes that are difficult to both comprehend and maintain. Different facets of codes such as thread management and load balancing are often intertwined, resulting in efficient but highly complex software. In this work, we present a framework which aids in establishing a core principle, deemed separation of concerns, where functionality is separated from performance aspects of various mesh operations. In particular, thread management and scheduling decisions are elevated into a generic and reusable tasking framework. The results indicate that our approach can successfully abstract the load balancing aspects of two case studies, while providing access to a plethora of different execution back-ends. One would expect this new flexibility to lead to some additional cost. However, for the configurations studied in this work, we observed up to 13% speedup for some meshing operations and up to 5.8% speedup over the entire application runtime compared to hand-optimized code. Moreover, we show that by using different task creation strategies, the overhead compared to straightforward task execution models can be improved dramatically by as much as 1200% without compromises in portability and functionality.
Submitted 27 April, 2024;
originally announced April 2024.
-
Towards Distributed Semi-speculative Adaptive Anisotropic Parallel Mesh Generation
Authors:
Kevin Garner,
Christos Tsolakis,
Polykarpos Thomadakis,
Nikos Chrisochoides
Abstract:
This paper presents the foundational elements of a distributed memory method for mesh generation that is designed to leverage concurrency offered by large-scale computing. To achieve this goal, meshing functionality is separated from performance aspects by utilizing a separate entity for each: a shared memory mesh generation code called CDT3D, and PREMA for parallel runtime support. Although CDT3D is designed for scalability, lessons are presented regarding additional measures that were taken to enable the code's integration into the distributed memory method as a black box. In the presented method, an initial mesh is data decomposed and subdomains are distributed amongst the nodes of a high-performance computing (HPC) cluster. Meshing operations within CDT3D utilize a speculative execution model, enabling the strict adaptation of subdomains' interior elements. Interface elements undergo several iterations of shifting so that they are adapted when their data dependencies are resolved. PREMA aids in this endeavor by providing asynchronous message passing between encapsulations of data, workload balancing, and migration capabilities all within a globally addressable namespace. PREMA also assists in establishing data dependencies between subdomains, thus enabling "neighborhoods" of subdomains to work independently of each other in performing interface shifts and adaptation. Preliminary results show that the presented method is able to produce meshes of comparable quality to those generated by the original shared memory CDT3D code. Given the costly overhead of collective communication seen by existing state-of-the-art software, relative communication performance of the presented distributed memory method also shows that its emphasis on avoiding global synchronization presents a potentially viable solution in achieving scalability when targeting large configurations of cores.
Submitted 20 December, 2023;
originally announced December 2023.
-
Real-Time Dynamic Data Driven Deformable Registration for Image-Guided Neurosurgery: Computational Aspects
Authors:
Nikos Chrisochoides,
Andrey Fedorov,
Yixun Liu,
Andriy Kot,
Panos Foteinos,
Fotis Drakopoulos,
Christos Tsolakis,
Emmanuel Billias,
Olivier Clatz,
Nicholas Ayache,
Alex Golby,
Peter Black,
Ron Kikinis
Abstract:
Current neurosurgical procedures utilize medical images of various modalities to enable the precise location of tumors and critical brain structures to plan accurate brain tumor resection. The difficulty of using preoperative images during the surgery is caused by the intra-operative deformation of the brain tissue (brain shift), which introduces discrepancies concerning the preoperative configuration. Intra-operative imaging allows tracking such deformations but cannot fully substitute for the quality of the pre-operative data. Dynamic Data Driven Deformable Non-Rigid Registration (D4NRR) is a complex and time-consuming image processing operation that allows the dynamic adjustment of the pre-operative image data to account for intra-operative brain shift during the surgery. This paper summarizes the computational aspects of a specific adaptive numerical approximation method and its variations for registering brain MRIs. It outlines its evolution over the last 15 years and identifies new directions for the computational aspects of the technique.
Submitted 6 September, 2023;
originally announced September 2023.