-
Standardized Data-Parallel Rendering Using ANARI
Authors:
Ingo Wald,
Stefan Zellmann,
Jefferson Amstutz,
Qi Wu,
Kevin Griffin,
Milan Jaros,
Stefan Wesner
Abstract:
We propose and discuss a paradigm that allows for expressing data-parallel rendering with the classically non-parallel ANARI API. We propose this as a new standard for data-parallel sci-vis rendering, describe two different implementations of this paradigm, and use multiple sample integrations into existing apps to show how easy it is to adopt this paradigm, and what can be gained from doing so.
Submitted 28 June, 2024;
originally announced July 2024.
-
Scalable Ray Tracing Using the Distributed FrameBuffer
Authors:
Will Usher,
Ingo Wald,
Jefferson Amstutz,
Johannes Günther,
Carson Brownlee,
Valerio Pascucci
Abstract:
Image- and data-parallel rendering across multiple nodes on high-performance computing systems is widely used in visualization to provide higher frame rates, support large data sets, and render data in situ. Specifically for in situ visualization, reducing bottlenecks incurred by the visualization and compositing is of key concern to reduce the overall simulation runtime. Moreover, prior algorithms have been designed to support either image- or data-parallel rendering and impose restrictions on the data distribution, requiring different implementations for each configuration. In this paper, we introduce the Distributed FrameBuffer, an asynchronous image-processing framework for multi-node rendering. We demonstrate that our approach achieves performance superior to the state of the art for common use cases, while providing the flexibility to support a wide range of parallel rendering algorithms and data distributions. By building on this framework, we extend the open-source ray tracing library OSPRay with a data-distributed API, enabling its use in data-distributed and in situ visualization applications.
Submitted 11 May, 2023;
originally announced May 2023.
-
Beyond ExaBricks: GPU Volume Path Tracing of AMR Data
Authors:
Stefan Zellmann,
Qi Wu,
Alper Sahistan,
Kwan-Liu Ma,
Ingo Wald
Abstract:
Adaptive Mesh Refinement (AMR) is becoming a prevalent data representation for scientific visualization. Resulting from large fluid mechanics simulations, the data is usually cell centric, imposing a number of challenges for high quality reconstruction at sample positions. While recent work has concentrated on real-time volume and isosurface rendering on GPUs, the rendering methods used still focus on simple lighting models without scattering events and global illumination. As in other areas of rendering, key to real-time performance are acceleration data structures; in this work we analyze the major bottlenecks of data structures that were originally optimized for camera/primary ray traversal when used with the incoherent ray tracing workload of a volumetric path tracer, and propose strategies to overcome the challenges coming with this.
Submitted 2 December, 2023; v1 submitted 17 November, 2022;
originally announced November 2022.
-
GPU-friendly, Parallel, and (Almost-)In-Place Construction of Left-Balanced k-d Trees
Authors:
Ingo Wald
Abstract:
We present an algorithm that allows for building left-balanced and complete k-d trees over k-dimensional points in a trivially parallel and GPU friendly way. Our algorithm requires exactly one int per data point as temporary storage, and uses O(log N) iterations, each of which performs one parallel sort, and one trivially parallel CUDA per-node update kernel.
Submitted 4 April, 2023; v1 submitted 31 October, 2022;
originally announced November 2022.
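To make the term "left-balanced and complete k-d tree" concrete, here is a minimal serial sketch in Python. It is not the parallel, sort-based GPU algorithm the abstract describes; it is a plain recursive reference builder, and the function names `left_subtree_size` and `build_left_balanced` are hypothetical. It assumes the tree is stored in heap order (children of node i at 2i+1 and 2i+2) and that the split dimension cycles with depth.

```python
def left_subtree_size(n):
    """Number of nodes in the left subtree of a left-balanced complete tree with n nodes."""
    if n <= 1:
        return 0
    levels = n.bit_length()                 # floor(log2(n)) + 1
    half_last = 1 << (levels - 2)           # slots on the lowest level under the left child
    inner = half_last - 1                   # left-subtree nodes above the lowest level
    last = n - ((1 << (levels - 1)) - 1)    # nodes actually present on the lowest level
    return inner + min(last, half_last)


def build_left_balanced(points, k):
    """Serial reference builder; returns the points arranged in heap order."""
    tree = [None] * len(points)

    def recurse(pts, node, depth):
        if not pts:
            return
        dim = depth % k                     # assumed: round-robin split dimension
        pts = sorted(pts, key=lambda p: p[dim])
        ls = left_subtree_size(len(pts))
        tree[node] = pts[ls]                # pivot so that the left subtree gets exactly ls points
        recurse(pts[:ls], 2 * node + 1, depth + 1)
        recurse(pts[ls + 1:], 2 * node + 2, depth + 1)

    recurse(list(points), 0, 0)
    return tree
```

Because the layout is complete, the resulting array has no gaps and needs no child pointers, which is the property that makes in-place construction plausible at all.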
-
A Stack-Free Traversal Algorithm for Left-Balanced k-d Trees
Authors:
Ingo Wald
Abstract:
We present an algorithm that allows for find-closest-point and kNN-style traversals of left-balanced k-d trees, without the need for either recursion or software-managed stacks; instead using only current and last previously traversed node to compute which node to traverse next.
Submitted 2 November, 2022; v1 submitted 23 October, 2022;
originally announced October 2022.
-
GPU-based Data-parallel Rendering of Large, Unstructured, and Non-convexly Partitioned Data
Authors:
Alper Sahistan,
Serkan Demirci,
Ingo Wald,
Stefan Zellmann,
João Barbosa,
Nathan Morrical,
Uğur Güdükbay
Abstract:
Computational fluid dynamic simulations often produce large clusters of finite elements with non-trivial, non-convex boundaries and uneven distributions among compute nodes, posing challenges to compositing during interactive volume rendering. Correct, in-place visualization of such clusters becomes difficult because viewing rays straddle domain boundaries across multiple compute nodes. We propose a GPU-based, scalable, memory-efficient direct volume visualization framework suitable for in situ and post hoc usage. Our approach reduces memory usage of the unstructured volume elements by leveraging an exclusive-or-based index reduction scheme and provides fast ray-marching-based traversal without requiring large external data structures built over the elements themselves. Moreover, we present a GPU-optimized deep compositing scheme that allows correct order compositing of intermediate color values accumulated across different ranks that works even for non-convex clusters. Our method scales well on large data-parallel systems and achieves interactive frame rates during visualization. We can interactively render both Fun3D Small Mars Lander (14 GB / 798.4 million finite elements) and Huge Mars Lander (111.57 GB / 6.4 billion finite elements) data sets at 14 and 10 frames per second using 72 and 80 GPUs, respectively, on TACC's Frontera supercomputer.
Submitted 28 September, 2022;
originally announced September 2022.
-
Data Parallel Path Tracing in Object Space
Authors:
Ingo Wald,
Steven G Parker
Abstract:
We investigate the concept of rendering production-style content with full path tracing in a data-distributed fashion -- that is, with multiple collaborating nodes and/or GPUs that each store only part of the model. In particular, we propose a new approach to tracing rays across different nodes/GPUs that improves over traditional spatial partitioning, can support both object-space and spatial partitioning (or any combination thereof), and that enables multiple techniques for reducing the number of rays sent across the network. We show that this approach can handle different kinds of model partitioning strategies, and can ultimately render non-trivial models with full path tracing even on quite moderate hardware resources with rather low-end interconnect.
Submitted 21 April, 2022;
originally announced April 2022.
-
Point Containment Queries on Ray Tracing Cores for AMR Flow Visualization
Authors:
Stefan Zellmann,
Daniel Seifried,
Nate Morrical,
Ingo Wald,
Will Usher,
Jamie A. P. Law-Smith,
Stefanie Walch-Gassner,
André Hinkenjann
Abstract:
Modern GPUs come with dedicated hardware to perform ray/triangle intersections and bounding volume hierarchy (BVH) traversal. While the primary use case for this hardware is photorealistic 3D computer graphics, with careful algorithm design scientists can also use this special-purpose hardware to accelerate general-purpose computations such as point containment queries. This article explains the principles behind these techniques and their application to vector field visualization of large simulation data using particle tracing.
Submitted 24 February, 2022;
originally announced February 2022.
-
GPGPU-Parallel Re-indexing of Triangle Meshes with Duplicate-Vertex and Unused-Vertex Removal
Authors:
Ingo Wald
Abstract:
We describe a simple yet highly parallel method for re-indexing "indexed" data sets like triangle meshes or unstructured-mesh data sets -- which is useful for operations such as removing duplicate or unused vertices, merging different meshes, etc. In particular, our method is parallel and GPU-friendly in the sense that all its steps are either trivially parallel or use GPU-parallel primitives like sorting and prefix sum, thus making it well suited for highly parallel architectures like GPUs.
Submitted 20 September, 2021;
originally announced September 2021.
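As a rough illustration of how re-indexing can be composed from sorting and a prefix sum, here is a serial Python sketch. It is an assumption-laden stand-in, not the paper's GPU implementation: the function name `reindex_mesh` is hypothetical, vertices are assumed to be tuples compared for exact equality, and each step is written so that it maps onto a sort, a per-element flag pass, or a scan.

```python
from itertools import accumulate


def reindex_mesh(vertices, triangles):
    """Drop unused vertices, merge exact duplicates, and rewrite the index buffer."""
    # Step 1: collect the vertex ids that are actually referenced.
    used = sorted({i for tri in triangles for i in tri})
    # Step 2: sort the used ids by vertex value so duplicates become adjacent.
    order = sorted(used, key=lambda i: vertices[i])
    # Step 3: flag the first element of every run of equal vertices.
    flags = [
        1 if pos == 0 or vertices[order[pos]] != vertices[order[pos - 1]] else 0
        for pos in range(len(order))
    ]
    # Step 4: an inclusive prefix sum over the flags, minus one, is the new vertex id.
    new_ids = [s - 1 for s in accumulate(flags)]
    remap = dict(zip(order, new_ids))
    # Step 5: compact the vertex array and rewrite every index.
    new_vertices = [vertices[old] for old, flag in zip(order, flags) if flag]
    new_triangles = [tuple(remap[i] for i in tri) for tri in triangles]
    return new_vertices, new_triangles
```

For example, a mesh whose two triangles reference two identical copies of one vertex and leave another vertex unreferenced comes back with both removed, while every triangle still points at the same coordinates as before.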
-
NViSII: A Scriptable Tool for Photorealistic Image Generation
Authors:
Nathan Morrical,
Jonathan Tremblay,
Yunzhi Lin,
Stephen Tyree,
Stan Birchfield,
Valerio Pascucci,
Ingo Wald
Abstract:
We present a Python-based renderer built on NVIDIA's OptiX ray tracing engine and the OptiX AI denoiser, designed to generate high-quality synthetic images for research in computer vision and deep learning. Our tool enables the description and manipulation of complex dynamic 3D scenes containing object meshes, materials, textures, lighting, volumetric data (e.g., smoke), and backgrounds. Metadata, such as 2D/3D bounding boxes, segmentation masks, depth maps, normal maps, material properties, and optical flow vectors, can also be generated. In this work, we discuss design goals, architecture, and performance. We demonstrate the use of data generated by path tracing for training an object detector and pose estimator, showing improved performance in sim-to-real transfer in situations that are difficult for traditional raster-based renderers. We offer this tool as an easy-to-use, performant, high-quality renderer for advancing research in synthetic data generation and deep learning.
Submitted 28 May, 2021;
originally announced May 2021.
-
A Virtual Frame Buffer Abstraction for Parallel Rendering of Large Tiled Display Walls
Authors:
Mengjiao Han,
Ingo Wald,
Will Usher,
Nate Morrical,
Aaron Knoll,
Valerio Pascucci,
Chris R. Johnson
Abstract:
We present dw2, a flexible and easy-to-use software infrastructure for interactive rendering of large tiled display walls. Our library represents the tiled display wall as a single virtual screen through a display "service", which renderers connect to and send image tiles to be displayed, either from an on-site or remote cluster. The display service can be easily configured to support a range of typical network and display hardware configurations; the client library provides a straightforward interface for easy integration into existing renderers. We evaluate the performance of our display wall service in different configurations using a CPU and GPU ray tracer, in both on-site and remote rendering scenarios using multiple display walls.
Submitted 7 September, 2020;
originally announced September 2020.
-
Ray Tracing Structured AMR Data Using ExaBricks
Authors:
Ingo Wald,
Stefan Zellmann,
Will Usher,
Nate Morrical,
Ulrich Lang,
Valerio Pascucci
Abstract:
Structured Adaptive Mesh Refinement (Structured AMR) enables simulations to adapt the domain resolution to save computation and storage, and has become one of the dominant data representations used by scientific simulations; however, efficiently rendering such data remains a challenge. We present an efficient approach for volume- and iso-surface ray tracing of Structured AMR data on GPU-equipped workstations, using a combination of two different data structures. Together, these data structures allow a ray tracing based renderer to quickly determine which segments along the ray need to be integrated and at what frequency, while also providing quick access to all data values required for a smooth sample reconstruction kernel. Our method makes use of the RTX ray tracing hardware for surface rendering, ray marching, space skipping, and adaptive sampling; and allows for interactive changes to the transfer function and implicit iso-surfacing thresholds. We demonstrate that our method achieves high performance with little memory overhead, enabling interactive high quality rendering of complex AMR data sets on individual GPU workstations.
Submitted 7 September, 2020;
originally announced September 2020.
-
Accelerating Force-Directed Graph Drawing with RT Cores
Authors:
Stefan Zellmann,
Martin Weier,
Ingo Wald
Abstract:
Graph drawing with spring embedders employs a V x V computation phase over the graph's vertex set to compute repulsive forces. Here, the efficacy of forces diminishes with distance: a vertex can effectively only influence other vertices in a certain radius around its position. Therefore, the algorithm lends itself to an implementation using search data structures to reduce the runtime complexity. NVIDIA RT cores implement hierarchical tree traversal in hardware. We show how to map the problem of finding graph layouts with force-directed methods to a ray tracing problem that can subsequently be implemented with dedicated ray tracing hardware. With that, we observe speedups of 4x to 13x over a CUDA software implementation.
Submitted 25 August, 2020;
originally announced August 2020.
-
A Simple, General, and GPU Friendly Method for Computing Dual Mesh and Iso-Surfaces of Adaptive Mesh Refinement (AMR) Data
Authors:
Ingo Wald
Abstract:
We propose a novel approach to extracting crack-free iso-surfaces from Structured AMR data that is more general than previous techniques, is trivially simple to implement, requires no information other than the list of AMR cells, and works, in particular, for different AMR formats including octree AMR, block-structured AMR with arbitrary level differences at level boundaries, and AMR data that consist of individual cells without any existing grid structure. We describe both the technique itself and a CUDA-based GPU implementation of this technique, and evaluate it on several non-trivial AMR data sets.
Submitted 17 April, 2020;
originally announced April 2020.
-
Digesting the Elephant -- Experiences with Interactive Production Quality Path Tracing of the Moana Island Scene
Authors:
Ingo Wald,
Bruce Cherniak,
Will Usher,
Carson Brownlee,
Attila Afra,
Johannes Guenther,
Jefferson Amstutz,
Tim Rowley,
Valerio Pascucci,
Chris R Johnson,
Jim Jeffers
Abstract:
New algorithmic and hardware developments over the past two decades have enabled interactive ray tracing of small to modest sized scenes, and are finding growing popularity in scientific visualization and games. However, interactive ray tracing has not been as widely explored in the context of production film rendering, where challenges due to the complexity of the models and, from a practical standpoint, their unavailability to the wider research community, have posed significant challenges. The recent release of the Disney Moana Island Scene has made one such model available to the community for experimentation. In this paper, we detail the challenges posed by this scene to an interactive ray tracer, and the solutions we have employed and developed to enable interactive path tracing of the scene with full geometric and shading detail, with the goal of providing insight and guidance to other researchers.
Submitted 8 January, 2020;
originally announced January 2020.
-
Efficient Space Skipping and Adaptive Sampling of Unstructured Volumes Using Hardware Accelerated Ray Tracing
Authors:
Nathan Morrical,
Will Usher,
Ingo Wald,
Valerio Pascucci
Abstract:
Sample based ray marching is an effective method for direct volume rendering of unstructured meshes. However, sampling such meshes remains expensive, and strategies to reduce the number of samples taken have received relatively little attention. In this paper, we introduce a method for rendering unstructured meshes using a combination of a coarse spatial acceleration structure and hardware-accelerated ray tracing. Our approach enables efficient empty space skipping and adaptive sampling of unstructured meshes, and outperforms a reference ray marcher by up to 7x.
Submitted 5 August, 2019;
originally announced August 2019.