-
A Hierarchical 3D Gaussian Representation for Real-Time Rendering of Very Large Datasets
Authors:
Bernhard Kerbl,
Andréas Meuleman,
Georgios Kopanas,
Michael Wimmer,
Alexandre Lanvin,
George Drettakis
Abstract:
Novel view synthesis has seen major advances in recent years, with 3D Gaussian splatting offering an excellent level of visual quality, fast training and real-time rendering. However, the resources needed for training and rendering inevitably limit the size of the captured scenes that can be represented with good visual quality. We introduce a hierarchy of 3D Gaussians that preserves visual qualit…
▽ More
Novel view synthesis has seen major advances in recent years, with 3D Gaussian splatting offering an excellent level of visual quality, fast training and real-time rendering. However, the resources needed for training and rendering inevitably limit the size of the captured scenes that can be represented with good visual quality. We introduce a hierarchy of 3D Gaussians that preserves visual quality for very large scenes, while offering an efficient Level-of-Detail (LOD) solution for efficient rendering of distant content with effective level selection and smooth transitions between levels.We introduce a divide-and-conquer approach that allows us to train very large scenes in independent chunks. We consolidate the chunks into a hierarchy that can be optimized to further improve visual quality of Gaussians merged into intermediate nodes. Very large captures typically have sparse coverage of the scene, presenting many challenges to the original 3D Gaussian splatting training method; we adapt and regularize training to account for these issues. We present a complete solution, that enables real-time rendering of very large scenes and can adapt to available resources thanks to our LOD method. We show results for captured scenes with up to tens of thousands of images with a simple and affordable rig, covering trajectories of up to several kilometers and lasting up to one hour. Project Page: https://repo-sam.inria.fr/fungraph/hierarchical-3d-gaussians/
△ Less
Submitted 17 June, 2024;
originally announced June 2024.
-
The Past, Present, and Future of Automation in Model-Driven Engineering
Authors:
Lola Burgueño,
Davide Di Ruscio,
Houari Sahraoui,
Manuel Wimmer
Abstract:
Model-Driven Engineering (MDE) provides a huge body of knowledge of automation for many different engineering tasks, especially those involving transitioning from design to implementation. With the huge progress made on Artificial Intelligence (AI) techniques, questions arise for the future of MDE such as how existing MDE techniques and technologies can be improved or how other activities which cu…
▽ More
Model-Driven Engineering (MDE) provides a huge body of knowledge of automation for many different engineering tasks, especially those involving transitioning from design to implementation. With the huge progress made on Artificial Intelligence (AI) techniques, questions arise for the future of MDE such as how existing MDE techniques and technologies can be improved or how other activities which currently lack dedicated support can also be automated. However, at the same time, it has to be revisited where and how models should be used to keep the engineers in the loop for creating, operating, and maintaining complex systems. To trigger dedicated research on these open points, we discuss the history of automation in MDE and present perspectives on how automation in MDE can be further improved and which obstacles have to be overcome in the medium and long term perspective.
△ Less
Submitted 28 May, 2024;
originally announced May 2024.
-
Reconstructing Curves from Sparse Samples on Riemannian Manifolds
Authors:
Diana Marin,
Filippo Maggioli,
Simone Melzi,
Stefan Ohrhallinger,
Michael Wimmer
Abstract:
Reconstructing 2D curves from sample points has long been a critical challenge in computer graphics, finding essential applications in vector graphics. The design and editing of curves on surfaces has only recently begun to receive attention, primarily relying on human assistance, and where not, limited by very strict sampling conditions. In this work, we formally improve on the state-of-the-art r…
▽ More
Reconstructing 2D curves from sample points has long been a critical challenge in computer graphics, finding essential applications in vector graphics. The design and editing of curves on surfaces has only recently begun to receive attention, primarily relying on human assistance, and where not, limited by very strict sampling conditions. In this work, we formally improve on the state-of-the-art requirements and introduce an innovative algorithm capable of reconstructing closed curves directly on surfaces from a given sparse set of sample points. We extend and adapt a state-of-the-art planar curve reconstruction method to the realm of surfaces while dealing with the challenges arising from working on non-Euclidean domains. We demonstrate the robustness of our method by reconstructing multiple curves on various surface meshes. We explore novel potential applications of our approach, allowing for automated reconstruction of curves on Riemannian manifolds.
△ Less
Submitted 13 June, 2024; v1 submitted 15 April, 2024;
originally announced April 2024.
-
Challenges of Quantum Software Engineering for the Next Decade: The Road Ahead
Authors:
Juan M. Murillo,
Jose Garcia-Alonso,
Enrique Moguel,
Johanna Barzen,
Frank Leymann,
Shaukat Ali,
Tao Yue,
Paolo Arcaini,
Ricardo Pérez Castillo,
Ignacio García Rodríguez de Guzmán,
Mario Piattini,
Antonio Ruiz-Cortés,
Antonio Brogi,
Jianjun Zhao,
Andriy Miranskyy,
Manuel Wimmer
Abstract:
As quantum computers evolve, so does the complexity of the software that they can run. To make this software efficient, maintainable, reusable, and cost-effective, quality attributes that any industry-grade software should strive for, mature software engineering approaches should be applied during its design, development, and operation. Due to the significant differences between classical and quan…
▽ More
As quantum computers evolve, so does the complexity of the software that they can run. To make this software efficient, maintainable, reusable, and cost-effective, quality attributes that any industry-grade software should strive for, mature software engineering approaches should be applied during its design, development, and operation. Due to the significant differences between classical and quantum software, applying classical software engineering solutions to quantum software is difficult. This resulted in the birth of Quantum Software Engineering as a discipline in the contemporary software engineering landscape. In this work, a set of active researchers is currently addressing the challenges of Quantum Software Engineering and analyzing the most recent research advances in this domain. This analysis is used to identify needed breakthroughs and future research directions for Quantum Software Engineering.
△ Less
Submitted 10 April, 2024;
originally announced April 2024.
-
PARMESAN: Parameter-Free Memory Search and Transduction for Dense Prediction Tasks
Authors:
Philip Matthias Winter,
Maria Wimmer,
David Major,
Dimitrios Lenis,
Astrid Berg,
Theresa Neubauer,
Gaia Romana De Paolis,
Johannes Novotny,
Sophia Ulonska,
Katja Bühler
Abstract:
In this work we address flexibility in deep learning by means of transductive reasoning. For adaptation to new tasks or new data, existing methods typically involve tuning of learnable parameters or even complete re-training from scratch, rendering such approaches unflexible in practice. We argue that the notion of separating computation from memory by the means of transduction can act as a steppi…
▽ More
In this work we address flexibility in deep learning by means of transductive reasoning. For adaptation to new tasks or new data, existing methods typically involve tuning of learnable parameters or even complete re-training from scratch, rendering such approaches unflexible in practice. We argue that the notion of separating computation from memory by the means of transduction can act as a step** stone for solving these issues. We therefore propose PARMESAN (parameter-free memory search and transduction), a scalable transduction method which leverages a memory module for solving dense prediction tasks. At inference, hidden representations in memory are being searched to find corresponding examples. In contrast to other methods, PARMESAN learns without the requirement for any continuous training or fine-tuning of learnable parameters simply by modifying the memory content. Our method is compatible with commonly used neural architectures and canonically transfers to 1D, 2D, and 3D grid-based data. We demonstrate the capabilities of our approach at complex tasks such as continual and few-shot learning. PARMESAN learns up to 370 times faster than common baselines while being on par in terms of predictive performance, knowledge retention, and data-efficiency.
△ Less
Submitted 18 March, 2024;
originally announced March 2024.
-
PPSURF: Combining Patches and Point Convolutions for Detailed Surface Reconstruction
Authors:
Philipp Erler,
Lizeth Fuentes,
Pedro Hermosilla,
Paul Guerrero,
Renato Pajarola,
Michael Wimmer
Abstract:
3D surface reconstruction from point clouds is a key step in areas such as content creation, archaeology, digital cultural heritage, and engineering. Current approaches either try to optimize a non-data-driven surface representation to fit the points, or learn a data-driven prior over the distribution of commonly occurring surfaces and how they correlate with potentially noisy point clouds. Data-d…
▽ More
3D surface reconstruction from point clouds is a key step in areas such as content creation, archaeology, digital cultural heritage, and engineering. Current approaches either try to optimize a non-data-driven surface representation to fit the points, or learn a data-driven prior over the distribution of commonly occurring surfaces and how they correlate with potentially noisy point clouds. Data-driven methods enable robust handling of noise and typically either focus on a global or a local prior, which trade-off between robustness to noise on the global end and surface detail preservation on the local end. We propose PPSurf as a method that combines a global prior based on point convolutions and a local prior based on processing local point cloud patches. We show that this approach is robust to noise while recovering surface details more accurately than the current state-of-the-art.
Our source code, pre-trained model and dataset are available at: https://github.com/cg-tuwien/ppsurf
△ Less
Submitted 8 February, 2024; v1 submitted 16 January, 2024;
originally announced January 2024.
-
Multi-scale attention-based instance segmentation for measuring crystals with large size variation
Authors:
Theresa Neubauer,
Astrid Berg,
Maria Wimmer,
Dimitrios Lenis,
David Major,
Philip Matthias Winter,
Gaia Romana De Paolis,
Johannes Novotny,
Daniel Lüftner,
Katja Reinharter,
Katja Bühler
Abstract:
Quantitative measurement of crystals in high-resolution images allows for important insights into underlying material characteristics. Deep learning has shown great progress in vision-based automatic crystal size measurement, but current instance segmentation methods reach their limits with images that have large variation in crystal size or hard to detect crystal boundaries. Even small image segm…
▽ More
Quantitative measurement of crystals in high-resolution images allows for important insights into underlying material characteristics. Deep learning has shown great progress in vision-based automatic crystal size measurement, but current instance segmentation methods reach their limits with images that have large variation in crystal size or hard to detect crystal boundaries. Even small image segmentation errors, such as incorrectly fused or separated segments, can significantly lower the accuracy of the measured results. Instead of improving the existing pixel-wise boundary segmentation methods, we propose to use an instance-based segmentation method, which gives more robust segmentation results to improve measurement accuracy. Our novel method enhances flow maps with a size-aware multi-scale attention module. The attention module adaptively fuses information from multiple scales and focuses on the most relevant scale for each segmented image area. We demonstrate that our proposed attention fusion strategy outperforms state-of-the-art instance and boundary segmentation methods, as well as simple average fusion of multi-scale predictions. We evaluate our method on a refractory raw material dataset of high-resolution images with large variation in crystal size and show that our model can be used to calculate the crystal size more accurately than existing methods.
△ Less
Submitted 8 January, 2024;
originally announced January 2024.
-
Re:Draw -- Context Aware Translation as a Controllable Method for Artistic Production
Authors:
Joao Liborio Cardoso,
Francesco Banterle,
Paolo Cignoni,
Michael Wimmer
Abstract:
We introduce context-aware translation, a novel method that combines the benefits of inpainting and image-to-image translation, respecting simultaneously the original input and contextual relevance -- where existing methods fall short. By doing so, our method opens new avenues for the controllable use of AI within artistic creation, from animation to digital art.
As an use case, we apply our met…
▽ More
We introduce context-aware translation, a novel method that combines the benefits of inpainting and image-to-image translation, respecting simultaneously the original input and contextual relevance -- where existing methods fall short. By doing so, our method opens new avenues for the controllable use of AI within artistic creation, from animation to digital art.
As an use case, we apply our method to redraw any hand-drawn animated character eyes based on any design specifications - eyes serve as a focal point that captures viewer attention and conveys a range of emotions, however, the labor-intensive nature of traditional animation often leads to compromises in the complexity and consistency of eye design. Furthermore, we remove the need for production data for training and introduce a new character recognition method that surpasses existing work by not requiring fine-tuning to specific productions. This proposed use case could help maintain consistency throughout production and unlock bolder and more detailed design choices without the production cost drawbacks. A user study shows context-aware translation is preferred over existing work 95.16% of the time.
△ Less
Submitted 7 January, 2024;
originally announced January 2024.
-
Training and Predicting Visual Error for Real-Time Applications
Authors:
João Libório Cardoso,
Bernhard Kerbl,
Lei Yang,
Yury Uralsky,
Michael Wimmer
Abstract:
Visual error metrics play a fundamental role in the quantification of perceived image similarity. Most recently, use cases for them in real-time applications have emerged, such as content-adaptive shading and shading reuse to increase performance and improve efficiency. A wide range of different metrics has been established, with the most sophisticated being capable of capturing the perceptual cha…
▽ More
Visual error metrics play a fundamental role in the quantification of perceived image similarity. Most recently, use cases for them in real-time applications have emerged, such as content-adaptive shading and shading reuse to increase performance and improve efficiency. A wide range of different metrics has been established, with the most sophisticated being capable of capturing the perceptual characteristics of the human visual system. However, their complexity, computational expense, and reliance on reference images to compare against prevent their generalized use in real-time, restricting such applications to using only the simplest available metrics. In this work, we explore the abilities of convolutional neural networks to predict a variety of visual metrics without requiring either reference or rendered images. Specifically, we train and deploy a neural network to estimate the visual error resulting from reusing shading or using reduced shading rates. The resulting models account for 70%-90% of the variance while achieving up to an order of magnitude faster computation times. Our solution combines image-space information that is readily available in most state-of-the-art deferred shading pipelines with reprojection from previous frames to enable an adequate estimate of visual errors, even in previously unseen regions. We describe a suitable convolutional network architecture and considerations for data preparation for training. We demonstrate the capability of our network to predict complex error metrics at interactive rates in a real-time application that implements content-adaptive shading in a deferred pipeline. Depending on the portion of unseen image regions, our approach can achieve up to $2\times$ performance compared to state-of-the-art methods.
△ Less
Submitted 13 October, 2023;
originally announced October 2023.
-
SimLOD: Simultaneous LOD Generation and Rendering
Authors:
Markus Schütz,
Lukas Herzberger,
Michael Wimmer
Abstract:
About: We propose an incremental LOD generation approach for point clouds that allows us to simultaneously load points from disk, update an octree-based level-of-detail representation, and render the intermediate results in real time while additional points are still being loaded from disk. LOD construction and rendering are both implemented in CUDA and share the GPU's processing power, but each i…
▽ More
About: We propose an incremental LOD generation approach for point clouds that allows us to simultaneously load points from disk, update an octree-based level-of-detail representation, and render the intermediate results in real time while additional points are still being loaded from disk. LOD construction and rendering are both implemented in CUDA and share the GPU's processing power, but each incremental update is lightweight enough to leave enough time to maintain real-time frame rates.
Background: LOD construction is typically implemented as a preprocessing step that requires users to wait before they are able to view the results in real time. This approach allows users to view intermediate results right away.
Results: Our approach is able to stream points from an SSD and update the octree on the GPU at rates of up to 580 million points per second (~9.3GB/s from a PCIe 5.0 SSD) on an RTX 4090. Depending on the data set, our approach spends an average of about 1 to 2 ms to incrementally insert 1 million points into the octree, allowing us to insert several million points per frame into the LOD structure and render the intermediate results within the same frame.
Discussion/Limitations: We aim to provide near-instant, real-time visualization of large data sets without preprocessing. Out-of-core processing of arbitrarily large data sets and color-filtering for higher-quality LODs are subject to future work.
△ Less
Submitted 5 October, 2023;
originally announced October 2023.
-
View-Independent Adjoint Light Tracing for Lighting Design Optimization
Authors:
Lukas Lipp,
David Hahn,
Pierre Ecormier-Nocca,
Florian Rist,
Michael Wimmer
Abstract:
Differentiable rendering methods promise the ability to optimize various parameters of 3d scenes to achieve a desired result. However, lighting design has so far received little attention in this field. In this paper, we introduce a method that enables continuous optimization of the arrangement of luminaires in a 3d scene via differentiable light tracing. Our experiments show two major issues when…
▽ More
Differentiable rendering methods promise the ability to optimize various parameters of 3d scenes to achieve a desired result. However, lighting design has so far received little attention in this field. In this paper, we introduce a method that enables continuous optimization of the arrangement of luminaires in a 3d scene via differentiable light tracing. Our experiments show two major issues when attempting to apply existing methods from differentiable path tracing to this problem: first, many rendering methods produce images, which restricts the ability of a designer to define lighting objectives to image space. Second, most previous methods are designed for scene geometry or material optimization and have not been extensively tested for the case of optimizing light sources. Currently available differentiable ray-tracing methods do not provide satisfactory performance, even on fairly basic test cases in our experience. In this paper, we propose a novel adjoint light tracing method that overcomes these challenges and enables gradient-based lighting design optimization in a view-independent (camera-free) way. Thus, we allow the user to paint illumination targets directly onto the 3d scene or use existing baked illumination data (e.g., light maps). Using modern ray-tracing hardware, we achieve interactive performance. We find light tracing advantageous over path tracing in this setting, as it naturally handles irregular geometry, resulting in less noise and improved optimization convergence. We compare our adjoint gradients to state-of-the-art image-based differentiable rendering methods. We also demonstrate that our gradient data works with various common optimization algorithms, providing good convergence behaviour. Qualitative comparisons with real-world scenes underline the practical applicability of our method.
△ Less
Submitted 6 May, 2024; v1 submitted 3 October, 2023;
originally announced October 2023.
-
Strokes2Surface: Recovering Curve Networks From 4D Architectural Design Sketches
Authors:
S. Rasoulzadeh,
M. Wimmer,
P. Stauss,
I. Kovacic
Abstract:
We present Strokes2Surface, an offline geometry reconstruction pipeline that recovers well-connected curve networks from imprecise 4D sketches to bridge concept design and digital modeling stages in architectural design. The input to our pipeline consists of 3D strokes' polyline vertices and their timestamps as the 4th dimension, along with additional metadata recorded throughout sketching. Inspir…
▽ More
We present Strokes2Surface, an offline geometry reconstruction pipeline that recovers well-connected curve networks from imprecise 4D sketches to bridge concept design and digital modeling stages in architectural design. The input to our pipeline consists of 3D strokes' polyline vertices and their timestamps as the 4th dimension, along with additional metadata recorded throughout sketching. Inspired by architectural sketching practices, our pipeline combines a classifier and two clustering models to achieve its goal. First, with a set of extracted hand-engineered features from the sketch, the classifier recognizes the type of individual strokes between those depicting boundaries (Shape strokes) and those depicting enclosed areas (Scribble strokes). Next, the two clustering models parse strokes of each type into distinct groups, each representing an individual edge or face of the intended architectural object. Curve networks are then formed through topology recovery of consolidated Shape clusters and surfaced using Scribble clusters guiding the cycle discovery. Our evaluation is threefold: We confirm the usability of the Strokes2Surface pipeline in architectural design use cases via a user study, we validate our choice of features via statistical analysis and ablation studies on our collected dataset, and we compare our outputs against a range of reconstructions computed using alternative methods.
△ Less
Submitted 10 June, 2024; v1 submitted 12 June, 2023;
originally announced June 2023.
-
GPU-Accelerated LOD Generation for Point Clouds
Authors:
Markus Schütz,
Bernhard Kerbl,
Philip Klaus,
Michael Wimmer
Abstract:
About: We introduce a GPU-accelerated LOD construction process that creates a hybrid voxel-point-based variation of the widely used layered point cloud (LPC) structure for LOD rendering and streaming. The massive performance improvements provided by the GPU allow us to improve the quality of lower LODs via color filtering while still increasing construction speed compared to the non-filtered, CPU-…
▽ More
About: We introduce a GPU-accelerated LOD construction process that creates a hybrid voxel-point-based variation of the widely used layered point cloud (LPC) structure for LOD rendering and streaming. The massive performance improvements provided by the GPU allow us to improve the quality of lower LODs via color filtering while still increasing construction speed compared to the non-filtered, CPU-based state of the art.
Background: LOD structures are required to render hundreds of millions to trillions of points, but constructing them takes time.
Results: LOD structures suitable for rendering and streaming are constructed at rates of about 1 billion points per second (with color filtering) to 4 billion points per second (sample-picking/random sampling, state of the art) on an RTX 3090 -- an improvement of a factor of 80 to 400 times over the CPU-based state of the art (12 million points per second). Due to being in-core, model sizes are limited to about 500 million points per 24GB memory.
Discussion: Our method currently focuses on maximizing in-core construction speed on the GPU. Issues such as out-of-core construction of arbitrarily large data sets are not addressed, but we expect it to be suitable as a component of bottom-up out-of-core LOD construction schemes.
△ Less
Submitted 28 February, 2023;
originally announced February 2023.
-
Employing similarity to highlight differences: On the impact of anatomical assumptions in chest X-ray registration methods
Authors:
Astrid Berg,
Eva Vandersmissen,
Maria Wimmer,
David Major,
Theresa Neubauer,
Dimitrios Lenis,
Jeroen Cant,
Annemiek Snoeckx,
Katja Bühler
Abstract:
To facilitate both the detection and the interpretation of findings in chest X-rays, comparison with a previous image of the same patient is very valuable to radiologists. Today, the most common approach for deep learning methods to automatically inspect chest X-rays disregards the patient history and classifies only single images as normal or abnormal. Nevertheless, several methods for assisting…
▽ More
To facilitate both the detection and the interpretation of findings in chest X-rays, comparison with a previous image of the same patient is very valuable to radiologists. Today, the most common approach for deep learning methods to automatically inspect chest X-rays disregards the patient history and classifies only single images as normal or abnormal. Nevertheless, several methods for assisting in the task of comparison through image registration have been proposed in the past. However, as we illustrate, they tend to miss specific types of pathological changes like cardiomegaly and effusion. Due to assumptions on fixed anatomical structures or their measurements of registration quality, they produce unnaturally deformed warp fields impacting visualization of differences between moving and fixed images. We aim to overcome these limitations, through a new paradigm based on individual rib pair segmentation for anatomy penalized registration. Our method proves to be a natural way to limit the folding percentage of the warp field to 1/6 of the state of the art while increasing the overlap of ribs by more than 25%, implying difference images showing pathological changes overlooked by other methods. We develop an anatomically penalized convolutional multi-stage solution on the National Institutes of Health (NIH) data set, starting from less than 25 fully and 50 partly labeled training images, employing sequential instance memory segmentation with hole dropout, weak labeling, coarse-to-fine refinement and Gaussian mixture model histogram matching. We statistically evaluate the benefits of our method and highlight the limits of currently used metrics for registration of chest X-rays.
△ Less
Submitted 24 January, 2023; v1 submitted 23 January, 2023;
originally announced January 2023.
-
Anomaly Detection using Generative Models and Sum-Product Networks in Mammography Scans
Authors:
Marc Dietrichstein,
David Major,
Martin Trapp,
Maria Wimmer,
Dimitrios Lenis,
Philip Winter,
Astrid Berg,
Theresa Neubauer,
Katja Bühler
Abstract:
Unsupervised anomaly detection models which are trained solely by healthy data, have gained importance in the recent years, as the annotation of medical data is a tedious task. Autoencoders and generative adversarial networks are the standard anomaly detection methods that are utilized to learn the data distribution. However, they fall short when it comes to inference and evaluation of the likelih…
▽ More
Unsupervised anomaly detection models which are trained solely by healthy data, have gained importance in the recent years, as the annotation of medical data is a tedious task. Autoencoders and generative adversarial networks are the standard anomaly detection methods that are utilized to learn the data distribution. However, they fall short when it comes to inference and evaluation of the likelihood of test samples. We propose a novel combination of generative models and a probabilistic graphical model. After encoding image samples by autoencoders, the distribution of data is modeled by Random and Tensorized Sum-Product Networks ensuring exact and efficient inference at test time. We evaluate different autoencoder architectures in combination with Random and Tensorized Sum-Product Networks on mammography images using patch-wise processing and observe superior performance over utilizing the models standalone and state-of-the-art in anomaly detection for medical data.
△ Less
Submitted 12 October, 2022;
originally announced October 2022.
-
Assesment of material layers in building walls using GeoRadar
Authors:
Ildar Gilmutdinov,
Ingrid Schloegel,
Alois Hinterleitner,
Peter Wonka,
Michael Wimmer
Abstract:
Assessing the structure of a building with non-invasive methods is an important problem. One of the possible approaches is to use GeoRadar to examine wall structures by analyzing the data obtained from the scans. We propose a data-driven approach to evaluate the material composition of a wall from its GPR radargrams. In order to generate training data, we use gprMax to model the scanning process.…
▽ More
Assessing the structure of a building with non-invasive methods is an important problem. One of the possible approaches is to use GeoRadar to examine wall structures by analyzing the data obtained from the scans. We propose a data-driven approach to evaluate the material composition of a wall from its GPR radargrams. In order to generate training data, we use gprMax to model the scanning process. Using simulation data, we use a convolutional neural network to predict the thicknesses and dielectric properties of walls per layer. We evaluate the generalization abilities of the trained model on data collected from real buildings.
△ Less
Submitted 25 August, 2022;
originally announced August 2022.
-
Software Rasterization of 2 Billion Points in Real Time
Authors:
Markus Schütz,
Bernhard Kerbl,
Michael Wimmer
Abstract:
We propose a software rasterization pipeline for point clouds that is capable of brute-force rendering up to two billion points in real time (60fps). Improvements over the state of the art are achieved by batching points in a way that a number of batch-level optimizations can be computed before rasterizing the points within the same rendering pass. These optimizations include frustum culling, leve…
▽ More
We propose a software rasterization pipeline for point clouds that is capable of brute-force rendering up to two billion points in real time (60fps). Improvements over the state of the art are achieved by batching points in a way that a number of batch-level optimizations can be computed before rasterizing the points within the same rendering pass. These optimizations include frustum culling, level-of-detail rendering, and choosing the appropriate coordinate precision for a given batch of points directly within a compute workgroup. Adaptive coordinate precision, in conjunction with visibility buffers, reduces the number of loaded bytes for the majority of points down to 4, thus making our approach several times faster than the bandwidth-limited state of the art. Furthermore, support for LOD rendering makes our software-rasterization approach suitable for rendering arbitrarily large point clouds, and to meet the increased performance demands of virtual reality rendering.
△ Less
Submitted 4 April, 2022;
originally announced April 2022.
-
Gaussian Mixture Convolution Networks
Authors:
Adam Celarek,
Pedro Hermosilla,
Bernhard Kerbl,
Timo Ropinski,
Michael Wimmer
Abstract:
This paper proposes a novel method for deep learning based on the analytical convolution of multidimensional Gaussian mixtures. In contrast to tensors, these do not suffer from the curse of dimensionality and allow for a compact representation, as data is only stored where details exist. Convolution kernels and data are Gaussian mixtures with unconstrained weights, positions, and covariance matric…
▽ More
This paper proposes a novel method for deep learning based on the analytical convolution of multidimensional Gaussian mixtures. In contrast to tensors, these do not suffer from the curse of dimensionality and allow for a compact representation, as data is only stored where details exist. Convolution kernels and data are Gaussian mixtures with unconstrained weights, positions, and covariance matrices. Similar to discrete convolutional networks, each convolution step produces several feature channels, represented by independent Gaussian mixtures. Since traditional transfer functions like ReLUs do not produce Gaussian mixtures, we propose using a fitting of these functions instead. This fitting step also acts as a pooling layer if the number of Gaussian components is reduced appropriately. We demonstrate that networks based on this architecture reach competitive accuracy on Gaussian mixtures fitted to the MNIST and ModelNet data sets.
△ Less
Submitted 18 February, 2022;
originally announced February 2022.
-
Multi-task fusion for improving mammography screening data classification
Authors:
Maria Wimmer,
Gert Sluiter,
David Major,
Dimitrios Lenis,
Astrid Berg,
Theresa Neubauer,
Katja Bühler
Abstract:
Machine learning and deep learning methods have become essential for computer-assisted prediction in medicine, with a growing number of applications also in the field of mammography. Typically these algorithms are trained for a specific task, e.g., the classification of lesions or the prediction of a mammogram's pathology status. To obtain a comprehensive view of a patient, models which were all t…
▽ More
Machine learning and deep learning methods have become essential for computer-assisted prediction in medicine, with a growing number of applications also in the field of mammography. Typically these algorithms are trained for a specific task, e.g., the classification of lesions or the prediction of a mammogram's pathology status. To obtain a comprehensive view of a patient, models which were all trained for the same task(s) are subsequently ensembled or combined. In this work, we propose a pipeline approach, where we first train a set of individual, task-specific models and subsequently investigate the fusion thereof, which is in contrast to the standard model ensembling strategy. We fuse model predictions and high-level features from deep learning models with hybrid patient models to build stronger predictors on patient level. To this end, we propose a multi-branch deep learning model which efficiently fuses features across different tasks and mammograms to obtain a comprehensive patient-level prediction. We train and evaluate our full pipeline on public mammography data, i.e., DDSM and its curated version CBIS-DDSM, and report an AUC score of 0.962 for predicting the presence of any lesion and 0.791 for predicting the presence of malignant lesions on patient level. Overall, our fusion approaches improve AUC scores significantly by up to 0.04 compared to standard model ensembling. Moreover, by providing not only global patient-level predictions but also task-specific model results that are related to radiological features, our pipeline aims to closely support the reading workflow of radiologists.
△ Less
Submitted 1 December, 2021;
originally announced December 2021.
-
Rendering Point Clouds with Compute Shaders and Vertex Order Optimization
Authors:
Markus Schütz,
Bernhard Kerbl,
Michael Wimmer
Abstract:
While commodity GPUs provide a continuously growing range of features and sophisticated methods for accelerating compute jobs, many state-of-the-art solutions for point cloud rendering still rely on the provided point primitives (GL_POINTS, POINTLIST, ...) of graphics APIs for image synthesis. In this paper, we present several compute-based point cloud rendering approaches that outperform the hard…
▽ More
While commodity GPUs provide a continuously growing range of features and sophisticated methods for accelerating compute jobs, many state-of-the-art solutions for point cloud rendering still rely on the provided point primitives (GL_POINTS, POINTLIST, ...) of graphics APIs for image synthesis. In this paper, we present several compute-based point cloud rendering approaches that outperform the hardware pipeline by up to an order of magnitude and achieve significantly better frame times than previous compute-based methods. Beyond basic closest-point rendering, we also introduce a fast, high-quality variant to reduce aliasing. We present and evaluate several variants of our proposed methods with different flavors of optimization, in order to ensure their applicability and achieve optimal performance on a range of platforms and architectures with varying support for novel GPU hardware features. During our experiments, the observed peak performance was reached rendering 796 million points (12.7GB) at rates of 62 to 64 frames per second (50 billion points per second, 802GB/s) on an RTX 3090 without the use of level-of-detail structures.
We further introduce an optimized vertex order for point clouds to boost the efficiency of GL_POINTS by a factor of 5x in cases where hardware rendering is compulsory. We compare different orderings and show that Morton sorted buffers are faster for some viewpoints, while shuffled vertex buffers are faster in others. In contrast, combining both approaches by first sorting according to Morton-code and shuffling the resulting sequence in batches of 128 points leads to a vertex buffer layout with high rendering performance and low sensitivity to viewpoint changes.
△ Less
Submitted 15 April, 2021;
originally announced April 2021.
-
Soft Tissue Sarcoma Co-Segmentation in Combined MRI and PET/CT Data
Authors:
Theresa Neubauer,
Maria Wimmer,
Astrid Berg,
David Major,
Dimitrios Lenis,
Thomas Beyer,
Jelena Saponjski,
Katja Bühler
Abstract:
Tumor segmentation in multimodal medical images has seen a growing trend towards deep learning based methods. Typically, studies dealing with this topic fuse multimodal image data to improve the tumor segmentation contour for a single imaging modality. However, they do not take into account that tumor characteristics are emphasized differently by each modality, which affects the tumor delineation.…
▽ More
Tumor segmentation in multimodal medical images has seen a growing trend towards deep learning based methods. Typically, studies dealing with this topic fuse multimodal image data to improve the tumor segmentation contour for a single imaging modality. However, they do not take into account that tumor characteristics are emphasized differently by each modality, which affects the tumor delineation. Thus, the tumor segmentation is modality- and task-dependent. This is especially the case for soft tissue sarcomas, where, due to necrotic tumor tissue, the segmentation differs vastly. Closing this gap, we develop a modalityspecific sarcoma segmentation model that utilizes multimodal image data to improve the tumor delineation on each individual modality. We propose a simultaneous co-segmentation method, which enables multimodal feature learning through modality-specific encoder and decoder branches, and the use of resource-effcient densely connected convolutional layers. We further conduct experiments to analyze how different input modalities and encoder-decoder fusion strategies affect the segmentation result. We demonstrate the effectiveness of our approach on public soft tissue sarcoma data, which comprises MRI (T1 and T2 sequence) and PET/CT scans. The results show that our multimodal co-segmentation model provides better modality-specific tumor segmentation than models using only the PET or MRI (T1 and T2) scan as input.
△ Less
Submitted 24 September, 2020; v1 submitted 28 August, 2020;
originally announced August 2020.
-
Test Scene Design for Physically Based Rendering
Authors:
Elias Brugger,
Christian Freude,
Michael Wimmer
Abstract:
Physically based rendering is a discipline in computer graphics which aims at reproducing certain light and material appearances that occur in the real world. Complex scenes can be difficult to compute for rendering algorithms. This paper introduces a new comprehensive test database of scenes that treat different light setups in conjunction with diverse materials and discusses its design principle…
▽ More
Physically based rendering is a discipline in computer graphics which aims at reproducing certain light and material appearances that occur in the real world. Complex scenes can be difficult to compute for rendering algorithms. This paper introduces a new comprehensive test database of scenes that treat different light setups in conjunction with diverse materials and discusses its design principles. A lot of research is focused on the development of new algorithms that can deal with difficult light conditions and materials efficiently. This database delivers a comprehensive foundation for evaluating existing and newly developed rendering techniques. A final evaluation compares different results of different rendering algorithms for all scenes.
△ Less
Submitted 30 August, 2020; v1 submitted 26 August, 2020;
originally announced August 2020.
-
Points2Surf: Learning Implicit Surfaces from Point Cloud Patches
Authors:
Philipp Erler,
Paul Guerrero,
Stefan Ohrhallinger,
Michael Wimmer,
Niloy J. Mitra
Abstract:
A key step in any scanning-based asset creation workflow is to convert unordered point clouds to a surface. Classical methods (e.g., Poisson reconstruction) start to degrade in the presence of noisy and partial scans. Hence, deep learning based methods have recently been proposed to produce complete surfaces, even from partial scans. However, such data-driven methods struggle to generalize to new…
▽ More
A key step in any scanning-based asset creation workflow is to convert unordered point clouds to a surface. Classical methods (e.g., Poisson reconstruction) start to degrade in the presence of noisy and partial scans. Hence, deep learning based methods have recently been proposed to produce complete surfaces, even from partial scans. However, such data-driven methods struggle to generalize to new shapes with large geometric and topological variations. We present Points2Surf, a novel patch-based learning framework that produces accurate surfaces directly from raw scans without normals. Learning a prior over a combination of detailed local patches and coarse global information improves generalization performance and reconstruction accuracy. Our extensive comparison on both synthetic and real data demonstrates a clear advantage of our method over state-of-the-art alternatives on previously unseen classes (on average, Points2Surf brings down reconstruction error by 30% over SPR and by 270%+ over deep learning based SotA methods) at the cost of longer computation times and a slight increase in small-scale topological noise in some cases. Our source code, pre-trained model, and dataset are available on: https://github.com/ErlerPhilipp/points2surf
△ Less
Submitted 13 February, 2024; v1 submitted 20 July, 2020;
originally announced July 2020.
-
Domain aware medical image classifier interpretation by counterfactual impact analysis
Authors:
Dimitrios Lenis,
David Major,
Maria Wimmer,
Astrid Berg,
Gert Sluiter,
Katja Bühler
Abstract:
The success of machine learning methods for computer vision tasks has driven a surge in computer assisted prediction for medicine and biology. Based on a data-driven relationship between input image and pathological classification, these predictors deliver unprecedented accuracy. Yet, the numerous approaches trying to explain the causality of this learned relationship have fallen short: time const…
▽ More
The success of machine learning methods for computer vision tasks has driven a surge in computer assisted prediction for medicine and biology. Based on a data-driven relationship between input image and pathological classification, these predictors deliver unprecedented accuracy. Yet, the numerous approaches trying to explain the causality of this learned relationship have fallen short: time constraints, coarse, diffuse and at times misleading results, caused by the employment of heuristic techniques like Gaussian noise and blurring, have hindered their clinical adoption.
In this work, we discuss and overcome these obstacles by introducing a neural-network based attribution method, applicable to any trained predictor. Our solution identifies salient regions of an input image in a single forward-pass by measuring the effect of local image-perturbations on a predictor's score. We replace heuristic techniques with a strong neighborhood conditioned inpainting approach, avoiding anatomically implausible, hence adversarial artifacts. We evaluate on public mammography data and compare against existing state-of-the-art methods. Furthermore, we exemplify the approach's generalizability by demonstrating results on chest X-rays. Our solution shows, both quantitatively and qualitatively, a significant reduction of localization ambiguity and clearer conveying results, without sacrificing time efficiency.
△ Less
Submitted 1 October, 2020; v1 submitted 13 July, 2020;
originally announced July 2020.
-
Interpreting Medical Image Classifiers by Optimization Based Counterfactual Impact Analysis
Authors:
David Major,
Dimitrios Lenis,
Maria Wimmer,
Gert Sluiter,
Astrid Berg,
Katja Bühler
Abstract:
Clinical applicability of automated decision support systems depends on a robust, well-understood classification interpretation. Artificial neural networks while achieving class-leading scores fall short in this regard. Therefore, numerous approaches have been proposed that map a salient region of an image to a diagnostic classification. Utilizing heuristic methodology, like blurring and noise, th…
▽ More
Clinical applicability of automated decision support systems depends on a robust, well-understood classification interpretation. Artificial neural networks while achieving class-leading scores fall short in this regard. Therefore, numerous approaches have been proposed that map a salient region of an image to a diagnostic classification. Utilizing heuristic methodology, like blurring and noise, they tend to produce diffuse, sometimes misleading results, hindering their general adoption. In this work we overcome these issues by presenting a model agnostic saliency map** framework tailored to medical imaging. We replace heuristic techniques with a strong neighborhood conditioned inpainting approach, which avoids anatomically implausible artefacts. We formulate saliency attribution as a map-quality optimization task, enforcing constrained and focused attributions. Experiments on public mammography data show quantitatively and qualitatively more precise localization and clearer conveying results than existing state-of-the-art methods.
△ Less
Submitted 3 April, 2020;
originally announced April 2020.
-
Photorealistic Material Editing Through Direct Image Manipulation
Authors:
Károly Zsolnai-Fehér,
Peter Wonka,
Michael Wimmer
Abstract:
Creating photorealistic materials for light transport algorithms requires carefully fine-tuning a set of material properties to achieve a desired artistic effect. This is typically a lengthy process that involves a trained artist with specialized knowledge. In this work, we present a technique that aims to empower novice and intermediate-level users to synthesize high-quality photorealistic materi…
▽ More
Creating photorealistic materials for light transport algorithms requires carefully fine-tuning a set of material properties to achieve a desired artistic effect. This is typically a lengthy process that involves a trained artist with specialized knowledge. In this work, we present a technique that aims to empower novice and intermediate-level users to synthesize high-quality photorealistic materials by only requiring basic image processing knowledge. In the proposed workflow, the user starts with an input image and applies a few intuitive transforms (e.g., colorization, image inpainting) within a 2D image editor of their choice, and in the next step, our technique produces a photorealistic result that approximates this target image. Our method combines the advantages of a neural network-augmented optimizer and an encoder neural network to produce high-quality output results within 30 seconds. We also demonstrate that it is resilient against poorly-edited target images and propose a simple extension to predict image sequences with a strict time budget of 1-2 seconds per image.
△ Less
Submitted 12 September, 2019;
originally announced September 2019.
-
Sabrina: Modeling and Visualization of Economy Data with Incremental Domain Knowledge
Authors:
Alessio Arleo,
Christos Tsigkanos,
Chao Jia,
Roger A. Leite,
Ilir Murturi,
Manfred Klaffenboeck,
Schahram Dustdar,
Michael Wimmer,
Silvia Miksch,
Johannes Sorger
Abstract:
Investment planning requires knowledge of the financial landscape on a large scale, both in terms of geo-spatial and industry sector distribution. There is plenty of data available, but it is scattered across heterogeneous sources (newspapers, open data, etc.), which makes it difficult for financial analysts to understand the big picture. In this paper, we present Sabrina, a financial data analysi…
▽ More
Investment planning requires knowledge of the financial landscape on a large scale, both in terms of geo-spatial and industry sector distribution. There is plenty of data available, but it is scattered across heterogeneous sources (newspapers, open data, etc.), which makes it difficult for financial analysts to understand the big picture. In this paper, we present Sabrina, a financial data analysis and visualization approach that incorporates a pipeline for the generation of firm-to-firm financial transaction networks. The pipeline is capable of fusing the ground truth on individual firms in a region with (incremental) domain knowledge on general macroscopic aspects of the economy. Sabrina unites these heterogeneous data sources within a uniform visual interface that enables the visual analysis process. In a user study with three domain experts, we illustrate the usefulness of Sabrina, which eases their analysis process.
△ Less
Submitted 8 January, 2020; v1 submitted 5 August, 2019;
originally announced August 2019.
-
Rendering Point Clouds with Compute Shaders
Authors:
Markus Schütz,
Michael Wimmer
Abstract:
We propose a compute shader based point cloud rasterizer with up to 10 times higher performance than classic point-based rendering with the GL_POINT primitive. In addition to that, our rasterizer offers 5 byte depth-buffer precision with uniform or customizable distribution, and we show that it is possible to implement a high-quality splatting method that blends together overlap** fragments whil…
▽ More
We propose a compute shader based point cloud rasterizer with up to 10 times higher performance than classic point-based rendering with the GL_POINT primitive. In addition to that, our rasterizer offers 5 byte depth-buffer precision with uniform or customizable distribution, and we show that it is possible to implement a high-quality splatting method that blends together overlap** fragments while still maintaining higher frame-rates than the traditional approach.
△ Less
Submitted 7 August, 2019;
originally announced August 2019.
-
StretchDenoise: Parametric Curve Reconstruction with Guarantees by Separating Connectivity from Residual Uncertainty of Samples
Authors:
Stefan Ohrhallinger,
Michael Wimmer
Abstract:
We reconstruct a closed denoised curve from an unstructured and highly noisy 2D point cloud. Our proposed method uses a two- pass approach: Previously recovered manifold connectivity is used for ordering noisy samples along this manifold and express these as residuals in order to enable parametric denoising. This separates recovering low-frequency features from denoising high frequencies, which av…
▽ More
We reconstruct a closed denoised curve from an unstructured and highly noisy 2D point cloud. Our proposed method uses a two- pass approach: Previously recovered manifold connectivity is used for ordering noisy samples along this manifold and express these as residuals in order to enable parametric denoising. This separates recovering low-frequency features from denoising high frequencies, which avoids over-smoothing. The noise probability density functions (PDFs) at samples are either taken from sensor noise models or from estimates of the connectivity recovered in the first pass. The output curve balances the signed distances (inside/outside) to the samples. Additionally, the angles between edges of the polygon representing the connectivity become minimized in the least-square sense. The movement of the polygon's vertices is restricted to their noise extent, i.e., a cut-off distance corresponding to a maximum variance of the PDFs. We approximate the resulting optimization model, which consists of higher-order functions, by a linear model with good correspondence. Our algorithm is parameter-free and operates fast on the local neighborhoods determined by the connectivity. We augment a least-squares solver constrained by a linear system to also handle bounds. This enables us to guarantee stochastic error bounds for sampled curves corrupted by noise, e.g., silhouettes from sensed data, and we improve on the reconstruction error from ground truth. Open source to reproduce figures and tables in this paper is available at: https://github.com/stefango74/stretchdenoise
△ Less
Submitted 23 August, 2018;
originally announced August 2018.
-
Deep Sequential Segmentation of Organs in Volumetric Medical Scans
Authors:
Alexey Novikov,
David Major,
Maria Wimmer,
Dimitrios Lenis,
Katja Bühler
Abstract:
Segmentation in 3D scans is playing an increasingly important role in current clinical practice supporting diagnosis, tissue quantification, or treatment planning. The current 3D approaches based on convolutional neural networks usually suffer from at least three main issues caused predominantly by implementation constraints - first, they require resizing the volume to the lower-resolutional refer…
▽ More
Segmentation in 3D scans is playing an increasingly important role in current clinical practice supporting diagnosis, tissue quantification, or treatment planning. The current 3D approaches based on convolutional neural networks usually suffer from at least three main issues caused predominantly by implementation constraints - first, they require resizing the volume to the lower-resolutional reference dimensions, second, the capacity of such approaches is very limited due to memory restrictions, and third, all slices of volumes have to be available at any given training or testing time. We address these problems by a U-Net-like architecture consisting of bidirectional convolutional LSTM and convolutional, pooling, upsampling and concatenation layers enclosed into time-distributed wrappers. Our network can either process the full volumes in a sequential manner, or segment slabs of slices on demand. We demonstrate performance of our architecture on vertebrae and liver segmentation tasks in 3D CT scans.
△ Less
Submitted 11 March, 2019; v1 submitted 6 July, 2018;
originally announced July 2018.
-
Gaussian Material Synthesis
Authors:
Károly Zsolnai-Fehér,
Peter Wonka,
Michael Wimmer
Abstract:
We present a learning-based system for rapid mass-scale material synthesis that is useful for novice and expert users alike. The user preferences are learned via Gaussian Process Regression and can be easily sampled for new recommendations. Typically, each recommendation takes 40-60 seconds to render with global illumination, which makes this process impracticable for real-world workflows. Our neu…
▽ More
We present a learning-based system for rapid mass-scale material synthesis that is useful for novice and expert users alike. The user preferences are learned via Gaussian Process Regression and can be easily sampled for new recommendations. Typically, each recommendation takes 40-60 seconds to render with global illumination, which makes this process impracticable for real-world workflows. Our neural network eliminates this bottleneck by providing high-quality image predictions in real time, after which it is possible to pick the desired materials from a gallery and assign them to a scene in an intuitive manner. Workflow timings against Disney's "principled" shader reveal that our system scales well with the number of sought materials, thus empowering even novice users to generate hundreds of high-quality material models without any expertise in material modeling. Similarly, expert users experience a significant decrease in the total modeling time when populating a scene with materials. Furthermore, our proposed solution also offers controllable recommendations and a novel latent space variant generation step to enable the real-time fine-tuning of materials without requiring any domain expertise.
△ Less
Submitted 23 April, 2018;
originally announced April 2018.
-
Fully Convolutional Architectures for Multi-Class Segmentation in Chest Radiographs
Authors:
Alexey A. Novikov,
Dimitrios Lenis,
David Major,
Jiri Hladůvka,
Maria Wimmer,
Katja Bühler
Abstract:
The success of deep convolutional neural networks on image classification and recognition tasks has led to new applications in very diversified contexts, including the field of medical imaging. In this paper we investigate and propose neural network architectures for automated multi-class segmentation of anatomical organs in chest radiographs, namely for lungs, clavicles and heart. We address seve…
▽ More
The success of deep convolutional neural networks on image classification and recognition tasks has led to new applications in very diversified contexts, including the field of medical imaging. In this paper we investigate and propose neural network architectures for automated multi-class segmentation of anatomical organs in chest radiographs, namely for lungs, clavicles and heart. We address several open challenges including model overfitting, reducing number of parameters and handling of severely imbalanced data in CXR by fusing recent concepts in convolutional networks and adapting them to the segmentation problem task in CXR. We demonstrate that our architecture combining delayed subsampling, exponential linear units, highly restrictive regularization and a large number of high resolution low level abstract features outperforms state-of-the-art methods on all considered organs, as well as the human observer on lungs and heart. The models use a multi-class configuration with three target classes and are trained and tested on the publicly available JSRT database, consisting of 247 X-ray images the ground-truth masks for which are available in the SCR database. Our best performing model, trained with the loss function based on the Dice coefficient, reached mean Jaccard overlap scores of 95.0\% for lungs, 86.8\% for clavicles and 88.2\% for heart. This architecture outperformed the human observer results for lungs and heart.
△ Less
Submitted 13 February, 2018; v1 submitted 30 January, 2017;
originally announced January 2017.
-
Benchmarking Concurrent Priority Queues: Performance of k-LSM and Related Data Structures
Authors:
Jakob Gruber,
Jesper Larsson Träff,
Martin Wimmer
Abstract:
A number of concurrent, relaxed priority queues have recently been proposed and implemented. Results are commonly reported for a throughput benchmark that uses a uniform distribution of keys drawn from a large integer range, and mostly for single systems. We have conducted more extensive benchmarking of three recent, relaxed priority queues on four different types of systems with different key ran…
▽ More
A number of concurrent, relaxed priority queues have recently been proposed and implemented. Results are commonly reported for a throughput benchmark that uses a uniform distribution of keys drawn from a large integer range, and mostly for single systems. We have conducted more extensive benchmarking of three recent, relaxed priority queues on four different types of systems with different key ranges and distributions. While we can show superior throughput and scalability for our own k-LSM priority queue for the uniform key distribution, the picture changes drastically for other distributions, both with respect to achieved throughput and relative merit of the priority queues. The throughput benchmark alone is thus not sufficient to characterize the performance of concurrent priority queues. Our benchmark code and k-LSM priority queue are publicly available to foster future comparison.
△ Less
Submitted 16 March, 2016;
originally announced March 2016.
-
The Lock-free $k$-LSM Relaxed Priority Queue
Authors:
Martin Wimmer,
Jakob Gruber,
Jesper Larsson Träff,
Philippas Tsigas
Abstract:
Priority queues are data structures which store keys in an ordered fashion to allow efficient access to the minimal (maximal) key. Priority queues are essential for many applications, e.g., Dijkstra's single-source shortest path algorithm, branch-and-bound algorithms, and prioritized schedulers.
Efficient multiprocessor computing requires implementations of basic data structures that can be used…
▽ More
Priority queues are data structures which store keys in an ordered fashion to allow efficient access to the minimal (maximal) key. Priority queues are essential for many applications, e.g., Dijkstra's single-source shortest path algorithm, branch-and-bound algorithms, and prioritized schedulers.
Efficient multiprocessor computing requires implementations of basic data structures that can be used concurrently and scale to large numbers of threads and cores. Lock-free data structures promise superior scalability by avoiding blocking synchronization primitives, but the \emph{delete-min} operation is an inherent scalability bottleneck in concurrent priority queues. Recent work has focused on alleviating this obstacle either by batching operations, or by relaxing the requirements to the \emph{delete-min} operation.
We present a new, lock-free priority queue that relaxes the \emph{delete-min} operation so that it is allowed to delete \emph{any} of the $ρ+1$ smallest keys, where $ρ$ is a runtime configurable parameter. Additionally, the behavior is identical to a non-relaxed priority queue for items added and removed by the same thread. The priority queue is built from a logarithmic number of sorted arrays in a way similar to log-structured merge-trees. We experimentally compare our priority queue to recent state-of-the-art lock-free priority queues, both with relaxed and non-relaxed semantics, showing high performance and good scalability of our approach.
△ Less
Submitted 19 March, 2015;
originally announced March 2015.
-
An improved, easily computable combinatorial lower bound for weighted graph bipartitioning
Authors:
Jesper Larsson Träff,
Martin Wimmer
Abstract:
There has recently been much progress on exact algorithms for the (un)weighted graph (bi)partitioning problem using branch-and-bound and related methods. In this note we present and improve an easily computable, purely combinatorial lower bound for the weighted bipartitioning problem. The bound is computable in $O(n\log n+m)$ time steps for weighted graphs with $n$ vertices and $m$ edges. In the b…
▽ More
There has recently been much progress on exact algorithms for the (un)weighted graph (bi)partitioning problem using branch-and-bound and related methods. In this note we present and improve an easily computable, purely combinatorial lower bound for the weighted bipartitioning problem. The bound is computable in $O(n\log n+m)$ time steps for weighted graphs with $n$ vertices and $m$ edges. In the branch-and-bound setting, the bound for each new subproblem can be updated in $O(n+(m/n)\log n)$ time steps amortized over a series of $n$ branching steps; a rarely triggered tightening of the bound requires search on the graph of unassigned vertices and can take from $O(n+m)$ to $O(nm+n^2\log n)$ steps depending on implementation and possible bound quality. Representing a subproblem uses $O(n)$ space. Although the bound is weak, we believe that it can be advantageous in a parallel setting to be able to generate many subproblems fast, possibly out-weighting the advantages of tighter, but much more expensive (algebraic, spectral, flow) lower bounds.
We use a recent priority task-scheduling framework for giving a parallel implementation, and show the relative improvements in bound quality and solution speed by the different contributions of the lower bound. A detailed comparison with standardized input graphs to other lower bounds and frameworks is pending. Detailed investigations of branching and subproblem selection rules are likewise not the focus here, but various options are discussed.
△ Less
Submitted 2 October, 2014;
originally announced October 2014.
-
Data Structures for Task-based Priority Scheduling
Authors:
Martin Wimmer,
Daniel Cederman,
Francesco Versaci,
Jesper Larsson Träff,
Philippas Tsigas
Abstract:
Many task-parallel applications can benefit from attempting to execute tasks in a specific order, as for instance indicated by priorities associated with the tasks. We present three lock-free data structures for priority scheduling with different trade-offs on scalability and ordering guarantees. First we propose a basic extension to work-stealing that provides good scalability, but cannot provide…
▽ More
Many task-parallel applications can benefit from attempting to execute tasks in a specific order, as for instance indicated by priorities associated with the tasks. We present three lock-free data structures for priority scheduling with different trade-offs on scalability and ordering guarantees. First we propose a basic extension to work-stealing that provides good scalability, but cannot provide any guarantees for task-ordering in-between threads. Next, we present a centralized priority data structure based on $k$-fifo queues, which provides strong (but still relaxed with regard to a sequential specification) guarantees. The parameter $k$ allows to dynamically configure the trade-off between scalability and the required ordering guarantee. Third, and finally, we combine both data structures into a hybrid, $k$-priority data structure, which provides scalability similar to the work-stealing based approach for larger $k$, while giving strong ordering guarantees for smaller $k$. We argue for using the hybrid data structure as the best compromise for generic, priority-based task-scheduling.
We analyze the behavior and trade-offs of our data structures in the context of a simple parallelization of Dijkstra's single-source shortest path algorithm. Our theoretical analysis and simulations show that both the centralized and the hybrid $k$-priority based data structures can give strong guarantees on the useful work performed by the parallel Dijkstra algorithm. We support our results with experimental evidence on an 80-core Intel Xeon system.
△ Less
Submitted 9 December, 2013;
originally announced December 2013.
-
Configurable Strategies for Work-stealing
Authors:
Martin Wimmer,
Daniel Cederman,
Jesper Larsson Träff,
Philippas Tsigas
Abstract:
Work-stealing systems are typically oblivious to the nature of the tasks they are scheduling. For instance, they do not know or take into account how long a task will take to execute or how many subtasks it will spawn. Moreover, the actual task execution order is typically determined by the underlying task storage data structure, and cannot be changed. There are thus possibilities for optimizing t…
▽ More
Work-stealing systems are typically oblivious to the nature of the tasks they are scheduling. For instance, they do not know or take into account how long a task will take to execute or how many subtasks it will spawn. Moreover, the actual task execution order is typically determined by the underlying task storage data structure, and cannot be changed. There are thus possibilities for optimizing task parallel executions by providing information on specific tasks and their preferred execution order to the scheduling system.
We introduce scheduling strategies to enable applications to dynamically provide hints to the task-scheduling system on the nature of specific tasks. Scheduling strategies can be used to independently control both local task execution order as well as steal order. In contrast to conventional scheduling policies that are normally global in scope, strategies allow the scheduler to apply optimizations on individual tasks. This flexibility greatly improves composability as it allows the scheduler to apply different, specific scheduling choices for different parts of applications simultaneously. We present a number of benchmarks that highlight diverse, beneficial effects that can be achieved with scheduling strategies. Some benchmarks (branch-and-bound, single-source shortest path) show that prioritization of tasks can reduce the total amount of work compared to standard work-stealing execution order. For other benchmarks (triangle strip generation) qualitatively better results can be achieved in shorter time. Other optimizations, such as dynamic merging of tasks or stealing of half the work, instead of half the tasks, are also shown to improve performance. Composability is demonstrated by examples that combine different strategies, both within the same kernel (prefix sum) as well as when scheduling multiple kernels (prefix sum and unbalanced tree search).
△ Less
Submitted 28 May, 2013;
originally announced May 2013.
-
Efficient numerical computation of the Pfaffian for dense and banded skew-symmetric matrices
Authors:
M. Wimmer
Abstract:
Computing the Pfaffian of a skew-symmetric matrix is a problem that arises in various fields of physics. Both computing the Pfaffian and a related problem, computing the canonical form of a skew-symmetric matrix under unitary congruence, can be solved easily once the skew-symmetric matrix has been reduced to skew-symmetric tridiagonal form. We develop efficient numerical methods for computing this…
▽ More
Computing the Pfaffian of a skew-symmetric matrix is a problem that arises in various fields of physics. Both computing the Pfaffian and a related problem, computing the canonical form of a skew-symmetric matrix under unitary congruence, can be solved easily once the skew-symmetric matrix has been reduced to skew-symmetric tridiagonal form. We develop efficient numerical methods for computing this tridiagonal form based on Gauss transformations, using a skew-symmetric, blocked form of the Parlett-Reid algorithm, or based on unitary transformations, using block Householder transformations and Givens rotations, that are applicable to dense and banded matrices, respectively. We also give a complete and fully optimized implementation of these algorithms in Fortran, and also provide Python, Matlab and Mathematica implementations for convenience. Finally, we apply these methods to compute the topological charge of a class D nanowire, and show numerically the equivalence of definitions based on the Hamiltonian and the scattering matrix.
△ Less
Submitted 6 April, 2011; v1 submitted 16 February, 2011;
originally announced February 2011.
-
Work-stealing for mixed-mode parallelism by deterministic team-building
Authors:
Martin Wimmer,
Jesper Larsson Träff
Abstract:
We show how to extend classical work-stealing to deal also with data parallel tasks that can require any number of threads r >= 1 for their execution. We explain in detail the so introduced idea of work-stealing with deterministic team-building which in a natural way generalizes classical work-stealing. A prototype C++ implementation of the generalized work-stealing algorithm has been given and is…
▽ More
We show how to extend classical work-stealing to deal also with data parallel tasks that can require any number of threads r >= 1 for their execution. We explain in detail the so introduced idea of work-stealing with deterministic team-building which in a natural way generalizes classical work-stealing. A prototype C++ implementation of the generalized work-stealing algorithm has been given and is briefly described. Building on this, a serious, well-known contender for a best parallel Quicksort algorithm has been implemented, which naturally relies on both task and data parallelism. For instance, sorting 2^27-1 randomly generated integers we could improve the speed-up from 5.1 to 8.7 on a 32-core Intel Nehalem EX system, being consistently better than the tuned, task-parallel Cilk++ system.
△ Less
Submitted 22 December, 2010;
originally announced December 2010.