-
Certified Robustness to Data Poisoning in Gradient-Based Training
Authors:
Philip Sosnin,
Mark N. Müller,
Maximilian Baader,
Calvin Tsay,
Matthew Wicker
Abstract:
Modern machine learning pipelines leverage large amounts of public data, making it infeasible to guarantee data quality and leaving models open to poisoning and backdoor attacks. However, provably bounding model behavior under such attacks remains an open problem. In this work, we address this challenge and develop the first framework providing provable guarantees on the behavior of models trained with potentially manipulated data. In particular, our framework certifies robustness against untargeted and targeted poisoning as well as backdoor attacks for both input and label manipulations. Our method leverages convex relaxations to over-approximate the set of all possible parameter updates for a given poisoning threat model, allowing us to bound the set of all reachable parameters for any gradient-based learning algorithm. Given this set of parameters, we provide bounds on worst-case behavior, including model performance and backdoor success rate. We demonstrate our approach on multiple real-world datasets from applications including energy consumption, medical imaging, and autonomous driving.
Submitted 9 June, 2024;
originally announced June 2024.
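The abstract above bounds all reachable parameters by over-approximating each gradient update. As a minimal sketch of that idea, assuming plain interval arithmetic (the simplest convex relaxation), a 1-D least-squares model, and a toy threat model in which every feature may be shifted by at most `eps`, one SGD step can be bounded as follows. The `Interval` class, `sgd_step_bounds`, and the data are hypothetical illustrations, not the authors' framework.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Interval:
    """Closed interval [lo, hi] with sound (over-approximating) arithmetic."""
    lo: float
    hi: float

    def __add__(self, other: "Interval") -> "Interval":
        return Interval(self.lo + other.lo, self.hi + other.hi)

    def __sub__(self, other: "Interval") -> "Interval":
        return Interval(self.lo - other.hi, self.hi - other.lo)

    def __mul__(self, other: "Interval") -> "Interval":
        products = (self.lo * other.lo, self.lo * other.hi,
                    self.hi * other.lo, self.hi * other.hi)
        return Interval(min(products), max(products))

    def scale(self, c: float) -> "Interval":
        a, b = c * self.lo, c * self.hi
        return Interval(min(a, b), max(a, b))


def point(v: float) -> Interval:
    return Interval(v, v)


def sgd_step_bounds(w: Interval, data, eps: float, lr: float) -> Interval:
    """Bound w - lr * mean_i (w * x_i - y_i) * x_i for the toy model y ~ w * x
    when every feature x_i may be shifted by at most eps (hypothetical threat model)."""
    total = point(0.0)
    for x, y in data:
        x_iv = Interval(x - eps, x + eps)   # all feature values the attacker can reach
        residual = w * x_iv - point(y)      # bounds on w * x - y
        total = total + residual * x_iv     # bounds on the per-sample gradient
    return w - total.scale(lr / len(data))
```

Applying `sgd_step_bounds` repeatedly, feeding each output interval back in as `w`, yields a (possibly loose) box containing every parameter reachable under the toy threat model; worst-case predictions can then be bounded over that box.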
-
DAGER: Exact Gradient Inversion for Large Language Models
Authors:
Ivo Petrov,
Dimitar I. Dimitrov,
Maximilian Baader,
Mark Niklas Müller,
Martin Vechev
Abstract:
Federated learning works by aggregating locally computed gradients from multiple clients, thus enabling collaborative training without sharing private client data. However, prior work has shown that the data can actually be recovered by the server using so-called gradient inversion attacks. While these attacks perform well when applied on images, they are limited in the text domain and only permit approximate reconstruction of small batches and short input sequences. In this work, we propose DAGER, the first algorithm to recover whole batches of input text exactly. DAGER leverages the low-rank structure of self-attention layer gradients and the discrete nature of token embeddings to efficiently check if a given token sequence is part of the client data. We use this check to exactly recover full batches in the honest-but-curious setting without any prior on the data for both encoder- and decoder-based architectures using exhaustive heuristic search and a greedy approach, respectively. We provide an efficient GPU implementation of DAGER and show experimentally that it recovers full batches of size up to 128 on large language models (LLMs), beating prior attacks in speed (20x at same batch size), scalability (10x larger batches), and reconstruction quality (ROUGE-1/2 > 0.99).
Submitted 24 May, 2024;
originally announced May 2024.
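The membership check described in the DAGER abstract rests on a linear-algebra fact: for a linear projection z = W x, the weight gradient over a batch is a sum of outer products, grad_W = Σ_i δ_i x_iᵀ, so its row space lies inside the span of the inputs. A minimal sketch of such a span test, assuming a single linear projection and ignoring positional encodings, deeper layers, and the authors' GPU implementation, could look like this; `in_row_span` is a hypothetical name.

```python
import numpy as np


def in_row_span(grad_W: np.ndarray, x: np.ndarray, rel_tol: float = 1e-6) -> bool:
    """Return True if x lies (numerically) in the row space of grad_W.

    For z = W x, grad_W = sum_i delta_i x_i^T, so its row space is contained
    in span{x_i}: every true input passes, while generic other vectors fail."""
    _, s, vt = np.linalg.svd(grad_W, full_matrices=False)
    if s.size == 0 or s[0] == 0.0:
        return False
    rank = int(np.sum(s > rel_tol * s[0]))   # numerical rank (at most the batch size)
    basis = vt[:rank]                        # orthonormal basis of the row space
    residual = x - basis.T @ (basis @ x)     # component of x outside that space
    return bool(np.linalg.norm(residual) <= rel_tol * np.linalg.norm(x))
```

Linear combinations of true inputs also pass this test, which is why the discreteness of token embeddings matters: only the finitely many vocabulary embeddings have to be checked as candidates.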
-
Overcoming the Paradox of Certified Training with Gaussian Smoothing
Authors:
Stefan Balauca,
Mark Niklas Müller,
Yuhao Mao,
Maximilian Baader,
Marc Fischer,
Martin Vechev
Abstract:
Training neural networks with high certified accuracy against adversarial examples remains an open problem despite significant efforts. While certification methods can effectively leverage tight convex relaxations for bound computation, in training, these methods perform worse than looser relaxations. Prior work hypothesized that this is caused by the discontinuity and perturbation sensitivity of the loss surface induced by these tighter relaxations. In this work, we show theoretically that Gaussian Loss Smoothing can alleviate both issues. We confirm this empirically by proposing a certified training method combining PGPE, an algorithm computing gradients of a smoothed loss, with different convex relaxations. When using this training method, we observe that tighter bounds indeed lead to strictly better networks. While scaling PGPE training remains challenging due to high computational cost, we show that by using a not theoretically sound, yet much cheaper smoothing approximation, we obtain better certified accuracies than state-of-the-art methods when training on the same network architecture. Our results clearly demonstrate the promise of Gaussian Loss Smoothing for training certifiably robust neural networks.
Submitted 25 June, 2024; v1 submitted 11 March, 2024;
originally announced March 2024.
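The abstract above trains on a Gaussian-smoothed loss L_σ(θ) = E_{ε∼N(0,σ²I)}[L(θ+ε)], whose gradient equals E[ε·L(θ+ε)]/σ² and therefore needs only loss evaluations. The sketch below is a simplified, fixed-σ estimator with symmetric (antithetic) sampling in the spirit of PGPE; the full PGPE algorithm also adapts σ, and `smoothed_grad` is a hypothetical helper, not the paper's code.

```python
import numpy as np


def smoothed_grad(loss, theta, sigma, n_pairs, rng):
    """Monte Carlo gradient of L_sigma(theta) = E_{eps ~ N(0, sigma^2 I)}[loss(theta + eps)].

    Uses symmetric perturbations +eps / -eps, so only loss values are needed,
    even if `loss` itself is discontinuous or non-differentiable."""
    grad = np.zeros_like(theta)
    for _ in range(n_pairs):
        eps = sigma * rng.standard_normal(theta.shape)
        grad += eps * (loss(theta + eps) - loss(theta - eps))
    return grad / (2.0 * n_pairs * sigma**2)
```

For the quadratic loss ‖θ‖², smoothing only adds a constant, so the estimate should approach the exact gradient 2θ as `n_pairs` grows; for discontinuous losses, the smoothed gradient stays well defined.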
-
SPEAR: Exact Gradient Inversion of Batches in Federated Learning
Authors:
Dimitar I. Dimitrov,
Maximilian Baader,
Mark Niklas Müller,
Martin Vechev
Abstract:
Federated learning is a framework for collaborative machine learning where clients only share gradient updates and not their private data with a server. However, it was recently shown that gradient inversion attacks can reconstruct this data from the shared gradients. In the important honest-but-curious setting, existing attacks enable exact reconstruction only for a batch size of $b=1$, with larger batches permitting only approximate reconstruction. In this work, we propose SPEAR, the first algorithm reconstructing whole batches with $b >1$ exactly. SPEAR combines insights into the explicit low-rank structure of gradients with a sampling-based algorithm. Crucially, we leverage ReLU-induced gradient sparsity to precisely filter out large numbers of incorrect samples, making a final reconstruction step tractable. We provide an efficient GPU implementation for fully connected networks and show that it recovers high-dimensional ImageNet inputs in batches of up to $b \lesssim 25$ exactly while scaling to large networks. Finally, we show theoretically that much larger batches can be reconstructed with high probability given exponential time.
Submitted 3 June, 2024; v1 submitted 6 March, 2024;
originally announced March 2024.
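The SPEAR abstract relies on two structural facts about a fully connected layer followed by a ReLU: the weight gradient factors as dZᵀX and so has rank at most b, and dZ is zero wherever the ReLU was inactive. The sketch below only exhibits both facts for a single hypothetical layer; it is not SPEAR's reconstruction algorithm, which must additionally recover X from the gradient alone.

```python
import numpy as np


def linear_relu_weight_grad(W, X, upstream):
    """Gradient of a loss w.r.t. W for the layer Y = relu(X @ W.T).

    X: (b, d_in) batch of inputs; upstream: (b, d_out) = dLoss/dY.
    Returns (grad_W, dZ) where grad_W = dZ.T @ X has rank <= b and
    dZ = dLoss/dZ is zero wherever the ReLU was inactive."""
    Z = X @ W.T                  # pre-activations, shape (b, d_out)
    dZ = upstream * (Z > 0)      # ReLU passes gradient only where Z > 0
    return dZ.T @ X, dZ
```

Roughly half of the entries of `dZ` vanish for random weights; per the abstract, this sparsity is what lets incorrect candidate factorizations be filtered out.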
-
Evading Data Contamination Detection for Language Models is (too) Easy
Authors:
Jasper Dekoninck,
Mark Niklas Müller,
Maximilian Baader,
Marc Fischer,
Martin Vechev
Abstract:
Large language models are widespread, with their performance on benchmarks frequently guiding user preferences for one model over another. However, the vast amount of data these models are trained on can inadvertently lead to contamination with public benchmarks, thus compromising performance measurements. While recently developed contamination detection methods try to address this issue, they overlook the possibility of deliberate contamination by malicious model providers aiming to evade detection. We argue that this setting is of crucial importance as it casts doubt on the reliability of public benchmarks. To more rigorously study this issue, we propose a categorization of both model providers and contamination detection methods. This reveals vulnerabilities in existing methods that we exploit with EAL, a simple yet effective contamination technique that significantly inflates benchmark performance while completely evading current detection methods.
Submitted 12 February, 2024; v1 submitted 5 February, 2024;
originally announced February 2024.
-
Expressivity of ReLU-Networks under Convex Relaxations
Authors:
Maximilian Baader,
Mark Niklas Müller,
Yuhao Mao,
Martin Vechev
Abstract:
Convex relaxations are a key component of training and certifying provably safe neural networks. However, despite substantial progress, a wide and poorly understood accuracy gap to standard networks remains, raising the question of whether this is due to fundamental limitations of convex relaxations. Initial work investigating this question focused on the simple and widely used IBP relaxation. It revealed that some univariate, convex, continuous piecewise linear (CPWL) functions cannot be encoded by any ReLU network such that its IBP-analysis is precise. To explore whether this limitation is shared by more advanced convex relaxations, we conduct the first in-depth study on the expressive power of ReLU networks across all commonly used convex relaxations. We show that: (i) more advanced relaxations allow a larger class of univariate functions to be expressed as precisely analyzable ReLU networks, (ii) more precise relaxations can allow exponentially larger solution spaces of ReLU networks encoding the same functions, and (iii) even using the most precise single-neuron relaxations, it is impossible to construct precisely analyzable ReLU networks that express multivariate, convex, monotone CPWL functions.
Submitted 7 November, 2023;
originally announced November 2023.
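To make "precise IBP analysis" concrete, the sketch below propagates an input box through a small ReLU network with interval bound propagation. It encodes the univariate, convex CPWL function |x| as relu(x) + relu(−x) and shows that IBP returns a looser range than the true one. This is only an illustration of imprecision for one particular encoding; it does not by itself show that no precise encoding exists. The helper names are hypothetical.

```python
import numpy as np


def ibp_affine(W, b, lo, hi):
    """Propagate the box [lo, hi] through x -> W @ x + b (exact for one affine layer)."""
    center, radius = (hi + lo) / 2.0, (hi - lo) / 2.0
    c = W @ center + b
    r = np.abs(W) @ radius
    return c - r, c + r


def ibp_relu(lo, hi):
    """ReLU is monotone, so applying it to both bounds is sound."""
    return np.maximum(lo, 0.0), np.maximum(hi, 0.0)


# |x| = relu(x) + relu(-x), encoded as a one-hidden-layer ReLU network
W1, b1 = np.array([[1.0], [-1.0]]), np.zeros(2)
W2, b2 = np.array([[1.0, 1.0]]), np.zeros(1)

lo, hi = np.array([-1.0]), np.array([1.0])   # input box: x in [-1, 1]
lo, hi = ibp_relu(*ibp_affine(W1, b1, lo, hi))
lo, hi = ibp_affine(W2, b2, lo, hi)
print(lo, hi)  # prints [0.] [2.]: IBP yields [0, 2], but |x| only ranges over [0, 1]
```

The looseness comes from IBP treating the two hidden neurons as independent, even though they can never both be active at once.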
-
Conversational Swarm Intelligence, a Pilot Study
Authors:
Louis Rosenberg,
Gregg Willcox,
Hans Schumann,
Miles Bader,
Ganesh Mani,
Kokoro Sagae,
Devang Acharya,
Yuxin Zheng,
Andrew Kim,
Jialing Deng
Abstract:
Conversational Swarm Intelligence (CSI) is a new method for enabling large human groups to hold real-time networked conversations using a technique modeled on the dynamics of biological swarms. Through the novel use of conversational agents powered by Large Language Models (LLMs), the CSI structure simultaneously enables local dialog among small deliberative groups and global propagation of conversational content across a larger population. In this way, CSI combines the benefits of small-group deliberative reasoning and large-scale collective intelligence. In this pilot study, participants deliberating in conversational swarms (via text chat) (a) produced 30% more contributions (p<0.05) than participants deliberating in a standard centralized chat room and (b) demonstrated 7.2% less variance in contribution quantity. These results indicate that users contributed more content and participated more evenly when using the CSI structure.
Submitted 31 August, 2023;
originally announced September 2023.
-
Abstraqt: Analysis of Quantum Circuits via Abstract Stabilizer Simulation
Authors:
Benjamin Bichsel,
Anouk Paradis,
Maximilian Baader,
Martin Vechev
Abstract:
Stabilizer simulation can efficiently simulate an important class of quantum circuits consisting exclusively of Clifford gates. However, all existing extensions of this simulation to arbitrary quantum circuits including non-Clifford gates suffer from an exponential runtime.
To address this challenge, we present a novel approach for efficient stabilizer simulation on arbitrary quantum circuits, at the cost of lost precision. Our key idea is to compress an exponential sum representation of the quantum state into a single abstract summand covering (at least) all occurring summands. This allows us to introduce an abstract stabilizer simulator that efficiently manipulates abstract summands by over-approximating the effect of circuit operations including Clifford gates, non-Clifford gates, and (internal) measurements.
We implemented our abstract simulator in a tool called Abstraqt and experimentally demonstrate that Abstraqt can establish circuit properties intractable for existing techniques.
Submitted 14 November, 2023; v1 submitted 3 April, 2023;
originally announced April 2023.
-
The Fundamental Limits of Interval Arithmetic for Neural Networks
Authors:
Matthew Mirman,
Maximilian Baader,
Martin Vechev
Abstract:
Interval analysis (or interval bound propagation, IBP) is a popular technique for verifying and training provably robust deep neural networks, a fundamental challenge in the area of reliable machine learning. However, despite substantial efforts, progress on addressing this key challenge has stagnated, calling into question whether interval arithmetic is a viable path forward.
In this paper, we present two fundamental results on the limitations of interval arithmetic for analyzing neural networks. Our main impossibility theorem states that for any neural network classifying just three points, there is a valid specification over these points that interval analysis cannot prove. Further, in the restricted case of one-hidden-layer neural networks, we show a stronger impossibility result: given any radius $\alpha < 1$, there is a set of $O(\alpha^{-1})$ points with robust radius $\alpha$, separated by distance $2$, that no one-hidden-layer network can be proven to classify robustly via interval analysis.
Submitted 9 December, 2021;
originally announced December 2021.
-
Latent Space Smoothing for Individually Fair Representations
Authors:
Momchil Peychev,
Anian Ruoss,
Mislav Balunović,
Maximilian Baader,
Martin Vechev
Abstract:
Fair representation learning transforms user data into a representation that ensures fairness and utility regardless of the downstream application. However, learning individually fair representations, i.e., guaranteeing that similar individuals are treated similarly, remains challenging in high-dimensional settings such as computer vision. In this work, we introduce LASSI, the first representation learning method for certifying individual fairness of high-dimensional data. Our key insight is to leverage recent advances in generative modeling to capture the set of similar individuals in the generative latent space. This enables us to learn individually fair representations that map similar individuals close together by using adversarial training to minimize the distance between their representations. Finally, we employ randomized smoothing to provably map similar individuals close together, in turn ensuring that local robustness verification of the downstream application results in end-to-end fairness certification. Our experimental evaluation on challenging real-world image data demonstrates that our method increases certified individual fairness by up to 90% without significantly affecting task utility.
Submitted 26 July, 2022; v1 submitted 26 November, 2021;
originally announced November 2021.
-
Doubt and Redundancy Kill Soft Errors -- Towards Detection and Correction of Silent Data Corruption in Task-based Numerical Software
Authors:
Philipp Samfass,
Tobias Weinzierl,
Anne Reinarz,
Michael Bader
Abstract:
Resilient algorithms in high-performance computing are subject to rigorous non-functional constraints: resiliency must not significantly increase the runtime, memory footprint, or I/O demands. We propose a task-based soft error detection scheme that relies on error criteria per task outcome. They formalise how "dubious" an outcome is, i.e., how likely it is to contain an error. Our whole simulation is replicated once, forming two teams of MPI ranks that share their task results. Thus, ideally, each team handles only around half of the workload. If a task yields large error criteria values, i.e., is dubious, we compute the task redundantly and compare the outcomes. Whenever they disagree, the task result with the lower error likelihood is accepted. We obtain a self-healing, resilient algorithm that can compensate for silent floating-point errors without a significant performance, I/O, or memory footprint penalty. Case studies, however, suggest that careful, domain-specific tailoring of the error criteria remains essential.
Submitted 18 October, 2021;
originally announced October 2021.
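The decision rule in the abstract above (accept non-dubious outcomes, recompute dubious ones, and keep the less dubious result on disagreement) can be sketched as a small helper. `resolve_task`, the toy criterion, and the simulated corruption are hypothetical; in the paper, the redundant computation is organised across two teams of MPI ranks, which this sequential sketch does not model.

```python
from typing import Callable, TypeVar

T = TypeVar("T")


def resolve_task(run_task: Callable[[], T],
                 error_criterion: Callable[[T], float],
                 threshold: float) -> T:
    """Hypothetical helper: accept a task outcome unless it looks dubious.

    Dubious outcomes (criterion above threshold) are recomputed; if the two
    outcomes disagree, the one with the lower error criterion is kept."""
    result = run_task()
    if error_criterion(result) <= threshold:
        return result                    # not dubious: no redundant work
    replica = run_task()                 # redundant recomputation (e.g., by the replica team)
    if replica == result:
        return result
    return min((result, replica), key=error_criterion)
```

Exact equality is used for brevity; a real comparison of floating-point task outcomes would likely need a tolerance.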
-
An Efficient ADER-DG Local Time Stepping Scheme for 3D HPC Simulation of Seismic Waves in Poroelastic Media
Authors:
Sebastian Wolf,
Martin Galis,
Carsten Uphoff,
Alice-Agnes Gabriel,
Peter Moczo,
David Gregor,
Michael Bader
Abstract:
Many applications from geosciences require simulations of seismic waves in porous media. Biot's theory of poroelasticity describes the coupling between solid and fluid phases and introduces a stiff source term, thereby increasing computational cost and motivating efficient methods utilising High-Performance Computing. We present a novel realisation of the discontinuous Galerkin scheme with Arbitrary DERivative time stepping (ADER-DG) that copes with stiff source terms.
To integrate this source term with a reasonable time step size, we use an element-local space-time predictor, which needs to solve medium-sized linear systems, with 1000 to 10000 unknowns, in each element update, i.e., billions of times. We present a novel block-wise back-substitution algorithm for solving these systems efficiently. In comparison to LU decomposition, we reduce the number of floating-point operations by a factor of up to 25. The block-wise back-substitution is mapped to a sequence of small matrix-matrix multiplications, for which code generators are available to generate highly optimised code.
We verify the new solver thoroughly in problems of increasing complexity. We demonstrate high-order convergence for 3D problems. We verify the correct treatment of point sources, material interfaces and traction-free boundary conditions. In addition, we compare against a finite difference code for a newly defined layer over half-space problem. We find that extremely high accuracy is required to resolve the slow P-wave at a free surface, while solid particle velocities are not affected by coarser resolutions. By using a clustered local time stepping scheme, we reduce time to solution by a factor of 6 to 10 compared to global time stepping. We conclude our study with a scaling and performance analysis, demonstrating our implementation's efficiency and its potential for extreme-scale simulations.
Submitted 1 March, 2022; v1 submitted 24 August, 2021;
originally announced August 2021.
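The abstract above maps a block-wise back-substitution onto a sequence of small matrix-matrix multiplications. It does not spell out the block structure of the predictor system, so the sketch below simply assumes a generic block upper-triangular matrix to show how such a solve decomposes into small GEMM updates plus small diagonal solves. It is not the paper's algorithm and makes no claim about its FLOP savings; `block_back_substitution` is a hypothetical name.

```python
import numpy as np


def block_back_substitution(U, rhs):
    """Solve U x = rhs for a block upper-triangular U.

    U[i][j] is the (n, n) block in block row i, block column j (read only for j >= i);
    rhs[i] is an (n, k) block. Off-diagonal work is a sequence of small
    matrix-matrix products; only the diagonal blocks are solved directly."""
    nb = len(rhs)
    x = [None] * nb
    for i in reversed(range(nb)):
        acc = rhs[i].copy()
        for j in range(i + 1, nb):
            acc -= U[i][j] @ x[j]              # small GEMM update with an already-solved block
        x[i] = np.linalg.solve(U[i][i], acc)   # small diagonal solve
    return x
```

In an HPC setting, the inner `U[i][j] @ x[j]` products are the kind of small, fixed-size kernels that code generators can specialise.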
-
High Performance Uncertainty Quantification with Parallelized Multilevel Markov Chain Monte Carlo
Authors:
Linus Seelinger,
Anne Reinarz,
Leonhard Rannabauer,
Michael Bader,
Peter Bastian,
Robert Scheichl
Abstract:
Numerical models of complex real-world phenomena often necessitate High Performance Computing (HPC). Uncertainties increase problem dimensionality further and pose even greater challenges.
We present a parallelization strategy for multilevel Markov chain Monte Carlo, a state-of-the-art, algorithmically scalable Uncertainty Quantification (UQ) algorithm for Bayesian inverse problems, and a new software framework allowing for large-scale parallelism across forward model evaluations and the UQ algorithms themselves. The main scalability challenge presents itself in the form of strong data dependencies introduced by the MLMCMC method, prohibiting trivial parallelization.
Our software is released as part of the modular and open-source MIT UQ Library (MUQ) and can easily be coupled with arbitrary user codes. We demonstrate it using DUNE and the ExaHyPE Engine. The latter provides a realistic, large-scale tsunami model in which we identify the source of a tsunami from buoy-elevation data.
Submitted 30 July, 2021;
originally announced July 2021.
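Multilevel methods such as the MLMCMC algorithm in the abstract above rest on the telescoping identity E[Q_L] = E[Q_0] + Σ_{l=1}^{L} E[Q_l − Q_{l−1}]: many cheap samples on the coarse level, few expensive samples of coupled corrections on the fine levels. The sketch below is plain multilevel Monte Carlo with independent samples and a hypothetical toy model. MLMCMC replaces these samples with coupled Markov chains, and the resulting data dependencies between chains are what make its parallelization non-trivial.

```python
import numpy as np


def multilevel_estimate(sample_correction, n_samples, rng):
    """Telescoping estimator E[Q_L] = E[Q_0] + sum_{l=1}^{L} E[Q_l - Q_{l-1}].

    sample_correction(level, rng) returns Q_0 for level 0 and a coupled
    difference Q_l - Q_{l-1} otherwise; n_samples[l] is the sample count for level l."""
    return sum(
        float(np.mean([sample_correction(level, rng) for _ in range(n)]))
        for level, n in enumerate(n_samples)
    )


def toy_correction(level, rng):
    """Hypothetical model: Q_l(omega) = omega**2 + 2**-level with omega ~ N(0, 1)."""
    omega = rng.standard_normal()   # shared input couples the fine and coarse evaluations
    fine = omega**2 + 2.0**-level
    if level == 0:
        return fine
    coarse = omega**2 + 2.0**-(level - 1)
    return fine - coarse


rng = np.random.default_rng(3)
estimate = multilevel_estimate(toy_correction, [20000, 50, 50, 50], rng)
# the exact value of E[Q_3] is 1 + 2**-3 = 1.125
```

Because fine and coarse evaluations share the same random input, the corrections in this toy model have zero variance, so a handful of samples per fine level suffices.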
-
3D Acoustic-Elastic Coupling with Gravity: The Dynamics of the 2018 Palu, Sulawesi Earthquake and Tsunami
Authors:
Lukas Krenz,
Carsten Uphoff,
Thomas Ulrich,
Alice-Agnes Gabriel,
Lauren S. Abrahams,
Eric M. Dunham,
Michael Bader
Abstract:
We present a highly scalable 3D fully-coupled Earth & ocean model of earthquake rupture and tsunami generation. We model seismic, acoustic and surface gravity wave propagation in elastic (Earth) and acoustic (ocean) materials sourced by physics-based non-linear earthquake dynamic rupture. Complicated geometries, including high-resolution bathymetry, coastlines and segmented earthquake faults, are discretized by adaptive unstructured tetrahedral meshes. A Discontinuous Galerkin discretization with ADER local time-stepping (ADER-DG) yields petascale computational efficiency and high-order accuracy in time and space.
We compare the 3D fully-coupled approach to a benchmark problem for 3D-2D linked models that use 2D shallow-water modeling. We present a large-scale fully-coupled model of the 2018 Sulawesi events that links the dynamics from supershear earthquake faulting to elastic and acoustic waves in Earth and ocean to tsunami gravity wave propagation in the narrow Palu Bay. Finally, we demonstrate scalability and performance of the MPI+OpenMP parallelization on three petascale supercomputers.
Submitted 22 November, 2021; v1 submitted 13 July, 2021;
originally announced July 2021.
-
Scalable Certified Segmentation via Randomized Smoothing
Authors:
Marc Fischer,
Maximilian Baader,
Martin Vechev
Abstract:
We present a new certification method for image and point cloud segmentation based on randomized smoothing. The method leverages a novel scalable algorithm for prediction and certification that correctly accounts for multiple testing, necessary for ensuring statistical guarantees. The key to our approach is reliance on established multiple-testing correction mechanisms as well as the ability to abstain from classifying single pixels or points while still robustly segmenting the overall input. Our experimental evaluation on synthetic data and challenging datasets, such as Pascal Context, Cityscapes, and ShapeNet, shows that our algorithm can achieve, for the first time, competitive accuracy and certification guarantees on real-world segmentation tasks. We provide an implementation at https://github.com/eth-sri/segmentation-smoothing.
Submitted 27 July, 2022; v1 submitted 1 July, 2021;
originally announced July 2021.
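The abstract above combines randomized smoothing with multiple-testing correction and abstention. As a minimal sketch, assuming per-pixel counts of the top class from n noisy forward passes are already available (a sound procedure would select the top class on separate samples), a one-sided binomial test per pixel plus Holm–Bonferroni correction decides which pixels are certified and which abstain. The helper names and parameter values are hypothetical; the radius uses the standard ℓ2 smoothing bound σ·Φ⁻¹(τ).

```python
from math import comb
from statistics import NormalDist


def binom_sf(k: int, n: int, p: float) -> float:
    """P[X >= k] for X ~ Binomial(n, p)."""
    return sum(comb(n, i) * p**i * (1 - p) ** (n - i) for i in range(k, n + 1))


def certify_pixels(top_counts, n, tau=0.75, sigma=0.25, alpha=0.001):
    """Per pixel, decide whether its top class is certified or the pixel abstains.

    top_counts[j]: how often pixel j's top class appeared in n noisy passes.
    Each pixel tests H0: p_top <= tau; Holm-Bonferroni keeps the family-wise
    error rate at alpha across all pixels. Certified pixels share the l2
    radius sigma * Phi^{-1}(tau)."""
    m = len(top_counts)
    pvals = [binom_sf(c, n, tau) for c in top_counts]
    certified = [False] * m
    for rank, j in enumerate(sorted(range(m), key=lambda idx: pvals[idx])):
        if pvals[j] > alpha / (m - rank):
            break                       # Holm: stop at the first non-rejection
        certified[j] = True
    radius = sigma * NormalDist().inv_cdf(tau)
    return certified, radius
```

Pixels left as `False` are abstentions; allowing them is what lets the remaining pixels of the segmentation still be certified.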
-
On the Paradox of Certified Training
Authors:
Nikola Jovanović,
Mislav Balunović,
Maximilian Baader,
Martin Vechev
Abstract:
Certified defenses based on convex relaxations are an established technique for training provably robust models. The key component is the choice of relaxation, varying from simple intervals to tight polyhedra. Counterintuitively, loose interval-based training often leads to higher certified robustness than what can be achieved with tighter relaxations, which is a well-known but poorly understood paradox. While recent works introduced various improvements aiming to circumvent this issue in practice, the fundamental problem of training models with high certified robustness remains unsolved. In this work, we investigate the underlying reasons behind the paradox and identify two key properties of relaxations, beyond tightness, that impact certified training dynamics: continuity and sensitivity. Our extensive experimental evaluation with a number of popular convex relaxations provides strong evidence that these factors can explain the drop in certified robustness observed for tighter relaxations. We also systematically explore modifications of existing relaxations and discover that improving unfavorable properties is challenging, as such attempts often harm other properties, revealing a complex tradeoff. Our findings represent an important first step towards understanding the intricate optimization challenges involved in certified training.
Submitted 12 October, 2022; v1 submitted 12 February, 2021;
originally announced February 2021.
-
Studying the Similarity of COVID-19 Sounds based on Correlation Analysis of MFCC
Authors:
Mohamed Bader,
Ismail Shahin,
Abdelfatah Hassan
Abstract:
Frontline workers in hospitals, clinics, and labs, alongside researchers and scientists, have put tremendous effort into the fight against the COVID-19 pandemic. Due to the rapid spread of the virus, artificial intelligence has come to play a considerable role in the health sector, through the application of Automatic Speech Recognition (ASR) fundamentals and deep learning algorithms. In this paper, we illustrate the importance of speech signal processing in extracting the Mel-Frequency Cepstral Coefficients (MFCCs) of COVID-19 and non-COVID-19 samples, and we relate them using Pearson correlation coefficients. Our results show high similarity in MFCCs between different COVID-19 cough and breathing sounds, while the MFCCs of voice are more robust between COVID-19 and non-COVID-19 samples. Our results are preliminary, and voices of COVID-19 patients could possibly be excluded from further processing when diagnosing the disease.
Submitted 17 October, 2020;
originally announced October 2020.
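The abstract above compares MFCCs of different recordings using Pearson correlation. The abstract does not say how frame-level MFCCs are reduced before correlating, so the sketch assumes averaging over frames. It also omits MFCC extraction itself and takes already-computed MFCC matrices as input; `mfcc_similarity` is a hypothetical helper.

```python
import numpy as np


def mfcc_similarity(mfcc_a: np.ndarray, mfcc_b: np.ndarray) -> float:
    """Pearson correlation between two MFCC matrices of shape (n_coeffs, n_frames).

    Frames are averaged first (an assumption), so recordings of different
    lengths can be compared coefficient by coefficient."""
    a = mfcc_a.mean(axis=1)
    b = mfcc_b.mean(axis=1)
    return float(np.corrcoef(a, b)[0, 1])
```

Pearson correlation is invariant to positive affine rescaling, so a uniformly louder or offset copy of the same MFCC profile still scores 1.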
-
Efficient Certification of Spatial Robustness
Authors:
Anian Ruoss,
Maximilian Baader,
Mislav Balunović,
Martin Vechev
Abstract:
Recent work has exposed the vulnerability of computer vision models to vector field attacks. Due to the widespread usage of such models in safety-critical applications, it is crucial to quantify their robustness against such spatial transformations. However, existing work only provides empirical robustness quantification against vector field deformations via adversarial attacks, which lack provable guarantees. In this work, we propose novel convex relaxations, enabling us, for the first time, to provide a certificate of robustness against vector field transformations. Our relaxations are model-agnostic and can be leveraged by a wide range of neural network verifiers. Experiments on various network architectures and different datasets demonstrate the effectiveness and scalability of our method.
Submitted 30 January, 2021; v1 submitted 19 September, 2020;
originally announced September 2020.
-
TeaMPI -- Replication-based Resilience without the (Performance) Pain
Authors:
Philipp Samfass,
Tobias Weinzierl,
Benjamin Hazelwood,
Michael Bader
Abstract:
In an era where we cannot afford to checkpoint frequently, replication is a generic way forward to construct numerical simulations that can continue to run even if hardware parts fail. Yet, replication is often not employed on larger scales, as naïvely mirroring a computation once effectively halves the machine size, and as keeping replicated simulations consistent with each other is not trivial. We demonstrate for the ExaHyPE engine -- a task-based solver for hyperbolic equation systems -- that it is possible to realise resiliency without major code changes on the user side, while we introduce a novel algorithmic idea where replication reduces the time-to-solution. The redundant CPU cycles are not burned "for nothing". Our work employs a weakly consistent data model where replicas run independently yet inform each other through heartbeat messages whether they are still up and running. Our key performance idea is to let the tasks of the replicated simulations share some of their outcomes, while we shuffle the actual task execution order per replica. This way, replicated ranks can skip some local computations and automatically start to synchronise with each other. Our experiments with a production-level seismic wave-equation solver provide evidence that this novel concept has the potential to make replication affordable for large-scale simulations in high-performance computing.
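The outcome-sharing idea in the abstract above can be illustrated with a small sequential simulation. It is a hypothetical sketch, not TeaMPI's implementation: two replicas walk the same task list in independently shuffled orders, share each computed outcome with their partner, and skip any task whose outcome has already arrived. Messages are assumed to arrive instantly and heartbeats are omitted.

```python
import random


def run_replicas(tasks, seed=0):
    """Simulate two replicas that share task outcomes (hypothetical sketch).

    Each replica needs the outcome of every task.  Both advance in lockstep,
    one task per step each, as a stand-in for concurrent execution.
    Returns the outcomes known by each replica and the number of tasks each
    replica actually executed.
    """
    rng = random.Random(seed)
    orders = [list(tasks), list(tasks)]
    rng.shuffle(orders[0])  # per-replica shuffled execution order
    rng.shuffle(orders[1])
    results = [{}, {}]  # task -> outcome known by each replica
    computed = [0, 0]
    pos = [0, 0]
    while any(len(r) < len(tasks) for r in results):
        for rep in (0, 1):
            # Skip tasks whose outcome the partner replica already shared.
            while pos[rep] < len(orders[rep]) and orders[rep][pos[rep]] in results[rep]:
                pos[rep] += 1
            if pos[rep] == len(orders[rep]):
                continue
            t = orders[rep][pos[rep]]
            outcome = t * t  # placeholder for an expensive task
            computed[rep] += 1
            results[rep][t] = outcome
            results[1 - rep][t] = outcome  # share the outcome with the partner
            pos[rep] += 1
    return results, computed


results, computed = run_replicas(list(range(20)))
```

Under these idealized assumptions every task is executed exactly once across both replicas, so the pair does roughly the work of one unreplicated run; with real message latency some tasks would be computed twice.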
Submitted 1 July, 2020; v1 submitted 25 May, 2020;
originally announced May 2020.
-
An Environment for Sustainable Research Software in Germany and Beyond: Current State, Open Challenges, and Call for Action
Authors:
Hartwig Anzt,
Felix Bach,
Stephan Druskat,
Frank Löffler,
Axel Loewe,
Bernhard Y. Renard,
Gunnar Seemann,
Alexander Struck,
Elke Achhammer,
Piush Aggarwal,
Franziska Appel,
Michael Bader,
Lutz Brusch,
Christian Busse,
Gerasimos Chourdakis,
Piotr W. Dabrowski,
Peter Ebert,
Bernd Flemisch,
Sven Friedl,
Bernadette Fritzsch,
Maximilian D. Funk,
Volker Gast,
Florian Goth,
Jean-Noël Grad,
Sibylle Hermann,
et al. (18 additional authors not shown)
Abstract:
Research software has become a central asset in academic research. It optimizes existing and enables new research methods, implements and embeds research knowledge, and constitutes an essential research product in itself. Research software must be sustainable in order to understand, replicate, reproduce, and build upon existing research or conduct new research effectively. In other words, software must be available, discoverable, usable, and adaptable to new needs, both now and in the future. Research software therefore requires an environment that supports sustainability. Hence, a change is needed in the way research software development and maintenance are currently motivated, incentivized, funded, structurally and infrastructurally supported, and legally treated. Failing to do so will threaten the quality and validity of research. In this paper, we identify challenges for research software sustainability in Germany and beyond, in terms of motivation, selection, research software engineering personnel, funding, infrastructure, and legal aspects. Besides researchers, we specifically address political and academic decision-makers to increase awareness of the importance and needs of sustainable research software practices. In particular, we recommend strategies and measures to create an environment for sustainable research software, with the ultimate goal of ensuring that software-driven research is valid, reproducible and sustainable, and that software is recognized as a first-class citizen in research. This paper is the outcome of two workshops run in Germany in 2019, at deRSE19 - the first International Conference of Research Software Engineers in Germany - and a dedicated DFG-supported follow-up workshop in Berlin.
Submitted 5 May, 2020; v1 submitted 27 April, 2020;
originally announced May 2020.
-
Vectorization and Minimization of Memory Footprint for Linear High-Order Discontinuous Galerkin Schemes
Authors:
Jean-Matthieu Gallard,
Leonhard Rannabauer,
Anne Reinarz,
Michael Bader
Abstract:
We present a sequence of optimizations to the performance-critical compute kernels of the high-order discontinuous Galerkin solver of the hyperbolic PDE engine ExaHyPE -- successively tackling bottlenecks due to SIMD operations, cache hierarchies and restrictions in the software design.
Starting from a generic scalar implementation of the numerical scheme, our first optimized variant applies state-of-the-art optimization techniques by vectorizing loops, improving the data layout and using Loop-over-GEMM to perform tensor contractions via highly optimized matrix multiplication functions provided by the LIBXSMM library. We show that memory stalls due to a memory footprint exceeding our L2 cache size hindered the vectorization gains. We therefore introduce a new kernel that applies a sum factorization approach to reduce the kernel's memory footprint and improve its cache locality. With the L2 cache bottleneck removed, we were able to exploit additional vectorization opportunities by introducing a hybrid Array-of-Structure-of-Array data layout that solves the data layout conflict between matrix multiplication kernels and the point-wise functions that implement PDE-specific terms.
With this last kernel, evaluated in a benchmark simulation at high polynomial order, only 2% of the floating point operations are still performed using scalar instructions and 22.5% of the available performance is achieved.
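The memory argument behind sum factorization can be seen in a small, generic sketch (not the ExaHyPE kernel, which uses LIBXSMM GEMMs on real DG operators). Applying a tensor-product operator $A \otimes A \otimes A$ as one dense $n^3 \times n^3$ matrix needs $O(n^6)$ storage, whereas applying the $n \times n$ matrix $A$ along one axis at a time gives the same result with $O(n^2)$ operator storage and $O(n^4)$ work. The matrix `A` and vector `u` here are arbitrary placeholders.

```python
import itertools


def kron3_apply(A, u, n):
    """Naive: build the dense (n^3 x n^3) operator A (x) A (x) A, then multiply."""
    idx = list(itertools.product(range(n), repeat=3))  # flat index -> (i, j, k), row-major
    M = [[A[i][a] * A[j][b] * A[k][c] for (a, b, c) in idx] for (i, j, k) in idx]
    return [sum(M[r][s] * u[s] for s in range(n ** 3)) for r in range(n ** 3)]


def sumfact_apply(A, u, n):
    """Sum factorization: contract the 1D matrix A along each of the three axes in turn."""

    def flat(i, j, k):
        return (i * n + j) * n + k

    v = list(u)
    for axis in range(3):
        w = [0.0] * n ** 3
        for t in itertools.product(range(n), repeat=3):
            acc = 0.0
            for m in range(n):
                src = list(t)
                src[axis] = m  # vary only the index being contracted
                acc += A[t[axis]][m] * v[flat(*src)]
            w[flat(*t)] = acc
        v = w
    return v


A = [[1.0, 2.0], [3.0, 4.0]]
u = [float(i) for i in range(8)]
dense, factored = kron3_apply(A, u, 2), sumfact_apply(A, u, 2)
```

Each axis pass in `sumfact_apply` is itself a small matrix product, which is what makes the approach compatible with GEMM-based kernels.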
Submitted 28 March, 2020;
originally announced March 2020.
-
Certified Defense to Image Transformations via Randomized Smoothing
Authors:
Marc Fischer,
Maximilian Baader,
Martin Vechev
Abstract:
We extend randomized smoothing to cover parameterized transformations (e.g., rotations, translations) and certify robustness in the parameter space (e.g., rotation angle). This is particularly challenging as interpolation and rounding effects mean that image transformations do not compose, in turn preventing direct certification of the perturbed image (unlike certification with $\ell^p$ norms). We address this challenge by introducing three different kinds of defenses, each with a different guarantee (heuristic, distributional and individual) stemming from the method used to bound the interpolation error. Importantly, we show how individual certificates can be obtained via either statistical error bounds or efficient online inverse computation of the image transformation. We provide an implementation of all methods at https://github.com/eth-sri/transformation-smoothing.
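As a hedged illustration of certifying in parameter space, the sketch below applies standard Gaussian randomized smoothing to a single rotation angle. It assumes the idealized case the abstract says does not hold in practice: rotations compose exactly, so interpolation error is ignored. It also uses a simple Hoeffding lower bound on the top-class probability, a swapped-in choice that is not taken from the paper. With a lower bound $\underline{p_A} > 1/2$, the smoothed classifier is certified for all angles within $\sigma\,\Phi^{-1}(\underline{p_A})$ of $\theta$.

```python
import math
import random
from collections import Counter
from statistics import NormalDist


def certify_angle(classify, theta, sigma, n0=100, n=2000, alpha=0.001, seed=0):
    """Randomized smoothing over one transformation parameter (idealized sketch).

    classify(angle) -> label stands for the base model applied to the image
    rotated by `angle` degrees.  Returns (label, radius) or (None, 0.0) to abstain.
    """
    rng = random.Random(seed)
    # 1) Guess the top class from a small selection sample.
    sel = [classify(theta + rng.gauss(0.0, sigma)) for _ in range(n0)]
    top = Counter(sel).most_common(1)[0][0]
    # 2) Estimate its probability on fresh samples; Hoeffding lower bound at level alpha.
    hits = sum(classify(theta + rng.gauss(0.0, sigma)) == top for _ in range(n))
    p_lower = hits / n - math.sqrt(math.log(1.0 / alpha) / (2 * n))
    if p_lower <= 0.5:
        return None, 0.0
    return top, sigma * NormalDist().inv_cdf(p_lower)


def upright(angle):
    """Toy base classifier: class 1 if the rotation angle is within 30 degrees of upright."""
    return 1 if -30.0 < angle < 30.0 else 0


label, radius = certify_angle(upright, theta=0.0, sigma=5.0)
```

The certified radius here is in degrees of rotation rather than in an $\ell^p$ ball around the image, which is the sense in which the abstract speaks of certifying in the parameter space.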
Submitted 25 August, 2021; v1 submitted 27 February, 2020;
originally announced February 2020.
-
Role-Oriented Code Generation in an Engine for Solving Hyperbolic PDE Systems
Authors:
Jean-Matthieu Gallard,
Lukas Krenz,
Leonhard Rannabauer,
Anne Reinarz,
Michael Bader
Abstract:
The development of a high-performance PDE solver requires the combined expertise of interdisciplinary teams with respect to application domain, numerical scheme and low-level optimization. In this paper, we present how the ExaHyPE engine facilitates the collaboration of such teams by isolating three roles: application, algorithms, and optimization expert. This lets team members focus on their own area of expertise while their contributions are integrated into an HPC production code. Inspired by web application development practices, ExaHyPE relies on two custom code generation modules, the Toolkit and the Kernel Generator, which follow a Model-View-Controller architectural pattern on top of the Jinja2 template engine library. Using Jinja2's templates to abstract the critical components of the engine and generated glue code, we isolate the application development from the engine. The template language also allows us to define and use custom template macros that isolate low-level optimizations from the numerical scheme described in the templates. We present three use cases, each focusing on one of our user roles, showcasing how the design of the code generation modules makes it easy to extend the solver schemes to support novel demands from applications, to add optimized algorithmic schemes (e.g., with reduced memory footprint), or to provide improved low-level SIMD vectorization support.
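The role separation described above can be sketched with template-based generation of C source. To stay self-contained, this example uses Python's standard `string.Template` as a stand-in for Jinja2 (it is not Jinja2's API), and the kernel, slot names, and helper functions are hypothetical. The algorithms role owns the template, the application role supplies the PDE-specific expression, and the optimization role supplies a "macro" that decides whether a SIMD hint is emitted.

```python
from string import Template

# Hypothetical kernel template owned by the "algorithms" role.
# ${loop_pragma} and ${flux_expr} are slots filled in by the other roles.
KERNEL = Template("""\
void update(double* q, const double* f, int n, double dt) {
${loop_pragma}  for (int i = 0; i < n; ++i) {
    q[i] -= dt * ${flux_expr};
  }
}
""")


def loop_pragma(vectorize):
    """Optimization-role 'macro': emit an OpenMP SIMD hint only when requested."""
    return "#pragma omp simd\n" if vectorize else ""


def generate(flux_expr, vectorize=False):
    """Combine the application's PDE term with the optimization choice into C source text."""
    return KERNEL.substitute(loop_pragma=loop_pragma(vectorize), flux_expr=flux_expr)


scalar_src = generate("f[i]")
simd_src = generate("f[i]", vectorize=True)
```

Changing the vectorization strategy touches only `loop_pragma`, and changing the PDE touches only the substituted expression, which mirrors the isolation the abstract describes.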
Submitted 28 March, 2020; v1 submitted 15 November, 2019;
originally announced November 2019.
-
A stable discontinuous Galerkin method for the perfectly matched layer for elastodynamics in first order form
Authors:
Kenneth Duru,
Leonhard Rannabauer,
Alice-Agnes Gabriel,
Gunilla Kreiss,
Michael Bader
Abstract:
We present a stable discontinuous Galerkin (DG) method with a perfectly matched layer (PML) for three and two space dimensional linear elastodynamics, in velocity-stress formulation, subject to well-posed linear boundary conditions. First, we consider the elastodynamics equation, in a cuboidal domain, and derive an unsplit PML truncating the domain using complex coordinate stretching. Leveraging the hyperbolic structure of the underlying system, we construct continuous energy estimates, in the time domain for the elastic wave equation, and in the Laplace space for a sequence of PML model problems, with variations in one, two and three space dimensions, respectively. They correspond to PMLs normal to boundary faces, along edges and in corners. Second, we develop a DG numerical method for the linear elastodynamics equation using physically motivated numerical flux and penalty parameters, which are compatible with all well-posed, internal and external, boundary conditions. When the PML damping vanishes, by construction, our choice of penalty parameters yields an upwind scheme and a discrete energy estimate analogous to the continuous energy estimate. Third, to ensure numerical stability of the discretization when PML damping is present, it is necessary to extend the numerical DG fluxes, and the numerical inter-element and boundary procedures, to the PML auxiliary differential equations. This is crucial for deriving discrete energy estimates analogous to the continuous energy estimates. By combining the DG spatial approximation with the high-order ADER time-stepping scheme and the accuracy of the PML, we obtain an arbitrarily accurate wave propagation solver in the time domain. Numerical experiments are presented in two and three space dimensions corroborating the theoretical results.
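For orientation, the generic textbook form of complex coordinate stretching in the Laplace domain is recalled below; the paper's unsplit PML for elastodynamics may parameterize it differently. With Laplace variable $s$ and a damping profile $d_x(x) \ge 0$ that vanishes in the interior, spatial derivatives normal to the layer are replaced by stretched ones, and a one-dimensional plane wave with speed $c$ then decays inside the layer by a factor that does not depend on $s$:

$$
\frac{\partial}{\partial x} \;\longrightarrow\; \frac{1}{S_x}\,\frac{\partial}{\partial x},
\qquad S_x(x,s) = 1 + \frac{d_x(x)}{s},
\qquad \tilde{x} = x + \frac{1}{s}\int_0^x d_x(\xi)\,d\xi,
$$

$$
e^{-s\tilde{x}/c} \;=\; e^{-s x/c}\,\exp\!\Big(-\frac{1}{c}\int_0^x d_x(\xi)\,d\xi\Big).
$$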
Submitted 6 January, 2020; v1 submitted 14 October, 2019;
originally announced October 2019.
-
Universal Approximation with Certified Networks
Authors:
Maximilian Baader,
Matthew Mirman,
Martin Vechev
Abstract:
Training neural networks to be certifiably robust is critical to ensure their safety against adversarial attacks. However, it is currently very difficult to train a neural network that is both accurate and certifiably robust. In this work we take a step towards addressing this challenge. We prove that for every continuous function $f$, there exists a network $n$ such that: (i) $n$ approximates $f$ arbitrarily closely, and (ii) simple interval bound propagation of a region $B$ through $n$ yields a result that is arbitrarily close to the optimal output of $f$ on $B$. Our result can be seen as a Universal Approximation Theorem for interval-certified ReLU networks. To the best of our knowledge, this is the first work to prove the existence of accurate, interval-certified networks.
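Interval bound propagation (IBP), the certification method named in the abstract, pushes a box through the network layer by layer. For an affine layer, each output bound takes the lower or upper input bound depending on the sign of the weight, and ReLU is applied to both bounds. The sketch below is a generic implementation; the network representation as a list of `(W, b)` pairs is an assumption made for this example.

```python
def affine_bounds(W, b, lo, hi):
    """Propagate the box [lo, hi] through x -> W x + b (tight per output coordinate)."""
    out_lo, out_hi = [], []
    for row, bias in zip(W, b):
        out_lo.append(bias + sum(w * (l if w >= 0 else h) for w, l, h in zip(row, lo, hi)))
        out_hi.append(bias + sum(w * (h if w >= 0 else l) for w, l, h in zip(row, lo, hi)))
    return out_lo, out_hi


def relu_bounds(lo, hi):
    """ReLU is monotone, so it can be applied to both bounds directly."""
    return [max(0.0, l) for l in lo], [max(0.0, h) for h in hi]


def ibp(layers, lo, hi):
    """IBP through a ReLU network given as [(W, b), ...]; no ReLU after the last layer."""
    for idx, (W, b) in enumerate(layers):
        lo, hi = affine_bounds(W, b, lo, hi)
        if idx < len(layers) - 1:
            lo, hi = relu_bounds(lo, hi)
    return lo, hi


# |x| written as relu(x) + relu(-x), with input region B = [-1, 2].
abs_net = [([[1.0], [-1.0]], [0.0, 0.0]), ([[1.0, 1.0]], [0.0])]
bounds = ibp(abs_net, [-1.0], [2.0])
```

For this particular network, IBP returns $[0, 3]$ although $|x|$ only ranges over $[0, 2]$ on $B$. Gaps of this kind are what the theorem addresses: it guarantees that networks exist for which the IBP result is arbitrarily close to the optimal bound.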
Submitted 14 January, 2020; v1 submitted 30 September, 2019;
originally announced September 2019.
-
Lightweight Task Offloading Exploiting MPI Wait Times for Parallel Adaptive Mesh Refinement
Authors:
Philipp Samfass,
Tobias Weinzierl,
Dominic E. Charrier,
Michael Bader
Abstract:
Balancing the workload of sophisticated simulations is inherently difficult, since we have to balance both computational workload and memory footprint over meshes that can change at any time or yield unpredictable cost per mesh entity, while modern supercomputers and their interconnects start to exhibit fluctuating performance. We propose a novel lightweight balancing technique for MPI+X to accompany traditional, prediction-based load balancing. It is a reactive diffusion approach that uses online measurements of MPI idle time to migrate tasks temporarily from overloaded to underemployed ranks. Tasks are deployed to ranks which otherwise would wait, processed with high priority, and made available to the overloaded ranks again. This migration is non-persistent. Our approach hijacks idle time to do meaningful work and is totally non-blocking, asynchronous and distributed without a global data view. Tests with a seismic simulation code developed in the ExaHyPE engine uncover the method's potential. We found speed-ups of up to 2-3 for ill-balanced scenarios without logical modifications of the code base and show that the strategy is capable of reacting quickly to temporarily changing workload or node performance.
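A toy, sequential version of the reactive idea above is sketched here; it is hypothetical and much simpler than the paper's asynchronous MPI implementation. Each rank's wait time is approximated as the gap between the maximum load and its own load, and unit-cost tasks are moved from the most loaded rank to the rank with the largest wait time as long as the receiver ends up with strictly less work than the sender had. The input list is never modified, reflecting that the migration is non-persistent.

```python
def offload_step(loads, task_cost=1.0):
    """One reactive rebalancing round over per-rank loads (hypothetical sketch).

    Returns the work each rank actually executes this time step; ownership,
    represented by the unchanged `loads`, stays with the original ranks.
    """
    work = list(loads)
    while True:
        src = max(range(len(work)), key=work.__getitem__)  # most overloaded rank
        # The rank with the largest (approximate) wait time receives the task.
        dst = max(range(len(work)), key=lambda r: work[src] - work[r])
        if work[dst] + task_cost >= work[src]:  # moving one more task no longer helps
            return work
        work[src] -= task_cost
        work[dst] += task_cost


loads = [10, 2, 2, 2]
executed = offload_step(loads)
```

Every accepted move strictly decreases the sum of squared loads, so the loop terminates; in the example, the makespan drops from 10 to 4 units.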
Submitted 14 April, 2020; v1 submitted 13 September, 2019;
originally announced September 2019.
-
A stable discontinuous Galerkin method for linear elastodynamics in 3D geometrically complex media using physics based numerical fluxes
Authors:
Kenneth Duru,
Leonhard Rannabauer,
Alice-Agnes Gabriel,
On Ki Angel Ling,
Heiner Igel,
Michael Bader
Abstract:
High-order accurate and explicit time-stable solvers are well suited for hyperbolic wave propagation problems. As a result of the complexities of real geometries, internal interfaces and nonlinear boundary and interface conditions, discontinuities and sharp wave fronts may become fundamental features of the solution. Thus, geometrically flexible and adaptive numerical algorithms are critical for high fidelity and efficient simulations of wave phenomena in many applications. Adaptive curvilinear meshes hold promise to minimise the effort to represent complicated geometries or heterogeneous material data, avoiding the bottleneck of feature-preserving meshing. To enable the design of stable DG methods on three space dimensional (3D) curvilinear elements, we construct a structure-preserving anti-symmetric coordinate transformation motivated by the underlying physics. Using a physics-based numerical penalty-flux, we develop a 3D provably energy-stable discontinuous Galerkin finite element approximation of the elastic wave equation in geometrically complex and heterogeneous media. By construction, our numerical flux is upwind and yields a discrete energy estimate analogous to the continuous energy estimate. The ability to treat conforming and non-conforming curvilinear elements allows for flexible adaptive mesh refinement strategies. The numerical scheme has been implemented in ExaHyPE, a simulation engine for parallel dynamically adaptive simulations of wave problems on adaptive Cartesian meshes. We present 3D numerical experiments of wave propagation in heterogeneous isotropic and anisotropic elastic solids demonstrating stability and high-order accuracy. We demonstrate the potential of our approach for computational seismology in a regional wave propagation scenario in a geologically constrained 3D model including the geometrically complex free-surface topography of Mount Zugspitze, Germany.
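The "physics-based" upwind flux mentioned above can be illustrated with its simplest analogue: the exact interface state for 1D linear elasticity in velocity-stress form, $\rho v_t = \sigma_x$, $\sigma_t = \mu v_x$. This is a textbook Riemann solution, not the paper's 3D curvilinear penalty-flux. With impedance $Z = \sqrt{\rho\mu}$, the right-going characteristic $\sigma - Z v$ from the left side and the left-going characteristic $\sigma + Z v$ from the right side are preserved at the interface, and solving for $(v^*, \sigma^*)$ gives the formulas below.

```python
import math


def interface_state(vL, sL, rhoL, muL, vR, sR, rhoR, muR):
    """Upwind interface velocity and stress for 1D linear elasticity (textbook sketch).

    Solves  s* - ZL*v* = sL - ZL*vL  (characteristic arriving from the left)
    and     s* + ZR*v* = sR + ZR*vR  (characteristic arriving from the right).
    """
    ZL, ZR = math.sqrt(rhoL * muL), math.sqrt(rhoR * muR)
    v = (ZL * vL + ZR * vR + sR - sL) / (ZL + ZR)
    s = (ZR * sL + ZL * sR + ZL * ZR * (vR - vL)) / (ZL + ZR)
    return v, s


v_star, s_star = interface_state(0.3, -1.0, 2.7, 30.0, -0.2, 0.5, 3.0, 40.0)
```

A DG scheme would then penalize the mismatch between each side's trace and $(v^*, \sigma^*)$; the upwind choice is what makes a discrete energy estimate possible in this setting.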
Submitted 10 April, 2021; v1 submitted 4 July, 2019;
originally announced July 2019.
-
ExaHyPE: An Engine for Parallel Dynamically Adaptive Simulations of Wave Problems
Authors:
Anne Reinarz,
Dominic E. Charrier,
Michael Bader,
Luke Bovard,
Michael Dumbser,
Kenneth Duru,
Francesco Fambri,
Alice-Agnes Gabriel,
Jean-Matthieu Gallard,
Sven Köppel,
Lukas Krenz,
Leonhard Rannabauer,
Luciano Rezzolla,
Philipp Samfass,
Maurizio Tavelli,
Tobias Weinzierl
Abstract:
ExaHyPE ("An Exascale Hyperbolic PDE Engine") is a software engine for solving systems of first-order hyperbolic partial differential equations (PDEs). Hyperbolic PDEs are typically derived from the conservation laws of physics and are useful in a wide range of application areas. Applications powered by ExaHyPE can be run on a student's laptop, but are also able to exploit thousands of processor cores on state-of-the-art supercomputers. The engine is able to dynamically increase the accuracy of the simulation using adaptive mesh refinement where required. Due to the robustness and shock capturing abilities of ExaHyPE's numerical methods, users of the engine can simulate linear and non-linear hyperbolic PDEs with very high accuracy. Users can tailor the engine to their particular PDE by specifying evolved quantities, fluxes, and source terms. A complete simulation code for a new hyperbolic PDE can often be realised within a few hours, a task that traditionally can take weeks, months, or even years for researchers starting from scratch. In this paper, we showcase ExaHyPE's workflow and capabilities through real-world scenarios from our two main application areas: seismology and astrophysics.
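The division of labour described above, where users supply only fluxes and source terms while the engine supplies the numerics, can be sketched with a deliberately simple solver. This is a hypothetical illustration, not ExaHyPE's API or its ADER-DG scheme: a first-order finite-volume method with a Rusanov flux on a periodic 1D grid, where `flux`, `max_speed`, and `source` are the user-provided callbacks.

```python
def solve(u0, dx, dt, steps, flux, max_speed, source=lambda q: 0.0):
    """First-order finite volumes for u_t + f(u)_x = s(u) on a periodic 1D grid (sketch).

    User-supplied pieces: flux f(u), a bound max_speed(u) on |f'(u)|, and an
    optional source s(u).  The numerical flux and time stepping stay generic.
    Stability requires dt / dx * max|f'(u)| <= 1.
    """
    u = list(u0)
    n = len(u)
    for _ in range(steps):
        F = []
        for i in range(n):  # flux through the interface between cells i and i+1
            a, b = u[i], u[(i + 1) % n]
            lam = max(max_speed(a), max_speed(b))
            F.append(0.5 * (flux(a) + flux(b)) - 0.5 * lam * (b - a))
        # F[i - 1] with i = 0 wraps to the last interface (periodic boundary).
        u = [u[i] - dt / dx * (F[i] - F[i - 1]) + dt * source(u[i]) for i in range(n)]
    return u


# A "user" defines Burgers' equation, f(u) = u^2 / 2, in two lines.
u0 = [1.0 if 3 <= i < 6 else 0.0 for i in range(10)]
u = solve(u0, dx=0.1, dt=0.05, steps=20, flux=lambda q: 0.5 * q * q, max_speed=abs)
```

Switching to another PDE only means passing different callbacks, which is the kind of tailoring the abstract describes.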
Submitted 18 May, 2020; v1 submitted 20 May, 2019;
originally announced May 2019.
-
Yet Another Tensor Toolbox for discontinuous Galerkin methods and other applications
Authors:
Carsten Uphoff,
Michael Bader
Abstract:
The numerical solution of partial differential equations is at the heart of many grand challenges in supercomputing. Solvers based on high-order discontinuous Galerkin (DG) discretisation have been shown to scale on large supercomputers with excellent performance and efficiency, if the implementation exploits all levels of parallelism and is tailored to the specific architecture. However, every year new supercomputers emerge and the list of hardware-specific considerations grows, simultaneously with the list of desired features in a DG code. Thus we believe that a sustainable DG code needs an abstraction layer to implement the numerical scheme in a suitable language. We explore the possibility of abstracting the numerical scheme as small tensor operations, describing them in a domain-specific language (DSL) resembling the Einstein notation, and mapping them to existing code generators which generate small matrix-matrix multiplication routines. The compiler for our DSL implements classic optimisations that are used for large tensor contractions, and we present novel optimisation techniques such as equivalent sparsity patterns and optimal index permutations for temporary tensors. Our application examples, which include the earthquake simulation software SeisSol, show that the generated kernels achieve over 50% of peak performance while the DSL considerably simplifies the implementation.
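To show what "small tensor operations in Einstein notation" means operationally, here is a naive reference evaluator. The spec-string format (`"ik,kj->ij"`) is borrowed from NumPy's `einsum`, not from the paper's DSL, and the evaluator simply loops; a real compiler would instead map such contractions onto optimized small GEMM routines. Indices that appear on the left but not in the output are summed over.

```python
from itertools import product


def shape_of(t):
    """Shape of a rectangular nested list."""
    dims = []
    while isinstance(t, list):
        dims.append(len(t))
        t = t[0]
    return dims


def get(t, idx):
    """Element of nested list t at multi-index idx."""
    for i in idx:
        t = t[i]
    return t


def einsum(spec, *tensors):
    """Naively evaluate an Einstein-notation contraction such as "ik,kj->ij"."""
    lhs, out = spec.replace(" ", "").split("->")
    subs = lhs.split(",")
    size = {}
    for sub, t in zip(subs, tensors):
        for ch, n in zip(sub, shape_of(t)):
            if size.setdefault(ch, n) != n:
                raise ValueError(f"inconsistent extent for index {ch!r}")
    summed = [ch for ch in size if ch not in out]  # contracted indices

    def build(prefix):
        if len(prefix) == len(out):
            env = dict(zip(out, prefix))
            total = 0
            for vals in product(*(range(size[ch]) for ch in summed)):
                env.update(zip(summed, vals))
                term = 1
                for sub, t in zip(subs, tensors):
                    term *= get(t, [env[ch] for ch in sub])
                total += term
            return total
        return [build(prefix + [i]) for i in range(size[out[len(prefix)]])]

    return build([])


C = einsum("ik,kj->ij", [[1, 2], [3, 4]], [[5, 6], [7, 8]])
```

The same function covers a matrix product, a trace (`"ii->"`), or an outer product (`"i,j->ij"`), which is the flexibility an Einstein-notation front end offers.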
Submitted 27 March, 2019;
originally announced March 2019.
-
Exploiting the Space Filling Curve Ordering of Particles in the Neighbour Search of Gadget3
Authors:
Antonio Ragagnin,
Nikola Tchipev,
Michael Bader,
Klaus Dolag,
Nicolay J. Hammer
Abstract:
Gadget3 is currently one of the most frequently used high-performance parallel codes for cosmological hydrodynamical simulations. Recent analyses have shown that the Neighbour Search process of Gadget3 is one of the most time-consuming parts. Thus, a considerable speedup can be expected from improvements of the underlying algorithms. In this work we propose a novel approach for speeding up the Neighbour Search which takes advantage of the space-filling-curve particle ordering. Instead of performing Neighbour Search for all particles individually, nearby active particles can be grouped and one single Neighbour Search can be performed to obtain a common superset of neighbours. Thus, with this approach we reduce the number of searches. On the other hand, tree walks are performed within a larger searching radius. There is an optimal size of grouping that maximizes the speedup, which we found by numerical experiments. We tested the algorithm within the boxes of the Magneticum project. As a result we obtained a speedup of $1.65$ in the Density and of $1.30$ in the Hydrodynamics computation, respectively, and a total speedup of $1.34$.
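The grouping idea above can be sketched in 2D. The example sorts particles along a Morton (Z-order) curve for simplicity, whereas Gadget orders particles along a Peano-Hilbert curve. It then performs one search per group of consecutive particles, with radius $h$ plus the largest distance from a member to the group centre, and filters that common superset per particle. A brute-force search stands in for the tree walk, so the sketch demonstrates correctness of the superset-and-filter step rather than speed; all names are hypothetical.

```python
import math
import random


def morton2d(ix, iy, bits=10):
    """Interleave the bits of non-negative integer cell coordinates (Z-order key)."""
    key = 0
    for b in range(bits):
        key |= (((ix >> b) & 1) << (2 * b)) | (((iy >> b) & 1) << (2 * b + 1))
    return key


def neighbours(points, centre, radius):
    """Brute-force stand-in for a tree walk: indices of points within radius of centre."""
    return {j for j, p in enumerate(points) if math.dist(p, centre) <= radius}


def grouped_search(points, h, group_size, cell=0.05):
    """Neighbours within h of every particle, using one enlarged search per SFC group."""
    order = sorted(range(len(points)),
                   key=lambda i: morton2d(int(points[i][0] / cell), int(points[i][1] / cell)))
    result = {}
    for g in range(0, len(order), group_size):
        group = order[g:g + group_size]
        cx = sum(points[i][0] for i in group) / len(group)
        cy = sum(points[i][1] for i in group) / len(group)
        # By the triangle inequality this radius captures every member's neighbours.
        r = h + max(math.dist(points[i], (cx, cy)) for i in group)
        candidates = neighbours(points, (cx, cy), r)
        for i in group:
            result[i] = {j for j in candidates if math.dist(points[i], points[j]) <= h}
    return result


rng = random.Random(1)
pts = [(rng.random(), rng.random()) for _ in range(200)]  # coordinates in [0, 1)
res = grouped_search(pts, h=0.08, group_size=8)
```

The trade-off reported in the abstract is visible here: larger groups mean fewer searches but a larger radius `r` and therefore more candidates to filter, which is why an optimal group size exists.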
Submitted 23 October, 2018;
originally announced October 2018.
-
Influence of A-Posteriori Subcell Limiting on Fault Frequency in Higher-Order DG Schemes
Authors:
Anne Reinarz,
Jean-Matthieu Gallard,
Michael Bader
Abstract:
Soft error rates are increasing as modern architectures require increasingly small features at low voltages. Due to the large number of components used in HPC architectures, such systems are particularly vulnerable to soft errors. Hence, when designing applications that run for long time periods on large machines, algorithmic resilience must be taken into account. In this paper we analyse the inherent resiliency of a-posteriori limiting procedures in the context of the explicit ADER DG hyperbolic PDE solver ExaHyPE. The a-posteriori limiter checks element-local high-order DG solutions for physical admissibility, and can thus be expected to also detect hardware-induced errors. Algorithmically, it can be interpreted as element-local checkpointing and restarting of the solver with a more robust finite volume scheme on a fine subgrid. We show that the limiter indeed increases the resilience of the DG algorithm, detecting and correcting particularly those faults which would otherwise lead to a fatal failure.
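The detect-and-recompute logic described above can be sketched for a single element. The admissibility criteria used here (finite values, positivity, and a relaxed discrete maximum principle against the previous neighbourhood values) are assumptions chosen for illustration. The soft error is simulated by flipping one bit of a float64 with `struct`, and the robust fallback is a placeholder for the subcell finite-volume scheme.

```python
import math
import struct


def flip_bit(x, bit):
    """Simulate a soft error by flipping one bit of a float64."""
    (n,) = struct.unpack("<Q", struct.pack("<d", x))
    return struct.unpack("<d", struct.pack("<Q", n ^ (1 << bit)))[0]


def admissible(candidate, old, eps=1e-3):
    """Hypothetical checks: finite, positive, and within relaxed bounds of the old values."""
    lo, hi = min(old), max(old)
    tol = eps * max(1.0, hi - lo)
    return all(math.isfinite(v) and v > 0.0 and lo - tol <= v <= hi + tol for v in candidate)


def limited_update(old, high_order_step, robust_step):
    """Accept the high-order result only if admissible; otherwise recompute robustly."""
    cand = high_order_step(old)
    if admissible(cand, old):
        return cand, False
    return robust_step(old), True  # element-local "restart" with the robust scheme


def smooth(u):
    """Stand-in for a high-order update: periodic neighbour averaging."""
    return [0.5 * (a + b) for a, b in zip(u, u[1:] + u[:1])]


def faulty(u):
    """The same update, with a flipped exponent bit in the first value."""
    return [flip_bit(v, 62) if i == 0 else v for i, v in enumerate(smooth(u))]


def robust(u):
    """Placeholder for the subcell finite-volume recomputation."""
    return list(u)


old = [1.0, 1.2, 1.1, 1.05]
```

Flipping a high exponent bit turns the value into NaN or infinity and is caught immediately, whereas a flip in a low mantissa bit would pass these checks; this matches the abstract's observation that the limiter mainly catches the faults that would otherwise be fatal.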
Submitted 21 May, 2019; v1 submitted 16 October, 2018;
originally announced October 2018.
-
Studies on the energy and deep memory behaviour of a cache-oblivious, task-based hyperbolic PDE solver
Authors:
Dominic E. Charrier,
Benjamin Hazelwood,
Ekaterina Tutlyaeva,
Michael Bader,
Michael Dumbser,
Andrey Kudryavtsev,
Alexander Moskovsky,
Tobias Weinzierl
Abstract:
We study the performance behaviour of a seismic simulation using the ExaHyPE engine with a specific focus on memory characteristics and energy needs. ExaHyPE combines dynamically adaptive mesh refinement (AMR) with ADER-DG. It is parallelized using tasks, and it is cache efficient. AMR plus ADER-DG yields a task graph which is highly dynamic in nature and comprises both arithmetically expensive tasks and tasks which challenge the memory's latency. The expensive tasks, and thus the whole code, benefit from AVX vectorization, although the code suffers from memory access bursts. A frequency reduction of the chip improves the code's energy-to-solution. Yet, it does not mitigate burst effects. The bursts' latency penalty becomes worse once we add Intel Optane technology, increase the core count significantly, or make individual, computationally heavy tasks fall out of close caches. Thread overbooking to hide these latency penalties is counterproductive with non-inclusive caches, as it destroys the cache and vectorization character. In cases where memory-intense and computationally expensive tasks overlap, ExaHyPE's cache-oblivious implementation can exploit deep, non-inclusive, heterogeneous memory effectively, as main memory misses arise infrequently and slow down only a few cores. We thus propose that upcoming supercomputing simulation codes with dynamic, inhomogeneous task graphs are actively supported by thread runtimes in intermixing tasks of different compute character, and we propose that future hardware actively allows codes to downclock the cores running particular task types.
Submitted 25 March, 2019; v1 submitted 9 October, 2018;
originally announced October 2018.
-
Visual Room-Awareness for Humanoid Robot Self-Localization
Authors:
Markus Bader,
Johann Prankl,
Markus Vincze
Abstract:
Humanoid robots without internal sensors such as a compass tend to lose their orientation after a fall. Furthermore, re-initialisation is often ambiguous due to symmetric man-made environments. The room-awareness module proposed here is inspired by the results of psychological experiments and improves existing self-localization strategies by mapping and matching the visual background with colour histograms. The matching algorithm uses a particle filter to generate hypotheses of the viewing directions independent of the self-localization algorithm and generates confidence values for various possible poses. The robot's behaviour controller uses those confidence values to steer the self-localization algorithm towards the most likely pose and to prevent it from getting stuck in local minima. Experiments with a symmetric Standard Platform League RoboCup playing field, using both a simulated and a real humanoid NAO robot, show a significant improvement of the system.
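A minimal sketch of the matching step above follows. It performs a single weighting and systematic-resampling step of a particle filter over the viewing direction, with motion prediction omitted. Each particle is scored by histogram intersection between the observed colour histogram and the mapped histogram of the room sector it points at. The similarity measure, sector layout, and function names are assumptions for illustration, not details from the paper.

```python
import random


def intersection(h1, h2):
    """Histogram intersection similarity for normalized histograms (1.0 = identical)."""
    return sum(min(a, b) for a, b in zip(h1, h2))


def heading_confidence(sector_hists, observed, n_particles=400, seed=0):
    """Weight heading particles by colour-histogram match, resample, and return
    the fraction of particles per sector as a confidence value."""
    rng = random.Random(seed)
    k = len(sector_hists)
    width = 360.0 / k
    particles = [rng.uniform(0.0, 360.0) for _ in range(n_particles)]
    weights = [intersection(sector_hists[int(p // width) % k], observed) for p in particles]
    # Systematic resampling: one random offset, then evenly spaced pointers.
    step = sum(weights) / n_particles
    u = rng.uniform(0.0, step)
    resampled, cum, i = [], weights[0], 0
    for _ in range(n_particles):
        while u > cum and i < n_particles - 1:
            i += 1
            cum += weights[i]
        resampled.append(particles[i])
        u += step
    conf = [0.0] * k
    for p in resampled:
        conf[int(p // width) % k] += 1.0 / n_particles
    return conf


# Four room sectors, each dominated by a different colour bin (made-up data).
hists = [[0.7 if b == s else 0.1 for b in range(4)] for s in range(4)]
conf = heading_confidence(hists, observed=hists[2])
```

In a symmetric room, two sectors would share the same histogram and the confidence would split between them; that ambiguity is exactly what the behaviour controller has to resolve.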
Submitted 22 April, 2013;
originally announced April 2013.
-
Fast GPGPU Data Rearrangement Kernels using CUDA
Authors:
Michael Bader,
Hans-Joachim Bungartz,
Dheevatsa Mudigere,
Srihari Narasimhan,
Babu Narayanan
Abstract:
Many high-performance computing algorithms are bandwidth limited, hence the need for optimal data rearrangement kernels as well as their easy integration into the rest of the application. In this work, we have built a CUDA library of fast kernels for a set of data rearrangement operations. In particular, we have built generic kernels for rearranging m-dimensional data into n dimensions, including Permute, Reorder, Interlace/De-interlace, etc. We have also built kernels for generic Stencil computations on two-dimensional data using templates and functors that allow application developers to rapidly build customized high-performance kernels. All the kernels built achieve or surpass best-known performance in terms of bandwidth utilization.
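The index arithmetic at the core of a generic Permute kernel can be sketched independently of CUDA. In this Python illustration, which makes no claim about the library's actual interface, each output element is computed from its own linear index. That is how a GPU kernel would typically map one thread to one element. The convention is that output axis `d` is input axis `perm[d]`, as in a transpose.

```python
def strides(shape):
    """Row-major strides (in elements) for the given shape."""
    s, acc = [], 1
    for n in reversed(shape):
        s.append(acc)
        acc *= n
    return s[::-1]


def permute(data, shape, perm):
    """Reorder the axes of a flat row-major array; returns (new data, new shape)."""
    out_shape = [shape[p] for p in perm]
    in_str = strides(shape)
    out = [None] * len(data)
    for lin in range(len(data)):  # one 'thread' per output element
        rem, src = lin, 0
        # Decompose the output linear index, last axis fastest, and map it to the input offset.
        for d in range(len(out_shape) - 1, -1, -1):
            idx = rem % out_shape[d]
            rem //= out_shape[d]
            src += idx * in_str[perm[d]]
        out[lin] = data[src]
    return out, out_shape


transposed, tshape = permute([0, 1, 2, 3, 4, 5], [2, 3], [1, 0])
```

Because reads are gathered through `src` while writes are contiguous, a real GPU implementation would usually stage tiles in shared memory so that both reads and writes are coalesced; that optimization is beyond this sketch.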
Submitted 15 November, 2010;
originally announced November 2010.