Search | arXiv e-print repository

Decentralized multi-agent reinforcement learning algorithm using a cluster-synchronized laser network

Authors: Shun Kotoku, Takatomo Mihana, André Röhm, Ryoichi Horisaki

Abstract: Multi-agent reinforcement learning (MARL) studies crucial principles that are applicable to a variety of fields, including wireless networking and autonomous driving. We propose a photonic-based decision-making algorithm to address one of the most fundamental problems in MARL, called the competitive multi-armed bandit (CMAB) problem. Our numerical simulations demonstrate that chaotic oscillations and cluster synchronization of optically coupled lasers, along with our proposed decentralized coupling adjustment, efficiently balance exploration and exploitation while facilitating cooperative decision-making without explicitly sharing information among agents. Our study demonstrates how decentralized reinforcement learning can be achieved by exploiting complex physical processes controlled by simple algorithms.

Submitted 12 July, 2024; originally announced July 2024.

Comments: 16 pages, 8 figures

arXiv:2407.09064 [pdf, other]

Multi-Modal Dataset Creation for Federated Learning with DICOM Structured Reports

Authors: Malte Tölle, Lukas Burger, Halvar Kelm, Florian André, Peter Bannas, Gerhard Diller, Norbert Frey, Philipp Garthe, Stefan Groß, Anja Hennemuth, Lars Kaderali, Nina Krüger, Andreas Leha, Simon Martin, Alexander Meyer, Eike Nagel, Stefan Orwat, Clemens Scherer, Moritz Seiffert, Jan Moritz Seliger, Stefan Simm, Tim Friede, Tim Seidler, Sandy Engelhardt

Abstract: Purpose: Federated training is often hindered by heterogeneous datasets due to divergent data storage options, inconsistent naming schemes, varied annotation procedures, and disparities in label quality. This is particularly evident in the emerging multi-modal learning paradigms, where dataset harmonization including a uniform data representation and filtering options are of paramount importance. Methods: DICOM structured reports enable the standardized linkage of arbitrary information beyond the imaging domain and can be used within Python deep learning pipelines with highdicom. Building on this, we developed an open platform for data integration and interactive filtering capabilities that simplifies the process of assembling multi-modal datasets. Results: In this study, we extend our prior work by showing its applicability to more and divergent data types, as well as streamlining datasets for federated training within an established consortium of eight university hospitals in Germany. We prove its concurrent filtering ability by creating harmonized multi-modal datasets across all locations for predicting the outcome after minimally invasive heart valve replacement. The data includes DICOM data (i.e. computed tomography images, electrocardiography scans) as well as annotations (i.e. calcification segmentations, pointsets and pacemaker dependency), and metadata (i.e. prosthesis and diagnoses). Conclusion: Structured reports bridge the traditional gap between imaging systems and information systems. Utilizing the inherent DICOM reference system, arbitrary data types can be queried concurrently to create meaningful cohorts for clinical studies. The graphical interface as well as example structured report templates will be made publicly available.

Submitted 12 July, 2024; originally announced July 2024.

arXiv:2407.09038 [pdf, other]

High-Resolution Hyperspectral Video Imaging Using A Hexagonal Camera Array

Authors: Frank Sippel, Jürgen Seiler, André Kaup

Abstract: Retrieving the reflectance spectrum from objects is an essential task for many classification and detection problems, since many materials and processes have a unique spectral behaviour. In many cases, it is highly desirable to capture hyperspectral images due to the high spectral flexibility. Often, it is even necessary to capture hyperspectral videos or at least to be able to record a hyperspectral image at once, also called snapshot hyperspectral imaging, to avoid spectral smearing. For this task, a high-resolution snapshot hyperspectral camera array using a hexagonal shape is introduced. The hexagonal array for hyperspectral imaging uses off-the-shelf hardware, which enables high flexibility regarding employed cameras, lenses and filters. Hence, the spectral range can be easily varied by mounting a different set of filters. Moreover, the concept of using off-the-shelf hardware enables low prices in comparison to other approaches with highly specialized hardware. Since classical industrial cameras are used in this hyperspectral camera array, the spatial and temporal resolution is very high, while recording 37 hyperspectral channels in the range from 400 nm to 760 nm in 10 nm steps. A registration process is required for near-field imaging, which maps the peripheral camera views to the center view. It is shown that this combination using a hyperspectral camera array and the corresponding image registration pipeline is superior in comparison to other popular snapshot approaches. For this evaluation, a synthetic hyperspectral database is rendered. On the synthetic data, the novel approach outperforms its best competitor by more than 3 dB in reconstruction quality. This synthetic data is also used to show the superiority of the hexagonal shape in comparison to an orthogonal-spaced one. Moreover, a real-world high resolution hyperspectral video database is provided.
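As a quick consistency check on the figures quoted in this abstract, the sketch below enumerates centre wavelengths from 400 nm to 760 nm in 10 nm steps. It assumes both endpoints are included, which the abstract does not state explicitly; the variable names are illustrative only.

```python
# Centre wavelengths implied by the abstract, assuming both endpoints
# (400 nm and 760 nm) are included and channels are 10 nm apart.
START_NM, STOP_NM, STEP_NM = 400, 760, 10

channels_nm = list(range(START_NM, STOP_NM + STEP_NM, STEP_NM))

print(len(channels_nm), channels_nm[0], channels_nm[-1])
```

Under that assumption the count agrees with the 37 channels stated in the abstract.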

Submitted 12 July, 2024; originally announced July 2024.

arXiv:2407.07838 [pdf]

In-plane staging in lithium-ion intercalation of bilayer graphene

Authors: Thomas Astles, James G. McHugh, Rui Zhang, Qian Guo, Madeleine Howe, Zefei Wu, Kornelia Indykiewicz, Alex Summerfield, Zachary A. H. Goodwin, Sergey Slizovskiy, Daniil Domaretskiy, Andre K. Geim, Vladimir Falko, Irina V. Grigorieva

Abstract: The ongoing efforts to optimize Li-ion batteries led to the interest in intercalation of nanoscale layered compounds, including bilayer graphene. Its lithium intercalation has been demonstrated recently but the mechanisms underpinning the storage capacity remain poorly understood. Here, using magnetotransport measurements, we report in-operando intercalation dynamics of bilayer graphene. Unexpectedly, we find four distinct intercalation stages that correspond to well-defined Li-ion densities. We refer to these stages as 'in-plane', with no in-plane analogues in bulk graphite. The fully intercalated bilayers represent a stoichiometric compound C14LiC14 with a Li density of 2.7x10^{14} cm^{-2}, notably lower than fully intercalated graphite. Combining the experimental findings and DFT calculations, we show that the critical step in bilayer intercalation is a transition from AB to AA stacking which occurs at a density of 0.9x10^{14} cm^{-2}. Our findings reveal the mechanism and limits for electrochemical intercalation of bilayer graphene and suggest possible avenues for increasing the Li storage capacity.
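The quoted Li density can be roughly cross-checked against the stoichiometry. The sketch below is not from the paper: it assumes a graphene lattice constant of 2.46 Å (a standard textbook value) and reads C14LiC14 as one Li ion per 28 carbon atoms, 14 in each of the two layers.

```python
import math

# Assumption: graphene lattice constant a = 2.46 angstrom = 2.46e-8 cm.
A_LATTICE_CM = 2.46e-8
# Assumption: C14LiC14 means one Li per 14 + 14 = 28 carbon atoms.
CARBONS_PER_LI = 28

# The hexagonal unit cell has area (sqrt(3)/2) * a^2 and holds 2 carbon atoms.
cell_area_cm2 = (math.sqrt(3) / 2) * A_LATTICE_CM ** 2
carbon_density_per_layer = 2 / cell_area_cm2  # atoms per cm^2 in one layer

# A bilayer has two layers of carbon sharing the intercalated Li.
li_density = 2 * carbon_density_per_layer / CARBONS_PER_LI

print(f"{li_density:.2e} cm^-2")
```

Under these assumptions the estimate lands close to the 2.7x10^{14} cm^{-2} stated in the abstract.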

Submitted 10 July, 2024; originally announced July 2024.

Comments: 30 pages, 17 figures

arXiv:2407.07726 [pdf, other]

PaliGemma: A versatile 3B VLM for transfer

Authors: Lucas Beyer, Andreas Steiner, André Susano Pinto, Alexander Kolesnikov, Xiao Wang, Daniel Salz, Maxim Neumann, Ibrahim Alabdulmohsin, Michael Tschannen, Emanuele Bugliarello, Thomas Unterthiner, Daniel Keysers, Skanda Koppula, Fangyu Liu, Adam Grycner, Alexey Gritsenko, Neil Houlsby, Manoj Kumar, Keran Rong, Julian Eisenschlos, Rishabh Kabra, Matthias Bauer, Matko Bošnjak, Xi Chen, Matthias Minderer, et al. (10 additional authors not shown)

Abstract: PaliGemma is an open Vision-Language Model (VLM) that is based on the SigLIP-So400m vision encoder and the Gemma-2B language model. It is trained to be a versatile and broadly knowledgeable base model that is effective to transfer. It achieves strong performance on a wide variety of open-world tasks. We evaluate PaliGemma on almost 40 diverse tasks including standard VLM benchmarks, but also more specialized tasks such as remote-sensing and segmentation.

Submitted 10 July, 2024; originally announced July 2024.

arXiv:2407.07557 [pdf, other]

Federated Foundation Model for Cardiac CT Imaging

Authors: Malte Tölle, Philipp Garthe, Clemens Scherer, Jan Moritz Seliger, Andreas Leha, Nina Krüger, Stefan Simm, Simon Martin, Sebastian Eble, Halvar Kelm, Moritz Bednorz, Florian André, Peter Bannas, Gerhard Diller, Norbert Frey, Stefan Groß, Anja Hennemuth, Lars Kaderali, Alexander Meyer, Eike Nagel, Stefan Orwat, Moritz Seiffert, Tim Friede, Tim Seidler, Sandy Engelhardt

Abstract: Federated learning (FL) is a renowned technique for utilizing decentralized data while preserving privacy. However, real-world applications often involve inherent challenges such as partially labeled datasets, where not all clients possess expert annotations of all labels of interest, leaving large portions of unlabeled data unused. In this study, we conduct the largest federated cardiac CT imaging analysis to date, focusing on partially labeled datasets ($n=8,124$) of Transcatheter Aortic Valve Implantation (TAVI) patients over eight hospital clients. Transformer architectures, which are the major building blocks of current foundation models, have shown superior performance when trained on larger cohorts than traditional CNNs. However, when trained on small task-specific labeled sample sizes, it is currently not feasible to exploit their underlying attention mechanism for improved performance. Therefore, we developed a two-stage semi-supervised learning strategy that distills knowledge from several task-specific CNNs (landmark detection and segmentation of calcification) into a single transformer model by utilizing large amounts of unlabeled data typically residing unused in hospitals to mitigate these issues. This method not only improves the predictive accuracy and generalizability of transformer-based architectures but also facilitates the simultaneous learning of all partial labels within a single transformer model across the federation. Additionally, we show that our transformer-based model extracts more meaningful features for further downstream tasks than the UNet-based one by only training the last layer to also solve segmentation of coronary arteries. We make the code and weights of the final model openly available, which can serve as a foundation model for further research in cardiac CT imaging.

Submitted 10 July, 2024; originally announced July 2024.

arXiv:2407.05900 [pdf, other]

SVT-AV1 Encoding Bitrate Estimation Using Motion Search Information

Authors: Lena Eichermüller, Gaurang Chaudhari, Ioannis Katsavounidis, Zhijun Lei, Hassene Tmar, Christian Herglotz, André Kaup

Abstract: Enabling high compression efficiency while keeping encoding energy consumption at a low level requires prioritization of which videos need more sophisticated encoding techniques. However, the effects vary highly based on the content, and information on how well a video can be compressed is required. This can be measured by estimating the encoded bitstream size prior to encoding. We identified that the errors between estimated motion vectors from Motion Search, an algorithm that predicts temporal changes in videos, correlate well with the encoded bitstream size. Combining Motion Search with Random Forests, the encoding bitrate can be estimated with a Pearson correlation of above 0.96.

Submitted 8 July, 2024; originally announced July 2024.

Comments: 5 pages, 4 figures, accepted for European Signal Processing Conference (EUSIPCO) 2024

arXiv:2407.05789 [pdf, other]

CANDID DAC: Leveraging Coupled Action Dimensions with Importance Differences in DAC

Authors: Philipp Bordne, M. Asif Hasan, Eddie Bergman, Noor Awad, André Biedenkapp

Abstract: High-dimensional action spaces remain a challenge for dynamic algorithm configuration (DAC). Interdependencies and varying importance between action dimensions are further known key characteristics of DAC problems. We argue that these Coupled Action Dimensions with Importance Differences (CANDID) represent aspects of the DAC problem that are not yet fully explored. To address this gap, we introduce a new white-box benchmark within the DACBench suite that simulates the properties of CANDID. Further, we propose sequential policies as an effective strategy for managing these properties. Such policies factorize the action space and mitigate exponential growth by learning a policy per action dimension. At the same time, these policies accommodate the interdependence of action dimensions by fostering implicit coordination. We show this in an experimental study of value-based policies on our new benchmark. This study demonstrates that sequential policies significantly outperform independent learning of factorized policies in CANDID action spaces. In addition, they overcome the scalability limitations associated with learning a single policy across all action dimensions. The code used for our experiments is available under https://github.com/PhilippBordne/candidDAC.

Submitted 8 July, 2024; originally announced July 2024.

Comments: 16 pages, 9 figures

arXiv:2407.05489 [pdf, other]

How Effective are State Space Models for Machine Translation?

Authors: Hugo Pitorro, Pavlo Vasylenko, Marcos Treviso, André F. T. Martins

Abstract: Transformers are the current architecture of choice for NLP, but their attention layers do not scale well to long contexts. Recent works propose to replace attention with linear recurrent layers -- this is the case for state space models, which enjoy efficient training and inference. However, it remains unclear whether these models are competitive with transformers in machine translation (MT). In this paper, we provide a rigorous and comprehensive experimental comparison between transformers and linear recurrent models for MT. Concretely, we experiment with RetNet, Mamba, and hybrid versions of Mamba which incorporate attention mechanisms. Our findings demonstrate that Mamba is highly competitive with transformers on sentence and paragraph-level datasets, where in the latter both models benefit from shifting the training distribution towards longer sequences. Further analysis shows that integrating attention into Mamba improves translation quality, robustness to sequence length extrapolation, and the ability to recall named entities.

Submitted 7 July, 2024; originally announced July 2024.

arXiv:2407.05340 [pdf, other]

Interpreting the Residual Stream of ResNet18

Authors: André Longon

Abstract: A mechanistic understanding of the computations learned by deep neural networks (DNNs) is far from complete. In the domain of visual object recognition, prior research has illuminated inner workings of InceptionV1, but DNNs with different architectures have remained largely unexplored. This work investigates ResNet18 with a particular focus on its residual stream, an architectural mechanism which InceptionV1 lacks. We observe that for a given block, channel features of the stream are updated along a spectrum: either the input feature skips to the output, the block feature overwrites the output, or the output is some mixture between the input and block features. Furthermore, we show that many residual stream channels compute scale invariant representations through a mixture of the input's smaller-scale feature with the block's larger-scale feature. This not only mounts evidence for the universality of scale equivariance, but also presents how the residual stream further implements scale invariance. Collectively, our results begin an interpretation of the residual stream in visual object recognition, finding it to be a flexible feature manager and a medium to build scale invariant representations.

Submitted 7 July, 2024; originally announced July 2024.

arXiv:2407.04289 [pdf, other]

Electronic Correlations in Multielectron Silicon Quantum Dots

Authors: Dylan H. Liang, MengKe Feng, Philip Y. Mai, Jesus D. Cifuentes, Andrew S. Dzurak, Andre Saraiva

Abstract: Silicon quantum computing has the potential to revolutionize technology with capabilities to solve real-life problems that are computationally complex or even intractable for modern computers [1] by offering sufficient high quality qubits to perform complex error-corrected calculations. Silicon metal-oxide-semiconductor based quantum dots present a promising pathway for realizing practical quantum computers. To improve certain qubit properties, it is a common strategy to incorporate multiple electrons in the same dot in order to form qubits in higher confined orbital states. Theoretical modelling is an essential part of understanding the quantum behaviour of these electrons, providing a basis for validating the physical working of device models as well as providing insights into experimental data. Hartree-Fock theory is an imperative tool for the electronic structure modelling of multi-electron quantum dots due to its ability to simulate a large number of electrons with manageable computation load. However, an efficient calculation of the self-consistent field becomes hard because dot formations in silicon are characterized by strong electron-electron interactions and conduction band valleys, besides the relatively high comparative effective mass, which add to create a behaviour dominated by repulsion between electrons rather than a well established shell structure. In this paper, we present a Hartree-Fock-based method that accounts for these complexities for the modelling of silicon quantum dots. With this method, we first establish the significance of including electron-electron interactions and valley degree of freedom and their implications. We then explore a simple case of anisotropic dots and observe the impact of anisotropy on dot formations.

Submitted 5 July, 2024; originally announced July 2024.

arXiv:2407.03652 [pdf, other]

Over the Edge of Chaos? Excess Complexity as a Roadblock to Artificial General Intelligence

Authors: Teo Susnjak, Timothy R. McIntosh, Andre L. C. Barczak, Napoleon H. Reyes, Tong Liu, Paul Watters, Malka N. Halgamuge

Abstract: In this study, we explored the progression trajectories of artificial intelligence (AI) systems through the lens of complexity theory. We challenged the conventional linear and exponential projections of AI advancement toward Artificial General Intelligence (AGI) underpinned by transformer-based architectures, and posited the existence of critical points, akin to phase transitions in complex systems, where AI performance might plateau or regress into instability upon exceeding a critical complexity threshold. We employed agent-based modelling (ABM) to simulate hypothetical scenarios of AI systems' evolution under specific assumptions, using benchmark performance as a proxy for capability and complexity. Our simulations demonstrated how increasing the complexity of the AI system could exceed an upper criticality threshold, leading to unpredictable performance behaviours. Additionally, we developed a practical methodology for detecting these critical thresholds using simulation data and stochastic gradient descent to fine-tune detection thresholds. This research offers a novel perspective on AI advancement that has a particular relevance to Large Language Models (LLMs), emphasising the need for a tempered approach to extrapolating AI's growth potential and underscoring the importance of developing more robust and comprehensive AI performance benchmarks.

Submitted 4 July, 2024; originally announced July 2024.

arXiv:2407.03532 [pdf, other]

Fast Radio Bursts and Artificial Neural Networks: a cosmological-model-independent estimation of the Hubble Constant

Authors: Jéferson A. S. Fortunato, David J. Bacon, Wiliam S. Hipólito-Ricaldi, David Wands

Abstract: Fast Radio Bursts (FRBs) have emerged as powerful cosmological probes in recent years offering valuable insights into cosmic expansion. These predominantly extragalactic transients encode information on the expansion of the Universe through their dispersion measure, reflecting interactions with the intervening medium along the line of sight. In this study, we introduce a novel method for reconstructing the late-time cosmic expansion rate and estimating the Hubble constant, solely derived from FRBs measurements coupled with their redshift information while employing Artificial Neural Networks. Our approach yields a Hubble constant estimate of $H_0 = 67.3\pm6.6\rm \ km \ s^{-1} \ Mpc^{-1}$. With a dataset comprising 23 localised data points, we demonstrate a precision of $\sim10\%$. However, our forecasts using simulated datasets indicate that in the future it could be possible to achieve precision comparable to the SH0ES collaboration or the Planck satellite. Our findings underscore the potential of FRBs as alternative, independent tools for probing cosmic dynamics.
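The quoted precision follows from the ratio of the uncertainty to the central value. A minimal sketch of that arithmetic, using only the two numbers given in the abstract (the variable names are illustrative):

```python
# Values quoted in the abstract, in km s^-1 Mpc^-1.
h0_estimate = 67.3
h0_uncertainty = 6.6

# Relative precision = uncertainty / central value.
relative_precision = h0_uncertainty / h0_estimate

print(f"{relative_precision:.1%}")
```

The ratio is a little under one tenth, consistent with the stated precision of roughly 10%.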

Submitted 3 July, 2024; originally announced July 2024.

arXiv:2407.01951 [pdf, other]

Spanner for the $0/1/\infty$ weighted region problem

Authors: Joachim Gudmundsson, Zi** Huang, André van Renssen, Sampson Wong

Abstract: We consider the problem of computing an approximate weighted shortest path in a weighted subdivision, with weights assigned from the set $\{0, 1, \infty\}$. We present a data structure $B$, which stores a set of convex, non-overlapping regions. These include zero-cost regions (0-regions) with a weight of $0$ and obstacles with a weight of $\infty$, all embedded in a plane with a weight of $1$. The data structure $B$ can be constructed in expected time $O(N + (n/\varepsilon^3)(\log(n/\varepsilon) + \log N))$, where $n$ is the total number of regions, $N$ represents the total complexity of the regions, and $1 + \varepsilon$ is the approximation factor, for any $0 < \varepsilon < 1$. Using $B$, one can compute an approximate weighted shortest path from any point $s$ to any point $t$ in $O(N + n/\varepsilon^3 + (n/\varepsilon^2) \log(n/\varepsilon) + (\log N)/\varepsilon)$ time. In the special case where the 0-regions and obstacles are polygons (not necessarily convex), $B$ contains a $(1 + \varepsilon)$-spanner of the input vertices.

Submitted 2 July, 2024; originally announced July 2024.

arXiv:2407.01764 [pdf, other]

Object Proxy Patterns for Accelerating Distributed Applications

Authors: J. Gregory Pauloski, Valerie Hayot-Sasson, Logan Ward, Alexander Brace, André Bauer, Kyle Chard, Ian Foster

Abstract: Workflow and serverless frameworks have empowered new approaches to distributed application design by abstracting compute resources. However, their typically limited or one-size-fits-all support for advanced data flow patterns leaves optimization to the application programmer -- optimization that becomes more difficult as data become larger. The transparent object proxy, which provides wide-area references that can resolve to data regardless of location, has been demonstrated as an effective low-level building block in such situations. Here we propose three high-level proxy-based programming patterns -- distributed futures, streaming, and ownership -- that make the power of the proxy pattern usable for more complex and dynamic distributed program structures. We motivate these patterns via careful review of application requirements and describe implementations of each pattern. We evaluate our implementations through a suite of benchmarks and by applying them in three substantial scientific applications, in which we demonstrate substantial improvements in runtime, throughput, and memory usage.