-
Towards Universal Performance Modeling for Machine Learning Training on Multi-GPU Platforms
Authors:
Zhongyi Lin,
Ning Sun,
Pallab Bhattacharya,
Xizhou Feng,
Louis Feng,
John D. Owens
Abstract:
Characterizing and predicting the training performance of modern machine learning (ML) workloads on compute systems with compute and communication spread between CPUs, GPUs, and network devices is not only the key to optimization and planning but also a complex goal to achieve. The primary challenges include the complexity of synchronization and load balancing between CPUs and GPUs, the variance in input data distribution, and the use of different communication devices and topologies (e.g., NVLink, PCIe, network cards) that connect multiple compute devices, coupled with the desire for flexible training configurations. Built on top of our prior work for single-GPU platforms, we address these challenges and enable multi-GPU performance modeling by incorporating (1) data-distribution-aware performance models for embedding table lookup, and (2) data movement prediction of communication collectives, into our upgraded performance modeling pipeline equipped with inter- and intra-rank synchronization for ML workloads trained on multi-GPU platforms. Beyond accurately predicting the per-iteration training time of DLRM models with random configurations with a geomean error of 5.21% on two multi-GPU platforms, our prediction pipeline generalizes well to other types of ML workloads, such as Transformer-based NLP models with a geomean error of 3.00%. Moreover, even without actually running ML workloads like DLRMs on the hardware, it is capable of generating insights such as quickly selecting the fastest embedding table sharding configuration (with a success rate of 85%).
Submitted 27 April, 2024; v1 submitted 19 April, 2024;
originally announced April 2024.
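The "geomean error" figures quoted in the abstract above can be read as a geometric mean of per-configuration relative errors. The sketch below shows that arithmetic under that assumption; the function names and the timing values are hypothetical and are not taken from the paper.

```python
import math


def relative_errors(predicted, measured):
    """Per-configuration relative error |predicted - measured| / measured."""
    return [abs(p - m) / m for p, m in zip(predicted, measured)]


def geomean(values):
    """Geometric mean of strictly positive values, computed in log space."""
    if not values or any(v <= 0 for v in values):
        raise ValueError("geomean needs a non-empty list of positive values")
    return math.exp(sum(math.log(v) for v in values) / len(values))


if __name__ == "__main__":
    # Made-up per-iteration times in milliseconds (illustration only).
    predicted_ms = [102.0, 46.0, 210.0]
    measured_ms = [100.0, 50.0, 200.0]
    errors = relative_errors(predicted_ms, measured_ms)
    print(f"geomean relative error: {geomean(errors):.2%}")
```

A geometric mean is less dominated by a single badly predicted configuration than an arithmetic mean, which may be why it is a common choice for summarizing errors across many random configurations.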
-
The EDGE Language: Extended General Einsums for Graph Algorithms
Authors:
Toluwanimi O. Odemuyiwa,
Joel S. Emer,
John D. Owens
Abstract:
In this work, we propose a unified abstraction for graph algorithms: the Extended General Einsums language, or EDGE. The EDGE language expresses graph algorithms in the language of tensor algebra, providing a rigorous, succinct, and expressive mathematical framework. EDGE leverages two ideas: (1) the well-known foundations provided by the graph-matrix duality, where a graph is simply a 2D tensor, and (2) the power and expressivity of Einsum notation in the tensor algebra world. In this work, we describe our design goals for EDGE and walk through the extensions we add to Einsums to support more complex operations common in graph algorithms. Additionally, we provide a few examples of how to express graph algorithms in our proposed notation. We hope that a single, mathematical notation for graph algorithms will (1) allow researchers to more easily compare different algorithms and different implementations of a graph algorithm; (2) enable developers to factor complexity by separating the concerns of what to compute (described with the extended Einsum notation) from the lower level details of how to compute; and (3) enable the discovery of different algorithmic variants of a problem through algebraic manipulations and transformations on a given EDGE expression.
Submitted 17 April, 2024;
originally announced April 2024.
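The graph-matrix duality mentioned in the abstract above can be illustrated with a generalized contraction over an adjacency matrix. The sketch below is plain Python, not EDGE notation: the helper `contract` and its parameters are hypothetical, and the two operator pairs (OR/AND and min/plus) are standard semiring choices used here only as an illustration.

```python
import math


def contract(vec, adj, mul, add, zero):
    """Compute out[v] = add over u of mul(vec[u], adj[u][v]).

    With ordinary (*, +) this is a vector-matrix product; swapping the
    operators yields graph primitives on the same adjacency structure.
    """
    n = len(adj[0])
    out = [zero] * n
    for u, x in enumerate(vec):
        for v in range(n):
            out[v] = add(out[v], mul(x, adj[u][v]))
    return out


# Directed graph with edges 0->1, 1->2, stored as a 2D tensor.
adjacency = [[0, 1, 0],
             [0, 0, 1],
             [0, 0, 0]]

# One BFS frontier expansion: (AND, OR) in place of (*, +).
frontier = [True, False, False]
next_frontier = contract(frontier, adjacency,
                         mul=lambda a, b: a and bool(b),
                         add=lambda a, b: a or b,
                         zero=False)

# One shortest-path relaxation: (+, min) with inf meaning "no edge".
inf = math.inf
weights = [[inf, 2.0, 5.0],
           [inf, inf, 1.0],
           [inf, inf, inf]]
dist = [0.0, inf, inf]
relaxed = contract(dist, weights, mul=lambda a, b: a + b, add=min, zero=inf)
```

Here the "what to compute" is the index pattern `out[v] <- vec[u], adj[u][v]`, while the loop order and data layout inside `contract` are one of many possible "how to compute" choices.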
-
A Multi-Model Ensemble System for the outer Heliosphere (MMESH): Solar Wind Conditions near Jupiter
Authors:
M. J. Rutala,
C. M. Jackman,
M. J. Owens,
C. Tao,
A. R. Fogg,
S. A. Murray,
L. Barnard
Abstract:
How the solar wind influences the magnetospheres of the outer planets is a fundamentally important question, but is difficult to answer in the absence of consistent, simultaneous monitoring of the upstream solar wind and the large-scale dynamics internal to the magnetosphere. To compensate for the relative lack of in-situ data, propagation models are often used to estimate the ambient solar wind conditions at the outer planets for comparison to remote observations or in-situ measurements. This introduces another complication: the propagation of near-Earth solar wind measurements introduces difficult-to-assess uncertainties. Here, we present the Multi-Model Ensemble System for the outer Heliosphere (MMESH) to begin to address these issues, along with the resultant multi-model ensemble (MME) of the solar wind conditions near Jupiter. MMESH accepts as input any number of solar wind models together with contemporaneous in-situ spacecraft data. From these, the system characterizes typical uncertainties in model timing, quantifies how these uncertainties vary under different conditions, attempts to correct for systematic biases in the input model timing, and composes a MME with uncertainties from the results. For the case of the Jupiter-MME presented here, three solar wind propagation models were compared to in-situ measurements from the near-Jupiter spacecraft Ulysses and Juno which span diverse geometries and phases of the solar cycle, amounting to more than 14,000 hours of data over 2.5 decades. The MME gives the most-probable near-Jupiter solar wind conditions for times within the tested epoch, outperforming the input models and returning quantified estimates of uncertainty.
Submitted 29 February, 2024;
originally announced February 2024.
-
On the Origin of the sudden Heliospheric Open Magnetic Flux Enhancement during the 2014 Pole Reversal
Authors:
Stephan G. Heinemann,
Mathew J. Owens,
Manuela Temmer,
James A. Turtle,
Charles N. Arge,
Carl J. Henney,
Jens Pomoell,
Eleanna Asvestari,
Jon A. Linker,
Cooper Downs,
Ronald M. Caplan,
Stefan J. Hofmeister,
Camilla Scolini,
Rui F. Pinto,
Maria S. Madjarska
Abstract:
Coronal holes are recognized as the primary sources of heliospheric open magnetic flux (OMF). However, a noticeable gap exists between in-situ measured OMF and that derived from remote sensing observations of the Sun. In this study, we investigate the OMF evolution and its connection to solar structures throughout 2014, with special emphasis on the period from September to October, where a sudden and significant OMF increase was reported. By deriving the OMF evolution at 1au, modeling it at the source surface, and analyzing solar photospheric data, we provide a comprehensive analysis of the observed phenomenon. First, we establish a strong correlation between the OMF increase and the solar magnetic field derived from a Potential Field Source Surface (PFSS) model ($cc_{\mathrm{Pearson}}=0.94$). Moreover, we find a good correlation between the OMF and the open flux derived from solar coronal holes ($cc_{\mathrm{Pearson}}=0.88$), although the coronal holes only contain $14-32\%$ of the Sun's total open flux. However, we note that while the OMF evolution correlates with coronal hole open flux, there is no correlation with the coronal hole area evolution ($cc_{\mathrm{Pearson}}=0.0$). The temporal increase in OMF correlates with the vanishing remnant magnetic field at the southern pole, caused by poleward flux circulations from the decay of numerous active regions months earlier. Additionally, our analysis suggests a potential link between the OMF enhancement and the concurrent emergence of the largest active region in solar cycle 24. In conclusion, our study provides insights into the strong increase in OMF observed during September to October 2014.
Submitted 20 February, 2024;
originally announced February 2024.
-
Tracking solar radio bursts using Bayesian multilateration
Authors:
L. A. Cañizares,
S. T. Badman,
S. A. Maloney,
M. J. Owens,
D. M. Weigt,
E. P. Carley,
P. T. Gallagher
Abstract:
Solar radio bursts (SRBs) are emitted by electrons propagating through the corona and interplanetary space. Tracking such bursts is key to understanding the properties of accelerated electrons and radio wave propagation as well as the local plasma environment that they propagate through. Here, we present a novel multilateration algorithm called BayEsian LocaLisation Algorithm (BELLA). In addition, apparent SRB positions from BELLA are compared with comparable localisation methods and the predictions of solar wind models. BELLA uses Bayesian inference to create probabilistic distributions of source positions and their uncertainties. This facilitates the estimation of algorithmic, instrumental, and physical uncertainties in a quantitative manner. We validated BELLA using simulations and a Type III SRB observed by STEREO A/B and Wind. BELLA tracked the Type III source from $\sim$ 10--150 $R_{sun}$ (2--0.15 MHz) along a spiral trajectory. This allowed for an estimate of an apparent solar wind speed of $v_{sw} \sim$ 400 km s$^{-1}$ and a source longitude of $\phi_0 \sim 30^\circ$. We compared these results with well-established methods of positioning: Goniopolarimetric (GP), analytical time-difference-of-arrival (TDOA), and Solar radio burst Electron Motion Tracker (SEMP). We found them to be in agreement with the results obtained by BELLA. Additionally, the results aligned with solar wind properties assimilated by the Heliospheric Upwind Extrapolation with time dependence (HUXt) model. We have validated BELLA and used it to identify apparent source positions as well as velocities and densities of the solar wind. Furthermore, we identified higher than expected electron densities, suggesting that the true emission sources were at lower altitudes than those identified by BELLA, an effect that may be due to appreciable scattering of electromagnetic waves by electrons in interplanetary space.
Submitted 13 February, 2024;
originally announced February 2024.
-
Coronal Models and Detection of Open Magnetic Field
Authors:
Eleanna Asvestari,
Manuela Temmer,
Ronald M. Caplan,
Jon A. Linker,
Stephan G. Heinemann,
Rui F. Pinto,
Carl J. Henney,
Charles N. Arge,
Mathew J. Owens,
Maria S. Madjarska,
Jens Pomoell,
Stefan J. Hofmeister,
Camilla Scolini,
Evangelia Samara
Abstract:
A plethora of coronal models, from empirical to more complex magnetohydrodynamic (MHD) ones, are being used for reconstructing the coronal magnetic field topology and estimating the open magnetic flux. However, no individual solution fully agrees with coronal hole observations and in situ measurements of open flux at 1~AU, as there is a strong deficit between model and observations contributing to the known problem of the missing open flux. In this paper we investigate the possible origin of the discrepancy between modeled and observed magnetic field topology by assessing the effect on the simulation output of the choice of the input boundary conditions and the simulation setup, including the choice of numerical schemes and the parameter initialization. In the frame of this work, we considered four potential field source surface based models and one fully MHD model, different types of global magnetic field maps and model initiation parameters. After assessing the model outputs using a variety of metrics, we conclude that they are highly comparable regardless of the differences set at initiation. When comparing all models to coronal hole boundaries extracted from extreme ultraviolet (EUV) filtergrams we find that they do not compare well. This mismatch between observed and modeled regions of open field is a candidate contributing to the open flux problem.
Submitted 7 November, 2023;
originally announced November 2023.
-
Phonon engineering of atomic-scale defects in superconducting quantum circuits
Authors:
Mo Chen,
John Clai Owens,
Harald Putterman,
Max Schäfer,
Oskar Painter
Abstract:
Noise within solid-state systems at low temperatures, where many of the degrees of freedom of the host material are frozen out, can typically be traced back to material defects that support low-energy excitations. These defects can take a wide variety of microscopic forms, and for amorphous materials are broadly described using generic models such as the tunneling two-level systems (TLS) model. Although the details of TLS, and their impact on the low-temperature behavior of materials have been studied since the 1970s, these states have recently taken on further relevance in the field of quantum computing, where the limits to the coherence of superconducting microwave quantum circuits are dominated by TLS. Efforts to mitigate the impact of TLS have thus far focused on circuit design, material selection, and material surface treatment. In this work, we take a new approach that seeks to directly modify the properties of TLS through nanoscale-engineering. This is achieved by periodically structuring the host material, forming an acoustic bandgap that suppresses all microwave-frequency phonons in a GHz-wide frequency band around the operating frequency of a transmon qubit superconducting quantum circuit. For embedded TLS that are strongly coupled to the electric qubit, we measure a pronounced increase in relaxation time by two orders of magnitude when the TLS transition frequency lies within the acoustic bandgap, with the longest $T_1$ time exceeding $5$ milliseconds. Our work paves the way for in-depth investigation and coherent control of TLS, which is essential for deepening our understanding of noise in amorphous materials and advancing solid-state quantum devices.
Submitted 5 October, 2023;
originally announced October 2023.
-
The Sparsity Roofline: Understanding the Hardware Limits of Sparse Neural Networks
Authors:
Cameron Shinn,
Collin McCarthy,
Saurav Muralidharan,
Muhammad Osama,
John D. Owens
Abstract:
We introduce the Sparsity Roofline, a visual performance model for evaluating sparsity in neural networks. The Sparsity Roofline jointly models network accuracy, sparsity, and theoretical inference speedup. Our approach does not require implementing and benchmarking optimized kernels, and the theoretical speedup becomes equal to the actual speedup when the corresponding dense and sparse kernels are well-optimized. We achieve this through a novel analytical model for predicting sparse network performance, and validate the predicted speedup using several real-world computer vision architectures pruned across a range of sparsity patterns and degrees. We demonstrate the utility and ease-of-use of our model through two case studies: (1) we show how machine learning researchers can predict the performance of unimplemented or unoptimized block-structured sparsity patterns, and (2) we show how hardware designers can predict the performance implications of new sparsity patterns and sparse data formats in hardware. In both scenarios, the Sparsity Roofline helps performance experts identify sparsity regimes with the highest performance potential.
Submitted 6 November, 2023; v1 submitted 30 September, 2023;
originally announced October 2023.
-
Extraction of the neutron F2 structure function from inclusive proton and deuteron deep-inelastic scattering data
Authors:
Shujie Li,
Alberto Accardi,
Ishara. P. Fernando,
Cynthia E. Keppel,
Wally Melnitchouk,
Peter Monaghan,
Gabriel Niculescu,
Maria I. Niculescu,
Jeff. F. Owens
Abstract:
The available world deep-inelastic scattering data on proton and deuteron structure functions F2p, F2d, and their ratios, are leveraged to extract the free neutron F2n structure function, the F2n/F2p ratio, and associated uncertainties using the latest nuclear effect calculations in the deuteron. Special attention is devoted to the normalization of the proton and deuteron experimental datasets and to the treatment of correlated systematic errors, as well as the quantification of procedural and theoretical uncertainties. The extracted F2n dataset is utilized to evaluate the Q2 dependence of the Gottfried sum rule and the nonsinglet F2p - F2n moments. To facilitate replication of our study, as well as for general applications, a comprehensive DIS database including all recent JLab 6 GeV measurements, the extracted F2n, a modified CTEQ-JLab global PDF fit named CJ15nlo_mod, and grids with calculated proton, neutron and deuteron DIS structure functions at next-to-leading order, are discussed and made publicly available.
Submitted 22 October, 2023; v1 submitted 28 September, 2023;
originally announced September 2023.
-
CME Propagation Through the Heliosphere: Status and Future of Observations and Model Development
Authors:
M. Temmer,
C. Scolini,
I. G. Richardson,
S. G. Heinemann,
E. Paouris,
A. Vourlidas,
M. M. Bisi,
writing teams,
:,
N. Al-Haddad,
T. Amerstorfer,
L. Barnard,
D. Buresova,
S. J. Hofmeister,
K. Iwai,
B. V. Jackson,
R. Jarolim,
L. K. Jian,
J. A. Linker,
N. Lugaz,
P. K. Manoharan,
M. L. Mays,
W. Mishra,
M. J. Owens,
E. Palmerio
, et al. (9 additional authors not shown)
Abstract:
The ISWAT clusters H1+H2 have a focus on interplanetary space and its characteristics, especially on the large-scale co-rotating and transient structures impacting Earth. SIRs, generated by the interaction between high-speed solar wind originating in large-scale open coronal magnetic fields and slower solar wind from closed magnetic fields, are regions of compressed plasma and magnetic field followed by high-speed streams that recur at the ca. 27 day solar rotation period. Short-term reconfigurations of the lower coronal magnetic field generate flare emissions and provide the energy to accelerate enormous amounts of magnetised plasma and particles in the form of CMEs into interplanetary space. The dynamic interplay between these phenomena changes the configuration of interplanetary space on various temporal and spatial scales which in turn influences the propagation of individual structures. While considerable efforts have been made to model the solar wind, we outline the limitations arising from the rather large uncertainties in parameters inferred from observations that make reliable predictions of the structures impacting Earth difficult. Moreover, the increased complexity of interplanetary space as solar activity rises in cycle 25 is likely to pose a challenge to these models. Combining observational and modeling expertise will extend our knowledge of the relationship between these different phenomena and the underlying physical processes, leading to improved models and scientific understanding and more-reliable space-weather forecasting. The current paper summarizes the efforts and progress achieved in recent years, identifies open questions, and gives an outlook for the next 5-10 years. It acts as basis for updating the existing COSPAR roadmap by Schrijver+ (2015), as well as providing a useful and practical guide for peer-users and the next generation of space weather scientists.
Submitted 9 August, 2023;
originally announced August 2023.
-
Demonstrating a long-coherence dual-rail erasure qubit using tunable transmons
Authors:
Harry Levine,
Arbel Haim,
Jimmy S. C. Hung,
Nasser Alidoust,
Mahmoud Kalaee,
Laura DeLorenzo,
E. Alex Wollack,
Patricio Arrangoiz-Arriola,
Amirhossein Khalajhedayati,
Rohan Sanil,
Hesam Moradinejad,
Yotam Vaknin,
Aleksander Kubica,
David Hover,
Shahriar Aghaeimeibodi,
Joshua Ari Alcid,
Christopher Baek,
James Barnett,
Kaustubh Bawdekar,
Przemyslaw Bienias,
Hugh Carson,
Cliff Chen,
Li Chen,
Harut Chinkezian,
Eric M. Chisholm
, et al. (88 additional authors not shown)
Abstract:
Quantum error correction with erasure qubits promises significant advantages over standard error correction due to favorable thresholds for erasure errors. To realize this advantage in practice requires a qubit for which nearly all errors are such erasure errors, and the ability to check for erasure errors without dephasing the qubit. We demonstrate that a "dual-rail qubit" consisting of a pair of resonantly coupled transmons can form a highly coherent erasure qubit, where transmon $T_1$ errors are converted into erasure errors and residual dephasing is strongly suppressed, leading to millisecond-scale coherence within the qubit subspace. We show that single-qubit gates are limited primarily by erasure errors, with erasure probability $p_\text{erasure} = 2.19(2)\times 10^{-3}$ per gate while the residual errors are $\sim 40$ times lower. We further demonstrate mid-circuit detection of erasure errors while introducing $< 0.1\%$ dephasing error per check. Finally, we show that the suppression of transmon noise allows this dual-rail qubit to preserve high coherence over a broad tunable operating range, offering an improved capacity to avoid frequency collisions. This work establishes transmon-based dual-rail qubits as an attractive building block for hardware-efficient quantum error correction.
Submitted 20 March, 2024; v1 submitted 17 July, 2023;
originally announced July 2023.
-
BOBA: A Parallel Lightweight Graph Reordering Algorithm with Heavyweight Implications
Authors:
Matthew Drescher,
Muhammad A. Awad,
Serban D. Porumbescu,
John D. Owens
Abstract:
We describe a simple parallel-friendly lightweight graph reordering algorithm for COO graphs (edge lists). Our "Batched Order By Attachment" (BOBA) algorithm is linear in the number of edges in terms of reads and linear in the number of vertices for writes through to main memory. It is highly parallelizable on GPUs. We show that, compared to a randomized baseline, the ordering produced gives improved locality of reference in sparse matrix-vector multiplication (SpMV) as well as other graph algorithms. Moreover, it can substantially speed up the conversion from a COO representation to the compressed format CSR, a very common workflow. Thus, it can give end-to-end speedups even in SpMV. Unlike other lightweight approaches, this reordering does not rely on explicitly knowing the degrees of the vertices, and indeed its runtime is comparable to that of computing degrees. Instead, it uses the structure and edge distribution inherent in the input edge list, making it a candidate for default use in a pragmatic graph creation pipeline. This algorithm is suitable for road-type networks as well as scale-free. It improves cache locality on both CPUs and GPUs, achieving hit rates similar to the heavyweight techniques (e.g., for SpMV, 7--52% and 11--67% in the L1 and L2 caches, respectively). Compared to randomly labeled graphs, BOBA-reordered graphs achieve end-to-end speedups of up to 3.45. The reordering time is approximately one order of magnitude faster than existing lightweight techniques and up to 2.5 orders of magnitude faster than heavyweight techniques.
Submitted 21 June, 2023; v1 submitted 17 June, 2023;
originally announced June 2023.
-
Light quark and antiquark constraints from new electroweak data
Authors:
Alberto Accardi,
Xiaoxian Jing,
Joseph Francis Owens,
Sanghwa Park
Abstract:
We present a new parton distribution function analysis which includes new data for W boson production in proton-proton collisions and lepton pair production in proton-proton and proton-deuteron collisions. The new data provide strong constraints on the light antiquark parton distribution functions in the proton. We identify an interesting correlation between the $d/u$ ratio and the $\bar{d}/\bar{u}$ ratio which leads to a modification of our previous results for the $d/u$ ratio as the parton momentum fraction $x \rightarrow 1.$
Submitted 20 March, 2023;
originally announced March 2023.
-
Target mass corrections in lepton--nucleus DIS: theory and applications to nuclear PDFs
Authors:
R. Ruiz,
K. F. Muzakka,
C. Leger,
P. Risse,
A. Accardi,
P. Duwentäster,
T. J. Hobbs,
T. Ježo,
C. Keppel,
M. Klasen,
K. Kovařík,
A. Kusina,
J. G. Morfín,
F. I. Olness,
J. F. Owens,
I. Schienbein,
J. Y. Yu
Abstract:
Motivated by the wide range of kinematics covered by current and planned deep-inelastic scattering (DIS) facilities, we revisit the formalism, practical implementation, and numerical impact of target mass corrections (TMCs) for DIS on unpolarized nuclear targets. An important aspect is that we only use nuclear and later partonic degrees of freedom, carefully avoiding a picture of the nucleus in terms of nucleons. After establishing that formulae used for individual nucleon targets $(p,n)$, derived in the Operator Product Expansion (OPE) formalism, are indeed applicable to nuclear targets, we rewrite expressions for nuclear TMCs in terms of re-scaled (or averaged) kinematic variables. As a consequence, we find a representation for nuclear TMCs that is approximately independent of the nuclear target. We go on to construct a single-parameter fit for all nuclear targets that is in good numerical agreement with full computations of TMCs. We discuss in detail qualitative and quantitative differences between nuclear TMCs built in the OPE and the parton model formalisms, as well as give numerical predictions for current and future facilities.
Submitted 12 March, 2024; v1 submitted 18 January, 2023;
originally announced January 2023.
-
A Programming Model for GPU Load Balancing
Authors:
Muhammad Osama,
Serban D. Porumbescu,
John D. Owens
Abstract:
We propose a GPU fine-grained load-balancing abstraction that decouples load balancing from work processing and aims to support both static and dynamic schedules with a programmable interface to implement new load-balancing schedules. Prior to our work, the only way to unleash the GPU's potential on irregular problems has been to workload-balance through application-specific, tightly coupled load-balancing techniques. With our open-source framework for load-balancing, we hope to improve programmers' productivity when developing irregular-parallel algorithms on the GPU, and also improve the overall performance characteristics for such applications by allowing a quick path to experimentation with a variety of existing load-balancing techniques. Consequently, we also hope that by separating the concerns of load-balancing from work processing within our abstraction, managing and extending existing code to future architectures becomes easier.
Submitted 11 January, 2023;
originally announced January 2023.
-
Stream-K: Work-centric Parallel Decomposition for Dense Matrix-Matrix Multiplication on the GPU
Authors:
Muhammad Osama,
Duane Merrill,
Cris Cecka,
Michael Garland,
John D. Owens
Abstract:
We introduce Stream-K, a work-centric parallelization of matrix multiplication (GEMM) and related computations in dense linear algebra. Whereas contemporary decompositions are primarily tile-based, our method operates by partitioning an even share of the aggregate inner loop iterations among physical processing elements. This provides a near-perfect utilization of computing resources, regardless of how efficiently the output tiling for any given problem quantizes across the underlying processing elements.
On GPU processors, our Stream-K parallelization of GEMM produces a peak speedup of up to 14$\times$ and 6.7$\times$, and an average performance response that is both higher and more consistent across 32,824 GEMM problem geometries than state-of-the-art math libraries such as CUTLASS and cuBLAS. Furthermore, we achieve this performance from a single tile size configuration per floating-point precision, whereas today's math libraries employ complex kernel-selection heuristics to select from a large ensemble of kernel variants.
Submitted 9 January, 2023;
originally announced January 2023.
-
Detection of Active Emergency Vehicles using Per-Frame CNNs and Output Smoothing
Authors:
Meng Fan,
Craig Bidstrup,
Zhaoen Su,
Jason Owens,
Gary Yang,
Nemanja Djuric
Abstract:
While inferring common actor states (such as position or velocity) is an important and well-explored task of the perception system aboard a self-driving vehicle (SDV), it may not always provide sufficient information to the SDV. This is especially true in the case of active emergency vehicles (EVs), where light-based signals also need to be captured to provide a full context. We consider this problem and propose a sequential methodology for the detection of active EVs, using an off-the-shelf CNN model operating at a frame level and a downstream smoother that accounts for the temporal aspect of flashing EV lights. We also explore model improvements through data augmentation and training with additional hard samples.
Submitted 27 December, 2022;
originally announced December 2022.
-
Essentials of Parallel Graph Analytics
Authors:
Muhammad Osama,
Serban D. Porumbescu,
John D. Owens
Abstract:
We identify the graph data structure, frontiers, operators, an iterative loop structure, and convergence conditions as essential components of graph analytics systems based on the native-graph approach. Using these essential components, we propose an abstraction that captures all the significant programming models within graph analytics, such as bulk-synchronous, asynchronous, shared-memory, message-passing, and push vs. pull traversals. Finally, we demonstrate the power of our abstraction with an elegant modern C++ implementation of single-source shortest path and its required components.
Submitted 15 December, 2022;
originally announced December 2022.
-
Modelling Cosmic Radiation Events in the Tree-ring Radiocarbon Record
Authors:
Qingyuan Zhang,
Utkarsh Sharma,
Jordan A. Dennis,
Andrea Scifo,
Margot Kuitems,
Ulf Buentgen,
Mathew J. Owens,
Michael W. Dee,
Benjamin J. S. Pope
Abstract:
Annually-resolved measurements of the radiocarbon content in tree-rings have revealed rare sharp rises in carbon-14 production. These 'Miyake events' are likely produced by rare increases in cosmic radiation from the Sun or other energetic astrophysical sources. The radiocarbon produced is not only circulated through the Earth's atmosphere and oceans, but also absorbed by the biosphere and locked in the annual growth rings of trees. To interpret high-resolution tree-ring radiocarbon measurements therefore necessitates modelling the entire global carbon cycle. Here, we introduce 'ticktack', the first open-source Python package that connects box models of the carbon cycle with modern Bayesian inference tools. We use this to analyse all public annual 14C tree data, and infer posterior parameters for all six known Miyake events. They do not show a consistent relationship to the solar cycle, and several display extended durations that challenge either astrophysical or geophysical models.
Submitted 25 October, 2022;
originally announced October 2022.
-
Neutrino Scattering Measurements on Hydrogen and Deuterium: A Snowmass White Paper
Authors:
Luis Alvarez-Ruso,
Joshua L. Barrow,
Leo Bellantoni,
Minerba Betancourt,
Alan Bross,
Linda Cremonesi,
Kirsty Duffy,
Steven Dytman,
Laura Fields,
Tsutomu Fukuda,
Diego González-Díaz,
Mikhail Gorchtein,
Richard J. Hill,
Thomas Junk,
Dustin Keller,
Huey-Wen Lin,
Xianguo Lu,
Kendall Mahn,
Aaron S. Meyer,
Tanaz Mohayai,
Jorge G. Morfín,
Joseph Owens,
Jonathan Paley,
Vishvas Pandey,
Gil Paz
, et al. (8 additional authors not shown)
Abstract:
Neutrino interaction uncertainties are a limiting factor in current and next-generation experiments probing the fundamental physics of neutrinos, a unique window on physics beyond the Standard Model. Neutrino-nucleon scattering amplitudes are an important part of the neutrino interaction program. However, since all modern neutrino detectors are composed primarily of heavy nuclei, knowledge of elementary neutrino-nucleon amplitudes relies heavily on experiments performed in the 1970s and 1980s, whose statistical and systematic precision are insufficient for current needs. In this white paper, we outline the motivation for attempting measurements on hydrogen and deuterium that would improve this knowledge, and we discuss options for making these measurements either with the DUNE near detector or with a dedicated facility.
Submitted 1 June, 2022; v1 submitted 21 March, 2022;
originally announced March 2022.
-
Building a Performance Model for Deep Learning Recommendation Model Training on GPUs
Authors:
Zhongyi Lin,
Louis Feng,
Ehsan K. Ardestani,
Jaewon Lee,
John Lundell,
Changkyu Kim,
Arun Kejariwal,
John D. Owens
Abstract:
We devise a performance model for GPU training of Deep Learning Recommendation Models (DLRM), whose GPU utilization is low compared to other well-optimized CV and NLP models. We show that both the device active time (the sum of kernel runtimes) and the device idle time are important components of the overall device time. We therefore tackle them separately by (1) flexibly adopting heuristic-based and ML-based kernel performance models for operators that dominate the device active time, and (2) categorizing operator overheads into five types to determine quantitatively their contribution to the device active time. Combining these two parts, we propose a critical-path-based algorithm to predict the per-batch training time of DLRM by traversing its execution graph. We achieve less than 10% geometric mean average error (GMAE) in all kernel performance modeling, and 4.61% and 7.96% geomean errors for GPU active time and overall E2E per-batch training time prediction with overheads from individual workloads, respectively. A slight increase of 2.19% incurred in E2E prediction error with shared overheads across workloads suggests the feasibility of using shared overheads in large-scale prediction. We show that our general performance model not only achieves low prediction error on DLRM, which has highly customized configurations and is dominated by multiple factors, but also yields comparable accuracy on other compute-bound ML models targeted by most previous methods. Using this performance model and graph-level data and task dependency analysis, we show our system can provide more general model-system co-design than previous methods.
Submitted 16 November, 2022; v1 submitted 19 January, 2022;
originally announced January 2022.
-
Atos: A Task-Parallel GPU Dynamic Scheduling Framework for Dynamic Irregular Computations
Authors:
Yuxin Chen,
Benjamin Brock,
Serban Porumbescu,
Aydın Buluç,
Katherine Yelick,
John D. Owens
Abstract:
We present Atos, a task-parallel GPU dynamic scheduling framework that is especially suited to dynamic irregular applications. Compared to the dominant Bulk Synchronous Parallel (BSP) frameworks, Atos exposes additional concurrency by supporting task-parallel formulations of applications with relaxed dependencies, achieving higher GPU utilization, which is particularly significant for problems with concurrency bottlenecks. Atos also offers implicit task-parallel load balancing in addition to data-parallel load balancing, providing users the flexibility to balance between them to achieve optimal performance. Finally, Atos allows users to adapt to different use cases by controlling the kernel strategy and task-parallel granularity. We demonstrate that each of these controls is important in practice. We evaluate and analyze the performance of Atos vs. BSP on three applications: breadth-first search, PageRank, and graph coloring. Atos implementations achieve geomean speedups of 3.44x, 2.1x, and 2.77x and peak speedups of 12.8x, 3.2x, and 9.08x across three case studies, compared to a state-of-the-art BSP GPU implementation. Beyond simply quantifying the speedup, we extensively analyze the reasons behind each speedup. This deeper understanding allows us to derive general guidelines for how to select the optimal Atos configuration for different applications. Finally, our analysis provides insights for future dynamic scheduling framework designs.
Submitted 30 November, 2021;
originally announced December 2021.
-
Supporting Unified Shader Specialization by Co-opting C++ Features
Authors:
Kerry A. Seitz Jr.,
Theresa Foley,
Serban D. Porumbescu,
John D. Owens
Abstract:
Modern unified programming models (such as CUDA and SYCL) that combine host (CPU) code and GPU code into the same programming language, same file, and same lexical scope lack adequate support for GPU code specialization, which is a key optimization in real-time graphics. Furthermore, current methods used to implement specialization do not translate to a unified environment. In this paper, we create a unified shader programming environment in C++ that provides first-class support for specialization by co-opting C++'s attribute and virtual function features and reimplementing them with alternate semantics to express the services required. By co-opting existing features, we enable programmers to use familiar C++ programming techniques to write host and GPU code together, while still achieving efficient generated C++ and HLSL code via our source-to-source translator.
Submitted 16 July, 2022; v1 submitted 29 September, 2021;
originally announced September 2021.
-
Chiral Cavity Quantum Electrodynamics
Authors:
John Clai Owens,
Margaret G. Panetta,
Brendan Saxberg,
Gabrielle Roberts,
Srivatsan Chakram,
Ruichao Ma,
Andrei Vrajitoarea,
Jonathan Simon,
David Schuster
Abstract:
Cavity quantum electrodynamics, which explores the granularity of light by coupling a resonator to a nonlinear emitter, has played a foundational role in the development of modern quantum information science and technology. In parallel, the field of condensed matter physics has been revolutionized by the discovery of underlying topological robustness in the face of disorder, often arising from the breaking of time-reversal symmetry, as in the case of the quantum Hall effect. In this work, we explore for the first time cavity quantum electrodynamics of a transmon qubit in the topological vacuum of a Harper-Hofstadter topological lattice. To achieve this, we assemble a square lattice of niobium superconducting resonators and break time-reversal symmetry by introducing ferrimagnets before coupling the system to a single transmon qubit. We spectroscopically resolve the individual bulk and edge modes of this lattice, detect vacuum-stimulated Rabi oscillations between the excited transmon and each mode, and thereby measure the synthetic-vacuum-induced Lamb shift of the transmon. Finally, we demonstrate the ability to employ the transmon to count individual photons within each mode of the topological band structure. This work opens the field of chiral quantum optics experiment, suggesting new routes to topological many-body physics and offering unique approaches to backscatter-resilient quantum communication.
Submitted 9 September, 2021;
originally announced September 2021.
-
Better GPU Hash Tables
Authors:
Muhammad A. Awad,
Saman Ashkiani,
Serban D. Porumbescu,
Martín Farach-Colton,
John D. Owens
Abstract:
We revisit the problem of building static hash tables on the GPU and design and build three bucketed hash tables that use different probing schemes. Our implementations are lock-free and offer efficient memory access patterns; thus, only the probing scheme is the factor affecting the performance of the hash table's different operations. Our results show that a bucketed cuckoo hash table that uses three hash functions (BCHT) outperforms alternative methods that use power-of-two choices, iceberg hashing, and a cuckoo hash table that uses a bucket size of one. At load factors as high as 0.99, BCHT enjoys an average probe count of 1.43 during insertion. Using three hash functions only, positive and negative queries require at most 1.39 and 2.8 average probes per key, respectively.
Submitted 17 December, 2022; v1 submitted 16 August, 2021;
originally announced August 2021.
-
CJ15 global PDF analysis with new electroweak data from the STAR and SeaQuest experiments
Authors:
Sanghwa Park,
Alberto Accardi,
Xiaoxian Jing,
J. F. Owens
Abstract:
We present updates to a recent CTEQ-Jefferson Lab (CJ) global analysis of parton distribution functions with a new set of electroweak data that provide unique access to quark flavor separation in the proton. In particular, recent $W$ and $Z$ boson measurements from the STAR experiment at RHIC put additional constraints on light quarks and antiquarks near the valence regime. The new measurement of the Drell-Yan lepton pair production ratio in $p+p$ and $p+d$ collisions by the SeaQuest experiment at Fermilab extends the large-$x$ coverage of the previous E866 experiment and sheds new light on the light antiquarks distribution. In this report, the impact of these new data sets on parton distribution functions will be presented with emphasis given to the flavor asymmetry of the light antiquark sea at large values of the parton momentum $x$.
Submitted 12 August, 2021;
originally announced August 2021.
-
Coronal Hole Detection and Open Magnetic Flux
Authors:
J. A. Linker,
S. G. Heinemann,
M. Temmer,
M. J. Owens,
R. M. Caplan,
C. N. Arge,
E. Asvestari,
V. Delouille,
C. Downs,
S. J. Hofmeister,
I. C. Jebaraj,
M. Madjarska,
R. Pinto,
J. Pomoell,
E. Samara,
C. Scolini,
B. Vrsnak
Abstract:
Many scientists use coronal hole (CH) detections to infer open magnetic flux. Detection techniques differ in the areas that they assign as open, and may obtain different values for the open magnetic flux. We characterize the uncertainties of these methods, by applying six different detection methods to deduce the area and open flux of a near-disk center CH observed on 9/19/2010, and applying a single method to five different EUV filtergrams for this CH. Open flux was calculated using five different magnetic maps. The standard deviation (interpreted as the uncertainty) in the open flux estimate for this CH was about 26%. However, including the variability of different magnetic data sources, this uncertainty almost doubles to 45%. We use two of the methods to characterize the area and open flux for all CHs in this time period. We find that the open flux is greatly underestimated compared to values inferred from in-situ measurements (by 2.2-4 times). We also test our detection techniques on simulated emission images from a thermodynamic MHD model of the solar corona. We find that the methods overestimate the area and open flux in the simulated CH, but the average error in the flux is only about 7%. The full-Sun detections on the simulated corona underestimate the model open flux, but by factors well below what is needed to account for the missing flux in the observations. Under-detection of open flux in coronal holes likely contributes to the recognized deficit in solar open flux, but is unlikely to resolve it.
Submitted 9 March, 2021;
originally announced March 2021.
-
Multi-spacecraft Study of the Solar Wind at Solar Minimum: Dependence on Latitude and Transient Outflows
Authors:
R. Laker,
T. S. Horbury,
S. D. Bale,
L. Matteini,
T. Woolley,
L. D. Woodham,
J. E. Stawarz,
E. E. Davies,
J. P. Eastwood,
M. J. Owens,
H. O'Brien,
V. Evans,
V. Angelini,
I. Richter,
D. Heyner,
C. J. Owen,
P. Louarn,
A. Federov
Abstract:
The recent launches of Parker Solar Probe (PSP), Solar Orbiter (SO) and BepiColombo, along with several older spacecraft, have provided the opportunity to study the solar wind at multiple latitudes and distances from the Sun simultaneously. We take advantage of this unique spacecraft constellation, along with low solar activity across two solar rotations between May and July 2020, to investigate how the solar wind structure, including the Heliospheric Current Sheet (HCS), varies with latitude. We visualise the sector structure of the inner heliosphere by ballistically mapping the polarity and solar wind speed from several spacecraft onto the Sun's source surface. We then assess the HCS morphology and orientation with the in situ data and compare with a predicted HCS shape. We resolve ripples in the HCS on scales of a few degrees in longitude and latitude, finding that the local orientation of sector boundaries was broadly consistent with the shape of the HCS but was steepened with respect to a modelled HCS at the Sun. We investigate how several CIRs varied with latitude, finding evidence for the compression region affecting slow solar wind outside the latitude extent of the faster stream. We also identified several transient structures associated with HCS crossings, and speculate that one such transient may have disrupted the local HCS orientation up to five days after its passage. We have shown that the solar wind structure varies significantly with latitude, with this constellation providing context for solar wind measurements that would not be possible with a single spacecraft. These measurements provide an accurate representation of the solar wind within $\pm 10^{\circ}$ latitude, which could be used as a more rigorous constraint on solar wind models and space weather predictions. In the future, this range of latitudes will increase as SO's orbit becomes more inclined.
Submitted 22 June, 2021; v1 submitted 27 February, 2021;
originally announced March 2021.
-
Why are ELEvoHI CME arrival predictions different if based on STEREO-A or STEREO-B heliospheric imager observations?
Authors:
Jürgen Hinterreiter,
Tanja Amerstorfer,
Martin A. Reiss,
Christian Möstl,
Manuela Temmer,
Maike Bauer,
Ute V. Amerstorfer,
Rachel L. Bailey,
Andreas J. Weiss,
Jackie A. Davies,
Luke A. Barnard,
Mathew J. Owens
Abstract:
Accurate forecasting of the arrival time and arrival speed of coronal mass ejections (CMEs) is an unsolved problem in space weather research. In this study, a comparison of the predicted arrival times and speeds for each CME based, independently, on the inputs from the two STEREO vantage points is carried out. We perform hindcasts using ELlipse Evolution model based on Heliospheric Imager observations (ELEvoHI) ensemble modelling. An estimate of the ambient solar wind conditions is obtained by the Wang-Sheeley-Arge/Heliospheric Upwind eXtrapolation (WSA/HUX) model combination that serves as input to ELEvoHI. We carefully select 12 CMEs between February 2010 and July 2012 that show clear signatures in both STEREO-A and STEREO-B HI time-elongation maps, that propagate close to the ecliptic plane, and that have corresponding in situ signatures at Earth. We find a mean arrival time difference of 6.5 hrs between predictions from the two different viewpoints, which can reach up to 9.5 hrs for individual CMEs, while the mean arrival speed difference is 63 km s$^{-1}$. An ambient solar wind with a large speed variance leads to larger differences in the STEREO-A and STEREO-B CME arrival time predictions ($cc~=~0.92$). Additionally, we compare the predicted arrivals, from both spacecraft, to the actual in situ arrivals at Earth and find a mean absolute error of 7.5 $\pm$ 9.5 hrs for the arrival time and 87 $\pm$ 111 km s$^{-1}$ for the arrival speed. There is no tendency for one spacecraft to provide more accurate arrival predictions than the other.
Submitted 15 February, 2021;
originally announced February 2021.
-
Evolving Solar Wind Flow Properties of Magnetic Inversions Observed by Helios
Authors:
Allan R Macneil,
Mathew J Owens,
Robert T Wicks,
Mike Lockwood
Abstract:
In its first encounter at solar distances as close as r = 0.16AU, Parker Solar Probe (PSP) observed numerous local reversals, or inversions, in the heliospheric magnetic field (HMF), which were accompanied by large spikes in solar wind speed. Both solar and in situ mechanisms have been suggested to explain the existence of HMF inversions in general. Previous work using Helios 1, covering 0.3-1AU, observed inverted HMF to become more common with increasing r, suggesting that some heliospheric driving process creates or amplifies inversions. This study expands upon these findings, by analysing inversion-associated changes in plasma properties for the same large data set, facilitated by observations of 'strahl' electrons to identify the unperturbed magnetic polarity. We find that many inversions exhibit anti-correlated field and velocity perturbations, and are thus characteristically Alfvénic, but many also depart strongly from this relationship over an apparent continuum of properties. Inversions depart further from the 'ideal' Alfvénic case with increasing r, as more energy is partitioned in the field, rather than the plasma, component of the perturbation. This departure is greatest for inversions with larger density and magnetic field strength changes, and characteristic slow solar wind properties. We find no evidence that inversions which stray further from 'ideal' Alfvénicity have different generation processes from those which are more Alfvénic. Instead, different inversion properties could be imprinted based on transport or formation within different solar wind streams.
Submitted 5 January, 2021;
originally announced January 2021.
-
Semi-annual, annual and Universal Time variations in the magnetosphere and in geomagnetic activity: 4. Polar Cap motions and origins of the Universal Time effect
Authors:
Mike Lockwood,
Carl Haines,
Luke A. Barnard,
Mathew J. Owens,
Chris J. Scott,
Aude Chambodut,
Kathryn A. McWilliams
Abstract:
We use the am, an, as and the a-sigma geomagnetic indices to explore a previously overlooked factor in magnetospheric electrodynamics, namely the inductive effect of diurnal motions of the Earth's magnetic poles toward and away from the Sun caused by Earth's rotation. Because the offset of the (eccentric dipole) geomagnetic pole from the rotational axis is roughly twice as large in the southern hemisphere compared to the northern, the effects there are predicted to be roughly twice the amplitude. Hemispheric differences have previously been discussed in terms of polar ionospheric conductivities, effects which we allow for by studying the dipole tilt effect on time-of-year variations of the indices. The electric field induced in a geocentric frame is shown to also be a significant factor and gives a modulation of the voltage applied by the solar wind flow in the southern hemisphere of typically a 30% diurnal modulation for disturbed intervals rising to 76% in quiet times. Motion towards/away from the Sun reduces/enhances the directly-driven ionospheric voltages and reduces/enhances the magnetic energy stored in the near-Earth tail: 10% of the effect being directly-driven and 90% being in tail energy storage/release. Combined with the effect of solar wind dynamic pressure and dipole tilt on the pressure balance in the near-Earth tail, the effect provides an excellent explanation of how the observed Russell-McPherron pattern in the driving power input into the magnetosphere is converted into the equinoctial pattern in average geomagnetic activity (after correction is made for dipole tilt effects on ionospheric conductivity), added to a pronounced UT variation with minimum at 02-10UT. In addition, we show that the predicted and observed UT variations in average geomagnetic activity have implications for the occurrence of the largest events that also show the nett UT variation.
Submitted 24 December, 2020;
originally announced December 2020.
-
In situ multi-spacecraft and remote imaging observations of the first CME detected by Solar Orbiter and BepiColombo
Authors:
E. E. Davies,
C. Möstl,
M. J. Owens,
A. J. Weiss,
T. Amerstorfer,
J. Hinterreiter,
M. Bauer,
R. L. Bailey,
M. A. Reiss,
R. J. Forsyth,
T. S. Horbury,
H. O'Brien,
V. Evans,
V. Angelini,
D. Heyner,
I. Richter,
H-U. Auster,
W. Magnes,
W. Baumjohann,
D. Fischer,
D. Barnes,
J. A. Davies,
R. A. Harrison
Abstract:
On 2020 April 19 a coronal mass ejection (CME) was detected in situ by Solar Orbiter at a heliocentric distance of about 0.8 AU. The CME was later observed in situ on April 20th by the Wind and BepiColombo spacecraft whilst BepiColombo was located very close to Earth. This CME presents a good opportunity for a triple radial alignment study, as the spacecraft were separated by less than 5$^\circ$ in longitude. The source of the CME, which was launched on April 15th, was an almost entirely isolated streamer blowout. STEREO-A observed the event remotely from -75.1$^\circ$ longitude, which is an exceptionally well suited viewpoint for heliospheric imaging of an Earth directed CME. The configuration of the four spacecraft has provided an exceptionally clean link between remote imaging and in situ observations of the CME. We have used the in situ observations of the CME at Solar Orbiter, Wind, and BepiColombo, and the remote observations of the CME at STEREO-A in combination with flux rope models to determine the global shape of the CME and its evolution as it propagated through the inner heliosphere. A clear flattening of the CME cross-section has been observed by STEREO-A, and further confirmed by comparing profiles of the flux rope models to the in situ data, where the distorted flux rope cross-section qualitatively agrees most with in situ observations of the magnetic field at Solar Orbiter. Comparing in situ observations of the magnetic field between spacecraft, we find that the maximum (mean) magnetic field strength decreases with heliocentric distance as $r^{-1.24 \pm 0.50}$ ($r^{-1.12 \pm 0.14}$), in disagreement with previous studies. Further assessment of the axial and poloidal magnetic field strength dependencies suggests that the expansion of the CME is likely neither self-similar nor cylindrically symmetric.
Submitted 23 February, 2021; v1 submitted 14 December, 2020;
originally announced December 2020.
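To give a feel for what the quoted power-law exponents imply, the following minimal sketch evaluates the ratio of field strengths between two heliocentric distances for a law of the form $B \propto r^{k}$. The function name `field_ratio` and the choice of 1.0 AU for the near-Earth spacecraft are illustrative assumptions; only the 0.8 AU distance and the exponents with their uncertainties come from the abstract.

```python
def field_ratio(r_from_au, r_to_au, exponent):
    """Return B(r_to) / B(r_from) for a power law B proportional to r**exponent."""
    return (r_to_au / r_from_au) ** exponent


R_SOLAR_ORBITER_AU = 0.8  # heliocentric distance quoted in the abstract
R_NEAR_EARTH_AU = 1.0     # assumption: Wind and BepiColombo taken to be near 1 AU

# Exponents and one-sigma uncertainties quoted in the abstract.
for label, k, dk in (("maximum", -1.24, 0.50), ("mean", -1.12, 0.14)):
    central = field_ratio(R_SOLAR_ORBITER_AU, R_NEAR_EARTH_AU, k)
    steepest = field_ratio(R_SOLAR_ORBITER_AU, R_NEAR_EARTH_AU, k - dk)
    shallowest = field_ratio(R_SOLAR_ORBITER_AU, R_NEAR_EARTH_AU, k + dk)
    print(f"{label} field ratio: {central:.3f} "
          f"(range {steepest:.3f} to {shallowest:.3f})")
```

Over such a short radial baseline the ratio is not very sensitive to the exponent, which is consistent with the large uncertainty quoted for the maximum-field fit.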
-
Energy-based Out-of-distribution Detection
Authors:
Weitang Liu,
Xiaoyun Wang,
John D. Owens,
Yixuan Li
Abstract:
Determining whether inputs are out-of-distribution (OOD) is an essential building block for safely deploying machine learning models in the open world. However, previous methods relying on the softmax confidence score suffer from overconfident posterior distributions for OOD data. We propose a unified framework for OOD detection that uses an energy score. We show that energy scores better distinguish in- and out-of-distribution samples than the traditional approach using the softmax scores. Unlike softmax confidence scores, energy scores are theoretically aligned with the probability density of the inputs and are less susceptible to the overconfidence issue. Within this framework, energy can be flexibly used as a scoring function for any pre-trained neural classifier as well as a trainable cost function to shape the energy surface explicitly for OOD detection. On a CIFAR-10 pre-trained WideResNet, using the energy score reduces the average FPR (at TPR 95%) by 18.03% compared to the softmax confidence score. With energy-based training, our method outperforms the state-of-the-art on common benchmarks.
Submitted 26 April, 2021; v1 submitted 8 October, 2020;
originally announced October 2020.
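As an illustration of the scoring idea in the abstract above, the sketch below computes an energy score from a classifier's logits using the standard free-energy form $E(x) = -T \log \sum_i \exp(f_i(x)/T)$, alongside the softmax confidence it is compared against. The function names, the temperature default, and the example logits are illustrative assumptions, not taken from the authors' code.

```python
import math


def energy_score(logits, temperature=1.0):
    """Free energy -T * log(sum_i exp(logit_i / T)), computed stably."""
    scaled = [z / temperature for z in logits]
    m = max(scaled)  # subtract the max before exponentiating to avoid overflow
    log_sum_exp = m + math.log(sum(math.exp(s - m) for s in scaled))
    return -temperature * log_sum_exp


def max_softmax(logits):
    """Softmax confidence score: the largest softmax probability."""
    m = max(logits)
    exps = [math.exp(z - m) for z in logits]
    return max(exps) / sum(exps)


# Hypothetical logits: one confidently classified input, one ambiguous input.
confident = [10.0, 0.0, 0.0]
ambiguous = [1.0, 0.0, 0.0]

for name, logits in (("confident", confident), ("ambiguous", ambiguous)):
    print(name, "energy:", energy_score(logits), "softmax:", max_softmax(logits))
```

Under this convention, lower energy corresponds to inputs the classifier treats as more in-distribution, so a detector would flag inputs whose energy exceeds a threshold chosen on held-out in-distribution data.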
-
Parker Solar Probe Observations of Suprathermal Electron Flux Enhancements Originating from Coronal Hole Boundaries
Authors:
Allan R Macneil,
Mathew J Owens,
Laura Berčič,
Adam J Finley
Abstract:
Reconnection between pairs of solar magnetic flux elements, one open and the other a closed loop, is theorised to be a crucial process for both maintaining the structure of the corona and producing the solar wind. This 'interchange reconnection' is expected to be particularly active at the open-closed boundaries of coronal holes (CHs). Previous analysis of solar wind data at 1AU indicated that peaks in the flux of suprathermal electrons at slow-fast stream interfaces may arise from magnetic connection to the CH boundary, rather than dynamic effects such as compression. Further, offsets between the peak and stream interface locations are suggested to be the result of interchange reconnection at the source. As a preliminary test of these suggestions, we analyse two solar wind streams observed during the first Parker Solar Probe (PSP) perihelion encounter, each associated with equatorial CH boundaries (one leading and one trailing with respect to rotation). Each stream features a peak in suprathermal electron flux, the locations and associated plasma properties of which are indicative of a solar origin, in agreement with previous suggestions from 1AU observations. Discrepancies between locations of the flux peaks and other features suggest these peaks may too be shifted by source region interchange reconnection. Our interpretation of each event is compatible with a global pattern of open flux transport, although random footpoint motions or other explanations remain feasible. These exploratory results highlight future opportunities for statistical studies regarding interchange reconnection and flux transport at CH boundaries with modern near-Sun missions.
Submitted 3 September, 2020;
originally announced September 2020.
-
An experimental program with high duty-cycle polarized and unpolarized positron beams at Jefferson Lab
Authors:
A. Accardi,
A. Afanasev,
I. Albayrak,
S. F. Ali,
M. Amaryan,
J. R. M. Annand,
J. Arrington,
A. Asaturyan,
H. Atac,
H. Avakian,
T. Averett,
C. Ayerbe Gayoso,
X. Bai,
L. Barion,
M. Battaglieri,
V. Bellini,
R. Beminiwattha,
F. Benmokhtar,
V. V. Berdnikov,
J. C. Bernauer,
V. Bertone,
A. Bianconi,
A. Biselli,
P. Bisio,
P. Blunden
, et al. (205 additional authors not shown)
Abstract:
Positron beams, both polarized and unpolarized, are identified as essential ingredients for the experimental programs at the next generation of lepton accelerators. In the context of the hadronic physics program at Jefferson Lab (JLab), positron beams are complementary, even essential, tools for a precise understanding of the electromagnetic structure of nucleons and nuclei, in both the elastic and deep-inelastic regimes. For instance, elastic scattering of polarized and unpolarized electrons and positrons from the nucleon enables a model-independent determination of its electromagnetic form factors. Also, the deeply-virtual scattering of polarized and unpolarized electrons and positrons allows unambiguous separation of the different contributions to the cross section of the lepto-production of photons and of lepton-pairs, enabling an accurate determination of the generalized parton distributions of nucleons and nuclei, and providing access to the gravitational form factors. Furthermore, positron beams offer the possibility of alternative tests of the Standard Model of particle physics through the search for a dark photon, the precise measurement of electroweak couplings, and the investigation of charged lepton flavor violation. This document discusses the perspectives of an experimental program with high duty-cycle positron beams at JLab.
Submitted 21 May, 2021; v1 submitted 29 July, 2020;
originally announced July 2020.
-
Using gradient boosting regression to improve ambient solar wind model predictions
Authors:
R. L. Bailey,
M. A. Reiss,
C. N. Arge,
C. Möstl,
M. J. Owens,
U. V. Amerstorfer,
C. J. Henney,
T. Amerstorfer,
A. J. Weiss,
J. Hinterreiter
Abstract:
Studying the ambient solar wind, a continuous pressure-driven plasma flow emanating from our Sun, is an important component of space weather research. The ambient solar wind flows in interplanetary space determine how solar storms evolve through the heliosphere before reaching Earth, and especially during solar minimum are themselves a driver of activity in the Earth's magnetic field. Accurately forecasting the ambient solar wind flow is therefore imperative to space weather awareness. Here we present a machine learning approach in which solutions from magnetic models of the solar corona are used to output the solar wind conditions near the Earth. The results are compared to observations and existing models in a comprehensive validation analysis, and the new model outperforms existing models in almost all measures. In addition, this approach offers a new perspective to discuss the role of different input data to ambient solar wind modeling, and what this tells us about the underlying physical processes. The final model discussed here represents an extremely fast, well-validated and open-source approach to the forecasting of ambient solar wind at Earth.
Submitted 23 March, 2021; v1 submitted 23 June, 2020;
originally announced June 2020.
-
The Solar Corona during the Total Eclipse on 16 June 1806: Graphical Evidence of the Coronal Structure during the Dalton Minimum
Authors:
Hisashi Hayakawa,
Mathew J. Owens,
Michael Lockwood,
Mitsuru Sôma
Abstract:
Visible coronal structure, in particular the spatial evolution of coronal streamers, provides indirect information about solar magnetic activity and the underlying solar dynamo. The apparent absence of such structure during the total eclipses observed throughout the Maunder Minimum has been interpreted as evidence of a significant change in the solar magnetic field from that during modern cycles. Eclipse observations available from the more recent Dalton Minimum may be able to provide further information, sunspot activity being between the levels seen during recent cycles and in the Maunder Minimum. Here, we show and examine two graphical records of the total solar eclipse on 1806 June 16, during the Dalton Minimum. These records show significant rays and streamers around an inner ring. The ring is estimated to be ~ 0.44 R_S in width and the streamers in excess of 11.88 R_S in length. In combination with records of spicules or prominences, these eclipse records visually contrast the Dalton Minimum with the Maunder Minimum in terms of their coronal structure and support the existing discussions based on the sunspot observations. These eclipse records are broadly consistent with the solar cycle phase in the modelled open solar flux and the reconstructed slow solar wind at most latitudes.
Submitted 4 June, 2020;
originally announced June 2020.
-
Forecasting the Ambient Solar Wind with Numerical Models. II. An Adaptive Prediction System for Specifying Solar Wind Speed Near the Sun
Authors:
Martin A. Reiss,
Peter J. MacNeice,
Karin Muglach,
Charles N. Arge,
Christian Möstl,
Pete Riley,
Jürgen Hinterreiter,
Rachel Bailey,
Mathew J. Owens,
Tanja Amerstorfer,
Ute Amerstorfer
Abstract:
The ambient solar wind flows and fields influence the complex propagation dynamics of coronal mass ejections in the interplanetary medium and play an essential role in shaping Earth's space weather environment. A critical scientific goal in the space weather research and prediction community is to develop, implement and optimize numerical models for specifying the large-scale properties of solar wind conditions at the inner boundary of the heliospheric model domain. Here we present an adaptive prediction system that fuses information from in situ measurements of the solar wind into numerical models to better match the global solar wind model solutions near the Sun with prevailing physical conditions in the vicinity of Earth. In this way, we attempt to advance the predictive capabilities of well-established solar wind models for specifying solar wind speed, including the Wang-Sheeley-Arge (WSA) model. In particular, we use the Heliospheric Upwind eXtrapolation (HUX) model for mapping the solar wind solutions from the near-Sun environment to the vicinity of Earth. In addition, we present the newly developed Tunable HUX (THUX) model which solves the viscous form of the underlying Burgers equation. We perform a statistical analysis of the resulting solar wind predictions for the time 2006-2015. The proposed prediction scheme improves all the investigated coronal/heliospheric model combinations and produces better estimates of the solar wind state at Earth than our reference baseline model. We discuss why this is the case, and conclude that our findings have important implications for future practice in applied space weather research and prediction.
Submitted 20 March, 2020;
originally announced March 2020.
-
Fast Gunrock Subgraph Matching (GSM) on GPUs
Authors:
Leyuan Wang,
John D. Owens
Abstract:
In this paper, we propose a GPU-efficient subgraph isomorphism algorithm using the Gunrock graph analytic framework, GSM (Gunrock Subgraph Matching), to compute graph matching on GPUs. In contrast to previous approaches on the CPU which are based on depth-first traversal, GSM is BFS-based: possible matches are explored simultaneously in a breadth-first strategy. The advantage of using BFS-based traversal is that we can leverage the massively parallel processing capabilities of the GPU. The disadvantage is the generation of more intermediate results. We propose several optimization techniques to cope with the problem. Our implementation follows a filtering-and-verification strategy. While most previous work on GPUs requires one-/two-step joining, we use one-step verification to decide the candidates in the current frontier of nodes. Our implementation achieves a speedup of up to 4x over the previous state-of-the-art GPU implementation.
Submitted 11 March, 2020; v1 submitted 29 February, 2020;
originally announced March 2020.
-
Inelastic collisions in radiofrequency-dressed mixtures of ultracold atoms
Authors:
Elliot Bentine,
Adam J. Barker,
Kathrin Luksch,
Shinichi Sunami,
Tiffany L. Harte,
Ben Yuen,
Christopher J. Foot,
Daniel J. Owens,
Jeremy M. Hutson
Abstract:
Radiofrequency (RF)-dressed potentials are a promising technique for manipulating atomic mixtures, but so far little work has been undertaken to understand the collisions of atoms held within these traps. In this work, we dress a mixture of 85Rb and 87Rb with RF radiation, characterize the inelastic loss that occurs, and demonstrate species-selective manipulations. Our measurements show the loss is caused by two-body 87Rb+85Rb collisions, and we show the inelastic rate coefficient varies with detuning from the RF resonance. We explain our observations using quantum scattering calculations, which give reasonable agreement with the measurements. The calculations consider magnetic fields both perpendicular to the plane of RF polarization and tilted with respect to it. Our findings have important consequences for future experiments that dress mixtures with RF fields.
Submitted 5 December, 2019;
originally announced December 2019.
-
Unsupervised Object Segmentation with Explicit Localization Module
Authors:
Weitang Liu,
Lifeng Wei,
James Sharpnack,
John D. Owens
Abstract:
In this paper, we propose a novel architecture that iteratively discovers and segments out the objects of a scene based on the image reconstruction quality. Different from other approaches, our model uses an explicit localization module that localizes objects of the scene based on the pixel-level reconstruction qualities at each iteration, where simpler objects tend to be reconstructed better at earlier iterations and thus are segmented out first. We show that our localization module improves the quality of the segmentation, especially on a challenging background.
Submitted 20 November, 2019;
originally announced November 2019.
-
On the shape of the $\bar d-\bar u$ asymmetry
Authors:
A. Accardi,
C. E. Keppel,
S. Li,
W. Melnitchouk,
J. F. Owens
Abstract:
Using data from a recent reanalysis of neutron structure functions extracted from inclusive proton and deuteron deep-inelastic scattering (DIS), we re-examine the constraints on the shape of the $\bar d-\bar u$ asymmetry in the proton at large parton momentum fractions $x$. A global analysis of the proton-neutron structure function difference from BCDMS, NMC, SLAC and Jefferson Lab DIS measurements, and of Fermilab Drell-Yan lepton-pair production cross sections, suggests that existing data can be well described with $\bar d > \bar u$ for all values of $x$ currently accessible. We compare the shape of the fitted $\bar d-\bar u$ distributions with expectations from nonperturbative models based on chiral symmetry breaking, which can be tested by upcoming Drell-Yan data from the SeaQuest experiment at larger values of $x$.
Submitted 7 October, 2019;
originally announced October 2019.
-
RDMA vs. RPC for Implementing Distributed Data Structures
Authors:
Benjamin Brock,
Yuxin Chen,
Jiakun Yan,
John D. Owens,
Aydın Buluç,
Katherine Yelick
Abstract:
Distributed data structures are key to implementing scalable applications for scientific simulations and data analysis. In this paper we look at two implementation styles for distributed data structures: remote direct memory access (RDMA) and remote procedure call (RPC). We focus on operations that require individual accesses to remote portions of a distributed data structure, e.g., accessing a hash table bucket or distributed queue, rather than global operations in which all processors collectively exchange information. We look at the trade-offs between the two styles through microbenchmarks and a performance model that approximates the cost of each. The RDMA operations have direct hardware support in the network and therefore lower latency and overhead, while the RPC operations are more expressive but higher cost and can suffer from lack of attentiveness from the remote side. We also run experiments to compare the real-world performance of RDMA- and RPC-based data structure operations with the predicted performance to evaluate the accuracy of our model, and show that while the model does not always precisely predict running time, it allows us to choose the best implementation in the examples shown. We believe this analysis will assist developers in designing data structures that will perform well on current network architectures, as well as network architects in providing better support for this class of distributed data structures.
Submitted 14 October, 2019; v1 submitted 4 October, 2019;
originally announced October 2019.
-
Fast BFS-Based Triangle Counting on GPUs
Authors:
Leyuan Wang,
John D. Owens
Abstract:
In this paper, we propose a novel method to compute triangle counting on GPUs. Unlike previous formulations of graph matching, our approach is BFS-based by traversing the graph in an all-source-BFS manner and thus can be mapped onto GPUs in a massively parallel fashion. Our implementation uses the Gunrock programming model and we evaluate our implementation in runtime and memory consumption compared with previous state-of-the-art work. We sustain a peak traversed-edges-per-second (TEPS) rate of nearly 10 GTEPS. Our algorithm is the most scalable and parallel among all existing GPU implementations and also outperforms all existing CPU distributed implementations. This work specifically focuses on leveraging our implementation on the triangle counting problem for the Subgraph Isomorphism Graph Challenge 2019, demonstrating a geometric mean speedup over the 2018 champion of 3.84x.
Submitted 4 September, 2019;
originally announced September 2019.
-
nCTEQ PDFs at the LHC: Vector boson production in heavy ion collisions
Authors:
The nCTEQ Collaboration,
D. B. Clark,
E. Godat,
T. J. Hobbs,
T. Ježo,
J. Kent,
C. Keppel,
M. Klasen,
K. Kovarík,
A. Kusina,
F. Lyonnet,
J. G. Morfin,
F. I. Olness,
J. F. Owens,
I. Schienbein,
J. Y. Yu
Abstract:
Extraction of the strange quark PDF is a long-standing puzzle. We use the nCTEQ nPDFs with uncertainties to study the impact of the LHC W/Z production data on both the flavor differentiation and nuclear corrections; this complements the information from neutrino-DIS data. As the proton flavor determination is dependent on nuclear corrections (from heavy target DIS, for example), LHC heavy ion measurements can also help improve proton PDFs. We introduce a new implementation of the nCTEQ code (nCTEQ++) based on C++ which has a modular structure and enables us to easily integrate programs such as HOPPET, APPLgrid, and MCFM. Using ApplGrids generated from MCFM, we use nCTEQ++ to perform a preliminary fit including the pPb LHC W/Z vector boson data.
Submitted 1 September, 2019;
originally announced September 2019.
-
GraphBLAST: A High-Performance Linear Algebra-based Graph Framework on the GPU
Authors:
Carl Yang,
Aydin Buluc,
John D. Owens
Abstract:
High-performance implementations of graph algorithms are challenging to implement on new parallel hardware such as GPUs because of three challenges: (1) the difficulty of coming up with graph building blocks, (2) load imbalance on parallel hardware, and (3) graph problems having low arithmetic intensity. To address some of these challenges, GraphBLAS is an innovative, on-going effort by the graph analytics community to propose building blocks based on sparse linear algebra, which will allow graph algorithms to be expressed in a performant, succinct, composable and portable manner. In this paper, we examine the performance challenges of a linear-algebra-based approach to building graph frameworks and describe new design principles for overcoming these bottlenecks. Among the new design principles is exploiting input sparsity, which allows users to write graph algorithms without specifying push and pull direction. Exploiting output sparsity allows users to tell the backend which values of the output in a single vectorized computation they do not want computed. Load-balancing is an important feature for balancing work amongst parallel workers. We describe the important load-balancing features for handling graphs with different characteristics. The design principles described in this paper have been implemented in "GraphBLAST", the first high-performance linear algebra-based graph framework on NVIDIA GPUs that is open-source. The results show that on a single GPU, GraphBLAST has on average at least an order of magnitude speedup over previous GraphBLAS implementations SuiteSparse and GBTL, comparable performance to the fastest GPU hardwired primitives and shared-memory graph frameworks Ligra and Gunrock, and better performance than any other GPU graph framework, while offering a simpler and more concise programming model.
Submitted 14 June, 2021; v1 submitted 4 August, 2019;
originally announced August 2019.
-
Conceptual design and first results for a neutron detector with interaction localization capabilities
Authors:
J. Heideman,
D. Perez-Loureiro,
R. Grzywacz,
C. R. Thornsberry,
J. Chan,
L. H. Heilbronn,
S. K. Neupane,
K. Schmitt,
M. M. Rajabali,
A. R. Engelhardt,
C. W. Howell,
L. D. Mostella,
J. S. Owens,
S. C. Shadrick,
E. E. Peters,
A. P. D. Ramirez,
S. W. Yates,
K. Vaigneur
Abstract:
A new high-precision detector for studying neutrons from beta-delayed neutron emission and direct reaction studies is proposed. The Neutron dEtector with Xn Tracking (NEXT) array is designed to maintain high intrinsic neutron detection efficiency while reducing uncertainties in neutron energy measurements. A single NEXT module is composed of thin segments of plastic scintillator, each optically separated, capable of neutron-gamma discrimination. Each segmented module is coupled to position sensitive photodetectors enabling the high-precision determination of neutron time of arrival and interaction position within the active volume. A design study has been conducted based on simulations and experimental tests leading to the construction of prototype units. First results from measurements using a $^{252}$Cf neutron source and accelerator-produced monoenergetic neutrons are presented.
Submitted 5 June, 2019; v1 submitted 31 March, 2019;
originally announced April 2019.
-
VoroCrust: Voronoi Meshing Without Clipping
Authors:
Ahmed Abdelkader,
Chandrajit L. Bajaj,
Mohamed S. Ebeida,
Ahmed H. Mahmoud,
Scott A. Mitchell,
John D. Owens,
Ahmad A. Rushdi
Abstract:
Polyhedral meshes are increasingly becoming an attractive option with particular advantages over traditional meshes for certain applications. What has been missing is a robust polyhedral meshing algorithm that can handle broad classes of domains exhibiting arbitrarily curved boundaries and sharp features. In addition, the power of primal-dual mesh pairs, exemplified by Voronoi-Delaunay meshes, has been recognized as an important ingredient in numerous formulations. The VoroCrust algorithm is the first provably-correct algorithm for conforming polyhedral Voronoi meshing for non-convex and non-manifold domains with guarantees on the quality of both surface and volume elements. A robust refinement process estimates a suitable sizing field that enables the careful placement of Voronoi seeds across the surface, circumventing the need for clipping and avoiding its many drawbacks. The algorithm has the flexibility of filling the interior by either structured or random samples, while preserving all sharp features in the output mesh. We demonstrate the capabilities of the algorithm on a variety of models and compare against state-of-the-art polyhedral meshing methods based on clipped Voronoi cells, establishing the clear advantage of VoroCrust output.
Submitted 22 November, 2023; v1 submitted 23 February, 2019;
originally announced February 2019.
-
A homogeneous aa index: 2. hemispheric asymmetries and the equinoctial variation
Authors:
Mike Lockwood,
Ivan D. Finch,
Aude Chambodut,
Luke A. Barnard,
Mathew J. Owens,
Ellen Clarke
Abstract:
Paper 1 [Lockwood et al., 2018] generated annual means of a new version of the $aa$ geomagnetic activity index which includes corrections for secular drift in the geographic coordinates of the auroral oval, thereby resolving the difference between the centennial-scale change in the northern and southern hemisphere indices, $aa_N$ and $aa_S$. However, other hemispheric asymmetries in the $aa$ index remain: in particular, the distributions of 3-hourly $aa_N$ and $aa_S$ values are different and the correlation between them is not high on this timescale ($r = 0.66$). In the present paper, a location-dependent station sensitivity model is developed using the $am$ index (derived from a much more extensive network of stations in both hemispheres) and used to reduce the difference between the hemispheric $aa$ indices and improve their correlation (to $r = 0.79$) by generating corrected 3-hourly hemispheric indices, $aa_{HN}$ and $aa_{HS}$, which also include the secular drift corrections detailed in Paper 1. These are combined into a new, 'homogeneous' $aa$ index, $aa_H$. It is shown that $aa_H$, unlike $aa$, reveals the 'equinoctial'-like time-of-day/time-of-year pattern that is found for the $am$ index.
Submitted 17 December, 2018; v1 submitted 24 November, 2018;
originally announced November 2018.
-
A homogeneous aa index: 1. Secular variation
Authors:
Mike Lockwood,
Aude Chambodut,
Luke A. Barnard,
Mathew J. Owens,
Ellen Clarke,
Véronique Mendel
Abstract:
Originally compiled for 1868-1967 and subsequently continued so that it now covers 150 years, the $aa$ index has become a vital resource for studying space climate change. However, there have been debates about the inter-calibration of data from the different stations. In addition, the effects of secular change in the geomagnetic field have not previously been allowed for. As a result, the components of the 'classical' $aa$ index for the southern and northern hemispheres ($aa_S$ and $aa_N$) have drifted apart. We here separately correct both $aa_S$ and $aa_N$ for both these effects using the same method as used to generate the classic $aa$ values but allowing $δ$, the minimum angular separation of each station from a nominal auroral oval, to vary as calculated using the IGRF-12 and gufm1 models of the intrinsic geomagnetic field. Our approach is to correct the quantized aK-values for each station, originally scaled on the assumption that $δ$ values are constant, with time-dependent scale factors that allow for the drift in $δ$. This requires revisiting the intercalibration of successive stations used in making the $aa_S$ and $aa_N$ composites. These intercalibrations are defined using independent data and daily averages from 11 years before and after each station change and it is shown that they depend on the time of year. This procedure produces new homogenized hemispheric $aa$ indices, $aa_{HS}$ and $aa_{HN}$, which show centennial-scale changes that are in very close agreement. Calibration problems with the classic $aa$ index are shown to have arisen from drifts in $δ$ combined with simpler corrections which gave an incorrect temporal variation and underestimated the rise in $aa$ during the 20th century by about 15%.
Submitted 17 December, 2018; v1 submitted 24 November, 2018;
originally announced November 2018.