Search | arXiv e-print repository

doi 10.22323/1.430.0338

Maximizing the Bang Per Bit

Authors: M. A. Clark, Dean Howarth, Jiqun Tu, Mathias Wagner, Evan Weinberg

Abstract: Reducing memory traffic is critical to accelerate Lattice QCD computations on modern processors, given that such computations are memory-bandwidth bound. A commonly used strategy is mixed-precision solvers; however, these require careful treatment to ensure stable convergence. We give an overview of the strategies employed in QUDA to stabilize mixed-precision variants of Conjugate Gradient (CG), and its multi-shift brethren. Through the use of customized numerical storage formats we can significantly improve upon the precision achievable compared to IEEE numerical formats, increasing both the solver precision and stability achievable at fixed word size. We give examples using BiCGStab(l) and multi-shift CG solvers using the HISQ operator.

Submitted 17 February, 2023; originally announced February 2023.

Comments: 14 pages, 4 figures

Journal ref: Proceedings of The 39th International Symposium on Lattice Field Theory - PoS(LATTICE2022) 338

arXiv:2212.12559 [pdf, other]

doi 10.22323/1.430.0335

Optimizing Staggered Multigrid for Exascale performance

Authors: Venkitesh Ayyar, Richard Brower, M. A. Clark, Mathias Wagner, Evan Weinberg

Abstract: Adaptive multi-grid methods have proven very successful in dealing with critical slowing down for the Wilson-Dirac solver in lattice gauge theory. Multi-grid algorithms developed for Staggered fermions using the Kähler-Dirac preconditioning [Brower:2018ymy] have shown remarkable success. In this work, we discuss the performance of this staggered multi-grid algorithm in four dimensions. We also demonstrate that offloading some components of a multi-shift solve to a multi-grid solver leads to a significant performance improvement in an existing MILC spectrum workflow on the Summit and Selene supercomputers.

Submitted 23 December, 2022; originally announced December 2022.

Comments: Submission to Proceedings of Lattice 2022: the 39th International Symposium on Lattice Field Theory, Bonn, Germany

arXiv:2201.03251 [pdf, other]

QED with massive photons for precision physics: zero modes and first result for the hadron spectrum

Authors: M. A. Clark, M. Della Morte, Z. Hall, B. Hörz, A. Nicholson, A. Shindler, J. T. Tsang, A. Walker-Loud, H. Yan

Abstract: The current precision reached by lattice QCD calculations of low-energy hadronic observables requires not only the introduction of electromagnetic corrections, but also control over all the potential systematic uncertainties introduced by the lattice version of QED. Introducing a massive photon as an infrared regulator in lattice QED provides a well-defined theory, dubbed QEDM, amenable to numerical evaluation [arXiv:1507.08916]. The photon mass is removed through extrapolation. In this contribution we scrutinise aspects of QEDM such as the presence and fate of the zero-mode contributions, and we describe the determination of the photon mass corrections in finite and infinite volume. We demonstrate that the required extrapolations are well controlled using numerical data obtained on two ensembles which only differ in volume.

Submitted 10 January, 2022; originally announced January 2022.

Comments: 18 pages, 9 figures. Submitted as a conference proceeding for the 38th International Symposium on Lattice Field Theory (2021) contribution 281 (combines contributions 281 and 102)

arXiv:2201.01343 [pdf, other]

The hyperon spectrum from lattice QCD

Authors: Nolan Miller, Grant Bradley, M. A. Clark, Ben Hörz, Dean Howarth, Malcolm Lazarow, Henry Monge-Camacho, Amy Nicholson, Enrico Rinaldi, Pavlos Vranas, André Walker-Loud

Abstract: Hyperon decays present a promising alternative for extracting $\vert V_{us} \vert$ from lattice QCD combined with experimental measurements. Currently $\vert V_{us} \vert$ is determined from the kaon decay widths and a lattice calculation of the associated form factor. In this proceeding, I will present preliminary work on a lattice determination of the hyperon mass spectrum. I will additionally summarize future goals in which we will calculate the hyperon transition matrix elements, which will provide an alternative means for accessing $\vert V_{us} \vert$. This work is based on a particular formulation of SU(2) chiral perturbation theory for hyperons; determining the extent to which this effective field theory converges is instrumental in understanding the limits of its predictive power, especially since some hyperonic observables are difficult to calculate near the physical pion mass (e.g., hyperon-to-nucleon form factors), and thus the use of heavier than physical pion masses is likely to yield more precise results when combined with extrapolations to the physical point.

Submitted 4 January, 2022; originally announced January 2022.

Comments: 8 pages, 3 figures, presented at the 38th International Symposium on Lattice Field Theory (LATTICE2021)

Report number: RIKEN-iTHEMS-Report-22

arXiv:2112.04569 [pdf, other]

Toward a resolution of the NN controversy

Authors: Amy Nicholson, Evan Berkowitz, John Bulava, Chia Cheng Chang, M. A. Clark, Andrew D. Hanlon, Ben Horz, Dean Howarth, Christopher Korber, Wayne Tai Lee, Aaron S. Meyer, Henry Monge-Camacho, Colin Morningstar, Enrico Rinaldi, Pavlos Vranas, Andre Walker-Loud

Abstract: Lattice QCD calculations of two-nucleon interactions have been underway for about a decade, but still haven't reached the pion mass regime necessary for matching onto effective field theories and extrapolating to the physical point. Furthermore, results from different methods, including the use of the Luscher formalism with different types of operators, as well as the HALQCD potential method, do not agree even qualitatively at very heavy pion mass. We investigate the role that different operators employed in the literature may play on the extraction of spectra for use within the Luscher method. We first explore expectations from Effective Field Theory solved within a finite volume, for which the exact spectrum may be computed given different physical scenarios. We then present preliminary lattice QCD results for two-nucleon spectra calculated using different operators on a common lattice ensemble.

Submitted 8 December, 2021; originally announced December 2021.

Comments: 11 pages, 7 figures, proceeding for The 38th International Symposium on Lattice Field Theory, LATTICE2021, 26th-30th July, 2021, Zoom/Gather@Massachusetts Institute of Technology

Report number: RIKEN-iTHEMS-Report-21

arXiv:2111.06333 [pdf, other]

Nucleon Axial Form Factor from Domain Wall on HISQ

Authors: Aaron S. Meyer, Evan Berkowitz, Chris Bouchard, Chia Cheng Chang, M. A. Clark, Ben Hörz, Dean Howarth, Christopher Körber, Henry Monge-Camacho, Amy Nicholson, Enrico Rinaldi, Pavlos Vranas, André Walker-Loud

Abstract: The Deep Underground Neutrino Experiment (DUNE) is an upcoming neutrino oscillation experiment that is poised to answer key questions about the nature of neutrinos. Lattice QCD has the ability to make significant impact upon DUNE, beginning with computations of nucleon-neutrino interactions with weak currents. Nucleon amplitudes involving the axial form factor are part of the primary signal measurement process for DUNE, and precise calculations from LQCD can significantly reduce the uncertainty for inputs into Monte Carlo generators. Recent calculations of the nucleon axial charge have demonstrated that sub-percent precision is possible on this vital quantity. In these proceedings, we discuss preliminary results for the CalLat collaboration's calculation of the axial form factor of the nucleon. These computations are performed with Möbius domain wall valence quarks on HISQ sea quark ensembles generated by the MILC and CalLat collaborations. The results use a variety of ensembles including several at physical pion mass.

Submitted 11 November, 2021; originally announced November 2021.

Comments: The 38th International Symposium on Lattice Field Theory, LATTICE2021 26th-30th July, 2021 Zoom/Gather@Massachusetts Institute of Technology 9 pages, 2 figures

Report number: RIKEN-iTHEMS-Report-21

arXiv:2104.05615 [pdf, other]

doi 10.1145/3468267.3470613

Solving DWF Dirac Equation Using Multi-splitting Preconditioned Conjugate Gradient with Tensor Cores on NVIDIA GPUs

Authors: Jiqun Tu, M. A. Clark, Chulwoo Jung, Robert Mawhinney

Abstract: We show that using the multi-splitting algorithm as a preconditioner for the domain wall Dirac linear operator, arising in lattice QCD, effectively reduces the inter-node communication cost, at the expense of performing more on-node floating point and memory operations. Correctly including the boundary "snake" terms, the preconditioner is implemented in the QUDA framework, where it is found that utilizing kernel fusion and the tensor cores on NVIDIA GPUs is necessary to achieve a sufficiently performant preconditioner. A reduced-dimension (reduced-$L_s$) strategy is also proposed and tested for the preconditioner. We find the method achieves lower time to solution than regular CG at high node count despite the additional local computational requirements from the preconditioner. This method could be useful for supercomputers with more on-node flops and memory bandwidth than inter-node communication bandwidth.

Submitted 8 September, 2021; v1 submitted 12 April, 2021; originally announced April 2021.

Comments: Add DOI

Journal ref: PASC '21: Proceedings of the Platform for Advanced Scientific Computing Conference, July 2021, Article No.: 9, Pages 1-11

arXiv:2104.05226 [pdf, other]

doi 10.1103/PhysRevC.105.065203

Detailed analysis of excited state systematics in a lattice QCD calculation of $g_A$

Authors: Jinchen He, David A. Brantley, Chia Cheng Chang, Ivan Chernyshev, Evan Berkowitz, Dean Howarth, Christopher Körber, Aaron S. Meyer, Henry Monge-Camacho, Enrico Rinaldi, Chris Bouchard, M. A. Clark, Arjun Singh Gambhir, Christopher J. Monahan, Amy Nicholson, Pavlos Vranas, André Walker-Loud

Abstract: Excited state contamination remains one of the most challenging sources of systematic uncertainty to control in lattice QCD calculations of nucleon matrix elements and form factors: early time separations are contaminated by excited states and late times suffer from an exponentially bad signal-to-noise problem. High-statistics calculations at large time separations $\gtrsim1$ fm are commonly used to combat these issues. In this work, focusing on $g_A$, we explore the alternative strategy of utilizing a large number of relatively low-statistics calculations at short to medium time separations (0.2--1 fm), combined with a multi-state analysis. On an ensemble with a pion mass of approximately 310 MeV and a lattice spacing of approximately 0.09 fm, we find this provides a more robust and economical method of quantifying and controlling the excited state systematic uncertainty. A quantitative separation of various types of excited states enables the identification of the transition matrix elements as the dominant contamination. The excited state contamination of the Feynman-Hellmann correlation function is found to reduce to the 1% level at approximately 1 fm while for the more standard three-point functions, this does not occur until after 2 fm. Critical to our findings is the use of a global minimization, rather than fixing the spectrum from the two-point functions and using them as input to the three-point analysis. We find that the ground state parameters determined in such a global analysis are stable against variations in the excited state model, the number of excited states, and the truncation of early-time or late-time numerical data.

Submitted 9 June, 2022; v1 submitted 12 April, 2021; originally announced April 2021.

Comments: v2: updates based on referee comments and some community response, consistent with published version; v1: 13 pages plus appendices. The correlation function data and analysis code accompanying this publication can be accessed at this github repository: https://github.com/callat-qcd/project_fh_vs_3pt

Report number: JLAB-THY-21-3350, RIKEN-iTHEMS-Report-21

Journal ref: Phys. Rev. C 105, 065203 (2022)

arXiv:2011.12166 [pdf, other]

doi 10.1103/PhysRevD.103.054511

Scale setting the Möbius Domain Wall Fermion on gradient-flowed HISQ action using the omega baryon mass and the gradient-flow scales $t_0$ and $w_0$

Authors: Nolan Miller, Logan C Carpenter, Evan Berkowitz, Chia Cheng Chang, Ben Hörz, Dean Howarth, Henry Monge-Camacho, Enrico Rinaldi, David A. Brantley, Christopher Körber, Chris Bouchard, M. A. Clark, Arjun Singh Gambhir, Christopher J. Monahan, Amy Nicholson, Pavlos Vranas, André Walker-Loud

Abstract: We report on a sub-percent scale determination using the omega baryon mass and gradient-flow methods. The calculations are performed on 22 ensembles of $N_f=2+1+1$ highly improved, rooted staggered sea-quark configurations generated by the MILC and CalLat Collaborations. The valence quark action used is Möbius Domain-Wall fermions solved on these configurations after a gradient-flow smearing is applied with a flowtime of $t_{\rm gf}=1$ in lattice units. The ensembles span four lattice spacings in the range $0.06 \lesssim a \lesssim 0.15$ fm, six pion masses in the range $130 \lesssim m_π\lesssim 400$ MeV and multiple lattice volumes. On each ensemble, the gradient-flow scales $t_0/a^2$ and $w_0/a$ and the omega baryon mass $a m_Ω$ are computed. The dimensionless product of these quantities is then extrapolated to the continuum and infinite volume limits and interpolated to the physical light, strange and charm quark mass point in the isospin limit, resulting in the determination of $\sqrt{t_0}=0.1422(14)$ fm and $w_0 = 0.1709(11)$ fm with all sources of statistical and systematic uncertainty accounted for. The dominant uncertainty in this result is the stochastic uncertainty, providing a clear path for a few-per-mille uncertainty, as recently obtained by the Budapest-Marseille-Wuppertal Collaboration.

Submitted 15 April, 2021; v1 submitted 24 November, 2020; originally announced November 2020.

Comments: v3: Published version; v2: Added determination of t_0 as well as w_0; v1: 13 pages plus appendices. The correlation function data, mass results and analysis code accompanying this publication can be found at this github repository: https://github.com/callat-qcd/project_scale_setting_mdwf_hisq

Report number: LLNL-JRNL-816949, RIKEN-iTHEMS-Report-20, JLAB-THY-20-3290

Journal ref: Phys. Rev. D 103, 054511 (2021)

arXiv:2009.11825 [pdf, other]

doi 10.1103/PhysRevC.103.014003

Two-nucleon S-wave interactions at the $SU(3)$ flavor-symmetric point with $m_{ud}\simeq m_s^{\rm phys}$: a first lattice QCD calculation with the stochastic Laplacian Heaviside method

Authors: Ben Hörz, Dean Howarth, Enrico Rinaldi, Andrew Hanlon, Chia Cheng Chang, Christopher Körber, Evan Berkowitz, John Bulava, M. A. Clark, Wayne Tai Lee, Colin Morningstar, Amy Nicholson, Pavlos Vranas, André Walker-Loud

Abstract: We report on the first application of the stochastic Laplacian Heaviside method for computing multi-particle interactions with lattice QCD to the two-nucleon system. Like the Laplacian Heaviside method, this method allows for the construction of interpolating operators which can be used to construct a positive definite set of two-nucleon correlation functions, unlike nearly all other applications of lattice QCD to two nucleons in the literature. It also allows for a variational analysis in which optimal linear combinations of the interpolating operators are formed that couple predominantly to the eigenstates of the system. Utilizing such methods has become of paramount importance in order to help resolve the discrepancy in the literature on whether two nucleons in either isospin channel form a bound state at pion masses heavier than physical, with the discrepancy persisting even in the $SU(3)$-flavor symmetric point with all quark masses near the physical strange quark mass. This is the first in a series of papers aimed at resolving this discrepancy. In the present work, we employ the stochastic Laplacian Heaviside method without a hexaquark operator in the basis at a lattice spacing of $a\sim0.086$ fm, lattice volume of $L=48a\simeq4.1$ fm and pion mass $m_π\simeq714$ MeV. With this setup, the observed spectrum of two-nucleon energy levels strongly disfavors the presence of a bound state in either the deuteron or dineutron channel.

Submitted 5 January, 2021; v1 submitted 24 September, 2020; originally announced September 2020.

Comments: v2: version to be published in Phys. Rev. C.; v1: 13 pages plus figures and appendices

Report number: LLNL-JRNL-813871, RIKEN-iTHEMS-Report-20, MITP/20-055

Journal ref: Phys. Rev. C 103, 014003 (2021)

arXiv:2005.04795 [pdf, other]

doi 10.1103/PhysRevD.102.034507

$F_K / F_π$ from Möbius domain-wall fermions solved on gradient-flowed HISQ ensembles

Authors: Nolan Miller, Henry Monge-Camacho, Chia Cheng Chang, Ben Hörz, Enrico Rinaldi, Dean Howarth, Evan Berkowitz, David A. Brantley, Arjun Singh Gambhir, Christopher Körber, Christopher J. Monahan, M. A. Clark, Bálint Joó, Thorsten Kurth, Amy Nicholson, Kostas Orginos, Pavlos Vranas, André Walker-Loud

Abstract: We report the results of a lattice quantum chromodynamics calculation of $F_K/F_π$ using Möbius domain-wall fermions computed on gradient-flowed $N_f=2+1+1$ highly-improved staggered quark (HISQ) ensembles. The calculation is performed with five values of the pion mass ranging from $130 \lesssim m_π\lesssim 400$ MeV, four lattice spacings of $a\sim 0.15, 0.12, 0.09$ and $0.06$ fm and multiple values of the lattice volume. The interpolation/extrapolation to the physical pion and kaon mass point, the continuum, and infinite volume limits are performed with a variety of different extrapolation functions utilizing both the relevant mixed-action effective field theory expressions as well as discretization-enhanced continuum chiral perturbation theory formulas. We find that the $a\sim0.06$ fm ensemble is helpful, but not necessary to achieve a subpercent determination of $F_K/F_π$. We also include an estimate of the strong isospin breaking corrections and arrive at a final result of $F_{K^\pm}/F_{π^\pm} = 1.1942(45)$ with all sources of statistical and systematic uncertainty included. This is consistent with the Flavour Lattice Averaging Group average value, providing an important benchmark for our lattice action. Combining our result with experimental measurements of the pion and kaon leptonic decays leads to a determination of $|V_{us}|/|V_{ud}| = 0.2311(10)$.

Submitted 3 September, 2020; v1 submitted 10 May, 2020; originally announced May 2020.

Comments: v3: published version; v2: version submitted to journal; v1: 26 pages including 13 figures, appendices, and references. See https://github.com/callat-qcd/project_fkfpi for the analysis and data

Report number: LLNL-JRNL-809712, RIKEN-iTHEMS-Report-20, JLAB-THY-20-3192

Journal ref: Phys. Rev. D 102, 034507 (2020)

arXiv:2004.07732 [pdf, other]

doi 10.1103/PhysRevD.102.094517

Multigrid for Chiral Lattice Fermions: Domain Wall

Authors: Richard C. Brower, M. A. Clark, Dean Howarth, Evan S. Weinberg

Abstract: Critical slowing down for the Krylov Dirac solver presents a major obstacle to further advances in lattice field theory as it approaches the continuum solution. We propose a new multi-grid approach for chiral fermions, applicable to both the 5-d domain wall and the 4-d Overlap operator. The central idea is to directly coarsen the 4-d Wilson kernel, giving an effective domain wall or overlap operator on each level. We provide here an explicit construction for the Shamir domain wall formulation with numerical tests for the 2-d Schwinger prototype, demonstrating near ideal multi-grid scaling. The framework is designed for a natural extension to 4-d lattice QCD chiral fermions, such as the Möbius, Zolotarev or Borici domain wall discretizations or directly to a rational expansion of the 4-d Overlap operator. For the Shamir operator, the effective overlap operator is isolated by the use of a Pauli-Villars preconditioner in the spirit of the Kähler-Dirac spectral map used in a recent staggered MG algorithm [1].

Submitted 16 April, 2020; originally announced April 2020.

Comments: 39 pages, 13 figures

arXiv:1912.08321 [pdf, other]

doi 10.22323/1.317.0020

Lattice QCD Determination of $g_A$

Authors: André Walker-Loud, Evan Berkowitz, David A. Brantley, Arjun Gambhir, Pavlos Vranas, Chris Bouchard, Chia Cheng Chang, M. A. Clark, Nicolas Garron, Bálint Joó, Thorsten Kurth, Henry Monge-Camacho, Amy Nicholson, Christopher J. Monahan, Kostas Orginos, Enrico Rinaldi

Abstract: The nucleon axial coupling, $g_A$, is a fundamental property of protons and neutrons, dictating the strength with which the weak axial current of the Standard Model couples to nucleons, and hence, the lifetime of a free neutron. The prominence of $g_A$ in nuclear physics has made it a benchmark quantity with which to calibrate lattice QCD calculations of nucleon structure and more complex calculations of electroweak matrix elements in one and few nucleon systems. There were a number of significant challenges in determining $g_A$, notably the notorious exponentially-bad signal-to-noise problem and the requirement for hundreds of thousands of stochastic samples, that rendered this goal more difficult to obtain than originally thought. I will describe the use of an unconventional computation method, coupled with "ludicrously" fast GPU code, access to publicly available lattice QCD configurations from MILC and access to leadership computing that have allowed these challenges to be overcome resulting in a determination of $g_A$ with 1% precision and all sources of systematic uncertainty controlled. I will discuss the implications of these results for the convergence of $SU(2)$ Chiral Perturbation theory for nucleons, as well as prospects for further improvements to $g_A$ (sub-percent precision, for which we have preliminary results) which is part of a more comprehensive application of lattice QCD to nuclear physics. This is particularly exciting in light of the new CORAL supercomputers coming online, Sierra and Summit, for which our lattice QCD codes achieve a machine-to-machine speed up over Titan of an order of magnitude.

Submitted 17 December, 2019; originally announced December 2019.

Comments: Plenary presentation at The 9th International workshop on Chiral Dynamics

Report number: RIKEN-iTHEMS-Report-19, LLNL-PROC-800060

Journal ref: PoS(CD2018)020

arXiv:1905.03355 [pdf, other]

The Stochastic Feynman-Hellmann Method

Authors: Arjun Singh Gambhir, Evan Berkowitz, David Brantley, Chia Cheng Chang, M. A. Clark, Thorsten Kurth, Chris Monahan, Amy Nicholson, Pavlos Vranas, André Walker-Loud

Abstract: The Feynman-Hellmann method, as implemented by Bouchard et al. [1612.06963], was recently employed successfully to determine the nucleon axial charge. A limitation of the method was the restriction to a single operator and a single momentum during the computation of each "Feynman-Hellmann" propagator. By using stochastic techniques to estimate the all-to-all propagator, we relax this constraint and demonstrate the successful implementation of this new method. We show reproduction of the axial charge on a test ensemble and non-zero momentum transfer points of the axial and vector form factors.

Submitted 8 May, 2019; originally announced May 2019.

Comments: 7 pages, 5 figures, Proceedings for The 36th International Symposium on Lattice Field Theory

Report number: INT-PUB-19-003

Journal ref: PoS(LATTICE2018)126

arXiv:1904.12055 [pdf, other]

Short Range Operator Contributions to $0νββ$ decay from LQCD

Authors: Henry Monge-Camacho, Evan Berkowitz, David Brantley, Chia Cheng Chang, M. A. Clark, Arjun Gambhir, Nicolas Garrón, Bálint Joó, Thorsten Kurth, Amy Nicholson, Enrico Rinaldi, Brian Tiburzi, Pavlos Vranas, André Walker-Loud

Abstract: The search for neutrinoless double beta decay of nuclei is believed to be one of the most promising means to search for new physics. Observation of this very rare nuclear process, which violates Lepton Number conservation, would imply the neutrino sector has a Majorana mass component and may also provide an explanation for the matter-antimatter asymmetry of the universe. In the case where a heavy intermediate particle is exchanged in this process, QCD contributions from short range interactions become relevant and the calculation of matrix elements with four-quark operators becomes necessary. In these proceedings we will discuss our current progress in the calculation of these four-quark operators from LQCD.

Submitted 26 April, 2019; originally announced April 2019.

Comments: Proceedings of the 36th Annual International Symposium on Lattice Field Theory, 22-28 July, 2018, Michigan State University, East Lansing, Michigan

Report number: RBRC-1297, LLNL-PROC-772237

arXiv:1902.09416 [pdf, other]

Progress in Multibaryon Spectroscopy

Authors: Evan Berkowitz, David Brantley, Kenneth McElvain, André Walker-Loud, Chia Cheng Chang, M. A. Clark, Thorsten Kurth, Bálint Joó, Henry Monge-Camacho, Amy Nicholson, Enrico Rinaldi, Pavlos Vranas

Abstract: Anchoring the nuclear interaction in QCD is a long-outstanding problem in nuclear physics. While the lattice community has made enormous progress in mesonic physics and single nucleon physics, continuum-limit physical-point multi-nucleon physics has remained out of reach. I will review CalLat's strategy for multi-nucleon spectroscopy and our latest results.

Submitted 25 February, 2019; originally announced February 2019.

Comments: Proceedings of the 36th Annual International Symposium on Lattice Field Theory, 22-28 July, 2018, Michigan State University, East Lansing, Michigan

Report number: RBRC-1299, RIKEN-iTHEMS-Report-19

arXiv:1812.11127 [pdf, other]

Symmetries and Interactions from Lattice QCD

Authors: A. Nicholson, E. Berkowitz, H. Monge-Camacho, D. Brantley, N. Garron, C. C. Chang, E. Rinaldi, C. Monahan, C. Bouchard, M. A. Clark, B. Joo, T. Kurth, B. C. Tiburzi, P. Vranas, A. Walker-Loud

Abstract: Precision experimental tests of the Standard Model of particle physics (SM) are one of our best hopes for discovering what new physics lies beyond the SM (BSM). Key in the search for new physics is the connection between theory and experiment. Forging this connection for searches involving low-energy hadronic or nuclear environments requires the use of a non-perturbative theoretical tool, lattice QCD. We present two recent lattice QCD calculations by the CalLat collaboration relevant for new physics searches: the nucleon axial coupling, $g_A$, whose precise value as predicted by the SM could help point to new physics contributions to the so-called "neutron lifetime puzzle", and hadronic matrix elements of short-ranged operators relevant for neutrinoless double beta decay searches.

Submitted 28 December, 2018; originally announced December 2018.

Comments: Plenary talk presented CIPANP2018. 11 pages, 3 figures

Report number: CIPANP2018-Nicholson, LLNL-CONF-764382, RBRC-1296, RIKEN-iTHEMS-Report-18, INT-PUB-18-063

arXiv:1810.01609 [pdf, other]

doi 10.1109/SC.2018.00058

Simulating the weak death of the neutron in a femtoscale universe with near-Exascale computing

Authors: Evan Berkowitz, M. A. Clark, Arjun Gambhir, Ken McElvain, Amy Nicholson, Enrico Rinaldi, Pavlos Vranas, André Walker-Loud, Chia Cheng Chang, Bálint Joó, Thorsten Kurth, Kostas Orginos

Abstract: The fundamental particle theory called Quantum Chromodynamics (QCD) dictates everything about protons and neutrons, from their intrinsic properties to interactions that bind them into atomic nuclei. Quantities that cannot be fully resolved through experiment, such as the neutron lifetime (whose precise value is important for the existence of light-atomic elements that make the sun shine and life possible), may be understood through numerical solutions to QCD. We directly solve QCD using Lattice Gauge Theory and calculate nuclear observables such as neutron lifetime. We have developed an improved algorithm that exponentially decreases the time-to-solution and applied it on the new CORAL supercomputers, Sierra and Summit. We use run-time autotuning to distribute GPU resources, achieving 20% performance at low node count. We also developed optimal application mapping through a job manager, which allows CPU and GPU jobs to be interleaved, yielding 15% of peak performance when deployed across large fractions of CORAL.

Submitted 10 October, 2018; v1 submitted 3 October, 2018; originally announced October 2018.

Comments: 2018 Gordon Bell Finalist: 9 pages, 9 figures; v2: fixed 2 typos and appended acknowledgements

Report number: LLNL-JRNL-749850, RIKEN-iTHEMS-Report-18 ACM Class: C.1.4; D.1.3

Journal ref: Supercomputing 2018, pp. 697-705

arXiv:1805.12130 [pdf, other]

doi 10.1038/s41586-018-0161-8

A percent-level determination of the nucleon axial coupling from Quantum Chromodynamics

Authors: Chia Cheng Chang, Amy Nicholson, Enrico Rinaldi, Evan Berkowitz, Nicolas Garron, David A. Brantley, Henry Monge-Camacho, Christopher J. Monahan, Chris Bouchard, M. A. Clark, Bálint Joó, Thorsten Kurth, Kostas Orginos, Pavlos Vranas, André Walker-Loud

Abstract: The $\textit{axial coupling of the nucleon}$, $g_A$, is the strength of its coupling to the $\textit{weak}$ axial current of the Standard Model of particle physics, in much the same way as the electric charge is the strength of the coupling to the electromagnetic current. This axial coupling dictates the rate at which neutrons decay to protons, the strength of the attractive long-range force between nucleons and other features of nuclear physics. Precision tests of the Standard Model in nuclear environments require a quantitative understanding of nuclear physics rooted in Quantum Chromodynamics, a pillar of the Standard Model. The prominence of $g_A$ makes it a benchmark quantity to determine theoretically - a difficult task because quantum chromodynamics is non-perturbative, precluding known analytical methods. Lattice Quantum Chromodynamics provides a rigorous, non-perturbative definition of quantum chromodynamics that can be implemented numerically. It has been estimated that a precision of two percent would be possible by 2020 if two challenges are overcome: contamination of $g_A$ from excited states must be controlled in the calculations and statistical precision must be improved markedly. Here we report a calculation of $g_A^{QCD} = 1.271\pm0.013$, using an unconventional method inspired by the Feynman-Hellmann theorem that overcomes these challenges.

Submitted 30 May, 2018; originally announced May 2018.

Comments: Published in Nature. 46 pages total: Main text 4 pages, Extended Data 8 pages, Supplemental 34 pages. Supporting data and code at https://github.com/callat-qcd/project_gA or https://zenodo.org/record/1241374

Report number: BNL-203631-2018-JAAM, INT-PUB-18-021, LLNL-JRNL-747003, RBRC-1283, RIKEN-iTHEMS-Report-18, LTH 1166

Journal ref: Nature 558, 91-94 (2018)

arXiv:1805.02634 [pdf, other]

doi 10.1103/PhysRevLett.121.172501

Heavy physics contributions to neutrinoless double beta decay from QCD

Authors: A. Nicholson, E. Berkowitz, H. Monge-Camacho, D. Brantley, N. Garron, C. C. Chang, E. Rinaldi, M. A. Clark, B. Joo, T. Kurth, B. Tiburzi, P. Vranas, A. Walker-Loud

Abstract: Observation of neutrinoless double beta decay, a lepton number violating process that has been proposed to clarify the nature of neutrino masses, has spawned an enormous world-wide experimental effort. Relating nuclear decay rates to high-energy, beyond the Standard Model (BSM) physics requires detailed knowledge of non-perturbative QCD effects. Using lattice QCD, we compute the necessary matrix elements of short-range operators, which arise due to heavy BSM mediators, that contribute to this decay via the leading order $π^- \to π^+$ exchange diagrams. Utilizing our result and taking advantage of effective field theory methods will allow for model-independent calculations of the relevant two-nucleon decay, which may then be used as input for nuclear many-body calculations of the relevant experimental decays. Contributions from short-range operators may prove to be equally important to, or even more important than, those from long-range Majorana neutrino exchange.

Submitted 1 November, 2018; v1 submitted 7 May, 2018; originally announced May 2018.

Comments: Published version. Corrected missing term in chiral expansion, added supplemental material, updated references. The Jupyter notebook (DOI:10.5281/zenodo.1243313) accompanying this work can be found on github https://github.com/callat-qcd/project_0vbb

Report number: LLNL-JRNL-751220, RBRC-1266, RIKEN-iTHEMS-Report-18, BNL-209118-2018-JAAM

Journal ref: Phys. Rev. Lett. 121, 172501 (2018)

arXiv:1801.07823 [pdf, other]

doi 10.1103/PhysRevD.97.114513

Multigrid for Staggered Lattice Fermions

Authors: Richard C. Brower, M. A. Clark, Alexei Strelchenko, Evan Weinberg

Abstract: Critical slowing down in Krylov methods for the Dirac operator presents a major obstacle to further advances in lattice field theory as it approaches the continuum solution. Here we formulate a multi-grid algorithm for the Kogut-Susskind (or staggered) fermion discretization which has proven difficult relative to Wilson multigrid due to its first-order anti-Hermitian structure. The solution is to introduce a novel spectral transformation by the Kähler-Dirac spin structure prior to the Galerkin projection. We present numerical results for the two-dimensional, two-flavor Schwinger model, however, the general formalism is agnostic to dimension and is directly applicable to four-dimensional lattice QCD.

Submitted 23 January, 2018; originally announced January 2018.

Comments: 48 pages, 37 figures

Journal ref: Phys. Rev. D 97, 114513 (2018)

arXiv:1710.09745 [pdf, other]

doi 10.1016/j.cpc.2018.06.019

Pushing Memory Bandwidth Limitations Through Efficient Implementations of Block-Krylov Space Solvers on GPUs

Authors: M. A. Clark, Alexei Strelchenko, Alejandro Vaquero, Mathias Wagner, Evan Weinberg

Abstract: Lattice quantum chromodynamics simulations in nuclear physics have benefited from a tremendous number of algorithmic advances such as multigrid and eigenvector deflation. These improve the time to solution but do not alleviate the intrinsic memory-bandwidth constraints of the matrix-vector operation dominating iterative solvers. Batching this operation for multiple vectors and exploiting cache and register blocking can yield a super-linear speed up. Block-Krylov solvers can naturally take advantage of such batched matrix-vector operations, further reducing the iterations to solution by sharing the Krylov space between solves. However, practical implementations typically suffer from the quadratic scaling in the number of vector-vector operations. Using the QUDA library, we present an implementation of a block-CG solver on NVIDIA GPUs which reduces the memory-bandwidth complexity of vector-vector operations from quadratic to linear. We present results for the HISQ discretization, showing a 5x speedup compared to highly-optimized independent Krylov solves on NVIDIA's SaturnV cluster.

Submitted 7 August, 2018; v1 submitted 26 October, 2017; originally announced October 2017.

Comments: 15 pages, 14 figures, in press

Report number: FERMILAB-PUB-17-592-CD

Journal ref: Comp. Phys. Comm. 2018

arXiv:1710.09409 [pdf, other]

doi 10.1051/epjconf/201817509006

Performance Portability Strategies for Grid C++ Expression Templates

Authors: Peter A. Boyle, M. A. Clark, Carleton DeTar, Meifeng Lin, Verinder Rana, Alejandro Vaquero Avilés-Casco

Abstract: One of the key requirements for the Lattice QCD Application Development as part of the US Exascale Computing Project is performance portability across multiple architectures. Using the Grid C++ expression template as a starting point, we report on the progress made with regards to the Grid GPU offloading strategies. We present both the successes and issues encountered in using CUDA, OpenACC and Just-In-Time compilation. Experimentation and performance on GPUs with a SU(3)$\times$SU(3) streaming test will be reported. We will also report on the challenges of using current OpenMP 4.x for GPU offloading in the same code.

Submitted 25 October, 2017; originally announced October 2017.

Comments: 8 pages, 4 figures. Talk presented at the 35th International Symposium on Lattice Field Theory, 18-24 June 2017, Granada, Spain

arXiv:1710.06884 [pdf, other]

doi 10.1051/epjconf/201817514023

Multi-Grid Lanczos

Authors: M. A. Clark, Chulwoo Jung, Christoph Lehner

Abstract: We present a Lanczos algorithm utilizing multiple grids that reduces the memory requirements both on disk and in working memory by one order of magnitude for RBC/UKQCD's 48I and 64I ensembles at the physical pion mass. The precision of the resulting eigenvectors is on par with exact deflation.

Submitted 18 October, 2017; originally announced October 2017.

Comments: 6 pages, 7 figures; Talk given at the 35th International Symposium on Lattice Field Theory, 18-24 June 2017, Granada, Spain

arXiv:1710.06523 [pdf, other]

Nucleon axial coupling from Lattice QCD

Authors: Chia Cheng Chang, Amy Nicholson, Enrico Rinaldi, Evan Berkowitz, Nicolas Garron, David Brantley, Henry Monge-Camacho, Chris Monahan, Chris Bouchard, M. A. Clark, Balint Joo, Thorsten Kurth, Kostas Orginos, Pavlos Vranas, Andre Walker-Loud

Abstract: We present state-of-the-art results from a lattice QCD calculation of the nucleon axial coupling, $g_A$, using Möbius Domain-Wall fermions solved on the dynamical $N_f = 2 + 1 + 1$ HISQ ensembles after they are smeared using the gradient-flow algorithm. Relevant three-point correlation functions are calculated using a method inspired by the Feynman-Hellmann theorem, and demonstrate significant improvement in signal for fixed stochastic samples. The calculation is performed at five pion masses of $m_π\sim \{400, 350, 310, 220, 130\}$~MeV, three lattice spacings of $a\sim\{0.15, 0.12, 0.09\}$~fm, and we do a dedicated volume study with $m_πL\sim\{3.22, 4.29, 5.36\}$. Control over all relevant sources of systematic uncertainty is demonstrated and quantified. We achieve a preliminary value of $g_A = 1.285(17)$, with a relative uncertainty of 1.33\%.

Submitted 17 October, 2017; originally announced October 2017.

Comments: 18 pages, 8 figures, Lattice 2017 Proceedings

Report number: INT-PUB-18-022

arXiv:1710.05642 [pdf, other]

doi 10.1051/epjconf/201817505029

Calm Multi-Baryon Operators

Authors: Evan Berkowitz, Amy Nicholson, Chia Cheng Chang, Enrico Rinaldi, M. A. Clark, Bálint Joó, Thorsten Kurth, Pavlos Vranas, André Walker-Loud

Abstract: Outstanding problems in nuclear physics require input and guidance from lattice QCD calculations of few-baryon systems. However, these calculations suffer from an exponentially bad signal-to-noise problem which has prevented a controlled extrapolation to the physical point. The variational method has been applied very successfully to two-meson systems, allowing for the extraction of the two-meson states very early in Euclidean time through the use of improved single hadron operators. The sheer numerical cost of using the same techniques in two-baryon systems has been prohibitive. We present an alternate strategy which offers some of the same advantages as the variational method while being significantly less numerically expensive. We first use the Matrix Prony method to form an optimal linear combination of single baryon interpolating fields generated from the same source and different sink interpolators. Very early in Euclidean time this linear combination is numerically free of excited state contamination, so we coin it a calm baryon. This calm baryon operator is then used in the construction of the two-baryon correlation functions. To test this method, we perform calculations on the WM/JLab iso-clover gauge configurations at the SU(3) flavor symmetric point with $m_π \sim 800$ MeV, the same configurations we have previously used for the calculation of two-nucleon correlation functions. We observe the calm baryon removes the excited state contamination from the two-nucleon correlation function to as early a time as the single-nucleon is improved, provided non-local (displaced nucleon) sources are used. For the local two-nucleon correlation function (where both nucleons are created from the same space-time location) there is still improvement, but there is significant excited state contamination in the region the single calm baryon displays no excited state contamination.

Submitted 16 October, 2017; originally announced October 2017.

Comments: 8 pages, 3 figures, proceedings for LATTICE 2017

arXiv:1704.01114 [pdf, other]

An accurate calculation of the nucleon axial charge with lattice QCD

Authors: Evan Berkowitz, David Brantley, Chris Bouchard, Chia Cheng Chang, M. A. Clark, Nicholas Garron, Balint Joo, Thorsten Kurth, Chris Monahan, Henry Monge-Camacho, Amy Nicholson, Kostas Orginos, Enrico Rinaldi, Pavlos Vranas, Andre Walker-Loud

Abstract: We report on a lattice QCD calculation of the nucleon axial charge, $g_A$, using Möbius Domain-Wall fermions solved on the dynamical $N_f=2+1+1$ HISQ ensembles after they are smeared using the gradient-flow algorithm. The calculation is performed with three pion masses, $m_π\sim\{310,220,130\}$ MeV. Three lattice spacings ($a\sim\{0.15,0.12,0.09\}$ fm) are used with the heaviest pion mass, while the coarsest two spacings are used on the middle pion mass and only the coarsest spacing is used with the near physical pion mass. On the $m_π\sim220$ MeV, $a\sim0.12$ fm point, a dedicated volume study is performed with $m_πL \sim \{3.22,4.29,5.36\}$. Using a new strategy motivated by the Feynman-Hellmann Theorem, we achieve a precise determination of $g_A$ with relatively low statistics, and demonstrable control over the excited state, continuum, infinite volume and chiral extrapolation systematic uncertainties, the latter of which remains the dominant uncertainty. Our final determination at 2.6\% total uncertainty is $g_A = 1.278(21)(26)$, with the first uncertainty including statistical and systematic uncertainties from fitting and the second including model selection systematics related to the chiral and continuum extrapolation. The largest reduction of the second uncertainty will come from a greater number of pion mass points as well as more precise lattice QCD results near the physical pion mass.

Submitted 4 April, 2017; originally announced April 2017.

Comments: 17 pages + 11 pages of references and appendices. 15 figures. Interested readers can download the Python analysis scripts and an hdf5 data file at https://github.com/callat-qcd/project_gA_v0

arXiv:1701.07559 [pdf, other]

doi 10.1103/PhysRevD.96.054513

Möbius domain-wall fermions on gradient-flowed dynamical HISQ ensembles

Authors: Evan Berkowitz, Chris Bouchard, Chia Cheng Chang, M. A. Clark, Balint Joo, Thorsten Kurth, Christopher Monahan, Amy Nicholson, Kostas Orginos, Enrico Rinaldi, Pavlos Vranas, Andre Walker-Loud

Abstract: We report on salient features of a mixed lattice QCD action using valence Möbius domain-wall fermions solved on the dynamical $N_f=2+1+1$ HISQ ensembles generated by the MILC Collaboration. The approximate chiral symmetry properties of the valence fermions are shown to be significantly improved by utilizing the gradient-flow scheme to first smear the HISQ configurations. The greater numerical cost of the Möbius domain-wall inversions is mitigated by the highly efficient QUDA library optimized for NVIDIA GPU accelerated compute nodes. We have created an interface to this optimized QUDA solver in Chroma. We provide tuned parameters of the action and performance of QUDA using ensembles with the lattice spacings $a \simeq \{0.15, 0.12, 0.09\}$ fm and pion masses $m_π\simeq \{310, 220,130\}$ MeV. We have additionally generated two new ensembles with $a\sim0.12$ fm and $m_π\sim\{400, 350\}$ MeV. With a fixed flow-time of $t_{gf}=1$ in lattice units, the residual chiral symmetry breaking of the valence fermions is kept below 10\% of the light quark mass on all ensembles, $m_{res} \lesssim 0.1\times m_l$, with moderate values of the fifth dimension $L_5$ and a domain-wall height $M_5 \leq 1.3$. As a benchmark calculation, we perform a continuum, infinite volume, physical pion and kaon mass extrapolation of $F_{K^\pm}/F_{π^\pm}$ and demonstrate our results are independent of flow-time, and consistent with the FLAG determination of this quantity at the level of less than one standard deviation.

Submitted 21 September, 2017; v1 submitted 25 January, 2017; originally announced January 2017.

Comments: 14 pages + refs, 5 figures; v3 is version accepted for publication; v2 includes continuum chiral extrapolation analysis of FK/Fpi and details of two new HISQ ensembles generated

Report number: LLNL-JRNL-719521, RBRC-1227

Journal ref: Phys. Rev. D 96, 054513 (2017)

arXiv:1612.07873 [pdf, other]

Accelerating Lattice QCD Multigrid on GPUs Using Fine-Grained Parallelization

Authors: M. A. Clark, Bálint Joó, Alexei Strelchenko, Michael Cheng, Arjun Gambhir, Richard Brower

Abstract: The past decade has witnessed a dramatic acceleration of lattice quantum chromodynamics calculations in nuclear and particle physics. This has been due to both significant progress in accelerating the iterative linear solvers using multi-grid algorithms, and due to the throughput improvements brought by GPUs. Deploying hierarchical algorithms optimally on GPUs is non-trivial owing to the lack of parallelism on the coarse grids, and as such, these advances have not proved multiplicative. Using the QUDA library, we demonstrate that by exposing all sources of parallelism that the underlying stencil problem possesses, and through appropriate mapping of this parallelism to the GPU architecture, we can achieve high efficiency even for the coarsest of grids. Results are presented for the Wilson-Clover discretization, where we demonstrate up to 10x speedup over present state-of-the-art GPU-accelerated methods on Titan. Finally, we look to the future, and consider the software implications of our findings.

Submitted 22 December, 2016; originally announced December 2016.

Comments: http://dl.acm.org/citation.cfm?id=3014904.3014995

Journal ref: Proceedings of the International Conference for High Performance Computing, Networking, Storage and Analysis (SC '16), Article 68 (November, 2016)

arXiv:1608.04793 [pdf, other]

Neutrinoless double beta decay from lattice QCD

Authors: Amy Nicholson, Evan Berkowitz, Chia Cheng Chang, M. A. Clark, Balint Joo, Thorsten Kurth, Enrico Rinaldi, Brian Tiburzi, Pavlos Vranas, Andre Walker-Loud

Abstract: While the discovery of non-zero neutrino masses is one of the most important accomplishments by physicists in the past century, it is still unknown how and in what form these masses arise. Lepton number-violating neutrinoless double beta decay is a natural consequence of Majorana neutrinos and many BSM theories, and many experimental efforts are involved in the search for these processes. Understanding how neutrinoless double beta decay would manifest in nuclear environments is key for understanding any observed signals. In these proceedings we present an overview of a set of one- and two-body matrix elements relevant for experimental searches for neutrinoless double beta decay, describe the role of lattice QCD calculations, and present preliminary lattice QCD results.

Submitted 16 August, 2016; originally announced August 2016.

Comments: Plenary talk given at the 34th International Symposium on Lattice Field Theory, Southampton, UK, 24-30 July 2016

Report number: LLNL-PROC-700398

arXiv:1408.5925 [pdf, other]

doi 10.1109/IPDPS.2014.112

A Framework for Lattice QCD Calculations on GPUs

Authors: F. T. Winter, M. A. Clark, R. G. Edwards, B. Joó

Abstract: Computing platforms equipped with accelerators like GPUs have proven to provide great computational power. However, exploiting such platforms for existing scientific applications is not a trivial task. Current GPU programming frameworks such as CUDA C/C++ require low-level programming from the developer in order to achieve high performance code. As a result porting of applications to GPUs is typically limited to time-dominant algorithms and routines, leaving the remainder not accelerated which can open a serious Amdahl's law issue. The lattice QCD application Chroma allows one to explore a different porting strategy. The layered structure of the software architecture logically separates the data-parallel from the application layer. The QCD Data-Parallel software layer provides data types and expressions with stencil-like operations suitable for lattice field theory and Chroma implements algorithms in terms of this high-level interface. Thus by porting the low-level layer one can effectively move the whole application in one swing to a different platform. The QDP-JIT/PTX library, the reimplementation of the low-level layer, provides a framework for lattice QCD calculations for the CUDA architecture. The complete software interface is supported and thus applications can be run unaltered on GPU-based parallel computers. This reimplementation was possible due to the availability of a JIT compiler (part of the NVIDIA Linux kernel driver) which translates an assembly-like language (PTX) to GPU code. The expression template technique is used to build PTX code generators and a software cache manages the GPU memory. This reimplementation allows us to deploy an efficient implementation of the full gauge-generation program with dynamical fermions on large-scale GPU-based machines such as Titan and Blue Waters which accelerates the algorithm by more than an order of magnitude.

Submitted 25 August, 2014; originally announced August 2014.

Comments: 10 pages, 6 figures, as published in the proceedings of IPDPS '14

arXiv:1210.6600 [pdf, ps, other]

doi 10.1103/PhysRevD.87.034511

Shadow Hamiltonians, Poisson Brackets, and Gauge Theories

Authors: A. D. Kennedy, P. J. Silva, M. A. Clark

Abstract: Numerical lattice gauge theory computations to generate gauge field configurations including the effects of dynamical fermions are usually carried out using algorithms that require the molecular dynamics evolution of gauge fields using symplectic integrators. Sophisticated integrators are in common use but are hard to optimise, and force-gradient integrators show promise especially for large lattice volumes. We explain why symplectic integrators lead to very efficient Monte Carlo algorithms because they exactly conserve a shadow Hamiltonian. The shadow Hamiltonian may be expanded in terms of Poisson brackets, and can be used to optimize the integrators. We show how this may be done for gauge theories by extending the formulation of Hamiltonian mechanics on Lie groups to include Poisson brackets and shadows, and by giving a general method for the practical computation of forces, force-gradients, and Poisson brackets for gauge theories.

Submitted 24 October, 2012; originally announced October 2012.
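
The statement that a symplectic integrator exactly conserves a shadow Hamiltonian can be checked by hand in a toy model. The sketch below is not taken from the paper: it assumes a one-dimensional harmonic oscillator with H = p^2/2 + q^2/2 instead of a gauge theory, and the function names are illustrative. For this model the leapfrog scheme with step size h is a linear map, and the quadratic form p^2/2 + (1 - h^2/4) q^2/2 is invariant under it, so it plays the role of the shadow Hamiltonian in closed form.

```python
def leapfrog(q, p, h):
    """One leapfrog (kick-drift-kick) step for H = p^2/2 + q^2/2."""
    p -= 0.5 * h * q   # half kick: force is -dV/dq = -q
    q += h * p         # full drift
    p -= 0.5 * h * q   # half kick
    return q, p


def energy(q, p):
    """The Hamiltonian we set out to integrate."""
    return 0.5 * p * p + 0.5 * q * q


def shadow_energy(q, p, h):
    """Closed-form shadow Hamiltonian of leapfrog for this toy model only."""
    return 0.5 * p * p + 0.5 * (1.0 - 0.25 * h * h) * q * q


def max_drift(h=0.5, steps=1000, q=1.0, p=0.0):
    """Largest deviation of H and of the shadow H along a trajectory."""
    h0 = energy(q, p)
    s0 = shadow_energy(q, p, h)
    drift_h = 0.0
    drift_s = 0.0
    for _ in range(steps):
        q, p = leapfrog(q, p, h)
        drift_h = max(drift_h, abs(energy(q, p) - h0))
        drift_s = max(drift_s, abs(shadow_energy(q, p, h) - s0))
    return drift_h, drift_s


if __name__ == "__main__":
    drift_h, drift_s = max_drift()
    print("max |H - H0|           =", drift_h)
    print("max |shadow - shadow0| =", drift_s)
```

In this toy model the deviation of H is of order h^2 and does not grow with trajectory length, while the shadow quantity is conserved up to floating-point round-off. For a general Hamiltonian the shadow is only available as an expansion in Poisson brackets, as the abstract describes.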

arXiv:1205.2933 [pdf, other]

Multigrid Algorithms for Domain-Wall Fermions

Authors: Saul D. Cohen, R. C. Brower, M. A. Clark, J. C. Osborn

Abstract: We describe an adaptive multigrid algorithm for solving inverses of the domain-wall fermion operator. Our multigrid algorithm uses an adaptive projection of near-null vectors of the domain-wall operator onto coarser four-dimensional lattices. This extension of multigrid techniques to a chiral fermion action will greatly reduce overall computation cost, and the elimination of the fifth dimension in the coarse space reduces the relative cost of using chiral fermions compared to discarding this symmetry. We demonstrate near-elimination of critical slowing as the quark mass is reduced and small volume dependence, which may be suppressed by taking advantage of the recursive nature of the algorithm.

Submitted 13 May, 2012; originally announced May 2012.

Comments: 7 pages, 3 figures. Proceedings of the XXIX International Symposium on Lattice Field Theory - Lattice 2011, July 10-16, 2011, Squaw Valley, Lake Tahoe, California

Journal ref: PoS LATTICE2011, 030 (2011)

arXiv:1201.3977 [pdf, other]

doi 10.1103/PhysRevD.85.074505

WW Scattering Parameters via Pseudoscalar Phase Shifts

Authors: Thomas Appelquist, Ron Babich, Richard C. Brower, Michael I. Buchoff, Michael Cheng, Michael A. Clark, Saul D. Cohen, George T. Fleming, Joe Kiskis, Meifeng Lin, Ethan T. Neil, James C. Osborn, Claudio Rebbi, David Schaich, Sergey Syritsyn, Gennady Voronov, Pavlos Vranas, Joseph Wasem

Abstract: Using domain-wall lattice simulations, we study pseudoscalar-pseudoscalar scattering in the maximal isospin channel for an SU(3) gauge theory with two and six fermion flavors in the fundamental representation. This calculation of the S-wave scattering length is related to the next-to-leading order corrections to WW scattering through the low-energy coefficients of the chiral Lagrangian. While two and six flavor scattering lengths are similar for a fixed ratio of the pseudoscalar mass to its decay constant, six-flavor scattering shows a somewhat less repulsive next-to-leading order interaction than its two-flavor counterpart. Estimates are made for the WW scattering parameters and the plausibility of detection is discussed.

Submitted 19 January, 2012; originally announced January 2012.

Comments: 8 pages, 6 figures

Report number: LLNL-JRNL-499587; FERMILAB-PUB-12-012-T

arXiv:1109.2935 [pdf, other]

doi 10.1145/2063384.2063478

Scaling Lattice QCD beyond 100 GPUs

Authors: R. Babich, M. A. Clark, B. Joó, G. Shi, R. C. Brower, S. Gottlieb

Abstract: Over the past five years, graphics processing units (GPUs) have had a transformational effect on numerical lattice quantum chromodynamics (LQCD) calculations in nuclear and particle physics. While GPUs have been applied with great success to the post-Monte Carlo "analysis" phase which accounts for a substantial fraction of the workload in a typical LQCD calculation, the initial Monte Carlo "gauge field generation" phase requires capability-level supercomputing, corresponding to O(100) GPUs or more. Such strong scaling has not been previously achieved. In this contribution, we demonstrate that using a multi-dimensional parallelization strategy and a domain-decomposed preconditioner allows us to scale into this regime. We present results for two popular discretizations of the Dirac operator, Wilson-clover and improved staggered, employing up to 256 GPUs on the Edge cluster at Lawrence Livermore National Laboratory.

Submitted 13 September, 2011; originally announced September 2011.

Comments: 11 pages, 10 figures, to appear in the proceedings of the 2011 ACM/IEEE International Conference for High Performance Computing, Networking, Storage and Analysis (SC'11)

arXiv:1108.1828 [pdf, ps, other]

doi 10.1103/PhysRevD.84.071502

Improving dynamical lattice QCD simulations through integrator tuning using Poisson brackets and a force-gradient integrator

Authors: M. A. Clark, Bálint Joó, A. D. Kennedy, P. J. Silva

Abstract: We show how the integrators used for the molecular dynamics step of the Hybrid Monte Carlo algorithm can be further improved. These integrators not only approximately conserve some Hamiltonian $H$ but conserve exactly a nearby shadow Hamiltonian $\tilde{H}$. This property allows for a new tuning method of the molecular dynamics integrator and also allows for a new class of integrators (force-gradient integrators) which is expected to reduce significantly the computational cost of future large-scale gauge field ensemble generation.

Submitted 20 September, 2011; v1 submitted 8 August, 2011; originally announced August 2011.

Comments: 5 pages, 1 figure; minor changes

arXiv:1012.0562 [pdf, ps, other]

doi 10.1103/PhysRevD.85.054510

Exploring strange nucleon form factors on the lattice

Authors: Ronald Babich, Richard C. Brower, Michael A. Clark, George T. Fleming, James C. Osborn, Claudio Rebbi, David Schaich

Abstract: We discuss techniques for evaluating sea quark contributions to hadronic form factors on the lattice and apply these to an exploratory calculation of the strange electromagnetic, axial, and scalar form factors of the nucleon. We employ the Wilson gauge and fermion actions on an anisotropic 24^3 x 64 lattice, probing a range of momentum transfer with Q^2 < 1 GeV^2. The strange electric and magnetic form factors, G_E^s(Q^2) and G_M^s(Q^2), are found to be small and consistent with zero within the statistics of our calculation. The lattice data favor a small negative value for the strange axial form factor G_A^s(Q^2) and exhibit a strong signal for the bare strange scalar matrix element <N|ss|N>_0. We discuss the unique systematic uncertainties affecting the latter quantity relative to the continuum, as well as prospects for improving future determinations with Wilson-like fermions.

Submitted 22 May, 2012; v1 submitted 2 December, 2010; originally announced December 2010.

Comments: 19 pages, 11 figures; v2 includes additional references; v3 as appears in PRD

Journal ref: Phys. Rev. D 85, 054510 (2012)

arXiv:1011.2775 [pdf, ps, other]

Multigrid solver for clover fermions

Authors: J. C. Osborn, R. Babich, J. Brannick, R. C. Brower, M. A. Clark, S. D. Cohen, C. Rebbi

Abstract: We present an adaptive multigrid Dirac solver developed for Wilson clover fermions which offers order-of-magnitude reductions in solution time compared to conventional Krylov solvers. The solver incorporates even-odd preconditioning and mixed precision to solve the Dirac equation to double precision accuracy and shows only a mild increase in time to solution for decreasing quark mass. We show actual time to solution on production lattices in comparison to conventional Krylov solvers and will also discuss the setup process and its relative cost to the total solution time.

Submitted 11 November, 2010; originally announced November 2010.

Comments: 7 pages, 8 figures, talk presented at the XXVIII International Symposium on Lattice Field Theory, June 14-19 2010, Villasimius, Italy

Journal ref: PoS Lattice2010:037,2010

arXiv:1011.0230 [pdf, ps, other]

Better HMC integrators for dynamical simulations

Authors: M. A. Clark, Balint Joo, A. D. Kennedy, P. J. Silva

Abstract: We show how to improve the molecular dynamics step of Hybrid Monte Carlo, both by tuning the integrator using Poisson brackets measurements and by the use of force gradient integrators. We present results for moderate lattice sizes.

Submitted 31 October, 2010; originally announced November 2010.

Comments: 6 pages, 1 figure, poster presented at Lattice 2010 (Algorithms and Machines)

Journal ref: PoS Lattice2010:323,2010

arXiv:1011.0024 [pdf, other]

doi 10.1109/SC.2010.40

Parallelizing the QUDA Library for Multi-GPU Calculations in Lattice Quantum Chromodynamics

Authors: Ronald Babich, Michael A. Clark, Bálint Joó

Abstract: Graphics Processing Units (GPUs) are having a transformational effect on numerical lattice quantum chromodynamics (LQCD) calculations of importance in nuclear and particle physics. The QUDA library provides a package of mixed precision sparse matrix linear solvers for LQCD applications, supporting single GPUs based on NVIDIA's Compute Unified Device Architecture (CUDA). This library, interfaced to the QDP++/Chroma framework for LQCD calculations, is currently in production use on the "9g" cluster at the Jefferson Laboratory, enabling unprecedented price/performance for a range of problems in LQCD. Nevertheless, memory constraints on current GPU devices limit the problem sizes that can be tackled. In this contribution we describe the parallelization of the QUDA library onto multiple GPUs using MPI, including strategies for the overlapping of communication and computation. We report on both weak and strong scaling for up to 32 GPUs interconnected by InfiniBand, on which we sustain in excess of 4 Tflops.

Submitted 29 October, 2010; originally announced November 2010.

Comments: 11 pages, 7 figures, to appear in the Proceedings of Supercomputing 2010 (submitted April 12, 2010)

arXiv:1009.5967 [pdf, other]

doi 10.1103/PhysRevLett.106.231601

Parity Doubling and the S Parameter Below the Conformal Window

Authors: Thomas Appelquist, Ron Babich, Richard C. Brower, Michael Cheng, Michael A. Clark, Saul D. Cohen, George T. Fleming, Joe Kiskis, Meifeng Lin, Ethan T. Neil, James C. Osborn, Claudio Rebbi, David Schaich, Pavlos Vranas

Abstract: We describe a lattice simulation of the masses and decay constants of the lowest-lying vector and axial resonances, and the electroweak S parameter, in an SU(3) gauge theory with $N_f = 2$ and 6 fermions in the fundamental representation. The spectrum becomes more parity doubled and the S parameter per electroweak doublet decreases when $N_f$ is increased from 2 to 6, motivating study of these trends as $N_f$ is increased further, toward the critical value for transition from confinement to infrared conformality.

Submitted 29 September, 2010; originally announced September 2010.

Comments: 4 pages, 5 figures; to be submitted to PRL

Journal ref: Phys.Rev.Lett.106:231601,2011

arXiv:1005.3043 [pdf, ps, other]

doi 10.1103/PhysRevLett.105.201602

Adaptive multigrid algorithm for the lattice Wilson-Dirac operator

Authors: R. Babich, J. Brannick, R. C. Brower, M. A. Clark, T. A. Manteuffel, S. F. McCormick, J. C. Osborn, C. Rebbi

Abstract: We present an adaptive multigrid solver for application to the non-Hermitian Wilson-Dirac system of QCD. The key components leading to the success of our proposed algorithm are the use of an adaptive projection onto coarse grids that preserves the near null space of the system matrix together with a simplified form of the correction based on the so-called gamma_5-Hermitian symmetry of the Dirac operator. We demonstrate that the algorithm nearly eliminates critical slowing down in the chiral limit and that it has weak dependence on the lattice volume.

Submitted 21 June, 2010; v1 submitted 17 May, 2010; originally announced May 2010.

Journal ref: Phys.Rev.Lett.105:201602,2010

arXiv:1002.3777 [pdf, other]

Lattice study of ChPT beyond QCD

Authors: Ethan T. Neil, Adam Avakian, Ron Babich, Richard C. Brower, Michael Cheng, Michael A. Clark, Saul D. Cohen, George T. Fleming, Joseph Kiskis, James C. Osborn, Claudio Rebbi, David Schaich, Pavlos Vranas

Abstract: We describe initial results by the Lattice Strong Dynamics (LSD) collaboration of a study into the variation of chiral properties of SU(3) Yang-Mills gauge theory as the number of massless flavors changes from $N_f = 2$ to $N_f = 6$, with a focus on the use of chiral perturbation theory.

Submitted 3 May, 2010; v1 submitted 19 February, 2010; originally announced February 2010.

Comments: 9 pages, 3 figures. Presented at the 6th International Workshop on Chiral Dynamics, University of Bern, Switzerland, July 6-10 2009

Journal ref: PoS CD09:088,2009

arXiv:0912.2268 [pdf, other]

QCD on GPUs: cost effective supercomputing

Authors: M. A. Clark

Abstract: The exponential growth of floating point power in graphics processing units (GPUs), together with their low cost, has given rise to an attractive platform upon which to deploy lattice QCD calculations. GPUs are essentially many (O(100)) core chips, that are programmed using a massively threaded environment, and so are representative of the future of high performance computing (HPC). The large ratio of raw floating point operations per second to memory bandwidth that is characteristic of GPUs necessitates that unique algorithmic design choices are made to harness their full potential. We review the progress to date in using GPUs for large scale calculations, and contrast GPUs against more traditional HPC architectures.

Submitted 20 December, 2009; v1 submitted 11 December, 2009; originally announced December 2009.

Comments: 14 pages, 5 figures, Lattice 2009 plenary talk

Journal ref: PoS LAT2009:003,2009

arXiv:0912.2186 [pdf, other]

The role of multigrid algorithms for LQCD

Authors: Ronald Babich, James Brannick, Richard C. Brower, Michael A. Clark, Saul D. Cohen, James C. Osborn, Claudio Rebbi

Abstract: We report on the first successful QCD multigrid algorithm which demonstrates constant convergence rates independent of quark mass and lattice volume for the Wilson Dirac operator. The new ingredient is the adaptive method for constructing the near null space on which the coarse grid multigrid Dirac operator acts. In addition we speculate on future prospects for extending this algorithm to the Domain Wall and Staggered discretizations, its exceptional suitability for high performance GPU code and its potential impact on simulations at the physical pion mass.

Submitted 11 December, 2009; originally announced December 2009.

Comments: 7 pages, 5 figures, Presented at the XXVII International Symposium on Lattice Field Theory, July 26-31, 2009, Peking University, Beijing, China

Journal ref: PoS LAT2009:031,2009

arXiv:0911.3191 [pdf, ps, other]

doi 10.1016/j.cpc.2010.05.002

Solving Lattice QCD systems of equations using mixed precision solvers on GPUs

Authors: M. A. Clark, R. Babich, K. Barros, R. C. Brower, C. Rebbi

Abstract: Modern graphics hardware is designed for highly parallel numerical tasks and promises significant cost and performance benefits for many scientific applications. One such application is lattice quantum chromodynamics (lattice QCD), where the main computational challenge is to efficiently solve the discretized Dirac equation in the presence of an SU(3) gauge field. Using NVIDIA's CUDA platform we have implemented a Wilson-Dirac sparse matrix-vector product that performs at up to 40 Gflops, 135 Gflops and 212 Gflops for double, single and half precision respectively on NVIDIA's GeForce GTX 280 GPU. We have developed a new mixed precision approach for Krylov solvers using reliable updates which allows for full double precision accuracy while using only single or half precision arithmetic for the bulk of the computation. The resulting BiCGstab and CG solvers run in excess of 100 Gflops and, in terms of iterations until convergence, perform better than the usual defect-correction approach for mixed precision.

Submitted 21 December, 2009; v1 submitted 16 November, 2009; originally announced November 2009.

Comments: 30 pages, 7 figures

Journal ref: Comput.Phys.Commun.181:1517-1528,2010
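
For orientation, the "usual defect-correction approach" that the abstract uses as its baseline can be sketched in a few lines. This is an illustrative sketch only, not the paper's reliable-update scheme and not QUDA code: it assumes a small dense symmetric positive-definite matrix in place of the Dirac operator, emulates single precision by rounding through the standard-library struct module, and every function name here is hypothetical. The outer loop computes the true residual in double precision, and the inner Conjugate Gradient solve for the correction stores its vectors in single precision and runs to a loose tolerance.

```python
import struct


def to_single(x):
    """Round a Python float (double) to the nearest IEEE single-precision value."""
    return struct.unpack("f", struct.pack("f", x))[0]


def round_vec(v):
    return [to_single(x) for x in v]


def matvec(A, v):
    return [sum(a * x for a, x in zip(row, v)) for row in A]


def dot(u, v):
    return sum(a * b for a, b in zip(u, v))


def cg_single(A, b, tol, max_iter=200):
    """CG on A e = b with all stored vectors rounded to single precision."""
    A32 = [round_vec(row) for row in A]
    x = [0.0] * len(b)
    r = round_vec(b)
    p = list(r)
    rr = dot(r, r)
    target = tol * tol * rr
    for _ in range(max_iter):
        if rr <= target:
            break
        Ap = round_vec(matvec(A32, p))
        alpha = rr / dot(p, Ap)
        x = round_vec([xi + alpha * pi for xi, pi in zip(x, p)])
        r = round_vec([ri - alpha * ai for ri, ai in zip(r, Ap)])
        rr_new = dot(r, r)
        p = round_vec([ri + (rr_new / rr) * pi for ri, pi in zip(r, p)])
        rr = rr_new
    return x


def defect_correction(A, b, tol=1e-12, inner_tol=1e-3, max_outer=50):
    """Outer double-precision loop; returns (solution, outer iterations used)."""
    x = [0.0] * len(b)
    b_norm = dot(b, b) ** 0.5
    for outer in range(max_outer):
        # True residual in double precision.
        r = [bi - axi for bi, axi in zip(b, matvec(A, x))]
        if dot(r, r) ** 0.5 <= tol * b_norm:
            return x, outer
        # Low-precision solve for the correction, then accumulate in double.
        e = cg_single(A, r, inner_tol)
        x = [xi + ei for xi, ei in zip(x, e)]
    return x, max_outer


if __name__ == "__main__":
    A = [[4.0, 1.0, 0.0], [1.0, 3.0, 1.0], [0.0, 1.0, 2.0]]
    b = [1.0, 2.0, 3.0]
    x, n_outer = defect_correction(A, b)
    print("solution:", x, "outer iterations:", n_outer)
```

Each outer pass restarts the inner Krylov solve from scratch, discarding the search directions built so far. The abstract reports that the reliable-update approach performs better than this scheme in terms of iterations until convergence.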

arXiv:0910.2950 [pdf, ps, other]

Force Gradient Integrators

Authors: A. D. Kennedy, M. A. Clark, P. J. Silva

Abstract: We present initial results of the use of Force Gradient integrators for lattice field theories. These promise to give significant performance improvements, especially for light fermions and large lattices. Our results show that this is indeed the case, indicating a speed-up of more than a factor of two, which is expected to increase as the integration step size becomes smaller for larger lattices and smaller fermion masses.

Submitted 15 October, 2009; originally announced October 2009.

Comments: 6 pages, 2 figures, talk presented at Lattice 2009 (Algorithms and Machines)

Journal ref: PoS LAT2009:021,2009

arXiv:0910.2224 [pdf, ps, other]

doi 10.1103/PhysRevLett.104.071601

Toward TeV Conformality

Authors: Thomas Appelquist, Adam Avakian, Ron Babich, Richard C. Brower, Michael Cheng, Michael A. Clark, Saul D. Cohen, George T. Fleming, Joseph Kiskis, Ethan T. Neil, James C. Osborn, Claudio Rebbi, David Schaich, Pavlos Vranas

Abstract: We study the chiral condensate $<\bar\psi \psi>$ for an SU(3) gauge theory with $N_f$ massless Dirac fermions in the fundamental representation when $N_f$ is increased from 2 to 6. For $N_f=2$, our lattice simulations of $<\bar\psi \psi>/F^3$, where $F$ is the Nambu-Goldstone-boson decay constant, agree with the measured QCD value. For $N_f = 6$, this ratio shows significant enhancement, presaging an even larger enhancement anticipated as $N_f$ increases further, toward the critical value for transition from confinement to infrared conformality.

Submitted 19 February, 2010; v1 submitted 12 October, 2009; originally announced October 2009.

Comments: 4 pages, 4 figures. v2: revised version for PRL

Journal ref: Phys.Rev.Lett.104:071601,2010

arXiv:0811.4331 [pdf, ps, other]

The removal of critical slowing down

Authors: M. A. Clark, J. Brannick, R. C. Brower, S. F. McCormick, T. A. Manteuffel, J. C. Osborn, C. Rebbi

Abstract: We present promising initial results of our adaptive multigrid solver developed for application directly to the non-Hermitian Wilson-Dirac system in 4 dimensions, as opposed to the solver developed in [1] for the corresponding normal equations. The key behind the success of this algorithm is the use of an adaptive projection onto coarse grids that preserves the near null space of the system matrix. We demonstrate that the resulting algorithm has weak dependence on the gauge coupling and exhibits extremely mild critical slowing down in the chiral limit.

Submitted 1 April, 2010; v1 submitted 26 November, 2008; originally announced November 2008.

Comments: 7 pages, talk given at the XXVI International Symposium on Lattice Field Theory, July 14-19, 2008, Williamsburg, VA, USA

Journal ref: PoS LATTICE2008:035,2008

arXiv:0810.5365 [pdf, ps, other]

Blasting through lattice calculations using CUDA

Authors: Kipton Barros, Ronald Babich, Richard Brower, Michael A. Clark, Claudio Rebbi

Abstract: Modern graphics hardware is designed for highly parallel numerical tasks and provides significant cost and performance benefits. Graphics hardware vendors are now making available development tools to support general purpose high performance computing. Nvidia's CUDA platform, in particular, offers direct access to graphics hardware through a programming language similar to C. Using the CUDA platform we have implemented a Wilson-Dirac operator which runs at an effective 68 Gflops on the Tesla C870. The recently released GeForce GTX 280 runs this same code at 92 Gflops, and we expect further improvement pending code optimization.

Submitted 29 October, 2008; originally announced October 2008.

Comments: 7 pages, 3 figures, presented at the XXVI International Symposium on Lattice Field Theory (Lattice 2008), Williamsburg, Virginia, July 14-19, 2008

Journal ref: PoS LATTICE2008:045,2008

Showing 1–50 of 71 results for author: Clark, M A