-
Subspace recursive Fermi-operator expansion strategies for large-scale DFT eigenvalue problems on HPC architectures
Authors:
Sameer Khadatkar,
Phani Motamarri
Abstract:
Quantum mechanical calculations for material modelling using Kohn-Sham density functional theory (DFT) involve the solution of a nonlinear eigenvalue problem for the $N$ smallest eigenvector-eigenvalue pairs, with $N$ proportional to the number of electrons in the material system. These calculations are computationally demanding and have asymptotic cubic scaling complexity with the number of electrons. Large-scale matrix eigenvalue problems arising from the discretization of the Kohn-Sham DFT equations employing a systematically convergent basis traditionally rely on iterative orthogonal projection methods, which have been shown to be computationally efficient and scalable on massively parallel computing architectures. However, as the size of the material system increases, these methods are known to incur dominant computational costs through the Rayleigh-Ritz projection step of the discretized Kohn-Sham Hamiltonian matrix and the subsequent subspace diagonalization of the projected matrix. This work explores the potential of polynomial expansion approaches based on recursive Fermi-operator expansion as an alternative to the subspace diagonalization of the projected Hamiltonian matrix to reduce the computational cost. Subsequently, we perform a detailed comparison of various recursive polynomial expansion approaches to the traditional approach of explicit diagonalization on both multi-node CPU and GPU architectures and assess their relative performance in terms of accuracy, computational efficiency, scaling behaviour and energy efficiency.
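As a minimal sketch of what a recursive Fermi-operator expansion computes in place of subspace diagonalization, the Python below uses the trace-correcting second-order spectral projection (SP2) recursion, one well-known member of this family, to build the zero-temperature density matrix of a small dense symmetric matrix standing in for the projected Hamiltonian. It assumes a gap above the $N$-th eigenvalue and uses Gershgorin bounds to map the spectrum into $[0, 1]$; the function names are hypothetical, and the paper's own expansion variants, stopping criteria and distributed CPU/GPU implementation are not reproduced here.

```python
def matmul(A, B):
    """Dense product of two square matrices stored as lists of rows."""
    n = len(A)
    return [[sum(A[i][k] * B[k][j] for k in range(n)) for j in range(n)] for i in range(n)]


def trace(A):
    """Sum of the diagonal entries."""
    return sum(A[i][i] for i in range(len(A)))


def gershgorin_bounds(H):
    """Cheap lower and upper bounds on the spectrum of a symmetric matrix."""
    n = len(H)
    radii = [sum(abs(H[i][j]) for j in range(n) if j != i) for i in range(n)]
    return (min(H[i][i] - radii[i] for i in range(n)),
            max(H[i][i] + radii[i] for i in range(n)))


def sp2_density_matrix(H, n_occ, tol=1e-10, max_iter=200):
    """Trace-correcting SP2 recursion: projector onto the n_occ lowest eigenvectors of H.

    Assumes a gap between eigenvalues n_occ and n_occ + 1 (zero electronic temperature)
    and a spectrum that is not a single point (hi > lo).
    """
    n = len(H)
    lo, hi = gershgorin_bounds(H)
    # Map the spectrum into [0, 1], reversed so that low energies land near 1.
    X = [[((hi if i == j else 0.0) - H[i][j]) / (hi - lo) for j in range(n)] for i in range(n)]
    for _ in range(max_iter):
        X2 = matmul(X, X)
        if max(abs(X2[i][j] - X[i][j]) for i in range(n) for j in range(n)) < tol:
            break  # X is idempotent: a projector
        tr_x, tr_x2 = trace(X), trace(X2)
        # x -> x^2 and x -> 2x - x^2 are both increasing on [0, 1]; choose the one
        # that moves the trace closer to the number of occupied states.
        if abs(tr_x2 - n_occ) < abs(2.0 * tr_x - tr_x2 - n_occ):
            X = X2
        else:
            X = [[2.0 * X[i][j] - X2[i][j] for j in range(n)] for i in range(n)]
    return X
```

Only matrix products and traces are needed, which is what makes such recursions attractive on GEMM-friendly hardware compared with a dense eigensolver.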
Submitted 24 September, 2023; v1 submitted 11 January, 2023;
originally announced January 2023.
-
Roadmap on Electronic Structure Codes in the Exascale Era
Authors:
Vikram Gavini,
Stefano Baroni,
Volker Blum,
David R. Bowler,
Alexander Buccheri,
James R. Chelikowsky,
Sambit Das,
William Dawson,
Pietro Delugas,
Mehmet Dogan,
Claudia Draxl,
Giulia Galli,
Luigi Genovese,
Paolo Giannozzi,
Matteo Giantomassi,
Xavier Gonze,
Marco Govoni,
Andris Gulans,
François Gygi,
John M. Herbert,
Sebastian Kokott,
Thomas D. Kühne,
Kai-Hsin Liou,
Tsuyoshi Miyazaki,
Phani Motamarri
, et al. (16 additional authors not shown)
Abstract:
Electronic structure calculations have been instrumental in providing many important insights into a range of physical and chemical properties of various molecular and solid-state systems. Their importance to various fields, including materials science, chemical sciences, computational chemistry and device physics, is underscored by the large fraction of available public supercomputing resources devoted to these calculations. As we enter the exascale era, exciting new opportunities to increase simulation numbers, sizes, and accuracies present themselves. In order to realize these promises, the community of electronic structure software developers will however first have to tackle a number of challenges pertaining to the efficient use of new architectures that will rely heavily on massive parallelism and hardware accelerators. This roadmap provides a broad overview of the state-of-the-art in electronic structure calculations and of the various new directions being pursued by the community. It covers 14 electronic structure codes, presenting their current status, their development priorities over the next five years, and their plans towards tackling the challenges and leveraging the opportunities presented by the advent of exascale computing.
Submitted 26 September, 2022;
originally announced September 2022.
-
Fast hardware-aware matrix-free algorithm for higher-order finite-element discretized matrix multivector products on distributed systems
Authors:
Gourab Panigrahi,
Nikhil Kodali,
Debashis Panda,
Phani Motamarri
Abstract:
Recent hardware-aware matrix-free algorithms for higher-order finite-element (FE) discretized matrix-vector multiplications reduce floating point operations and data access costs compared to traditional sparse matrix approaches. This work proposes efficient matrix-free algorithms for evaluating FE discretized matrix-multivector products on both multi-node CPU and GPU architectures. We address a critical gap in existing matrix-free implementations, which are well suited only for the action of FE discretized matrices on a single vector. We employ batched evaluation strategies, with the batch size tailored to the underlying hardware architecture, leading to better data locality and enabling further parallelization. On CPUs, we utilize even-odd decomposition, SIMD vectorization, and strategies that overlap computation and communication. On GPUs, we employ strategies to overlap compute and data movement in conjunction with GPU shared memory, constant memory, and kernel fusion to reduce data accesses. Our implementation outperforms the baselines for Helmholtz operator action, achieving up to 1.4x improvement on one CPU node and up to 2.8x on one GPU node, while reaching up to 4.4x and 1.5x improvement on multiple nodes for CPUs ($\sim 3000$ cores) and GPUs ($\sim 25$ GPUs), respectively. We further benchmark the performance of the proposed implementation for solving a model eigenvalue problem for the 1024 smallest eigenvalue-eigenvector pairs by employing the Chebyshev Filtered Subspace Iteration method, achieving up to 1.5x improvement on one CPU node and up to 2.2x on one GPU node while reaching up to 3.0x and 1.4x improvement on multi-node CPUs ($\sim 3000$ cores) and GPUs ($\sim 25$ GPUs), respectively.
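The even-odd decomposition mentioned for CPUs exploits a symmetry of the 1-D shape-function matrix used in sum factorization. For Lagrange polynomials on nodes placed symmetrically in the element, evaluated at quadrature points that are also symmetric, the values matrix $S$ ($m$ quadrature points by $n$ nodes) satisfies $S_{ij} = S_{m-1-i,\,n-1-j}$, so splitting the input into even and odd parts lets two half-size products replace one full one. The Python below is a minimal single-vector sketch of that idea; the function names are hypothetical, and gradient matrices (which are anti-centrosymmetric) as well as the batched, SIMD-vectorized form described in the abstract are not covered.

```python
def matvec(S, u):
    """Reference dense product y = S u (S is m x n, stored as lists of rows)."""
    return [sum(row[j] * u[j] for j in range(len(u))) for row in S]


def split_even_odd(S):
    """Half-size even/odd matrices of a centrosymmetric S; computed once per element type."""
    m, n = len(S), len(S[0])
    E = [[0.5 * (S[i][j] + S[i][n - 1 - j]) for j in range(n // 2)] for i in range(m // 2)]
    O = [[0.5 * (S[i][j] - S[i][n - 1 - j]) for j in range(n // 2)] for i in range(m // 2)]
    return E, O


def even_odd_matvec(S, E, O, u):
    """y = S u for centrosymmetric S (S[i][j] == S[m-1-i][n-1-j]) via even/odd splitting."""
    m, n = len(S), len(u)
    hm, hn = m // 2, n // 2
    u_even = [u[j] + u[n - 1 - j] for j in range(hn)]
    u_odd = [u[j] - u[n - 1 - j] for j in range(hn)]
    y = [0.0] * m
    for i in range(hm):
        # a = (y[i] + y[m-1-i]) / 2 and b = (y[i] - y[m-1-i]) / 2
        a = sum(E[i][j] * u_even[j] for j in range(hn))
        b = sum(O[i][j] * u_odd[j] for j in range(hn))
        if n % 2 == 1:
            a += S[i][hn] * u[hn]  # the middle column only feeds the even part
        y[i], y[m - 1 - i] = a + b, a - b
    if m % 2 == 1:
        y[hm] = sum(S[hm][j] * u[j] for j in range(n))  # middle row computed directly
    return y
```

Once `E` and `O` are precomputed, each pair of output rows costs about $n$ multiply-adds instead of $2n$.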
Submitted 24 September, 2023; v1 submitted 15 August, 2022;
originally announced August 2022.
-
Chemical bonding in large systems using projected population analysis from real-space density functional theory calculations
Authors:
Kartick Ramakrishnan,
Sai Krishna Kishore Nori,
Seung-Cheol Lee,
Gour P Das,
Satadeep Bhattacharjee,
Phani Motamarri
Abstract:
We present an efficient and scalable computational approach for conducting projected population analysis from real-space finite-element (FE) based Kohn-Sham density functional theory calculations (DFT-FE). This work provides an important direction towards extracting chemical bonding information from large-scale DFT calculations on materials systems involving thousands of atoms while accommodating periodic, semi-periodic or fully non-periodic boundary conditions. Towards this, we derive the relevant mathematical expressions and develop efficient numerical implementation procedures that are scalable on multi-node CPU architectures to compute the projected overlap and Hamilton populations. The population analysis is accomplished by projecting either the self-consistently converged FE discretized Kohn-Sham orbitals or the FE discretized Hamiltonian onto a subspace spanned by a localized atom-centred basis set. The proposed methods are implemented in a unified framework within the DFT-FE code, where the ground-state DFT calculations and the population analysis are performed on the same FE grid. We further benchmark the accuracy and performance of this approach against LOBSTER, a widely used projected population analysis code, on representative material systems involving periodic and non-periodic DFT calculations. Finally, we discuss a case study demonstrating the advantages of our scalable approach to extract the quantitative chemical bonding information of hydrogen chemisorbed in large silicon nanoparticles alloyed with carbon, a candidate material for hydrogen storage.
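A minimal sketch of the projection and population bookkeeping, assuming the overlaps $b_k = \langle \phi_k | \psi \rangle$ of each occupied Kohn-Sham orbital $\psi$ with the localized atom-centred functions $\phi_k$ have already been evaluated on the FE grid: solving $S c = b$ gives the projection coefficients, from which a Mulliken-style total population and the crystal orbital overlap and Hamilton populations (COOP, COHP) of one orbital pair follow. The function names, the dense Gaussian elimination and the 2x2 model used to exercise it are illustrative assumptions, not the DFT-FE implementation.

```python
def solve(A, b):
    """Solve A x = b for a small dense system (Gaussian elimination, partial pivoting)."""
    n = len(A)
    M = [list(row) + [b[i]] for i, row in enumerate(A)]
    for k in range(n):
        p = max(range(k, n), key=lambda r: abs(M[r][k]))
        M[k], M[p] = M[p], M[k]
        for r in range(k + 1, n):
            f = M[r][k] / M[k][k]
            for c in range(k, n + 1):
                M[r][c] -= f * M[k][c]
    x = [0.0] * n
    for k in range(n - 1, -1, -1):
        x[k] = (M[k][n] - sum(M[k][c] * x[c] for c in range(k + 1, n))) / M[k][k]
    return x


def projected_populations(S, H, states, mu, nu):
    """Total population plus COOP/COHP for the localized-orbital pair (mu, nu).

    S, H: overlap and Hamiltonian matrices in the localized atom-centred basis.
    states: list of (occupation, b) with b[k] = <phi_k | psi> for one orbital psi.
    """
    n = len(S)
    total = coop = cohp = 0.0
    for occ, b in states:
        c = solve(S, b)  # projection coefficients from S c = b
        total += occ * sum(c[k] * c[l] * S[k][l] for k in range(n) for l in range(n))
        # the (mu, nu) and (nu, mu) terms are equal, hence the factor 2
        coop += 2.0 * occ * c[mu] * c[nu] * S[mu][nu]
        cohp += 2.0 * occ * c[mu] * c[nu] * H[mu][nu]
    return total, coop, cohp
```

In the usual sign convention, bonding interactions give positive COOP and negative COHP. If $\psi$ does not lie entirely in the span of the localized basis, the total population falls below the occupation; this deficit is the charge-spilling measure commonly reported for such projections.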
Submitted 23 June, 2023; v1 submitted 29 May, 2022;
originally announced May 2022.
-
DFT-FE 1.0: A massively parallel hybrid CPU-GPU density functional theory code using finite-element discretization
Authors:
Sambit Das,
Phani Motamarri,
Vishal Subramanian,
David M. Rogers,
Vikram Gavini
Abstract:
We present DFT-FE 1.0, building on DFT-FE 0.6 [Comput. Phys. Commun. 246, 106853 (2020)], to conduct fast and accurate large-scale density functional theory (DFT) calculations (reaching $\sim 100,000$ electrons) on both many-core CPU and hybrid CPU-GPU computing architectures. This work involves improvements in the real-space formulation (via an improved treatment of the electrostatic interactions that substantially enhances the computational efficiency) as well as in high-performance computing aspects, including the GPU acceleration of all the key compute kernels in DFT-FE. We demonstrate the accuracy by comparing the ground-state energies, ionic forces and cell stresses on a wide range of benchmark systems against those obtained from widely used DFT codes. Further, we demonstrate the numerical efficiency of our implementation, which yields a $\sim 20\times$ CPU-GPU speed-up by using GPU acceleration on hybrid CPU-GPU nodes. Notably, owing to the parallel scaling of the GPU implementation, we obtain wall-times of $80$-$140$ seconds for full ground-state calculations, with stringent accuracy, on benchmark systems containing $\sim 6,000$-$15,000$ electrons.
Submitted 21 March, 2022; v1 submitted 15 March, 2022;
originally announced March 2022.
-
Tensor-structured algorithm for reduced-order scaling large-scale Kohn-Sham density functional theory calculations
Authors:
Chih-Chuen Lin,
Phani Motamarri,
Vikram Gavini
Abstract:
We present a tensor-structured algorithm for efficient large-scale DFT calculations by constructing a Tucker tensor basis that is adapted to the Kohn-Sham Hamiltonian and localized in real-space. The proposed approach uses an additive separable approximation to the Kohn-Sham Hamiltonian and an $L_1$ localization technique to generate the 1-D localized functions that constitute the Tucker tensor basis. Numerical results show that the resulting Tucker tensor basis exhibits exponential convergence in the ground-state energy with increasing Tucker rank. Further, the proposed tensor-structured algorithm demonstrated sub-quadratic scaling with system size for systems both with and without a gap, involving many thousands of atoms. This reduced-order scaling has also resulted in the proposed approach outperforming a plane-wave DFT implementation for systems beyond 2,000 electrons.
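The tensor-structured construction rests on the fact that, for an additive separable operator $H = H_x \otimes I \otimes I + I \otimes H_y \otimes I + I \otimes I \otimes H_z$, products of 1-D eigenfunctions are eigenfunctions of the 3-D operator, with eigenvalues that add. The Python sketch below checks this for the simplest such operator, a 7-point finite-difference Laplacian on an $n^3$ grid with Dirichlet boundaries, which stands in for the paper's separable approximation of the Kohn-Sham Hamiltonian. The $L_1$ localization step and the Tucker truncation are not shown, and all names are hypothetical.

```python
import math


def laplacian_1d_eigpair(k, n):
    """k-th eigenpair (k = 1..n) of the 1-D Dirichlet operator tridiag(-1, 2, -1) of size n."""
    vec = [math.sin(k * math.pi * (j + 1) / (n + 1)) for j in range(n)]
    val = 2.0 - 2.0 * math.cos(k * math.pi / (n + 1))
    return val, vec


def apply_separable(U, n):
    """(A x I x I + I x A x I + I x I x A) U with A = tridiag(-1, 2, -1),
    applied to an n x n x n array without forming the n^3 x n^3 matrix."""
    def at(a, b, c):  # zero outside the grid (Dirichlet boundary)
        return U[a][b][c] if 0 <= a < n and 0 <= b < n and 0 <= c < n else 0.0

    return [[[6.0 * U[a][b][c]
              - at(a - 1, b, c) - at(a + 1, b, c)
              - at(a, b - 1, c) - at(a, b + 1, c)
              - at(a, b, c - 1) - at(a, b, c + 1)
              for c in range(n)] for b in range(n)] for a in range(n)]
```

With $R$ 1-D functions per direction, a tensor basis of this kind spans $R^3$ product functions while storing only $3R$ 1-D vectors.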
Submitted 9 January, 2021; v1 submitted 24 November, 2020;
originally announced November 2020.
-
DFT-FE -- A massively parallel adaptive finite-element code for large-scale density functional theory calculations
Authors:
Phani Motamarri,
Sambit Das,
Shiva Rudraraju,
Krishnendu Ghosh,
Denis Davydov,
Vikram Gavini
Abstract:
We present an accurate, efficient and massively parallel finite-element code, DFT-FE, for large-scale ab-initio calculations (reaching $\sim 100,000$ electrons) using Kohn-Sham density functional theory (DFT). DFT-FE is based on a local real-space variational formulation of the Kohn-Sham DFT energy functional that is discretized using a higher-order adaptive spectral finite-element (FE) basis, and treats pseudopotential and all-electron calculations in the same framework, while accommodating non-periodic, semi-periodic and periodic boundary conditions. We discuss the main aspects of the code, which include the various strategies of adaptive FE basis generation and the different approaches employed in the numerical implementation of the solution of the discrete Kohn-Sham problem, which focus on significantly reducing the floating point operations, communication costs and latency. We demonstrate the accuracy of DFT-FE by comparing the energies, ionic forces and periodic cell stresses on a wide range of problems with widely used DFT codes. Further, we demonstrate that DFT-FE significantly outperforms widely used plane-wave codes (both in CPU-times and wall-times, and on both non-periodic and periodic systems) at system sizes beyond a few thousand electrons, with over $5$-$10$-fold speedups in systems with more than 10,000 electrons. The benchmark studies also highlight the excellent parallel scalability of DFT-FE, with strong scaling demonstrated on up to 192,000 MPI tasks.
Submitted 4 April, 2019; v1 submitted 26 March, 2019;
originally announced March 2019.
-
Configurational forces in electronic structure calculations using Kohn-Sham density functional theory
Authors:
Phani Motamarri,
Vikram Gavini
Abstract:
We derive the expressions for configurational forces in Kohn-Sham density functional theory, which correspond to the generalized variational force computed as the derivative of the Kohn-Sham energy functional with respect to the position of a material point $\textbf{x}$. These configurational forces, which result from the inner variations of the Kohn-Sham energy functional, provide a unified framework to compute atomic forces as well as the stress tensor for geometry optimization. Importantly, owing to the variational nature of the formulation, these configurational forces inherently account for the Pulay corrections. The formulation presented in this work treats both pseudopotential and all-electron calculations in a single framework, and employs a local variational real-space formulation of Kohn-Sham DFT expressed in terms of non-orthogonal wavefunctions that is amenable to reduced-order scaling techniques. We demonstrate the accuracy and performance of the proposed configurational force approach on benchmark all-electron and pseudopotential calculations conducted using higher-order finite-element discretization. To this end, we examine the rates of convergence of the finite-element discretization in the computed forces and stresses for various materials systems, and further verify the accuracy by finite-differencing the energy. Wherever applicable, we also compare the forces and stresses with those obtained from Kohn-Sham DFT calculations employing a plane-wave basis (pseudopotential calculations) and a Gaussian basis (all-electron calculations). Finally, we verify the accuracy of the forces on large materials systems involving a metallic aluminum nanocluster containing 666 atoms and an alkane chain containing 902 atoms, where the Kohn-Sham electronic ground state is computed using a reduced-order scaling subspace projection technique (P. Motamarri and V. Gavini, Phys. Rev. B 90, 115127).
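The check of forces against finite differences of the energy can be illustrated on a toy problem. The Python sketch below compares an analytic force with the central-difference estimate $F \approx -[E(r+h) - E(r-h)]/(2h)$, whose truncation error is $O(h^2)$. The Lennard-Jones pair energy is a hypothetical stand-in for the Kohn-Sham energy, chosen only because its derivative is known in closed form.

```python
def lj_energy(r, eps=1.0, sigma=1.0):
    """Stand-in energy: Lennard-Jones pair energy at separation r."""
    sr6 = (sigma / r) ** 6
    return 4.0 * eps * (sr6 * sr6 - sr6)


def lj_force(r, eps=1.0, sigma=1.0):
    """Analytic force F = -dE/dr for the same pair."""
    sr6 = (sigma / r) ** 6
    return 24.0 * eps * (2.0 * sr6 * sr6 - sr6) / r


def central_difference_force(energy, r, h=1e-5):
    """F ~ -(E(r + h) - E(r - h)) / (2h); truncation error is O(h^2)."""
    return -(energy(r + h) - energy(r - h)) / (2.0 * h)
```

In the Kohn-Sham setting each energy evaluation is a full ground-state calculation, so $h$ has to balance the $O(h^2)$ truncation error against the self-consistency error of each energy, which the difference quotient amplifies by $1/h$.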
Submitted 15 December, 2017;
originally announced December 2017.
-
Spectrum-splitting approach for Fermi-operator expansion in all-electron Kohn-Sham DFT calculations
Authors:
Phani Motamarri,
Vikram Gavini,
Kaushik Bhattacharya,
Michael Ortiz
Abstract:
We present a spectrum-splitting approach to conduct all-electron Kohn-Sham density functional theory (DFT) calculations by employing Fermi-operator expansion of the Kohn-Sham Hamiltonian. The proposed approach splits the subspace containing the occupied eigenspace into a core-subspace, spanned by the core eigenfunctions, and its complement, the valence-subspace, and thereby enables an efficient computation of the Fermi-operator expansion by reducing the expansion to the valence-subspace projected Kohn-Sham Hamiltonian. The key ideas used in our approach are: (i) employ Chebyshev filtering to compute a subspace containing the occupied states followed by a localization procedure to generate non-orthogonal localized functions spanning the Chebyshev-filtered subspace; (ii) compute the Kohn-Sham Hamiltonian projected onto the valence-subspace; (iii) employ Fermi-operator expansion in terms of the valence-subspace projected Hamiltonian to compute the density matrix, electron-density and band energy. We demonstrate the accuracy and performance of the method on benchmark materials systems involving silicon nano-clusters up to 1330 electrons, a single gold atom and a six-atom gold nano-cluster. The benchmark studies on silicon nano-clusters revealed a staggering five-fold reduction in the Fermi-operator expansion polynomial degree by using the spectrum-splitting approach for accuracies in the ground-state energies of $\sim 10^{-4}$ Ha/atom with respect to reference calculations. Further, numerical investigations on gold suggest that spectrum splitting is indispensable for achieving meaningful accuracies when employing Fermi-operator expansion.
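Why splitting off the core states helps can be seen from a scalar experiment: the polynomial degree needed to represent the Fermi-Dirac function to a fixed accuracy grows with the width of the spectral interval relative to $k_BT$. The Python sketch below finds that minimal degree for a wide interval (standing in for a spectrum that still contains deep core states) and a narrow one (standing in for the valence subspace). A Chebyshev expansion is assumed as the form of the Fermi-operator expansion, and the spectral bounds, smearing $k_BT = 0.1$ and tolerance in the test are hypothetical values in arbitrary energy units chosen for a fast run, not the paper's settings.

```python
import math


def fermi_dirac(e, mu, kT):
    """Occupation 1 / (1 + exp((e - mu) / kT)), evaluated without overflow."""
    z = (e - mu) / kT
    if z > 0:
        ez = math.exp(-z)
        return ez / (1.0 + ez)
    return 1.0 / (1.0 + math.exp(z))


def min_chebyshev_degree(e_min, e_max, mu, kT, tol=1e-3, max_degree=400, n_grid=801):
    """Smallest degree d for which the Chebyshev expansion of the Fermi-Dirac function on
    [e_min, e_max] has maximum error below tol on a uniform grid (None if d > max_degree)."""
    half, mid = 0.5 * (e_max - e_min), 0.5 * (e_max + e_min)

    def f(t):  # Fermi-Dirac function mapped to t in [-1, 1]
        return fermi_dirac(mid + half * t, mu, kT)

    # Chebyshev coefficients from Chebyshev-Gauss quadrature with K nodes
    K = 2 * max_degree
    thetas = [math.pi * (j + 0.5) / K for j in range(K)]
    f_nodes = [f(math.cos(th)) for th in thetas]
    coeffs = [2.0 / K * sum(fv * math.cos(k * th) for fv, th in zip(f_nodes, thetas))
              for k in range(max_degree + 1)]
    coeffs[0] *= 0.5
    # Track the max error of every truncation degree in a single pass over the grid
    max_err = [0.0] * (max_degree + 1)
    for i in range(n_grid):
        t = -1.0 + 2.0 * i / (n_grid - 1)
        theta, exact, partial = math.acos(t), f(t), 0.0
        for d in range(max_degree + 1):
            partial += coeffs[d] * math.cos(d * theta)  # T_d(t) = cos(d * arccos t)
            max_err[d] = max(max_err[d], abs(partial - exact))
    for d, err in enumerate(max_err):
        if err < tol:
            return d
    return None
```

For example, `min_chebyshev_degree(-2.0, 2.0, 0.0, 0.1)` and `min_chebyshev_degree(-20.0, 2.0, 0.0, 0.1)` compare a valence-like window with one that also covers deeper states; the required degree grows roughly linearly with $(E_{\max}-E_{\min})/k_BT$.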
Submitted 28 August, 2016;
originally announced August 2016.
-
A subquadratic-scaling subspace projection method for large-scale Kohn-Sham density functional theory calculations using spectral finite-element discretization
Authors:
Phani Motamarri,
Vikram Gavini
Abstract:
We present a subspace projection technique to conduct large-scale Kohn-Sham density functional theory calculations using spectral finite-element discretization. The proposed method treats both metallic and insulating materials in a single framework, and is applicable to both pseudopotential as well as all-electron calculations. The key ideas involved in the method include: (i) employing a higher-order spectral finite-element basis that is amenable to mesh adaption; (ii) using a Chebyshev filter to construct a subspace which is an approximation to the occupied eigenspace in a given self-consistent field iteration; (iii) using a localization procedure to construct a non-orthogonal localized basis spanning the Chebyshev filtered subspace; (iv) using a Fermi-operator expansion in terms of the subspace-projected Hamiltonian represented in the non-orthogonal localized basis to compute relevant quantities like the density matrix, electron density and band energy. We demonstrate the accuracy and efficiency of the approach on benchmark systems involving pseudopotential calculations on metallic aluminum nano-clusters up to 3430 atoms and on insulating alkane chains up to 7052 atoms, as well as all-electron calculations on silicon nano-clusters up to 3920 electrons. The benchmark studies revealed that accuracies commensurate with chemical accuracy can be obtained, and a subquadratic scaling with system size was observed for the range of materials systems studied. In particular, close to linear scaling is observed for the alkane chains, whereas the scaling for aluminum nano-clusters is observed to be $\mathcal{O}(N^{1.46})$. For all-electron calculations on silicon nano-clusters, the scaling with the number of electrons is computed to be $\mathcal{O}(N^{1.75})$. Furthermore, significant computational savings have been realized with the proposed approach with respect to reference calculations.
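Step (ii) can be sketched with the scaled three-term Chebyshev recurrence popularized by Zhou, Saad and co-workers for Chebyshev-filtered subspace iteration. The Python below applies a degree-$m$ filter to one vector through a user-supplied matrix-vector product: eigencomponents inside the unwanted interval $[a, b]$ are damped, those below $a$ are amplified, and the scaling keeps the component at the lower-bound estimate $a_0$ at unit size. The function name and the diagonal test matrix are illustrative assumptions; a production code filters a block of vectors and obtains the interval bounds from spectrum estimates.

```python
def chebyshev_filter(matvec, x, degree, a, b, a0):
    """Scaled degree-`degree` Chebyshev filter (degree >= 1) applied to vector x.

    matvec: function returning H v for a list v.
    [a, b]: unwanted part of the spectrum; a0 < a: estimate of the lowest eigenvalue.
    """
    e = 0.5 * (b - a)  # half-width of the unwanted interval
    c = 0.5 * (b + a)  # centre of the unwanted interval
    sigma = e / (a0 - c)
    sigma1 = sigma
    # Degree-1 term, scaled so that the component at a0 equals 1
    y = [sigma1 / e * (hv - c * xv) for hv, xv in zip(matvec(x), x)]
    for _ in range(2, degree + 1):
        sigma2 = 1.0 / (2.0 / sigma1 - sigma)
        # Three-term recurrence T_{k+1} = 2 t T_k - T_{k-1}, rescaled at every step
        y_new = [2.0 * sigma2 / e * (hv - c * yv) - sigma * sigma2 * xv
                 for hv, yv, xv in zip(matvec(y), y, x)]
        x, y, sigma = y, y_new, sigma2
    return y
```

The rescaling avoids the overflow that the unscaled polynomial $T_m$ would produce for eigenvalues far below $a$, while leaving the filtered subspace unchanged.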
Submitted 22 October, 2015; v1 submitted 10 June, 2014;
originally announced June 2014.
-
Higher-order adaptive finite-element methods for Kohn-Sham density functional theory
Authors:
Phani Motamarri,
Michael R Nowak,
Kenneth Leiter,
Jaroslaw Knap,
Vikram Gavini
Abstract:
We present an efficient computational approach to perform real-space electronic structure calculations using an adaptive higher-order finite-element discretization of Kohn-Sham density-functional theory (DFT). To this end, we develop an a priori mesh adaption technique to construct a close-to-optimal finite-element discretization of the problem. We further propose an efficient solution strategy for solving the discrete eigenvalue problem by using spectral finite-elements in conjunction with Gauss-Lobatto quadrature, and a Chebyshev acceleration technique for computing the occupied eigenspace. The proposed approach has been observed to provide a staggering 100-200 fold computational advantage over the solution of a generalized eigenvalue problem. Using the proposed solution procedure, we investigate the computational efficiency afforded by higher-order finite-element discretization of the Kohn-Sham DFT problem. Our studies suggest that staggering computational savings of the order of 1000 fold relative to linear finite-elements can be realized, for both all-electron and local pseudopotential calculations. On all the benchmark systems studied, we observe diminishing returns in computational savings beyond the sixth-order for accuracies commensurate with chemical accuracy. A comparative study of the computational efficiency of the proposed higher-order finite-element discretizations suggests that the performance of the finite-element basis is competitive with the plane-wave discretization for non-periodic local pseudopotential calculations, and is within an order of magnitude of the Gaussian basis for all-electron calculations. Further, we demonstrate the capability of the proposed approach to compute the electronic structure of a metallic system containing 1688 atoms using modest computational resources, and good scalability of the present implementation up to 192 processors.
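The reason spectral finite-elements with Gauss-Lobatto quadrature avoid a generalized eigenvalue problem is that, when the quadrature points coincide with the nodes of the Lagrange basis, the mass matrix becomes diagonal. The Python sketch below computes Gauss-Lobatto-Legendre (GLL) nodes and weights by Newton iteration on $x P_N(x) - P_{N-1}(x) = 0$ (equivalent to $(1-x^2)P_N'(x) = 0$) and assembles the reference-element mass matrix with that collocated quadrature. The function names are hypothetical and a single 1-D element is assumed.

```python
import math


def legendre_pair(N, x):
    """Return (P_N(x), P_{N-1}(x)) via the three-term recurrence; assumes N >= 1."""
    p_prev, p = 1.0, x  # P_0, P_1
    for k in range(2, N + 1):
        p_prev, p = p, ((2 * k - 1) * x * p - (k - 1) * p_prev) / k
    return p, p_prev


def gll_nodes_weights(N, tol=1e-14, max_iter=100):
    """GLL nodes (ordered from 1 down to -1) and weights for polynomial order N."""
    x = [math.cos(math.pi * i / N) for i in range(N + 1)]  # Chebyshev-Lobatto start
    for _ in range(max_iter):
        x_new = []
        for xi in x:
            p_n, p_nm1 = legendre_pair(N, xi)
            # Newton step: the derivative of x P_N - P_{N-1} is (N + 1) P_N
            x_new.append(xi - (xi * p_n - p_nm1) / ((N + 1) * p_n))
        converged = max(abs(u - v) for u, v in zip(x_new, x)) < tol
        x = x_new
        if converged:
            break
    w = [2.0 / (N * (N + 1) * legendre_pair(N, xi)[0] ** 2) for xi in x]
    return x, w


def lagrange(nodes, i, x):
    """i-th Lagrange polynomial through `nodes`, evaluated at x."""
    val = 1.0
    for j, xj in enumerate(nodes):
        if j != i:
            val *= (x - xj) / (nodes[i] - xj)
    return val


def gll_mass_matrix(N):
    """Reference mass matrix M_ij = sum_q w_q l_i(x_q) l_j(x_q) with quadrature at the nodes."""
    x, w = gll_nodes_weights(N)
    return [[sum(w[q] * lagrange(x, i, x[q]) * lagrange(x, j, x[q]) for q in range(N + 1))
             for j in range(N + 1)] for i in range(N + 1)]
```

With a diagonal $M$, the problem $H\psi = \epsilon M\psi$ becomes the standard problem $M^{-1/2} H M^{-1/2}\tilde{\psi} = \epsilon\tilde{\psi}$ with $\tilde{\psi} = M^{1/2}\psi$ at negligible cost. The $(N+1)$-point GLL rule integrates polynomials exactly only up to degree $2N-1$, so the collocated mass matrix approximates the exactly integrated one.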
Submitted 4 July, 2013; v1 submitted 30 June, 2012;
originally announced July 2012.
-
Higher-order adaptive finite-element methods for orbital-free density functional theory
Authors:
Phani Motamarri,
Mrinal Iyer,
Jaroslaw Knap,
Vikram Gavini
Abstract:
In the present work, we investigate the computational efficiency afforded by higher-order finite-element discretization of the saddle-point formulation of orbital-free density functional theory. We first investigate the robustness of viable solution schemes by analyzing the solvability conditions of the discrete problem. We find that a staggered solution procedure where the potential fields are computed consistently for every trial electron-density is a robust solution procedure for higher-order finite-element discretizations. We next study the numerical convergence rates for various orders of finite-element approximations on benchmark problems. We obtain close to optimal convergence rates in our studies, although orbital-free density-functional theory is nonlinear in nature and some benchmark problems have Coulomb singular potential fields. We finally investigate the computational efficiency of various higher-order finite-element discretizations by measuring the CPU time for the solution of discrete equations on benchmark problems that include large aluminum clusters. In these studies, we use mesh coarse-graining rates that are derived from error estimates and an a priori knowledge of the asymptotic solution of the far-field electronic fields. Our studies reveal a significant 100-1000 fold computational savings afforded by the use of higher-order finite-element discretization, alongside providing the desired chemical accuracy. We consider this study a step towards developing a robust and computationally efficient discretization of electronic structure calculations using the finite-element basis.
Submitted 28 January, 2012; v1 submitted 6 October, 2011;
originally announced October 2011.