-
A Parallelizable Energy-Preserving Integrator MB4 and Its Application to Quantum-Mechanical Wavepacket Dynamics
Authors:
Tsubasa Sakai,
Shuhei Kudo,
Hiroto Imachi,
Yuto Miyatake,
Takeo Hoshi,
Yusaku Yamamoto
Abstract:
In simulating physical systems, conservation of the total energy is often essential, especially when energy conversion between different forms of energy occurs frequently. Recently, a new fourth-order energy-preserving integrator named MB4 was proposed based on the so-called continuous-stage Runge–Kutta methods (Y. Miyatake and J. C. Butcher, SIAM J. Numer. Anal., 54(3), 1993–2013). A salient feature of this method is that it is parallelizable, which makes its computational time per time step comparable to that of second-order methods. In this paper, we illustrate how to apply the MB4 method to a concrete ordinary differential equation, using a nonlinear Schrödinger-type equation on a two-dimensional grid as an example. This system is a prototypical model of two-dimensional disordered organic material and is difficult to solve with standard methods such as the classical Runge–Kutta methods, owing to the nonlinearity and the $\delta$-function-like potential arising from defects. Numerical tests show that the method solves the equation stably and preserves the total energy to 16-digit accuracy throughout the simulation. It is also shown that parallelizing the method yields a speedup of up to 2.8 times on 3 computational nodes.
Submitted 9 March, 2020;
originally announced March 2020.
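The abstract does not give MB4's coefficients, so the sketch below does not implement MB4. It illustrates the energy-preservation property with the second-order average vector field (AVF) method, which can be viewed as the simplest continuous-stage Runge–Kutta method, applied to a hypothetical test problem (the pendulum H(q, p) = p^2/2 - cos q) rather than the paper's Schrödinger-type system. The implicit update is solved by fixed-point iteration with an assumed tolerance; energy is then conserved up to that tolerance and rounding, whatever the step size.

```python
import math
from typing import Tuple

def avf_pendulum_step(q0: float, p0: float, h: float,
                      tol: float = 1e-14, max_iter: int = 100) -> Tuple[float, float]:
    """One step of the second-order AVF method for H(q, p) = p**2/2 - cos(q).

    Not MB4: AVF is the simplest energy-preserving scheme, used here only to
    illustrate the conservation property on a hypothetical test problem.
    """
    q1, p1 = q0, p0  # initial guess for the implicit update
    for _ in range(max_iter):
        d = q1 - q0
        m = 0.5 * (q0 + q1)
        # Exact average of sin(q) on the segment [q0, q1]:
        # (cos q0 - cos q1) / d = sin(m) * sin(d/2) / (d/2), stable as d -> 0.
        sinc = math.sin(0.5 * d) / (0.5 * d) if d != 0.0 else 1.0
        p_new = p0 - h * math.sin(m) * sinc
        q_new = q0 + 0.5 * h * (p0 + p_new)
        converged = abs(q_new - q1) <= tol and abs(p_new - p1) <= tol
        q1, p1 = q_new, p_new
        if converged:
            break
    return q1, p1

def energy(q: float, p: float) -> float:
    return 0.5 * p * p - math.cos(q)

q, p = 1.0, 0.5
e0 = energy(q, p)
for _ in range(1000):
    q, p = avf_pendulum_step(q, p, h=0.1)
drift = abs(energy(q, p) - e0)
print(f"energy drift after 1000 steps: {drift:.1e}")
```

An explicit method such as forward Euler would show a steadily growing drift on the same problem; MB4 keeps the exact-conservation property while reaching fourth order.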
-
Numerical aspect of large-scale electronic state calculation for flexible device material
Authors:
Takeo Hoshi,
Hiroto Imachi,
Akiyoshi Kuwata,
Kohsuke Kakuda,
Takatoshi Fujita,
Hiroyuki Matsui
Abstract:
Numerical aspects of large-scale electronic state calculation are explored for flexible organic device materials. Physical theory, numerical methods, and real application studies are discussed in the context of application-algorithm-architecture co-design. An application study was carried out for a disordered organic thin film. The focus is on the participation ratio, a measure of the spatial extension of an electronic wavefunction, since it is crucial for device properties. A data-science study is reported on a classification problem for disordered organic polymers, in which the participation ratio is used as a descriptor. These application studies indicate a potential need for purpose-specific solvers for internal eigenpairs.
Submitted 14 December, 2018; v1 submitted 6 August, 2018;
originally announced August 2018.
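The participation ratio can be computed directly from the wavefunction coefficients. The sketch below uses one common definition, (Σ|ψ_i|²)² / Σ|ψ_i|⁴, which requires no prior normalization; whether the paper uses this form or a variant divided by the number of sites N is an assumption here, and the function name is hypothetical.

```python
from typing import Sequence

def participation_ratio(psi: Sequence[complex]) -> float:
    """Participation ratio (sum |psi_i|^2)^2 / sum |psi_i|^4.

    Roughly counts how many sites the state occupies: 1 for a state
    localized on a single site, N for a state spread evenly over N sites.
    """
    weights = [abs(c) ** 2 for c in psi]
    norm2 = sum(weights)
    if norm2 == 0.0:
        raise ValueError("wavefunction must be nonzero")
    return norm2 ** 2 / sum(w * w for w in weights)

print(participation_ratio([1.0, 0.0, 0.0, 0.0]))  # 1.0: localized on one site
print(participation_ratio([0.5, 0.5, 0.5, 0.5]))  # 4.0: spread over four sites
```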
-
EigenKernel - A middleware for parallel generalized eigenvalue solvers to attain high scalability and usability
Authors:
Kazuyuki Tanaka,
Hiroto Imachi,
Tomoya Fukumoto,
Akiyoshi Kuwata,
Yuki Harada,
Takeshi Fukaya,
Yusaku Yamamoto,
Takeo Hoshi
Abstract:
An open-source middleware, EigenKernel, was developed for use with parallel generalized eigenvalue solvers in large-scale electronic state calculations, to attain high scalability and usability. The middleware enables users to choose the optimal solver, from among the three parallel eigenvalue libraries ScaLAPACK, ELPA, and EigenExa and hybrid solvers constructed from them, according to the problem specification and the target architecture. A benchmark carried out on the Oakforest-PACS supercomputer reveals that ELPA, EigenExa, and their hybrid solvers perform better than pure ScaLAPACK solvers. Benchmark results on the K computer are also discussed. In addition, a preliminary study of performance prediction was carried out, aiming to predict the elapsed time T as a function of the number of nodes used, P (T = T(P)). The prediction is based on Bayesian inference using the Markov chain Monte Carlo (MCMC) method, and a test calculation indicates that the method is applicable not only to performance interpolation but also to extrapolation. Such middleware is of crucial importance for application-algorithm-architecture co-design among current, next-generation (exascale), and future-generation (post-Moore era) supercomputers.
Submitted 19 December, 2018; v1 submitted 3 June, 2018;
originally announced June 2018.
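The abstract does not specify the performance model, so the sketch below assumes a hypothetical two-parameter form T(P) = a/P + b (a perfectly parallel part plus a serial part), Gaussian noise with a known sigma, and a flat prior on a, b >= 0. It samples the posterior with random-walk Metropolis, the simplest MCMC method, and then extrapolates to a node count beyond the data. The timings are synthetic, not benchmark results.

```python
import math
import random
from typing import List, Tuple

def predict_time(a: float, b: float, p: float) -> float:
    # Hypothetical model: perfectly parallel work a/P plus a serial part b.
    return a / p + b

def log_likelihood(a: float, b: float, data: List[Tuple[float, float]], sigma: float) -> float:
    # Gaussian measurement noise with a known standard deviation sigma (assumed).
    return -sum((t - predict_time(a, b, p)) ** 2 for p, t in data) / (2.0 * sigma ** 2)

def metropolis(data: List[Tuple[float, float]], sigma: float = 0.1,
               start: Tuple[float, float] = (90.0, 1.0), step: float = 0.05,
               n_iter: int = 30000, burn_in: int = 10000,
               seed: int = 0) -> List[Tuple[float, float]]:
    """Random-walk Metropolis sampling of (a, b) under a flat prior on a, b >= 0."""
    rng = random.Random(seed)
    a, b = start
    ll = log_likelihood(a, b, data, sigma)
    samples = []
    for i in range(n_iter):
        a_new = a + rng.gauss(0.0, step)
        b_new = b + rng.gauss(0.0, step)
        # Proposals outside the prior support are rejected outright.
        if a_new >= 0.0 and b_new >= 0.0:
            ll_new = log_likelihood(a_new, b_new, data, sigma)
            # Metropolis rule: accept with probability min(1, exp(ll_new - ll)).
            if ll_new >= ll or rng.random() < math.exp(ll_new - ll):
                a, b, ll = a_new, b_new, ll_new
        if i >= burn_in:
            samples.append((a, b))
    return samples

# Synthetic timings generated from a = 100, b = 2 (illustrative, not measured).
nodes = [1, 2, 4, 8, 16, 32]
data = [(float(p), 100.0 / p + 2.0) for p in nodes]
samples = metropolis(data)
a_mean = sum(s[0] for s in samples) / len(samples)
b_mean = sum(s[1] for s in samples) / len(samples)
# Posterior-mean prediction for 64 nodes, i.e. extrapolation beyond the data.
t_64 = sum(predict_time(a, b, 64.0) for a, b in samples) / len(samples)
print(f"a = {a_mean:.2f}, b = {b_mean:.2f}, T(64) = {t_64:.3f}")
```

Besides the posterior mean, the spread of the sampled predictions gives an uncertainty band for T(P), which is the practical advantage of the Bayesian approach over a single least-squares fit.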
-
Variance-based Gradient Compression for Efficient Distributed Deep Learning
Authors:
Yusuke Tsuzuku,
Hiroto Imachi,
Takuya Akiba
Abstract:
Due to the substantial computational cost, training state-of-the-art deep neural networks on large-scale datasets often requires distributed training using multiple computation workers. However, by nature, workers need to communicate gradients frequently, causing severe bottlenecks, especially on lower-bandwidth connections. A few methods have been proposed to compress gradients for efficient communication, but they either suffer from a low compression ratio or significantly harm the resulting model accuracy, particularly when applied to convolutional neural networks. To address these issues, we propose a method to reduce the communication overhead of distributed deep learning. Our key observation is that gradient updates can be delayed until an unambiguous (high-amplitude, low-variance) gradient has been calculated. We also present an efficient algorithm to compute the variance with negligible additional cost. We experimentally show that our method can achieve a very high compression ratio while maintaining the resulting model accuracy. We also analyze the efficiency using computation and communication cost models and provide evidence that this method enables distributed deep learning in many scenarios with commodity environments.
Submitted 19 February, 2018; v1 submitted 16 February, 2018;
originally announced February 2018.
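The idea of delaying updates until a gradient is unambiguous can be sketched per coordinate. This uses a simplified, hypothetical criterion: send a coordinate once its accumulated gradient squared exceeds alpha times its accumulated variance. The class name, the threshold alpha, and the assumption that each step supplies a minibatch-mean gradient together with the variance of that mean are illustrative; the sketch does not reproduce the paper's exact rule or its low-cost variance computation.

```python
from typing import Dict, List

class VarianceGatedCompressor:
    """Hypothetical sketch: hold each gradient coordinate back until it is unambiguous."""

    def __init__(self, size: int, alpha: float = 1.0) -> None:
        self.alpha = alpha          # assumed threshold on signal versus noise
        self.acc = [0.0] * size     # delayed (not yet sent) gradient per coordinate
        self.var = [0.0] * size     # variance estimate of the delayed gradient

    def step(self, grad_mean: List[float], grad_var: List[float]) -> Dict[int, float]:
        """Accumulate one minibatch; return {index: value} for coordinates to send."""
        sent: Dict[int, float] = {}
        for i, (g, v) in enumerate(zip(grad_mean, grad_var)):
            self.acc[i] += g
            # Variances of independent minibatch means add up.
            self.var[i] += v
            # High amplitude relative to noise: send and reset this coordinate.
            if self.acc[i] ** 2 > self.alpha * self.var[i]:
                sent[i] = self.acc[i]
                self.acc[i] = 0.0
                self.var[i] = 0.0
        return sent

comp = VarianceGatedCompressor(size=2)
print(comp.step([1.0, 0.0], [0.01, 1.0]))  # {0: 1.0}
print(comp.step([0.5, 0.1], [0.01, 1.0]))  # {0: 0.5}
```

Coordinate 1 is noisy relative to its mean, so it keeps accumulating locally instead of being communicated; only the sparse dictionary of sent coordinates would cross the network.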
-
Extremely scalable algorithm for 10$^8$-atom quantum material simulation on the full system of the K computer
Authors:
Takeo Hoshi,
Hiroto Imachi,
Kiyoshi Kumahata,
Masaaki Terai,
Kengo Miyamoto,
Kazuo Minami,
Fumiyoshi Shoji
Abstract:
An extremely scalable linear-algebraic algorithm was developed for quantum material simulation (electronic state calculation) with 10$^8$ atoms, or 100-nm-scale materials. The mathematical foundation is generalized shifted linear equations ((zB - A) x = b) instead of the conventional generalized eigenvalue equations. The method has a highly parallelizable mathematical structure. The underlying theory is purely mathematical and is also applicable to other scientific fields. The benchmark shows extreme strong scaling and an acceptable time-to-solution on the full system of the K computer. The method was demonstrated in real materials research on ultra-flexible (organic) devices, which are key devices for next-generation IoT products. The present paper shows that an innovative, scalable algorithm for real research can emerge from co-design among application, algorithm, and architecture.
Submitted 7 October, 2016; v1 submitted 27 September, 2016;
originally announced September 2016.
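The parallelizable structure comes from the fact that each shift z defines an independent linear system. The hypothetical sketch below (not the authors' solver, which uses Krylov methods at scale) solves (zB - A)x = b by dense Gaussian elimination for several complex shifts z = E + i*eta and evaluates g(z) = b^T x(z); for B = I and a unit vector b, -Im g(z)/pi is a Lorentzian-broadened local density of states. The broadening eta, the 2x2 test matrices, and the function names are assumptions. Threads are used only to show that the shifts need no communication; because of Python's GIL they give no real speedup, and production codes would distribute shifts across processes or nodes.

```python
import math
from concurrent.futures import ThreadPoolExecutor
from typing import List

Matrix = List[List[complex]]

def solve_shifted(z: complex, A: Matrix, B: Matrix, b: List[complex]) -> List[complex]:
    """Solve (z*B - A) x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    # Augmented matrix [zB - A | b]
    M = [[z * B[i][j] - A[i][j] for j in range(n)] + [b[i]] for i in range(n)]
    for k in range(n):
        piv = max(range(k, n), key=lambda r: abs(M[r][k]))
        M[k], M[piv] = M[piv], M[k]
        for r in range(k + 1, n):
            f = M[r][k] / M[k][k]
            for c in range(k, n + 1):
                M[r][c] -= f * M[k][c]
    x = [0j] * n
    for i in range(n - 1, -1, -1):  # back substitution
        s = M[i][n] - sum(M[i][j] * x[j] for j in range(i + 1, n))
        x[i] = s / M[i][i]
    return x

def green_element(z: complex, A: Matrix, B: Matrix, b: List[complex]) -> complex:
    # g(z) = b^T x(z): one scalar per shift, computed independently of other shifts.
    x = solve_shifted(z, A, B, b)
    return sum(bi * xi for bi, xi in zip(b, x))

A = [[0.0, 0.0], [0.0, 1.0]]   # toy "Hamiltonian" with eigenvalues 0 and 1
B = [[1.0, 0.0], [0.0, 1.0]]   # overlap matrix taken as identity here
b = [1.0, 0.0]
eta = 0.01
energies = [-0.5, 0.0, 0.5, 1.0]
zs = [complex(e, eta) for e in energies]
with ThreadPoolExecutor() as pool:
    g = list(pool.map(lambda z: green_element(z, A, B, b), zs))
ldos = [-gz.imag / math.pi for gz in g]
print(ldos)
```

Since b here overlaps only the eigenstate at energy 0, the LDOS peaks at E = 0 with height 1/(pi*eta) and shows no peak at E = 1.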
-
One-hundred-nm-scale electronic structure and transport calculations of organic polymers on the K computer
Authors:
Hiroto Imachi,
Seiya Yokoyama,
Takami Kaji,
Yukiya Abe,
Tomofumi Tada,
Takeo Hoshi
Abstract:
One-hundred-nm-scale electronic structure calculations were carried out on the K supercomputer with our original simulation code ELSES (http://www.elses.jp/). The present paper reports preliminary results of transport calculations for condensed organic polymers. The large-scale calculations are realized by novel massively parallel order-N algorithms. The transport calculations were carried out as a theoretical extension of quantum wavepacket dynamics simulation. The method was applied to a single polymer chain and to condensed polymers.
Submitted 31 March, 2016;
originally announced March 2016.
-
Hybrid Numerical Solvers for Massively Parallel Eigenvalue Computation and Their Benchmark with Electronic Structure Calculations
Authors:
Hiroto Imachi,
Takeo Hoshi
Abstract:
Optimal hybrid numerical solvers were constructed for the massively parallel generalized eigenvalue problem (GEP). A strong-scaling benchmark was carried out on the K computer and other supercomputers for electronic structure calculation problems with matrix sizes of M = 10$^4$–10$^6$, using up to 10$^5$ cores. The GEP procedure is decomposed into two subprocedures: the reducer to the standard eigenvalue problem (SEP) and the SEP solver. A hybrid solver is constructed by choosing a routine for each subprocedure from the three parallel solver libraries ScaLAPACK, ELPA, and EigenExa. The hybrid solvers using the two newer libraries, ELPA and EigenExa, give better benchmark results than the conventional ScaLAPACK library. A detailed analysis of the results implies that the reducer can become a bottleneck on next-generation (exascale) supercomputers, which provides guidance for future research. The code was developed as a middleware and a mini-application and will be made available online.
Submitted 24 April, 2015;
originally announced April 2015.
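The reducer/solver split can be illustrated serially with the standard Cholesky-based reduction for a symmetric A and a symmetric positive-definite B: factor B = L L^T, form C = L^{-1} A L^{-T}, solve the SEP C y = lambda y, and back-transform x = L^{-T} y. The sketch below uses NumPy as a stand-in for the parallel libraries; it shows the structure only and makes no claim about how ScaLAPACK, ELPA, or EigenExa implement each step. The random test matrices are assumptions.

```python
import numpy as np

def solve_gep(A: np.ndarray, B: np.ndarray):
    """Solve A x = lambda B x for symmetric A and symmetric positive-definite B."""
    # Reducer: Cholesky factor B = L L^T, then C = L^{-1} A L^{-T} (same eigenvalues).
    L = np.linalg.cholesky(B)
    Y = np.linalg.solve(L, A)          # L^{-1} A
    C = np.linalg.solve(L, Y.T).T      # (L^{-1} (L^{-1} A)^T)^T = L^{-1} A L^{-T}
    C = 0.5 * (C + C.T)                # re-symmetrize against rounding
    # SEP solver: dense symmetric eigensolver, eigenvalues in ascending order.
    w, V = np.linalg.eigh(C)
    # Back-transformation: x = L^{-T} y.
    X = np.linalg.solve(L.T, V)
    return w, X

rng = np.random.default_rng(0)
n = 6
M = rng.standard_normal((n, n))
A = M + M.T                              # symmetric
N = rng.standard_normal((n, n))
B = N @ N.T + n * np.eye(n)              # symmetric positive definite
w, X = solve_gep(A, B)
print(np.allclose(A @ X, B @ X * w))     # True: A X = B X diag(w)
```

In a parallel setting the reducer (Cholesky factorization plus two triangular solves) and the SEP solver are separate stages, which is what allows each stage to be taken from a different library.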
-
Novel linear algebraic theory and one-hundred-million-atom quantum material simulations on the K computer
Authors:
Takeo Hoshi,
Tomohiro Sogabe,
Takafumi Miyatad,
Dong** Lee,
Shao-Liang Zhang,
Hiroto Imachi,
Yoshifumi Kawai,
Yohei Akiyama,
Keita Yamazaki,
Seiya Yokoyama
Abstract:
The present paper reviews our recent progress and latest results on novel linear-algebraic algorithms and their application to large-scale quantum material simulations, or electronic structure calculations. The algorithms are Krylov-subspace (iterative) solvers for generalized shifted linear equations of the form (zS - H)x = b, instead of the conventional generalized eigenvalue equation. The method was implemented in our order-$N$ calculation code ELSES (http://www.elses.jp/) with modelled systems based on ab initio calculations. The code realized one-hundred-million-atom, or 100-nm-scale, quantum material simulations on the K computer with high parallel efficiency, using up to all of the built-in processor cores. The present paper also explains several methodological aspects, such as the use of XML files and a 'novice' mode for general users. A sparse-matrix data library drawn from our real problems (http://www.elses.jp/matrix/) was prepared. The internal eigenvalue problem is discussed as a general need of quantum material simulation. The present study is interdisciplinary and is sometimes called 'Application-Algorithm-Architecture co-design'. Such co-design will play a crucial role in exascale scientific computation.
Submitted 28 February, 2014;
originally announced February 2014.
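Krylov solvers suit shifted equations because a shift does not change the Krylov subspace: K_k(A + sI, b) = K_k(A, b), so the residuals of all shifted systems stay collinear with one seed residual, and a single sequence of matrix-vector products serves every shift. As a hedged illustration, the sketch below implements the textbook shifted conjugate gradient recurrence for the simplest case only: S = I, a real symmetric positive-definite matrix A, and real shifts s >= 0, i.e. (A + sI)x = b. The generalized, complex-shifted form (zS - H)x = b handled in ELSES is not covered; the matrix, right-hand side, and shifts are assumptions.

```python
import math
from typing import Dict, List

Vector = List[float]

def matvec(A: List[Vector], v: Vector) -> Vector:
    return [sum(a * x for a, x in zip(row, v)) for row in A]

def dot(u: Vector, v: Vector) -> float:
    return sum(a * c for a, c in zip(u, v))

def shifted_cg(A: List[Vector], b: Vector, shifts: List[float],
               tol: float = 1e-12, max_iter: int = 1000) -> Dict[float, Vector]:
    """Solve (A + s*I) x = b for every s in shifts, reusing one CG run on A.

    Assumes A is symmetric positive definite and every shift s >= 0.
    """
    n = len(b)
    r, p = list(b), list(b)               # seed (s = 0) residual and search direction
    rr = dot(r, r)
    b_norm = math.sqrt(rr)
    alpha_prev, beta_prev = 1.0, 0.0
    xs = {s: [0.0] * n for s in shifts}
    ps = {s: list(b) for s in shifts}
    zeta = {s: 1.0 for s in shifts}       # shifted residual = zeta * seed residual
    zeta_prev = {s: 1.0 for s in shifts}
    for _ in range(max_iter):
        if math.sqrt(rr) <= tol * b_norm:
            break
        Ap = matvec(A, p)                 # the only matrix-vector product per iteration
        alpha = rr / dot(p, Ap)
        for s in shifts:
            z, zp = zeta[s], zeta_prev[s]
            z_next = z * zp * alpha_prev / (
                alpha * beta_prev * (zp - z) + zp * alpha_prev * (1.0 + s * alpha))
            alpha_s = alpha * z_next / z
            xs[s] = [x + alpha_s * q for x, q in zip(xs[s], ps[s])]
            zeta_prev[s], zeta[s] = z, z_next
        r = [ri - alpha * ai for ri, ai in zip(r, Ap)]
        rr_new = dot(r, r)
        beta = rr_new / rr
        for s in shifts:
            ratio = zeta[s] / zeta_prev[s]
            ps[s] = [zeta[s] * ri + ratio * ratio * beta * q for ri, q in zip(r, ps[s])]
        p = [ri + beta * pi for ri, pi in zip(r, p)]
        rr, alpha_prev, beta_prev = rr_new, alpha, beta
    return xs

A = [[4.0, 1.0, 0.0, 0.0],
     [1.0, 3.0, 1.0, 0.0],
     [0.0, 1.0, 2.0, 0.5],
     [0.0, 0.0, 0.5, 1.0]]
b = [1.0, 2.0, 3.0, 4.0]
solutions = shifted_cg(A, b, shifts=[0.0, 0.5, 2.0])
```

For s >= 0 the factor zeta has magnitude at most 1, so once the seed system converges the shifted systems have converged too; the per-shift work is only vector updates, which are independent across shifts and therefore parallelize naturally.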