Search | arXiv e-print repository

MLPerf HPC: A Holistic Benchmark Suite for Scientific Machine Learning on HPC Systems

Authors: Steven Farrell, Murali Emani, Jacob Balma, Lukas Drescher, Aleksandr Drozd, Andreas Fink, Geoffrey Fox, David Kanter, Thorsten Kurth, Peter Mattson, Dawei Mu, Amit Ruhela, Kento Sato, Koichi Shirahata, Tsuguchika Tabaru, Aristeidis Tsaris, Jan Balewski, Ben Cumming, Takumi Danjo, Jens Domke, Takaaki Fukai, Naoto Fukumoto, Tatsuya Fukushi, Balazs Gerofi, Takumi Honda, et al. (18 additional authors not shown)

Abstract: Scientific communities are increasingly adopting machine learning and deep learning models in their applications to accelerate scientific insights. High performance computing systems are pushing the frontiers of performance with a rich diversity of hardware resources and massive scale-out capabilities. There is a critical need to understand fair and effective benchmarking of machine learning applications that are representative of real-world scientific use cases. MLPerf is a community-driven standard to benchmark machine learning workloads, focusing on end-to-end performance metrics. In this paper, we introduce MLPerf HPC, a benchmark suite of large-scale scientific machine learning training applications driven by the MLCommons Association. We present the results from the first submission round, including a diverse set of some of the world's largest HPC systems. We develop a systematic framework for their joint analysis and compare them in terms of data staging, algorithmic convergence, and compute performance. As a result, we gain a quantitative understanding of optimizations on different subsystems such as staging and on-node loading of data, compute-unit utilization, and communication scheduling, enabling overall $>10 \times$ (end-to-end) performance improvements through system scaling. Notably, our analysis shows a scale-dependent interplay between the dataset size, a system's memory hierarchy, and training convergence that underlines the importance of near-compute storage. To overcome the data-parallel scalability challenge at large batch sizes, we discuss specific learning techniques and hybrid data-and-model parallelism that are effective on large systems. We conclude by characterizing each benchmark with respect to low-level memory, I/O, and network behavior to parameterize extended roofline performance models in future rounds.

Submitted 26 October, 2021; v1 submitted 21 October, 2021; originally announced October 2021.

arXiv:2108.12968 [pdf, other]

doi 10.1063/5.0069971

Kinetic construction of the high-beta anisotropic-pressure equilibrium in the magnetosphere

Authors: H. Aibara, Z. Yoshida, K. Shirahata

Abstract: A theoretical model of the high-beta equilibrium of magnetospheric plasma was constructed by consistently connecting the (anisotropic pressure) Grad-Shafranov equation and the Vlasov equation. The Grad-Shafranov equation was used to determine the axisymmetric magnetic field for a given magnetization current corresponding to a pressure tensor. Given a magnetic field, we determine the distribution function as a specific equilibrium solution of the Vlasov equation, using which we obtain the pressure tensor. We need to find an appropriate class of distribution function for these two equations to be satisfied simultaneously. Here, we consider the distribution function that maximizes the entropy on the submanifold specified by the magnetic moment. This is equivalent to the reduction of the canonical Poisson bracket to the noncanonical one having the Casimir corresponding to the magnetic moment. The pressure tensor then becomes a function of the magnetic field (through the cyclotron frequency) and flux function, satisfying the requirement of the Grad-Shafranov equation. Numerical solutions have been obtained to interpret the experimental data of the RT-1 laboratory magnetosphere.

Submitted 13 November, 2021; v1 submitted 29 August, 2021; originally announced August 2021.

Comments: 19 pages, 8 figures

Journal ref: Physics of Plasmas 28.12 (2021): 122301

arXiv:1612.08484 [pdf, ps, other]

An Automated CNN Recommendation System for Image Classification Tasks

Authors: Song Wang, Li Sun, Wei Fan, Jun Sun, Satoshi Naoi, Koichi Shirahata, Takuya Fukagai, Yasumoto Tomita, Atsushi Ike

Abstract: Nowadays the CNN is widely used in practical applications for image classification task. However the design of the CNN model is very professional work and which is very difficult for ordinary users. Besides, even for experts of CNN, to select an optimal model for specific task may still need a lot of time (to train many different models). In order to solve this problem, we proposed an automated CNN recommendation system for image classification task. Our system is able to evaluate the complexity of the classification task and the classification ability of the CNN model precisely. By using the evaluation results, the system can recommend the optimal CNN model and which can match the task perfectly. The recommendation process of the system is very fast since we don't need any model training. The experiment results proved that the evaluation methods are very accurate and reliable.

Submitted 26 December, 2016; originally announced December 2016.

Comments: Submitted to ICME 2017 and all the methods in this paper are patented

Showing 1–3 of 3 results for author: Shirahata, K