-
Performance Evaluation of Parallel Sortings on the Supercomputer Fugaku
Authors:
Tomoyuki Tokuue,
Tomoaki Ishiyama
Abstract:
Sorting is one of the most basic algorithms, and developing highly parallel sorting programs is becoming increasingly important in high-performance computing because the number of CPU cores per node in modern supercomputers tends to increase. In this study, we have implemented two multi-threaded sorting algorithms based on samplesort and compared their performance on the supercomputer Fugaku. The first algorithm divides an input sequence into multiple blocks, sorts each block, and then selects pivots by sampling from each block at regular intervals. Each block is then partitioned using the pivots, and partitions in different blocks are merged into a single sorted sequence. The second algorithm differs from the first only in how pivots are selected: binary search is used to select pivots such that the number of elements in each partition is equal. We compare the performance of the two algorithms with different sequential sorting and multiway merging algorithms. We demonstrate that the second algorithm, with BlockQuicksort (a quicksort accelerated by reducing conditional branches) for sequential sorting and a selection tree for merging, shows consistently high speed and high parallel efficiency for various input data types and data sizes.
Submitted 9 May, 2023;
originally announced May 2023.
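The steps of the first algorithm described in the abstract above can be sketched sequentially as follows. This is only an illustration under stated assumptions, not the authors' implementation: the function name `regular_sampling_sort` and the parameter `num_blocks` are hypothetical, Python's built-in `sorted` stands in for the sequential sort, and `heapq.merge` stands in for the multiway merge (the paper evaluates BlockQuicksort and a selection tree, among others). In a multi-threaded version, each block sort and each partition merge would presumably run on its own thread.

```python
import bisect
import heapq


def regular_sampling_sort(data, num_blocks=4):
    """Sequential sketch: sort blocks, sample pivots, partition, merge."""
    if not data:
        return []
    num_blocks = max(1, min(num_blocks, len(data)))
    block_size = -(-len(data) // num_blocks)  # ceiling division

    # 1. Split the input into blocks and sort each block independently.
    blocks = [sorted(data[i:i + block_size])
              for i in range(0, len(data), block_size)]

    # 2. Sample every sorted block at regular intervals.
    samples = []
    for block in blocks:
        step = max(1, len(block) // num_blocks)
        samples.extend(block[::step])
    samples.sort()

    # 3. Pick num_blocks - 1 pivots at regular intervals from the samples.
    pivots = [samples[(k * len(samples)) // num_blocks]
              for k in range(1, num_blocks)]

    # 4. Partition p holds the elements in (pivots[p-1], pivots[p]].
    #    Because each block is sorted, binary search finds the boundaries.
    result = []
    for p in range(num_blocks):
        runs = []
        for block in blocks:
            lo = bisect.bisect_right(block, pivots[p - 1]) if p > 0 else 0
            hi = (bisect.bisect_right(block, pivots[p])
                  if p < num_blocks - 1 else len(block))
            runs.append(block[lo:hi])
        # 5. Merge the sorted runs of this partition taken from all blocks.
        result.extend(heapq.merge(*runs))
    return result


print(regular_sampling_sort([9, 4, 7, 1, 8, 2, 6, 3, 5, 0], num_blocks=3))
# prints [0, 1, 2, 3, 4, 5, 6, 7, 8, 9]
```

With sampled pivots the partition sizes can be uneven, which is the load-balance issue the second algorithm addresses by searching for pivots that equalize the partition sizes.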
-
Highly Accurate CNN Inference Using Approximate Activation Functions over Homomorphic Encryption
Authors:
Takumi Ishiyama,
Takuya Suzuki,
Hayato Yamana
Abstract:
In the big data era, cloud-based machine learning as a service (MLaaS) has attracted considerable attention. However, when handling sensitive data, such as financial and medical data, a privacy issue emerges, because the cloud server can access clients' raw data. A common method of handling sensitive data in the cloud uses homomorphic encryption, which allows computation over encrypted data without decryption. Previous research usually adopted a low-degree polynomial mapping function, such as the square function, for data classification. However, this technique results in low classification accuracy. In this study, we seek to improve the classification accuracy for inference processing in a convolutional neural network (CNN) while using homomorphic encryption. We adopt an activation function that approximates Google's Swish activation function using a fourth-order polynomial. We also adopt batch normalization to normalize the inputs to the Swish function so that they fit the input range, which minimizes the approximation error. We implemented CNN inference labeling over homomorphic encryption using Microsoft's Simple Encrypted Arithmetic Library for the Cheon-Kim-Kim-Song (CKKS) scheme. The experimental evaluations confirmed classification accuracies of 99.22% and 80.48% for MNIST and CIFAR-10, respectively, which are improvements of 0.04% and 4.11%, respectively, over previous methods.
Submitted 2 December, 2020; v1 submitted 8 September, 2020;
originally announced September 2020.
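The idea of replacing Swish with a fourth-order polynomial, as described in the abstract above, can be illustrated with a plain least-squares fit. This is a sketch under assumptions and does not reproduce the authors' coefficients or fitting procedure: the helper name `fit_swish_polynomial`, the fitting interval of [-4, 4] (`half_width`), and the grid size are all hypothetical choices. The sketch works on plaintext values only; no homomorphic encryption library is involved.

```python
import numpy as np


def swish(x):
    """Swish activation: x * sigmoid(x)."""
    return x / (1.0 + np.exp(-x))


def fit_swish_polynomial(degree=4, half_width=4.0, num_points=2001):
    """Least-squares polynomial fit to Swish on [-half_width, half_width].

    Returns coefficients ordered from the constant term upward.
    The interval is an assumption, not a value taken from the paper.
    """
    xs = np.linspace(-half_width, half_width, num_points)
    return np.polynomial.polynomial.polyfit(xs, swish(xs), degree)


coeffs = fit_swish_polynomial()

# Evaluate the polynomial and measure the worst-case error on the interval.
xs = np.linspace(-4.0, 4.0, 401)
approx = np.polynomial.polynomial.polyval(xs, coeffs)
max_err = float(np.max(np.abs(approx - swish(xs))))

print("coefficients (c0..c4):", coeffs)
print("max abs error on [-4, 4]:", max_err)
```

A polynomial only tracks Swish inside the interval it was fitted on and diverges outside it, which is why keeping activations within a bounded range (the role batch normalization plays in the abstract) matters for accuracy.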
-
Statistically Discriminative Sub-trajectory Mining
Authors:
Vo Nguyen Le Duy,
Takuto Sakuma,
Taiju Ishiyama,
Hiroki Toda,
Kazuya Nishi,
Masayuki Karasuyama,
Yuta Okubo,
Masayuki Sunaga,
Yasuo Tabei,
Ichiro Takeuchi
Abstract:
We study the problem of discriminative sub-trajectory mining. Given two groups of trajectories, the goal of this problem is to extract moving patterns in the form of sub-trajectories which are more similar to sub-trajectories of one group and less similar to those of the other. We propose a new method called Statistically Discriminative Sub-trajectory Mining (SDSM) for this problem. An advantage of the SDSM method is that the statistical significance of the extracted sub-trajectories is properly controlled, in the sense that the probability of finding a false positive sub-trajectory is smaller than a specified significance threshold alpha (e.g., 0.05), which is indispensable when the method is used in scientific or social studies in noisy environments. Finding such statistically discriminative sub-trajectories from a massive trajectory dataset is both computationally and statistically challenging. In the SDSM method, we resolve the difficulties by introducing a tree representation among sub-trajectories and running an efficient permutation-based statistical inference method on the tree. To the best of our knowledge, SDSM is the first method that can efficiently extract statistically discriminative sub-trajectories from a massive trajectory dataset. We illustrate the effectiveness and scalability of the SDSM method by applying it to a real-world dataset with 1,000,000 trajectories, which contain 16,723,602,505 sub-trajectories.
Submitted 5 May, 2019;
originally announced May 2019.
-
24.77 Pflops on a Gravitational Tree-Code to Simulate the Milky Way Galaxy with 18600 GPUs
Authors:
Jeroen Bédorf,
Evghenii Gaburov,
Michiko S. Fujii,
Keigo Nitadori,
Tomoaki Ishiyama,
Simon Portegies Zwart
Abstract:
We have simulated, for the first time, the long-term evolution of the Milky Way Galaxy using 51 billion particles on the Swiss Piz Daint supercomputer with our $N$-body gravitational tree-code Bonsai. Herein, we describe the scientific motivation and numerical algorithms. The Milky Way model was simulated for 6 billion years, during which the bar structure and spiral arms were fully formed. This improves upon previous simulations by using 1000 times more particles, and provides a wealth of new data that can be directly compared with observations. We also report the scalability on both the Swiss Piz Daint and the US ORNL Titan. On Piz Daint, the parallel efficiency of Bonsai was above 95%. The highest performance was achieved with a 242 billion particle Milky Way model using 18600 GPUs on Titan, thereby reaching a sustained GPU and application performance of 33.49 Pflops and 24.77 Pflops, respectively.
Submitted 1 December, 2014;
originally announced December 2014.
-
High Performance Gravitational N-body Simulations on a Planet-wide Distributed Supercomputer
Authors:
Derek Groen,
Simon Portegies Zwart,
Tomoaki Ishiyama,
Junichiro Makino
Abstract:
We report on the performance of our cold dark matter cosmological N-body simulation, which was carried out concurrently using supercomputers across the globe. We ran simulations on 60 to 750 cores distributed over a variety of supercomputers in Amsterdam (the Netherlands, Europe), Tokyo (Japan, Asia), Edinburgh (UK, Europe) and Espoo (Finland, Europe). Despite the network latency of 0.32 seconds and the communication over 30,000 km of optical network cable, we are able to achieve about 87% of the performance compared to an equal number of cores on a single supercomputer. We argue that using widely distributed supercomputers in order to acquire more compute power is technically feasible, and that the largest obstacle is introduced by local scheduling and reservation policies.
Submitted 3 January, 2011;
originally announced January 2011.
-
Simulating the universe on an intercontinental grid of supercomputers
Authors:
Simon Portegies Zwart,
Tomoaki Ishiyama,
Derek Groen,
Keigo Nitadori,
Junichiro Makino,
Cees de Laat,
Stephen McMillan,
Kei Hiraki,
Stefan Harfst,
Paola Grosso
Abstract:
Understanding the universe is hampered by the elusiveness of its most common constituent, cold dark matter. Almost impossible to observe, dark matter can be studied effectively by means of simulation, and there is probably no other research field where simulation has led to so much progress in the last decade. Cosmological N-body simulations are an essential tool for evolving density perturbations in the nonlinear regime. Simulating the formation of large-scale structures in the universe, however, is still a challenge due to the enormous dynamic range in spatial and temporal coordinates, and due to the enormous computer resources required. The dynamic range is generally dealt with by the hybridization of numerical techniques. We deal with the computational requirements by connecting two supercomputers via an optical network and making them operate as a single machine. This is challenging, if only for the fact that the supercomputers of our choice are separated by half the planet, as one is located in Amsterdam and the other is in Tokyo. The co-scheduling of the two computers and the 'gridification' of the code enable us to achieve a 90% efficiency for this distributed intercontinental supercomputer.
Submitted 5 January, 2010;
originally announced January 2010.