Search | arXiv e-print repository

doi 10.2197/ipsjjip.31.452

Performance Evaluation of Parallel Sortings on the Supercomputer Fugaku

Authors: Tomoyuki Tokuue, Tomoaki Ishiyama

Abstract: Sorting is one of the most basic algorithms, and developing highly parallel sorting programs is becoming increasingly important in high-performance computing because the number of CPU cores per node in modern supercomputers tends to increase. In this study, we have implemented two multi-threaded sorting algorithms based on samplesort and compared their performance on the supercomputer Fugaku. The first algorithm divides an input sequence into multiple blocks, sorts each block, and then selects pivots by sampling from each block at regular intervals. Each block is then partitioned using the pivots, and partitions in different blocks are merged into a single sorted sequence. The second algorithm differs from the first only in how pivots are selected: binary search is used to select pivots such that the number of elements in each partition is equal. We compare the performance of the two algorithms with different sequential sorting and multiway merging algorithms. We demonstrate that the second algorithm with BlockQuicksort (a quicksort accelerated by reducing conditional branches) for sequential sorting and the selection tree for merging shows consistently high speed and high parallel efficiency for various input data types and data sizes.
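The first algorithm described in the abstract above can be sketched roughly as follows. This is a sequential illustration only, not the authors' multi-threaded implementation: the function name `samplesort_regular` and the parameter `num_blocks` are hypothetical, Python's built-in `sorted` stands in for the sequential sort (the paper favors BlockQuicksort), and `heapq.merge` stands in for the multiway merge (the paper favors a selection tree).

```python
import bisect
import heapq


def samplesort_regular(data, num_blocks):
    """Sequential sketch of samplesort with regular sampling (hypothetical helper)."""
    n = len(data)
    if n == 0 or num_blocks <= 1:
        return sorted(data)

    # 1. Divide the input into blocks and sort each block independently.
    size = -(-n // num_blocks)  # ceiling division
    blocks = [sorted(data[i:i + size]) for i in range(0, n, size)]

    # 2. Sample each sorted block at regular intervals.
    samples = []
    for block in blocks:
        step = max(1, len(block) // num_blocks)
        samples.extend(block[::step])
    samples.sort()

    # 3. Pick num_blocks - 1 pivots, evenly spaced in the sorted samples.
    stride = len(samples) / num_blocks
    pivots = [samples[int(stride * k)] for k in range(1, num_blocks)]

    # 4. Partition every sorted block at the pivots (binary search), then
    # 5. merge the matching partitions of all blocks into the output.
    result = []
    for p in range(num_blocks):
        parts = []
        for block in blocks:
            lo = bisect.bisect_right(block, pivots[p - 1]) if p > 0 else 0
            hi = (bisect.bisect_right(block, pivots[p])
                  if p < num_blocks - 1 else len(block))
            parts.append(block[lo:hi])
        result.extend(heapq.merge(*parts))
    return result
```

In a threaded version, steps 1 and 4-5 would presumably each be distributed across threads, since blocks and output partitions are independent of one another. The partition sizes here depend on the sampled pivots and are not guaranteed to be equal, which is the imbalance the second algorithm's pivot selection is meant to remove.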

Submitted 9 May, 2023; originally announced May 2023.

Comments: 7 pages, 6 figures, accepted by Journal of Information Processing

Journal ref: Journal of Information Processing, 2023, 31, 452-458

arXiv:2009.03727 [pdf]

Highly Accurate CNN Inference Using Approximate Activation Functions over Homomorphic Encryption

Authors: Takumi Ishiyama, Takuya Suzuki, Hayato Yamana

Abstract: In the big data era, cloud-based machine learning as a service (MLaaS) has attracted considerable attention. However, when handling sensitive data, such as financial and medical data, a privacy issue emerges, because the cloud server can access clients' raw data. A common method of handling sensitive data in the cloud uses homomorphic encryption, which allows computation over encrypted data without decryption. Previous research usually adopted a low-degree polynomial mapping function, such as the square function, for data classification. However, this technique results in low classification accuracy. In this study, we seek to improve the classification accuracy for inference processing in a convolutional neural network (CNN) while using homomorphic encryption. We adopt an activation function that approximates Google's Swish activation function while using a fourth-order polynomial. We also adopt batch normalization to normalize the inputs for the Swish function to fit the input range to minimize the error. We implemented CNN inference labeling over homomorphic encryption using Microsoft's Simple Encrypted Arithmetic Library for the Cheon-Kim-Kim-Song (CKKS) scheme. The experimental evaluations confirmed classification accuracies of 99.22% and 80.48% for MNIST and CIFAR-10, respectively, which entails 0.04% and 4.11% improvements, respectively, over previous methods.
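The idea of replacing Swish with a fourth-order polynomial can be illustrated with a plain least-squares fit. This is a sketch under stated assumptions, not the paper's method: the abstract gives neither the coefficients nor the fitting procedure, so the interval [-4, 4], the sample grid, and the helper names `fit_polynomial` and `evaluate` are all hypothetical.

```python
import math


def swish(x):
    """Swish activation: x * sigmoid(x)."""
    return x / (1.0 + math.exp(-x))


def fit_polynomial(xs, ys, degree):
    """Least-squares polynomial fit via the normal equations.

    Returns coefficients c[0..degree], lowest power first.
    """
    m = degree + 1
    # Augmented normal-equation matrix [A^T A | A^T y].
    aug = [[sum(x ** (i + j) for x in xs) for j in range(m)]
           + [sum(y * x ** i for x, y in zip(xs, ys))]
           for i in range(m)]
    # Gauss-Jordan elimination with partial pivoting.
    for col in range(m):
        pivot = max(range(col, m), key=lambda r: abs(aug[r][col]))
        aug[col], aug[pivot] = aug[pivot], aug[col]
        for r in range(m):
            if r != col:
                factor = aug[r][col] / aug[col][col]
                aug[r] = [a - factor * b for a, b in zip(aug[r], aug[col])]
    return [aug[i][m] / aug[i][i] for i in range(m)]


def evaluate(coeffs, x):
    """Evaluate the polynomial with Horner's rule."""
    result = 0.0
    for c in reversed(coeffs):
        result = result * x + c
    return result


# Assumed input range [-4, 4]; the paper relies on batch normalization
# to keep activations inside whatever range its approximation targets.
xs = [-4.0 + 8.0 * k / 400 for k in range(401)]
coeffs = fit_polynomial(xs, [swish(x) for x in xs], 4)
max_error = max(abs(evaluate(coeffs, x) - swish(x)) for x in xs)
```

A polynomial matters here because CKKS evaluates only additions and multiplications on ciphertexts, so the activation must be expressed in those operations. The fit is only trustworthy inside the fitted interval, which is why normalizing the inputs is paired with the approximation.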

Submitted 2 December, 2020; v1 submitted 8 September, 2020; originally announced September 2020.

Comments: Accepted at 7th International Workshop on Privacy and Security of Big Data in conjunction with 2020 IEEE International Conference on Big Data (IEEE BigData 2020)

arXiv:1905.01788 [pdf, other]

Statistically Discriminative Sub-trajectory Mining

Authors: Vo Nguyen Le Duy, Takuto Sakuma, Taiju Ishiyama, Hiroki Toda, Kazuya Nishi, Masayuki Karasuyama, Yuta Okubo, Masayuki Sunaga, Yasuo Tabei, Ichiro Takeuchi

Abstract: We study the problem of discriminative sub-trajectory mining. Given two groups of trajectories, the goal of this problem is to extract moving patterns in the form of sub-trajectories which are more similar to sub-trajectories of one group and less similar to those of the other. We propose a new method called Statistically Discriminative Sub-trajectory Mining (SDSM) for this problem. An advantage of the SDSM method is that the statistical significance of the extracted sub-trajectories is properly controlled in the sense that the probability of finding a false positive sub-trajectory is smaller than a specified significance threshold alpha (e.g., 0.05), which is indispensable when the method is used in scientific or social studies in noisy environments. Finding such statistically discriminative sub-trajectories from a massive trajectory dataset is both computationally and statistically challenging. In the SDSM method, we resolve the difficulties by introducing a tree representation among sub-trajectories and running an efficient permutation-based statistical inference method on the tree. To the best of our knowledge, SDSM is the first method that can efficiently extract statistically discriminative sub-trajectories from a massive trajectory dataset. We illustrate the effectiveness and scalability of the SDSM method by applying it to a real-world dataset with 1,000,000 trajectories which contains 16,723,602,505 sub-trajectories.

Submitted 5 May, 2019; originally announced May 2019.

arXiv:1412.0659 [pdf, other]

doi 10.1109/SC.2014.10

24.77 Pflops on a Gravitational Tree-Code to Simulate the Milky Way Galaxy with 18600 GPUs

Authors: Jeroen Bédorf, Evghenii Gaburov, Michiko S. Fujii, Keigo Nitadori, Tomoaki Ishiyama, Simon Portegies Zwart

Abstract: We have simulated, for the first time, the long-term evolution of the Milky Way Galaxy using 51 billion particles on the Swiss Piz Daint supercomputer with our $N$-body gravitational tree-code Bonsai. Herein, we describe the scientific motivation and numerical algorithms. The Milky Way model was simulated for 6 billion years, during which the bar structure and spiral arms were fully formed. This improves upon previous simulations by using 1000 times more particles, and provides a wealth of new data that can be directly compared with observations. We also report the scalability on both the Swiss Piz Daint and the US ORNL Titan. On Piz Daint the parallel efficiency of Bonsai was above 95%. The highest performance was achieved with a 242 billion particle Milky Way model using 18600 GPUs on Titan, thereby reaching a sustained GPU and application performance of 33.49 Pflops and 24.77 Pflops respectively.

Submitted 1 December, 2014; originally announced December 2014.

Comments: 12 pages, 4 figures, Published in: 'Proceeding SC '14 Proceedings of the International Conference for High Performance Computing, Networking, Storage and Analysis'. Gordon Bell Prize 2014 finalist

arXiv:1101.0605 [pdf, other]

doi 10.1088/1749-4699/4/1/015001

High Performance Gravitational N-body Simulations on a Planet-wide Distributed Supercomputer

Authors: Derek Groen, Simon Portegies Zwart, Tomoaki Ishiyama, Junichiro Makino

Abstract: We report on the performance of our cold dark matter cosmological N-body simulation which was carried out concurrently using supercomputers across the globe. We ran simulations on 60 to 750 cores distributed over a variety of supercomputers in Amsterdam (the Netherlands, Europe), in Tokyo (Japan, Asia), Edinburgh (UK, Europe) and Espoo (Finland, Europe). Regardless of the network latency of 0.32 seconds and the communication over 30.000 km of optical network cable, we are able to achieve about 87% of the performance compared to an equal number of cores on a single supercomputer. We argue that using widely distributed supercomputers in order to acquire more compute power is technically feasible, and that the largest obstacle is introduced by local scheduling and reservation policies.

Submitted 3 January, 2011; originally announced January 2011.

Comments: 30 pages, 11 figures, accepted by Comp. Science and Discovery

MSC Class: 68M14 (primary); 68M20; 85-08; 85A40 (secondary) ACM Class: C.2.4; C.2.5

Journal ref: Comput. Sci. Disc. 4 (2011) 015001

arXiv:1001.0773 [pdf, ps, other]

doi 10.1109/MC.2009.419

Simulating the universe on an intercontinental grid of supercomputers

Authors: Simon Portegies Zwart, Tomoaki Ishiyama, Derek Groen, Keigo Nitadori, Junichiro Makino, Cees de Laat, Stephen McMillan, Kei Hiraki, Stefan Harfst, Paola Grosso

Abstract: Understanding the universe is hampered by the elusiveness of its most common constituent, cold dark matter. Almost impossible to observe, dark matter can be studied effectively by means of simulation and there is probably no other research field where simulation has led to so much progress in the last decade. Cosmological N-body simulations are an essential tool for evolving density perturbations in the nonlinear regime. Simulating the formation of large-scale structures in the universe, however, is still a challenge due to the enormous dynamic range in spatial and temporal coordinates, and due to the enormous computer resources required. The dynamic range is generally dealt with by the hybridization of numerical techniques. We deal with the computational requirements by connecting two supercomputers via an optical network and making them operate as a single machine. This is challenging, if only for the fact that the supercomputers of our choice are separated by half the planet, as one is located in Amsterdam and the other is in Tokyo. The co-scheduling of the two computers and the 'gridification' of the code enable us to achieve a 90% efficiency for this distributed intercontinental supercomputer.

Submitted 5 January, 2010; originally announced January 2010.

Comments: Accepted for publication in IEEE Computer

Showing 1–6 of 6 results for author: Ishiyama, T