
Design and Development of a Java Parallel I/O Library

Authors: Muhammad Sohaib Ayub, Muhammad Adnan, Muhammad Yasir Shafi

Abstract: Parallel I/O refers to the ability of scientific programs to concurrently read/write from/to a single file from multiple processes executing on distributed memory platforms like compute clusters. In the HPC world, I/O becomes a significant bottleneck for many real-world scientific applications. In the last two decades, there has been significant research in improving the performance of I/O operations in scientific computing for traditional languages including C, C++, and Fortran. As a result of this, several mature and high-performance libraries including ROMIO (implementation of MPI-IO), parallel HDF5, Parallel I/O (PIO), and parallel netCDF are available today that provide efficient I/O for scientific applications. However, there is very little research done to evaluate and improve I/O performance of Java-based HPC applications. The main hindrance in the development of efficient parallel I/O Java libraries is the lack of a standard API (something equivalent to MPI-IO). Some ad hoc solutions have been developed and used in proprietary applications, but there is no general-purpose solution that can be used by performance-hungry applications. As part of this project, we plan to develop a Java-based parallel I/O API inspired by the MPI-IO bindings (MPI 2.0 standard document) for C, C++, and Fortran. Once the Java equivalent API of MPI-IO has been developed, we will develop a reference implementation on top of existing Java messaging libraries. Later, we will evaluate and compare performance of our reference Java Parallel I/O library with C/C++ counterparts using benchmarks and real-world applications.

Submitted 12 May, 2023; originally announced May 2023.

Comments: 10 pages

arXiv:2204.10323

Accelerating Physics Simulations with TPUs: An Inundation Modeling Example

Authors: Damien Pierce, R. Lily Hu, Yusef Shafi, Anudhyan Boral, Vladimir Anisimov, Sella Nevo, Yi-fan Chen

Abstract: Recent advancements in hardware accelerators such as Tensor Processing Units (TPUs) speed up computation time relative to Central Processing Units (CPUs) not only for machine learning but, as demonstrated here, also for scientific modeling and computer simulations. To study TPU hardware for distributed scientific computing, we solve partial differential equations (PDEs) for the physics simulation of fluids to model riverine floods. We demonstrate that TPUs achieve a two orders of magnitude speedup over CPUs. Running physics simulations on TPUs is publicly accessible via the Google Cloud Platform, and we release a Python interactive notebook version of the simulation.

Submitted 21 April, 2022; originally announced April 2022.

Comments: 14 pages, 10 figures. Accepted for publication in The International Journal of High Performance Computing Applications

arXiv:2104.14659

doi 10.1103/PhysRevD.105.052008

End-to-End Jet Classification of Boosted Top Quarks with the CMS Open Data

Authors: Michael Andrews, Bjorn Burkle, Yi-fan Chen, Davide DiCroce, Sergei Gleyzer, Ulrich Heintz, Meenakshi Narain, Manfred Paulini, Nikolas Pervan, Yusef Shafi, Wei Sun, Emanuele Usai, Kun Yang

Abstract: We describe a novel application of the end-to-end deep learning technique to the task of discriminating top quark-initiated jets from those originating from the hadronization of a light quark or a gluon. The end-to-end deep learning technique combines deep learning algorithms and low-level detector representation of the high-energy collision event. In this study, we use low-level detector information from the simulated CMS Open Data samples to construct the top jet classifiers. To optimize classifier performance we progressively add low-level information from the CMS tracking detector, including pixel detector reconstructed hits and impact parameters, and demonstrate the value of additional tracking information even when no new spatial structures are added. Relying only on calorimeter energy deposits and reconstructed pixel detector hits, the end-to-end classifier achieves an AUC score of 0.975$\pm$0.002 for the task of classifying boosted top quark jets. After adding derived track quantities, the classifier AUC score increases to 0.9824$\pm$0.0013, serving as the first performance benchmark for these CMS Open Data samples. We additionally provide a timing performance comparison of different processor unit architectures for training the network.

Submitted 21 January, 2022; v1 submitted 19 April, 2021; originally announced April 2021.

Comments: 9 pages, 3 figures, 4 tables; v3: unpublished

arXiv:2002.06198

Simulation Pipeline for Traffic Evacuation in Urban Areas and Emergency Traffic Management Policy Improvements through Case Studies

Authors: Yu Chen, S. Yusef Shafi, Yi-fan Chen

Abstract: Traffic evacuation plays a critical role in saving lives in devastating disasters such as hurricanes, wildfires, floods, earthquakes, etc. An ability to evaluate evacuation plans in advance for these rare events, including identifying traffic flow bottlenecks, improving traffic management policies, and understanding the robustness of the traffic management policy are critical for emergency management. Given the rareness of such events and the corresponding lack of real data, traffic simulation provides a flexible and versatile approach for such scenarios, and furthermore allows dynamic interaction with the simulated evacuation. In this paper, we build a traffic simulation pipeline to explore the above problems, covering many aspects of evacuation, including map creation, demand generation, vehicle behavior, bottleneck identification, traffic management policy improvement, and results analysis. We apply the pipeline to two case studies in California. The first is Paradise, which was destroyed by a large wildfire in 2018 and experienced catastrophic traffic jams during the evacuation. The second is Mill Valley, which has high risk of wildfire and potential traffic issues since the city is situated in a narrow valley.

Submitted 22 August, 2020; v1 submitted 13 February, 2020; originally announced February 2020.

Comments: 38 pages, 9 figures

arXiv:1910.05006

Inundation Modeling in Data Scarce Regions

Authors: Zvika Ben-Haim, Vladimir Anisimov, Aaron Yonas, Varun Gulshan, Yusef Shafi, Stephan Hoyer, Sella Nevo

Abstract: Flood forecasts are crucial for effective individual and governmental protective action. The vast majority of flood-related casualties occur in developing countries, where providing spatially accurate forecasts is a challenge due to scarcity of data and lack of funding. This paper describes an operational system providing flood extent forecast maps covering several flood-prone regions in India, with the goal of being sufficiently scalable and cost-efficient to facilitate the establishment of effective flood forecasting systems globally.

Submitted 30 October, 2019; v1 submitted 11 October, 2019; originally announced October 2019.

Comments: To appear in the Artificial Intelligence for Humanitarian Assistance and Disaster Response Workshop (AI+HADR) @ NeurIPS 2019

arXiv:1906.02818

Tensor Processing Units for Financial Monte Carlo

Authors: Francois Belletti, Davis King, Kun Yang, Roland Nelet, Yusef Shafi, Yi-Fan Chen, John Anderson

Abstract: Monte Carlo methods are critical to many routines in quantitative finance such as derivatives pricing, hedging and risk metrics. Unfortunately, Monte Carlo methods are very computationally expensive when it comes to running simulations in high-dimensional state spaces where they are still a method of choice in the financial industry. Recently, Tensor Processing Units (TPUs) have provided considerable speedups and decreased the cost of running Stochastic Gradient Descent (SGD) in Deep Learning. After highlighting computational similarities between training neural networks with SGD and simulating stochastic processes, we ask in the present paper whether TPUs are accurate, fast and simple enough to use for financial Monte Carlo. Through a theoretical reminder of the key properties of such methods and thorough empirical experiments we examine the fitness of TPUs for option pricing, hedging and risk metrics computation. In particular we demonstrate that, in spite of the use of mixed precision, TPUs still provide accurate estimators which are fast to compute when compared to GPUs. We also show that the Tensorflow programming model for TPUs is elegant, expressive and simplifies automated differentiation.

Submitted 27 January, 2020; v1 submitted 6 June, 2019; originally announced June 2019.

Showing 1–6 of 6 results for author: Shafi, Y