
Towards Generalized On-Chip Communication for Programmable Accelerators in Heterogeneous Architectures

Authors: Joseph Zuckerman, John-David Wellman, Ajay Vanamali, Manish Shankar, Gabriele Tombesi, Karthik Swaminathan, Kevin Lee, Mohit Kapur, Robert Philhower, Pradip Bose, Luca P. Carloni

Abstract: We present several enhancements to the open-source ESP platform to support flexible and efficient on-chip communication for programmable accelerators in heterogeneous SoCs. These enhancements include 1) a flexible point-to-point communication mechanism between accelerators, 2) a multicast NoC that supports data forwarding to multiple accelerators simultaneously, 3) accelerator synchronization leveraging the SoC's coherence protocol, 4) an accelerator interface that offers fine-grained control over the communication mode used, and 5) an example ISA extension to support our enhancements. Our solution adds negligible area to the SoC architecture and requires minimal changes to the accelerators themselves. We have validated most of these features in complex FPGA prototypes and plan to include them in the open-source release of ESP in the coming months.

Submitted 4 July, 2024; originally announced July 2024.

Comments: Appeared in the Sixth International Workshop on Domain Specific System Architecture (DOSSA-6)

arXiv:2406.17812

Scalable Artificial Intelligence for Science: Perspectives, Methods and Exemplars

Authors: Wesley Brewer, Aditya Kashi, Sajal Dash, Aristeidis Tsaris, Junqi Yin, Mallikarjun Shankar, Feiyi Wang

Abstract: In a post-ChatGPT world, this paper explores the potential of leveraging scalable artificial intelligence for scientific discovery. We propose that scaling up artificial intelligence on high-performance computing platforms is essential to address such complex problems. This perspective focuses on scientific use cases like cognitive simulations, large language models for scientific inquiry, medical image analysis, and physics-informed approaches. The study outlines the methodologies needed to address such challenges at scale on supercomputers or the cloud and provides exemplars of such approaches applied to solve a variety of scientific problems.

Submitted 24 June, 2024; originally announced June 2024.

Comments: 17 pages, 5 figures

arXiv:2312.04270

Stability of buoyant-Couette flow in a vertical porous slot

Authors: B. M. Shankar, I. S. Shivakumara

Abstract: The stability of two-dimensional buoyancy-driven convection in a vertical porous slot, wherein a plane Couette flow is additionally present, is studied. This complex flow scenario is examined under Robin-type boundary conditions, which are applied to perturbations in both velocity and temperature. The inclusion of a time-derivative velocity term in the Darcy momentum equation adds notable complexity to the study. The stability of the basic natural convection flow is primarily governed by several key parameters, namely the Péclet number, the Prandtl-Darcy number, the Biot number, and a non-negative parameter that dictates the nature of the vertical boundaries. The stability eigenvalue problem is solved numerically for a variety of combinations of boundary conditions, revealing the critical threshold values that signify the onset of instability. A detailed examination of the stability of the system provides insight into both its commonalities and its distinctions under different conditions. Except for the scenario featuring impermeable-isothermal boundaries, the base flow is found to be unstable under the other configurations of perturbed velocity and temperature boundary conditions. This underscores that the presence of Couette flow alone does not suffice to induce instability within the system.
The plots of the neutral stability curves show either bi-modal or uni-modal characteristics, depending on the specific parameter values that influence the onset of instability.

Submitted 7 December, 2023; originally announced December 2023.

arXiv:2311.07020

Mechanically induced interaction between diamond and transition metals

Authors: Zhijie Wang, Susheng Tan, M. Ravi Shankar

Abstract: Purely mechanically induced mass transport between diamond and transition metals is investigated using scratching with AFM tips coated with transition-metal thin films and in situ TEM scratching tests. Because the transition metal-diamond joints and the transition-metal thin films are weak, AFM scratching rarely activated mass transport at the diamond/transition-metal thin-film interfaces. In situ TEM scratching tests were performed using a Nanofactory STM holder. The interaction at the diamond-tungsten interface was successfully activated by nanoscale in situ scratching at room temperature. The lattice structures of diamond and tungsten were characterized by HRTEM. The stress required to activate the interaction was estimated from the change in interplanar spacing of the tungsten nanotips between the state before scratching and the frame at which the interaction was activated.

Submitted 12 November, 2023; originally announced November 2023.

Comments: 28 Pages, 10 Figures

arXiv:2311.00860

Zero Coordinate Shift: Whetted Automatic Differentiation for Physics-informed Operator Learning

Authors: Kuangdai Leng, Mallikarjun Shankar, Jeyan Thiyagalingam

Abstract: Automatic differentiation (AD) is a critical step in physics-informed machine learning, required for computing the high-order derivatives of network output w.r.t. coordinates of collocation points. In this paper, we present a novel and lightweight algorithm to conduct AD for physics-informed operator learning, which we call the trick of Zero Coordinate Shift (ZCS). Instead of making all sampled coordinates leaf variables, ZCS introduces only one scalar-valued leaf variable for each spatial or temporal dimension, simplifying the wanted derivatives from "many-roots-many-leaves" to "one-root-many-leaves", whereby reverse-mode AD becomes directly utilisable. This yields a large performance gain by avoiding the duplication of the computational graph along the dimension of functions (physical parameters). ZCS is easy to implement with current deep learning libraries; our own implementation is achieved by extending the DeepXDE package. We carry out a comprehensive benchmark analysis and several case studies, training physics-informed DeepONets to solve partial differential equations (PDEs) without data. The results show that ZCS consistently reduces GPU memory consumption and training wall time by an order of magnitude, with the reduction factor scaling with the number of functions. As a low-level optimisation technique, ZCS imposes no restrictions on data, physics (PDE) or network architecture and does not compromise training results in any respect.
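The identity that makes a single shared shift sufficient can be written out as follows. This is our reading of the abstract, not the paper's exact formulation: $u$ is the network output for one function sample, $x_i$ are the collocation points along one dimension, $z$ is the single scalar leaf for that dimension, and $a_i$ is a dummy weight (evaluated at $a_i = 1$) that we introduce only to recover per-point derivatives from one scalar root $\omega$.

```latex
\begin{aligned}
\left.\frac{\partial}{\partial z}\, u(x_i + z)\right|_{z=0} &= \frac{\partial u}{\partial x}(x_i), \\
\omega(z, \mathbf{a}) &= \sum_{i} a_i \, u(x_i + z), \\
\left.\frac{\partial}{\partial a_i}\, \frac{\partial \omega}{\partial z}\right|_{z=0} &= \frac{\partial u}{\partial x}(x_i), \\
\left.\frac{\partial}{\partial a_i}\, \frac{\partial^{2} \omega}{\partial z^{2}}\right|_{z=0} &= \frac{\partial^{2} u}{\partial x^{2}}(x_i).
\end{aligned}
```

Because every point is shifted by the same $z$, each derivative order is taken with respect to one scalar, and a single final reverse pass with respect to $\mathbf{a}$ returns the pointwise values for all $x_i$ at once.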

Submitted 14 March, 2024; v1 submitted 1 November, 2023; originally announced November 2023.

Comments: Published in Journal of Computational Physics. https://doi.org/10.1016/j.jcp.2024.112904

arXiv:2310.14980

doi 10.1039/D3SM00664F

A Dimensionally-Reduced Nonlinear Elasticity Model for Liquid Crystal Elastomer Strips with Transverse Curvature

Authors: Kevin LoGrande, M. Ravi Shankar, Kaushik Dayal

Abstract: Liquid Crystalline Elastomers (LCEs) are active materials that are of interest due to their programmable response to various external stimuli such as light and heat. When exposed to these stimuli, the anisotropy in the response of the material is governed by the nematic director, a continuum parameter defined as the average local orientation of the mesogens in the liquid crystal phase. This nematic director can be programmed to be heterogeneous in space, creating a vast design space that is useful for applications ranging from artificial ligaments to deployable structures to self-assembling mechanisms. Even when specialized to long and thin strips of LCEs (the focus of this work), the vast design space has required the use of numerical simulations to aid in experimental discovery. To mitigate the computational expense of full 3-d numerical simulations, several dimensionally-reduced rod and ribbon models have been developed for LCE strips, but these have not accounted for the possibility of initial transverse curvature, as in a carpenter's tape spring. Motivated by recent experiments showing that transversely-curved LCE strips display a rich variety of configurations, this work derives a dimensionally-reduced 1-d model for pre-curved LCE strips. The 1-d model is validated against full 3-d finite element calculations, and it is also shown to capture experimental observations, including tape-spring-like localizations, in activated LCE strips.

Submitted 23 October, 2023; originally announced October 2023.

arXiv:2310.04610

DeepSpeed4Science Initiative: Enabling Large-Scale Scientific Discovery through Sophisticated AI System Technologies

Authors: Shuaiwen Leon Song, Bonnie Kruft, Minjia Zhang, Conglong Li, Shiyang Chen, Chengming Zhang, Masahiro Tanaka, Xiaoxia Wu, Jeff Rasley, Ammar Ahmad Awan, Connor Holmes, Martin Cai, Adam Ghanem, Zhongzhu Zhou, Yuxiong He, Pete Luferenko, Divya Kumar, Jonathan Weyn, Ruixiong Zhang, Sylwester Klocek, Volodymyr Vragov, Mohammed AlQuraishi, Gustaf Ahdritz, Christina Floristean, Cristina Negri , et al. (67 additional authors not shown)

Abstract: In the upcoming decade, deep learning may revolutionize the natural sciences, enhancing our capacity to model and predict natural occurrences. This could herald a new era of scientific exploration, bringing significant advancements across sectors from drug development to renewable energy. To answer this call, we present the DeepSpeed4Science initiative (deepspeed4science.ai), which aims to build unique capabilities through AI system technology innovations to help domain experts unlock today's biggest science mysteries. By leveraging DeepSpeed's current technology pillars (training, inference and compression) as base technology enablers, DeepSpeed4Science will create a new set of AI system technologies tailored for accelerating scientific discoveries by addressing their unique complexity beyond the common technical approaches used for accelerating generic large language models (LLMs). In this paper, we showcase the early progress we made with DeepSpeed4Science in addressing two of the critical system challenges in structural biology research.

Submitted 11 October, 2023; v1 submitted 6 October, 2023; originally announced October 2023.

arXiv:2307.09226

A Blender-based channel simulator for FMCW Radar

Authors: Yuan Liu, Moein Ahmadi, Johann Fuchs, Mohammad Alaee-Kerahroodi, M. R. Bhavani Shankar

Abstract: Radar simulation is a promising way to efficiently provide accurate radar data cubes for AI-based approaches to radar applications. This paper develops a channel simulator to generate frequency-modulated continuous-wave (FMCW) multiple-input multiple-output (MIMO) radar signals. In the proposed simulation framework, an open-source animation tool called Blender is used to model the scenarios and render animations. Its embedded ray-tracing (RT) engine traces the radar propagation paths, i.e., the distance and signal strength of each path. Beat-signal models for time-division-multiplexing (TDM) MIMO are then adapted to the RT outputs. Finally, environment-based models are simulated to validate the framework.
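How ray-traced path lists can feed a beat-signal model can be sketched for a single chirp and a single virtual channel. This is a minimal illustration under our own assumptions, not the simulator's code: a monostatic radar (round-trip delay 2R/c), a linear chirp, no Doppler or residual video phase, and hypothetical chirp parameters and `(range, amplitude)` pairs standing in for the RT output. The TDM-MIMO extension, which adds a per-channel phase term, is not shown.

```python
import cmath
import math

C = 299_792_458.0  # speed of light (m/s)

def beat_signal(paths, slope, fc, fs, n_samples):
    """Complex beat signal of one chirp for a list of (range_m, amplitude) paths.

    Assumes a monostatic radar (round-trip delay 2R/c), a linear chirp with
    `slope` in Hz/s, and ignores Doppler and the residual video phase.
    """
    taus = [(2.0 * rng / C, amp) for rng, amp in paths]
    samples = []
    for n in range(n_samples):
        t = n / fs
        # Each path contributes a tone at f_b = slope * tau with phase 2*pi*fc*tau.
        samples.append(sum(amp * cmath.exp(2j * math.pi * (slope * tau * t + fc * tau))
                           for tau, amp in taus))
    return samples

def range_profile(samples, slope, fs):
    """Naive DFT over positive beat frequencies, mapped to range via R = f_b * c / (2 * slope)."""
    n = len(samples)
    ranges, mags = [], []
    for k in range(n // 2):
        acc = sum(x * cmath.exp(-2j * math.pi * k * i / n) for i, x in enumerate(samples))
        ranges.append((k * fs / n) * C / (2.0 * slope))
        mags.append(abs(acc))
    return ranges, mags

# Hypothetical chirp parameters and ray-tracer output (range in m, amplitude).
slope, fs, n = 30e12, 10e6, 256   # 30 MHz/us chirp, 10 MS/s ADC, 256 samples
paths = [(3.9, 1.0), (12.0, 0.3)]
sig = beat_signal(paths, slope, fc=77e9, fs=fs, n_samples=n)
ranges, mags = range_profile(sig, slope, fs)
peak = max(range(len(mags)), key=mags.__getitem__)
resolution = (fs / n) * C / (2.0 * slope)   # width of one range bin (about 0.195 m here)
print(f"strongest return near {ranges[peak]:.2f} m")
```

The naive DFT keeps the sketch dependency-free; a real pipeline would use an FFT and windowing.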

Submitted 18 July, 2023; originally announced July 2023.

Comments: Presented at ISCS23

Report number: ISCS23-26

arXiv:2305.04602

RIS-Aided Wideband Holographic DFRC

Authors: Tong Wei, Linlong Wu, Kumar Vijay Mishra, M. R. Bhavani Shankar

Abstract: To enable non-line-of-sight (NLoS) sensing and communications, dual-function radar-communications (DFRC) systems that employ a reconfigurable intelligent surface (RIS) as a reflector in wireless media have recently been proposed. However, in dense environments and at higher frequencies, severe propagation and attenuation losses hinder RIS-aided DFRC systems from utilizing wideband processing. To this end, we propose equipping the transceivers with a reconfigurable holographic surface (RHS), which, unlike a RIS, is a metasurface with an embedded connected feed deployed at the transceiver for greater control of the radiation amplitude. This surface is crucial for designing compact, low-cost wideband wireless systems, wherein ultra-massive antenna arrays are required to compensate for the losses incurred by severe attenuation and diffraction. We consider a novel wideband DFRC system equipped with an RHS at the transceiver and a RIS reflector in the channel. We jointly design the digital, holographic, and passive beamformers to maximize the radar signal-to-interference-plus-noise ratio (SINR) while ensuring the communications SINR among all users. The resulting nonconvex optimization problem involves a maximin objective, constant-modulus constraints, and difference-of-convex constraints. We develop an alternating maximization method to decouple and iteratively solve these subproblems. Numerical experiments demonstrate that the proposed method achieves better radar performance than non-RIS, random-RHS, and randomly configured RIS-aided DFRC systems.
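Constant-modulus constraints on passive (RIS) coefficients are often handled inside alternating schemes by projecting onto the unit circle. The sketch below shows that projection and, as a toy illustration, the closed-form co-phasing choice for a single-antenna, single-user link. It is not the paper's joint digital/holographic/passive design; the channel values are hypothetical.

```python
import cmath

def project_unit_modulus(v, eps=1e-12):
    """Entrywise closest unit-modulus vector: x -> x / |x|; (near-)zero entries map to 1."""
    return [x / abs(x) if abs(x) > eps else 1 + 0j for x in v]

def coherent_ris_phases(h_direct, cascaded):
    """Unit-modulus RIS coefficients that co-phase every reflected path with the direct path.

    cascaded[n] is the product of the BS->element and element->user gains of element n.
    By the triangle inequality this maximizes |h_direct + sum_n cascaded[n] * theta[n]|.
    """
    ref = h_direct if abs(h_direct) > 0 else 1 + 0j   # any reference phase works if h_direct == 0
    return project_unit_modulus([ref * c.conjugate() for c in cascaded])

# Hypothetical single-antenna, single-user channel values.
h_d = 0.5 * cmath.exp(0.3j)
cascaded = [0.2 * cmath.exp(1j * p) for p in (1.0, -2.0, 2.5, 0.1)]
theta = coherent_ris_phases(h_d, cascaded)
h_eff = h_d + sum(c * t for c, t in zip(cascaded, theta))
print(abs(h_eff))  # equals |h_d| + sum(|c_n|) = 1.3, up to rounding
```

In multi-user or maximin settings no such closed form exists, which is why iterative schemes like the one in the abstract alternate between blocks and re-project.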

Submitted 8 May, 2023; originally announced May 2023.

arXiv:2210.03170

doi 10.1109/PMBS56514.2022.00014

WfBench: Automated Generation of Scientific Workflow Benchmarks

Authors: Tainã Coleman, Henri Casanova, Ketan Maheshwari, Loïc Pottier, Sean R. Wilkinson, Justin Wozniak, Frédéric Suter, Mallikarjun Shankar, Rafael Ferreira da Silva

Abstract: The prevalence of scientific workflows with high computational demands calls for their execution on various distributed computing platforms, including large-scale leadership-class high-performance computing (HPC) clusters. To handle the deployment, monitoring, and optimization of workflow executions, many workflow systems have been developed over the past decade. There is a need for workflow benchmarks that can be used to evaluate the performance of workflow systems on current and future software stacks and hardware platforms. We present a generator of realistic workflow benchmark specifications that can be translated into benchmark code to be executed with current workflow systems. Our approach generates workflow tasks with arbitrary performance characteristics (CPU, memory, and I/O usage) and with realistic task dependency structures based on those seen in production workflows. We present experimental results that show that our approach generates benchmarks that are representative of production workflows, and conduct a case study to demonstrate the use and usefulness of our generated benchmarks to evaluate the performance of workflow systems under different configuration scenarios.
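A benchmark specification of this kind can be pictured as tasks carrying CPU, memory, and I/O parameters, wired into a dependency DAG. The following is a hypothetical, minimal sketch: the `Task` fields, value ranges, and random layered fan-out/fan-in structure are our assumptions and are not WfBench's format or API. WfBench bases its dependency structures on production workflows, which this toy does not attempt.

```python
import random
from dataclasses import dataclass, field

@dataclass
class Task:
    name: str
    cpu_work: float          # hypothetical unit: core-seconds
    memory_mb: int
    io_read_mb: int
    io_write_mb: int
    parents: list = field(default_factory=list)

def generate_layered_workflow(widths, seed=0):
    """Random layered DAG: each task depends on 1-3 tasks of the previous layer."""
    rng = random.Random(seed)
    layers, tasks = [], []
    for depth, width in enumerate(widths):
        layer = []
        for i in range(width):
            t = Task(name=f"t{depth}_{i}",
                     cpu_work=rng.uniform(1.0, 60.0),
                     memory_mb=rng.choice([256, 512, 1024, 2048]),
                     io_read_mb=rng.randint(1, 500),
                     io_write_mb=rng.randint(1, 500))
            if layers:
                prev = layers[-1]
                k = rng.randint(1, min(3, len(prev)))
                t.parents = [p.name for p in rng.sample(prev, k)]
            layer.append(t)
        layers.append(layer)
        tasks.extend(layer)
    return tasks

def topological_order(tasks):
    """Kahn's algorithm: return task names so that every parent precedes its children."""
    children = {t.name: [] for t in tasks}
    indegree = {t.name: len(t.parents) for t in tasks}
    for t in tasks:
        for p in t.parents:
            children[p].append(t.name)
    ready = [name for name, d in indegree.items() if d == 0]
    order = []
    while ready:
        name = ready.pop()
        order.append(name)
        for c in children[name]:
            indegree[c] -= 1
            if indegree[c] == 0:
                ready.append(c)
    if len(order) != len(tasks):
        raise ValueError("dependency cycle detected")
    return order

wf = generate_layered_workflow([1, 4, 8, 1], seed=42)
order = topological_order(wf)
print(order[:3])
```

A full generator would then translate each task into executable benchmark code that burns the requested CPU time, allocates memory, and reads and writes files of the given sizes.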

Submitted 6 October, 2022; originally announced October 2022.

arXiv:2209.13280

Improving Pulse-Compression Weather Radar via the Joint Design of Subpulses and Extended Mismatch Filter

Authors: Linlong Wu, Mohammad Alaee-Kerahroodi, M. R. Bhavani Shankar

Abstract: Pulse compression can improve both the range resolution and the sensitivity of weather radar. However, it introduces high sidelobes if not carefully implemented. Motivated by this fact, we focus on pulse compression design for weather radar in this paper. Specifically, we jointly design the subpulse codes and an extended mismatch filter based on the alternating direction method of multipliers (ADMM). This joint design yields a pulse compression with low sidelobes, which equivalently implies a high signal-to-interference-plus-noise ratio (SINR) and a low estimation error on meteorological reflectivity. Experimental results demonstrate the efficacy of the proposed pulse compression strategy, since its meteorological reflectivity estimates are highly similar to the ground truth.
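To make "extended mismatch filter" concrete, the sketch below uses a simpler stand-in for the paper's ADMM joint design: it fixes the code (Barker-13, our choice) and fits only a filter three times the code length (an assumed extension factor) by ordinary least squares, so that the code-filter convolution approximates a unit spike. It then prints the peak sidelobe ratio (PSLR) of the matched filter and of the fitted filter for comparison.

```python
import math

def convolve(a, b):
    """Full linear convolution of two real sequences."""
    out = [0.0] * (len(a) + len(b) - 1)
    for i, x in enumerate(a):
        for j, y in enumerate(b):
            out[i + j] += x * y
    return out

def solve(A, b):
    """Solve A x = b by Gaussian elimination with partial pivoting."""
    n = len(A)
    M = [row[:] + [rhs] for row, rhs in zip(A, b)]
    for col in range(n):
        piv = max(range(col, n), key=lambda r: abs(M[r][col]))
        M[col], M[piv] = M[piv], M[col]
        for r in range(col + 1, n):
            f = M[r][col] / M[col][col]
            for c in range(col, n + 1):
                M[r][c] -= f * M[col][c]
    x = [0.0] * n
    for r in range(n - 1, -1, -1):
        x[r] = (M[r][n] - sum(M[r][c] * x[c] for c in range(r + 1, n))) / M[r][r]
    return x

def ls_mismatch_filter(code, filt_len):
    """Filter h minimizing ||code * h - e||^2, where e is a unit spike at the output center."""
    out_len = len(code) + filt_len - 1
    center = out_len // 2
    # Column j of the convolution matrix is the code delayed by j samples.
    cols = [[0.0] * j + list(code) + [0.0] * (filt_len - 1 - j) for j in range(filt_len)]
    gram = [[sum(p * q for p, q in zip(ci, cj)) for cj in cols] for ci in cols]
    rhs = [col[center] for col in cols]          # C^T e selects row `center`
    return solve(gram, rhs)

def pslr_db(out):
    """Peak sidelobe level relative to the main peak, in dB."""
    k = max(range(len(out)), key=lambda i: abs(out[i]))
    return 20 * math.log10(max(abs(v) for i, v in enumerate(out) if i != k) / abs(out[k]))

barker13 = [1, 1, 1, 1, 1, -1, -1, 1, 1, -1, 1, -1, 1]
matched = convolve(barker13, [c / 13 for c in reversed(barker13)])
h = ls_mismatch_filter(barker13, 3 * len(barker13))
y = convolve(barker13, h)
center = len(y) // 2
residual = sum((v - (1.0 if i == center else 0.0)) ** 2 for i, v in enumerate(y))
print(f"matched PSLR {pslr_db(matched):.1f} dB, LS mismatch PSLR {pslr_db(y):.1f} dB")
```

Because a zero-padded, centered matched filter is one admissible choice of `h`, the least-squares residual can be no larger than the matched filter's (12/169 for Barker-13). Jointly optimizing the code as well, as the paper does, adds the nonconvex part that motivates ADMM.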

Submitted 27 September, 2022; originally announced September 2022.

arXiv:2207.02157

Multi-IRS-Aided Doppler-Tolerant Wideband DFRC System

Authors: Tong Wei, Linlong Wu, Kumar Vijay Mishra, M. R. Bhavani Shankar

Abstract: Intelligent reflecting surface (IRS) is recognized as an enabler of future dual-function radar-communications (DFRC) by improving spectral efficiency, coverage, parameter estimation, and interference suppression. Prior studies on IRS-aided DFRC focus either on narrowband processing, single-IRS deployment, static targets, non-clutter scenarios, or on the under-utilized line-of-sight (LoS) and non-line-of-sight (NLoS) paths. In this paper, we address these shortcomings by optimizing a wideband DFRC system comprising multiple IRSs and a dual-function base station that jointly processes the LoS and NLoS wideband multi-carrier signals to improve both the communications and the radar signal-to-interference-plus-noise ratios (SINRs) in the presence of a moving target and clutter. We formulate the transmit, receive, and IRS beamformer design as the maximization of the worst-case radar SINR subject to transmit power and communications SINR constraints. We tackle this nonconvex problem under the alternating optimization framework, where the subproblems are solved by a combination of the Dinkelbach algorithm, consensus alternating direction method of multipliers, and Riemannian steepest descent. Our numerical experiments show that the proposed multi-IRS-aided wideband DFRC provides more than 4 dB higher radar SINR and a 31.7% improvement in target detection over a single-IRS system.
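The Dinkelbach algorithm named above turns a ratio maximization, max f(x)/g(x) with g > 0, into a sequence of parametric problems max f(x) - λ g(x), updating λ to the ratio at each new maximizer. The sketch applies the generic method to a hypothetical scalar toy, not to the paper's SINR subproblem; the inner maximizer is assumed to be available in closed form.

```python
import math

def dinkelbach(f, g, argmax_param, lam=0.0, tol=1e-12, max_iter=100):
    """Maximize f(x) / g(x), with g > 0, by Dinkelbach's parametric iteration.

    argmax_param(lam) must return a maximizer of f(x) - lam * g(x).
    """
    for _ in range(max_iter):
        x = argmax_param(lam)
        gap = f(x) - lam * g(x)   # F(lam) = max_x f - lam*g; it is 0 at the optimal ratio
        lam = f(x) / g(x)         # ratio attained at the new maximizer
        if gap < tol:
            break
    return x, lam

def toy_argmax(lam, lo=0.0, hi=3.0):
    """Closed-form maximizer of (x + 1) - lam * (x**2 + 1) on [lo, hi] (hypothetical toy)."""
    if lam <= 0:
        return hi                  # the objective is increasing in x when lam <= 0
    return min(max(1.0 / (2.0 * lam), lo), hi)

x_opt, ratio = dinkelbach(lambda x: x + 1, lambda x: x * x + 1, toy_argmax)
print(x_opt, ratio)  # approaches sqrt(2) - 1 and (sqrt(2) + 1) / 2
```

Setting the derivative of (x + 1)/(x² + 1) to zero gives x² + 2x - 1 = 0, so the toy optimum is x = √2 - 1 with ratio (√2 + 1)/2, which the iteration reaches in a handful of steps.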

Submitted 10 August, 2023; v1 submitted 5 July, 2022; originally announced July 2022.

Comments: 16 pages, 8 figures, 2 tables

arXiv:2204.07265

doi 10.1109/MNET.128.2200446

The Rise of Intelligent Reflecting Surfaces in Integrated Sensing and Communications Paradigms

Authors: Ahmet M. Elbir, Kumar Vijay Mishra, M. R. Bhavani Shankar, Symeon Chatzinotas

Abstract: The intelligent reflecting surface (IRS) alters the behavior of wireless media and, consequently, has the potential to improve the performance and reliability of wireless systems such as communications and radar remote sensing. Recently, integrated sensing and communications (ISAC) has been widely studied as a means to efficiently utilize spectrum and thereby save cost and power. This article investigates the role of IRS in future ISAC paradigms. While there is a rich body of recent research on IRS-assisted communications, IRS-assisted radar and ISAC remain relatively unexamined. We discuss the putative advantages of IRS deployment, such as coverage extension, interference suppression, and enhanced parameter estimation, for both communications and radar. We introduce possible IRS-assisted ISAC scenarios with common and dedicated surfaces. The article provides an overview of related signal processing techniques and design challenges, such as wireless channel acquisition, waveform design, and security.

Submitted 20 December, 2022; v1 submitted 14 April, 2022; originally announced April 2022.

Comments: Accepted paper in IEEE Network Magazine

Journal ref: IEEE Network, 2023

arXiv:2112.06670

MIMO Radar Transmit Beampattern Shaping for Spectrally Dense Environments

Authors: Ehsan Raei, Saeid Sedighi, Mohammad Alaee-Kerahroodi, M. R. Bhavani Shankar

Abstract: Designing unimodular waveforms with a desired beampattern, spectral occupancy, and orthogonality level is of vital importance in next-generation Multiple-Input Multiple-Output (MIMO) radar systems. Motivated by this fact, in this paper, we propose a framework for shaping the beampattern in MIMO radar systems under constraints that simultaneously ensure unimodularity, desired spectral occupancy, and orthogonality of the designed waveform. In this manner, the proposed framework is the most comprehensive approach for MIMO radar waveform design focusing on beampattern shaping. The problem formulation leads to a non-convex quadratic fractional program. We propose an effective iterative algorithm to solve the problem, where each iteration is composed of a Semi-Definite Programming (SDP) step followed by eigenvalue decomposition. Numerical simulations are provided to illustrate the superior performance of our proposed method over the state of the art.
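The quantity being shaped is the transmit beampattern P(θ) = a(θ)ᴴ R a(θ), where R = X Xᴴ / N is the covariance of the M × N waveform matrix X. The sketch only evaluates P(θ) for a given X. It does not perform the constrained design, and it assumes a uniform linear array with half-wavelength spacing and two hypothetical 4-sample unimodular waveform sets.

```python
import cmath
import math

def steering(m_ant, theta):
    """ULA steering vector with half-wavelength spacing (assumed geometry)."""
    return [cmath.exp(1j * math.pi * m * math.sin(theta)) for m in range(m_ant)]

def covariance(X):
    """Waveform covariance R = X X^H / N for an M x N waveform matrix X (list of rows)."""
    n = len(X[0])
    return [[sum(xi[k] * xj[k].conjugate() for k in range(n)) / n for xj in X] for xi in X]

def beampattern(X, theta):
    """Transmit beampattern P(theta) = a(theta)^H R a(theta), real for Hermitian R."""
    R = covariance(X)
    a = steering(len(X), theta)
    m = len(X)
    return sum(a[i].conjugate() * R[i][j] * a[j] for i in range(m) for j in range(m)).real

orthogonal = [[1, 1, 1, 1], [1, -1, 1, -1]]   # unimodular, mutually orthogonal rows
coherent = [[1, 1, 1, 1], [1, 1, 1, 1]]       # the same waveform on both antennas
for th in (0.0, math.pi / 6):
    print(f"{math.degrees(th):4.0f} deg: orthogonal {beampattern(orthogonal, th):.2f}, "
          f"coherent {beampattern(coherent, th):.2f}")
```

Orthogonal rows give R = I and a flat pattern equal to M = 2 at every angle, while identical rows concentrate power at broadside (4 at 0°, 2 at 30°). Beampattern shaping searches between these extremes for an R whose pattern matches a desired one, subject to the unimodularity, spectral, and orthogonality constraints the abstract lists.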

Submitted 13 December, 2021; originally announced December 2021.

arXiv:2110.12773

Scientific Machine Learning Benchmarks

Authors: Jeyan Thiyagalingam, Mallikarjun Shankar, Geoffrey Fox, Tony Hey

Abstract: The breakthrough in Deep Learning neural networks has transformed the use of AI and machine learning technologies for the analysis of very large experimental datasets. These datasets are typically generated by large-scale experimental facilities at national laboratories. In the context of science, scientific machine learning focuses on training machines to identify patterns, trends, and anomalies to extract meaningful scientific insights from such datasets. With a new generation of experimental facilities, the rate of data generation and the scale of data volumes will increasingly require the use of more automated data analysis. At present, identifying the most appropriate machine learning algorithm for the analysis of any given scientific dataset is still a challenge for scientists, owing to the many different machine learning frameworks, computer architectures, and machine learning models available. Historically, for modelling and simulation on HPC systems, such problems have been addressed through benchmarking computer applications, algorithms, and architectures. Extending such a benchmarking approach and identifying metrics for the application of machine learning methods to scientific datasets is a new challenge for both scientists and computer scientists. In this paper, we describe our approach to the development of scientific machine learning benchmarks and review other approaches to benchmarking scientific machine learning.

Submitted 25 October, 2021; originally announced October 2021.

ACM Class: I.2

arXiv:2110.11466

MLPerf HPC: A Holistic Benchmark Suite for Scientific Machine Learning on HPC Systems

Authors: Steven Farrell, Murali Emani, Jacob Balma, Lukas Drescher, Aleksandr Drozd, Andreas Fink, Geoffrey Fox, David Kanter, Thorsten Kurth, Peter Mattson, Dawei Mu, Amit Ruhela, Kento Sato, Koichi Shirahata, Tsuguchika Tabaru, Aristeidis Tsaris, Jan Balewski, Ben Cumming, Takumi Danjo, Jens Domke, Takaaki Fukai, Naoto Fukumoto, Tatsuya Fukushi, Balazs Gerofi, Takumi Honda , et al. (18 additional authors not shown)

Abstract: Scientific communities are increasingly adopting machine learning and deep learning models in their applications to accelerate scientific insights. High performance computing systems are pushing the frontiers of performance with a rich diversity of hardware resources and massive scale-out capabilities. There is a critical need to understand fair and effective benchmarking of machine learning applications that are representative of real-world scientific use cases. MLPerf is a community-driven standard to benchmark machine learning workloads, focusing on end-to-end performance metrics. In this paper, we introduce MLPerf HPC, a benchmark suite of large-scale scientific machine learning training applications driven by the MLCommons Association. We present the results from the first submission round, which include a diverse set of some of the world's largest HPC systems. We develop a systematic framework for their joint analysis and compare them in terms of data staging, algorithmic convergence, and compute performance. As a result, we gain a quantitative understanding of optimizations on different subsystems such as staging and on-node loading of data, compute-unit utilization, and communication scheduling, enabling overall end-to-end performance improvements of more than 10× through system scaling. Notably, our analysis shows a scale-dependent interplay between the dataset size, a system's memory hierarchy, and training convergence that underlines the importance of near-compute storage.
To overcome the data-parallel scalability challenge at large batch sizes, we discuss specific learning techniques and hybrid data-and-model parallelism that are effective on large systems. We conclude by characterizing each benchmark with respect to low-level memory, I/O, and network behavior to parameterize extended roofline performance models in future rounds.
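An extended roofline model generalizes the classic bound min(peak compute, arithmetic intensity × memory bandwidth) by adding one ceiling per data-movement level, such as memory, storage I/O, and network. The sketch takes that common form; the level names, bandwidths, and byte counts are hypothetical per-node numbers, not MLPerf HPC measurements, and the paper's models may include terms this omits.

```python
import math

def extended_roofline(flops, peak_flops, bandwidth, traffic):
    """Attainable FLOP/s = min(peak compute, intensity_l * bandwidth_l over every level l).

    bandwidth: {level: bytes/s}; traffic: {level: bytes moved}; flops: total FLOPs.
    Returns (attainable FLOP/s, name of the limiting ceiling).
    """
    bounds = {"compute": peak_flops}
    for level, bw in bandwidth.items():
        bounds[level] = (flops / traffic[level]) * bw   # arithmetic intensity x bandwidth
    limiter = min(bounds, key=bounds.get)
    return bounds[limiter], limiter

# Hypothetical per-node numbers for one training step.
perf, limiter = extended_roofline(
    flops=1e15,
    peak_flops=100e12,
    bandwidth={"HBM": 1.5e12, "filesystem": 5e9, "network": 25e9},
    traffic={"HBM": 2e13, "filesystem": 1e11, "network": 5e10},
)
print(f"{perf:.3g} FLOP/s, bound by {limiter}")
```

With these made-up numbers the storage ceiling binds before compute does, which is the kind of situation in which near-compute storage, highlighted in the abstract, would pay off.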

Submitted 26 October, 2021; v1 submitted 21 October, 2021; originally announced October 2021.

arXiv:2106.11469

doi 10.1002/cpe.8019

Real-Time XFEL Data Analysis at SLAC and NERSC: a Trial Run of Nascent Exascale Experimental Data Analysis

Authors: Johannes P. Blaschke, Aaron S. Brewster, Daniel W. Paley, Derek Mendez, Asmit Bhowmick, Nicholas K. Sauter, Wilko Kröger, Murali Shankar, Bjoern Enders, Deborah Bard

Abstract: X-ray scattering experiments using Free Electron Lasers (XFELs) are a powerful tool to determine the molecular structure and function of unknown samples (such as COVID-19 viral proteins). XFEL experiments are a challenge to computing in two ways: i) due to the high cost of running XFELs, a fast turnaround time from data acquisition to data analysis is essential to make informed decisions on experimental protocols; ii) data collection rates are growing exponentially, requiring new scalable algorithms. Here we report our experiences analyzing data from two experiments at the Linac Coherent Light Source (LCLS) during September 2020. Raw data were analyzed on NERSC's Cori XC40 system using the Superfacility paradigm: our workflow automatically moves raw data between LCLS and NERSC, where it is analyzed using the software package CCTBX. We achieved real-time data analysis with a turnaround time from data acquisition to full molecular reconstruction of as little as 10 min, which gave the experiment's operators sufficient time to make informed decisions. By hosting the data analysis on Cori, and by automating LCLS-NERSC interoperability, we achieved a data analysis rate that matches the data acquisition rate. Completing data analysis within 10 min is a first for XFEL experiments and an important milestone if we are to keep up with data collection trends.

Submitted 31 December, 2023; v1 submitted 21 June, 2021; originally announced June 2021.

arXiv:2104.04322 [pdf, ps, other]

Sparse Array Beampattern Synthesis via Majorization-Based ADMM

Authors: Tong Wei, Linlong Wu, M. R. Bhavani Shankar

Abstract: Beampattern synthesis is a key problem in many wireless applications. With the increasing scale of MIMO antenna arrays, it is highly desirable to conduct beampattern synthesis on a sparse array to reduce power and hardware cost. In this paper, we consider conducting beampattern synthesis and sparse array construction jointly. In the formulated problem, the beampattern synthesis is designed by minimizing the matching error to the beampattern template, and the Shannon entropy function is first introduced to impose sparsity on the array. Then, for this nonconvex problem, an iterative method is proposed by leveraging the alternating direction method of multipliers (ADMM) and majorization minimization (MM). Simulation results demonstrate that, compared with the benchmark, our approach achieves a good trade-off between array sparsity and beampattern matching error with less runtime.

Submitted 4 June, 2021; v1 submitted 9 April, 2021; originally announced April 2021.
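The two terms named in the Sparse Array Beampattern Synthesis abstract above, beampattern matching error and an entropy-based sparsity measure, can be sketched as follows. This is a minimal illustration under stated assumptions, not the authors' formulation: it assumes a linear array with element positions in half-wavelength units, a magnitude beampattern |w^H a(θ)|, a squared-error match to a sampled template, and Shannon entropy computed over normalized weight magnitudes. The ADMM/MM solver itself is not reproduced, and all function names are hypothetical.

```python
import cmath
import math

def steering(positions, theta_deg):
    """Steering vector of a linear array; positions assumed in units of lambda/2."""
    s = math.sin(math.radians(theta_deg))
    # Element at position p contributes phase pi * p * sin(theta) (lambda/2 spacing unit).
    return [cmath.exp(1j * math.pi * p * s) for p in positions]

def beampattern(weights, positions, theta_deg):
    """Magnitude response |w^H a(theta)| of the weighted array toward theta."""
    a = steering(positions, theta_deg)
    return abs(sum(w.conjugate() * ai for w, ai in zip(weights, a)))

def matching_error(weights, positions, angles, template):
    """Sum of squared differences between the beampattern and a sampled template."""
    return sum((beampattern(weights, positions, th) - t) ** 2
               for th, t in zip(angles, template))

def entropy_sparsity(weights, eps=1e-12):
    """Shannon entropy of normalized weight magnitudes (assumed form); lower is sparser."""
    mags = [abs(w) for w in weights]
    total = sum(mags)
    probs = [m / total for m in mags if m > eps]
    return -sum(p * math.log(p) for p in probs)
```

For eight uniformly weighted elements, the broadside response is 8 and the entropy is log 8; concentrating all weight on a single element drives the entropy to 0, which is why a low entropy can act as a surrogate for array sparsity.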

arXiv:2104.03303 [pdf, other]

Design of MIMO Radar Waveforms based on lp-Norm Criteria

Authors: Ehsan Raei, Mohammad Alaee-Kerahroodi, Prabhu Babu, M. R. Bhavani Shankar

Abstract: Multiple-input multiple-output (MIMO) radars transmit a set of sequences that exhibit small cross-correlation sidelobes, to enhance sensing performance by separating them at the matched filter outputs. The waveforms also require small auto-correlation sidelobes to avoid masking of weak targets by the range sidelobes of strong targets and to mitigate deleterious effects of distributed clutter. In light of these requirements, in this paper, we design a set of phase-only (constant modulus) sequences that exhibit near-optimal properties in terms of Peak Sidelobe Level (PSL) and Integrated Sidelobe Level (ISL). At the design stage, we adopt the weighted lp-norm of auto- and cross-correlation sidelobes as the objective function and minimize it for a general p value, using block successive upper bound minimization (BSUM). Considering the limitation of radar amplifiers, we design unimodular sequences, which makes the design problem non-convex and NP-hard. To tackle the problem, in every iteration of the BSUM algorithm, we introduce different local approximation functions and optimize them with respect to a block containing a code entry or a code vector. The numerical results show that the optimized set of sequences outperforms state-of-the-art counterparts in terms of both PSL values and computational time.

Submitted 7 April, 2021; originally announced April 2021.
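PSL and ISL for a set of sequences, the two criteria in the MIMO waveform abstract above, can be computed directly from the aperiodic auto- and cross-correlations. The sketch below assumes the common definitions (auto-correlation main lobes at lag 0 excluded, all cross-correlation lags included) and uses uniform weights in place of the paper's weighting; the BSUM optimizer is not shown, and the function names are illustrative.

```python
def xcorr(x, y, k):
    """Aperiodic correlation r_xy(k) = sum_n x[n] * conj(y[n + k]) for integer lag k."""
    n_len = len(x)
    return sum(x[n] * y[n + k].conjugate()
               for n in range(n_len) if 0 <= n + k < n_len)

def sidelobes(seqs):
    """Magnitudes of all auto-correlation sidelobes (k != 0) and all cross-correlation lags."""
    m = len(seqs)
    n_len = len(seqs[0])
    lobes = []
    for i in range(m):
        for j in range(m):
            for k in range(-(n_len - 1), n_len):
                if i == j and k == 0:
                    continue  # skip the auto-correlation main lobe
                lobes.append(abs(xcorr(seqs[i], seqs[j], k)))
    return lobes

def psl(seqs):
    """Peak Sidelobe Level: largest sidelobe magnitude."""
    return max(sidelobes(seqs))

def isl(seqs):
    """Integrated Sidelobe Level: sum of squared sidelobe magnitudes."""
    return sum(s ** 2 for s in sidelobes(seqs))

def lp_objective(seqs, p):
    """Unweighted lp-norm of the sidelobes; p = 2 gives sqrt(ISL), large p tends to PSL."""
    return sum(s ** p for s in sidelobes(seqs)) ** (1.0 / p)
```

For the length-13 Barker code [1, 1, 1, 1, 1, -1, -1, 1, 1, -1, 1, -1, 1], this gives PSL = 1 and a two-sided ISL of 12. Because the lp-norm interpolates between ISL-like (p = 2) and PSL-like (large p) behavior, minimizing it for a general p is one way to trade off the two criteria.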

arXiv:2103.04851 [pdf, other]

doi 10.1109/TSP.2021.3082460

Spatial- and Range- ISLR Trade-off in MIMO Radar via Waveform Correlation Optimization

Authors: Ehsan Raei, Mohammad Alaee-Kerahrood, M. R. Bhavani Shankar

Abstract: This paper aims to design a set of transmit waveforms for cognitive colocated Multi-Input Multi-Output (MIMO) radar systems by simultaneously minimizing the spatial and range Integrated Sidelobe Level Ratios (ISLRs). The design problem is formulated as a bi-objective Pareto optimization under practical constraints on the waveforms, namely total transmit power, peak-to-average-power ratio (PAR), constant modulus, and discrete phase alphabet. A Coordinate Descent (CD) based approach is proposed, in which, at every single-variable update of the algorithm, we obtain the solution of the corresponding uni-variable optimization problem. The novelty of the paper comes from deriving a flexible waveform design problem applicable to 4D imaging MIMO radars, which is optimized directly over the different constraint sets. The simultaneous optimization leads to a trade-off between the two ISLRs, and the simulation results illustrate the significantly improved trade-off offered by the proposed methodologies.

Submitted 8 March, 2021; originally announced March 2021.
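The coordinate descent idea in the ISLR abstract above, where each single-variable update is solved exactly, becomes simple under a discrete phase alphabet: each entry is set to whichever of the L phases minimizes the objective. The sketch below is a deliberate simplification, not the paper's bi-objective method: it handles one sequence, optimizes only the range ISL (no spatial ISLR, power, or PAR terms), and evaluates every candidate phase by brute force. Names and defaults are hypothetical.

```python
import cmath
import math

def range_isl(x):
    """Two-sided aperiodic auto-correlation ISL of one sequence."""
    n_len = len(x)
    total = 0.0
    for k in range(1, n_len):
        r = sum(x[n] * x[n + k].conjugate() for n in range(n_len - k))
        total += 2 * abs(r) ** 2  # |r(-k)| = |r(k)| for an auto-correlation
    return total

def cd_discrete_phase(x, levels, sweeps=10):
    """Coordinate descent: set one entry at a time to its best phase in an L-ary alphabet."""
    alphabet = [cmath.exp(2j * math.pi * m / levels) for m in range(levels)]
    x = list(x)
    best = range_isl(x)
    for _ in range(sweeps):
        improved = False
        for n in range(len(x)):
            # Exact single-variable update: try every alphabet symbol at position n.
            for symbol in alphabet:
                trial = x[:n] + [symbol] + x[n + 1:]
                val = range_isl(trial)
                if val < best - 1e-12:
                    best, x, improved = val, trial, True
        if not improved:
            break  # a full sweep with no improvement: coordinate-wise optimum reached
    return x, best
```

Because an update is accepted only when it lowers the ISL, the objective is non-increasing across sweeps. Starting from, say, the all-ones length-8 sequence (range ISL 280) with a 4-phase alphabet, every accepted entry lies on the unit-modulus alphabet, so the constant-modulus and discrete-phase constraints hold by construction.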

arXiv:2012.14051 [pdf, other]

doi 10.1109/TSP.2021.3122290

On the Performance of One-Bit DoA Estimation via Sparse Linear Arrays

Authors: Saeid Sedighi, M. R. Bhavani Shankar, Mojtaba Soltanalian, Björn Ottersten

Abstract: Direction of Arrival (DoA) estimation using Sparse Linear Arrays (SLAs) has recently gained considerable attention in array processing thanks to their capability to provide enhanced degrees of freedom in resolving uncorrelated source signals. Additionally, deployment of one-bit Analog-to-Digital Converters (ADCs) has emerged as an important topic in array processing, as it offers both a low-cost and a low-complexity implementation. In this paper, we study the problem of DoA estimation from one-bit measurements received by an SLA. Specifically, we first investigate the identifiability conditions for the DoA estimation problem from one-bit SLA data and establish an equivalence with the case when DoAs are estimated from infinite-bit unquantized measurements. Towards determining the performance limits of DoA estimation from one-bit quantized data, we derive a pessimistic approximation of the corresponding Cramér-Rao Bound (CRB). This pessimistic CRB is then used as a benchmark for assessing the performance of one-bit DoA estimators. We also propose a new algorithm for estimating DoAs from one-bit quantized data. We investigate the analytical performance of the proposed method by deriving a closed-form expression for the covariance matrix of the asymptotic distribution of the DoA estimation errors and show that it outperforms the existing algorithms in the literature. Numerical simulations are provided to validate the analytical derivations and corroborate the resulting performance improvement.

Submitted 20 October, 2021; v1 submitted 27 December, 2020; originally announced December 2020.

Comments: 17 pages, 10 figures
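The "enhanced degrees of freedom" of SLAs mentioned in the one-bit DoA abstract above come from the difference coarray: a small number of physical sensors generates many distinct virtual lags. The sketch below assumes a two-level nested array as the example geometry (the abstract does not name one) and omits the one-bit quantization entirely; function names are illustrative.

```python
def difference_coarray(positions):
    """Sorted set of all pairwise differences p_i - p_j (in units of the base spacing d)."""
    return sorted({p - q for p in positions for q in positions})

def nested_array(n1, n2):
    """Two-level nested array: inner ULA at 1..n1, outer ULA at multiples of (n1 + 1)."""
    inner = list(range(1, n1 + 1))
    outer = [(n1 + 1) * m for m in range(1, n2 + 1)]
    return inner + outer

def contiguous_lags(coarray):
    """Largest L such that every lag in -L..L is present (hole-free central segment)."""
    lags = set(coarray)
    L = 0
    while (L + 1) in lags and -(L + 1) in lags:
        L += 1
    return L
```

With n1 = n2 = 3, six sensors at positions 1, 2, 3, 4, 8, 12 produce hole-free lags from -11 to 11 (23 virtual lags), whereas a six-element uniform array yields only -5 to 5.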

arXiv:2012.09272 [pdf, other]

Data optimization for large batch distributed training of deep neural networks

Authors: Shubhankar Gahlot, Junqi Yin, Mallikarjun Shankar

Abstract: Distributed training in deep learning (DL) is common practice as data and models grow. The current practice for distributed training of deep neural networks faces the challenges of communication bottlenecks when operating at scale, and model accuracy deterioration with an increase in global batch size. Present solutions focus on improving message exchange efficiency as well as implementing techniques to tweak batch sizes and models in the training process. The loss of training accuracy typically happens because the loss function gets trapped in a local minimum. We observe that the loss landscape minimization is shaped by both the model and training data and propose a data optimization approach that utilizes machine learning to implicitly smooth out the loss landscape, resulting in fewer local minima. Our approach filters out data points which are less important to feature learning, enabling us to speed up the training of models on larger batch sizes with improved accuracy.

Submitted 18 December, 2020; v1 submitted 16 December, 2020; originally announced December 2020.

Comments: Computational Science & Computational Intelligence (CSCI'20), 7 pages

arXiv:2007.15108 [pdf, other]

doi 10.1109/TSP.2021.3072834

Localization with One-Bit Passive Radars in Narrowband Internet-of-Things using Multivariate Polynomial Optimization

Authors: Saeid Sedighi, Kumar Vijay Mishra, M. R. Bhavani Shankar, Björn Ottersten

Abstract: Several Internet-of-Things (IoT) applications provide location-based services, wherein it is critical to obtain accurate position estimates by aggregating information from individual sensors. In the recently proposed narrowband IoT (NB-IoT) standard, which trades off bandwidth to gain wide coverage, the location estimation problem is compounded by low-sampling-rate receivers and limited-capacity links. We address both of these NB-IoT drawbacks in the framework of passive sensing devices that receive signals from the target of interest. We consider the limiting case where each node receiver employs one-bit analog-to-digital converters and propose a novel low-complexity nodal delay estimation method using constrained weighted least squares minimization. To support the low-capacity links to the fusion center (FC), the range estimates obtained at individual sensors are then converted to one-bit data. At the FC, we propose target localization with the aggregated one-bit range vector using both optimal and sub-optimal techniques. The computationally expensive former approach is based on Lasserre's method for multivariate polynomial optimization, while the latter employs our less complex iterative joint range-target location estimation (ANTARES) algorithm. Our overall one-bit framework not only complements the low NB-IoT bandwidth but also supports the design goal of inexpensive NB-IoT location sensing.
Numerical experiments demonstrate the feasibility of the proposed one-bit approach, with a 0.6% increase in the normalized localization error for small sets of 20-60 nodes compared with the full-precision case. When the number of nodes is sufficiently large (>80), the one-bit methods yield the same performance as full precision.

Submitted 9 April, 2021; v1 submitted 29 July, 2020; originally announced July 2020.

Comments: 16 pages, 11 figures

arXiv:2004.03710 [pdf, other]

DataFed: Towards Reproducible Research via Federated Data Management

Authors: Dale Stansberry, Suhas Somnath, Jessica Breet, Gregory Shutt, Mallikarjun Shankar

Abstract: The increasingly collaborative, globalized nature of scientific research combined with the need to share data and the explosion in data volumes present an urgent need for a scientific data management system (SDMS). An SDMS presents a logical and holistic view of data that greatly simplifies and empowers data organization, curation, searching, sharing, dissemination, etc. We present DataFed, a lightweight, distributed SDMS that spans a federation of storage systems within a loosely coupled network of scientific facilities. Unlike existing SDMS offerings, DataFed uses high-performance and scalable user management and data transfer technologies that simplify deployment, maintenance, and expansion of DataFed. DataFed provides web-based and command-line interfaces to manage data and integrate with complex scientific workflows. DataFed represents a step towards reproducible scientific research by enabling reliable staging of the correct data at the desired environment.

Submitted 7 April, 2020; originally announced April 2020.

Comments: Part of conference proceedings at the 6th Annual Conference on Computational Science & Computational Intelligence held at Las Vegas, NV, USA on Dec 05-07 2019

arXiv:1912.10036 [pdf, other]

A Family of Deep Learning Architectures for Channel Estimation and Hybrid Beamforming in Multi-Carrier mm-Wave Massive MIMO

Authors: Ahmet M. Elbir, Kumar Vijay Mishra, M. R. Bhavani Shankar, Björn Ottersten

Abstract: Hybrid analog and digital beamforming transceivers are instrumental in addressing the challenge of expensive hardware and high training overheads in next-generation millimeter-wave (mm-Wave) massive MIMO (multiple-input multiple-output) systems. However, the lack of fully digital beamforming in hybrid architectures and short coherence times at mm-Wave impose additional constraints on channel estimation. Prior works addressing these challenges have focused largely on narrowband channels, wherein optimization-based or greedy algorithms were employed to derive hybrid beamformers. In this paper, we introduce a deep learning (DL) approach for channel estimation and hybrid beamforming in frequency-selective, wideband mm-Wave systems. In particular, we consider a massive MIMO Orthogonal Frequency Division Multiplexing (MIMO-OFDM) system and propose three different DL frameworks comprising convolutional neural networks (CNNs), which accept the raw received-signal data as input and yield channel estimates and the hybrid beamformers at the output. We also introduce both offline and online prediction schemes. Numerical experiments demonstrate that, compared to the current state-of-the-art optimization and DL methods, our approach provides higher spectral efficiency, lower computational cost, fewer pilot signals, and higher tolerance against deviations in the received pilot data, corrupted channel matrix, and propagation environment.

Submitted 3 January, 2022; v1 submitted 20 December, 2019; originally announced December 2019.

Comments: Accepted Paper in IEEE Transactions on Cognitive Communications and Networking. arXiv admin note: text overlap with arXiv:1910.14240

arXiv:1909.00798 [pdf, other]

Dynamic Approach for Lane Detection using Google Street View and CNN

Authors: Rama Sai Mamidala, Uday Uthkota, Mahamkali Bhavani Shankar, A. Joseph Antony, A. V. Narasimhadhan

Abstract: Lane detection algorithms have been key enablers for fully assistive and autonomous navigation systems. In this paper, a novel and pragmatic approach for lane detection is proposed using a convolutional neural network (CNN) model based on the SegNet encoder-decoder architecture. The encoder block renders low-resolution feature maps of the input, and the decoder block provides pixel-wise classification from the feature maps. The proposed model has been trained on a 2000-image dataset and tested against the corresponding ground truth provided in the dataset for evaluation. To enable real-time navigation, we extend our model's predictions by interfacing the model with existing Google APIs, and we evaluate the model's metrics while tuning the hyperparameters. The novelty of this approach lies in the integration of the existing SegNet architecture with Google APIs. This interface makes it handy for assistive robotic systems. The observed results show that the proposed method is robust under challenging occlusion conditions due to the pre-processing involved and gives superior performance compared to existing methods.

Submitted 2 September, 2019; originally announced September 2019.

Comments: Preprint: To be published in the proceedings of IEEE TENCON 2019

arXiv:1903.09515 [pdf]

USID and Pycroscopy -- Open frameworks for storing and analyzing spectroscopic and imaging data

Authors: Suhas Somnath, Chris R. Smith, Nouamane Laanait, Rama K. Vasudevan, Anton Ievlev, Alex Belianinov, Andrew R. Lupini, Mallikarjun Shankar, Sergei V. Kalinin, Stephen Jesse

Abstract: Materials science is undergoing profound changes due to advances in characterization instrumentation that have resulted in an explosion of data in terms of volume, velocity, variety and complexity. Harnessing these data for scientific research requires an evolution of the associated computing and data infrastructure, bridging scientific instrumentation with supercomputing and cloud computing. Here, we describe Universal Spectroscopy and Imaging Data (USID), a data model capable of representing data from most common instruments, modalities, dimensionalities, and sizes. We pair this schema with the hierarchical data file format (HDF5) to maximize compatibility, exchangeability, traceability, and reproducibility. We discuss a family of community-driven, open-source, and free Python software packages for storing, processing and visualizing data. The first is pyUSID, which provides the tools to read and write USID HDF5 files in addition to a scalable framework for parallelizing data analysis. The second is Pycroscopy, which provides algorithms for scientific analysis of nanoscale imaging and spectroscopy modalities and is built on top of pyUSID and USID. The instrument-agnostic nature of USID facilitates the development of analysis code independent of instrumentation and task in Pycroscopy, which in turn can bring scientific communities together and break down barriers in the age of open science.
The interested reader is encouraged to be a part of this ongoing community-driven effort to collectively accelerate materials research and discovery through the realms of big data.

Submitted 27 March, 2019; v1 submitted 22 March, 2019; originally announced March 2019.

arXiv:1811.02287 [pdf, other]

Defining Big Data Analytics Benchmarks for Next Generation Supercomputers

Authors: Drew Schmidt, Junqi Yin, Michael Matheson, Bronson Messer, Mallikarjun Shankar

Abstract: The design and construction of high-performance computing (HPC) systems relies on exhaustive performance analysis and benchmarking. Traditionally this activity has been geared exclusively towards simulation scientists, who, unsurprisingly, have been the primary customers of HPC for decades. However, there is a large and growing volume of data science work that requires these large-scale resources, and as such the calls for inclusion and investments in data for HPC have been increasing. Therefore, when designing a next-generation HPC platform, it is necessary to have HPC-amenable big data analytics benchmarks. In this paper, we propose a set of big data analytics benchmarks and sample codes designed for testing the capabilities of current and next-generation supercomputers.

Submitted 6 November, 2018; originally announced November 2018.

Comments: 5 figures

arXiv:1802.03958 [pdf, other]

doi 10.1109/MSP.2019.2894391

Signal Processing for High Throughput Satellite Systems: Challenges in New Interference-Limited Scenarios

Authors: Ana I. Perez-Neira, Miguel Angel Vazquez, Sina Maleki, M. R. Bhavani Shankar, Symeon Chatzinotas

Abstract: The field of satellite communications is enjoying a renewed interest in the global telecom market, and very high throughput satellites (V/HTS), with their multiple spot-beams, are key for delivering the future rate demands. In this article, the state-of-the-art and open research challenges of signal processing techniques for V/HTS systems are presented for the first time, with a focus on novel approaches for efficient interference mitigation. The main signal processing topics for the ground, satellite, and user segments are addressed. Also, the critical components for the integration of satellite and terrestrial networks are studied, such as cognitive satellite systems and satellite-terrestrial backhaul for caching. All the reviewed techniques are essential in empowering satellite systems to support the increasing demands of the upcoming generation of communication networks.

Submitted 12 February, 2018; originally announced February 2018.

Showing 1–29 of 29 results for author: Shankar, M