Showing 1–19 of 19 results for author: Shafi, A

  1. arXiv:2401.14489  [pdf, other]

    cs.DC cs.AI

    The Case for Co-Designing Model Architectures with Hardware

    Authors: Quentin Anthony, Jacob Hatef, Deepak Narayanan, Stella Biderman, Stas Bekman, Junqi Yin, Aamir Shafi, Hari Subramoni, Dhabaleswar Panda

    Abstract: While GPUs are responsible for training the vast majority of state-of-the-art deep learning models, the implications of their architecture are often overlooked when designing new deep learning (DL) models. As a consequence, modifying a DL model to be more amenable to the target hardware can significantly improve the runtime performance of DL training and inference. In this paper, we provide a set…

    Submitted 30 January, 2024; v1 submitted 25 January, 2024; originally announced January 2024.

  2. arXiv:2401.08383  [pdf, other]

    cs.LG cs.AI cs.DC

    Exploiting Inter-Layer Expert Affinity for Accelerating Mixture-of-Experts Model Inference

    Authors: Jinghan Yao, Quentin Anthony, Aamir Shafi, Hari Subramoni, Dhabaleswar K. Panda

    Abstract: In large language models like the Generative Pre-trained Transformer, the Mixture of Experts paradigm has emerged as a powerful technique for enhancing model expressiveness and accuracy. However, deploying GPT MoE models for parallel inference on distributed systems presents significant challenges, primarily due to the extensive Alltoall communication required for expert routing and aggregation. T…

    Submitted 16 January, 2024; v1 submitted 16 January, 2024; originally announced January 2024.

  3. arXiv:2401.01196  [pdf]

    physics.optics

    Broadband miniaturized spectrometers with a van der Waals tunnel diode

    Authors: MD Gius Uddin, Susobhan Das, Abde Mayeen Shafi, Lei Wang, Xiaoqi Cui, Fedor Nigmatulin, Faisal Ahmed, Andreas C. Liapis, Weiwei Cai, Zongyin Yang, Harri Lipsanen, Tawfique Hasan, Hoon Hahn Yoon, Zhipei Sun

    Abstract: Miniaturized spectrometers are of immense interest for various on-chip and implantable photonic and optoelectronic applications. State-of-the-art conventional spectrometer designs rely heavily on bulky dispersive components (such as gratings, photodetector arrays, and interferometric optics) to capture different input spectral components that increase their integration complexity. Here, we report…

    Submitted 2 January, 2024; originally announced January 2024.

  4. arXiv:2305.13484  [pdf, other]

    cs.DC cs.AI cs.CL cs.CV cs.LG

    Flover: A Temporal Fusion Framework for Efficient Autoregressive Model Parallel Inference

    Authors: Jinghan Yao, Nawras Alnaasan, Tian Chen, Aamir Shafi, Hari Subramoni, Dhabaleswar K. Panda

    Abstract: Autoregressive models, despite their commendable performance in a myriad of generative tasks, face challenges stemming from their inherently sequential structure. Inference on these models, by design, harnesses a temporal dependency, where the current token's probability distribution is conditioned on preceding tokens. This inherent characteristic severely impedes computational efficiency during i…

    Submitted 2 November, 2023; v1 submitted 22 May, 2023; originally announced May 2023.

    Comments: In Proceedings of the 30th IEEE International Conference on High Performance Computing, Data, and Analytics (HiPC)

  5. arXiv:2303.08374  [pdf, other]

    cs.DC cs.LG

    MCR-DL: Mix-and-Match Communication Runtime for Deep Learning

    Authors: Quentin Anthony, Ammar Ahmad Awan, Jeff Rasley, Yuxiong He, Aamir Shafi, Mustafa Abduljabbar, Hari Subramoni, Dhabaleswar Panda

    Abstract: In recent years, the training requirements of many state-of-the-art Deep Learning (DL) models have scaled beyond the compute and memory capabilities of a single processor, and necessitated distribution among processors. Training such massive models necessitates advanced parallelism strategies to maintain efficiency. However, such distributed DL parallelism strategies require a varied mixture of co…

    Submitted 15 March, 2023; originally announced March 2023.

    Comments: Accepted, to be presented at IPDPS 2023

  6. arXiv:2303.05016  [pdf, other]

    cs.PF eess.SP

    Performance Characterization of using Quantization for DNN Inference on Edge Devices: Extended Version

    Authors: Hyunho Ahn, Tian Chen, Nawras Alnaasan, Aamir Shafi, Mustafa Abduljabbar, Hari Subramoni, Dhabaleswar K. Panda

    Abstract: Quantization is a popular technique used in Deep Neural Networks (DNN) inference to reduce the size of models and improve the overall numerical performance by exploiting native hardware. This paper attempts to conduct an elaborate performance characterization of the benefits of using quantization techniques -- mainly FP16/INT8 variants with static and dynamic schemes -- using the MLPerf Edge Infer…

    Submitted 8 March, 2023; originally announced March 2023.

    Comments: Extended version of accepted short paper by ICFEC 2023

  7. arXiv:2110.10659  [pdf, other]

    cs.DC cs.AI cs.LG

    OMB-Py: Python Micro-Benchmarks for Evaluating Performance of MPI Libraries on HPC Systems

    Authors: Nawras Alnaasan, Arpan Jain, Aamir Shafi, Hari Subramoni, Dhabaleswar K Panda

    Abstract: Python has become a dominant programming language for emerging areas like Machine Learning (ML), Deep Learning (DL), and Data Science (DS). An attractive feature of Python is that it provides an easy-to-use programming interface while allowing library developers to enhance performance of their applications by harnessing the computing power offered by High Performance Computing (HPC) platforms. Effici…

    Submitted 24 August, 2022; v1 submitted 20 October, 2021; originally announced October 2021.

  8. arXiv:2101.08878  [pdf, other]

    cs.DC cs.LG cs.PF

    Efficient MPI-based Communication for GPU-Accelerated Dask Applications

    Authors: Aamir Shafi, Jahanzeb Maqbool Hashmi, Hari Subramoni, Dhabaleswar K. Panda

    Abstract: Dask is a popular parallel and distributed computing framework, which rivals Apache Spark to enable task-based scalable processing of big data. The Dask Distributed library forms the basis of this computing engine and provides support for adding new communication devices. It currently has two communication devices: one for TCP and the other for high-speed networks using UCX-Py -- a Cython wrapper…

    Submitted 21 January, 2021; originally announced January 2021.

    Comments: 10 pages, 9 figures, 1 table

    ACM Class: C.4; D.1.3

  9. arXiv:1811.02402  [pdf]

    eess.SP

    Realisation of Highly Precise and Low Power Tunable Voltage Amplifier Based on the Translinear Circuit Scheme of CCCII+

    Authors: Umar Mohammad, Mir Aamir Shafi

    Abstract: In the past few years, advancements in the field of nano circuit design have become tougher than the demand. Low power devices have emerged tremendously. Both voltage mode as well as current mode devices have proven alternative to each other for satisfying the demand of the growing market. As such, current conveyors have equitably established their uniqueness as an important circuit design element. T…

    Submitted 6 November, 2018; originally announced November 2018.

    Comments: 7 pages. Under revision in Indonesian journal

  10. arXiv:1410.0373  [pdf, other]

    cs.CY cs.DC

    Teaching Parallel Programming Using Java

    Authors: Aamir Shafi, Aleem Akhtar, Ansar Javed, Bryan Carpenter

    Abstract: This paper presents an overview of the "Applied Parallel Computing" course taught to final year Software Engineering undergraduate students in Spring 2014 at NUST, Pakistan. The main objective of the course was to introduce practical parallel programming tools and techniques for shared and distributed memory concurrent systems. A unique aspect of the course was that Java was used as the principal…

    Submitted 27 August, 2014; originally announced October 2014.

    Comments: 8 Pages, 6 figures, MPJ Express, MPI Java, Teaching Parallel Programming

    ACM Class: K.3.2

  11. arXiv:1408.6347  [pdf, other]

    cs.DC cs.SE

    Design and Implementation of Parallel Debugger and Profiler for MPJ Express

    Authors: Aleem Akhtar, Aamir Shafi, Mohsan Jameel

    Abstract: MPJ Express is a messaging system that allows computational scientists to write and execute parallel Java applications on High Performance Computing (HPC) hardware. Despite its successful adoption in the Java HPC community, the MPJ Express software currently does not provide any support for debugging and profiling parallel applications and hence forces its users to rely on manual and tedious debug…

    Submitted 27 August, 2014; originally announced August 2014.

    Comments: 6 pages, 7 figures

  12. arXiv:1310.5848  [pdf]

    cs.NI

    Evaluation and Performance of Reactive Protocols Using Mobility Model

    Authors: Naveed Anjum Imran Shafi, Sohail Abidi

    Abstract: A Mobile Ad-hoc Network (MANET) is a self-motivated wireless network which has no centralized point. It is an independent network connected by wireless links, in which every point or device works as a router. In this network every node forwards packets to the destination as a router and does not operate as an end point. In this network every node adjusts them self by on his way in…

    Submitted 22 October, 2013; originally announced October 2013.

    Comments: 10 pages, 14 figures, http://www.IJCSI.org

    Journal ref: IJCSI International Journal of Computer Science Issues, Vol. 10, Issue 3, No 1, May 2013

  13. Measurement of $K^- p$ radiative capture to $γΛ$ and $γΣ^0$ for $p_{K^-}$ between 514 and 750 MeV/$c$

    Authors: S. Prakhov, P. Vancraeyveld, N. Phaisangittisakul, B. M. K. Nefkens, V. Bekrenev, W. J. Briscoe, L. De Cruz, D. Isenhower, N. Knecht, A. Koulbardis, N. Kozlenko, S. Kruglov, G. Lolos, I. Lopatin, A. Marušić, S. McDonald, Z. Papandreou, D. Peaslee, J. W. Price, J. Ryckebusch, M. Sadler, A. Shafi, A. Starostin, H. M. Staudenmaier, I. I. Strakovsky, et al. (2 additional authors not shown)

    Abstract: Differential cross sections for $K^-$ radiative capture in flight on the proton, leading to the $γΛ$ and $γΣ^0$ final states, have been measured at eight $K^-$ momenta between 514 and 750 MeV/$c$. The data were obtained with the Crystal Ball multiphoton spectrometer installed at the separated $K/π$ beam line C6 of the BNL Alternating Gradient Synchrotron. The results substantially improve the exis…

    Submitted 8 June, 2010; v1 submitted 8 December, 2009; originally announced December 2009.

    Journal ref: Phys.Rev.C82:015201,2010

  14. Differential cross sections of the charge-exchange reaction pi- p --> pi0 n in the momentum range from 103 to 178 MeV/c

    Authors: D. Mekterović, I. Supek, V. Abaev, V. Bekrenev, C. Bircher, W. J. Briscoe, R. V. Cadman, M. Clajus, J. R. Comfort, K. Craig, D. Grosnick, D. Isenhover, M. Jerkins, M. Joy, N. Knecht, D. D. Koetke, N. Kozlenko, A. Kulbardis, S. Kruglov, G. Lolos, I. Lopatin, D. M. Manley, R. Manweiler, A. Marušić, S. McDonald, et al. (18 additional authors not shown)

    Abstract: Measured values of the differential cross sections for pion-nucleon charge exchange, pi- p --> pi0 n, are presented for pi- momenta of 103, 112, 120, 130, 139, 152, and 178 MeV/c. Complete angular distributions were obtained by using the Crystal Ball detector at the Alternating Gradient Synchrotron at Brookhaven National Laboratory. Statistical uncertainties of the differential cross sections va…

    Submitted 26 August, 2009; originally announced August 2009.

    Comments: 18 pages, 12 figures, submitted to Phys. Rev. C

    Journal ref: Phys.Rev.C80:055207,2009

  15. Measurement of $π^0 Λ$, $\bar{K}^0 n$, and $π^0 Σ^0$ production in $K^- p$ interactions for $p_{K^-}$ between 514 and 750 MeV/$c$

    Authors: S. Prakhov, B. M. K. Nefkens, V. Bekrenev, W. J. Briscoe, N. Knecht, A. Koulbardis, N. Kozlenko, S. Kruglov, G. Lolos, I. Lopatin, A. Marušić, S. McDonald, D. Peaslee, N. Phaisangittisakul, J. W. Price, A. Shafi, A. Starostin, H. M. Staudenmaier, I. I. Strakovsky, I. Supek

    Abstract: Differential cross sections and hyperon polarizations have been measured for $\bar{K}^0 n$, $π^0 Λ$, and $π^0 Σ^0$ production in $K^- p$ interactions at eight $K^-$ momenta between 514 and 750 MeV/$c$. The experiment detected the multiphoton final states with the Crystal Ball spectrometer using a $K^-$ beam from the Alternating Gradient Synchrotron of BNL. The results provide significantly great…

    Submitted 26 June, 2009; v1 submitted 10 December, 2008; originally announced December 2008.

    Journal ref: Phys.Rev.C80:025204,2009

  16. Measurement of Inverse Pion Photoproduction at Energies Spanning the N(1440) Resonance

    Authors: A. Shafi, S. Prakhov, I. I. Strakovsky, W. J. Briscoe, B. M. K. Nefkens, C. E. Allgower, R. A. Arndt, V. Bekrenev, C. Bennhold, M. Clajus, J. R. Comfort, K. Craig, D. Grosnick, D. Isenhower, N. Knecht, D. D. Koetke, A. Kulbardis, N. Kozlenko, S. Kruglov, G. Lolos, I. Lopatin, D. M. Manley, R. Manweiler, A. Marusic, S. McDonald, et al. (14 additional authors not shown)

    Abstract: Differential cross sections for the process pi^- p -> gamma n have been measured at Brookhaven National Laboratory's Alternating Gradient Synchrotron with the Crystal Ball multiphoton spectrometer. Measurements were made at 18 pion momenta from 238 to 748 MeV/c, corresponding to E_gamma for the inverse reaction from 285 to 769 MeV. The data have been used to evaluate the gamma n multipoles in th…

    Submitted 25 May, 2004; originally announced May 2004.

    Comments: 14 pages, 8 figures, submitted to PRC

    Journal ref: Phys.Rev.C70:035204,2004

  17. Does the Sigma(1580)3/2- resonance exist?

    Authors: Crystal Ball Collaboration, J. Olmsted, S. Prakhov, D. M. Manley, C. E. Allgower, V. S. Bekrenev, W. J. Briscoe, M. Clajus, J. R. Comfort, K. Craig, D. Grosnick, D. Isenhower, N. Knecht, D. D. Koetke, N. G. Kozlenko, S. Kruglov, A. A. Kulbardis, G. Lolos, I. V. Lopatin, R. Manweiler, A. Marusic, S. McDonald, B. M. K. Nefkens, Z. Papandreou, D. C. Peaslee, et al. (12 additional authors not shown)

    Abstract: Precise new data for the reaction $K^- p \to π^0 Λ$ are presented in the c.m. energy range 1565 to 1600 MeV. Our analysis of these data sheds new light on claims for the $Σ(1580){3/2}^-$ resonance, which (if it exists with the specified quantum numbers) must be an exotic baryon because of its very low mass. Our results show no evidence for this state.

    Submitted 23 February, 2004; v1 submitted 6 August, 2003; originally announced August 2003.

    Comments: 4 pages, 4 figures

    Journal ref: Phys.Lett.B588:29-34,2004

  18. arXiv:cs/0305062  [pdf]

    cs.DC

    DIAMOnDS - DIstributed Agents for MObile & Dynamic Services

    Authors: Aamir Shafi, Umer Farooq, Saad Kiani, Maria Riaz, Anjum Shehzad, Arshad Ali, Iosif Legrand, Harvey Newman

    Abstract: Distributed Services Architectures with support for mobile agents between services offer significantly improved communication and computational flexibility. The use of agents allows execution of complex operations that involve large amounts of data to be processed effectively using distributed resources. The prototype system Distributed Agents for Mobile and Dynamic Services (DIAMOnDS) allows a…

    Submitted 13 June, 2003; v1 submitted 30 May, 2003; originally announced May 2003.

    Comments: 7 pages, 4 figures, CHEP03, La Jolla, California, March 24-28, 2003

    ACM Class: C.2.4

    Journal ref: ECONFC0303241:THAT003,2003

  19. arXiv:nucl-ex/0302017  [pdf, ps, other]

    nucl-ex

    Nucleon and Hyperon Resonances with the Crystal Ball

    Authors: Crystal Ball Collaboration, W. J. Briscoe, A. Shafi, I. I. Strakovsky

    Abstract: The Crystal Ball Spectrometer is being used at Brookhaven National Laboratory in a series of experiments which study all neutral final states of pi-p and K-p induced reactions. We report about the experimental setup and progress in obtaining new results for the radiative capture reactions pi-p-->gn and K-p-->gL, charge exchange pi-p-->pi0n, two pi0 production pi-p-->pi0pi0n, and eta production pi-…

    Submitted 14 February, 2003; originally announced February 2003.

    Comments: 10 pages, 4 figures. Talk given at N*2002 conference