Search | arXiv e-print repository

The Case for Co-Designing Model Architectures with Hardware

Authors: Quentin Anthony, Jacob Hatef, Deepak Narayanan, Stella Biderman, Stas Bekman, Junqi Yin, Aamir Shafi, Hari Subramoni, Dhabaleswar Panda

Abstract: While GPUs are responsible for training the vast majority of state-of-the-art deep learning models, the implications of their architecture are often overlooked when designing new deep learning (DL) models. As a consequence, modifying a DL model to be more amenable to the target hardware can significantly improve the runtime performance of DL training and inference. In this paper, we provide a set of guidelines for users to maximize the runtime performance of their transformer models. These guidelines have been created by carefully considering the impact of various model hyperparameters controlling model shape on the efficiency of the underlying computation kernels executed on the GPU. We find the throughput of models with efficient model shapes is up to 39% higher while preserving accuracy compared to models with a similar number of parameters but with unoptimized shapes.

Submitted 30 January, 2024; v1 submitted 25 January, 2024; originally announced January 2024.

arXiv:2401.08383 [pdf, other]

Exploiting Inter-Layer Expert Affinity for Accelerating Mixture-of-Experts Model Inference

Authors: Jinghan Yao, Quentin Anthony, Aamir Shafi, Hari Subramoni, Dhabaleswar K. Panda

Abstract: In large language models like the Generative Pre-trained Transformer, the Mixture of Experts paradigm has emerged as a powerful technique for enhancing model expressiveness and accuracy. However, deploying GPT MoE models for parallel inference on distributed systems presents significant challenges, primarily due to the extensive Alltoall communication required for expert routing and aggregation. This communication bottleneck exacerbates the already complex computational landscape, hindering the efficient utilization of high-performance computing resources. In this paper, we propose a lightweight optimization technique called ExFlow, to largely accelerate the inference of these MoE models. We take a new perspective on alleviating the communication overhead by exploiting the inter-layer expert affinity. Unlike previous methods, our solution can be directly applied to pre-trained MoE models without any fine-tuning or accuracy degradation. By proposing a context-coherent expert parallelism on distributed systems, our design only uses one Alltoall communication to deliver the same functionality while previous methods all require two Alltoalls. By carefully examining the conditional probability in tokens' routing across multiple layers, we proved that pre-trained GPT MoE models implicitly exhibit a strong inter-layer expert affinity. We then design an efficient integer programming model to capture such features and show that by properly placing the experts on corresponding GPUs, we can reduce up to 67% cross-GPU routing latency. Our solution beats the cutting-edge MoE implementations with experts from 8 to 64, with up to 2.2x improvement in inference throughput. We further provide a detailed study of how the model implicitly acquires this expert affinity at the very early training stage and how this affinity evolves and stabilizes during training.

Submitted 16 January, 2024; v1 submitted 16 January, 2024; originally announced January 2024.

arXiv:2401.01196 [pdf]

Broadband miniaturized spectrometers with a van der Waals tunnel diode

Authors: MD Gius Uddin, Susobhan Das, Abde Mayeen Shafi, Lei Wang, Xiaoqi Cui, Fedor Nigmatulin, Faisal Ahmed, Andreas C. Liapis, Weiwei Cai, Zongyin Yang, Harri Lipsanen, Tawfique Hasan, Hoon Hahn Yoon, Zhipei Sun

Abstract: Miniaturized spectrometers are of immense interest for various on-chip and implantable photonic and optoelectronic applications. State-of-the-art conventional spectrometer designs rely heavily on bulky dispersive components (such as gratings, photodetector arrays, and interferometric optics) to capture different input spectral components that increase their integration complexity. Here, we report a high-performance broadband spectrometer based on a simple and compact van der Waals heterostructure diode, leveraging a careful selection of active van der Waals materials -- molybdenum disulfide and black phosphorus, their electrically tunable photoresponse, and advanced computational algorithms for spectral reconstruction. We achieve remarkably high peak wavelength accuracy of ~2 nanometers, and broad operation bandwidth spanning from ~500 to 1600 nanometers in a device with a ~30x20 μm² footprint. This diode-based spectrometer scheme with broadband operation offers an attractive pathway for various applications, such as sensing, surveillance and spectral imaging.

Submitted 2 January, 2024; originally announced January 2024.

arXiv:2305.13484 [pdf, other]

Flover: A Temporal Fusion Framework for Efficient Autoregressive Model Parallel Inference

Authors: Jinghan Yao, Nawras Alnaasan, Tian Chen, Aamir Shafi, Hari Subramoni, Dhabaleswar K. Panda

Abstract: Autoregressive models, despite their commendable performance in a myriad of generative tasks, face challenges stemming from their inherently sequential structure. Inference on these models, by design, harnesses a temporal dependency, where the current token's probability distribution is conditioned on preceding tokens. This inherent characteristic severely impedes computational efficiency during inference as a typical inference request can require more than thousands of tokens, where generating each token requires a load of entire model weights, making the inference more memory-bound. The large overhead becomes profound in real deployment where requests arrive randomly, necessitating various generation lengths. Existing solutions, such as dynamic batching and concurrent instances, introduce significant response delays and bandwidth contention, falling short of achieving optimal latency and throughput. To address these shortcomings, we propose Flover -- a temporal fusion framework for efficiently inferring multiple requests in parallel. We deconstruct the general generation pipeline into pre-processing and token generation, and equip the framework with a dedicated work scheduler for fusing the generation process temporally across all requests. By orchestrating the token-level parallelism, Flover exhibits optimal hardware efficiency and significantly spares the system resources. By further employing a fast buffer reordering algorithm that allows memory eviction of finished tasks, it brings over 11x inference speedup on GPT and 16x on LLAMA compared to the cutting-edge solutions provided by NVIDIA FasterTransformer. Crucially, by leveraging the advanced tensor parallel technique, Flover proves efficacious across diverse computational landscapes, from single-GPU setups to distributed scenarios, thereby offering robust performance optimization that adapts to variable use cases.

Submitted 2 November, 2023; v1 submitted 22 May, 2023; originally announced May 2023.

Comments: In Proceedings of the 30th IEEE International Conference on High Performance Computing, Data, and Analytics (HiPC)

arXiv:2303.08374 [pdf, other]

MCR-DL: Mix-and-Match Communication Runtime for Deep Learning

Authors: Quentin Anthony, Ammar Ahmad Awan, Jeff Rasley, Yuxiong He, Aamir Shafi, Mustafa Abduljabbar, Hari Subramoni, Dhabaleswar Panda

Abstract: In recent years, the training requirements of many state-of-the-art Deep Learning (DL) models have scaled beyond the compute and memory capabilities of a single processor, and necessitated distribution among processors. Training such massive models necessitates advanced parallelism strategies to maintain efficiency. However, such distributed DL parallelism strategies require a varied mixture of collective and point-to-point communication operations across a broad range of message sizes and scales. Examples of models using advanced parallelism strategies include Deep Learning Recommendation Models (DLRM) and Mixture-of-Experts (MoE). Communication libraries' performance varies wildly across different communication operations, scales, and message sizes. We propose MCR-DL: an extensible DL communication framework that supports all point-to-point and collective operations while enabling users to dynamically mix-and-match communication backends for a given operation without deadlocks. MCR-DL also comes packaged with a tuning suite for dynamically selecting the best communication backend for a given input tensor. We select DeepSpeed-MoE and DLRM as candidate DL models and demonstrate a 31% improvement in DS-MoE throughput on 256 V100 GPUs on the Lassen HPC system. Further, we achieve a 20% throughput improvement in a dense Megatron-DeepSpeed model and a 25% throughput improvement in DLRM on 32 A100 GPUs with the Theta-GPU HPC system.

Submitted 15 March, 2023; originally announced March 2023.

Comments: Accepted, to be presented at IPDPS 2023

arXiv:2303.05016 [pdf, other]

Performance Characterization of using Quantization for DNN Inference on Edge Devices: Extended Version

Authors: Hyunho Ahn, Tian Chen, Nawras Alnaasan, Aamir Shafi, Mustafa Abduljabbar, Hari Subramoni, Dhabaleswar K. Panda

Abstract: Quantization is a popular technique used in Deep Neural Networks (DNN) inference to reduce the size of models and improve the overall numerical performance by exploiting native hardware. This paper attempts to conduct an elaborate performance characterization of the benefits of using quantization techniques -- mainly FP16/INT8 variants with static and dynamic schemes -- using the MLPerf Edge Inference benchmarking methodology. The study is conducted on Intel x86 processors and Raspberry Pi device with ARM processor. The paper uses a number of DNN inference frameworks, including OpenVINO (for Intel CPUs only), TensorFlow Lite (TFLite), ONNX, and PyTorch with MobileNetV2, VGG-19, and DenseNet-121. The single-stream, multi-stream, and offline scenarios of the MLPerf Edge Inference benchmarks are used for measuring latency and throughput in our experiments. Our evaluation reveals that OpenVINO and TFLite are the most optimized frameworks for Intel CPUs and Raspberry Pi device, respectively. We observe no loss in accuracy except for the static quantization techniques. We also observed the benefits of using quantization for these optimized frameworks. For example, INT8-based quantized models deliver $3.3\times$ and $4\times$ better performance over FP32 using OpenVINO on Intel CPU and TFLite on Raspberry Pi device, respectively, for the MLPerf offline scenario. To the best of our knowledge, this paper is the first one that presents a unique characterization study characterizing the impact of quantization for a range of DNN inference frameworks -- including OpenVINO, TFLite, PyTorch, and ONNX -- on Intel x86 processors and Raspberry Pi device with ARM processor using the MLPerf Edge Inference benchmark methodology.

Submitted 8 March, 2023; originally announced March 2023.

Comments: Extended version of accepted short paper by ICFEC 2023

arXiv:2110.10659 [pdf, other]

OMB-Py: Python Micro-Benchmarks for Evaluating Performance of MPI Libraries on HPC Systems

Authors: Nawras Alnaasan, Arpan Jain, Aamir Shafi, Hari Subramoni, Dhabaleswar K Panda

Abstract: Python has become a dominant programming language for emerging areas like Machine Learning (ML), Deep Learning (DL), and Data Science (DS). An attractive feature of Python is that it provides easy-to-use programming interface while allowing library developers to enhance performance of their applications by harnessing the computing power offered by High Performance Computing (HPC) platforms. Efficient communication is key to scaling applications on parallel systems, which is typically enabled by the Message Passing Interface (MPI) standard and compliant libraries on HPC hardware. mpi4py is a Python-based communication library that provides an MPI-like interface for Python applications allowing application developers to utilize parallel processing elements including GPUs. However, there is currently no benchmark suite to evaluate communication performance of mpi4py -- and Python MPI codes in general -- on modern HPC systems. In order to bridge this gap, we propose OMB-Py -- Python extensions to the open-source OSU Micro-Benchmark (OMB) suite -- aimed to evaluate communication performance of MPI-based parallel applications in Python. To the best of our knowledge, OMB-Py is the first communication benchmark suite for parallel Python applications. OMB-Py consists of a variety of point-to-point and collective communication benchmark tests that are implemented for a range of popular Python libraries including NumPy, CuPy, Numba, and PyCUDA. Our evaluation reveals that mpi4py introduces a small overhead when compared to native MPI libraries. We plan to publicly release OMB-Py to benefit the Python HPC community.

Submitted 24 August, 2022; v1 submitted 20 October, 2021; originally announced October 2021.

arXiv:2101.08878 [pdf, other]

Efficient MPI-based Communication for GPU-Accelerated Dask Applications

Authors: Aamir Shafi, Jahanzeb Maqbool Hashmi, Hari Subramoni, Dhabaleswar K. Panda

Abstract: Dask is a popular parallel and distributed computing framework, which rivals Apache Spark to enable task-based scalable processing of big data. The Dask Distributed library forms the basis of this computing engine and provides support for adding new communication devices. It currently has two communication devices: one for TCP and the other for high-speed networks using UCX-Py -- a Cython wrapper to UCX. This paper presents the design and implementation of a new communication backend for Dask -- called MPI4Dask -- that is targeted for modern HPC clusters built with GPUs. MPI4Dask exploits mpi4py over MVAPICH2-GDR, which is a GPU-aware implementation of the Message Passing Interface (MPI) standard. MPI4Dask provides point-to-point asynchronous I/O communication coroutines, which are non-blocking concurrent operations defined using the async/await keywords from Python's asyncio framework. Our latency and throughput comparisons suggest that MPI4Dask outperforms UCX by 6x for 1 Byte message and 4x for large messages (2 MBytes and beyond) respectively. We also conduct comparative performance evaluation of MPI4Dask with UCX using two benchmark applications: 1) sum of cuPy array with its transpose, and 2) cuDF merge. MPI4Dask speeds up the overall execution time of the two applications by an average of 3.47x and 3.11x respectively on an in-house cluster built with NVIDIA Tesla V100 GPUs for 1-6 Dask workers. We also perform scalability analysis of MPI4Dask against UCX for these applications on TACC's Frontera (GPU) system with up to 32 Dask workers on 32 NVIDIA Quadro RTX 5000 GPUs and 256 CPU cores. MPI4Dask speeds up the execution time for cuPy and cuDF applications by an average of 1.71x and 2.91x respectively for 1-32 Dask workers on the Frontera (GPU) system.

Submitted 21 January, 2021; originally announced January 2021.

Comments: 10 pages, 9 figures, 1 table

ACM Class: C.4; D.1.3

arXiv:1811.02402 [pdf]

Realisation of Highly Precise and Low Power Tunable Voltage Amplifier Based on the Translinear Circuit Scheme of CCCII+

Authors: Umar Mohammad, Mir Aamir Shafi

Abstract: In the past few years, advancements in the field of nano circuit design have become tougher than the demand. Low power devices have emerged tremendously. Both voltage mode as well as current mode devices have proven alternatives to each other for satisfying the demand of the growing market. As such, current conveyors have equitably established their uniqueness as an important circuit design element. The literature available to us during the few years in the field of analog VLSI design quotes a huge number of application elements based on current conveyors. Likely, in this paper, a new tunable low power voltage amplifier based on the translinear circuit scheme of second generation current controlled current conveyor has been proposed. The modeling of the circuit presented in this paper employs the minimum number of passive elements. The magnitude of the tuning or the amplitude of the voltage presented here is controlled by means of two variable resistors. Current conveyor second generation translinear circuit scheme is taken into consideration to implement the proposed tunable voltage amplifier. CCCII works on the outlines of low power and low voltage design. Tunable voltage amplifiers find use in analog as well as in digital signal processing applications.

Submitted 6 November, 2018; originally announced November 2018.

Comments: 7 pages; under revision in an Indonesian journal

arXiv:1410.0373 [pdf, other]

Teaching Parallel Programming Using Java

Authors: Aamir Shafi, Aleem Akhtar, Ansar Javed, Bryan Carpenter

Abstract: This paper presents an overview of the "Applied Parallel Computing" course taught to final year Software Engineering undergraduate students in Spring 2014 at NUST, Pakistan. The main objective of the course was to introduce practical parallel programming tools and techniques for shared and distributed memory concurrent systems. A unique aspect of the course was that Java was used as the principal programming language. The course was divided into three sections. The first section covered parallel programming techniques for shared memory systems that include multicore and Symmetric Multi-Processor (SMP) systems. In this section, Java threads were taught as a viable programming API for such systems. The second section was dedicated to parallel programming tools meant for distributed memory systems including clusters and network of computers. We used MPJ Express, a Java MPI library, for conducting programming assignments and lab work for this section. The third and the final section covered advanced topics including the MapReduce programming model using Hadoop and the General Purpose Computing on Graphics Processing Units (GPGPU).

Submitted 27 August, 2014; originally announced October 2014.

Comments: 8 Pages, 6 figures, MPJ Express, MPI Java, Teaching Parallel Programming

ACM Class: K.3.2

arXiv:1408.6347 [pdf, other]

Design and Implementation of Parallel Debugger and Profiler for MPJ Express

Authors: Aleem Akhtar, Aamir Shafi, Mohsan Jameel

Abstract: MPJ Express is a messaging system that allows computational scientists to write and execute parallel Java applications on High Performance Computing (HPC) hardware. Despite its successful adoption in the Java HPC community, the MPJ Express software currently does not provide any support for debugging and profiling parallel applications and hence forces its users to rely on manual and tedious debugging/profiling methods. Support for such tools is essential to help application developers increase their overall productivity. To address this we have developed debugging and profiling tools for MPJ Express, which are the main topic of this paper. Key design goals for these tools include: 1) maintain compatibility with existing logging, debugging, and visualizing tools, 2) build these tools by extending existing debugging/profiling tools instead of reinventing the wheel. The first tool, named MPJDebug, builds on the open-source Eclipse Integrated Development Environment (IDE). It provides an Eclipse-based plugin developed using the Eclipse Plugin Development Environment (PDE). The default Eclipse debugger currently does not support debugging parallel applications running on a compute cluster. The second tool, named MPJProf, is a utility based on Tuning and Analysis Utility (TAU), an open-source performance evaluation tool. Our goal here is to exploit TAU to profile Java applications parallelized using MPJ Express by generating profiles and traces, which can later be visualized using existing tools like paraprof and Jumpshot. Towards the end of the paper, we quantify the overhead of using MPJProf, which we found to be negligible in the profiling stage of parallel application development.

Submitted 27 August, 2014; originally announced August 2014.

Comments: 6 pages, 7 figures

arXiv:1310.5848 [pdf]

Evaluation and Performance of Reactive Protocols Using Mobility Model

Authors: Naveed Anjum Imran Shafi, Sohail Abidi

Abstract: A Mobile Ad-hoc Network (MANET) is a self-motivated wireless network which has no centralized point. It is an independent network connected by wireless links, in which every point or device works as a router. In this network every node forwards packets to the destination as a router rather than operating only as an end point. Every node moves freely in any direction, because the nodes are independent and change their positions regularly. There are three main types of routing protocols: reactive, proactive, and hybrid. This work compares the performance of some reactive protocols, also known as on-demand protocols: DSR, AODV, and AOMDV. DSR and AODV are reactive protocols which connect the devices on the network when needed by a doorway. The AOMDV protocol was designed for ad hoc networks in which routes or links fail, and it also maintains routes with sequence numbers to avoid looping.

Submitted 22 October, 2013; originally announced October 2013.

Comments: 10 pages, 14 figures, http://www.IJCSI.org

Journal ref: IJCSI International Journal of Computer Science Issues, Vol. 10, Issue 3, No 1, May 2013

arXiv:0912.1653 [pdf, ps, other]

doi 10.1103/PhysRevC.82.015201

Measurement of $K^- p$ radiative capture to $γΛ$ and $γΣ^0$ for $p_{K^-}$ between 514 and 750 MeV/$c$

Authors: S. Prakhov, P. Vancraeyveld, N. Phaisangittisakul, B. M. K. Nefkens, V. Bekrenev, W. J. Briscoe, L. De Cruz, D. Isenhower, N. Knecht, A. Koulbardis, N. Kozlenko, S. Kruglov, G. Lolos, I. Lopatin, A. Marušić, S. McDonald, Z. Papandreou, D. Peaslee, J. W. Price, J. Ryckebusch, M. Sadler, A. Shafi, A. Starostin, H. M. Staudenmaier, I. I. Strakovsky, et al. (2 additional authors not shown)

Abstract: Differential cross sections for $K^-$ radiative capture in flight on the proton, leading to the $γΛ$ and $γΣ^0$ final states, have been measured at eight $K^-$ momenta between 514 and 750 MeV/$c$. The data were obtained with the Crystal Ball multiphoton spectrometer installed at the separated $K/π$ beam line C6 of the BNL Alternating Gradient Synchrotron. The results substantially improve the existing experimental data available for studying radiative decays of excited hyperon states. An exploratory theoretical analysis is performed within the Regge-plus-resonance approach. According to this analysis, the $γΣ^0$ final state is dominated by hyperon-resonance exchange and hints at an important role for a resonance in the mass region of 1700 MeV. In the $γΛ$ final state, on the other hand, the resonant contributions account for only half the strength, and the data suggest the importance of a resonance in the mass region of 1550 MeV.

Submitted 8 June, 2010; v1 submitted 8 December, 2009; originally announced December 2009.

Journal ref: Phys.Rev.C82:015201,2010

arXiv:0908.3845 [pdf, ps, other]

doi 10.1103/PhysRevC.80.055207

Differential cross sections of the charge-exchange reaction pi- p --> pi0 n in the momentum range from 103 to 178 MeV/c

Authors: D. Mekterović, I. Supek, V. Abaev, V. Bekrenev, C. Bircher, W. J. Briscoe, R. V. Cadman, M. Clajus, J. R. Comfort, K. Craig, D. Grosnick, D. Isenhover, M. Jerkins, M. Joy, N. Knecht, D. D. Koetke, N. Kozlenko, A. Kulbardis, S. Kruglov, G. Lolos, I. Lopatin, D. M. Manley, R. Manweiler, A. Marušić, S. McDonald, et al. (18 additional authors not shown)

Abstract: Measured values of the differential cross sections for pion-nucleon charge exchange, pi- p --> pi0 n, are presented for pi- momenta of 103, 112, 120, 130, 139, 152, and 178 MeV/c. Complete angular distributions were obtained by using the Crystal Ball detector at the Alternating Gradient Synchrotron at Brookhaven National Laboratory. Statistical uncertainties of the differential cross sections vary from 3% to 6% in the backward angle region, and from 6% to about 20% in the forward region with the exception of the two most forward angles. The systematic uncertainties are estimated to be about 3% for all momenta.

Submitted 26 August, 2009; originally announced August 2009.

Comments: 18 pages, 12 figures, submitted to Phys. Rev. C

Journal ref: Phys.Rev.C80:055207,2009

arXiv:0812.1888 [pdf, ps, other]

doi 10.1103/PhysRevC.80.025204

Measurement of $π^0 Λ$, $\bar{K}^0 n$, and $π^0 Σ^0$ production in $K^- p$ interactions for $p_{K^-}$ between 514 and 750 MeV/$c$

Authors: S. Prakhov, B. M. K. Nefkens, V. Bekrenev, W. J. Briscoe, N. Knecht, A. Koulbardis, N. Kozlenko, S. Kruglov, G. Lolos, I. Lopatin, A. Marušić, S. McDonald, D. Peaslee, N. Phaisangittisakul, J. W. Price, A. Shafi, A. Starostin, H. M. Staudenmaier, I. I. Strakovsky, I. Supek

Abstract: Differential cross sections and hyperon polarizations have been measured for $\bar{K}^0 n$, $π^0 Λ$, and $π^0 Σ^0$ production in $K^- p$ interactions at eight $K^-$ momenta between 514 and 750 MeV/$c$. The experiment detected the multiphoton final states with the Crystal Ball spectrometer using a $K^-$ beam from the Alternating Gradient Synchrotron of BNL. The results provide significantly greater precision than the existing data, allowing a detailed reexamination of the excited hyperon states in our energy range.

Submitted 26 June, 2009; v1 submitted 10 December, 2008; originally announced December 2008.

Journal ref: Phys.Rev.C80:025204,2009

arXiv:nucl-ex/0405026 [pdf, ps, other]

doi 10.1103/PhysRevC.70.035204

Measurement of Inverse Pion Photoproduction at Energies Spanning the N(1440) Resonance

Authors: A. Shafi, S. Prakhov, I. I. Strakovsky, W. J. Briscoe, B. M. K. Nefkens, C. E. Allgower, R. A. Arndt, V. Bekrenev, C. Bennhold, M. Clajus, J. R. Comfort, K. Craig, D. Grosnick, D. Isenhower, N. Knecht, D. D. Koetke, A. Kulbardis, N. Kozlenko, S. Kruglov, G. Lolos, I. Lopatin, D. M. Manley, R. Manweiler, A. Marusic, S. McDonald, et al. (14 additional authors not shown)

Abstract: Differential cross sections for the process pi^- p -> gamma n have been measured at Brookhaven National Laboratory's Alternating Gradient Synchrotron with the Crystal Ball multiphoton spectrometer. Measurements were made at 18 pion momenta from 238 to 748 MeV/c, corresponding to E_gamma for the inverse reaction from 285 to 769 MeV. The data have been used to evaluate the gamma n multipoles in the vicinity of the N(1440) resonance. We compare our data and multipoles to previous determinations. A new three-parameter SAID fit yields 36 +/- 7 (GeV)^-1/2 X 10^-3 for the A^n_1/2 amplitude of the P_11.

Submitted 25 May, 2004; originally announced May 2004.

Comments: 14 pages, 8 figures, submitted to PRC

Journal ref: Phys.Rev.C70:035204,2004

arXiv:nucl-ex/0308005 [pdf, ps, other]

doi 10.1016/j.physletb.2004.02.070

Does the Sigma(1580)3/2- resonance exist?

Authors: Crystal Ball Collaboration, J. Olmsted, S. Prakhov, D. M. Manley, C. E. Allgower, V. S. Bekrenev, W. J. Briscoe, M. Clajus, J. R. Comfort, K. Craig, D. Grosnick, D. Isenhower, N. Knecht, D. D. Koetke, N. G. Kozlenko, S. Kruglov, A. A. Kulbardis, G. Lolos, I. V. Lopatin, R. Manweiler, A. Marusic, S. McDonald, B. M. K. Nefkens, Z. Papandreou, D. C. Peaslee, et al. (12 additional authors not shown)

Abstract: Precise new data for the reaction $K^- p \to π^0 Λ$ are presented in the c.m. energy range 1565 to 1600 MeV. Our analysis of these data sheds new light on claims for the $Σ(1580){3/2}^-$ resonance, which (if it exists with the specified quantum numbers) must be an exotic baryon because of its very low mass. Our results show no evidence for this state.

Submitted 23 February, 2004; v1 submitted 6 August, 2003; originally announced August 2003.

Comments: 4 pages, 4 figures

Journal ref: Phys.Lett.B588:29-34,2004

arXiv:cs/0305062 [pdf]

DIAMOnDS - DIstributed Agents for MObile & Dynamic Services

Authors: Aamir Shafi, Umer Farooq, Saad Kiani, Maria Riaz, Anjum Shehzad, Arshad Ali, Iosif Legrand, Harvey Newman

Abstract: Distributed Services Architecture with support for mobile agents between services, offer significantly improved communication and computational flexibility. The uses of agents allow execution of complex operations that involve large amounts of data to be processed effectively using distributed resources. The prototype system Distributed Agents for Mobile and Dynamic Services (DIAMOnDS), allows a service to send agents on its behalf, to other services, to perform data manipulation and processing. Agents have been implemented as mobile services that are discovered using the Jini Lookup mechanism and used by other services for task management and communication. Agents provide proxies for interaction with other services as well as specific GUI to monitor and control the agent activity. Thus agents acting on behalf of one service cooperate with other services to carry out a job, providing inter-operation of loosely coupled services in a semi-autonomous way. Remote file system access functionality has been incorporated by the agent framework and allows services to dynamically share and browse the file system resources of hosts, running the services. Generic database access functionality has been implemented in the mobile agent framework that allows performing complex data mining and processing operations efficiently in distributed system. A basic data searching agent is also implemented that performs a query based search in a file system.
The testing of the framework was carried out on WAN by moving Connectivity Test agents between AgentStations in CERN, Switzerland and NUST, Pakistan.

Submitted 13 June, 2003; v1 submitted 30 May, 2003; originally announced May 2003.

Comments: 7 pages, 4 figures, CHEP03, La Jolla, California, March 24-28, 2003

ACM Class: C.2.4

Journal ref: ECONFC0303241:THAT003,2003

arXiv:nucl-ex/0302017 [pdf, ps, other]

Nucleon and Hyperon Resonances with the Crystal Ball

Authors: Crystal Ball Collaboration, W. J. Briscoe, A. Shafi, I. I. Strakovsky

Abstract: The Crystal Ball Spectrometer is being used at Brookhaven National Laboratory in a series of experiments which study all neutral final states of pi-p and K-p induced reactions. We report about the experimental setup and progress in obtaining new results for the radiative capture reactions pi-p-->gn and K-p-->gL, charge exchange pi-p-->pi0n, two pi0 production pi-p-->pi0pi0n, and eta production pi-p-->eta n reactions. Data have also been obtained on the decays of N*, Delta, Lambda, and Sigma resonances. Threshold eta production has been studied in detail for both pi-p and K-p. Sequential resonance decays have been investigated by studying the 2pi0 production mechanism both in the fundamental interaction and in nuclei. In addition, we have used the etas produced near threshold to make precision measurements searching in particular for rare and forbidden eta decays.

Submitted 14 February, 2003; originally announced February 2003.

Comments: 10 pages, 4 figures. Talk given at N*2002 conference

Showing 1–19 of 19 results for author: Shafi, A