-
Nonclassical Light in a Three-Waveguide Coupler with Second-Order Nonlinearity
Authors:
Mohd Syafiq M. Hanapi,
Abdel-Baset M. A. Ibrahim,
Rafael Julius,
Pankaj K. Choudhury,
Hichem Eleuch
Abstract:
Possible squeezed states generated in a three-waveguide nonlinear coupler operating with second harmonic generation are discussed. This study is carried out using two well-known techniques: the phase space method (based on the positive P-representation) and the Heisenberg-based analytical perturbative method. The effect of the key design parameters is analyzed for both codirectional and contra-directional propagation, and the optimal degree of feasible squeezing is identified. The performance and capabilities of both methods are also critically evaluated. For low values of the key design parameters and in the early stages of evolution, the two methods agree closely. In the new era of quantum-based technology, the proposed system opens a new avenue for utilising nonlinear couplers in nonclassical light generation.
Submitted 25 June, 2024;
originally announced June 2024.
-
Balanced Data Placement for GEMV Acceleration with Processing-In-Memory
Authors:
Mohamed Assem Ibrahim,
Mahzabeen Islam,
Shaizeen Aga
Abstract:
With unprecedented demand for generative AI (GenAI) inference, acceleration of primitives that dominate GenAI, such as general matrix-vector multiplication (GEMV), is receiving considerable attention. A challenge with GEMVs is the high memory bandwidth this primitive demands. Multiple memory vendors have proposed commercially viable processing-in-memory (PIM) prototypes that attain a bandwidth boost over the processor by augmenting memory banks with compute capabilities and broadcasting the same command to all banks. While proposed PIM designs stand to accelerate GEMV, we observe in this work that a key impediment to truly harnessing PIM acceleration is deducing the optimal placement of the matrix across memory banks. To this end, we tease out several factors that impact data placement and propose the PIMnast methodology which, like a gymnast, balances these factors to identify data placements that deliver GEMV acceleration. Across a spectrum of GenAI models, our proposed PIMnast methodology, along with additional orchestration knobs we identify, delivers up to 6.86$\times$ speedup for GEMVs (of the available 7$\times$ roofline speedup), leading to up to 5$\times$ speedup for per-token latencies.
Submitted 1 April, 2024; v1 submitted 29 March, 2024;
originally announced March 2024.
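The data-placement question can be made concrete with a toy model. The Python sketch below assumes a hypothetical round-robin row-to-bank placement (not the PIMnast placement itself) in which each bank holds whole matrix rows, so one broadcast multiply-accumulate command produces one output element per stored row. The names `place_rows_round_robin`, `pim_gemv`, and `bank_load` are illustrative only.

```python
# Hypothetical sketch: row-wise placement of a GEMV matrix across PIM banks.
# The round-robin policy is an illustrative assumption, not PIMnast's placement.
from typing import List

Matrix = List[List[float]]
Vector = List[float]


def place_rows_round_robin(matrix: Matrix, num_banks: int) -> List[List[int]]:
    """Return, for each bank, the indices of the matrix rows stored in it."""
    banks: List[List[int]] = [[] for _ in range(num_banks)]
    for row_idx in range(len(matrix)):
        banks[row_idx % num_banks].append(row_idx)
    return banks


def pim_gemv(matrix: Matrix, x: Vector, num_banks: int) -> Vector:
    """Compute y = A x as if every bank ran the same broadcast dot-product
    command on its locally stored rows (banks are simulated sequentially)."""
    placement = place_rows_round_robin(matrix, num_banks)
    y = [0.0] * len(matrix)
    for bank_rows in placement:
        for r in bank_rows:
            y[r] = sum(a * b for a, b in zip(matrix[r], x))
    return y


def bank_load(matrix: Matrix, num_banks: int) -> List[int]:
    """Rows per bank; a balanced placement keeps these counts nearly equal."""
    return [len(rows) for rows in place_rows_round_robin(matrix, num_banks)]


A = [[1.0, 2.0], [3.0, 4.0], [5.0, 6.0]]
print(pim_gemv(A, [1.0, 1.0], 2))  # [3.0, 7.0, 11.0]
print(bank_load(A, 2))             # [2, 1]
```

Because all banks execute the same command in lockstep, the most heavily loaded bank sets the completion time, which is one reason balance matters. For scale, the reported 6.86$\times$ out of a 7$\times$ roofline is 6.86 / 7 = 98% of the attainable speedup.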
-
ML-based Real-Time Control at the Edge: An Approach Using hls4ml
Authors:
R. Shi,
S. Ogrenci,
J. M. Arnold,
J. R. Berlioz,
P. Hanlet,
K. J. Hazelwood,
M. A. Ibrahim,
H. Liu,
V. P. Nagaslaev,
A. Narayanan,
D. J. Nicklaus,
J. Mitrevski,
G. Pradhan,
A. L. Saewert,
B. A. Schupbach,
K. Seiya,
M. Thieme,
R. M. Thurman-Keup,
N. V. Tran
Abstract:
This study focuses on implementing a real-time control system for a particle accelerator facility that performs high energy physics experiments. A critical operating parameter in this facility is beam loss, which is the fraction of particles deviating from the accelerated proton beam into a cascade of secondary particles. Accelerators employ a large number of sensors to monitor beam loss. The data from these sensors are monitored by human operators, who predict the relative contribution of different sub-systems to the beam loss and use this information to engage control interventions. In this paper, we present a controller that tracks this phenomenon in real time using edge machine learning (ML) and supports control with low latency and high accuracy. We implemented this system on an Intel Arria 10 SoC. Optimizations at the algorithm, high-level synthesis, and interface levels to improve latency and resource usage are presented. Our design implements a neural network that can predict the main source of beam loss (between two possible causes) at speeds up to 575 frames per second (fps) (average latency of 1.74 ms). The deployed system is required to operate at 320 fps with a 3 ms latency requirement, which our design meets.
Submitted 9 November, 2023;
originally announced November 2023.
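A quick check of the timing figures in this abstract: at 575 fps the per-frame period is 1000 / 575 ≈ 1.74 ms, which matches the reported average latency. The 320 fps deployment target corresponds to a 3.125 ms frame period, so the 3 ms latency requirement fits inside one frame. The hypothetical helpers below only encode that arithmetic. They assume each frame is fully processed before the next one arrives, which the abstract does not state explicitly.

```python
# Throughput/latency bookkeeping for a streaming inference pipeline.
# Figures (575 fps, 1.74 ms, 320 fps, 3 ms) come from the abstract; helper names are hypothetical.


def frame_period_ms(fps: float) -> float:
    """Time budget per frame, in milliseconds, at a given frame rate."""
    return 1000.0 / fps


def meets_requirements(latency_ms: float, max_fps: float,
                       required_fps: float, latency_budget_ms: float) -> bool:
    """True if the design sustains the required rate and stays within the latency budget."""
    return max_fps >= required_fps and latency_ms <= latency_budget_ms


design_period = frame_period_ms(575)    # about 1.739 ms per frame
required_period = frame_period_ms(320)  # 3.125 ms per frame
ok = meets_requirements(1.74, 575, 320, 3.0)
print(round(design_period, 3), required_period, ok)
```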
-
Just-in-time Quantization with Processing-In-Memory for Efficient ML Training
Authors:
Mohamed Assem Ibrahim,
Shaizeen Aga,
Ada Li,
Suchita Pati,
Mahzabeen Islam
Abstract:
Data format innovations have been critical for machine learning (ML) scaling, which in turn fuels ground-breaking ML capabilities. However, even in the presence of low-precision formats, model weights are often stored in both high precision and low precision during training. Furthermore, with emerging directional data formats (e.g., MX9, MX6), multiple low-precision weight copies can be required. To lower the memory capacity needs of weights, we explore just-in-time quantization (JIT-Q), where we store only high-precision weights in memory and generate low-precision weights only when needed. To perform JIT-Q efficiently, in this work we evaluate emerging processing-in-memory (PIM) technology to execute quantization. With PIM, we can offload quantization to in-memory compute units, enabling quantization to be performed without incurring costly data movement while allowing it to run concurrently with accelerator computation. Our proposed PIM-offloaded quantization keeps up with GPU compute and delivers considerable capacity savings (up to 24%) at marginal throughput loss (up to 2.4%). These memory capacity savings can unlock several benefits, such as fitting a larger model in the same system, reducing model parallelism requirements, and improving overall ML training efficiency.
Submitted 8 November, 2023;
originally announced November 2023.
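The JIT-Q idea can be illustrated in a few lines. Only master weights are kept, and a low-precision copy is built on demand and then dropped. The sketch assumes a generic symmetric int8 scheme with one shared scale per block. This is a hypothetical stand-in, not the MX9/MX6 formats or the paper's PIM offload. In the paper's setting the quantization step would run on in-memory compute units; here it is plain Python, and all names are illustrative.

```python
# Minimal sketch of just-in-time quantization: only high-precision master weights are
# stored; a low-precision copy is produced on demand inside forward() and then discarded.
# The symmetric per-block int8 scheme is a generic stand-in, NOT the MX9/MX6 formats.
from typing import List, Tuple


def quantize_block_int8(weights: List[float], block_size: int = 4) -> Tuple[List[int], List[float]]:
    """Quantize to int8 codes with one shared scale per block of `block_size` values."""
    q: List[int] = []
    scales: List[float] = []
    for start in range(0, len(weights), block_size):
        block = weights[start:start + block_size]
        max_abs = max(abs(w) for w in block)
        scale = max_abs / 127.0 if max_abs > 0 else 1.0
        scales.append(scale)
        q.extend(max(-127, min(127, round(w / scale))) for w in block)
    return q, scales


def dequantize_block_int8(q: List[int], scales: List[float], block_size: int = 4) -> List[float]:
    """Reconstruct approximate values from int8 codes and per-block scales."""
    return [code * scales[i // block_size] for i, code in enumerate(q)]


class JITQuantizedLayer:
    """Stores only master weights; low-precision weights exist only during forward()."""

    def __init__(self, master_weights: List[float]):
        self.master = master_weights

    def forward(self, x: List[float]) -> float:
        q, scales = quantize_block_int8(self.master)  # generated just in time
        w_low = dequantize_block_int8(q, scales)       # simulated low-precision use
        return sum(w * xi for w, xi in zip(w_low, x))


layer = JITQuantizedLayer([0.5, -1.0, 0.25, 2.0, 0.0, 0.0, 0.0, 0.0])
print(layer.forward([1.0] * 8))  # close to sum(master) = 1.75
```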
-
Spectroscopic constants from atomic properties: a machine learning approach
Authors:
Mahmoud A. E. Ibrahim,
X. Liu,
J. Pérez-Ríos
Abstract:
We present a machine-learning approach toward predicting spectroscopic constants based on atomic properties. After collecting spectroscopic information on diatomics and generating an extensive database, we employ Gaussian process regression to identify the most efficient characterization of molecules to predict the equilibrium distance, vibrational harmonic frequency, and dissociation energy. As a result, we show that it is possible to predict the equilibrium distance with an absolute error of 0.04 Å and the vibrational harmonic frequency with an absolute error of 36 $\text{cm}^{-1}$ using only atomic properties. These results can be improved by including prior information on molecular properties, leading to absolute errors of 0.02 Å and 28 $\text{cm}^{-1}$ for the equilibrium distance and vibrational harmonic frequency, respectively. In contrast, the dissociation energy is predicted with an absolute error $\lesssim 0.4$ eV. Alongside these results, we prove that it is possible to predict spectroscopic constants of homonuclear molecules from the atomic and molecular properties of heteronuclear molecules. Finally, based on our results, we present a new way to classify diatomic molecules beyond chemical bond properties.
Submitted 17 August, 2023;
originally announced August 2023.
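For readers unfamiliar with the regression method named in this abstract, the core of Gaussian process regression is the posterior mean m(x*) = k(x*, X) (K + σ²I)⁻¹ y. The Python sketch below assumes a squared-exponential (RBF) kernel with a fixed length scale and a tiny noise term. It does not reproduce the paper's kernel choice, hyperparameter fitting, or atomic-property features, and the toy data are hypothetical.

```python
# Minimal Gaussian process regression (posterior mean) with an RBF kernel.
# Toy 1-D inputs are hypothetical; the paper's features are atomic properties.
import math
from typing import List


def rbf(a: List[float], b: List[float], length_scale: float = 1.0) -> float:
    """Squared-exponential kernel k(a, b) = exp(-|a - b|^2 / (2 l^2))."""
    sq = sum((ai - bi) ** 2 for ai, bi in zip(a, b))
    return math.exp(-sq / (2.0 * length_scale ** 2))


def solve(A: List[List[float]], y: List[float]) -> List[float]:
    """Solve A w = y by Gaussian elimination with partial pivoting."""
    n = len(A)
    M = [row[:] + [y[i]] for i, row in enumerate(A)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(M[r][col]))
        M[col], M[pivot] = M[pivot], M[col]
        for r in range(col + 1, n):
            f = M[r][col] / M[col][col]
            for c in range(col, n + 1):
                M[r][c] -= f * M[col][c]
    w = [0.0] * n
    for r in range(n - 1, -1, -1):
        w[r] = (M[r][n] - sum(M[r][c] * w[c] for c in range(r + 1, n))) / M[r][r]
    return w


def gp_predict(X: List[List[float]], y: List[float], X_new: List[List[float]],
               noise: float = 1e-6, length_scale: float = 1.0) -> List[float]:
    """Posterior mean k(x*, X) (K + noise * I)^-1 y for each x* in X_new."""
    K = [[rbf(a, b, length_scale) + (noise if i == j else 0.0)
          for j, b in enumerate(X)] for i, a in enumerate(X)]
    alpha = solve(K, y)
    return [sum(rbf(x, xi, length_scale) * a for xi, a in zip(X, alpha)) for x in X_new]


X = [[0.0], [1.0], [2.0]]
y = [0.0, 1.0, 4.0]
print(gp_predict(X, y, [[0.5], [1.5]]))
```

With near-zero noise the posterior mean interpolates the training targets, and far from the data it falls back to the zero prior mean. A production model would instead fit the kernel hyperparameters by maximizing the marginal likelihood.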
-
Collaborative Acceleration for FFT on Commercial Processing-In-Memory Architectures
Authors:
Mohamed Assem Ibrahim,
Shaizeen Aga
Abstract:
This paper evaluates the efficacy of recent commercial processing-in-memory (PIM) solutions for accelerating the fast Fourier transform (FFT), an important primitive across several domains. Specifically, we observe that efficient implementations of FFT on modern GPUs are memory bandwidth bound. As such, the memory bandwidth boost availed by commercial PIM solutions makes a case for PIM to accelerate FFT. To this end, we first deduce a mapping of FFT computation to a strawman PIM architecture representative of recent commercial designs. We observe that even with careful data mapping, PIM is not effective in accelerating FFT. To address this, we make a case for collaborative acceleration of FFT with PIM and GPU. Further, we propose software and hardware innovations that reduce the number of PIM operations necessary for a given FFT. Overall, our optimized PIM FFT mapping, termed Pimacolaba, delivers performance and data movement savings of up to 1.38$\times$ and 2.76$\times$, respectively, over a range of FFT sizes.
Submitted 7 August, 2023;
originally announced August 2023.
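For reference, the textbook recursive radix-2 Cooley-Tukey FFT below shows the butterfly structure that any FFT mapping, PIM-based or otherwise, has to schedule. Each stage performs only a few arithmetic operations per element it reads and writes. This is a generic Python sketch, not the Pimacolaba mapping, and `naive_dft` is included only to check the result.

```python
# Recursive radix-2 Cooley-Tukey FFT (input length must be a power of two).
# Textbook algorithm for illustration, not the Pimacolaba PIM/GPU mapping.
import cmath
from typing import List


def fft(x: List[complex]) -> List[complex]:
    n = len(x)
    if n == 1:
        return [complex(x[0])]
    if n % 2:
        raise ValueError("length must be a power of two")
    even = fft(x[0::2])  # DFT of even-indexed samples
    odd = fft(x[1::2])   # DFT of odd-indexed samples
    out = [0j] * n
    for k in range(n // 2):
        # Butterfly: one twiddle multiply, then one add and one subtract.
        t = cmath.exp(-2j * cmath.pi * k / n) * odd[k]
        out[k] = even[k] + t
        out[k + n // 2] = even[k] - t
    return out


def naive_dft(x: List[complex]) -> List[complex]:
    """O(n^2) reference DFT used to check the FFT."""
    n = len(x)
    return [sum(x[j] * cmath.exp(-2j * cmath.pi * j * k / n) for j in range(n))
            for k in range(n)]


signal = [1, 2, 3, 4, 0, 0, 0, 0]
print(max(abs(a - b) for a, b in zip(fft(signal), naive_dft(signal))))  # tiny rounding error
```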
-
Synchronous High-frequency Distributed Readout For Edge Processing At The Fermilab Main Injector And Recycler
Authors:
J. R. Berlioz,
M. R. Austin,
J. M. Arnold,
K. J. Hazelwood,
P. Hanlet,
M. A. Ibrahim,
A. Narayanan,
D. J. Nicklaus,
G. Praudhan,
A. L. Saewert,
B. A. Schupbach,
K. Seiya,
R. M. Thurman-Keup,
N. V. Tran,
J. Jang,
H. Liu,
S. Memik,
R. Shi,
M. Thieme,
D. Ulusel
Abstract:
The Main Injector (MI) was commissioned using data acquisition systems developed for the Fermilab Main Ring in the 1980s. New VME-based instrumentation was commissioned in 2006 for the beam loss monitors (BLM)[2], which enabled more systematic study of the machine and improved displays of routine operation. However, current projects demand more data, at a faster rate, from this aging hardware. One such project, Real-time Edge AI for Distributed Systems (READS), requires the high-frequency, low-latency collection of synchronized BLM readings from around the approximately two-mile accelerator complex. Significant work has been done to develop new hardware that monitors the VME backplane and broadcasts BLM measurements over Ethernet without disrupting the existing operations-critical functions of the BLM system. This paper details the design, implementation, and testing of this parallel data pathway.
Submitted 31 August, 2022;
originally announced August 2022.
-
Accelerator Real-time Edge AI for Distributed Systems (READS) Proposal
Authors:
K. Seiya,
K. J. Hazelwood,
M. A. Ibrahim,
V. P. Nagaslaev,
D. J. Nicklaus,
B. A. Schupbach,
R. M. Thurman-Keup,
N. V. Tran,
H. Liu,
S. Memik
Abstract:
Our objective is to integrate ML into Fermilab accelerator operations and, furthermore, to provide an accessible framework that can also be used by a broad range of other accelerator systems with dynamic tuning needs. In this proposal, we will develop real-time accelerator control using embedded ML on-chip hardware and fast communication between distributed systems. We will demonstrate this technology for the Mu2e experiment by increasing the overall duty factor and uptime of the experiment through two synergistic projects. First, we will use deep reinforcement learning techniques to improve the performance of the regulation loop through guided optimization, providing stable proton beams extracted from the Delivery Ring to the Mu2e experiment. This requires the development of a digital twin of the system to model the accelerator and develop real-time ML algorithms. Second, we will use de-blending techniques to disentangle and classify overlapping beam losses in the Main Injector and Recycler Ring to reduce overall beam downtime in each machine. This ML model will be deployed within a semi-autonomous operational mode. Both applications require processing at the millisecond scale and will share similar ML-in-hardware techniques and beam instrumentation readout technology. A collaboration between Fermilab and Northwestern University will bring together the talents and resources of accelerator physicists, beam instrumentation engineers, embedded system architects, FPGA board design experts, and ML experts to solve complex real-time accelerator control challenges, which will enhance the physics program. More broadly, the framework developed for Accelerator Real-time Edge AI Distributed Systems (READS) can be applied to future projects as the accelerator complex is upgraded for the PIP-II and DUNE era.
Submitted 5 March, 2021;
originally announced March 2021.
-
Reflective Parametric Frequency Selective Limiters with sub-dB Loss and $μ$Watts Power Thresholds
Authors:
Hussein M. E. Hussein,
Mahmoud A. A. Ibrahim,
Matteo Rinaldi,
Marvin Onabajo,
Cristian Cassella
Abstract:
This article describes the design methodology for reflective diode-based parametric frequency selective limiters (pFSLs) with low power thresholds ($P_{th}$) and sub-dB insertion-loss values ($IL^{s.s}$) for driving power levels ($P_{in}$) lower than $P_{th}$. In addition, we present the measured performance of a reflective pFSL designed through the discussed methodology and assembled on an FR-4 printed circuit board (PCB). Thanks to its optimally engineered dynamics, the built pFSL can operate around 2.1 GHz while exhibiting record-low $P_{th}$ (-3.4 dBm) and $IL^{s.s}$ (0.94 dB) values. Furthermore, while the pFSL can selectively attenuate undesired signals with power ranging from -3.4 dBm to 13 dBm, it provides a strong suppression level (IS > 12.0 dB) even when driven by much higher $P_{in}$ values approaching 28 dBm. These measured performance metrics demonstrate how the unique nonlinear dynamics of parametric FSLs can be leveraged through components and systems compatible with conventional chip-scale manufacturing processes to increase resilience to electromagnetic interference (EMI), even for wireless radios designed for low power consumption and consequently characterized by a narrow dynamic range.
Submitted 21 December, 2020;
originally announced December 2020.
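To relate the dBm figures in this abstract to the microwatt-scale thresholds in the title, the standard conversion is P[mW] = 10^(P[dBm]/10). By this conversion, -3.4 dBm is about 457 μW and 28 dBm is about 631 mW, and a 0.94 dB insertion loss passes about 80.5% of the incident power. The helper names below are illustrative.

```python
# Standard dB/dBm conversions applied to the figures quoted in the abstract.


def dbm_to_mw(p_dbm: float) -> float:
    """Convert power in dBm to milliwatts: P[mW] = 10 ** (P[dBm] / 10)."""
    return 10 ** (p_dbm / 10.0)


def db_loss_to_fraction(loss_db: float) -> float:
    """Fraction of input power transmitted for a given insertion loss in dB."""
    return 10 ** (-loss_db / 10.0)


threshold_uw = dbm_to_mw(-3.4) * 1000.0  # about 457 uW
max_drive_mw = dbm_to_mw(28.0)           # about 631 mW
passed = db_loss_to_fraction(0.94)       # about 0.805 of the small-signal power
print(round(threshold_uw, 1), round(max_drive_mw, 1), round(passed, 3))
```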
-
A Precisely Xtreme-Multi Channel Hybrid Approach For Roman Urdu Sentiment Analysis
Authors:
Faiza Memood,
Muhammad Usman Ghani,
Muhammad Ali Ibrahim,
Rehab Shehzadi,
Muhammad Nabeel Asim
Abstract:
To improve the performance of various Natural Language Processing tasks for Roman Urdu, this paper, for the very first time, provides three neural word embeddings prepared using the most widely used approaches, namely Word2vec, FastText, and GloVe. The integrity of the generated neural word embeddings is evaluated using intrinsic and extrinsic evaluation approaches. Considering the lack of publicly available benchmark datasets, it provides a first-ever Roman Urdu dataset, which consists of 3241 sentiments annotated against positive, negative, and neutral classes. To provide benchmark baseline performance on the presented dataset, we adapt diverse machine learning (Support Vector Machine, Logistic Regression, Naive Bayes), deep learning (convolutional neural network, recurrent neural network), and hybrid approaches. The effectiveness of the generated neural word embeddings is evaluated by comparing the performance of machine and deep learning based methodologies using 7 and 5 distinct feature representation approaches, respectively. Finally, it proposes a novel precisely extreme multi-channel hybrid methodology, which outperforms state-of-the-art adapted machine and deep learning approaches by 9% and 4%, respectively, in terms of F1-score. Keywords: Roman Urdu Sentiment Analysis, Pretrained word embeddings for Roman Urdu, Word2Vec, GloVe, FastText
Submitted 11 March, 2020;
originally announced March 2020.
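The gains in this abstract are reported as F1-score over three sentiment classes. The abstract does not say which multi-class averaging is used. The Python sketch below assumes macro averaging (the unweighted mean of per-class F1 = 2PR / (P + R)) purely for illustration, and the labels are hypothetical.

```python
# Macro-averaged F1 over the classes present in the gold or predicted labels.
# Macro averaging is an assumption here; the paper may use a different average.
from typing import List


def macro_f1(y_true: List[str], y_pred: List[str]) -> float:
    """Unweighted mean of per-class F1 = 2PR / (P + R)."""
    labels = sorted(set(y_true) | set(y_pred))
    f1s: List[float] = []
    for c in labels:
        tp = sum(1 for t, p in zip(y_true, y_pred) if t == c and p == c)
        fp = sum(1 for t, p in zip(y_true, y_pred) if t != c and p == c)
        fn = sum(1 for t, p in zip(y_true, y_pred) if t == c and p != c)
        precision = tp / (tp + fp) if tp + fp else 0.0
        recall = tp / (tp + fn) if tp + fn else 0.0
        f1s.append(2 * precision * recall / (precision + recall) if precision + recall else 0.0)
    return sum(f1s) / len(f1s)


gold = ["positive", "negative", "neutral", "positive"]
pred = ["positive", "negative", "positive", "positive"]
print(macro_f1(gold, pred))  # (0.8 + 1.0 + 0.0) / 3 = 0.6
```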
-
Benchmark Performance of Machine And Deep Learning Based Methodologies for Urdu Text Document Classification
Authors:
Muhammad Nabeel Asim,
Muhammad Usman Ghani,
Muhammad Ali Ibrahim,
Sheraz Ahmad,
Waqar Mahmood,
Andreas Dengel
Abstract:
To provide benchmark performance for Urdu text document classification, this paper makes several contributions. First, it provides a publicly available benchmark dataset manually tagged against 6 classes. Second, it investigates the performance impact of traditional machine learning based Urdu text document classification methodologies by embedding 10 filter-based feature selection algorithms that have been widely used for other languages. Third, for the very first time, it assesses the performance of various deep learning based methodologies for Urdu text document classification. In this regard, we adapt 10 deep learning classification methodologies that have produced the best performance figures for English text classification. Fourth, it investigates the performance impact of transfer learning by utilizing the Bidirectional Encoder Representations from Transformers (BERT) approach for the Urdu language. Fifth, it evaluates the integrity of a hybrid approach that combines traditional machine learning based feature engineering with deep learning based automated feature engineering. Experimental results show that the feature selection approach named Normalised Difference Measure, along with Support Vector Machine, outshines state-of-the-art performance on two closed-source benchmark datasets, CLE Urdu Digest 1000k and CLE Urdu Digest 1Million, by significant margins of 32% and 13%, respectively. Across all three datasets, Normalised Difference Measure outperforms other filter-based feature selection algorithms, as it significantly uplifts the performance of all adopted machine learning, deep learning, and hybrid approaches. The source code and presented dataset are available in a GitHub repository.
Submitted 3 March, 2020;
originally announced March 2020.
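The abstract credits the Normalised Difference Measure (NDM) for the strongest feature-selection results. The Python sketch below uses NDM as we understand it from the feature-selection literature: NDM(t) = |tpr - fpr| / min(tpr, fpr), where tpr and fpr are the fractions of positive-class and other-class documents that contain term t. The `eps` floor on the denominator, which avoids division by zero, and all function names are our own assumptions, so check the original NDM definition before relying on this. The toy documents are hypothetical.

```python
# Normalised Difference Measure (NDM) scoring for filter-based feature selection,
# as we understand it: NDM(t) = |tpr - fpr| / min(tpr, fpr).
# The eps floor for min(...) == 0 is an assumption, not taken from the paper.
from typing import Dict, List, Set


def ndm_scores(docs: List[Set[str]], labels: List[int], positive: int,
               eps: float = 1e-3) -> Dict[str, float]:
    pos_docs = [d for d, y in zip(docs, labels) if y == positive]
    neg_docs = [d for d, y in zip(docs, labels) if y != positive]
    vocab = set().union(*docs)
    scores: Dict[str, float] = {}
    for term in vocab:
        tpr = sum(term in d for d in pos_docs) / len(pos_docs)  # positive-class doc frequency
        fpr = sum(term in d for d in neg_docs) / len(neg_docs)  # other-class doc frequency
        scores[term] = abs(tpr - fpr) / max(min(tpr, fpr), eps)
    return scores


def select_top_k(scores: Dict[str, float], k: int) -> List[str]:
    """Highest-scoring terms first; ties broken alphabetically."""
    return sorted(scores, key=lambda t: (-scores[t], t))[:k]


docs = [{"cricket", "match"}, {"cricket", "score"}, {"election", "vote"}, {"election", "match"}]
labels = [0, 0, 1, 1]
print(select_top_k(ndm_scores(docs, labels, positive=0), 2))  # ['cricket', 'election']
```

Terms that appear equally often in both groups (here "match") score zero, while terms concentrated in one group are ranked highest. Dividing by the smaller rate further boosts terms that are rare in one of the two groups.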
-
Systematic Synthesis and Design of Ultra-Low Threshold Parametric Frequency Dividers
Authors:
Hussein M. E. Hussein,
Mahmoud A. A. Ibrahim,
Giuseppe Michetti,
Matteo Rinaldi,
Marvin Onabajo,
Cristian Cassella
Abstract:
A new method is discussed for the systematic synthesis, design, and performance optimization of varactor-based parametric frequency dividers (PFDs) exhibiting an ultra-low power threshold ($P_{th}$). For the first time, it is analytically shown that the $P_{th}$-value exhibited by any PFD can always be expressed as an explicit closed-form function of the different impedances forming its network. This unique and previously unexplored property makes it possible to rely on linear models during PFD design and performance optimization. The validity of our analytical model has been verified in a commercial circuit simulator through time-domain and frequency-domain algorithms. To demonstrate the effectiveness of our new synthesis approach, we also report on a lumped prototype of a 200:100 MHz PFD realized on a printed circuit board (PCB). Although inductors with quality factors lower than 50 were used, the PFD prototype exhibits a $P_{th}$-value lower than $-$15 dBm, the lowest ever reported for passive varactor-based PFDs operating in the same frequency range.
Submitted 21 February, 2020;
originally announced February 2020.
-
Tuning electronic properties in graphene quantum dots by chemical functionalization: Density functional theory calculations
Authors:
Hazem Abdelsalam,
Hanan Elhaes,
Medhat A. Ibrahim
Abstract:
The electronic energy gap and total dipole moment of chemically functionalized hexagonal and triangular graphene quantum dots are investigated using density functional theory. It has been found that the energy gap can be efficiently tuned in the selected clusters by edge passivation with different elements or groups. Edge passivation with oxygen considerably decreases the large energy gap observed in hexagonal nanodots. The edge states and energy gap in triangular graphene quantum dots can also be manipulated by passivation with fluorine. The total dipole moment strongly depends on (a) the shape and edge termination of the graphene quantum dot, (b) the attached group, and (c) the position at which the groups are attached. With respect to shape, edge termination, and attached group, the chemically modified hexagonal-armchair quantum dot has the highest total dipole moment. Depending on the position of the attached groups, the total dipole moment can be increased, decreased, or eliminated. The significant features of the functionalized graphene quantum dots, namely the tunable energy gap and total dipole moment, are confirmed by stability calculations. The positive binding energies and positive frequencies in the infrared spectra imply that all the selected clusters are stable under edge functionalization and passivation with various groups and elements.
Submitted 12 December, 2017;
originally announced December 2017.
-
First principles study of edge carboxylated graphene quantum dots
Authors:
Hazem Abdelsalam,
Hanan Elhaes,
Medhat A. Ibrahim
Abstract:
The structural stability and electronic properties of edge-carboxylated hexagonal and triangular graphene quantum dots are investigated using density functional theory. The calculated binding energies show that hexagonal clusters with armchair edges have the highest stability among all the flakes studied. The binding energy of carboxylated graphene quantum dots increases with the number of attached carboxyl groups. Our study shows that the total dipole moment increases significantly when COOH is added, with the highest values observed in triangular clusters. The edge states in triangular graphene with zigzag edges produce an energy spectrum completely different from those of other shapes: (a) the energy gap in the triangular zigzag cluster is very small compared to other clusters, and (b) the highest occupied molecular orbital is localized at the edges, in contrast to other clusters, where it is distributed over the cluster surface. The enhanced reactivity and the energy gap controllable by shape and edge termination make graphene quantum dots ideal nanodevices for various applications such as sensors. The infrared spectra of the different flakes are presented for confirmation and detection of the obtained results.
Submitted 9 December, 2017;
originally announced December 2017.
-
Operation of the Intensity Monitors in Beam Transport Lines at Fermilab During Run II
Authors:
J. Crisp,
B. Fellenz,
J. Fitzgerald,
D. Heikkinen,
M. A. Ibrahim
Abstract:
The intensity of charged particle beams at Fermilab must be kept within predetermined safety and operational envelopes, in part by ensuring that all beam, to within a few percent, is transported from any source to its destination. Beam intensity monitors with toroidal pickups provide these beam intensity measurements in the transport lines between accelerators at FNAL. During Run II, much effort was made to continually improve the resolution and accuracy of the system.
Submitted 21 September, 2012;
originally announced September 2012.
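The transport criterion in this abstract amounts to comparing the intensities measured by monitors at the two ends of a transport line. A minimal Python sketch follows. It assumes hypothetical function names, readings in arbitrary but consistent units, and a 3% tolerance standing in for "a few percent". The actual Run II limits are not given in the abstract.

```python
# Hypothetical transport-efficiency check between upstream and downstream toroid readings.
# The 3% tolerance is an illustrative stand-in for "a few percent".


def transport_efficiency(upstream: float, downstream: float) -> float:
    """Fraction of the upstream intensity that arrives downstream."""
    if upstream <= 0:
        raise ValueError("upstream intensity must be positive")
    return downstream / upstream


def within_envelope(upstream: float, downstream: float, tolerance: float = 0.03) -> bool:
    """True if the fractional loss between the two monitors is within tolerance."""
    return 1.0 - transport_efficiency(upstream, downstream) <= tolerance


print(within_envelope(1.0e12, 0.98e12))  # 2% loss -> True
print(within_envelope(1.0e12, 0.95e12))  # 5% loss -> False
```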