-
Vertical Power Delivery for Emerging Packaging and Integration Platforms -- Power Conversion and Distribution
Authors:
Sriharini Krishnakumar,
Dr. Inna Partin-Vaisband
Abstract:
Efficient delivery of current from the PCB to the point of load (POL) is a primary concern in modern high-power, high-density integrated systems. Traditionally, a 48 V power signal is converted to the low POL voltage at the board and/or package level. As interconnect has become the dominant power-loss component, minimizing the voltage drop across the laterally routed portions of the board-to-die interconnect (referred to as horizontal interconnect) is a promising approach to enhancing the efficiency of the power delivery system. Delivering lower current vertically at a higher voltage should therefore be considered. High-power conversion near the POL, however, results in higher switching and inductor losses, creating an undesirable power-efficiency tradeoff. To address this problem, four vertical power delivery architectures are proposed in this paper, considering state-of-the-art power converter topologies, integration levels, and voltage conversion schemes. Embedding silicon (Si) and gallium nitride (GaN) power devices and inductors on top of and/or within the interposer is investigated. Integrating GaN power devices on a dedicated power die is also discussed. Various multi-stage 48 V-to-1 V power conversion schemes are examined, and state-of-the-art power conversion circuits are reviewed. Power delivery characteristics of these architectures are determined for a high-power (1 kW), high-current-density (2 A/mm²) system.
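The case for delivering power vertically at a higher voltage follows from Ohm's law. For a fixed load power P, the current is I = P/V, so the conduction loss I²R in a given stretch of horizontal interconnect scales as 1/V². The short Python calculation below illustrates this for the 1 kW, 2 A/mm² system quoted in the abstract. It is a back-of-envelope sketch only: the lateral resistance `R_LATERAL_OHM` is a hypothetical placeholder, not a value from the paper, and converter switching and inductor losses (the other side of the tradeoff) are deliberately ignored.

```python
# Back-of-envelope I^2R comparison for horizontal interconnect.
# P_LOAD_W and J_A_PER_MM2 come from the abstract; R_LATERAL_OHM is a
# HYPOTHETICAL placeholder resistance, not a value reported in the paper.
P_LOAD_W = 1000.0       # 1 kW system
J_A_PER_MM2 = 2.0       # 2 A/mm^2 current density
R_LATERAL_OHM = 1e-4    # assumed 100 uOhm lateral path resistance


def current_a(power_w: float, voltage_v: float) -> float:
    """Current needed to deliver power_w at voltage_v (conversion losses ignored)."""
    return power_w / voltage_v


def conduction_loss_w(current: float, resistance_ohm: float) -> float:
    """Ohmic (I^2 R) loss in a resistive interconnect segment."""
    return current ** 2 * resistance_ohm


for v in (48.0, 12.0, 1.0):
    i = current_a(P_LOAD_W, v)
    loss = conduction_loss_w(i, R_LATERAL_OHM)
    print(f"{v:5.1f} V -> {i:8.2f} A, I^2R loss = {loss:8.3f} W")

# Area over which the 1 V current must be delivered at the stated density
print(f"Area at 1 V: {current_a(P_LOAD_W, 1.0) / J_A_PER_MM2:.0f} mm^2")
```

Whatever the actual resistance, moving the same lateral path from 1 V to 48 V distribution reduces its I²R term by a factor of 48² = 2304. At 1 V, the 1 kW load draws 1000 A, which at 2 A/mm² corresponds to about 500 mm².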
Submitted 18 September, 2023;
originally announced September 2023.
-
A Machine Learning Pipeline Stage for Adaptive Frequency Adjustment
Authors:
Arash Fouman Ajirlou,
Inna Partin-Vaisband
Abstract:
A machine learning (ML) design framework is proposed for adaptively adjusting clock frequency based on the propagation delay of individual instructions. A random forest model is trained to classify propagation delays in real time, utilizing the current operation type, current operands, and computation history as ML features. The trained model is implemented in Verilog as an additional pipeline stage within a baseline processor. The modified system is experimentally tested at the gate level in 45 nm CMOS technology, exhibiting a 70% speedup and a 30% energy reduction with coarse-grained ML classification. With finer-grained classification, a speedup of 89% is demonstrated with a 15.5% reduction in energy consumption.
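As a rough illustration of why per-instruction frequency adjustment can pay off, the Python sketch below assumes that the ML pipeline stage outputs a delay class for each instruction and that each class maps to a fixed clock period, with the longest period equal to a conventional worst-case clock. The class-to-period table `CLASS_PERIOD_NS`, the instruction trace, and the definition of speedup as baseline_time / adaptive_time - 1 are all hypothetical, not values or definitions from the paper. Mispredictions, which a real design must never allow to shorten the period below the true delay, are not modeled.

```python
from typing import Sequence

# HYPOTHETICAL delay classes -> clock periods (ns). Class 2 is the worst case.
CLASS_PERIOD_NS = {0: 1.0, 1: 1.5, 2: 2.0}
WORST_CASE_PERIOD_NS = max(CLASS_PERIOD_NS.values())


def adaptive_time_ns(predicted_classes: Sequence[int]) -> float:
    """Total time when each instruction is clocked at its predicted class period."""
    return sum(CLASS_PERIOD_NS[c] for c in predicted_classes)


def baseline_time_ns(n_instructions: int) -> float:
    """Total time with a fixed clock sized for the worst-case path."""
    return n_instructions * WORST_CASE_PERIOD_NS


def speedup(predicted_classes: Sequence[int]) -> float:
    """Assumed definition: baseline_time / adaptive_time - 1."""
    baseline = baseline_time_ns(len(predicted_classes))
    return baseline / adaptive_time_ns(predicted_classes) - 1.0


trace = [0, 0, 1, 2, 0, 1, 0, 0]  # hypothetical predicted classes
print(f"speedup: {speedup(trace):.1%}")
```

Finer-grained classes let more instructions run closer to their true delay, which is consistent with the larger speedup reported for finer granularities; the trade-off is a more complex classifier stage.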
Submitted 2 July, 2020;
originally announced July 2020.
-
A Unified Learning Platform for Dynamic Frequency Scaling in Pipelined Processors
Authors:
Arash Fouman Ajirlou,
Inna Partin-Vaisband
Abstract:
A machine learning (ML) design framework is proposed for dynamically adjusting clock frequency based on the propagation delay of individual instructions. A Random Forest model is trained to classify propagation delays in real time, utilizing the current operation type, current operands, and computation history as ML features. The trained model is implemented in Verilog as an additional pipeline stage within a baseline processor. The modified system is simulated at the gate level in 45 nm CMOS technology, exhibiting a speed-up of 68% and an energy reduction of 37% with coarse-grained ML classification. With finer-grained classification, a speed-up of 95% is demonstrated at an additional energy cost.
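The abstract names three groups of ML features: current operation type, current operands, and computation history. The Python sketch below shows one plausible way to flatten them into a fixed-length vector for a classifier such as a random forest. The opcode list, the 8-bit operand width, the two-instruction history depth, and the names `FeatureBuilder`, `one_hot`, and `operand_bits` are hypothetical choices for illustration, not the paper's encoding.

```python
from collections import deque
from typing import Deque, List, Optional

# HYPOTHETICAL opcode set and sizes; the paper's ISA, operand width,
# and history depth are not given in the abstract.
OPCODES = ["add", "sub", "and", "or", "xor", "sll", "srl", "mul"]
OPERAND_BITS = 8   # assumed operand width, kept small for illustration
HISTORY_LEN = 2    # assumed number of previous operations remembered


def one_hot(op: Optional[str]) -> List[int]:
    """One-hot opcode encoding; None (empty history slot) maps to all zeros."""
    return [1 if op == o else 0 for o in OPCODES]


def operand_bits(value: int, width: int = OPERAND_BITS) -> List[int]:
    """Little-endian bit vector of an operand, truncated to `width` bits."""
    return [(value >> i) & 1 for i in range(width)]


class FeatureBuilder:
    """Builds one feature vector per instruction: current opcode, current
    operands, and the opcodes of the previous `history_len` instructions."""

    def __init__(self, history_len: int = HISTORY_LEN) -> None:
        self.history_len = history_len
        self.history: Deque[str] = deque(maxlen=history_len)

    def build(self, op: str, a: int, b: int) -> List[int]:
        feats = one_hot(op) + operand_bits(a) + operand_bits(b)
        # Oldest-first history; unfilled slots are padded with None (all zeros).
        slots: List[Optional[str]] = list(self.history)
        slots += [None] * (self.history_len - len(slots))
        for prev in slots:
            feats += one_hot(prev)
        self.history.append(op)  # update history only after encoding
        return feats


fb = FeatureBuilder()
first = fb.build("add", 3, 5)      # 8 + 8 + 8 + 2 * 8 = 40 features
second = fb.build("mul", 255, 1)   # history now holds "add"
```

A deque with `maxlen` keeps the history window bounded automatically, which mirrors the fixed amount of state a hardware pipeline stage could afford to keep.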
Submitted 12 June, 2020;
originally announced June 2020.
-
Progressive VAE Training on Highly Sparse and Imbalanced Data
Authors:
Dmitry Utyamishev,
Inna Partin-Vaisband
Abstract:
In this paper, we present a novel approach for training a Variational Autoencoder (VAE) on a highly imbalanced data set. The proposed training of a high-resolution VAE model begins with the training of a low-resolution core model, which can be successfully trained on an imbalanced data set. In subsequent training steps, new convolutional, upsampling, deconvolutional, and downsampling layers are iteratively attached to the model. In each iteration, the additional layers are trained based on the intermediate pretrained model, the result of the previous training iterations. Thus, the resolution of the model is progressively increased up to the required resolution level. In this paper, progressive VAE training is exploited for learning a latent representation with imbalanced, highly sparse data sets and, consequently, for generating routes in a constrained 2D space. Routing problems (e.g., the vehicle routing problem, the travelling salesman problem, and arc routing) are of special significance in many modern applications (e.g., route planning, network maintenance, developing high-performance nanoelectronic systems, and others) and are typically associated with sparse, imbalanced data. In this paper, the critical problem of routing billions of components in nanoelectronic devices is considered. The proposed approach exhibits a significant training speedup as compared with existing state-of-the-art VAE training methods, while generating the expected image outputs from unseen input data. Furthermore, the final progressive VAE models exhibit much more precise output representations than Generative Adversarial Network (GAN) models trained for a comparable time. The proposed method is expected to be applicable to a wide range of applications, including but not limited to image inpainting, sentence interpolation, and semi-supervised learning.
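The progressive scheme grows a trained low-resolution core by repeatedly attaching new layers. The Python sketch below only tracks the layer layout at each step, with no training, to make the growth pattern concrete. Two assumptions are not stated in the abstract and are labeled as such: resolution doubles per step, and new encoder layers (convolution plus downsampling) are placed at the input side while new decoder layers (upsampling plus deconvolution) are placed at the output side, so the previously trained layers stay in the middle. `Stage` and `progressive_schedule` are hypothetical names.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class Stage:
    """Model layout (layer names only) at one step of progressive training."""
    resolution: int
    encoder: List[str] = field(default_factory=list)
    decoder: List[str] = field(default_factory=list)


def progressive_schedule(core_res: int, target_res: int) -> List[Stage]:
    """Grow a core VAE layout until target_res is reached.

    ASSUMPTIONS: resolution doubles per step; new encoder layers are
    prepended (input side) and new decoder layers appended (output side).
    """
    if core_res <= 0 or target_res < core_res or target_res % core_res:
        raise ValueError("target_res must be a multiple of a positive core_res")
    ratio = target_res // core_res
    if ratio & (ratio - 1):
        raise ValueError("target_res must be core_res times a power of two")

    res = core_res
    encoder = [f"core_encoder@{res}"]
    decoder = [f"core_decoder@{res}"]
    stages = [Stage(res, list(encoder), list(decoder))]
    while res < target_res:
        res *= 2
        # Input side: convolution at the new resolution, then downsample to the old one
        encoder = [f"conv@{res}", f"downsample {res}->{res // 2}"] + encoder
        # Output side: upsample from the old resolution, then deconvolution
        decoder = decoder + [f"upsample {res // 2}->{res}", f"deconv@{res}"]
        stages.append(Stage(res, list(encoder), list(decoder)))
    return stages


for stage in progressive_schedule(32, 128):
    print(stage.resolution, stage.encoder, stage.decoder)
```

In an actual implementation, each new stage would be initialized with the weights of the previous stage's layers before training the newly attached ones, as the abstract describes.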
Submitted 17 December, 2019;
originally announced December 2019.
-
Exploiting Dual-Gate Ambipolar CNFETs for Scalable Machine Learning Classification
Authors:
Farid Kenarangi,
Xuan Hu,
Yihan Liu,
Jean Anne C. Incorvia,
Joseph S. Friedman,
Inna Partin-Vaisband
Abstract:
Ambipolar carbon-nanotube-based field-effect transistors (AP-CNFETs) exhibit unique electrical characteristics, such as tri-state operation and bidirectionality, enabling complex, reconfigurable computing systems. In this paper, AP-CNFETs are used to design a mixed-signal machine learning (ML) classifier. The classifier is designed in SPICE with a feature size of 15 nm and operates at 250 MHz. The system is demonstrated on the MNIST digit dataset, yielding 90% accuracy with no accuracy degradation as compared with the classification of this dataset in Python. The system also exhibits lower power consumption and a smaller physical size than state-of-the-art CMOS- and memristor-based mixed-signal classifiers.
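The abstract does not describe how tri-state operation is mapped onto the classifier. One plausible behavioral abstraction, offered purely as an assumption, is that each device's polarity setting selects a positive, negative, or zero contribution of its input to a summed column current, so each column computes a ternary-weighted dot product and the class with the largest current wins. The Python sketch below models only that idealized arithmetic. `STATE_TO_SIGN`, `column_current`, and `classify` are hypothetical names; device nonlinearity, the 15 nm SPICE models, and 250 MHz timing are not represented.

```python
from typing import Sequence

# HYPOTHETICAL mapping of the three device states to weight signs:
# n-type conduction adds current, p-type conduction subtracts it, off contributes nothing.
STATE_TO_SIGN = {"n": +1, "p": -1, "off": 0}


def column_current(inputs: Sequence[float], states: Sequence[str],
                   unit_current: float = 1.0) -> float:
    """Idealized summed current of one class column (ternary-weighted dot product)."""
    if len(inputs) != len(states):
        raise ValueError("inputs and states must have equal length")
    return sum(STATE_TO_SIGN[s] * x * unit_current for x, s in zip(inputs, states))


def classify(inputs: Sequence[float], state_matrix: Sequence[Sequence[str]]) -> int:
    """Winner-take-all over per-class column currents (first index wins ties)."""
    currents = [column_current(inputs, row) for row in state_matrix]
    return max(range(len(currents)), key=currents.__getitem__)


states = [["n", "off", "p"],   # class 0 column
          ["p", "n", "n"]]     # class 1 column
print(classify([1.0, 0.5, 0.0], states))
```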
Submitted 9 December, 2019;
originally announced December 2019.
-
A Single-MOSFET MAC for Confidence and Resolution (CORE) Driven Machine Learning Classification
Authors:
Farid Kenarangi,
Inna Partin-Vaisband
Abstract:
Mixed-signal machine learning classification has recently been demonstrated as an efficient alternative to classification with power-expensive digital circuits. In this paper, a high-COnfidence high-REsolution (CORE) mixed-signal classifier is proposed for classifying high-dimensional input data into a multi-class output space with less power and area than state-of-the-art classifiers. High-resolution multiplication is realized within a single MOSFET by feeding the features and feature weights into, respectively, the body and gate inputs. A high-resolution classifier that considers the confidence of the individual predictors is designed at the 45 nm technology node and operates at 100 MHz in the subthreshold region. To evaluate the performance of the classifier, a reduced MNIST dataset is generated by downsampling the MNIST digit images from 28 × 28 features to 9 × 9 features. The system is simulated across a wide range of PVT variations, exhibiting a nominal accuracy of 90%, an energy consumption of 6.2 pJ per classification (over 45 times lower than state-of-the-art classifiers), an area of 2,179 μm² (over 7.3 times lower than state-of-the-art classifiers), and a stable response under PVT variations.
Submitted 21 October, 2019;
originally announced October 2019.