-
Solar Power Prediction Using Satellite Data in Different Parts of Nepal
Authors:
Raj Krishna Nepal,
Bibek Khanal,
Vibek Ghimire,
Kismat Neupane,
Atul Pokharel,
Kshitij Niraula,
Baburam Tiwari,
Nawaraj Bhattarai,
Khem N. Poudyal,
Nawaraj Karki,
Mohan B Dangi,
John Biden
Abstract:
Due to the unavailability of solar irradiance data for many potential sites of Nepal, the paper proposes predicting solar irradiance based on alternative meteorological parameters. The study focuses on five distinct regions in Nepal and utilizes a dataset spanning almost ten years, obtained from CERES SYN1deg and MERRA-2. Machine learning models such as Random Forest, XGBoost, K-Nearest Neighbors,…
▽ More
Due to the unavailability of solar irradiance data for many potential sites of Nepal, the paper proposes predicting solar irradiance based on alternative meteorological parameters. The study focuses on five distinct regions in Nepal and utilizes a dataset spanning almost ten years, obtained from CERES SYN1deg and MERRA-2. Machine learning models such as Random Forest, XGBoost, K-Nearest Neighbors, and deep learning models like LSTM and ANN-MLP are employed and evaluated for their performance. The results indicate high accuracy in predicting solar irradiance, with R-squared(R2) scores close to unity for both train and test datasets. The impact of parameter integration on model performance is analyzed, revealing the significance of various parameters in enhancing predictive accuracy. Each model demonstrates strong performance across all parameters, consistently achieving MAE values below 6, RMSE values under 10, MBE within |2|, and nearly unity R2 values. Upon removal of various solar parameters such as "Solar_Irradiance_Clear_Sky", "UVA", etc. from the datasets, the model's performance is significantly affected. This exclusion leads to considerable increases in MAE, reaching up to 82, RMSE up to 135, and MBE up to |7|. Among the models, KNN displays the weakest performance, with an R2 of 0.7582546. Conversely, ANN exhibits the strongest performance, boasting an R2 value of 0.9245877. Hence, the study concludes that Artificial Neural Network (ANN) performs exceptionally well, showcasing its versatility even under sparse data parameter conditions.
△ Less
Submitted 8 June, 2024;
originally announced June 2024.
-
In-Network Accumulation: Extending the Role of NoC for DNN Acceleration
Authors:
Binayak Tiwari,
Mei Yang,
Xiaohang Wang,
Yingtao Jiang
Abstract:
Network-on-Chip (NoC) plays a significant role in the performance of a DNN accelerator. The scalability and modular design property of the NoC help in improving the performance of a DNN execution by providing flexibility in running different kinds of workloads. Data movement in a DNN workload is still a challenging task for DNN accelerators and hence a novel approach is required. In this paper, we…
▽ More
Network-on-Chip (NoC) plays a significant role in the performance of a DNN accelerator. The scalability and modular design property of the NoC help in improving the performance of a DNN execution by providing flexibility in running different kinds of workloads. Data movement in a DNN workload is still a challenging task for DNN accelerators and hence a novel approach is required. In this paper, we propose the In-Network Accumulation (INA) method to further accelerate a DNN workload execution on a many-core spatial DNN accelerator for the Weight Stationary (WS) dataflow model. The INA method expands the router's function to support partial sum accumulation. This method avoids the overhead of injecting and ejecting an incoming partial sum to the local processing element. The simulation results on AlexNet, ResNet-50, and VGG-16 workloads show that the proposed INA method achieves 1.22x improvement in latency and 2.16x improvement in power consumption on the WS dataflow model.
△ Less
Submitted 20 September, 2022;
originally announced September 2022.
-
Data Streaming and Traffic Gathering in Mesh-based NoC for Deep Neural Network Acceleration
Authors:
Binayak Tiwari,
Mei Yang,
Xiaohang Wang,
Yingtao Jiang
Abstract:
The increasing popularity of deep neural network (DNN) applications demands high computing power and efficient hardware accelerator architecture. DNN accelerators use a large number of processing elements (PEs) and on-chip memory for storing weights and other parameters. As the communication backbone of a DNN accelerator, networks-on-chip (NoC) play an important role in supporting various dataflow…
▽ More
The increasing popularity of deep neural network (DNN) applications demands high computing power and efficient hardware accelerator architecture. DNN accelerators use a large number of processing elements (PEs) and on-chip memory for storing weights and other parameters. As the communication backbone of a DNN accelerator, networks-on-chip (NoC) play an important role in supporting various dataflow patterns and enabling processing with communication parallelism in a DNN accelerator. However, the widely used mesh-based NoC architectures inherently cannot support the efficient one-to-many and many-to-one traffic largely existing in DNN workloads. In this paper, we propose a modified mesh architecture with a one-way/two-way streaming bus to speedup one-to-many (multicast) traffic, and the use of gather packets to support many-to-one (gather) traffic. The analysis of the runtime latency of a convolutional layer shows that the two-way streaming architecture achieves better improvement than the one-way streaming architecture for an Output Stationary (OS) dataflow architecture. The simulation results demonstrate that the gather packets can help to reduce the runtime latency up to 1.8 times and network power consumption up to 1.7 times, compared with the repetitive unicast method on modified mesh architectures supporting two-way streaming.
△ Less
Submitted 1 August, 2021;
originally announced August 2021.
-
Improving the Performance of a NoC-based CNN Accelerator with Gather Support
Authors:
Binayak Tiwari,
Mei Yang,
Xiaohang Wang,
Yingtao Jiang,
Venkatesan Muthukumar
Abstract:
The increasing application of deep learning technology drives the need for an efficient parallel computing architecture for Convolutional Neural Networks (CNNs). A significant challenge faced when designing a many-core CNN accelerator is to handle the data movement between the processing elements. The CNN workload introduces many-to-one traffic in addition to one-to-one and one-to-many traffic. As…
▽ More
The increasing application of deep learning technology drives the need for an efficient parallel computing architecture for Convolutional Neural Networks (CNNs). A significant challenge faced when designing a many-core CNN accelerator is to handle the data movement between the processing elements. The CNN workload introduces many-to-one traffic in addition to one-to-one and one-to-many traffic. As the de-facto standard for on-chip communication, Network-on-Chip (NoC) can support various unicast and multicast traffic. For many-to-one traffic, repetitive unicast is employed which is not an efficient way. In this paper, we propose to use the gather packet on mesh-based NoCs employing output stationary systolic array in support of many-to-one traffic. The gather packet will collect the data from the intermediate nodes eventually leading to the destination efficiently. This method is evaluated using the traffic traces generated from the convolution layer of AlexNet and VGG-16 with improvement in the latency and power than the repetitive unicast method.
△ Less
Submitted 1 August, 2021;
originally announced August 2021.
-
Efficient On-Chip Multicast Routing based on Dynamic Partition Merging
Authors:
Binayak Tiwari,
Mei Yang,
Yingtao Jiang,
Xiaohang Wang
Abstract:
Networks-on-chips (NoCs) have become the mainstream communication infrastructure for chip multiprocessors (CMPs) and many-core systems. The commonly used parallel applications and emerging machine learning-based applications involve a significant amount of collective communication patterns. In CMP applications, multicast is widely used in multithreaded programs and protocols for barrier/clock sync…
▽ More
Networks-on-chips (NoCs) have become the mainstream communication infrastructure for chip multiprocessors (CMPs) and many-core systems. The commonly used parallel applications and emerging machine learning-based applications involve a significant amount of collective communication patterns. In CMP applications, multicast is widely used in multithreaded programs and protocols for barrier/clock synchronization and cache coherence. Multicast routing plays an important role on the system performance of a CMP. Existing partition-based multicast routing algorithms all use static destination set partition strategy which lacks the global view of path optimization. In this paper, we propose an efficient Dynamic Partition Merging (DPM)-based multicast routing algorithm. The proposed algorithm divides the multicast destination set into partitions dynamically by comparing the routing cost of different partition merging options and selecting the merged partitions with lower cost. The simulation results of synthetic traffic and PARSEC benchmark applications confirm that the proposed algorithm outperforms the existing path-based routing algorithms. The proposed algorithm is able to improve up to 23\% in average packet latency and 14\% in power consumption against the existing multipath routing algorithm when tested in PARSEC benchmark workloads.
△ Less
Submitted 1 August, 2021;
originally announced August 2021.
-
eSampling: Energy Harvesting ADCs
Authors:
Neha Jain,
Nir Shlezinger,
Bhawna Tiwari,
Yonina C. Eldar,
Anubha Gupta,
Vivek Ashok Bohara,
Pydi Ganga Bahubalindruni
Abstract:
Analog-to-digital converters (ADCs) allow physical signals to be processed using digital hardware. The power consumed in conversion grows with the sampling rate and quantization resolution, imposing a major challenge in power-limited systems. A common ADC architecture is based on sample-and-hold (S/H) circuits, where the analog signal is being tracked only for a fraction of the sampling period. In…
▽ More
Analog-to-digital converters (ADCs) allow physical signals to be processed using digital hardware. The power consumed in conversion grows with the sampling rate and quantization resolution, imposing a major challenge in power-limited systems. A common ADC architecture is based on sample-and-hold (S/H) circuits, where the analog signal is being tracked only for a fraction of the sampling period. In this paper, we propose the concept of eSampling ADCs, which harvest energy from the analog signal during the time periods where the signal is not being tracked. This harvested energy can be used to supplement the ADC itself, paving the way to the possibility of zero-power consumption and power-saving ADCs. We analyze the tradeoff between the ability to recover the sampled signal and the energy harvested, and provide guidelines for setting the sampling rate in the light of accuracy and energy constraints. Our analysis indicates that eSampling ADCs operating with up to 12 bits per sample can acquire bandlimited analog signals such that they can be perfectly recovered without requiring power from the external source. Furthermore, our theoretical results reveal that eSampling ADCs can in fact save power by harvesting more energy than they consume. To verify the feasibility of eSampling ADCs, we present a circuit-level design using standard complementary metal oxide semiconductor (CMOS) 65 nm technology. An eSampling 8-bit ADC which samples at 40 MHZ is designed on a Cadence Virtuoso platform. Our experimental study involving Nyquist rate sampling of bandlimited signals demonstrates that such ADCs are indeed capable of harvesting more energy than that spent during analog-to-digital conversion, without affecting the accuracy.
△ Less
Submitted 16 July, 2020;
originally announced July 2020.