An Analysis of Various Design Pathways Towards Multi-Terabit Photonic On-Interposer Interconnects

Authors: Venkata Sai Praneeth Karempudi, Janibul Bashir, Ishan G Thakkar

Abstract: In the wake of dwindling Moore's Law, to address the rapidly increasing complexity and cost of fabricating large-scale, monolithic systems-on-chip (SoCs), the industry has adopted dis-aggregation as a solution, wherein a large monolithic SoC is partitioned into multiple smaller chiplets that are then assembled into a large system-in-package (SiP) using advanced packaging substrates such as silicon interposer. For such interposer-based SiPs, there is a push to realize on-interposer inter-chiplet communication bandwidth of multi-Tb/s and end-to-end communication latency of no more than 10ns. This push comes as the natural progression from some recent prior works on SiP design, and is driven by the proliferating bandwidth demand of modern data-intensive workloads. To meet this bandwidth and latency goal, prior works have focused on a potential solution of using the silicon photonic interposer (SiPhI) for integrating and interconnecting a large number of chiplets into an SiP. Despite the early promise, the existing designs of on-SiPhI interconnects still have to evolve by leaps and bounds to meet the goal of multi-Tb/s bandwidth. However, the possible design pathways, upon which such an evolution can be achieved, have not been explored in any prior works yet. In this paper, we have identified several design pathways that can help evolve on-SiPhI interconnects to achieve multi-Tb/s aggregate bandwidth. We perform an extensive link-level and system-level analysis in which we explore these design pathways in isolation and in different combinations of each other. From our link-level analysis, we have observed that the design pathways that simultaneously enhance the spectral range and optical power budget available for wavelength multiplexing can render aggregate bandwidth of up to 4Tb/s per on-SiPhI link.

Submitted 12 June, 2023; originally announced June 2023.

Comments: Under review (ACM JETC)

arXiv:2306.07238 [pdf, other]

A Silicon Nitride Microring Modulator for High-Performance Photonic Integrated Circuits

Authors: Venkata Sai Praneeth Karempudi, Ishan G Thakkar, Jeffrey Todd Hastings

Abstract: The use of the Silicon-on-Insulator (SOI) platform has been prominent for realizing CMOS-compatible, high-performance photonic integrated circuits (PICs). But in recent years, the silicon-nitride-on-silicon-dioxide (SiN-on-SiO$_2$) platform has garnered increasing interest as an alternative, because of its several beneficial properties over the SOI platform, such as low optical losses, high thermo-optic stability, broader wavelength transparency range, and high tolerance to fabrication-process variations. However, SiN-on-SiO$_2$ based active devices, such as modulators, are scarce and lack in desired performance due to the absence of free-carrier-based activity in the SiN material and the complexity of integrating other active materials with SiN-on-SiO$_2$ platform. This shortcoming hinders the SiN-on-SiO$_2$ platform for realizing active PICs. To address this shortcoming, in this article, we demonstrate a SiN-on-SiO$_2$ microring resonator (MRR) based active modulator. Our designed MRR modulator employs an Indium-Tin-Oxide (ITO)-SiO$_2$-ITO thin-film stack as the active upper cladding and leverages the free-carrier assisted, high-amplitude refractive index change in the ITO films to effect a large electro-refractive optical modulation in the device. Based on the electrostatic, transient, and finite difference time domain (FDTD) simulations, conducted using photonics foundry-validated tools, we show that our modulator achieves 450 pm/V resonance modulation efficiency, $\sim$46.2 GHz 3-dB modulation bandwidth, 18 nm free-spectral range (FSR), 0.24 dB insertion loss, and 8.2 dB extinction ratio for optical on-off-keying (OOK) modulation at 30 Gb/s.

Submitted 12 June, 2023; originally announced June 2023.

Comments: arXiv admin note: substantial text overlap with arXiv:2212.06326

arXiv:2207.05278 [pdf, other]

Photonic Reconfigurable Accelerators for Efficient Inference of CNNs with Mixed-Sized Tensors

Authors: Sairam Sri Vatsavai, Ishan G Thakkar

Abstract: Photonic Microring Resonator (MRR) based hardware accelerators have been shown to provide disruptive speedup and energy-efficiency improvements for processing deep Convolutional Neural Networks (CNNs). However, previous MRR-based CNN accelerators fail to provide efficient adaptability for CNNs with mixed-sized tensors. One example of such CNNs is depthwise separable CNNs. Performing inferences of CNNs with mixed-sized tensors on such inflexible accelerators often leads to low hardware utilization, which diminishes the achievable performance and energy efficiency from the accelerators. In this paper, we present a novel way of introducing reconfigurability in the MRR-based CNN accelerators, to enable dynamic maximization of the size compatibility between the accelerator hardware components and the CNN tensors that are processed using the hardware components. We classify the state-of-the-art MRR-based CNN accelerators from prior works into two categories, based on the layout and relative placements of the utilized hardware components in the accelerators. We then use our method to introduce reconfigurability in accelerators from these two classes, to consequently improve their parallelism, the flexibility of efficiently mapping tensors of different sizes, speed, and overall energy efficiency. We evaluate our reconfigurable accelerators against three prior works for the area proportionate outlook (equal hardware area for all accelerators). Our evaluation for the inference of four modern CNNs indicates that our designed reconfigurable CNN accelerators provide improvements of up to 1.8x in Frames-Per-Second (FPS) and up to 1.5x in FPS/W, compared to an MRR-based accelerator from prior work.

Submitted 11 July, 2022; originally announced July 2022.

Comments: Paper accepted at CASES (ESWEEK) 2022

arXiv:2110.06105 [pdf]

Photonic Networks-on-Chip Employing Multilevel Signaling: A Cross-Layer Comparative Study

Authors: Venkata Sai Praneeth Karempudi, Febin Sunny, Ishan G Thakkar, Sai Vineel Reddy Chittamuru, Mahdi Nikdast, Sudeep Pasricha

Abstract: Photonic network-on-chip (PNoC) architectures employ photonic links with dense wavelength-division multiplexing (DWDM) to enable high throughput on-chip transfers. Unfortunately, increasing the DWDM degree (i.e., using a larger number of wavelengths) to achieve higher aggregated datarate in photonic links, and hence higher throughput in PNoCs, requires sophisticated and costly laser sources along with extra photonic hardware. This extra hardware can introduce undesired noise to the photonic link and increase the bit-error-rate (BER), power, and area consumption of PNoCs. To mitigate these issues, the use of 4-pulse amplitude modulation (4-PAM) signaling, instead of the conventional on-off keying (OOK) signaling, can halve the wavelength signals utilized in photonic links for achieving the target aggregate datarate while reducing the overhead of crosstalk noise, BER, and photonic hardware. There are various designs of 4-PAM modulators reported in the literature. For example, the signal superposition (SS), electrical digital-to-analog converter (EDAC), and optical digital-to-analog converter (ODAC) based designs of 4-PAM modulators have been reported. However, it is yet to be explored how these SS, EDAC, and ODAC based 4-PAM modulators can be utilized to design DWDM-based photonic links and PNoC architectures. In this paper, we provide an extensive link-level and system-level analysis of the SS, EDAC, and ODAC types of 4-PAM modulators from prior work with regards to their applicability and utilization overheads. From our link-level and PNoC-level evaluation, we have observed that the 4-PAM EDAC based variants of photonic links and PNoCs exhibit better performance and energy-efficiency compared to the OOK, 4-PAM SS, and 4-PAM ODAC based links and PNoCs.

Submitted 12 October, 2021; originally announced October 2021.

Comments: Submitted and Accepted to publish in ACM Journal on Emerging Technologies in Computing Systems
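
As a rough illustration of the wavelength-halving argument in the abstract above, the following Python sketch counts the wavelengths a link would need to reach a target aggregate data rate under OOK (1 bit per symbol) and 4-PAM (2 bits per symbol). The function name `wavelengths_needed`, the 640 Gb/s target, and the 10 GBaud per-wavelength symbol rate are illustrative assumptions, not values taken from the paper.

```python
import math

def wavelengths_needed(target_gbps, symbol_rate_gbaud, bits_per_symbol):
    """Number of wavelengths required to reach the target aggregate data rate.

    Each wavelength carries symbol_rate_gbaud * bits_per_symbol Gb/s.
    """
    per_wavelength_gbps = symbol_rate_gbaud * bits_per_symbol
    return math.ceil(target_gbps / per_wavelength_gbps)

# Hypothetical link parameters (assumptions for illustration only).
TARGET_GBPS = 640
SYMBOL_RATE_GBAUD = 10

ook_count = wavelengths_needed(TARGET_GBPS, SYMBOL_RATE_GBAUD, bits_per_symbol=1)
pam4_count = wavelengths_needed(TARGET_GBPS, SYMBOL_RATE_GBAUD, bits_per_symbol=2)

print(f"OOK:   {ook_count} wavelengths")
print(f"4-PAM: {pam4_count} wavelengths")
```

Under these assumed numbers the 4-PAM link needs half as many wavelengths as the OOK link at the same symbol rate. The sketch ignores the power, noise, and hardware overheads that the paper evaluates.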

arXiv:2105.12781 [pdf]

ATRIA: A Bit-Parallel Stochastic Arithmetic Based Accelerator for In-DRAM CNN Processing

Authors: Supreeth Mysore Shivanandamurthy, Ishan. G. Thakkar, Sayed Ahmad Salehi

Abstract: With the rapidly growing use of Convolutional Neural Networks (CNNs) in real-world applications related to machine learning and Artificial Intelligence (AI), several hardware accelerator designs for CNN inference and training have been proposed recently. In this paper, we present ATRIA, a novel bit-pArallel sTochastic aRithmetic based In-DRAM Accelerator for energy-efficient and high-speed inference of CNNs. ATRIA employs light-weight modifications in DRAM cell arrays to implement bit-parallel stochastic arithmetic based acceleration of multiply-accumulate (MAC) operations inside DRAM. ATRIA significantly improves the latency, throughput, and efficiency of processing CNN inferences by performing 16 MAC operations in only five consecutive memory operation cycles. We mapped the inference tasks of four benchmark CNNs on ATRIA to compare its performance with five state-of-the-art in-DRAM CNN accelerators from prior work. The results of our analysis show that ATRIA exhibits only 3.5% drop in CNN inference accuracy and still achieves improvements of up to 3.2x in frames-per-second (FPS) and up to 10x in efficiency (FPS/W/mm2), compared to the best-performing in-DRAM accelerator from prior work.

Submitted 26 May, 2021; originally announced May 2021.

Comments: Preprint accepted in ISVLSI 2021
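
For readers unfamiliar with stochastic arithmetic, the sketch below shows the generic textbook idea that the abstract above builds on: a value in [0, 1] is encoded as a bitstream whose fraction of ones equals the value, and two independent streams are multiplied with a bitwise AND. This is not ATRIA's in-DRAM, bit-parallel implementation. The function names, stream length, and seed are assumptions chosen for illustration.

```python
import random

def to_stream(p, length, rng):
    """Encode a probability p in [0, 1] as a unipolar stochastic bitstream."""
    return [1 if rng.random() < p else 0 for _ in range(length)]

def stochastic_multiply(a, b, length=4096, seed=0):
    """Approximate a * b by ANDing two independently generated bitstreams."""
    rng = random.Random(seed)  # fixed seed so the sketch is reproducible
    stream_a = to_stream(a, length, rng)
    stream_b = to_stream(b, length, rng)
    product_bits = [x & y for x, y in zip(stream_a, stream_b)]
    # The fraction of ones in the ANDed stream estimates the product.
    return sum(product_bits) / length

estimate = stochastic_multiply(0.5, 0.5)
print(f"estimate of 0.5 * 0.5: {estimate:.3f}")
```

The estimate is approximate by construction. Longer streams reduce the error at the cost of more bit operations, which is one reason stochastic accelerators typically trade a small accuracy drop for simpler hardware.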

arXiv:2103.03953 [pdf]

ODIN: A Bit-Parallel Stochastic Arithmetic Based Accelerator for In-Situ Neural Network Processing in Phase Change RAM

Authors: Supreeth Mysore Shivanandamurthy, Ishan. G. Thakkar, Sayed Ahmad Salehi

Abstract: Due to the very rapidly growing use of Artificial Neural Networks (ANNs) in real-world applications related to machine learning and Artificial Intelligence (AI), several hardware accelerator designs for ANNs have been proposed recently. In this paper, we present a novel processing-in-memory (PIM) engine called ODIN that employs hybrid binary-stochastic bit-parallel arithmetic inside phase change RAM (PCRAM) to enable a low-overhead in-situ acceleration of all essential ANN functions such as multiply-accumulate (MAC), nonlinear activation, and pooling. We mapped four ANN benchmark applications on ODIN to compare its performance with a conventional processor-centric design and a crossbar-based in-situ ANN accelerator from prior work. The results of our analysis for the considered ANN topologies indicate that our ODIN accelerator can be at least 5.8x faster and 23.2x more energy-efficient, and up to 90.8x faster and 1554x more energy-efficient, compared to the crossbar-based in-situ ANN accelerator from prior work.

Submitted 5 March, 2021; originally announced March 2021.

Comments: 6 pages, 6 Figures, 4 Tables

arXiv:2008.11367 [pdf]

Mitigating the Latency-Area Tradeoffs for DRAM Design with Coarse-Grained Monolithic 3D (M3D) Integration

Authors: Chao-Hsuan Huang, Ishan G Thakkar

Abstract: Over the years, the DRAM latency has not scaled proportionally with its density due to the cost-centric mindset of the DRAM industry. Prior work has shown that this shortcoming can be overcome by reducing the critical length of DRAM access path. However, doing so decreases DRAM area-efficiency, exacerbating the latency-area tradeoffs for DRAM design. In this paper, we show that reorganizing DRAM cell-arrays using the emerging monolithic 3D (M3D) integration technology can mitigate these fundamental latency-area tradeoffs. Based on our evaluation results for PARSEC benchmarks, our designed M3D DRAM cell-array organizations can yield up to 9.56% less latency, up to 4.96% less power consumption, and up to 21.21% less energy-delay product (EDP), with up to 14% less DRAM die area, compared to the conventional 2D DDR4 DRAM.

Submitted 25 August, 2020; originally announced August 2020.

Comments: Accepted in ICCD 2020

arXiv:2007.10454 [pdf]

Exploiting Process Variations to Secure Photonic NoC Architectures from Snooping Attacks

Authors: Sai Vineel Reddy Chittamuru, Ishan G Thakkar, Sudeep Pasricha, Sairam Sri Vatsavai, Varun Bhat

Abstract: The compact size and high wavelength-selectivity of microring resonators (MRs) enable photonic networks-on-chip (PNoCs) to utilize dense-wavelength-division-multiplexing (DWDM) in their photonic waveguides, and as a result, attain high bandwidth on-chip data transfers. Unfortunately, a Hardware Trojan in a PNoC can manipulate the electrical driving circuit of its MRs to cause the MRs to snoop data from the neighboring wavelength channels in a shared photonic waveguide, which introduces a serious security threat. This paper presents a framework that utilizes process variation-based authentication signatures along with architecture-level enhancements to protect against data-snooping Hardware Trojans during unicast as well as multicast transfers in PNoCs. Evaluation results indicate that our framework can improve hardware security across various PNoC architectures with minimal overheads of up to 14.2% in average latency and of up to 14.6% in energy-delay-product (EDP).

Submitted 20 July, 2020; originally announced July 2020.

Comments: Pre-Print: Accepted in IEEE TCAD Journal on July 16, 2020

Showing 1–8 of 8 results for author: Thakkar, I G