-
OFHE: An Electro-Optical Accelerator for Discretized TFHE
Authors:
Mengxin Zheng,
Cheng Chu,
Qian Lou,
Nathan Youngblood,
Mo Li,
Sajjad Moazeni,
Lei Jiang
Abstract:
This paper presents OFHE, an electro-optical accelerator designed to process Discretized TFHE (DTFHE) operations, which encrypt multi-bit messages and support homomorphic multiplications, lookup table operations, and full-domain functional bootstrapping. While DTFHE is more efficient and versatile than other fully homomorphic encryption schemes, it requires 32-, 64-, and 128-bit polynomial multiplications, which can be time-consuming. Existing TFHE accelerators cannot easily be upgraded to support DTFHE operations because of limited datapaths, a lack of datapath bit-width reconfigurability, and power inefficiencies when processing FFT and inverse FFT (IFFT) kernels. OFHE addresses these challenges: compared to prior TFHE accelerators, it improves DTFHE operation latency by 8.7%, DTFHE operation throughput by 57%, and DTFHE operation throughput per Watt by 94%.
Submitted 19 May, 2024;
originally announced May 2024.
-
Next-generation Co-Packaged Optics for Future Disaggregated AI Systems
Authors:
Sajjad Moazeni
Abstract:
Co-packaged optics is poised to solve the interconnect bandwidth bottleneck for GPUs and AI accelerators in the near future. This technology can immediately boost today's AI/ML compute power to train larger neural networks that can perform more complex tasks. More importantly, co-packaged optics unlocks new system-level opportunities to rethink our conventional supercomputing and datacenter architectures. Disaggregation of memory and compute units is one such new paradigm: it can greatly speed up AI/ML workloads by providing low-latency, high-throughput performance while maintaining the flexibility to support conventional cloud computing applications as well. This paper gives a brief overview of the state of the art in co-packaged optical I/O and the requirements for its next generations. We also discuss ideas for exploiting co-packaged optics in disaggregated AI systems and possible future directions.
Submitted 3 March, 2023;
originally announced March 2023.
-
Scalable Coherent Optical Crossbar Architecture using PCM for AI Acceleration
Authors:
Daniel Sturm,
Sajjad Moazeni
Abstract:
Optical computing has recently been proposed as a new compute paradigm to meet the demands of future AI/ML workloads in datacenters and supercomputers. However, the implementations proposed so far suffer from a lack of scalability, large footprints, high power consumption, and incomplete system-level architectures, which prevent their integration into existing datacenter architectures for real-world applications. In this work, we present a truly scalable optical AI accelerator based on a crossbar architecture. We have considered all major roadblocks and addressed them in this design. Weights are stored on chip using phase change material (PCM) that can be monolithically integrated in silicon photonic processes. All electro-optical components and circuit blocks are modeled based on measured performance metrics in a 45 nm monolithic silicon photonic process, which can be co-packaged with advanced CPUs/GPUs and HBM memories. We also present system-level modeling and analysis of our chip's performance for ResNet-50 v1.5, considering all critical parameters, including memory size, array size, photonic losses, and the energy consumption of peripheral electronics. Both on-chip SRAM and off-chip DRAM energy overheads are considered in this modeling. We additionally address how a dual-core crossbar design can eliminate programming time overhead at practical SRAM block sizes and batch sizes. Our results show that the proposed 128 x 128 architecture can achieve inferences per second (IPS) similar to an Nvidia A100 GPU at 15.4 times lower power and 7.24 times lower area.
Submitted 19 October, 2022;
originally announced October 2022.