65 GOPS/neuron Photonic Tensor Core with Thin-film Lithium Niobate Photonics
Authors:
Zhongjin Lin,
Bhavin J. Shastri,
Shangxuan Yu,
Jingxiang Song,
Yuntao Zhu,
Arman Safarnejadian,
Wangning Cai,
Yanmei Lin,
Wei Ke,
Mustafa Hammood,
Tianye Wang,
Mengyue Xu,
Zibo Zheng,
Mohammed Al-Qadasi,
Omid Esmaeeli,
Mohamed Rahim,
Grzegorz Pakulski,
Jens Schmid,
Pedro Barrios,
Weihong Jiang,
Hugh Morison,
Matthew Mitchell,
Xiaogang Qiang,
Xun Guan,
Nicolas A. F. Jaeger,
et al. (6 additional authors not shown)
Abstract:
Photonics offers a transformative approach to artificial intelligence (AI) and neuromorphic computing, providing low-latency, high-bandwidth, and energy-efficient computation. Here, we introduce a photonic tensor core processor enabled by time-multiplexed inputs and charge-integrated outputs. This fully integrated processor, comprising only two thin-film lithium niobate (TFLN) modulators, a III-V laser, and a charge-integration photoreceiver, can implement an entire layer of a neural network. It can execute 65 billion operations per second (GOPS) per neuron, including simultaneous weight updates, a speed not previously achieved. Unlike conventional photonic processors, whose weights are fixed once set during training, our processor supports fast "hardware-in-the-loop" training and can dynamically adjust the number of inputs (fan-in) and outputs (fan-out) within a layer, making it more versatile. It can perform large-scale dot-product operations with vector dimensions of up to 131,072. Furthermore, after "hardware-in-the-loop" training, it successfully classifies (supervised learning) and clusters (unsupervised learning) 112×112-pixel images. To support "hardware-in-the-loop" training for clustering tasks, we also provide a scheme, based on our processor, for multiplying two negative numbers.
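The idea of time-multiplexed inputs with charge-integrated outputs can be illustrated with a minimal, idealised numerical sketch. It assumes that one modulator encodes an input element and the other encodes the matching weight in the same time slot, so the optical power in each slot is proportional to their product, and that the photoreceiver sums these products by accumulating charge over all slots. Intensity encoding only represents non-negative values, so the sketch handles signs with a generic positive/negative split (four non-negative passes). That decomposition is a common textbook technique assumed here for illustration, not necessarily the authors' scheme for multiplying two negative numbers. The function names `tdm_dot` and `signed_dot` are hypothetical, and noise, modulator nonlinearity, and receiver bandwidth are ignored.

```python
import numpy as np


def tdm_dot(x, w):
    """Idealised time-multiplexed dot product with charge integration.

    Assumption: x[k] and w[k] are intensity-encoded in time slot k, so both
    must be non-negative and normalised to [0, 1].
    """
    x = np.asarray(x, dtype=float)
    w = np.asarray(w, dtype=float)
    if x.ndim != 1 or x.shape != w.shape:
        raise ValueError("x and w must be 1-D arrays of equal length")
    if np.any((x < 0) | (x > 1)) or np.any((w < 0) | (w > 1)):
        raise ValueError("intensity-encoded values must lie in [0, 1]")
    # Cascaded modulators: optical power in slot k is proportional to x[k] * w[k].
    slot_power = x * w
    # Charge-integrating receiver: total charge is the sum over all slots.
    return float(slot_power.sum())


def signed_dot(x, w):
    """Signed dot product built from four non-negative passes (assumed scheme).

    Values are split as x = x_pos - x_neg and w = w_pos - w_neg, with all
    parts in [0, 1], so inputs must lie in [-1, 1].
    """
    x = np.asarray(x, dtype=float)
    w = np.asarray(w, dtype=float)
    x_pos, x_neg = np.clip(x, 0, None), np.clip(-x, 0, None)
    w_pos, w_neg = np.clip(w, 0, None), np.clip(-w, 0, None)
    # The x_neg * w_neg pass is where two negative numbers yield a positive product.
    return (tdm_dot(x_pos, w_pos) + tdm_dot(x_neg, w_neg)
            - tdm_dot(x_pos, w_neg) - tdm_dot(x_neg, w_pos))


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    n = 131_072  # the largest vector dimension quoted in the abstract
    x = rng.uniform(-1, 1, n)
    w = rng.uniform(-1, 1, n)
    print(np.isclose(signed_dot(x, w), np.dot(x, w)))
```

In this idealised model, the accumulated charge matches an ordinary dot product for any vector length, which is why time multiplexing lets a single modulator pair and receiver scale to long vectors; the cost is that latency grows with the number of time slots.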
Submitted 30 November, 2023; v1 submitted 28 November, 2023;
originally announced November 2023.