-
Hypermultiplexed Integrated Tensor Optical Processor
Authors:
Shaoyuan Ou,
Alexander Sludds,
Ryan Hamerly,
Ke Zhang,
Hanke Feng,
Eric Zhong,
Cheng Wang,
Dirk Englund,
Mengjie Yu,
Zaijun Chen
Abstract:
The escalating data volume and complexity resulting from the rapid expansion of artificial intelligence (AI), internet of things (IoT) and 5G/6G mobile networks are creating an urgent need for energy-efficient, scalable computing hardware. Here we demonstrate a hypermultiplexed integrated tensor optical processor (HITOP) that performs trillions of operations per second (TeraOPS) at an energy cost of 25 femtojoules per operation (25 fJ/OP). Based on space-time-wavelength three-dimensional (3D) data streaming, HITOP is built with arrays of wafer-fabricated III/V-based micron-scale lasers (spanning ~1 THz) incorporating thin-film lithium niobate electro-optic (EO) photonics. Multidimensional parallelism allows matrix-matrix multiplications ($N^{3}$ operations) using $O(N)$ devices, facilitating scalable on-chip integration. With each device activating 10 billion parameters per second, the HITOP scalability is validated in machine learning models with 405,000 parameters, which is 25,000 times more than previous integrated optical systems. A combination of high clock rates (10 GS/s), parallel processing and real-time reprogrammability unlocks the full potential of light for next-generation AI accelerators in applications ranging from training with trillions of parameters, real-time decision making in autonomous vehicles and robotics, and dynamic optimization in smart manufacturing to complex simulation for climate modeling and drug discovery.
Submitted 13 February, 2024; v1 submitted 31 January, 2024;
originally announced January 2024.
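The two headline figures in the HITOP abstract (throughput and energy per operation) combine into a power figure by simple multiplication, and the $N^{3}$-operations-on-$O(N)$-devices claim implies that each device handles on the order of $N^{2}$ operations per matrix-matrix product. The sketch below only illustrates that arithmetic. The function names and the `devices_per_n` constant are hypothetical, and the 1 TeraOPS input is an assumed round number, not a measured value from the paper.

```python
def compute_power_watts(ops_per_second: float, joules_per_op: float) -> float:
    """Power drawn by computation alone: throughput times energy per operation."""
    return ops_per_second * joules_per_op


def ops_per_device(n: int, devices_per_n: int = 1) -> float:
    """Operations each device handles in one N x N by N x N product.

    Assumes N**3 operations shared evenly across devices_per_n * N devices
    (devices_per_n is a hypothetical constant standing in for the O(N) prefactor).
    """
    return n**3 / (devices_per_n * n)


# Assumed example: 1 TeraOPS at the abstract's 25 fJ/OP.
power = compute_power_watts(1e12, 25e-15)
print(f"{power * 1e3:.1f} mW per TeraOPS")
print(f"{ops_per_device(100):.0f} operations per device for N = 100")
```

Under these assumptions, the per-device workload grows as $N^{2}$ while the device count grows only as $N$, which is the scaling the abstract points to.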
-
Photonic crystal cavity IQ modulators in thin-film lithium niobate for coherent communications
Authors:
Hugo Larocque,
Dashiell L. P. Vitullo,
Alexander Sludds,
Hamed Sattari,
Ian Christen,
Gregory Choong,
Ivan Prieto,
Jacopo Leo,
Homa Zarebidaki,
Sanjaya Lohani,
Brian T. Kirby,
Öney O. Soykal,
Moe Soltani,
Amir H. Ghadimi,
Dirk Englund,
Mikkel Heuck
Abstract:
Thin-Film Lithium Niobate (TFLN) is an emerging integrated photonic platform showing great promise due to its large second-order nonlinearity at microwave and optical frequencies, cryogenic compatibility, large piezoelectric response, and low optical loss at visible and near-infrared wavelengths. These properties enabled Mach-Zehnder interferometer-based devices to demonstrate amplitude- and in-phase/quadrature (IQ) modulation at voltage levels compatible with complementary metal-oxide-semiconductor (CMOS) electronics. Maintaining low-voltage operation requires centimeter-scale device lengths, making it challenging to realize the large-scale circuits required by ever-increasing bandwidth demands in data communications. Reduced device sizes reaching the 10 um scale are possible with photonic crystal (PhC) cavities. So far, their operation has been limited to modulation of amplitudes and required circulators or lacked cascadability. Here, we demonstrate a compact IQ modulator using two PhC cavities operating as phase shifters in a Fabry-Perot-enhanced Michelson interferometer configuration. It supports cascadable amplitude and phase modulation at GHz bandwidths with CMOS-compatible voltages. While the bandwidth limitation of resonant devices is often considered detrimental, their compactness enables dense co-integration with CMOS electronics where clock-rate-level operation (few GHz) removes power-hungry electrical time-multiplexing. Recent demonstrations of chip-scale transceivers with dense-wavelength division multiplied transceivers could be monolithically implemented and driven toward ultimate information densities using TFLN electro-optic frequency combs and our PhC IQ modulators.
Submitted 27 December, 2023;
originally announced December 2023.
-
Single chip photonic deep neural network with accelerated training
Authors:
Saumil Bandyopadhyay,
Alexander Sludds,
Stefan Krastanov,
Ryan Hamerly,
Nicholas Harris,
Darius Bunandar,
Matthew Streshinsky,
Michael Hochberg,
Dirk Englund
Abstract:
As deep neural networks (DNNs) revolutionize machine learning, energy consumption and throughput are emerging as fundamental limitations of CMOS electronics. This has motivated a search for new hardware architectures optimized for artificial intelligence, such as electronic systolic arrays, memristor crossbar arrays, and optical accelerators. Optical systems can perform linear matrix operations at exceptionally high rate and efficiency, motivating recent demonstrations of low latency linear algebra and optical energy consumption below a photon per multiply-accumulate operation. However, demonstrating systems that co-integrate both linear and nonlinear processing units in a single chip remains a central challenge. Here we introduce such a system in a scalable photonic integrated circuit (PIC), enabled by several key advances: (i) high-bandwidth and low-power programmable nonlinear optical function units (NOFUs); (ii) coherent matrix multiplication units (CMXUs); and (iii) in situ training with optical acceleration. We experimentally demonstrate this fully-integrated coherent optical neural network (FICONN) architecture for a 3-layer DNN comprising 12 NOFUs and three CMXUs operating in the telecom C-band. Using in situ training on a vowel classification task, the FICONN achieves 92.7% accuracy on a test set, which is identical to the accuracy obtained on a digital computer with the same number of weights. This work lends experimental evidence to theoretical proposals for in situ training, unlocking orders of magnitude improvements in the throughput of training data. Moreover, the FICONN opens the path to inference at nanosecond latency and femtojoule per operation energy efficiency.
Submitted 2 August, 2022;
originally announced August 2022.
-
Deep Learning with Coherent VCSEL Neural Networks
Authors:
Zaijun Chen,
Alexander Sludds,
Ronald Davis,
Ian Christen,
Liane Bernstein,
Tobias Heuser,
Niels Heermeier,
James A. Lott,
Stephan Reitzenstein,
Ryan Hamerly,
Dirk Englund
Abstract:
Deep neural networks (DNNs) are reshaping the field of information processing. With their exponential growth challenging existing electronic hardware, optical neural networks (ONNs) are emerging to process DNN tasks in the optical domain with high clock rates, parallelism and low-loss data transmission. However, to explore the potential of ONNs, it is necessary to investigate the full-system performance incorporating the major DNN elements, including matrix algebra and nonlinear activation. Existing challenges to ONNs are high energy consumption due to low electro-optic (EO) conversion efficiency, low compute density due to large device footprint and channel crosstalk, and long latency due to the lack of inline nonlinearity. Here we experimentally demonstrate an ONN system that simultaneously overcomes all these challenges. We exploit neuron encoding with volume-manufactured micron-scale vertical-cavity surface-emitting laser (VCSEL) transmitter arrays that exhibit high EO conversion (<5 attojoule/symbol with $V_π$=4 mV), high operation bandwidth (up to 25 GS/s), and compact footprint (<0.01 mm$^2$ per device). Photoelectric multiplication allows low-energy matrix operations at the shot-noise quantum limit. Homodyne detection-based nonlinearity enables nonlinear activation with instantaneous response. The full-system energy efficiency and compute density reach 7 femtojoules per operation (fJ/OP) and 25 TeraOP/(mm$^2\cdot$ s), both representing a >100-fold improvement over state-of-the-art digital computers, with potentially several more orders of magnitude of future improvement. Beyond neural network inference, its feature of rapid weight updating is crucial for training deep learning models. Our technique opens an avenue to large-scale optoelectronic processors to accelerate machine learning tasks from data centers to decentralized edge devices.
Submitted 12 July, 2022;
originally announced July 2022.
-
Netcast: Low-Power Edge Computing with WDM-defined Optical Neural Networks
Authors:
Ryan Hamerly,
Alexander Sludds,
Saumil Bandyopadhyay,
Zaijun Chen,
Zhizhen Zhong,
Liane Bernstein,
Dirk Englund
Abstract:
This paper analyzes the performance and energy efficiency of Netcast, a recently proposed optical neural-network architecture designed for edge computing. Netcast performs deep neural network inference by dividing the computational task into two steps, which are split between the server and (edge) client: (1) the server employs a wavelength-multiplexed modulator array to encode the network's weights onto an optical signal in an analog time-frequency basis, and (2) the client obtains the desired matrix-vector product through modulation and time-integrated detection. The simultaneous use of wavelength multiplexing, broadband modulation, and integration detection allows large neural networks to be run at the client by effectively pushing the energy and memory requirements back to the server. The performance and energy efficiency are fundamentally limited by crosstalk and detector noise, respectively. We derive analytic expressions for these limits and perform numerical simulations to verify these bounds.
Submitted 4 July, 2022;
originally announced July 2022.
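The two-step Netcast scheme described in the abstract above (server-side weight encoding, client-side modulation and time-integrated detection) can be mimicked numerically. The sketch below is an idealized, noise-free model under an assumed mapping: each wavelength channel carries one row of the weight matrix, each time step carries one column, and the client applies a single broadband modulator to all wavelengths. The function name and this mapping are illustrative assumptions, not the authors' implementation, and the crosstalk and detector noise that the paper analyzes are ignored.

```python
def netcast_matvec(weights, x):
    """Idealized matrix-vector product via modulation and time integration.

    weights[m][k]: amplitude the server places on wavelength channel m at
                   time step k (assumed time-frequency encoding).
    x[k]:          client input applied at time step k by one broadband
                   modulator, so every wavelength sees the same x[k].
    """
    outputs = []
    for channel in weights:                 # wavelength demux: one detector per channel
        charge = 0.0
        for w_k, x_k in zip(channel, x):    # the modulator multiplies, the detector integrates
            charge += w_k * x_k
        outputs.append(charge)
    return outputs


# Two wavelength channels, two time steps.
print(netcast_matvec([[1, 2], [3, 4]], [5, 6]))
```

In this picture the client stores no weights at all: it only needs a modulator, a wavelength demultiplexer and integrating detectors, which is how the memory and energy burden shifts back to the server.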
-
Single-Shot Optical Neural Network
Authors:
Liane Bernstein,
Alexander Sludds,
Christopher Panuski,
Sivan Trajtenberg-Mills,
Ryan Hamerly,
Dirk Englund
Abstract:
As deep neural networks (DNNs) grow to solve increasingly complex problems, they are becoming limited by the latency and power consumption of existing digital processors. For improved speed and energy efficiency, specialized analog optical and electronic hardware has been proposed, however, with limited scalability (input vector length $K$ of hundreds of elements). Here, we present a scalable, single-shot-per-layer analog optical processor that uses free-space optics to reconfigurably distribute an input vector and integrated optoelectronics for static, updatable weighting and the nonlinearity -- with $K \approx 1,000$ and beyond. We experimentally test classification accuracy of the MNIST handwritten digit dataset, achieving 94.7% (ground truth 96.3%) without data preprocessing or retraining on the hardware. We also determine the fundamental upper bound on throughput ($\sim$0.9 exaMAC/s), set by the maximum optical bandwidth before significant increase in error. Our combination of wide spectral and spatial bandwidths in a CMOS-compatible system enables highly efficient computing for next-generation DNNs.
Submitted 22 June, 2022; v1 submitted 18 May, 2022;
originally announced May 2022.
-
Delocalized Photonic Deep Learning on the Internet's Edge
Authors:
Alexander Sludds,
Saumil Bandyopadhyay,
Zaijun Chen,
Zhizhen Zhong,
Jared Cochrane,
Liane Bernstein,
Darius Bunandar,
P. Ben Dixon,
Scott A. Hamilton,
Matthew Streshinsky,
Ari Novack,
Tom Baehr-Jones,
Michael Hochberg,
Manya Ghobadi,
Ryan Hamerly,
Dirk Englund
Abstract:
Advances in deep neural networks (DNNs) are transforming science and technology. However, the increasing computational demands of the most powerful DNNs limit deployment on low-power devices, such as smartphones and sensors -- and this trend is accelerated by the simultaneous move towards Internet-of-Things (IoT) devices. Numerous efforts are underway to lower power consumption, but a fundamental bottleneck remains due to energy consumption in matrix algebra, even for analog approaches including neuromorphic, analog memory and photonic meshes. Here we introduce and demonstrate a new approach that sharply reduces energy required for matrix algebra by doing away with weight memory access on edge devices, enabling orders of magnitude energy and latency reduction. At the core of our approach is a new concept that decentralizes the DNN for delocalized, optically accelerated matrix algebra on edge devices. Using a silicon photonic smart transceiver, we demonstrate experimentally that this scheme, termed Netcast, dramatically reduces energy consumption. We demonstrate operation in a photon-starved environment with 40 aJ/multiply of optical energy for 98.8% accurate image recognition and <1 photon/multiply using single photon detectors. Furthermore, we show realistic deployment of our system, classifying images with 3 THz of bandwidth over 86 km of deployed optical fiber in a Boston-area fiber network. Our approach enables computing on a new generation of edge devices with speeds comparable to modern digital electronics and power consumption that is orders of magnitude lower.
Submitted 1 April, 2022; v1 submitted 10 March, 2022;
originally announced March 2022.
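To put the 40 aJ/multiply optical energy quoted in the abstract above into photon numbers, one can divide by the single-photon energy $E = hc/\lambda$. The abstract does not state the operating wavelength, so the 1550 nm value below is an assumption (a common telecom wavelength for silicon photonics), and the function name is hypothetical.

```python
PLANCK_J_S = 6.62607015e-34      # Planck constant (exact SI value)
LIGHT_SPEED_M_S = 299_792_458    # speed of light in vacuum (exact SI value)


def photons_per_multiply(energy_joules: float, wavelength_m: float) -> float:
    """Number of photons carrying `energy_joules` at the given wavelength."""
    photon_energy = PLANCK_J_S * LIGHT_SPEED_M_S / wavelength_m
    return energy_joules / photon_energy


# Assumed wavelength: 1550 nm (not stated in the abstract).
n_photons = photons_per_multiply(40e-18, 1550e-9)
print(f"about {n_photons:.0f} photons per multiply")
```

Under that wavelength assumption, 40 aJ corresponds to a few hundred photons per multiply, which gives a sense of scale for the separate <1 photon/multiply result obtained with single-photon detectors.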
-
Freely scalable and reconfigurable optical hardware for deep learning
Authors:
Liane Bernstein,
Alexander Sludds,
Ryan Hamerly,
Vivienne Sze,
Joel Emer,
Dirk Englund
Abstract:
As deep neural network (DNN) models grow ever-larger, they can achieve higher accuracy and solve more complex problems. This trend has been enabled by an increase in available compute power; however, efforts to continue to scale electronic processors are impeded by the costs of communication, thermal management, power delivery and clocking. To improve scalability, we propose a digital optical neural network (DONN) with intralayer optical interconnects and reconfigurable input values. The near path-length-independence of optical energy consumption enables information locality between a transmitter and arbitrarily arranged receivers, which allows greater flexibility in architecture design to circumvent scaling limitations. In a proof-of-concept experiment, we demonstrate optical multicast in the classification of 500 MNIST images with a 3-layer, fully-connected network. We also analyze the energy consumption of the DONN and find that optical data transfer is beneficial over electronics when the spacing of computational units is on the order of >10 micrometers.
Submitted 24 June, 2020;
originally announced June 2020.
-
Large-Scale Optical Neural Networks based on Photoelectric Multiplication
Authors:
Ryan Hamerly,
Liane Bernstein,
Alexander Sludds,
Marin Soljačić,
Dirk Englund
Abstract:
Recent success in deep neural networks has generated strong interest in hardware accelerators to improve speed and energy consumption. This paper presents a new type of photonic accelerator based on coherent detection that is scalable to large ($N \gtrsim 10^6$) networks and can be operated at high (GHz) speeds and very low (sub-aJ) energies per multiply-and-accumulate (MAC), using the massive spatial multiplexing enabled by standard free-space optical components. In contrast to previous approaches, both weights and inputs are optically encoded so that the network can be reprogrammed and trained on the fly. Simulations of the network using models for digit- and image-classification reveal a "standard quantum limit" for optical neural networks, set by photodetector shot noise. This bound, which can be as low as 50 zJ/MAC, suggests performance below the thermodynamic (Landauer) limit for digital irreversible computation is theoretically possible in this device. The proposed accelerator can implement both fully-connected and convolutional networks. We also present a scheme for back-propagation and training that can be performed in the same hardware. This architecture will enable a new class of ultra-low-energy processors for deep learning.
Submitted 18 May, 2019; v1 submitted 12 November, 2018;
originally announced December 2018.
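The idea behind multiplication by coherent detection, as named in the title above, can be illustrated with the textbook balanced-homodyne identity: interfering two real field amplitudes $a$ and $b$ on a 50:50 beamsplitter and subtracting the two detected intensities gives $\tfrac{1}{2}(a+b)^2 - \tfrac{1}{2}(a-b)^2 = 2ab$. The sketch below assumes real, noise-free amplitudes and ideal detectors; it is a generic illustration of that identity, not the specific architecture of the paper, and the function name is hypothetical.

```python
import math


def balanced_homodyne_dot(inputs, weights):
    """Dot product from intensity differences at a 50:50 beamsplitter.

    inputs[k], weights[k]: real field amplitudes arriving at time step k.
    Each step contributes (|plus|^2 - |minus|^2) / 2 = inputs[k] * weights[k];
    summing over time steps models an integrating detector.
    """
    total = 0.0
    for a, b in zip(inputs, weights):
        plus = (a + b) / math.sqrt(2)    # field at one beamsplitter output port
        minus = (a - b) / math.sqrt(2)   # field at the other output port
        total += (plus**2 - minus**2) / 2
    return total


print(balanced_homodyne_dot([1, 2, 3], [4, 5, 6]))
```

Because both operands arrive as optical signals, either one can be changed from one time step to the next, which is consistent with the abstract's point that the network can be reprogrammed on the fly.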