-
Photonic co-processors in HPC: using LightOn OPUs for Randomized Numerical Linear Algebra
Authors:
Daniel Hesslow,
Alessandro Cappelli,
Igor Carron,
Laurent Daudet,
Raphaël Lafargue,
Kilian Müller,
Ruben Ohana,
Gustave Pariente,
Iacopo Poli
Abstract:
Randomized Numerical Linear Algebra (RandNLA) is a powerful class of methods, widely used in High Performance Computing (HPC). RandNLA provides approximate solutions to linear algebra functions applied to large signals, at reduced computational costs. However, the randomization step for dimensionality reduction may itself become the computational bottleneck on traditional hardware. Leveraging near…
▽ More
Randomized Numerical Linear Algebra (RandNLA) is a powerful class of methods, widely used in High Performance Computing (HPC). RandNLA provides approximate solutions to linear algebra functions applied to large signals, at reduced computational costs. However, the randomization step for dimensionality reduction may itself become the computational bottleneck on traditional hardware. Leveraging near constant-time linear random projections delivered by LightOn Optical Processing Units we show that randomization can be significantly accelerated, at negligible precision loss, in a wide range of important RandNLA algorithms, such as RandSVD or trace estimators.
△ Less
Submitted 7 May, 2021; v1 submitted 29 April, 2021;
originally announced April 2021.
-
Hardware Beyond Backpropagation: a Photonic Co-Processor for Direct Feedback Alignment
Authors:
Julien Launay,
Iacopo Poli,
Kilian Müller,
Gustave Pariente,
Igor Carron,
Laurent Daudet,
Florent Krzakala,
Sylvain Gigan
Abstract:
The scaling hypothesis motivates the expansion of models past trillions of parameters as a path towards better performance. Recent significant developments, such as GPT-3, have been driven by this conjecture. However, as models scale-up, training them efficiently with backpropagation becomes difficult. Because model, pipeline, and data parallelism distribute parameters and gradients over compute n…
▽ More
The scaling hypothesis motivates the expansion of models past trillions of parameters as a path towards better performance. Recent significant developments, such as GPT-3, have been driven by this conjecture. However, as models scale-up, training them efficiently with backpropagation becomes difficult. Because model, pipeline, and data parallelism distribute parameters and gradients over compute nodes, communication is challenging to orchestrate: this is a bottleneck to further scaling. In this work, we argue that alternative training methods can mitigate these issues, and can inform the design of extreme-scale training hardware. Indeed, using a synaptically asymmetric method with a parallelizable backward pass, such as Direct Feedback Alignement, communication needs are drastically reduced. We present a photonic accelerator for Direct Feedback Alignment, able to compute random projections with trillions of parameters. We demonstrate our system on benchmark tasks, using both fully-connected and graph convolutional networks. Our hardware is the first architecture-agnostic photonic co-processor for training neural networks. This is a significant step towards building scalable hardware, able to go beyond backpropagation, and opening new avenues for deep learning.
△ Less
Submitted 11 December, 2020;
originally announced December 2020.
-
Light-in-the-loop: using a photonics co-processor for scalable training of neural networks
Authors:
Julien Launay,
Iacopo Poli,
Kilian Müller,
Igor Carron,
Laurent Daudet,
Florent Krzakala,
Sylvain Gigan
Abstract:
As neural networks grow larger and more complex and data-hungry, training costs are skyrocketing. Especially when lifelong learning is necessary, such as in recommender systems or self-driving cars, this might soon become unsustainable. In this study, we present the first optical co-processor able to accelerate the training phase of digitally-implemented neural networks. We rely on direct feedback…
▽ More
As neural networks grow larger and more complex and data-hungry, training costs are skyrocketing. Especially when lifelong learning is necessary, such as in recommender systems or self-driving cars, this might soon become unsustainable. In this study, we present the first optical co-processor able to accelerate the training phase of digitally-implemented neural networks. We rely on direct feedback alignment as an alternative to backpropagation, and perform the error projection step optically. Leveraging the optical random projections delivered by our co-processor, we demonstrate its use to train a neural network for handwritten digits recognition.
△ Less
Submitted 3 June, 2020; v1 submitted 2 June, 2020;
originally announced June 2020.
-
Don't take it lightly: Phasing optical random projections with unknown operators
Authors:
Sidharth Gupta,
Rémi Gribonval,
Laurent Daudet,
Ivan Dokmanić
Abstract:
In this paper we tackle the problem of recovering the phase of complex linear measurements when only magnitude information is available and we control the input. We are motivated by the recent development of dedicated optics-based hardware for rapid random projections which leverages the propagation of light in random media. A signal of interest $\mathbfξ \in \mathbb{R}^N$ is mixed by a random sca…
▽ More
In this paper we tackle the problem of recovering the phase of complex linear measurements when only magnitude information is available and we control the input. We are motivated by the recent development of dedicated optics-based hardware for rapid random projections which leverages the propagation of light in random media. A signal of interest $\mathbfξ \in \mathbb{R}^N$ is mixed by a random scattering medium to compute the projection $\mathbf{y} = \mathbf{A} \mathbfξ$, with $\mathbf{A} \in \mathbb{C}^{M \times N}$ being a realization of a standard complex Gaussian iid random matrix. Such optics-based matrix multiplications can be much faster and energy-efficient than their CPU or GPU counterparts, yet two difficulties must be resolved: only the intensity ${|\mathbf{y}|}^2$ can be recorded by the camera, and the transmission matrix $\mathbf{A}$ is unknown. We show that even without knowing $\mathbf{A}$, we can recover the unknown phase of $\mathbf{y}$ for some equivalent transmission matrix with the same distribution as $\mathbf{A}$. Our method is based on two observations: first, conjugating or changing the phase of any row of $\mathbf{A}$ does not change its distribution; and second, since we control the input we can interfere $\mathbfξ$ with arbitrary reference signals. We show how to leverage these observations to cast the measurement phase retrieval problem as a Euclidean distance geometry problem. We demonstrate appealing properties of the proposed algorithm in both numerical simulations and real hardware experiments. Not only does our algorithm accurately recover the missing phase, but it mitigates the effects of quantization and the sensitivity threshold, thus improving the measured magnitudes.
△ Less
Submitted 13 February, 2020; v1 submitted 2 July, 2019;
originally announced July 2019.
-
Dictionary Subselection Using an Overcomplete Joint Sparsity Model
Authors:
Mehrdad Yaghoobi,
Laurent Daudet,
Michael E. Davies
Abstract:
Many natural signals exhibit a sparse representation, whenever a suitable describing model is given. Here, a linear generative model is considered, where many sparsity-based signal processing techniques rely on such a simplified model. As this model is often unknown for many classes of the signals, we need to select such a model based on the domain knowledge or using some exemplar signals. This pa…
▽ More
Many natural signals exhibit a sparse representation, whenever a suitable describing model is given. Here, a linear generative model is considered, where many sparsity-based signal processing techniques rely on such a simplified model. As this model is often unknown for many classes of the signals, we need to select such a model based on the domain knowledge or using some exemplar signals. This paper presents a new exemplar based approach for the linear model (called the dictionary) selection, for such sparse inverse problems. The problem of dictionary selection, which has also been called the dictionary learning in this setting, is first reformulated as a joint sparsity model. The joint sparsity model here differs from the standard joint sparsity model as it considers an overcompleteness in the representation of each signal, within the range of selected subspaces. The new dictionary selection paradigm is examined with some synthetic and realistic simulations.
△ Less
Submitted 10 June, 2013; v1 submitted 12 December, 2012;
originally announced December 2012.