-
Training of Physical Neural Networks
Authors:
Ali Momeni,
Babak Rahmani,
Benjamin Scellier,
Logan G. Wright,
Peter L. McMahon,
Clara C. Wanjura,
Yuhang Li,
Anas Skalli,
Natalia G. Berloff,
Tatsuhiro Onodera,
Ilker Oguz,
Francesco Morichetti,
Philipp del Hougne,
Manuel Le Gallo,
Abu Sebastian,
Azalia Mirhoseini,
Cheng Zhang,
Danijela Marković,
Daniel Brunner,
Christophe Moser,
Sylvain Gigan,
Florian Marquardt,
Aydogan Ozcan,
Julie Grollier,
Andrea J. Liu
, et al. (3 additional authors not shown)
Abstract:
Physical neural networks (PNNs) are a class of neural-like networks that leverage the properties of physical systems to perform computation. While PNNs are so far a niche research area with small-scale laboratory demonstrations, they are arguably one of the most underappreciated important opportunities in modern AI. Could we train AI models 1000x larger than current ones? Could we do this and also have them perform inference locally and privately on edge devices, such as smartphones or sensors? Research over the past few years has shown that the answer to all these questions is likely "yes, with enough research": PNNs could one day radically change what is possible and practical for AI systems. To do this will however require rethinking both how AI models work, and how they are trained - primarily by considering the problems through the constraints of the underlying hardware physics. To train PNNs at large scale, many methods including backpropagation-based and backpropagation-free approaches are now being explored. These methods have various trade-offs, and so far no method has been shown to scale to the same scale and performance as the backpropagation algorithm widely used in deep learning today. However, this is rapidly changing, and a diverse ecosystem of training techniques provides clues for how PNNs may one day be utilized to create both more efficient realizations of current-scale AI models, and to enable unprecedented-scale models.
Submitted 5 June, 2024;
originally announced June 2024.
-
Multicasting Optical Reconfigurable Switch
Authors:
Niyazi Ulas Dinc,
Mustafa Yildirim,
Ilker Oguz,
Christophe Moser,
Demetri Psaltis
Abstract:
Artificial Intelligence (AI) demands large data flows within datacenters, heavily relying on multicasting data transfers. As AI models scale, the requirement for high-bandwidth and low-latency networking compounds. The common use of electrical packet switching faces limitations due to optical-electrical-optical conversion bottlenecks. Optical switches, while bandwidth-agnostic and low-latency, suffer from having only unicast or non-scalable multicasting capability. This paper introduces an optical switching technique addressing this challenge. Our approach enables arbitrarily programmable simultaneous unicast and multicast connectivity, eliminating the need for optical splitters that hinder scalability due to optical power loss. We use phase modulation in multiple layers, tailored to implement any multicast connectivity map. Phase modulation also enables wavelength selectivity on top of spatial selectivity, resulting in an optical switch that implements space-wavelength routing. We conducted simulations and experiments to validate our approach. Our results affirm the concept's feasibility, effectiveness, and scalability, as a multicasting switch by experimentally demonstrating 16 spatial ports using 2 wavelength channels. Numerically, 64 spatial ports with 4 wavelength channels each were simulated, with approximately constant efficiency (< 3 dB) as ports and wavelength channels scale.
Submitted 28 February, 2024; v1 submitted 25 January, 2024;
originally announced January 2024.
-
Nonlinear Processing with Linear Optics
Authors:
Mustafa Yildirim,
Niyazi Ulas Dinc,
Ilker Oguz,
Demetri Psaltis,
Christophe Moser
Abstract:
Deep neural networks have achieved remarkable breakthroughs by leveraging multiple layers of data processing to extract hidden representations, albeit at the cost of large electronic computing power. To enhance energy efficiency and speed, the optical implementation of neural networks aims to harness the advantages of optical bandwidth and the energy efficiency of optical interconnections. In the absence of low-power optical nonlinearities, the challenge in the implementation of multilayer optical networks lies in realizing multiple optical layers without resorting to electronic components. In this study, we present a novel framework that uses multiple scattering that is capable of synthesizing programmable linear and nonlinear transformations concurrently at low optical power by leveraging the nonlinear relationship between the scattering potential, represented by data, and the scattered field. Theoretical and experimental investigations show that repeating the data by multiple scattering enables non-linear optical computing at low power continuous wave light. Moreover, we empirically found that scaling of this optical framework follows the power law as in state-of-the-art deep digital networks.
Submitted 13 February, 2024; v1 submitted 17 July, 2023;
originally announced July 2023.
-
Forward-Forward Training of an Optical Neural Network
Authors:
Ilker Oguz,
Junjie Ke,
Qifei Wang,
Feng Yang,
Mustafa Yildirim,
Niyazi Ulas Dinc,
Jih-Liang Hsieh,
Christophe Moser,
Demetri Psaltis
Abstract:
Neural networks (NN) have demonstrated remarkable capabilities in various tasks, but their computation-intensive nature demands faster and more energy-efficient hardware implementations. Optics-based platforms, using technologies such as silicon photonics and spatial light modulators, offer promising avenues for achieving this goal. However, training multiple trainable layers in tandem with these physical systems poses challenges, as they are difficult to fully characterize and describe with differentiable functions, hindering the use of the error backpropagation algorithm. The recently introduced Forward-Forward Algorithm (FFA) eliminates the need for perfect characterization of the learning system and shows promise for efficient training with large numbers of programmable parameters. The FFA does not require backpropagating an error signal to update the weights; rather, the weights are updated by only sending information in one direction. The local loss function for each set of trainable weights enables low-power analog hardware implementations without resorting to metaheuristic algorithms or reinforcement learning. In this paper, we present an experiment utilizing multimode nonlinear wave propagation in an optical fiber demonstrating the feasibility of the FFA approach using an optical system. The results show that incorporating optical transforms in multilayer NN architectures trained with the FFA can lead to performance improvements, even with a relatively small number of trainable weights. The proposed method offers a new path to the challenge of training optical NNs and provides insights into leveraging physical transformations for enhancing NN performance.
Submitted 10 August, 2023; v1 submitted 30 May, 2023;
originally announced May 2023.
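The layer-local update described in the abstract above can be illustrated with a small digital sketch. This is not the authors' optical experiment: it assumes the common Forward-Forward formulation in which a layer's "goodness" is the sum of its squared activations, pushed above a threshold for positive data and below it for negative data. The layer sizes, the threshold `theta`, the learning rate, the toy data, and all function names are illustrative assumptions.

```python
import numpy as np

def forward(x, W, b):
    """One ReLU layer; in a physical system this transform could be optical."""
    return np.maximum(0.0, x @ W + b)

def goodness(h):
    """Goodness of a layer: sum of squared activations per sample."""
    return np.sum(h ** 2, axis=1)

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

def ff_step(x_pos, x_neg, W, b, theta=2.0, lr=0.05):
    """One local update using only forward passes through this layer.

    Loss per sample is softplus(-sign * (goodness - theta)), with
    sign = +1 for positive data and sign = -1 for negative data.
    No error signal from later layers is needed.
    """
    for x, sign in ((x_pos, 1.0), (x_neg, -1.0)):
        h = forward(x, W, b)
        g = goodness(h)
        dL_dg = -sign * sigmoid(-sign * (g - theta))
        # d(goodness)/d(pre-activation) = 2h (zero where the ReLU is off)
        dL_dz = dL_dg[:, None] * 2.0 * h
        W = W - lr * (x.T @ dL_dz) / len(x)
        b = b - lr * dL_dz.mean(axis=0)
    return W, b

# Toy data (assumption): positive samples near +1, negative samples near -1.
rng = np.random.default_rng(0)
x_pos = rng.normal(1.0, 0.1, size=(64, 4))
x_neg = rng.normal(-1.0, 0.1, size=(64, 4))

# Deterministic initial weights so that some units respond to each class.
W = np.linspace(-0.5, 0.5, 32).reshape(4, 8)
b = np.zeros(8)
theta = 2.0

for _ in range(200):
    W, b = ff_step(x_pos, x_neg, W, b, theta=theta)

g_pos = goodness(forward(x_pos, W, b)).mean()
g_neg = goodness(forward(x_neg, W, b)).mean()
print(f"mean goodness: positive={g_pos:.2f}, negative={g_neg:.2f}")
```

Because each layer's loss depends only on that layer's own output, the same update could in principle be applied to trainable weights placed before or after a transform that is hard to differentiate through, which is the property the abstract highlights.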
-
Decentralized core-periphery structure in social networks accelerates cultural innovation in agent-based model
Authors:
Jesse Milzman,
Cody Moser
Abstract:
Previous investigations into creative and innovation networks have suggested that innovation often occurs at the boundary between the network's core and periphery. In this work, we investigate the effect of global core-periphery network structure on the speed and quality of cultural innovation. Drawing on differing notions of core-periphery structure from [arXiv:1808.07801] and [doi:10.1016/S0378-8733(99)00019-2], we distinguish decentralized core-periphery, centralized core-periphery, and affinity network structure. We generate networks of these three classes from stochastic block models (SBMs), and use them to run an agent-based model (ABM) of collective cultural innovation, in which agents can only directly interact with their network neighbors. In order to discover the highest-scoring innovation, agents must discover and combine the highest innovations from two completely parallel technology trees. We find that decentralized core-periphery networks outperform the others by finding the final crossover innovation more quickly on average. We hypothesize that decentralized core-periphery network structure accelerates collective problem-solving by shielding peripheral nodes from the local optima known by the core community at any given time. We then build upon the "Two Truths" hypothesis regarding community structure in spectral graph embeddings, first articulated in [arXiv:1808.07801], which suggests that the adjacency spectral embedding (ASE) captures core-periphery structure, while the Laplacian spectral embedding (LSE) captures affinity. We find that, for core-periphery networks, ASE-based resampling best recreates networks with similar performance on the innovation SBM, compared to LSE-based resampling. Since the Two Truths hypothesis suggests that ASE captures core-periphery structure, this result further supports our hypothesis.
Submitted 23 February, 2023;
originally announced February 2023.
-
Nonlinear Optical Data Transformer for Machine Learning
Authors:
Mustafa Yildirim,
Ilker Oguz,
Fabian Kaufmann,
Marc Reig Escale,
Rachel Grange,
Demetri Psaltis,
Christophe Moser
Abstract:
Modern machine learning models use an ever-increasing number of parameters to train (175 billion parameters for GPT-3) with large datasets to obtain better performance. "Bigger is better" has been the norm. Optical computing has been reawakened as a potential solution to large-scale computing through optical accelerators that carry out linear operations while reducing electrical power. However, to achieve efficient computing with light, creating and controlling nonlinearity optically rather than electronically remains a challenge. This study explores a reservoir computing (RC) approach whereby a 14 mm long few-mode waveguide in LiNbO3 on insulator is used as a complex nonlinear optical processor. A dataset is encoded digitally on the spectrum of a femtosecond pulse which is then launched in the waveguide. The output spectrum depends nonlinearly on the input. We experimentally show that a simple digital linear classifier with 784 parameters using the output spectrum from the waveguide as input increased the classification accuracy on several databases by approximately 10$\%$ compared to non-transformed data. In comparison, a deep digital neural network (NN) with 40000 parameters was necessary to achieve the same accuracy. Reducing the number of parameters by a factor of $\sim$50 illustrates that a compact optical RC approach can perform on par with a deep digital NN.
Submitted 19 August, 2022;
originally announced August 2022.
-
Programming Nonlinear Propagation for Efficient Optical Learning Machines
Authors:
Ilker Oguz,
Jih-Liang Hsieh,
Niyazi Ulas Dinc,
Uğur Teğin,
Mustafa Yildirim,
Carlo Gigli,
Christophe Moser,
Demetri Psaltis
Abstract:
The ever-increasing demand for processing data with larger machine learning models requires more efficient hardware solutions due to limitations such as power dissipation and scalability. Optics is a promising contender for providing lower power computation since light propagation through a non-absorbing medium is a lossless operation. However, to carry out useful and efficient computations with light, generating and controlling nonlinearity optically is a necessity that is still elusive. Multimode fibers (MMF) have been shown to provide nonlinear effects with microwatts of average power while maintaining parallelism and low loss. In this work, we propose an optical neural network architecture, which performs nonlinear optical computation by controlling the propagation of ultrashort pulses in MMF by wavefront shaping. With a surrogate model, optimal sets of parameters are found to program this optical computer for different tasks with minimal utilization of an electronic computer. We show a remarkable decrease of 97% in the number of model parameters, which leads to an overall 99% digital operation reduction compared to an equivalently performing digital neural network. We further demonstrate that a fully optical implementation can also be performed with competitive accuracies.
Submitted 9 August, 2022;
originally announced August 2022.
-
Variational framework for partially-measured physical system control: examples of vision neuroscience and optical random media
Authors:
Babak Rahmani,
Demetri Psaltis,
Christophe Moser
Abstract:
To characterize a physical system to behave as desired, either its underlying governing rules must be known a priori or the system itself be accurately measured. The complexity of full measurements of the system scales with its size. When exposed to real-world conditions, such as perturbations or time-varying settings, the system calibrated for a fixed working condition might require non-trivial re-calibration, a process that could be prohibitively expensive, inefficient and impractical for real-world use cases. In this work, we propose a learning procedure to obtain a desired target output from a physical system. We use Variational Auto-Encoders (VAE) to provide a generative model of the system function and use this model to obtain the required input of the system that produces the target output. We showcase the applicability of our method for two datasets in optical physics and neuroscience.
Submitted 25 October, 2021;
originally announced October 2021.
-
Hit by the Data: a visual data analysis regarding the effects of traffic public policies
Authors:
Luana Müller,
Camila Moser,
Guilherme Paris,
Lucas Freitas,
Mayara Oliveira,
Wagner Signoretti,
Isabel Harb Manssour,
Milene Selbach Silveira
Abstract:
The availability of Open Government Data (OGD) provides means for citizens to understand and follow governmental policies and decisions, showing evidence of how the latter have contributed to both the place they live in and their lives. In such a scenario, one of the proposals is the use of visualizations to support the process of data analysis and interpretation. Herein, we present the use of three different visualization tools, a commercial one and two academic ones, applied to two specific Brazilian cases: the implementation of the Drink Driving Law and the construction of a new overpass in an important city avenue. Our focus was on the analysis of how visualization could help in the identification of the effects of such traffic public policies. As our main contributions, we present details on the effects of the observed policies, as well as new cases showing how visualization tools can assist users to interpret OGD.
Submitted 12 February, 2021;
originally announced February 2021.
-
Scalable Optical Learning Operator
Authors:
Uğur Teğin,
Mustafa Yıldırım,
İlker Oğuz,
Christophe Moser,
Demetri Psaltis
Abstract:
Today's heavy machine learning tasks are fueled by large datasets. Computing is performed with power hungry processors whose performance is ultimately limited by the data transfer to and from memory. Optics is one of the powerful means of communicating and processing information and there is intense current interest in optical information processing for realizing high-speed computations. Here we present and experimentally demonstrate an optical computing framework based on spatiotemporal effects in multimode fibers for a range of learning tasks from classifying COVID-19 X-ray lung images and speech recognition to predicting age from face images. The presented framework overcomes the energy scaling problem of existing systems without compromising speed. We leveraged simultaneous, linear, and nonlinear interaction of spatial modes as a computation engine. We numerically and experimentally showed the ability of the method to execute several different tasks with accuracy comparable to a digital implementation.
Submitted 26 May, 2021; v1 submitted 22 December, 2020;
originally announced December 2020.
-
Large-scale randomized experiment reveals machine learning helps people learn and remember more effectively
Authors:
Utkarsh Upadhyay,
Graham Lancashire,
Christoph Moser,
Manuel Gomez-Rodriguez
Abstract:
Machine learning has typically focused on developing models and algorithms that would ultimately replace humans at tasks where intelligence is required. In this work, rather than replacing humans, we focus on unveiling the potential of machine learning to improve how people learn and remember factual material. To this end, we perform a large-scale randomized controlled trial with thousands of learners from a popular learning app in the area of mobility. After controlling for the length and frequency of study, we find that learners whose study sessions are optimized using machine learning remember the content over $\sim$67% longer than those whose study sessions are generated using two alternative heuristics. Our randomized controlled trial also reveals that the learners whose study sessions are optimized using machine learning are $\sim$50% more likely to return to the app within 4-7 days.
Submitted 9 October, 2020;
originally announced October 2020.
-
Competing Neural Networks for Robust Control of Nonlinear Systems
Authors:
Babak Rahmani,
Damien Loterie,
Eirini Kakkava,
Navid Borhani,
Uğur Teğin,
Demetri Psaltis,
Christophe Moser
Abstract:
The output of physical systems is often accessible by measurements such as the 3D position of a robotic arm actuated by many actuators or the speckle patterns formed by shining the spot of a laser pointer on a wall. The selection of the input of such a system (actuators and the shape of the laser spot respectively) to obtain a desired output is difficult because it is an ill-posed problem, i.e., there are multiple inputs yielding the same output. In this paper, we propose an approach that provides a robust solution to this dilemma for any physical system. We show that it is possible to find the appropriate input of a system that results in a desired output, despite the input-output relation being nonlinear and/or with incomplete measurements of the system's variables. We showcase our approach using an extremely ill-posed problem in imaging. We demonstrate the projection of arbitrary shapes through a multimode fiber (MMF) when a sample of intensity-only measurements is taken at the output. We show image projection fidelity as high as ~90%, which is on par with the gold standard methods which characterize the system fully by phase and amplitude measurements. The generality as well as simplicity of the proposed approach provides a new way of target-oriented control in real-world applications.
Submitted 3 February, 2020; v1 submitted 28 June, 2019;
originally announced July 2019.
-
Preferential Attachment in Online Networks: Measurement and Explanations
Authors:
Jérôme Kunegis,
Marcel Blattner,
Christine Moser
Abstract:
We perform an empirical study of the preferential attachment phenomenon in temporal networks and show that on the Web, networks follow a nonlinear preferential attachment model in which the exponent depends on the type of network considered. The classical preferential attachment model for networks by Barabási and Albert (1999) assumes a linear relationship between the number of neighbors of a node in a network and the probability of attachment. Although this assumption is widely made in Web Science and related fields, the underlying linearity is rarely measured. To fill this gap, this paper performs an empirical longitudinal (time-based) study on forty-seven diverse Web network datasets from seven network categories and including directed, undirected and bipartite networks. We show that contrary to the usual assumption, preferential attachment is nonlinear in the networks under consideration. Furthermore, we observe that the deviation from linearity is dependent on the type of network, giving sublinear attachment in certain types of networks, and superlinear attachment in others. Thus, we introduce the preferential attachment exponent $β$ as a novel numerical network measure that can be used to discriminate different types of networks. We propose explanations for the behavior of that network measure, based on the mechanisms that underlie the growth of the network in question.
Submitted 23 March, 2013;
originally announced March 2013.
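The role of the exponent $β$ in the abstract above can be sketched as follows, assuming the usual nonlinear form in which the attachment probability of a node is proportional to its degree raised to the power $β$ ($β = 1$ is the linear Barabási-Albert case, $β < 1$ sublinear, $β > 1$ superlinear). The log-log least-squares fit shown here is a simple illustrative estimator, not necessarily the procedure used in the paper; the function names and the synthetic data are hypothetical.

```python
import numpy as np

def attachment_probabilities(degrees, beta):
    """Probability that a new edge attaches to each node, proportional to degree**beta."""
    weights = np.asarray(degrees, dtype=float) ** beta
    return weights / weights.sum()

def fit_beta(degrees, new_edges):
    """Estimate beta as the slope of log(new_edges) against log(degree).

    degrees:   node degrees observed at an earlier time
    new_edges: edges gained by nodes of that degree afterwards
    Entries with zero degree or zero new edges are skipped because
    their logarithm is undefined.
    """
    d = np.asarray(degrees, dtype=float)
    f = np.asarray(new_edges, dtype=float)
    mask = (d > 0) & (f > 0)
    slope, _intercept = np.polyfit(np.log(d[mask]), np.log(f[mask]), 1)
    return slope

# Synthetic, noise-free example (assumption): new edges follow 2 * degree**0.7.
degrees = np.arange(1, 51)
new_edges = 2.0 * degrees ** 0.7
beta_hat = fit_beta(degrees, new_edges)
print(f"estimated beta: {beta_hat:.3f}")

# With beta = 1 the probabilities reduce to degree / sum(degrees).
p_linear = attachment_probabilities([1, 2, 3, 4], beta=1.0)
```

On real network data the relationship is noisy, so a fitted slope would come with an error estimate; the sketch only shows how a single number can separate sublinear from superlinear growth.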