Search | arXiv e-print repository

Training of Physical Neural Networks

Authors: Ali Momeni, Babak Rahmani, Benjamin Scellier, Logan G. Wright, Peter L. McMahon, Clara C. Wanjura, Yuhang Li, Anas Skalli, Natalia G. Berloff, Tatsuhiro Onodera, Ilker Oguz, Francesco Morichetti, Philipp del Hougne, Manuel Le Gallo, Abu Sebastian, Azalia Mirhoseini, Cheng Zhang, Danijela Marković, Daniel Brunner, Christophe Moser, Sylvain Gigan, Florian Marquardt, Aydogan Ozcan, Julie Grollier, Andrea J. Liu, et al. (3 additional authors not shown)

Abstract: Physical neural networks (PNNs) are a class of neural-like networks that leverage the properties of physical systems to perform computation. While PNNs are so far a niche research area with small-scale laboratory demonstrations, they are arguably one of the most underappreciated important opportunities in modern AI. Could we train AI models 1000x larger than current ones? Could we do this and also have them perform inference locally and privately on edge devices, such as smartphones or sensors? Research over the past few years has shown that the answer to all these questions is likely "yes, with enough research": PNNs could one day radically change what is possible and practical for AI systems. To do this will however require rethinking both how AI models work, and how they are trained - primarily by considering the problems through the constraints of the underlying hardware physics. To train PNNs at large scale, many methods including backpropagation-based and backpropagation-free approaches are now being explored. These methods have various trade-offs, and so far no method has been shown to scale to the same scale and performance as the backpropagation algorithm widely used in deep learning today. However, this is rapidly changing, and a diverse ecosystem of training techniques provides clues for how PNNs may one day be utilized to create both more efficient realizations of current-scale AI models, and to enable unprecedented-scale models.

Submitted 5 June, 2024; originally announced June 2024.

Comments: 29 pages, 4 figures

arXiv:2401.14173 [pdf]

Multicasting Optical Reconfigurable Switch

Authors: Niyazi Ulas Dinc, Mustafa Yildirim, Ilker Oguz, Christophe Moser, Demetri Psaltis

Abstract: Artificial Intelligence (AI) demands large data flows within datacenters, heavily relying on multicasting data transfers. As AI models scale, the requirement for high-bandwidth and low-latency networking compounds. The common use of electrical packet switching faces limitations due to optical-electrical-optical conversion bottlenecks. Optical switches, while bandwidth-agnostic and low-latency, suffer from having only unicast or non-scalable multicasting capability. This paper introduces an optical switching technique addressing this challenge. Our approach enables arbitrarily programmable simultaneous unicast and multicast connectivity, eliminating the need for optical splitters that hinder scalability due to optical power loss. We use phase modulation in multiple layers, tailored to implement any multicast connectivity map. Phase modulation also enables wavelength selectivity on top of spatial selectivity, resulting in an optical switch that implements space-wavelength routing. We conducted simulations and experiments to validate our approach. Our results affirm the concept's feasibility, effectiveness, and scalability, as a multicasting switch by experimentally demonstrating 16 spatial ports using 2 wavelength channels. Numerically, 64 spatial ports with 4 wavelength channels each were simulated, with approximately constant efficiency (< 3 dB) as ports and wavelength channels scale.

Submitted 28 February, 2024; v1 submitted 25 January, 2024; originally announced January 2024.

Comments: 17 pages, 4 figures, article

arXiv:2307.08533 [pdf]

Nonlinear Processing with Linear Optics

Authors: Mustafa Yildirim, Niyazi Ulas Dinc, Ilker Oguz, Demetri Psaltis, Christophe Moser

Abstract: Deep neural networks have achieved remarkable breakthroughs by leveraging multiple layers of data processing to extract hidden representations, albeit at the cost of large electronic computing power. To enhance energy efficiency and speed, the optical implementation of neural networks aims to harness the advantages of optical bandwidth and the energy efficiency of optical interconnections. In the absence of low-power optical nonlinearities, the challenge in the implementation of multilayer optical networks lies in realizing multiple optical layers without resorting to electronic components. In this study, we present a novel framework that uses multiple scattering that is capable of synthesizing programmable linear and nonlinear transformations concurrently at low optical power by leveraging the nonlinear relationship between the scattering potential, represented by data, and the scattered field. Theoretical and experimental investigations show that repeating the data by multiple scattering enables non-linear optical computing at low power continuous wave light. Moreover, we empirically found that scaling of this optical framework follows the power law as in state-of-the-art deep digital networks.

Submitted 13 February, 2024; v1 submitted 17 July, 2023; originally announced July 2023.

Comments: 15 pages and 5 figures

arXiv:2305.19170 [pdf]

Forward-Forward Training of an Optical Neural Network

Authors: Ilker Oguz, Junjie Ke, Qifei Wang, Feng Yang, Mustafa Yildirim, Niyazi Ulas Dinc, Jih-Liang Hsieh, Christophe Moser, Demetri Psaltis

Abstract: Neural networks (NN) have demonstrated remarkable capabilities in various tasks, but their computation-intensive nature demands faster and more energy-efficient hardware implementations. Optics-based platforms, using technologies such as silicon photonics and spatial light modulators, offer promising avenues for achieving this goal. However, training multiple trainable layers in tandem with these physical systems poses challenges, as they are difficult to fully characterize and describe with differentiable functions, hindering the use of the error backpropagation algorithm. The recently introduced Forward-Forward Algorithm (FFA) eliminates the need for perfect characterization of the learning system and shows promise for efficient training with large numbers of programmable parameters. The FFA does not require backpropagating an error signal to update the weights; rather, the weights are updated by only sending information in one direction. The local loss function for each set of trainable weights enables low-power analog hardware implementations without resorting to metaheuristic algorithms or reinforcement learning. In this paper, we present an experiment utilizing multimode nonlinear wave propagation in an optical fiber demonstrating the feasibility of the FFA approach using an optical system. The results show that incorporating optical transforms in multilayer NN architectures trained with the FFA can lead to performance improvements, even with a relatively small number of trainable weights. The proposed method offers a new path to the challenge of training optical NNs and provides insights into leveraging physical transformations for enhancing NN performance.

Submitted 10 August, 2023; v1 submitted 30 May, 2023; originally announced May 2023.

arXiv:2302.12121 [pdf, other]

Decentralized core-periphery structure in social networks accelerates cultural innovation in agent-based model

Authors: Jesse Milzman, Cody Moser

Abstract: Previous investigations into creative and innovation networks have suggested that innovation often occurs at the boundary between the network's core and periphery. In this work, we investigate the effect of global core-periphery network structure on the speed and quality of cultural innovation. Drawing on differing notions of core-periphery structure from [arXiv:1808.07801] and [doi:10.1016/S0378-8733(99)00019-2], we distinguish decentralized core-periphery, centralized core-periphery, and affinity network structure. We generate networks of these three classes from stochastic block models (SBMs), and use them to run an agent-based model (ABM) of collective cultural innovation, in which agents can only directly interact with their network neighbors. In order to discover the highest-scoring innovation, agents must discover and combine the highest innovations from two completely parallel technology trees. We find that decentralized core-periphery networks outperform the others by finding the final crossover innovation more quickly on average. We hypothesize that decentralized core-periphery network structure accelerates collective problem-solving by shielding peripheral nodes from the local optima known by the core community at any given time. We then build upon the "Two Truths" hypothesis regarding community structure in spectral graph embeddings, first articulated in [arXiv:1808.07801], which suggests that the adjacency spectral embedding (ASE) captures core-periphery structure, while the Laplacian spectral embedding (LSE) captures affinity. We find that, for core-periphery networks, ASE-based resampling best recreates networks with similar performance on the innovation SBM, compared to LSE-based resampling. Since the Two Truths hypothesis suggests that ASE captures core-periphery structure, this result further supports our hypothesis.

Submitted 23 February, 2023; originally announced February 2023.

Comments: 9 pages, 5 figures, AAMAS 2023 (accepted)

MSC Class: 05C90 (Primary) 91D10; 91D30 (Secondary) ACM Class: J.4; G.2.2; G.2.3

arXiv:2208.09398 [pdf]

Nonlinear Optical Data Transformer for Machine Learning

Authors: Mustafa Yildirim, Ilker Oguz, Fabian Kaufmann, Marc Reig Escale, Rachel Grange, Demetri Psaltis, Christophe Moser

Abstract: Modern machine learning models use an ever-increasing number of parameters to train (175 billion parameters for GPT-3) with large datasets to obtain better performance. "Bigger is better" has been the norm. Optical computing has been reawakened as a potential solution to large-scale computing through optical accelerators that carry out linear operations while reducing electrical power. However, to achieve efficient computing with light, creating and controlling nonlinearity optically rather than electronically remains a challenge. This study explores a reservoir computing (RC) approach whereby a 14 mm long few-mode waveguide in LiNbO3 on insulator is used as a complex nonlinear optical processor. A dataset is encoded digitally on the spectrum of a femtosecond pulse which is then launched in the waveguide. The output spectrum depends nonlinearly on the input. We experimentally show that a simple digital linear classifier with 784 parameters using the output spectrum from the waveguide as input increased the classification accuracy of several databases by approximately 10% compared to non-transformed data. In comparison, a deep digital neural network (NN) with 40000 parameters was necessary to achieve the same accuracy. Reducing the number of parameters by a factor of ~50 illustrates that a compact optical RC approach can perform on par with a deep digital NN.

Submitted 19 August, 2022; originally announced August 2022.

Comments: 13 pages, 3 figures and 1 table

arXiv:2208.04951 [pdf]

Programming Nonlinear Propagation for Efficient Optical Learning Machines

Authors: Ilker Oguz, Jih-Liang Hsieh, Niyazi Ulas Dinc, Uğur Teğin, Mustafa Yildirim, Carlo Gigli, Christophe Moser, Demetri Psaltis

Abstract: The ever-increasing demand for processing data with larger machine learning models requires more efficient hardware solutions due to limitations such as power dissipation and scalability. Optics is a promising contender for providing lower power computation since light propagation through a non-absorbing medium is a lossless operation. However, to carry out useful and efficient computations with light, generating and controlling nonlinearity optically is a necessity that is still elusive. Multimode fibers (MMF) have been shown to provide nonlinear effects with microwatts of average power while maintaining parallelism and low loss. In this work, we propose an optical neural network architecture, which performs nonlinear optical computation by controlling the propagation of ultrashort pulses in MMF by wavefront shaping. With a surrogate model, optimal sets of parameters are found to program this optical computer for different tasks with minimal utilization of an electronic computer. We show a remarkable decrease of 97% in the number of model parameters, which leads to an overall 99% digital operation reduction compared to an equivalently performing digital neural network. We further demonstrate that a fully optical implementation can also be performed with competitive accuracies.

Submitted 9 August, 2022; originally announced August 2022.

Comments: 32 pages, 11 figures

arXiv:2110.13228 [pdf, other]

Variational framework for partially-measured physical system control: examples of vision neuroscience and optical random media

Authors: Babak Rahmani, Demetri Psaltis, Christophe Moser

Abstract: To characterize a physical system to behave as desired, either its underlying governing rules must be known a priori or the system itself be accurately measured. The complexity of full measurements of the system scales with its size. When exposed to real-world conditions, such as perturbations or time-varying settings, the system calibrated for a fixed working condition might require non-trivial re-calibration, a process that could be prohibitively expensive, inefficient and impractical for real-world use cases. In this work, we propose a learning procedure to obtain a desired target output from a physical system. We use Variational Auto-Encoders (VAE) to provide a generative model of the system function and use this model to obtain the required input of the system that produces the target output. We showcase the applicability of our method for two datasets in optical physics and neuroscience.

Submitted 25 October, 2021; originally announced October 2021.

arXiv:2102.07621 [pdf, other]

Hit by the Data: a visual data analysis regarding the effects of traffic public policies

Authors: Luana Müller, Camila Moser, Guilherme Paris, Lucas Freitas, Mayara Oliveira, Wagner Signoretti, Isabel Harb Manssour, Milene Selbach Silveira

Abstract: The availability of Open Government Data (OGD) provides means for citizens to understand and follow governmental policies and decisions, showing evidence of how the latter have contributed to both the place they live in and their lives. In such a scenario, one of the proposals is the use of visualizations to support the process of data analysis and interpretation. Herein, we present the use of three different visualization tools, a commercial one and two academic ones, applied to two specific Brazilian cases: the implementation of the Drink Driving Law and the construction of a new overpass in an important city avenue. Our focus was on the analysis of how visualization could help in the identification of the effects of such traffic public policies. As our main contributions, we present details on the effects of the observed policies, as well as new cases showing how visualization tools can assist users to interpret OGD.

Submitted 12 February, 2021; originally announced February 2021.

arXiv:2012.12404 [pdf]

doi 10.1038/s43588-021-00112-0

Scalable Optical Learning Operator

Authors: Uğur Teğin, Mustafa Yıldırım, İlker Oğuz, Christophe Moser, Demetri Psaltis

Abstract: Today's heavy machine learning tasks are fueled by large datasets. Computing is performed with power hungry processors whose performance is ultimately limited by the data transfer to and from memory. Optics is one of the powerful means of communicating and processing information and there is intense current interest in optical information processing for realizing high-speed computations. Here we present and experimentally demonstrate an optical computing framework based on spatiotemporal effects in multimode fibers for a range of learning tasks from classifying COVID-19 X-ray lung images and speech recognition to predicting age from face images. The presented framework overcomes the energy scaling problem of existing systems without compromising speed. We leveraged simultaneous, linear, and nonlinear interaction of spatial modes as a computation engine. We numerically and experimentally showed the ability of the method to execute several different tasks with accuracy comparable to a digital implementation.

Submitted 26 May, 2021; v1 submitted 22 December, 2020; originally announced December 2020.

Comments: Main text: 18 pages, 7 figures. Supplementary material: 13 pages, 11 figures, 2 tables

arXiv:2010.04430 [pdf, other]

Large-scale randomized experiment reveals machine learning helps people learn and remember more effectively

Authors: Utkarsh Upadhyay, Graham Lancashire, Christoph Moser, Manuel Gomez-Rodriguez

Abstract: Machine learning has typically focused on developing models and algorithms that would ultimately replace humans at tasks where intelligence is required. In this work, rather than replacing humans, we focus on unveiling the potential of machine learning to improve how people learn and remember factual material. To this end, we perform a large-scale randomized controlled trial with thousands of learners from a popular learning app in the area of mobility. After controlling for the length and frequency of study, we find that learners whose study sessions are optimized using machine learning remember the content over ~67% longer than those whose study sessions are generated using two alternative heuristics. Our randomized controlled trial also reveals that the learners whose study sessions are optimized using machine learning are ~50% more likely to return to the app within 4-7 days.

Submitted 9 October, 2020; originally announced October 2020.

arXiv:1907.00126 [pdf]

Competing Neural Networks for Robust Control of Nonlinear Systems

Authors: Babak Rahmani, Damien Loterie, Eirini Kakkava, Navid Borhani, Uğur Teğin, Demetri Psaltis, Christophe Moser

Abstract: The output of physical systems is often accessible by measurements such as the 3D position of a robotic arm actuated by many actuators or the speckle patterns formed by shining the spot of a laser pointer on a wall. The selection of the input of such a system (actuators and the shape of the laser spot respectively) to obtain a desired output is difficult because it is an ill-posed problem, i.e. there are multiple inputs yielding the same output. In this paper, we propose an approach that provides a robust solution to this dilemma for any physical system. We show that it is possible to find the appropriate input of a system that results in a desired output, despite the input-output relation being nonlinear and/or with incomplete measurements of the system's variables. We showcase our approach using an extremely ill-posed problem in imaging. We demonstrate the projection of arbitrary shapes through a multimode fiber (MMF) when a sample of intensity-only measurements is taken at the output. We show image projection fidelity as high as ~90%, which is on par with the gold standard methods which characterize the system fully by phase and amplitude measurements. The generality as well as simplicity of the proposed approach provides a new way of target-oriented control in real-world applications.

Submitted 3 February, 2020; v1 submitted 28 June, 2019; originally announced July 2019.

Comments: 14 pages

arXiv:1303.6271 [pdf, other]

Preferential Attachment in Online Networks: Measurement and Explanations

Authors: Jérôme Kunegis, Marcel Blattner, Christine Moser

Abstract: We perform an empirical study of the preferential attachment phenomenon in temporal networks and show that on the Web, networks follow a nonlinear preferential attachment model in which the exponent depends on the type of network considered. The classical preferential attachment model for networks by Barabási and Albert (1999) assumes a linear relationship between the number of neighbors of a node in a network and the probability of attachment. Although this assumption is widely made in Web Science and related fields, the underlying linearity is rarely measured. To fill this gap, this paper performs an empirical longitudinal (time-based) study on forty-seven diverse Web network datasets from seven network categories and including directed, undirected and bipartite networks. We show that contrary to the usual assumption, preferential attachment is nonlinear in the networks under consideration. Furthermore, we observe that the deviation from linearity is dependent on the type of network, giving sublinear attachment in certain types of networks, and superlinear attachment in others. Thus, we introduce the preferential attachment exponent β as a novel numerical network measure that can be used to discriminate different types of networks. We propose explanations for the behavior of that network measure, based on the mechanisms that underlie the growth of the network in question.

Submitted 23 March, 2013; originally announced March 2013.

Comments: 10 pages, 5 figures, Accepted for the WebSci'13 Conference, Paris, 2013

ACM Class: H.4.0

Showing 1–13 of 13 results for author: Moser, C