Search | arXiv e-print repository

Training of Physical Neural Networks

Authors: Ali Momeni, Babak Rahmani, Benjamin Scellier, Logan G. Wright, Peter L. McMahon, Clara C. Wanjura, Yuhang Li, Anas Skalli, Natalia G. Berloff, Tatsuhiro Onodera, Ilker Oguz, Francesco Morichetti, Philipp del Hougne, Manuel Le Gallo, Abu Sebastian, Azalia Mirhoseini, Cheng Zhang, Danijela Marković, Daniel Brunner, Christophe Moser, Sylvain Gigan, Florian Marquardt, Aydogan Ozcan, Julie Grollier, Andrea J. Liu , et al. (3 additional authors not shown)

Abstract: Physical neural networks (PNNs) are a class of neural-like networks that leverage the properties of physical systems to perform computation. While PNNs are so far a niche research area with small-scale laboratory demonstrations, they are arguably one of the most underappreciated important opportunities in modern AI. Could we train AI models 1000x larger than current ones? Could we do this and also… ▽ More Physical neural networks (PNNs) are a class of neural-like networks that leverage the properties of physical systems to perform computation. While PNNs are so far a niche research area with small-scale laboratory demonstrations, they are arguably one of the most underappreciated important opportunities in modern AI. Could we train AI models 1000x larger than current ones? Could we do this and also have them perform inference locally and privately on edge devices, such as smartphones or sensors? Research over the past few years has shown that the answer to all these questions is likely "yes, with enough research": PNNs could one day radically change what is possible and practical for AI systems. To do this will however require rethinking both how AI models work, and how they are trained - primarily by considering the problems through the constraints of the underlying hardware physics. To train PNNs at large scale, many methods including backpropagation-based and backpropagation-free approaches are now being explored. These methods have various trade-offs, and so far no method has been shown to scale to the same scale and performance as the backpropagation algorithm widely used in deep learning today. However, this is rapidly changing, and a diverse ecosystem of training techniques provides clues for how PNNs may one day be utilized to create both more efficient realizations of current-scale AI models, and to enable unprecedented-scale models. △ Less

Submitted 5 June, 2024; originally announced June 2024.

Comments: 29 pages, 4 figures

arXiv:2406.01399 [pdf, other]

doi 10.1145/3630106.3658998

Null Compliance: NYC Local Law 144 and the Challenges of Algorithm Accountability

Authors: Lucas Wright, Roxana Mike Muenster, Briana Vecchione, Tianyao Qu, Pika, Cai, COMM/INFO 2450 Student Investigators, Jacob Metcalf, J. Nathan Matias

Abstract: In July 2023, New York City became the first jurisdiction globally to mandate bias audits for commercial algorithmic systems, specifically for automated employment decisions systems (AEDTs) used in hiring and promotion. Local Law 144 (LL 144) requires AEDTs to be independently audited annually for race and gender bias, and the audit report must be publicly posted. Additionally, employers are oblig… ▽ More In July 2023, New York City became the first jurisdiction globally to mandate bias audits for commercial algorithmic systems, specifically for automated employment decisions systems (AEDTs) used in hiring and promotion. Local Law 144 (LL 144) requires AEDTs to be independently audited annually for race and gender bias, and the audit report must be publicly posted. Additionally, employers are obligated to post a transparency notice with the job listing. In this study, 155 student investigators recorded 391 employers' compliance with LL 144 and the user experience for prospective job applicants. Among these employers, 18 posted audit reports and 13 posted transparency notices. These rates could potentially be explained by a significant limitation in the accountability mechanisms enacted by LL 144. Since the law grants employers substantial discretion over whether their system is in scope of the law, a null result cannot be said to indicate non-compliance, a condition we call ``null compliance." Employer discretion may also explain our finding that nearly all audits reported an impact factor over 0.8, a rule of thumb often used in employment discrimination cases. We also find that the benefit of LL 144 to ordinary job seekers is limited due to shortcomings in accessibility and usability. Our findings offer important lessons for policy-makers as they consider regulating algorithmic systems, particularly the degree of discretion to grant to regulated parties and the limitations of relying on transparency and end-user accountability. △ Less

Submitted 3 June, 2024; originally announced June 2024.

arXiv:2402.17750 [pdf, other]

Scaling on-chip photonic neural processors using arbitrarily programmable wave propagation

Authors: Tatsuhiro Onodera, Martin M. Stein, Benjamin A. Ash, Mandar M. Sohoni, Melissa Bosch, Ryotatsu Yanagimoto, Marc Jankowski, Timothy P. McKenna, Tianyu Wang, Gennady Shvets, Maxim R. Shcherbakov, Logan G. Wright, Peter L. McMahon

Abstract: On-chip photonic processors for neural networks have potential benefits in both speed and energy efficiency but have not yet reached the scale at which they can outperform electronic processors. The dominant paradigm for designing on-chip photonics is to make networks of relatively bulky discrete components connected by one-dimensional waveguides. A far more compact alternative is to avoid explici… ▽ More On-chip photonic processors for neural networks have potential benefits in both speed and energy efficiency but have not yet reached the scale at which they can outperform electronic processors. The dominant paradigm for designing on-chip photonics is to make networks of relatively bulky discrete components connected by one-dimensional waveguides. A far more compact alternative is to avoid explicitly defining any components and instead sculpt the continuous substrate of the photonic processor to directly perform the computation using waves freely propagating in two dimensions. We propose and demonstrate a device whose refractive index as a function of space, $n(x,z)$, can be rapidly reprogrammed, allowing arbitrary control over the wave propagation in the device. Our device, a 2D-programmable waveguide, combines photoconductive gain with the electro-optic effect to achieve massively parallel modulation of the refractive index of a slab waveguide, with an index modulation depth of $10^{-3}$ and approximately $10^4$ programmable degrees of freedom. We used a prototype device with a functional area of $12\,\text{mm}^2$ to perform neural-network inference with up to 49-dimensional input vectors in a single pass, achieving 96% accuracy on vowel classification and 86% accuracy on $7 \times 7$-pixel MNIST handwritten-digit classification. This is a scale beyond that of previous photonic chips relying on discrete components, illustrating the benefit of the continuous-waves paradigm. In principle, with large enough chip area, the reprogrammability of the device's refractive index distribution enables the reconfigurable realization of any passive, linear photonic circuit or device. This promises the development of more compact and versatile photonic systems for a wide range of applications, including optical processing, smart sensing, spectroscopy, and optical communications. △ Less

Submitted 27 February, 2024; originally announced February 2024.

arXiv:2402.00025 [pdf, other]

Accelerating a Triton Fused Kernel for W4A16 Quantized Inference with SplitK work decomposition

Authors: Adnan Hoque, Less Wright, Chih-Chieh Yang, Mudhakar Srivatsa, Raghu Ganti

Abstract: We propose an implementation of an efficient fused matrix multiplication kernel for W4A16 quantized inference, where we perform dequantization and GEMM in a fused kernel using a SplitK work decomposition. Our implementation shows improvement for the type of skinny matrix-matrix multiplications found in foundation model inference workloads. In particular, this paper surveys the type of matrix multi… ▽ More We propose an implementation of an efficient fused matrix multiplication kernel for W4A16 quantized inference, where we perform dequantization and GEMM in a fused kernel using a SplitK work decomposition. Our implementation shows improvement for the type of skinny matrix-matrix multiplications found in foundation model inference workloads. In particular, this paper surveys the type of matrix multiplication between a skinny activation matrix and a square weight matrix. Our results show an average of 65% speed improvement on A100, and an average of 124% speed improvement on H100 (with a peak of 295%) for a range of matrix dimensions including those found in a llama-style model, where m < n = k. △ Less

Submitted 22 February, 2024; v1 submitted 5 January, 2024; originally announced February 2024.

arXiv:2310.18335 [pdf, other]

The hardware is the software

Authors: Jeremie Laydevant, Logan G. Wright, Tianyu Wang, Peter L. McMahon

Abstract: Human brains and bodies are not hardware running software: the hardware is the software. We reason that because the microscopic physics of artificial-intelligence hardware and of human biological "hardware" is distinct, neuromorphic engineers need to be cautious (and yet also creative) in how we take inspiration from biological intelligence. We should focus primarily on principles and design ideas… ▽ More Human brains and bodies are not hardware running software: the hardware is the software. We reason that because the microscopic physics of artificial-intelligence hardware and of human biological "hardware" is distinct, neuromorphic engineers need to be cautious (and yet also creative) in how we take inspiration from biological intelligence. We should focus primarily on principles and design ideas that respect -- and embrace -- the underlying hardware physics of non-biological intelligent systems, rather than abstracting it away. We see a major role for neuroscience in neuromorphic computing as identifying the physics-agnostic principles of biological intelligence -- that is the principles of biological intelligence that can be gainfully adapted and applied to any physical hardware. △ Less

Submitted 20 October, 2023; originally announced October 2023.

arXiv:2308.15265 [pdf, other]

A Multi-Perspective Learning to Rank Approach to Support Children's Information Seeking in the Classroom

Authors: Garrett Allen, Katherine Landau Wright, Jerry Alan Fails, Casey Kennington, Maria Soledad Pera

Abstract: We introduce a novel re-ranking model that aims to augment the functionality of standard search engines to support classroom search activities for children (ages 6 to 11). This model extends the known listwise learning-to-rank framework by balancing risk and reward. Doing so enables the model to prioritize Web resources of high educational alignment, appropriateness, and adequate readability by an… ▽ More We introduce a novel re-ranking model that aims to augment the functionality of standard search engines to support classroom search activities for children (ages 6 to 11). This model extends the known listwise learning-to-rank framework by balancing risk and reward. Doing so enables the model to prioritize Web resources of high educational alignment, appropriateness, and adequate readability by analyzing the URLs, snippets, and page titles of Web resources retrieved by a given mainstream search engine. Experimental results, including an ablation study and comparisons with existing baselines, showcase the correctness of the proposed model. The outcomes of this work demonstrate the value of considering multiple perspectives inherent to the classroom setting, e.g., educational alignment, readability, and objectionability, when applied to the design of algorithms that can better support children's information discovery. △ Less

Submitted 29 August, 2023; originally announced August 2023.

Comments: Extended version of the manuscript to appear in proceedings of the 22nd IEEE/WIC International Conference on Web Intelligence and Intelligent Agent Technology

arXiv:2307.15712 [pdf, other]

Quantum-noise-limited optical neural networks operating at a few quanta per activation

Authors: Shi-Yuan Ma, Tianyu Wang, Jérémie Laydevant, Logan G. Wright, Peter L. McMahon

Abstract: Analog physical neural networks, which hold promise for improved energy efficiency and speed compared to digital electronic neural networks, are nevertheless typically operated in a relatively high-power regime so that the signal-to-noise ratio (SNR) is large (>10). What happens if an analog system is instead operated in an ultra-low-power regime, in which the behavior of the system becomes highly… ▽ More Analog physical neural networks, which hold promise for improved energy efficiency and speed compared to digital electronic neural networks, are nevertheless typically operated in a relatively high-power regime so that the signal-to-noise ratio (SNR) is large (>10). What happens if an analog system is instead operated in an ultra-low-power regime, in which the behavior of the system becomes highly stochastic and the noise is no longer a small perturbation on the signal? In this paper, we study this question in the setting of optical neural networks operated in the limit where some layers use only a single photon to cause a neuron activation. Neuron activations in this limit are dominated by quantum noise from the fundamentally probabilistic nature of single-photon detection of weak optical signals. We show that it is possible to train stochastic optical neural networks to perform deterministic image-classification tasks with high accuracy in spite of the extremely high noise (SNR ~ 1) by using a training procedure that directly models the stochastic behavior of photodetection. We experimentally demonstrated MNIST classification with a test accuracy of 98% using an optical neural network with a hidden layer operating in the single-photon regime; the optical energy used to perform the classification corresponds to 0.008 photons per multiply-accumulate (MAC) operation, which is equivalent to 0.003 attojoules of optical energy per MAC. Our experiment used >40x fewer photons per inference than previous state-of-the-art low-optical-energy demonstrations, to achieve the same accuracy of >90%. Our work shows that some extremely stochastic analog systems, including those operating in the limit where quantum noise dominates, can nevertheless be used as layers in neural networks that deterministically perform classification tasks with high accuracy if they are appropriately trained. △ Less

Submitted 28 July, 2023; originally announced July 2023.

Comments: 55 pages, 27 figures

arXiv:2304.11277 [pdf, other]

PyTorch FSDP: Experiences on Scaling Fully Sharded Data Parallel

Authors: Yanli Zhao, Andrew Gu, Rohan Varma, Liang Luo, Chien-Chin Huang, Min Xu, Less Wright, Hamid Shojanazeri, Myle Ott, Sam Shleifer, Alban Desmaison, Can Balioglu, Pritam Damania, Bernard Nguyen, Geeta Chauhan, Yuchen Hao, Ajit Mathews, Shen Li

Abstract: It is widely acknowledged that large models have the potential to deliver superior performance across a broad range of domains. Despite the remarkable progress made in the field of machine learning systems research, which has enabled the development and exploration of large models, such abilities remain confined to a small group of advanced users and industry leaders, resulting in an implicit tech… ▽ More It is widely acknowledged that large models have the potential to deliver superior performance across a broad range of domains. Despite the remarkable progress made in the field of machine learning systems research, which has enabled the development and exploration of large models, such abilities remain confined to a small group of advanced users and industry leaders, resulting in an implicit technical barrier for the wider community to access and leverage these technologies. In this paper, we introduce PyTorch Fully Sharded Data Parallel (FSDP) as an industry-grade solution for large model training. FSDP has been closely co-designed with several key PyTorch core components including Tensor implementation, dispatcher system, and CUDA memory caching allocator, to provide non-intrusive user experiences and high training efficiency. Additionally, FSDP natively incorporates a range of techniques and settings to optimize resource utilization across a variety of hardware configurations. The experimental results demonstrate that FSDP is capable of achieving comparable performance to Distributed Data Parallel while providing support for significantly larger models with near-linear scalability in terms of TFLOPS. △ Less

Submitted 12 September, 2023; v1 submitted 21 April, 2023; originally announced April 2023.

arXiv:2302.12043 [pdf, ps, other]

Conversational Agents and Children: Let Children Learn

Authors: Casey Kennington, Jerry Alan Fails, Katherine Landau Wright, Maria Soledad Pera

Abstract: Using online information discovery as a case study, in this position paper we discuss the need to design, develop, and deploy (conversational) agents that can -- non-intrusively -- guide children in their quest for online resources rather than simply finding resources for them. We argue that agents should "let children learn" and should be built to take on a teacher-facilitator function, allowing… ▽ More Using online information discovery as a case study, in this position paper we discuss the need to design, develop, and deploy (conversational) agents that can -- non-intrusively -- guide children in their quest for online resources rather than simply finding resources for them. We argue that agents should "let children learn" and should be built to take on a teacher-facilitator function, allowing children to develop their technical and critical thinking abilities as they interact with varied technology in a broad range of use cases. △ Less

Submitted 23 February, 2023; originally announced February 2023.

Comments: 6 pages

arXiv:2302.10360 [pdf, other]

Optical Transformers

Authors: Maxwell G. Anderson, Shi-Yuan Ma, Tianyu Wang, Logan G. Wright, Peter L. McMahon

Abstract: The rapidly increasing size of deep-learning models has caused renewed and growing interest in alternatives to digital computers to dramatically reduce the energy cost of running state-of-the-art neural networks. Optical matrix-vector multipliers are best suited to performing computations with very large operands, which suggests that large Transformer models could be a good target for optical comp… ▽ More The rapidly increasing size of deep-learning models has caused renewed and growing interest in alternatives to digital computers to dramatically reduce the energy cost of running state-of-the-art neural networks. Optical matrix-vector multipliers are best suited to performing computations with very large operands, which suggests that large Transformer models could be a good target for optical computing. To test this idea, we performed small-scale optical experiments with a prototype accelerator to demonstrate that Transformer operations can run on optical hardware despite noise and errors. Using simulations, validated by our experiments, we then explored the energy efficiency of optical implementations of Transformers and identified scaling laws for model performance with respect to optical energy usage. We found that the optical energy per multiply-accumulate (MAC) scales as $\frac{1}{d}$ where $d$ is the Transformer width, an asymptotic advantage over digital systems. We conclude that with well-engineered, large-scale optical hardware, it may be possible to achieve a $100 \times$ energy-efficiency advantage for running some of the largest current Transformer models, and that if both the models and the optical hardware are scaled to the quadrillion-parameter regime, optical computers could have a $>8,000\times$ energy-efficiency advantage over state-of-the-art digital-electronic processors that achieve 300 fJ/MAC. We analyzed how these results motivate and inform the construction of future optical accelerators along with optics-amenable deep-learning approaches. With assumptions about future improvements to electronics and Transformer quantization techniques (5$\times$ cheaper memory access, double the digital--analog conversion efficiency, and 4-bit precision), we estimated that optical computers' advantage against current 300-fJ/MAC digital processors could grow to $>100,000\times$. △ Less

Submitted 20 February, 2023; originally announced February 2023.

Comments: 27 pages, 13 figures

Journal ref: Transactions on Machine Learning Research, 03/2024, https://openreview.net/forum?id=Xxw0edFFQC

arXiv:2207.14293 [pdf, other]

doi 10.1038/s41566-023-01170-8

Image sensing with multilayer, nonlinear optical neural networks

Authors: Tianyu Wang, Mandar M. Sohoni, Logan G. Wright, Martin M. Stein, Shi-Yuan Ma, Tatsuhiro Onodera, Maxwell G. Anderson, Peter L. McMahon

Abstract: Optical imaging is commonly used for both scientific and technological applications across industry and academia. In image sensing, a measurement, such as of an object's position, is performed by computational analysis of a digitized image. An emerging image-sensing paradigm breaks this delineation between data collection and analysis by designing optical components to perform not imaging, but enc… ▽ More Optical imaging is commonly used for both scientific and technological applications across industry and academia. In image sensing, a measurement, such as of an object's position, is performed by computational analysis of a digitized image. An emerging image-sensing paradigm breaks this delineation between data collection and analysis by designing optical components to perform not imaging, but encoding. By optically encoding images into a compressed, low-dimensional latent space suitable for efficient post-analysis, these image sensors can operate with fewer pixels and fewer photons, allowing higher-throughput, lower-latency operation. Optical neural networks (ONNs) offer a platform for processing data in the analog, optical domain. ONN-based sensors have however been limited to linear processing, but nonlinearity is a prerequisite for depth, and multilayer NNs significantly outperform shallow NNs on many tasks. Here, we realize a multilayer ONN pre-processor for image sensing, using a commercial image intensifier as a parallel optoelectronic, optical-to-optical nonlinear activation function. We demonstrate that the nonlinear ONN pre-processor can achieve compression ratios of up to 800:1 while still enabling high accuracy across several representative computer-vision tasks, including machine-vision benchmarks, flow-cytometry image classification, and identification of objects in real scenes. In all cases we find that the ONN's nonlinearity and depth allowed it to outperform a purely linear ONN encoder. Although our experiments are specialized to ONN sensors for incoherent-light images, alternative ONN platforms should facilitate a range of ONN sensors. These ONN sensors may surpass conventional sensors by pre-processing optical information in spatial, temporal, and/or spectral dimensions, potentially with coherent and quantum qualities, all natively in the optical domain. △ Less

Submitted 27 July, 2022; originally announced July 2022.

Journal ref: Nat. Photon. 18, 1-8 (2023)

arXiv:2203.03366 [pdf, other]

Improvements to Gradient Descent Methods for Quantum Tensor Network Machine Learning

Authors: Fergus Barratt, James Dborin, Lewis Wright

Abstract: Tensor networks have demonstrated significant value for machine learning in a myriad of different applications. However, optimizing tensor networks using standard gradient descent has proven to be difficult in practice. Tensor networks suffer from initialization problems resulting in exploding or vanishing gradients and require extensive hyperparameter tuning. Efforts to overcome these problems us… ▽ More Tensor networks have demonstrated significant value for machine learning in a myriad of different applications. However, optimizing tensor networks using standard gradient descent has proven to be difficult in practice. Tensor networks suffer from initialization problems resulting in exploding or vanishing gradients and require extensive hyperparameter tuning. Efforts to overcome these problems usually depend on specific network architectures, or ad hoc prescriptions. In this paper we address the problems of initialization and hyperparameter tuning, making it possible to train tensor networks using established machine learning techniques. We introduce a `copy node' method that successfully initializes arbitrary tensor networks, in addition to a gradient based regularization technique for bond dimensions. We present numerical results that show that the combination of techniques presented here produces quantum inspired tensor network models with far fewer parameters, while improving generalization performance. △ Less

Submitted 3 March, 2022; originally announced March 2022.

Journal ref: Second Workshop on Quantum Tensor Networks in Machine Learning, 35th Conference on Neural Information Processing Systems (NeurIPS 2021)

arXiv:2106.13731 [pdf, other]

Ranger21: a synergistic deep learning optimizer

Authors: Less Wright, Nestor Demeure

Abstract: As optimizers are critical to the performances of neural networks, every year a large number of papers innovating on the subject are published. However, while most of these publications provide incremental improvements to existing algorithms, they tend to be presented as new optimizers rather than composable algorithms. Thus, many worthwhile improvements are rarely seen out of their initial public… ▽ More As optimizers are critical to the performances of neural networks, every year a large number of papers innovating on the subject are published. However, while most of these publications provide incremental improvements to existing algorithms, they tend to be presented as new optimizers rather than composable algorithms. Thus, many worthwhile improvements are rarely seen out of their initial publication. Taking advantage of this untapped potential, we introduce Ranger21, a new optimizer which combines AdamW with eight components, carefully selected after reviewing and testing ideas from the literature. We found that the resulting optimizer provides significantly improved validation accuracy and training speed, smoother training curves, and is even able to train a ResNet50 on ImageNet2012 without Batch Normalization layers. A problem on which AdamW stays systematically stuck in a bad initial state. △ Less

Submitted 6 August, 2021; v1 submitted 25 June, 2021; originally announced June 2021.

Comments: for associated code, see https://github.com/lessw2020/Ranger21

ACM Class: I.2.6

arXiv:2105.03456 [pdf, other]

CASTing a Net: Supporting Teachers with Search Technology

Authors: Garrett Allen, Katherine Landau Wright, Jerry Alan Fails, Casey Kennington, Maria Soledad Pera

Abstract: Past and current research has typically focused on ensuring that search technology for the classroom serves children. In this paper, we argue for the need to broaden the research focus to include teachers and how search technology can aid them. In particular, we share how furnishing a behind-the-scenes portal for teachers can empower them by providing a window into the spelling, writing, and conce… ▽ More Past and current research has typically focused on ensuring that search technology for the classroom serves children. In this paper, we argue for the need to broaden the research focus to include teachers and how search technology can aid them. In particular, we share how furnishing a behind-the-scenes portal for teachers can empower them by providing a window into the spelling, writing, and concept connection skills of their students. △ Less

Submitted 7 May, 2021; originally announced May 2021.

Comments: KidRec '21: 5th International and Interdisciplinary Perspectives on Children & Recommender and Information Retrieval Systems (KidRec) Search and Recommendation Technology through the Lens of a Teacher- Co-located with ACM IDC 2021

arXiv:2104.13467 [pdf, other]

doi 10.1038/s41467-021-27774-8

An optical neural network using less than 1 photon per multiplication

Authors: Tianyu Wang, Shi-Yuan Ma, Logan G. Wright, Tatsuhiro Onodera, Brian Richard, Peter L. McMahon

Abstract: Deep learning has rapidly become a widespread tool in both scientific and commercial endeavors. Milestones of deep learning exceeding human performance have been achieved for a growing number of tasks over the past several years, across areas as diverse as game-playing, natural-language translation, and medical-image analysis. However, continued progress is increasingly hampered by the high energy… ▽ More Deep learning has rapidly become a widespread tool in both scientific and commercial endeavors. Milestones of deep learning exceeding human performance have been achieved for a growing number of tasks over the past several years, across areas as diverse as game-playing, natural-language translation, and medical-image analysis. However, continued progress is increasingly hampered by the high energy costs associated with training and running deep neural networks on electronic processors. Optical neural networks have attracted attention as an alternative physical platform for deep learning, as it has been theoretically predicted that they can fundamentally achieve higher energy efficiency than neural networks deployed on conventional digital computers. Here, we experimentally demonstrate an optical neural network achieving 99% accuracy on handwritten-digit classification using ~3.2 detected photons per weight multiplication and ~90% accuracy using ~0.64 photons (~$2.4 \times 10^{-19}$ J of optical energy) per weight multiplication. This performance was achieved using a custom free-space optical processor that executes matrix-vector multiplications in a massively parallel fashion, with up to ~0.5 million scalar (weight) multiplications performed at the same time. Using commercially available optical components and standard neural-network training methods, we demonstrated that optical neural networks can operate near the standard quantum limit with extremely low optical powers and still achieve high accuracy. Our results provide a proof-of-principle for low-optical-power operation, and with careful system design including the surrounding electronics used for data storage and control, open up a path to realizing optical processors that require only $10^{-16}$ J total energy per scalar multiplication -- which is orders of magnitude more efficient than current digital processors. △ Less

Submitted 27 April, 2021; originally announced April 2021.

Comments: 42 pages, 21 figures

Journal ref: Nature Communications 13, 123 (2022)

arXiv:2104.13386 [pdf, other]

doi 10.1038/s41586-021-04223-6

Deep physical neural networks enabled by a backpropagation algorithm for arbitrary physical systems

Authors: Logan G. Wright, Tatsuhiro Onodera, Martin M. Stein, Tianyu Wang, Darren T. Schachter, Zoey Hu, Peter L. McMahon

Abstract: Deep neural networks have become a pervasive tool in science and engineering. However, modern deep neural networks' growing energy requirements now increasingly limit their scaling and broader use. We propose a radical alternative for implementing deep neural network models: Physical Neural Networks. We introduce a hybrid physical-digital algorithm called Physics-Aware Training to efficiently trai… ▽ More Deep neural networks have become a pervasive tool in science and engineering. However, modern deep neural networks' growing energy requirements now increasingly limit their scaling and broader use. We propose a radical alternative for implementing deep neural network models: Physical Neural Networks. We introduce a hybrid physical-digital algorithm called Physics-Aware Training to efficiently train sequences of controllable physical systems to act as deep neural networks. This method automatically trains the functionality of any sequence of real physical systems, directly, using backpropagation, the same technique used for modern deep neural networks. To illustrate their generality, we demonstrate physical neural networks with three diverse physical systems-optical, mechanical, and electrical. Physical neural networks may facilitate unconventional machine learning hardware that is orders of magnitude faster and more energy efficient than conventional electronic processors. △ Less

Submitted 27 April, 2021; originally announced April 2021.

Journal ref: Nature 601, 549-555 (2022)

arXiv:2102.00645 [pdf, other]

An End-to-End Food Image Analysis System

Authors: Jiangpeng He, Runyu Mao, Zeman Shao, Janine L. Wright, Deborah A. Kerr, Carol J. Boushey, Fengqing Zhu

Abstract: Modern deep learning techniques have enabled advances in image-based dietary assessment such as food recognition and food portion size estimation. Valuable information on the types of foods and the amount consumed are crucial for prevention of many chronic diseases. However, existing methods for automated image-based food analysis are neither end-to-end nor are capable of processing multiple tasks… ▽ More Modern deep learning techniques have enabled advances in image-based dietary assessment such as food recognition and food portion size estimation. Valuable information on the types of foods and the amount consumed are crucial for prevention of many chronic diseases. However, existing methods for automated image-based food analysis are neither end-to-end nor are capable of processing multiple tasks (e.g., recognition and portion estimation) together, making it difficult to apply to real life applications. In this paper, we propose an image-based food analysis framework that integrates food localization, classification and portion size estimation. Our proposed framework is end-to-end, i.e., the input can be an arbitrary food image containing multiple food items and our system can localize each single food item with its corresponding predicted food type and portion size. We also improve the single food portion estimation by consolidating localization results with a food energy distribution map obtained by conditional GAN to generate a four-channel RGB-Distribution image. Our end-to-end framework is evaluated on a real life food image dataset collected from a nutrition feeding study. △ Less

Submitted 1 February, 2021; originally announced February 2021.

arXiv:1807.04599 [pdf, other]

doi 10.1371/journal.pone.0207827

Benchmarking treewidth as a practical component of tensor-network--based quantum simulation

Authors: Eugene F. Dumitrescu, Allison L. Fisher, Timothy D. Goodrich, Travis S. Humble, Blair D. Sullivan, Andrew L. Wright

Abstract: Tensor networks are powerful factorization techniques which reduce resource requirements for numerically simulating principal quantum many-body systems and algorithms. The computational complexity of a tensor network simulation depends on the tensor ranks and the order in which they are contracted. Unfortunately, computing optimal contraction sequences (orderings) in general is known to be a compu… ▽ More Tensor networks are powerful factorization techniques which reduce resource requirements for numerically simulating principal quantum many-body systems and algorithms. The computational complexity of a tensor network simulation depends on the tensor ranks and the order in which they are contracted. Unfortunately, computing optimal contraction sequences (orderings) in general is known to be a computationally difficult (NP-complete) task. In 2005, Markov and Shi showed that optimal contraction sequences correspond to optimal (minimum width) tree decompositions of a tensor network's line graph, relating the contraction sequence problem to a rich literature in structural graph theory. While treewidth-based methods have largely been ignored in favor of dataset-specific algorithms in the prior tensor networks literature, we demonstrate their practical relevance for problems arising from two distinct methods used in quantum simulation: multi-scale entanglement renormalization ansatz (MERA) datasets and quantum circuits generated by the quantum approximate optimization algorithm (QAOA). We exhibit multiple regimes where treewidth-based algorithms outperform domain-specific algorithms, while demonstrating that the optimal choice of algorithm has a complex dependence on the network density, expected contraction complexity, and user run time requirements. We further provide an open source software framework designed with an emphasis on accessibility and extendability, enabling replicable experimental evaluations and future exploration of competing methods by practitioners. △ Less

Submitted 12 July, 2018; originally announced July 2018.

Comments: Open source code available

Showing 1–18 of 18 results for author: Wright, L