-
Learning a Clinically-Relevant Concept Bottleneck for Lesion Detection in Breast Ultrasound
Authors:
Arianna Bunnell,
Yannik Glaser,
Dustin Valdez,
Thomas Wolfgruber,
Aleen Altamirano,
Carol Zamora González,
Brenda Y. Hernandez,
Peter Sadowski,
John A. Shepherd
Abstract:
Detecting and classifying lesions in breast ultrasound images is a promising application of artificial intelligence (AI) for reducing the burden of cancer in regions with limited access to mammography. Such AI systems are more likely to be useful in a clinical setting if their predictions can be explained to a radiologist. This work proposes an explainable AI model that provides interpretable predictions using a standard lexicon from the American College of Radiology's Breast Imaging and Reporting Data System (BI-RADS). The model is a deep neural network featuring a concept bottleneck layer in which known BI-RADS features are predicted before making a final cancer classification. This enables radiologists to easily review the predictions of the AI system and potentially fix errors in real time by modifying the concept predictions. In experiments, a model is developed on 8,854 images from 994 women with expert annotations and histological cancer labels. The model outperforms state-of-the-art lesion detection frameworks with 48.9 average precision on the held-out testing set, and for cancer classification, concept intervention is shown to increase performance from 0.876 to 0.885 area under the receiver operating characteristic curve. Training and evaluation code is available at https://github.com/hawaii-ai/bus-cbm.
Submitted 28 June, 2024;
originally announced July 2024.
-
WV-Net: A foundation model for SAR WV-mode satellite imagery trained using contrastive self-supervised learning on 10 million images
Authors:
Yannik Glaser,
Justin E. Stopa,
Linnea M. Wolniewicz,
Ralph Foster,
Doug Vandemark,
Alexis Mouche,
Bertrand Chapron,
Peter Sadowski
Abstract:
The European Space Agency's Copernicus Sentinel-1 (S-1) mission is a constellation of C-band synthetic aperture radar (SAR) satellites that provide unprecedented monitoring of the world's oceans. S-1's wave mode (WV) captures 20x20 km image patches at 5 m pixel resolution and is unaffected by cloud cover or time-of-day. The mission's open data policy has made SAR data easily accessible for a range of applications, but the need for manual image annotations is a bottleneck that hinders the use of machine learning methods. This study uses nearly 10 million WV-mode images and contrastive self-supervised learning to train a semantic embedding model called WV-Net. In multiple downstream tasks, WV-Net outperforms a comparable model that was pre-trained on natural images (ImageNet) with supervised learning. Experiments show improvements for estimating wave height (0.50 vs 0.60 RMSE using linear probing), estimating near-surface air temperature (0.90 vs 0.97 RMSE), and performing multilabel-classification of geophysical and atmospheric phenomena (0.96 vs 0.95 micro-averaged AUROC). WV-Net embeddings are also superior in an unsupervised image-retrieval task and scale better in data-sparse settings. Together, these results demonstrate that WV-Net embeddings can support geophysical research by providing a convenient foundation model for a variety of data analysis and exploration tasks.
Submitted 26 June, 2024;
originally announced June 2024.
-
FishNet: Deep Neural Networks for Low-Cost Fish Stock Estimation
Authors:
Moseli Mots'oehli,
Anton Nikolaev,
Wawan B. IGede,
John Lynham,
Peter J. Mous,
Peter Sadowski
Abstract:
Fish stock assessment often involves manual fish counting by taxonomy specialists, which is both time-consuming and costly. We propose FishNet, an automated computer vision system for both taxonomic classification and fish size estimation from images captured with a low-cost digital camera. The system first performs object detection and segmentation using a Mask R-CNN to identify individual fish from images containing multiple fish, possibly consisting of different species. Then each fish species is classified and the length is predicted using separate machine learning models. To develop the model, we use a dataset of 300,000 hand-labeled images containing 1.2M fish of 163 different species and ranging in length from 10cm to 250cm, with additional annotations and quality control methods used to curate high-quality training data. On held-out test data sets, our system achieves a 92% intersection over union on the fish segmentation task, an 89% top-1 classification accuracy on single fish species classification, and a 2.3cm mean absolute error on the fish length estimation task.
Submitted 27 June, 2024; v1 submitted 16 March, 2024;
originally announced March 2024.
-
Diffusion Models for High-Resolution Solar Forecasts
Authors:
Yusuke Hatanaka,
Yannik Glaser,
Geoff Galgon,
Giuseppe Torri,
Peter Sadowski
Abstract:
Forecasting future weather and climate is inherently difficult. Machine learning offers new approaches to increase the accuracy and computational efficiency of forecasts, but current methods are unable to accurately model uncertainty in high-dimensional predictions. Score-based diffusion models offer a new approach to modeling probability distributions over many dependent variables, and in this work, we demonstrate how they provide probabilistic forecasts of weather and climate variables at unprecedented resolution, speed, and accuracy. We apply the technique to day-ahead solar irradiance forecasts by generating many samples from a diffusion model trained to super-resolve coarse-resolution numerical weather predictions to high-resolution weather satellite observations.
Submitted 31 January, 2023;
originally announced February 2023.
-
Quantitative Imaging Principles Improves Medical Image Learning
Authors:
Lambert T. Leong,
Michael C. Wong,
Yannik Glaser,
Thomas Wolfgruber,
Steven B. Heymsfield,
Peter Sadowski,
John A. Shepherd
Abstract:
Fundamental differences between natural and medical images have recently favored the use of self-supervised learning (SSL) over ImageNet transfer learning for medical image applications. Differences between image types are primarily due to the imaging modality: medical images utilize a wide range of physics-based techniques, while natural images are captured using only visible light. While many have demonstrated that SSL on medical images has resulted in better downstream task performance, our work suggests that more performance can be gained. The scientific principles that are used to acquire medical images are not often considered when constructing learning problems. For this reason, we propose incorporating quantitative imaging principles during generative SSL to improve image quality and quantitative biological accuracy. We show that this training schema results in better starting states for downstream supervised training on limited data. Our model also generates images that validate on clinical quantitative analysis software.
Submitted 11 July, 2022; v1 submitted 14 June, 2022;
originally announced June 2022.
-
Tourbillon: a Physically Plausible Neural Architecture
Authors:
Mohammadamin Tavakoli,
Peter Sadowski,
Pierre Baldi
Abstract:
In a physical neural system, backpropagation is faced with a number of obstacles including: the need for labeled data, the violation of the locality learning principle, the need for symmetric connections, and the lack of modularity. Tourbillon is a new architecture that addresses all these limitations. At its core, it consists of a stack of circular autoencoders followed by an output layer. The circular autoencoders are trained in self-supervised mode by recirculation algorithms and the top layer in supervised mode by stochastic gradient descent, with the option of propagating error information through the entire stack using non-symmetric connections. While the Tourbillon architecture is meant primarily to address physical constraints, and not to improve current engineering applications of deep learning, we demonstrate its viability on standard benchmark datasets including MNIST, Fashion MNIST, and CIFAR10. We show that Tourbillon can achieve comparable performance to models trained with backpropagation and outperform models that are trained with other physically plausible algorithms, such as feedback alignment.
Submitted 22 July, 2021; v1 submitted 13 July, 2021;
originally announced July 2021.
-
LP WAN Gateway Location Selection Using Modified K-Dominating Set Algorithm
Authors:
Artur Frankiewicz,
Adam Glos,
Krzysztof Grochla,
Zbigniew Łaskarzewski,
Jarosław Miszczak,
Konrad Połys,
Przemysław Sadowski,
Anna Strzoda
Abstract:
LP WAN networks use gateways or base stations to communicate with devices distributed over large distances, up to tens of kilometres. The selection of optimal gateway locations in wireless networks should provide complete coverage for a given set of nodes, taking into account limitations such as the number of nodes served per access point or the required redundancy. In this paper, we describe the problem of selecting the base stations in a network using the concept of a $k$-dominating set. In our model, we include information about the required redundancy and spectral efficiency. We consider the additional requirements on the resulting connections and provide a greedy algorithm for solving the problem. The algorithm is evaluated on randomly generated network topologies and using the coordinates of sample real smart metering networks.
Submitted 9 October, 2020;
originally announced October 2020.
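As a rough illustration of the greedy idea mentioned in the abstract above, the sketch below selects gateway sites until every node is covered by at least k of them. It is a generic greedy multicover heuristic and not the authors' modified algorithm: it ignores per-gateway capacity and spectral efficiency, and the names `greedy_k_cover`, `coverage`, and the example sites are hypothetical.

```python
from typing import Dict, List, Set


def greedy_k_cover(coverage: Dict[str, Set[str]], nodes: Set[str], k: int) -> List[str]:
    """Choose candidate sites until every node is reached by at least k of them.

    coverage maps each candidate gateway site to the set of nodes in its range
    (an assumed input format, not taken from the paper).
    """
    deficit = {node: k for node in nodes}  # coverage still owed to each node
    remaining = dict(coverage)             # candidate sites not yet chosen
    chosen: List[str] = []

    def gain(site: str) -> int:
        # Number of still-underserved nodes this site would help.
        return sum(1 for n in remaining[site] if deficit.get(n, 0) > 0)

    while any(d > 0 for d in deficit.values()):
        # sorted() makes tie-breaking deterministic (alphabetical order).
        best = max(sorted(remaining), key=gain, default=None)
        if best is None or gain(best) == 0:
            raise ValueError("no feasible k-cover with the given candidate sites")
        for n in remaining.pop(best):
            if deficit.get(n, 0) > 0:
                deficit[n] -= 1
        chosen.append(best)
    return chosen


# Hypothetical example: four candidate sites, four meters, redundancy k = 2.
example_coverage = {
    "A": {"n1", "n2", "n3"},
    "B": {"n2", "n3", "n4"},
    "C": {"n1", "n4"},
    "D": {"n1", "n2", "n3", "n4"},
}
example_sites = greedy_k_cover(example_coverage, {"n1", "n2", "n3", "n4"}, k=2)
```

Greedy selection of this kind gives no optimality guarantee in general; it is shown only because it is simple to state and to check.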
-
Sherpa: Robust Hyperparameter Optimization for Machine Learning
Authors:
Lars Hertel,
Julian Collado,
Peter Sadowski,
Jordan Ott,
Pierre Baldi
Abstract:
Sherpa is a hyperparameter optimization library for machine learning models. It is specifically designed for problems with computationally expensive, iterative function evaluations, such as the hyperparameter tuning of deep neural networks. With Sherpa, scientists can quickly optimize hyperparameters using a variety of powerful and interchangeable algorithms. Sherpa can be run on either a single machine or in parallel on a cluster. Finally, an interactive dashboard enables users to view the progress of models as they are trained, cancel trials, and explore which hyperparameter combinations are working best. Sherpa empowers machine learning practitioners by automating the more tedious aspects of model tuning. Its source code and documentation are available at https://github.com/sherpa-ai/sherpa.
Submitted 8 May, 2020;
originally announced May 2020.
-
Thermodynamic Computing
Authors:
Tom Conte,
Erik DeBenedictis,
Natesh Ganesh,
Todd Hylton,
John Paul Strachan,
R. Stanley Williams,
Alexander Alemi,
Lee Altenberg,
Gavin Crooks,
James Crutchfield,
Lidia del Rio,
Josh Deutsch,
Michael DeWeese,
Khari Douglas,
Massimiliano Esposito,
Michael Frank,
Robert Fry,
Peter Harsha,
Mark Hill,
Christopher Kello,
Jeff Krichmar,
Suhas Kumar,
Shih-Chii Liu,
Seth Lloyd,
Matteo Marsili
, et al. (14 additional authors not shown)
Abstract:
The hardware and software foundations laid in the first half of the 20th Century enabled the computing technologies that have transformed the world, but these foundations are now under siege. The current computing paradigm, which is the foundation of much of the current standards of living that we now enjoy, faces fundamental limitations that are evident from several perspectives. In terms of hardware, devices have become so small that we are struggling to eliminate the effects of thermodynamic fluctuations, which are unavoidable at the nanometer scale. In terms of software, our ability to imagine and program effective computational abstractions and implementations is clearly challenged in complex domains. In terms of systems, currently five percent of the power generated in the US is used to run computing systems - this astonishing figure is neither ecologically sustainable nor economically scalable. Economically, the cost of building next-generation semiconductor fabrication plants has soared past $10 billion. All of these difficulties - device scaling, software complexity, adaptability, energy consumption, and fabrication economics - indicate that the current computing paradigm has matured and that continued improvements along this path will be limited. If technological progress is to continue and corresponding social and economic benefits are to continue to accrue, computing must become much more capable, energy efficient, and affordable. We propose that progress in computing can continue under a united, physically grounded, computational paradigm centered on thermodynamics. Herein we propose a research agenda to extend these thermodynamic foundations into complex, non-equilibrium, self-organizing systems and apply them holistically to future computing systems that will harness nature's innate computational capacity. We call this type of computing "Thermodynamic Computing" or TC.
Submitted 14 November, 2019; v1 submitted 5 November, 2019;
originally announced November 2019.
-
Machine Learning Kernel Method from a Quantum Generative Model
Authors:
Przemysław Sadowski
Abstract:
Recently, the use of Noisy Intermediate Scale Quantum (NISQ) devices for machine learning tasks has been proposed. The propositions often perform poorly due to various restrictions. However, quantum devices should perform well in sampling tasks. Thus, we recall the theory of the sampling-based approach to machine learning and propose a quantum sampling based classifier. Namely, we use a randomized feature map approach. We propose a method of quantum sampling based on random quantum circuits with a parametrized distribution of rotations. We obtain a simple-to-use method with intuitive hyper-parameters that performs at least as well as top out-of-the-box classical methods. In short, we obtain a competitive quantum classifier whose crucial component is quantum sampling -- a promising task for quantum supremacy.
Submitted 11 July, 2019;
originally announced July 2019.
-
Approximation of quantum control correction scheme using deep neural networks
Authors:
M. Ostaszewski,
J. A. Miszczak,
P. Sadowski,
L. Banchi
Abstract:
We study the functional relationship between quantum control pulses in the idealized case and the pulses in the presence of an unwanted drift. We show that a class of artificial neural networks called LSTM is able to model this functional relationship with high efficiency, and hence the correction scheme required to counterbalance the effect of the drift. Our solution allows studying the mapping from quantum control pulses to system dynamics and then analysing the robustness of the latter against local variations in the control profile.
Submitted 28 March, 2019; v1 submitted 14 March, 2018;
originally announced March 2018.
-
Geometrical versus time-series representation of data in quantum control learning
Authors:
M. Ostaszewski,
J. A. Miszczak,
P. Sadowski
Abstract:
Recently, machine learning techniques have become popular for analysing physical systems and solving problems occurring in quantum computing. In this paper we focus on using such techniques for finding the sequence of physical operations implementing the given quantum logical operation. In this context we analyse the flexibility of the data representation and compare the applicability of two machine learning approaches based on different representations of data. We demonstrate that the utilization of the geometrical structure of control pulses is sufficient for achieving high fidelity of the implemented evolution. We also demonstrate that artificial neural networks, unlike geometrical methods, possess the generalization abilities enabling them to generate control pulses for systems with variable strength of the disturbance. The presented results suggest that in some quantum control scenarios, geometrical data representation and processing is competitive with more complex methods.
Submitted 29 April, 2020; v1 submitted 14 March, 2018;
originally announced March 2018.
-
Quantum distance-based classifier with constant size memory, distributed knowledge and state recycling
Authors:
Przemysław Sadowski
Abstract:
In this work we examine a recently proposed distance-based classification method designed for near-term quantum processing units with limited resources. We further study possibilities to reduce the quantum resources without any decrease in efficiency. We show that only a part of the information undergoes coherent evolution, and this fact allows us to introduce an algorithm with significantly reduced quantum memory size. Additionally, considering only partial information at a time, we propose a classification protocol with information distributed among a number of agents. Finally, we show that the information evolution during a measurement can lead to a better solution and that the accuracy of the algorithm can be improved by harnessing the state after the final measurement.
Submitted 2 March, 2018;
originally announced March 2018.
-
Learning in the Machine: the Symmetries of the Deep Learning Channel
Authors:
Pierre Baldi,
Peter Sadowski,
Zhiqin Lu
Abstract:
In a physical neural system, learning rules must be local both in space and time. In order for learning to occur, non-local information must be communicated to the deep synapses through a communication channel, the deep learning channel. We identify several possible architectures for this learning channel (Bidirectional, Conjoined, Twin, Distinct) and six symmetry challenges: 1) symmetry of architectures; 2) symmetry of weights; 3) symmetry of neurons; 4) symmetry of derivatives; 5) symmetry of processing; and 6) symmetry of learning rules. Random backpropagation (RBP) addresses the second and third symmetry, and some of its variations, such as skipped RBP (SRBP) address the first and the fourth symmetry. Here we address the last two desirable symmetries showing through simulations that they can be achieved and that the learning channel is particularly robust to symmetry variations. Specifically, random backpropagation and its variations can be performed with the same non-linear neurons used in the main input-output forward channel, and the connections in the learning channel can be adapted using the same algorithm used in the forward channel, removing the need for any specialized hardware in the learning channel. Finally, we provide mathematical results in simple cases showing that the learning equations in the forward and backward channels converge to fixed points, for almost any initial conditions. In symmetric architectures, if the weights in both channels are small at initialization, adaptation in both channels leads to weights that are essentially symmetric during and after learning. Biological connections are discussed.
Submitted 22 December, 2017;
originally announced December 2017.
-
Efficient Antihydrogen Detection in Antimatter Physics by Deep Learning
Authors:
Peter Sadowski,
Balint Radics,
Ananya,
Yasunori Yamazaki,
Pierre Baldi
Abstract:
Antihydrogen is at the forefront of antimatter research at the CERN Antiproton Decelerator. Experiments aiming to test the fundamental CPT symmetry and antigravity effects require the efficient detection of antihydrogen annihilation events, which is performed using highly granular tracking detectors installed around an antimatter trap. Improving the efficiency of the antihydrogen annihilation detection plays a central role in the final sensitivity of the experiments. We propose deep learning as a novel technique to analyze antihydrogen annihilation data, and compare its performance with a traditional track and vertex reconstruction method. We report that the deep learning approach yields significant improvement, tripling event coverage while simultaneously improving performance by over 5% in terms of Area Under Curve (AUC).
Submitted 6 June, 2017;
originally announced June 2017.
-
Learning in the Machine: Random Backpropagation and the Deep Learning Channel
Authors:
Pierre Baldi,
Peter Sadowski,
Zhiqin Lu
Abstract:
Random backpropagation (RBP) is a variant of the backpropagation algorithm for training neural networks, where the transposes of the forward matrices are replaced by fixed random matrices in the calculation of the weight updates. It is remarkable both because of its effectiveness, in spite of using random matrices to communicate error information, and because it completely removes the taxing requirement of maintaining symmetric weights in a physical neural system. To better understand random backpropagation, we first connect it to the notions of local learning and learning channels. Through this connection, we derive several alternatives to RBP, including skipped RBP (SRBP), adaptive RBP (ARBP), sparse RBP, and their combinations (e.g. ASRBP) and analyze their computational complexity. We then study their behavior through simulations using the MNIST and CIFAR-10 benchmark datasets. These simulations show that most of these variants work robustly, almost as well as backpropagation, and that multiplication by the derivatives of the activation functions is important. As a follow-up, we also study the low end of the number of bits required to communicate error information over the learning channel. We then provide partial intuitive explanations for some of the remarkable properties of RBP and its variations. Finally, we prove several mathematical results, including the convergence to fixed points of linear chains of arbitrary length, the convergence to fixed points of linear autoencoders with decorrelated data, the long-term existence of solutions for linear systems with a single hidden layer and convergence in special cases, and the convergence to fixed points of non-linear chains, when the derivative of the activation functions is included.
Submitted 22 December, 2017; v1 submitted 8 December, 2016;
originally announced December 2016.
-
PANDA: Extreme Scale Parallel K-Nearest Neighbor on Distributed Architectures
Authors:
Md. Mostofa Ali Patwary,
Nadathur Rajagopalan Satish,
Narayanan Sundaram,
Jialin Liu,
Peter Sadowski,
Evan Racah,
Suren Byna,
Craig Tull,
Wahid Bhimji,
Prabhat,
Pradeep Dubey
Abstract:
Computing $k$-Nearest Neighbors (KNN) is one of the core kernels used in many machine learning, data mining and scientific computing applications. Although kd-tree based $O(\log n)$ algorithms have been proposed for computing KNN, due to their inherent sequentiality, linear algorithms are being used in practice. This limits the applicability of such methods to millions of data points, with limited scalability for Big Data analytics challenges in the scientific domain. In this paper, we present parallel and highly optimized kd-tree based KNN algorithms (both construction and querying) suitable for distributed architectures. Our algorithm includes novel approaches for pruning the search space and improving load balancing and partitioning among nodes and threads. Using TB-sized datasets from three science applications (astrophysics, plasma physics, and particle physics), we show that our implementation can construct a kd-tree of 189 billion particles in 48 seconds using $\sim$50,000 cores. We also demonstrate computation of KNN of 19 billion queries in 12 seconds. We demonstrate almost linear speedup for both shared and distributed memory computers. Our algorithms outperform earlier implementations by more than an order of magnitude, thereby radically improving the applicability of our implementation to state-of-the-art Big Data analytics problems. In addition, we showcase performance and scalability on the recently released Intel Xeon Phi processor, showing that our algorithm scales well even on massively parallel architectures.
Submitted 27 July, 2016;
originally announced July 2016.
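For readers unfamiliar with the kd-tree construction and querying that the abstract above refers to, here is a minimal single-machine sketch. It is not the PANDA implementation: there is no parallelism, load balancing, or distributed partitioning, and the names `KDNode`, `build_kdtree`, and `knn` are hypothetical. It only shows the basic median split and the pruning test that lets a query skip subtrees.

```python
import heapq
from dataclasses import dataclass
from typing import List, Optional, Sequence, Tuple

Point = Tuple[float, ...]


@dataclass
class KDNode:
    point: Point
    axis: int
    left: Optional["KDNode"]
    right: Optional["KDNode"]


def build_kdtree(points: Sequence[Point], depth: int = 0) -> Optional[KDNode]:
    """Recursively split the points at the median along a cycling axis."""
    if not points:
        return None
    axis = depth % len(points[0])
    ordered = sorted(points, key=lambda p: p[axis])
    mid = len(ordered) // 2
    return KDNode(
        ordered[mid],
        axis,
        build_kdtree(ordered[:mid], depth + 1),
        build_kdtree(ordered[mid + 1:], depth + 1),
    )


def knn(root: Optional[KDNode], query: Point, k: int) -> List[Point]:
    """Return the k stored points closest to query, nearest first."""
    if k <= 0:
        return []
    heap: List[Tuple[float, Point]] = []  # max-heap via negated squared distance

    def visit(node: Optional[KDNode]) -> None:
        if node is None:
            return
        d2 = sum((a - b) ** 2 for a, b in zip(node.point, query))
        if len(heap) < k:
            heapq.heappush(heap, (-d2, node.point))
        elif d2 < -heap[0][0]:
            heapq.heapreplace(heap, (-d2, node.point))
        diff = query[node.axis] - node.point[node.axis]
        near, far = (node.left, node.right) if diff < 0 else (node.right, node.left)
        visit(near)
        # Prune: cross the splitting plane only if it is closer to the query
        # than the current k-th best candidate (or fewer than k are known).
        if len(heap) < k or diff * diff < -heap[0][0]:
            visit(far)

    visit(root)
    return [p for _, p in sorted(heap, reverse=True)]
```

The pruning test is what makes queries sublinear on well-behaved data; the recursive, data-dependent traversal is also the sequential dependency that makes the structure harder to parallelize.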
-
Theano: A Python framework for fast computation of mathematical expressions
Authors:
The Theano Development Team,
Rami Al-Rfou,
Guillaume Alain,
Amjad Almahairi,
Christof Angermueller,
Dzmitry Bahdanau,
Nicolas Ballas,
Frédéric Bastien,
Justin Bayer,
Anatoly Belikov,
Alexander Belopolsky,
Yoshua Bengio,
Arnaud Bergeron,
James Bergstra,
Valentin Bisson,
Josh Bleecher Snyder,
Nicolas Bouchard,
Nicolas Boulanger-Lewandowski,
Xavier Bouthillier,
Alexandre de Brébisson,
Olivier Breuleux,
Pierre-Luc Carrier,
Kyunghyun Cho,
Jan Chorowski,
Paul Christiano
, et al. (88 additional authors not shown)
Abstract:
Theano is a Python library that allows users to define, optimize, and evaluate mathematical expressions involving multi-dimensional arrays efficiently. Since its introduction, it has been one of the most used CPU and GPU mathematical compilers - especially in the machine learning community - and has shown steady performance improvements. Theano has been actively and continuously developed since 2008; multiple frameworks have been built on top of it, and it has been used to produce many state-of-the-art machine learning models.
The present article is structured as follows. Section I provides an overview of the Theano software and its community. Section II presents the principal features of Theano and how to use them, and compares them with other similar projects. Section III focuses on recently-introduced functionalities and improvements. Section IV compares the performance of Theano against Torch7 and TensorFlow on several machine learning models. Section V discusses current limitations of Theano and potential ways of improving it.
Submitted 9 May, 2016;
originally announced May 2016.
-
Parameterized Machine Learning for High-Energy Physics
Authors:
Pierre Baldi,
Kyle Cranmer,
Taylor Faucett,
Peter Sadowski,
Daniel Whiteson
Abstract:
We investigate a new structure for machine learning classifiers applied to problems in high-energy physics by expanding the inputs to include not only measured features but also physics parameters. The physics parameters represent a smoothly varying learning task, and the resulting parameterized classifier can smoothly interpolate between them and replace sets of classifiers trained at individual values. This simplifies the training process and gives improved performance at intermediate values, even for complex problems requiring deep learning. Applications include tools parameterized in terms of theoretical model parameters, such as the mass of a particle, which allow for a single network to provide improved discrimination across a range of masses. This concept is simple to implement and allows for optimized interpolatable results.
Submitted 28 January, 2016;
originally announced January 2016.
-
Revealing Fundamental Physics from the Daya Bay Neutrino Experiment using Deep Neural Networks
Authors:
Evan Racah,
Seyoon Ko,
Peter Sadowski,
Wahid Bhimji,
Craig Tull,
Sang-Yun Oh,
Pierre Baldi,
Prabhat
Abstract:
Experiments in particle physics produce enormous quantities of data that must be analyzed and interpreted by teams of physicists. This analysis is often exploratory, where scientists are unable to enumerate the possible types of signal prior to performing the experiment. Thus, tools for summarizing, clustering, visualizing and classifying high-dimensional data are essential. In this work, we show that meaningful physical content can be revealed by transforming the raw data into a learned high-level representation using deep neural networks, with measurements taken at the Daya Bay Neutrino Experiment as a case study. We further show how convolutional deep neural networks can provide an effective classification filter with greater than 97% accuracy across different classes of physics events, significantly better than other machine learning approaches.
Submitted 6 December, 2016; v1 submitted 27 January, 2016;
originally announced January 2016.
-
Lively quantum walks on cycles
Authors:
Przemysław Sadowski,
Jarosław Adam Miszczak,
Mateusz Ostaszewski
Abstract:
We introduce a family of quantum walks on cycles parametrized by their liveliness, defined by the ability to execute a long-range move. We investigate the behaviour of the probability distribution and time-averaged probability distribution. We show that the liveliness parameter, controlling the magnitude of the additional long-range move, has a direct impact on the periodicity of the limiting distribution. We also show that the introduced model provides a method for network exploration which is robust against trapping.
Submitted 8 February, 2017; v1 submitted 9 December, 2015;
originally announced December 2015.
-
A Theory of Local Learning, the Learning Channel, and the Optimality of Backpropagation
Authors:
Pierre Baldi,
Peter Sadowski
Abstract:
In a physical neural system, where storage and processing are intimately intertwined, the rules for adjusting the synaptic weights can only depend on variables that are available locally, such as the activity of the pre- and post-synaptic neurons, resulting in local learning rules. A systematic framework for studying the space of local learning rules is obtained by first specifying the nature of the local variables, and then the functional form that ties them together into each learning rule. Such a framework enables also the systematic discovery of new learning rules and exploration of relationships between learning rules and group symmetries. We study polynomial local learning rules stratified by their degree and analyze their behavior and capabilities in both linear and non-linear units and networks. Stacking local learning rules in deep feedforward networks leads to deep local learning. While deep local learning can learn interesting representations, it cannot learn complex input-output functions, even when targets are available for the top layer. Learning complex input-output functions requires local deep learning where target information is communicated to the deep layers through a backward learning channel. The nature of the communicated information about the targets and the structure of the learning channel partition the space of learning algorithms. We estimate the learning channel capacity associated with several algorithms and show that backpropagation outperforms them by simultaneously maximizing the information rate and minimizing the computational cost, even in recurrent networks. The theory clarifies the concept of Hebbian learning, establishes the power and limitations of local learning rules, introduces the learning channel which enables a formal analysis of the optimality of backpropagation, and explains the sparsity of the space of learning rules discovered so far.
Submitted 21 October, 2016; v1 submitted 22 June, 2015;
originally announced June 2015.
-
Quantum image classification using principal component analysis
Authors:
Mateusz Ostaszewski,
Przemysław Sadowski,
Piotr Gawron
Abstract:
We present a novel quantum algorithm for classification of images. The algorithm is constructed using principal component analysis and von Neumann quantum measurements. In order to apply the algorithm we present a new quantum representation of grayscale images.
Submitted 2 April, 2015;
originally announced April 2015.
-
Learning Activation Functions to Improve Deep Neural Networks
Authors:
Forest Agostinelli,
Matthew Hoffman,
Peter Sadowski,
Pierre Baldi
Abstract:
Artificial neural networks typically have a fixed, non-linear activation function at each neuron. We have designed a novel form of piecewise linear activation function that is learned independently for each neuron using gradient descent. With this adaptive activation function, we are able to improve upon deep neural network architectures composed of static rectified linear units, achieving state-of-the-art performance on CIFAR-10 (7.51%), CIFAR-100 (30.83%), and a benchmark from high-energy physics involving Higgs boson decay modes.
Submitted 21 April, 2015; v1 submitted 21 December, 2014;
originally announced December 2014.
-
Enhanced Higgs to $τ^+τ^-$ Searches with Deep Learning
Authors:
Pierre Baldi,
Peter Sadowski,
Daniel Whiteson
Abstract:
The Higgs boson is thought to provide the interaction that imparts mass to the fundamental fermions, but while measurements at the Large Hadron Collider (LHC) are consistent with this hypothesis, current analysis techniques lack the statistical power to cross the traditional 5$σ$ significance barrier without more data. \emph{Deep learning} techniques have the potential to increase the statistical power of this analysis by \emph{automatically} learning complex, high-level data representations. In this work, deep neural networks are used to detect the decay of the Higgs to a pair of tau leptons. A Bayesian optimization algorithm is used to tune the network architecture and training algorithm hyperparameters, resulting in a deep network of eight non-linear processing layers that improves upon the performance of shallow classifiers even without the use of features specifically engineered by physicists for this application. The improvement in discovery significance is equivalent to an increase in the accumulated dataset of 25\%.
Submitted 13 October, 2014;
originally announced October 2014.
-
Quantum network exploration with a faulty sense of direction
Authors:
Jarosław Adam Miszczak,
Przemysław Sadowski
Abstract:
We develop a model which can be used to analyse the scenario of exploring a quantum network with a distracted sense of direction. Using this model we analyse the behaviour of quantum mobile agents operating with non-adaptive and adaptive strategies which can be employed in this scenario. We introduce the notion of node visiting suitable for analysing quantum superpositions of states by distinguishing between visiting and attaining a position. We show that without a proper model of adaptiveness, it is not possible for the party representing the distraction in the sense of direction to obtain results analogous to the classical case. Moreover, with additional control resources the total number of attained positions is maintained if the number of visited positions is strictly limited.
Submitted 25 March, 2014; v1 submitted 27 August, 2013;
originally announced August 2013.