-
Hardness of Learning Neural Networks under the Manifold Hypothesis
Authors:
Bobak T. Kiani,
Jason Wang,
Melanie Weber
Abstract:
The manifold hypothesis presumes that high-dimensional data lies on or near a low-dimensional manifold. While the utility of encoding geometric structure has been demonstrated empirically, rigorous analysis of its impact on the learnability of neural networks is largely missing. Several recent results have established hardness results for learning feedforward and equivariant neural networks under i.i.d. Gaussian or uniform Boolean data distributions. In this paper, we investigate the hardness of learning under the manifold hypothesis. We ask which minimal assumptions on the curvature and regularity of the manifold, if any, render the learning problem efficiently learnable. We prove that learning is hard under input manifolds of bounded curvature by extending proofs of hardness in the SQ and cryptographic settings for Boolean data inputs to the geometric setting. On the other hand, we show that additional assumptions on the volume of the data manifold alleviate these fundamental limitations and guarantee learnability via a simple interpolation argument. Notable instances of this regime are manifolds which can be reliably reconstructed via manifold learning. Looking forward, we comment on and empirically explore intermediate regimes of manifolds, which have heterogeneous features commonly found in real world data.
Submitted 3 June, 2024;
originally announced June 2024.
-
Bounds on the ground state energy of quantum $p$-spin Hamiltonians
Authors:
Eric R. Anschuetz,
David Gamarnik,
Bobak T. Kiani
Abstract:
We consider the problem of estimating the ground state energy of quantum $p$-local spin glass random Hamiltonians, the quantum analogues of widely studied classical spin glass models. Our main result shows that the maximum energy achievable by product states has a well-defined limit (for even $p$) as $n\to\infty$ and is $E_{\text{product}}^\ast=\sqrt{2 \log p}$ in the limit of large $p$. This value is interpreted as the maximal energy of a much simpler so-called Random Energy Model, widely studied in the setting of classical spin glasses. The proof of the limit existing follows from an extension of Fekete's Lemma after we demonstrate near super-additivity of the (normalized) quenched free energy. The proof of the value follows from a second moment method on the number of states achieving a given energy when restricting to an $ε$-net of product states.
Furthermore, we relate the maximal energy achieved over all states to a $p$-dependent constant $γ\left(p\right)$, which is defined by the degree of violation of a certain asymptotic independence ansatz over graph matchings. We show that the maximal energy achieved by all states $E^\ast\left(p\right)$ in the limit of large $n$ is at most $\sqrt{γ\left(p\right)}E_{\text{product}}^\ast$. We also prove using Lindeberg's interpolation method that the limiting $E^\ast\left(p\right)$ is robust with respect to the choice of the randomness and, for instance, also applies to the case of sparse random Hamiltonians. This robustness in the randomness extends to a wide range of random Hamiltonian models including SYK and random quantum max-cut.
Submitted 17 April, 2024; v1 submitted 3 April, 2024;
originally announced April 2024.
-
On the hardness of learning under symmetries
Authors:
Bobak T. Kiani,
Thien Le,
Hannah Lawrence,
Stefanie Jegelka,
Melanie Weber
Abstract:
We study the problem of learning equivariant neural networks via gradient descent. The incorporation of known symmetries ("equivariance") into neural nets has empirically improved the performance of learning pipelines, in domains ranging from biology to computer vision. However, a rich yet separate line of learning theoretic research has demonstrated that actually learning shallow, fully-connected (i.e. non-symmetric) networks has exponential complexity in the correlational statistical query (CSQ) model, a framework encompassing gradient descent. In this work, we ask: are known problem symmetries sufficient to alleviate the fundamental hardness of learning neural nets with gradient descent? We answer this question in the negative. In particular, we give lower bounds for shallow graph neural networks, convolutional networks, invariant polynomials, and frame-averaged networks for permutation subgroups, which all scale either superpolynomially or exponentially in the relevant input dimension. Therefore, in spite of the significant inductive bias imparted via symmetry, actually learning the complete classes of functions represented by equivariant neural networks via gradient descent remains hard.
Submitted 3 January, 2024;
originally announced January 2024.
-
Product states optimize quantum $p$-spin models for large $p$
Authors:
Eric R. Anschuetz,
David Gamarnik,
Bobak T. Kiani
Abstract:
We consider the problem of estimating the maximal energy of quantum $p$-local spin glass random Hamiltonians, the quantum analogues of widely studied classical spin glass models. Denoting by $E^*(p)$ the (appropriately normalized) maximal energy in the limit of a large number of qubits $n$, we show that $E^*(p)$ approaches $\sqrt{2\log 6}$ as $p$ increases. This value is interpreted as the maximal energy of a much simpler so-called Random Energy Model, widely studied in the setting of classical spin glasses.
Our most notable and (arguably) surprising result proves the existence of near-maximal energy states which are product states, and thus not entangled. Specifically, we prove that with high probability as $n\to\infty$, for any $E<E^*(p)$ there exists a product state with energy $\geq E$ at sufficiently large constant $p$. Even more surprisingly, this remains true even when restricting to tensor products of Pauli eigenstates. Our approximations go beyond what is known from monogamy-of-entanglement style arguments -- the best of which, in this normalization, achieve approximation error growing with $n$. Our results not only challenge prevailing beliefs in physics that extremely low-temperature states of random local Hamiltonians should exhibit non-negligible entanglement, but they also imply that classical algorithms can be just as effective as quantum algorithms in optimizing Hamiltonians with large locality -- though performing such optimization is still likely a hard problem.
Our results are robust with respect to the choice of the randomness (disorder) and apply to the case of sparse random Hamiltonian using Lindeberg's interpolation method. The proof of the main result is obtained by estimating the expected trace of the associated partition function, and then matching its asymptotics with the extremal energy of product states using the second moment method.
Submitted 5 April, 2024; v1 submitted 20 September, 2023;
originally announced September 2023.
-
Self-Supervised Learning with Lie Symmetries for Partial Differential Equations
Authors:
Grégoire Mialon,
Quentin Garrido,
Hannah Lawrence,
Danyal Rehman,
Yann LeCun,
Bobak T. Kiani
Abstract:
Machine learning for differential equations paves the way for computationally efficient alternatives to numerical solvers, with potentially broad impacts in science and engineering. Though current algorithms typically require simulated training data tailored to a given setting, one may instead wish to learn useful information from heterogeneous sources, or from real dynamical systems observations that are messy or incomplete. In this work, we learn general-purpose representations of PDEs from heterogeneous data by implementing joint embedding methods for self-supervised learning (SSL), a framework for unsupervised representation learning that has had notable success in computer vision. Our representation outperforms baseline approaches to invariant tasks, such as regressing the coefficients of a PDE, while also improving the time-stepping performance of neural solvers. We hope that our proposed methodology will prove useful in the eventual development of general-purpose foundation models for PDEs. Code: https://github.com/facebookresearch/SSLForPDEs.
Submitted 14 February, 2024; v1 submitted 11 July, 2023;
originally announced July 2023.
-
Equivariant Polynomials for Graph Neural Networks
Authors:
Omri Puny,
Derek Lim,
Bobak T. Kiani,
Haggai Maron,
Yaron Lipman
Abstract:
Graph Neural Networks (GNN) are inherently limited in their expressive power. Recent seminal works (Xu et al., 2019; Morris et al., 2019b) introduced the Weisfeiler-Lehman (WL) hierarchy as a measure of expressive power. Although this hierarchy has propelled significant advances in GNN analysis and architecture developments, it suffers from several significant limitations. These include a complex definition that lacks direct guidance for model improvement and a WL hierarchy that is too coarse to study current GNNs. This paper introduces an alternative expressive power hierarchy based on the ability of GNNs to calculate equivariant polynomials of a certain degree. As a first step, we provide a full characterization of all equivariant graph polynomials by introducing a concrete basis, significantly generalizing previous results. Each basis element corresponds to a specific multi-graph, and its computation over some graph data input corresponds to a tensor contraction problem. Second, we propose algorithmic tools for evaluating the expressiveness of GNNs using tensor contraction sequences, and calculate the expressive power of popular GNNs. Finally, we enhance the expressivity of common GNN architectures by adding polynomial features or additional operations / aggregations inspired by our theory. These enhanced GNNs demonstrate state-of-the-art results in experiments across multiple graph learning benchmarks.
Submitted 4 June, 2023; v1 submitted 22 February, 2023;
originally announced February 2023.
-
The SSL Interplay: Augmentations, Inductive Bias, and Generalization
Authors:
Vivien Cabannes,
Bobak T. Kiani,
Randall Balestriero,
Yann LeCun,
Alberto Bietti
Abstract:
Self-supervised learning (SSL) has emerged as a powerful framework to learn representations from raw data without supervision. Yet in practice, engineers face issues such as instability in tuning optimizers and collapse of representations during training. Such challenges motivate the need for a theory to shed light on the complex interplay between the choice of data augmentation, network architecture, and training algorithm. We study such an interplay with a precise analysis of generalization performance on both pretraining and downstream tasks in a theory friendly setup, and highlight several insights for SSL practitioners that arise from our theory.
Submitted 1 June, 2023; v1 submitted 6 February, 2023;
originally announced February 2023.
-
Efficient classical algorithms for simulating symmetric quantum systems
Authors:
Eric R. Anschuetz,
Andreas Bauer,
Bobak T. Kiani,
Seth Lloyd
Abstract:
In light of recently proposed quantum algorithms that incorporate symmetries in the hope of quantum advantage, we show that with symmetries that are restrictive enough, classical algorithms can efficiently emulate their quantum counterparts given certain classical descriptions of the input. Specifically, we give classical algorithms that calculate ground states and time-evolved expectation values for permutation-invariant Hamiltonians specified in the symmetrized Pauli basis with runtimes polynomial in the system size. We use tensor-network methods to transform symmetry-equivariant operators to the block-diagonal Schur basis that is of polynomial size, and then perform exact matrix multiplication or diagonalization in this basis. These methods are adaptable to a wide range of input and output states including those prescribed in the Schur basis, as matrix product states, or as arbitrary quantum states when given the power to apply low depth circuits and single qubit measurements.
Submitted 21 November, 2023; v1 submitted 30 November, 2022;
originally announced November 2022.
-
Joint Embedding Self-Supervised Learning in the Kernel Regime
Authors:
Bobak T. Kiani,
Randall Balestriero,
Yubei Chen,
Seth Lloyd,
Yann LeCun
Abstract:
The fundamental goal of self-supervised learning (SSL) is to produce useful representations of data without access to any labels for classifying the data. Modern methods in SSL, which form representations based on known or constructed relationships between samples, have been particularly effective at this task. Here, we aim to extend this framework to incorporate algorithms based on kernel methods where embeddings are constructed by linear maps acting on the feature space of a kernel. In this kernel regime, we derive methods to find the optimal form of the output representations for contrastive and non-contrastive loss functions. This procedure produces a new representation space with an inner product denoted as the induced kernel which generally correlates points which are related by an augmentation in kernel space and de-correlates points otherwise. We analyze our kernel model on small datasets to identify common features of self-supervised learning algorithms and gain theoretical insights into their performance on downstream tasks.
Submitted 29 September, 2022;
originally announced September 2022.
-
Beyond Barren Plateaus: Quantum Variational Algorithms Are Swamped With Traps
Authors:
Eric R. Anschuetz,
Bobak T. Kiani
Abstract:
One of the most important properties of classical neural networks is how surprisingly trainable they are, though their training algorithms typically rely on optimizing complicated, nonconvex loss functions. Previous results have shown that unlike the case in classical neural networks, variational quantum models are often not trainable. The most studied phenomenon is the onset of barren plateaus in the training landscape of these quantum models, typically when the models are very deep. This focus on barren plateaus has made the phenomenon almost synonymous with the trainability of quantum models. Here, we show that barren plateaus are only a part of the story. We prove that a wide class of variational quantum models -- which are shallow, and exhibit no barren plateaus -- have only a superpolynomially small fraction of local minima within any constant energy from the global minimum, rendering these models untrainable if no good initial guess of the optimal parameters is known. We also study the trainability of variational quantum algorithms from a statistical query framework, and show that noisy optimization of a wide variety of quantum models is impossible with a sub-exponential number of queries. Finally, we numerically confirm our results on a variety of problem instances. Though we exclude a wide variety of quantum algorithms here, we give reason for optimism for certain classes of variational algorithms and discuss potential ways forward in showing the practical utility of such algorithms.
Submitted 28 September, 2022; v1 submitted 11 May, 2022;
originally announced May 2022.
-
Block-encoding dense and full-rank kernels using hierarchical matrices: applications in quantum numerical linear algebra
Authors:
Quynh T. Nguyen,
Bobak T. Kiani,
Seth Lloyd
Abstract:
Many quantum algorithms for numerical linear algebra assume black-box access to a block-encoding of the matrix of interest, which is a strong assumption when the matrix is not sparse. Kernel matrices, which arise from discretizing a kernel function $k(x,x')$, have a variety of applications in mathematics and engineering. They are generally dense and full-rank. Classically, the celebrated fast multipole method performs matrix multiplication on kernel matrices of dimension $N$ in time almost linear in $N$ by using the linear algebraic framework of hierarchical matrices. In light of this success, we propose a block-encoding scheme of the hierarchical matrix structure on a quantum computer. When applied to many physical kernel matrices, our method can improve the runtime of solving quantum linear systems of dimension $N$ to $O(κ\operatorname{polylog}(\frac{N}{\varepsilon}))$, where $κ$ and $\varepsilon$ are the condition number and error bound of the matrix operation. This runtime is near-optimal and, in terms of $N$, exponentially improves over prior quantum linear systems algorithms in the case of dense and full-rank kernel matrices. We discuss possible applications of our methodology in solving integral equations and accelerating computations in N-body problems.
Submitted 6 December, 2022; v1 submitted 27 January, 2022;
originally announced January 2022.
-
Implicit Bias of Linear Equivariant Networks
Authors:
Hannah Lawrence,
Kristian Georgiev,
Andrew Dienes,
Bobak T. Kiani
Abstract:
Group equivariant convolutional neural networks (G-CNNs) are generalizations of convolutional neural networks (CNNs) which excel in a wide range of technical applications by explicitly encoding symmetries, such as rotations and permutations, in their architectures. Although the success of G-CNNs is driven by their \emph{explicit} symmetry bias, a recent line of work has proposed that the \emph{implicit} bias of training algorithms on particular architectures is key to understanding generalization for overparameterized neural nets. In this context, we show that $L$-layer full-width linear G-CNNs trained via gradient descent for binary classification converge to solutions with low-rank Fourier matrix coefficients, regularized by the $2/L$-Schatten matrix norm. Our work strictly generalizes previous analysis on the implicit bias of linear CNNs to linear G-CNNs over all finite groups, including the challenging setting of non-commutative groups (such as permutations), as well as band-limited G-CNNs over infinite groups. We validate our theorems via experiments on a variety of groups, and empirically explore more realistic nonlinear networks, which locally capture similar regularization patterns. Finally, we provide intuitive interpretations of our Fourier space implicit regularization results in real space via uncertainty principles.
Submitted 12 September, 2022; v1 submitted 12 October, 2021;
originally announced October 2021.
-
Quantum algorithms for group convolution, cross-correlation, and equivariant transformations
Authors:
Grecia Castelazo,
Quynh T. Nguyen,
Giacomo De Palma,
Dirk Englund,
Seth Lloyd,
Bobak T. Kiani
Abstract:
Group convolutions and cross-correlations, which are equivariant to the actions of group elements, are commonly used in mathematics to analyze or take advantage of symmetries inherent in a given problem setting. Here, we provide efficient quantum algorithms for performing linear group convolutions and cross-correlations on data stored as quantum states. Runtimes for our algorithms are logarithmic in the dimension of the group thus offering an exponential speedup compared to classical algorithms when input data is provided as a quantum state and linear operations are well conditioned. Motivated by the rich literature on quantum algorithms for solving algebraic problems, our theoretical framework opens a path for quantizing many algorithms in machine learning and numerical methods that employ group operations.
Submitted 6 September, 2022; v1 submitted 23 September, 2021;
originally announced September 2021.
-
Hamiltonian singular value transformation and inverse block encoding
Authors:
Seth Lloyd,
Bobak T. Kiani,
David R. M. Arvidsson-Shukur,
Samuel Bosch,
Giacomo De Palma,
William M. Kaminsky,
Zi-Wen Liu,
Milad Marvian
Abstract:
The quantum singular value transformation is a powerful quantum algorithm that allows one to apply a polynomial transformation to the singular values of a matrix that is embedded as a block of a unitary transformation. This paper shows how to perform the quantum singular value transformation for a matrix that can be embedded as a block of a Hamiltonian. The transformation can be implemented in a purely Hamiltonian context by the alternating application of Hamiltonians for chosen intervals: it is an example of the Quantum Alternating Operator Ansatz (generalized QAOA). We also show how to use the Hamiltonian quantum singular value transformation to perform inverse block encoding to implement a unitary of which a given Hamiltonian is a block. Inverse block encoding leads to novel procedures for matrix multiplication and for solving differential equations on quantum information processors in a purely Hamiltonian fashion.
Submitted 30 May, 2021; v1 submitted 3 April, 2021;
originally announced April 2021.
-
Learning quantum data with the quantum Earth Mover's distance
Authors:
Bobak Toussi Kiani,
Giacomo De Palma,
Milad Marvian,
Zi-Wen Liu,
Seth Lloyd
Abstract:
Quantifying how far the output of a learning algorithm is from its target is an essential task in machine learning. However, in quantum settings, the loss landscapes of commonly used distance metrics often produce undesirable outcomes such as poor local minima and exponentially decaying gradients. To overcome these obstacles, we consider here the recently proposed quantum earth mover's (EM) or Wasserstein-1 distance as a quantum analog to the classical EM distance. We show that the quantum EM distance possesses unique properties, not found in other commonly used quantum distance metrics, that make quantum learning more stable and efficient. We propose a quantum Wasserstein generative adversarial network (qWGAN) which takes advantage of the quantum EM distance and provides an efficient means of performing learning on quantum data. We provide examples where our qWGAN is capable of learning a diverse set of quantum data with only resources polynomial in the number of qubits.
Submitted 16 May, 2022; v1 submitted 8 January, 2021;
originally announced January 2021.
-
Quantum advantage for differential equation analysis
Authors:
Bobak T. Kiani,
Giacomo De Palma,
Dirk Englund,
William Kaminsky,
Milad Marvian,
Seth Lloyd
Abstract:
Quantum algorithms for both differential equation solving and for machine learning potentially offer an exponential speedup over all known classical algorithms. However, there also exist obstacles to obtaining this potential speedup in useful problem instances. The essential obstacle for quantum differential equation solving is that outputting useful information may require difficult post-processing, and the essential obstacle for quantum machine learning is that inputting the training set is a difficult task just by itself. In this paper, we demonstrate that, when combined, these difficulties solve one another. We show how the output of quantum differential equation solving can serve as the input for quantum machine learning, allowing dynamical analysis in terms of principal components, power spectra, and wavelet decompositions. To illustrate this, we consider continuous time Markov processes on epidemiological and social networks. These quantum algorithms provide an exponential advantage over existing classical Monte Carlo methods.
Submitted 26 April, 2022; v1 submitted 29 October, 2020;
originally announced October 2020.
-
Adversarial Robustness Guarantees for Random Deep Neural Networks
Authors:
Giacomo De Palma,
Bobak T. Kiani,
Seth Lloyd
Abstract:
The reliability of deep learning algorithms is fundamentally challenged by the existence of adversarial examples, which are incorrectly classified inputs that are extremely close to a correctly classified input. We explore the properties of adversarial examples for deep neural networks with random weights and biases, and prove that for any $p\ge1$, the $\ell^p$ distance of any given input from the classification boundary scales as one over the square root of the dimension of the input times the $\ell^p$ norm of the input. The results are based on the recently proved equivalence between Gaussian processes and deep neural networks in the limit of infinite width of the hidden layers, and are validated with experiments on both random deep neural networks and deep neural networks trained on the MNIST and CIFAR10 datasets. The results constitute a fundamental advance in the theoretical understanding of adversarial examples, and open the way to a thorough theoretical characterization of the relation between network architecture and robustness to adversarial perturbations.
Submitted 22 July, 2021; v1 submitted 13 April, 2020;
originally announced April 2020.
-
Quantum Medical Imaging Algorithms
Authors:
Bobak Toussi Kiani,
Agnes Villanyi,
Seth Lloyd
Abstract:
A central task in medical imaging is the reconstruction of an image or function from data collected by medical devices (e.g., CT, MRI, and PET scanners). We provide quantum algorithms for image reconstruction with exponential speedup over classical counterparts when data is input as a quantum state. Since outputs of our algorithms are stored in quantum states, individual pixels of reconstructed images may not be efficiently accessed classically; instead, we discuss various methods to extract information from outputs using a variety of quantum post-processing algorithms.
Submitted 23 April, 2020; v1 submitted 4 April, 2020;
originally announced April 2020.
-
Learning Unitaries by Gradient Descent
Authors:
Bobak Toussi Kiani,
Seth Lloyd,
Reevu Maity
Abstract:
We study the hardness of learning unitary transformations in $U(d)$ via gradient descent on time parameters of alternating operator sequences. We provide numerical evidence that, despite the non-convex nature of the loss landscape, gradient descent always converges to the target unitary when the sequence contains $d^2$ or more parameters. Rates of convergence indicate a "computational phase transition." With fewer than $d^2$ parameters, gradient descent converges to a sub-optimal solution, whereas with more than $d^2$ parameters, gradient descent converges exponentially to an optimal solution.
Submitted 18 February, 2020; v1 submitted 31 January, 2020;
originally announced January 2020.
-
Random deep neural networks are biased towards simple functions
Authors:
Giacomo De Palma,
Bobak Toussi Kiani,
Seth Lloyd
Abstract:
We prove that the binary classifiers of bit strings generated by random wide deep neural networks with ReLU activation function are biased towards simple functions. The simplicity is captured by the following two properties. For any given input bit string, the average Hamming distance of the closest input bit string with a different classification is at least sqrt(n / (2π log n)), where n is the length of the string. Moreover, if the bits of the initial string are flipped randomly, the average number of flips required to change the classification grows linearly with n. These results are confirmed by numerical experiments on deep neural networks with two hidden layers, and settle the conjecture stating that random deep neural networks are biased towards simple functions. This conjecture was proposed and numerically explored in [Valle Pérez et al., ICLR 2019] to explain the unreasonably good generalization properties of deep learning algorithms. The probability distribution of the functions generated by random deep neural networks is a good choice for the prior probability distribution in the PAC-Bayesian generalization bounds. Our results constitute a fundamental step forward in the characterization of this distribution, therefore contributing to the understanding of the generalization properties of deep learning algorithms.
Submitted 23 October, 2019; v1 submitted 25 December, 2018;
originally announced December 2018.