-
On the Expressive Power of Sparse Geometric MPNNs
Authors:
Yonatan Sverdlov,
Nadav Dym
Abstract:
Motivated by applications in chemistry and other sciences, we study the expressive power of message-passing neural networks for geometric graphs, whose node features correspond to 3-dimensional positions. Recent work has shown that such models can separate generic pairs of non-equivalent geometric graphs, though they may fail to separate some rare and complicated instances. However, these results assume a fully connected graph, where each node possesses complete knowledge of all other nodes. In contrast, in applications, each node often possesses knowledge of only a small number of nearest neighbors. This paper shows that generic pairs of non-equivalent geometric graphs can be separated by message-passing networks with rotation equivariant features as long as the underlying graph is connected. When only invariant intermediate features are allowed, generic separation is guaranteed for generically globally rigid graphs. We introduce a simple architecture, EGENNET, which achieves our theoretical guarantees and compares favorably with alternative architectures on synthetic and chemical benchmarks.
Submitted 2 July, 2024;
originally announced July 2024.
-
On the Hölder Stability of Multiset and Graph Neural Networks
Authors:
Yair Davidson,
Nadav Dym
Abstract:
Famously, multiset neural networks based on sum-pooling can separate all distinct multisets, and as a result can be used by message passing neural networks (MPNNs) to separate all pairs of graphs that can be separated by the 1-WL graph isomorphism test. However, the quality of this separation may be very weak, to the extent that the embeddings of "separable" multisets and graphs might even be considered identical when using fixed finite precision.
In this work, we propose to fully analyze the separation quality of multiset models and MPNNs via a novel adaptation of Lipschitz and Hölder continuity to parametric functions. We prove that common sum-based models are lower-Hölder continuous, with a Hölder exponent that decays rapidly with the network's depth. Our analysis leads to adversarial examples of graphs which can be separated by three 1-WL iterations, but cannot be separated in practice by standard maximally powerful MPNNs. To remedy this, we propose two novel MPNNs with improved separation quality, one of which is lower Lipschitz continuous. We show these MPNNs can easily classify our adversarial examples, and compare favorably with standard MPNNs on standard graph learning tasks.
Submitted 11 June, 2024;
originally announced June 2024.
-
Injective Sliced-Wasserstein embedding for weighted sets and point clouds
Authors:
Tal Amir,
Nadav Dym
Abstract:
We present the $\textit{Sliced Wasserstein Embedding}$ $\unicode{x2014}$ a novel method to embed multisets and distributions over $\mathbb{R}^d$ into Euclidean space. Our embedding is injective and approximately preserves the Sliced Wasserstein distance. Moreover, when restricted to multisets, it is bi-Lipschitz. We also prove that it is $\textit{impossible}$ to embed distributions over $\mathbb{R}^d$ into a Euclidean space in a bi-Lipschitz manner, even under the assumption that their support is bounded and finite. We demonstrate empirically that our embedding offers practical advantage in learning tasks over existing methods for handling multisets.
Submitted 26 May, 2024;
originally announced May 2024.
-
A transversality theorem for semi-algebraic sets with application to signal recovery from the second moment and cryo-EM
Authors:
Tamir Bendory,
Nadav Dym,
Dan Edidin,
Arun Suresh
Abstract:
Semi-algebraic priors are ubiquitous in signal processing and machine learning. Prevalent examples include a) linear models where the signal lies in a low-dimensional subspace; b) sparse models where the signal can be represented by only a few coefficients under a suitable basis; and c) a large family of neural network generative models. In this paper, we prove a transversality theorem for semi-algebraic sets in orthogonal or unitary representations of groups: with a suitable dimension bound, a generic translate of any semi-algebraic set is transverse to the orbits of the group action. This, in turn, implies that if a signal lies in a low-dimensional semi-algebraic set, then it can be recovered uniquely from measurements that separate orbits.
As an application, we consider the implications of the transversality theorem to the problem of recovering signals that are translated by random group actions from their second moment. As a special case, we discuss cryo-EM: a leading technology to constitute the spatial structure of biological molecules, which serves as our prime motivation. In particular, we derive explicit bounds for recovering a molecular structure from the second moment under a semi-algebraic prior and deduce information-theoretic implications. We also obtain information-theoretic bounds for three additional applications: factoring Gram matrices, multi-reference alignment, and phase retrieval. Finally, we deduce bounds for designing permutation invariant separators in machine learning.
Submitted 10 June, 2024; v1 submitted 7 May, 2024;
originally announced May 2024.
-
Equivariant Frames and the Impossibility of Continuous Canonicalization
Authors:
Nadav Dym,
Hannah Lawrence,
Jonathan W. Siegel
Abstract:
Canonicalization provides an architecture-agnostic method for enforcing equivariance, with generalizations such as frame-averaging recently gaining prominence as a lightweight and flexible alternative to equivariant architectures. Recent works have found an empirical benefit to using probabilistic frames instead, which learn weighted distributions over group elements. In this work, we provide strong theoretical justification for this phenomenon: for commonly-used groups, there is no efficiently computable choice of frame that preserves continuity of the function being averaged. In other words, unweighted frame-averaging can turn a smooth, non-symmetric function into a discontinuous, symmetric function. To address this fundamental robustness problem, we formally define and construct \emph{weighted} frames, which provably preserve continuity, and demonstrate their utility by constructing efficient and continuous weighted frames for the actions of $SO(2)$, $SO(3)$, and $S_n$ on point clouds.
Submitted 18 June, 2024; v1 submitted 25 February, 2024;
originally announced February 2024.
-
Weisfeiler Leman for Euclidean Equivariant Machine Learning
Authors:
Snir Hordan,
Tal Amir,
Nadav Dym
Abstract:
The $k$-Weisfeiler-Leman ($k$-WL) graph isomorphism test hierarchy is a common method for assessing the expressive power of graph neural networks (GNNs). Recently, GNNs whose expressive power is equivalent to the $2$-WL test were proven to be universal on weighted graphs which encode $3\mathrm{D}$ point cloud data, yet this result is limited to invariant continuous functions on point clouds. In this paper, we extend this result in three ways: Firstly, we show that PPGN can simulate $2$-WL uniformly on all point clouds with low complexity. Secondly, we show that $2$-WL tests can be extended to point clouds which include both positions and velocities, a scenario often encountered in applications. Finally, we provide a general framework for proving equivariant universality and leverage it to prove that a simple modification of this invariant PPGN architecture can be used to obtain a universal equivariant architecture that can approximate all continuous equivariant functions uniformly. Building on our results, we develop our WeLNet architecture, which sets new state-of-the-art results on the N-Body dynamics task and the GEOM-QM9 molecular conformation generation task.
Submitted 26 June, 2024; v1 submitted 4 February, 2024;
originally announced February 2024.
-
Future Directions in the Theory of Graph Machine Learning
Authors:
Christopher Morris,
Fabrizio Frasca,
Nadav Dym,
Haggai Maron,
İsmail İlkan Ceylan,
Ron Levie,
Derek Lim,
Michael Bronstein,
Martin Grohe,
Stefanie Jegelka
Abstract:
Machine learning on graphs, especially using graph neural networks (GNNs), has seen a surge in interest due to the wide availability of graph data across a broad spectrum of disciplines, from life to social and engineering sciences. Despite their practical success, our theoretical understanding of the properties of GNNs remains highly incomplete. Recent theoretical advancements primarily focus on elucidating the coarse-grained expressive power of GNNs, predominantly employing combinatorial techniques. However, these studies do not perfectly align with practice, particularly in understanding the generalization behavior of GNNs when trained with stochastic first-order optimization techniques. In this position paper, we argue that the graph machine learning community needs to shift its attention to developing a balanced theory of graph machine learning, focusing on a more thorough understanding of the interplay of expressive power, generalization, and optimization.
Submitted 14 June, 2024; v1 submitted 3 February, 2024;
originally announced February 2024.
-
Phase retrieval with semi-algebraic and ReLU neural network priors
Authors:
Tamir Bendory,
Nadav Dym,
Dan Edidin,
Arun Suresh
Abstract:
The key ingredient to retrieving a signal from its Fourier magnitudes, namely, to solve the phase retrieval problem, is an effective prior on the sought signal. In this paper, we study the phase retrieval problem under the prior that the signal lies in a semi-algebraic set. This is a very general prior as semi-algebraic sets include linear models, sparse models, and ReLU neural network generative models. The latter is the main motivation of this paper, due to the remarkable success of deep generative models in a variety of imaging tasks, including phase retrieval. We prove that almost all signals in R^N can be determined from their Fourier magnitudes, up to a sign, if they lie in a (generic) semi-algebraic set of dimension N/2. The same is true for all signals if the semi-algebraic set is of dimension N/4. We also generalize these results to the problem of signal recovery from the second moment in multi-reference alignment models with multiplicity free representations of compact groups. This general result is then used to derive improved sample complexity bounds for recovering band-limited functions on the sphere from their noisy copies, each acted upon by a random element of SO(3).
Submitted 15 November, 2023;
originally announced November 2023.
-
Equivariant Deep Weight Space Alignment
Authors:
Aviv Navon,
Aviv Shamsian,
Ethan Fetaya,
Gal Chechik,
Nadav Dym,
Haggai Maron
Abstract:
Permutation symmetries of deep networks make basic operations like model merging and similarity estimation challenging. In many cases, aligning the weights of the networks, i.e., finding optimal permutations between their weights, is necessary. Unfortunately, weight alignment is an NP-hard problem. Prior research has mainly focused on solving relaxed versions of the alignment problem, leading to either time-consuming methods or sub-optimal solutions. To accelerate the alignment process and improve its quality, we propose a novel framework aimed at learning to solve the weight alignment problem, which we name Deep-Align. To that end, we first prove that weight alignment adheres to two fundamental symmetries and then, propose a deep architecture that respects these symmetries. Notably, our framework does not require any labeled data. We provide a theoretical analysis of our approach and evaluate Deep-Align on several types of network architectures and learning setups. Our experimental results indicate that a feed-forward pass with Deep-Align produces better or equivalent alignments compared to those produced by current optimization algorithms. Additionally, our alignments can be used as an effective initialization for other methods, leading to improved solutions with a significant speedup in convergence.
Submitted 31 May, 2024; v1 submitted 20 October, 2023;
originally announced October 2023.
-
Neural Injective Functions for Multisets, Measures and Graphs via a Finite Witness Theorem
Authors:
Tal Amir,
Steven J. Gortler,
Ilai Avni,
Ravina Ravina,
Nadav Dym
Abstract:
Injective multiset functions have a key role in the theoretical study of machine learning on multisets and graphs. Yet, there remains a gap between the provably injective multiset functions considered in theory, which typically rely on polynomial moments, and the multiset functions used in practice, which rely on $\textit{neural moments}$ $\unicode{x2014}$ whose injectivity on multisets has not been studied to date.
In this paper, we bridge this gap by showing that moments of neural networks do define injective multiset functions, provided that an analytic non-polynomial activation is used. The number of moments required by our theory is optimal essentially up to a multiplicative factor of two. To prove this result, we state and prove a $\textit{finite witness theorem}$, which is of independent interest.
As a corollary to our main theorem, we derive new approximation results for functions on multisets and measures, and new separation results for graph neural networks. We also provide two negative results: (1) moments of piecewise-linear neural networks cannot be injective multiset functions; and (2) even when moment-based multiset functions are injective, they can never be bi-Lipschitz.
Submitted 29 October, 2023; v1 submitted 10 June, 2023;
originally announced June 2023.
-
Complete Neural Networks for Complete Euclidean Graphs
Authors:
Snir Hordan,
Tal Amir,
Steven J. Gortler,
Nadav Dym
Abstract:
Neural networks for point clouds, which respect their natural invariance to permutation and rigid motion, have enjoyed recent success in modeling geometric phenomena, from molecular dynamics to recommender systems. Yet, to date, no model with polynomial complexity is known to be complete, that is, able to distinguish between any pair of non-isomorphic point clouds. We fill this theoretical gap by showing that point clouds can be completely determined, up to permutation and rigid motion, by applying the 3-WL graph isomorphism test to the point cloud's centralized Gram matrix. Moreover, we formulate a Euclidean variant of the 2-WL test and show that it is also sufficient to achieve completeness. We then show how our complete Euclidean WL tests can be simulated by a Euclidean graph neural network of moderate size and demonstrate their separation capability on highly symmetrical point clouds.
Submitted 9 April, 2024; v1 submitted 31 January, 2023;
originally announced January 2023.
-
Symmetrized Robust Procrustes: Constant-Factor Approximation and Exact Recovery
Authors:
Tal Amir,
Shahar Kovalsky,
Nadav Dym
Abstract:
The classical $\textit{Procrustes}$ problem is to find a rigid motion (orthogonal transformation and translation) that best aligns two given point-sets in the least-squares sense. The $\textit{Robust Procrustes}$ problem is an important variant, in which a power-1 objective is used instead of least squares to improve robustness to outliers. While the optimal solution of the least-squares problem can be easily computed in closed form, dating back to Schönemann (1966), no such solution is known for the power-1 problem. In this paper we propose a novel convex relaxation for the Robust Procrustes problem. Our relaxation enjoys several theoretical and practical advantages: Theoretically, we prove that our method provides a $\sqrt{2}$-factor approximation to the Robust Procrustes problem, and that, under appropriate assumptions, it exactly recovers the true rigid motion from point correspondences contaminated by outliers. In practice, we find in numerical experiments on both synthetic and real robust Procrustes problems, that our method performs similarly to the standard Iteratively Reweighted Least Squares (IRLS). However the convexity of our algorithm allows incorporating additional convex penalties, which are not readily amenable to IRLS. This turns out to be a substantial advantage, leading to improved results in high-dimensional problems, including non-rigid shape alignment and semi-supervised interlingual word translation.
Submitted 18 July, 2022;
originally announced July 2022.
-
Low Dimensional Invariant Embeddings for Universal Geometric Learning
Authors:
Nadav Dym,
Steven J. Gortler
Abstract:
This paper studies separating invariants: mappings on $D$ dimensional domains which are invariant to an appropriate group action, and which separate orbits. The motivation for this study comes from the usefulness of separating invariants in proving universality of equivariant neural network architectures.
We observe that in several cases the cardinality of separating invariants proposed in the machine learning literature is much larger than the dimension $D$. As a result, the theoretical universal constructions based on these separating invariants are unrealistically large. Our goal in this paper is to resolve this issue.
We show that when a continuous family of semi-algebraic separating invariants is available, separation can be obtained by randomly selecting $2D+1$ of these invariants. We apply this methodology to obtain an efficient scheme for computing separating invariants for several classical group actions which have been studied in the invariant learning literature. Examples include matrix multiplication actions on point clouds by permutations, rotations, and various other linear groups.
Often the requirement of invariant separation is relaxed and only generic separation is required. In this case, we show that only $D+1$ invariants are required. More importantly, generic invariants are often significantly easier to compute, as we illustrate by discussing generic and full separation for weighted graphs. Finally we outline an approach for proving that separating invariants can be constructed also when the random parameters have finite precision.
Submitted 21 November, 2023; v1 submitted 5 May, 2022;
originally announced May 2022.
-
A Simple and Universal Rotation Equivariant Point-cloud Network
Authors:
Ben Finkelshtein,
Chaim Baskin,
Haggai Maron,
Nadav Dym
Abstract:
Equivariance to permutations and rigid motions is an important inductive bias for various 3D learning problems. Recently it has been shown that the equivariant Tensor Field Network architecture is universal -- it can approximate any equivariant function. In this paper we suggest a much simpler architecture, prove that it enjoys the same universality guarantees and evaluate its performance on ModelNet40. The code to reproduce our experiments is available at https://github.com/simpleinvariance/UniversalNetwork
Submitted 27 May, 2022; v1 submitted 2 March, 2022;
originally announced March 2022.
-
Neural Network Approximation of Refinable Functions
Authors:
Ingrid Daubechies,
Ronald DeVore,
Nadav Dym,
Shira Faigenbaum-Golovin,
Shahar Z. Kovalsky,
Kung-Ching Lin,
Josiah Park,
Guergana Petrova,
Barak Sober
Abstract:
In the desire to quantify the success of neural networks in deep learning and other applications, there is a great interest in understanding which functions are efficiently approximated by the outputs of neural networks. By now, there exists a variety of results which show that a wide range of functions can be approximated with sometimes surprising accuracy by these outputs. For example, it is known that the set of functions that can be approximated with exponential accuracy (in terms of the number of parameters used) includes, on one hand, very smooth functions such as polynomials and analytic functions (see e.g. \cite{E,S,Y}) and, on the other hand, very rough functions such as the Weierstrass function (see e.g. \cite{EPGB,DDFHP}), which is nowhere differentiable. In this paper, we add to the latter class of rough functions by showing that it also includes refinable functions. Namely, we show that refinable functions are approximated by the outputs of deep ReLU networks with a fixed width and increasing depth with accuracy exponential in terms of their number of parameters. Our results apply to functions used in the standard construction of wavelets as well as to functions constructed via subdivision algorithms in Computer Aided Geometric Design.
Submitted 28 July, 2021;
originally announced July 2021.
-
On the Universality of Rotation Equivariant Point Cloud Networks
Authors:
Nadav Dym,
Haggai Maron
Abstract:
Learning functions on point clouds has applications in many fields, including computer vision, computer graphics, physics, and chemistry. Recently, there has been a growing interest in neural architectures that are invariant or equivariant to all three shape-preserving transformations of point clouds: translation, rotation, and permutation.
In this paper, we present a first study of the approximation power of these architectures. We first derive two sufficient conditions for an equivariant architecture to have the universal approximation property, based on a novel characterization of the space of equivariant polynomials. We then use these conditions to show that two recently suggested models are universal, and to devise two other novel universal architectures.
Submitted 5 October, 2020;
originally announced October 2020.
-
Stable Phase Retrieval from Locally Stable and Conditionally Connected Measurements
Authors:
Cheng Cheng,
Ingrid Daubechies,
Nadav Dym,
Jianfeng Lu
Abstract:
This paper is concerned with stable phase retrieval for a family of phase retrieval models we name "locally stable and conditionally connected" (LSCC) measurement schemes. For every signal $f$, we associate a corresponding weighted graph $G_f$, defined by the LSCC measurement scheme, and show that the phase retrievability of the signal $f$ is determined by the connectivity of $G_f$. We then characterize the phase retrieval stability of the signal $f$ by two measures that are commonly used in graph theory to quantify graph connectivity: the Cheeger constant of $G_f$ for real valued signals, and the algebraic connectivity of $G_f$ for complex valued signals.
We use our results to study the stability of two phase retrieval models that can be cast as LSCC measurement schemes, and focus on understanding for which signals the "curse of dimensionality" can be avoided. The first model we discuss is a finite-dimensional model for locally supported measurements such as the windowed Fourier transform. For signals "without large holes", we show the stability constant exhibits only a mild polynomial growth in the dimension, in stark contrast with the exponential growth which uniform stability constants tend to suffer from; more precisely, in $R^d$ the constant grows proportionally to $d^{1/2}$, while in $C^d$ it grows proportionally to $d$. We also show the growth of the constant in the complex case cannot be reduced, suggesting that complex phase retrieval is substantially more difficult than real phase retrieval. The second model we consider is an infinite-dimensional phase retrieval problem in a principal shift invariant space. We show that despite the infinite dimensionality of this model, signals with monotone exponential decay will have a finite stability constant. In contrast, the stability bound provided by our results will be infinite if the signal's decay is polynomial.
Submitted 21 June, 2020;
originally announced June 2020.
-
Quasi Branch and Bound for Smooth Global Optimization
Authors:
Nadav Dym
Abstract:
Quasi branch and bound is a recently introduced generalization of branch and bound, where lower bounds are replaced by a relaxed notion of quasi-lower bounds, required to be lower bounds only for sub-cubes containing a minimizer. This paper is devoted to studying the possible benefits of this approach, for the problem of minimizing a smooth function over a cube. This is accomplished by suggesting two quasi branch and bound algorithms, qBnB(2) and qBnB(3), that compare favorably with alternative branch and bound algorithms.
The first algorithm we propose, qBnB(2), achieves second order convergence based only on a bound on second derivatives, without requiring calculation of derivatives. As such, this algorithm is suitable for derivative free optimization, for which typical algorithms such as Lipschitz optimization only have first order convergence and so suffer from limited accuracy due to the clustering problem. Additionally, qBnB(2) is provably more efficient than the second order Lipschitz gradient algorithm which does require exact calculation of gradients.
The second algorithm we propose, qBnB(3), has third order convergence and finite termination. In contrast with BnB algorithms with similar guarantees, which typically compute lower bounds by solving relatively time-consuming convex optimization problems, calculation of qBnB(3) bounds only requires a small number of Newton iterations. Our experiments verify the potential of both these methods in comparison with state-of-the-art branch and bound algorithms.
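For orientation, the first-order Lipschitz baseline that the abstract contrasts with can be sketched in one dimension. This is a minimal illustration of classical branch and bound with true lower bounds, not qBnB(2) or qBnB(3); the function name `lipschitz_bnb`, the interval-bisection rule, and the tolerance handling are assumptions made for this sketch.

```python
import heapq

def lipschitz_bnb(f, lo, hi, L, tol=1e-4):
    """Minimize f on [lo, hi], assuming L is a Lipschitz constant of f.

    Classical (first-order) branch and bound, shown only as a baseline;
    it is not the quasi branch and bound method of the abstract.
    """
    def node(a, b):
        c = 0.5 * (a + b)
        fc = f(c)
        # |f(x) - f(c)| <= L * |x - c| <= L * (b - a) / 2 for x in [a, b]
        lower = fc - L * 0.5 * (b - a)
        return (lower, a, b, c, fc)

    first = node(lo, hi)
    best_x, best_f = first[3], first[4]
    heap = [first]
    while heap:
        lower, a, b, c, fc = heapq.heappop(heap)
        if lower > best_f - tol:
            break  # every remaining interval is within tol of the best value
        for child in (node(a, c), node(c, b)):
            if child[4] < best_f:
                best_x, best_f = child[3], child[4]
            if child[0] < best_f - tol:
                heapq.heappush(heap, child)  # keep only intervals that may improve
    return best_x, best_f

x_star, f_star = lipschitz_bnb(lambda x: (x - 0.3) ** 2, 0.0, 1.0, L=2.0)
```

Because the bound is only first order, many small intervals near the minimizer survive pruning, which is the clustering problem the abstract mentions.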
Submitted 27 May, 2020;
originally announced May 2020.
-
Expression of Fractals Through Neural Network Functions
Authors:
Nadav Dym,
Barak Sober,
Ingrid Daubechies
Abstract:
To help understand the underlying mechanisms of neural networks (NNs), several groups have, in recent years, studied the number of linear regions $\ell$ of piecewise linear functions generated by deep neural networks (DNNs). In particular, they showed that $\ell$ can grow exponentially with the number of network parameters $p$, a property often used to explain the advantages of DNNs over shallow NNs in approximating complicated functions. Nonetheless, a simple dimension argument shows that DNNs cannot generate all piecewise linear functions with $\ell$ linear regions as soon as $\ell > p$. It is thus natural to seek to characterize specific families of functions with $\ell$ linear regions that can be constructed by DNNs. Iterated Function Systems (IFS) generate sequences of piecewise linear functions $F_k$ with a number of linear regions exponential in $k$. We show that, under mild assumptions, $F_k$ can be generated by an NN using only $\mathcal{O}(k)$ parameters. IFS are used extensively to generate, at low computational cost, natural-looking landscape textures in artificial images. They have also been proposed for compression of natural images, albeit with less commercial success. The surprisingly good performance of this fractal-based compression suggests that our visual system may lock in, to some extent, on self-similarities in images. The combination of this phenomenon with the capacity, demonstrated here, of DNNs to efficiently approximate IFS may contribute to the success of DNNs, particularly striking for image processing tasks, as well as suggest new algorithms for representing self-similarities in images based on the DNN mechanism.
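The exponential-regions-from-linear-parameters phenomenon can be seen in a textbook example, sketched below. It uses the tent map rather than the paper's IFS construction, and the helper names (`tent_layer`, `count_linear_regions`) are assumptions for this illustration. Each layer has two ReLU units with fixed weights, so a depth-$k$ network has $\mathcal{O}(k)$ parameters, yet the composed function has $2^k$ linear regions on $[0, 1]$.

```python
def relu(x):
    return max(x, 0.0)

def tent_layer(x):
    # One hidden layer with two ReLU units; equals 1 - |2x - 1| on [0, 1].
    return 2.0 * relu(x) - 4.0 * relu(x - 0.5)

def iterate(x, k):
    # Depth-k network: the same two-unit layer composed k times.
    for _ in range(k):
        x = tent_layer(x)
    return x

def count_linear_regions(k):
    # Sample on a dyadic grid that resolves every breakpoint j / 2**k;
    # all values are exact in floating point because they are dyadic.
    n = 2 ** (k + 2)
    ys = [iterate(i / n, k) for i in range(n + 1)]
    slopes = [(ys[i + 1] - ys[i]) * n for i in range(n)]
    changes = sum(1 for a, b in zip(slopes, slopes[1:]) if a != b)
    return changes + 1

regions = [count_linear_regions(k) for k in range(1, 7)]
```

Here `regions[k - 1]` is the number of linear pieces of the depth-`k` composition, which doubles with every added layer.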
Submitted 27 May, 2019;
originally announced May 2019.
-
Linearly Converging Quasi Branch and Bound Algorithms for Global Rigid Registration
Authors:
Nadav Dym,
Shahar Ziv Kovalsky
Abstract:
In recent years, several branch-and-bound (BnB) algorithms have been proposed to globally optimize rigid registration problems. In this paper, we suggest a general framework to improve upon the BnB approach, which we name Quasi BnB. Quasi BnB replaces the linear lower bounds used in BnB algorithms with quadratic quasi-lower bounds which are based on the quadratic behavior of the energy in the vicinity of the global minimum. While quasi-lower bounds are not truly lower bounds, the Quasi-BnB algorithm is globally optimal. In fact, we prove that it exhibits linear convergence: it achieves $\epsilon$-accuracy in $O(\log(1/\epsilon))$ time, while the time complexity of other rigid registration BnB algorithms is polynomial in $1/\epsilon$. Our experiments verify that Quasi-BnB is significantly more efficient than state-of-the-art BnB algorithms, especially for problems where high accuracy is desired.
Submitted 14 April, 2019; v1 submitted 3 April, 2019;
originally announced April 2019.
-
A Linear Variational Principle for Riemann Mappings and Discrete Conformality
Authors:
Nadav Dym,
Yaron Lipman,
Raz Slutsky
Abstract:
We consider Riemann mappings from bounded Lipschitz domains in the plane to a triangle. We show that in this case the Riemann mapping has a linear variational principle: it is the minimizer of the Dirichlet energy over an appropriate affine space. By discretizing the variational principle in a natural way we obtain discrete conformal maps which can be computed by solving a sparse linear system. We show that these discrete conformal maps converge to the Riemann mapping in $H^1$, even for non-Delaunay triangulations. Additionally, for Delaunay triangulations the discrete conformal maps converge uniformly and are known to be bijective. As a consequence we show that the Riemann mapping between two bounded Lipschitz domains can be uniformly approximated by composing the Riemann mappings between each Lipschitz domain and the triangle.
Submitted 11 February, 2018; v1 submitted 6 November, 2017;
originally announced November 2017.
-
Sinkhorn Algorithm for Lifted Assignment Problems
Authors:
Yam Kushinsky,
Haggai Maron,
Nadav Dym,
Yaron Lipman
Abstract:
Recently, Sinkhorn's algorithm was applied to approximately solve linear programs emerging from optimal transport very efficiently. This was accomplished by formulating a regularized version of the linear program as a Bregman projection problem onto the polytope of doubly-stochastic matrices, and then computing the projection using the efficient Sinkhorn algorithm, which is based on alternating closed-form Bregman projections on the larger polytopes of row-stochastic and column-stochastic matrices. In this paper we suggest a generalization of this algorithm for solving a well-known lifted linear program relaxation of the Quadratic Assignment Problem (QAP), which is known as the Johnson Adams (JA) Relaxation. First, an efficient algorithm for Bregman projection onto the JA polytope by alternating closed-form Bregman projections onto one-sided local polytopes is devised. The one-sided polytopes can be seen as a high-dimensional, generalized version of the row/column-stochastic polytopes. Second, a new method for solving the original linear programs using the Bregman projections onto the JA polytope is developed and shown to be more accurate and numerically stable than the standard approach of driving the regularizer to zero. The resulting algorithm is considerably more scalable than standard linear solvers and is able to solve significantly larger linear programs.
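The classical Sinkhorn iteration that this work generalizes can be sketched as follows. This is a minimal pure-Python illustration of the standard doubly-stochastic case only, not the JA-polytope algorithm of the paper; the function name `sinkhorn`, the regularization weight `eps`, and the fixed iteration count are assumptions for this sketch. Each Bregman (KL) projection onto the row-stochastic or column-stochastic polytope has a closed form: normalize the rows, or normalize the columns.

```python
import math

def sinkhorn(cost, eps=0.5, iters=500):
    """Entropy-regularized assignment for a square cost matrix (list of lists)."""
    n = len(cost)
    # Starting point: the Gibbs kernel exp(-cost / eps), entrywise positive.
    P = [[math.exp(-c / eps) for c in row] for row in cost]
    for _ in range(iters):
        # KL projection onto row-stochastic matrices: rescale each row to sum to 1.
        P = [[v / sum(row) for v in row] for row in P]
        # KL projection onto column-stochastic matrices: rescale each column.
        col = [sum(P[i][j] for i in range(n)) for j in range(n)]
        P = [[P[i][j] / col[j] for j in range(n)] for i in range(n)]
    return P

cost = [[0.0, 1.0, 2.0],
        [1.0, 0.0, 1.0],
        [2.0, 1.0, 0.0]]
P = sinkhorn(cost)
row_sums = [sum(row) for row in P]
col_sums = [sum(P[i][j] for i in range(3)) for j in range(3)]
```

After enough iterations both `row_sums` and `col_sums` are close to one, so `P` is approximately doubly stochastic; smaller values of `eps` push it closer to a permutation matrix at the cost of slower convergence and possible underflow.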
Submitted 19 July, 2018; v1 submitted 23 July, 2017;
originally announced July 2017.
-
Exact Recovery with Symmetries for the Doubly-Stochastic Relaxation
Authors:
Nadav Dym
Abstract:
Graph matching, or quadratic assignment, is the problem of labeling the vertices of two graphs so that they are as similar as possible. A common method for approximately solving the NP-hard graph matching problem is relaxing it to a convex optimization problem over the set of doubly stochastic (DS) matrices. Recent analysis has shown that for almost all pairs of isomorphic and asymmetric graphs, the DS relaxation succeeds in correctly retrieving the isomorphism between the graphs. Our goal in this paper is to analyze the case of symmetric isomorphic graphs. This goal is motivated by shape matching applications where the graphs of interest usually have reflective symmetry.
For symmetric problems the graph matching problem has multiple isomorphisms and so convex relaxations admit all convex combinations of these isomorphisms as viable solutions. If the convex relaxation does not admit any additional superfluous solution we say that it is convex exact. In this case there are tractable algorithms to retrieve an isomorphism from the convex relaxation.
We show that convex exactness depends strongly on the symmetry group of the graphs: for a fixed symmetry group $G$, either the DS relaxation will be convex exact for almost all pairs of isomorphic graphs with symmetry group $G$, or the DS relaxation will fail for all such pairs. We show that for reflective groups with at least one full orbit convex exactness holds almost everywhere, and provide some simple examples of non-reflective symmetry groups for which convex exactness always fails.
When convex exactness holds, the isomorphisms of the graphs are the extreme points of the convex solution set. We suggest an efficient algorithm for retrieving an isomorphism in this case. We also show that the "convex to concave" projection method will retrieve an isomorphism in this case.
Submitted 22 May, 2017;
originally announced May 2017.
-
DS++: A flexible, scalable and provably tight relaxation for matching problems
Authors:
Nadav Dym,
Haggai Maron,
Yaron Lipman
Abstract:
Correspondence problems are often modelled as quadratic optimization problems over permutations. Common scalable methods for approximating solutions of these NP-hard problems are the spectral relaxation for non-convex energies and the doubly stochastic (DS) relaxation for convex energies. Lately, it has been demonstrated that semidefinite programming relaxations can have considerably improved accuracy at the price of a much higher computational cost. We present a convex quadratic programming relaxation which is provably stronger than both DS and spectral relaxations, with the same scalability as the DS relaxation. The derivation of the relaxation also naturally suggests a projection method for achieving meaningful integer solutions which improves upon the standard closest-permutation projection. Our method can be easily extended to optimization over doubly stochastic matrices, partial or injective matching, and problems with additional linear constraints. We employ recent advances in optimization of linear-assignment type problems to achieve an efficient algorithm for solving the convex relaxation.
We present experiments indicating that our method is more accurate than local minimization or competing relaxations for non-convex problems. We successfully apply our algorithm to shape matching and to the problem of ordering images in a grid, obtaining results which compare favorably with state-of-the-art methods. We believe our results indicate that our method should be considered the method of choice for quadratic optimization over permutations.
Submitted 17 May, 2017;
originally announced May 2017.
-
Exact Recovery with Symmetries for Procrustes Matching
Authors:
Nadav Dym,
Yaron Lipman
Abstract:
The Procrustes matching (PM) problem is the problem of finding the optimal rigid motion and labeling of two point sets so that they are as close as possible. Both rigid and non-rigid shape matching problems can be formulated as PM problems. Recently [Maron et al.] presented a novel convex semi-definite programming relaxation (PM-SDP) for PM which achieves state of the art results on common shape matching benchmarks.
In this paper we analyze the success of PM-SDP in solving PM problems without noise (Exact PM problems). We begin by showing Exact PM to be computationally equivalent to the graph isomorphism problem. We demonstrate some natural theoretical properties of the relaxation, and use these properties together with the moment interpretation of [Lasserre] to show that for exact PM problems and for (generic) input shapes which are asymmetric or bilaterally symmetric, the relaxation returns a correct solution of PM.
For symmetric shapes, PM has multiple solutions. The non-convex set of optimal solutions of PM is strictly contained in the convex set of optimal solutions of PM-SDP, so that "most" solutions of PM-SDP will not be solutions of PM. We deal with this by showing the solution set of PM to be the extreme points of the solution set of PM-SDP, and suggesting a random algorithm which returns a solution of PM with probability one, and returns all solutions of PM with equal probability. We also show these results can be extended to the almost-exact case.
To the best of our knowledge, our work is the first to achieve exact recovery in the presence of multiple solutions.
Submitted 29 November, 2017; v1 submitted 5 June, 2016;
originally announced June 2016.
-
Spatial Recurrence for Ergodic Fractal Measures
Authors:
Nadav Dym
Abstract:
We discuss an invertible version of Furstenberg's "Ergodic CP Shift Systems". We show that the explicit regularity of these dynamical systems with respect to magnification of measures implies a certain regularity with respect to translation of measures: we show that the translation action on measures is non-singular, and prove pointwise discrete and continuous ergodic theorems for the translation action.
Submitted 9 February, 2016;
originally announced February 2016.