-
Wavelets Are All You Need for Autoregressive Image Generation
Authors:
Wael Mattar,
Idan Levy,
Nir Sharon,
Shai Dekel
Abstract:
In this paper, we take a new approach to autoregressive image generation that is based on two main ingredients. The first is wavelet image coding, which allows us to tokenize the visual details of an image from coarse to fine by ordering the information starting with the most significant bits of the most significant wavelet coefficients. The second is a variant of a language transformer whose architecture is re-designed and optimized for token sequences in this 'wavelet language'. The transformer learns the significant statistical correlations within a token sequence, which are the manifestations of well-known correlations between the wavelet subbands at various resolutions. We show experimental results with conditioning on the generation process.
Submitted 28 June, 2024;
originally announced June 2024.
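The coarse-to-fine token ordering mentioned in the abstract above can be illustrated with a simplified bit-plane coder. This is only a sketch under stated assumptions, not the authors' tokenizer: it assumes the wavelet coefficients have already been computed and quantized to integers, it scans coefficients in plain index order, and the token names (`POS`, `NEG`, `ZERO`, `1`, `0`) and the functions `bitplane_tokens` and `decode_tokens` are hypothetical.

```python
from typing import List, Sequence, Tuple


def bitplane_tokens(coeffs: Sequence[int]) -> Tuple[int, List[str]]:
    """Emit tokens from the most significant bit-plane down to the least.

    Hypothetical token set (an assumption, not taken from the paper):
      POS / NEG : coefficient becomes significant in this plane (with sign)
      ZERO      : coefficient is still insignificant in this plane
      1 / 0     : refinement bit of an already significant coefficient
    """
    max_mag = max((abs(c) for c in coeffs), default=0)
    top = max(max_mag.bit_length() - 1, 0)  # index of the highest bit-plane
    significant = [False] * len(coeffs)
    tokens: List[str] = []
    for plane in range(top, -1, -1):
        for i, c in enumerate(coeffs):
            bit = (abs(c) >> plane) & 1
            if significant[i]:
                tokens.append("1" if bit else "0")
            elif bit:
                significant[i] = True
                tokens.append("NEG" if c < 0 else "POS")
            else:
                tokens.append("ZERO")
    return top, tokens


def decode_tokens(top: int, tokens: Sequence[str], n: int) -> List[int]:
    """Rebuild coefficients from any prefix of the token sequence."""
    mags = [0] * n
    signs = [1] * n
    it = iter(tokens)
    for plane in range(top, -1, -1):
        for i in range(n):
            tok = next(it, None)
            if tok is None:  # prefix exhausted: return the coarse estimate
                return [s * m for s, m in zip(signs, mags)]
            if tok in ("POS", "NEG"):
                mags[i] |= 1 << plane
                signs[i] = -1 if tok == "NEG" else 1
            elif tok == "1":
                mags[i] |= 1 << plane
    return [s * m for s, m in zip(signs, mags)]


coeffs = [13, -6, 0, 3]
top, tokens = bitplane_tokens(coeffs)
coarse = decode_tokens(top, tokens[:4], len(coeffs))  # first bit-plane only
exact = decode_tokens(top, tokens, len(coeffs))       # full sequence
```

Because every prefix of the token sequence decodes to a coarser approximation of the same coefficients, a model that generates such tokens left to right would refine the image progressively. Practical wavelet coders also exploit cross-subband structure, which this sketch omits.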
-
Reverse Engineering Self-Supervised Learning
Authors:
Ido Ben-Shaul,
Ravid Shwartz-Ziv,
Tomer Galanti,
Shai Dekel,
Yann LeCun
Abstract:
Self-supervised learning (SSL) is a powerful tool in machine learning, but understanding the learned representations and their underlying mechanisms remains a challenge. This paper presents an in-depth empirical analysis of SSL-trained representations, encompassing diverse models, architectures, and hyperparameters. Our study reveals an intriguing aspect of the SSL training process: it inherently facilitates the clustering of samples with respect to semantic labels, which is surprisingly driven by the SSL objective's regularization term. This clustering process not only enhances downstream classification but also compresses the data information. Furthermore, we establish that SSL-trained representations align more closely with semantic classes than with random classes. Remarkably, we show that learned representations align with semantic classes across various hierarchical levels, and this alignment increases during training and when moving deeper into the network. Our findings provide valuable insights into SSL's representation learning mechanisms and their impact on performance across different sets of classes.
Submitted 31 May, 2023; v1 submitted 24 May, 2023;
originally announced May 2023.
-
Deep Convolutional Tables: Deep Learning without Convolutions
Authors:
Shay Dekel,
Yosi Keller,
Aharon Bar-Hillel
Abstract:
We propose a novel formulation of deep networks that do not use dot-product neurons and rely instead on a hierarchy of voting tables, denoted as Convolutional Tables (CT), to enable accelerated CPU-based inference. Convolutional layers are the most time-consuming bottleneck in contemporary deep learning techniques, severely limiting their use in Internet of Things and CPU-based devices. The proposed CT performs a fern operation at each image location: it encodes the location environment into a binary index and uses the index to retrieve the desired local output from a table. The results of multiple tables are combined to derive the final output. The computational complexity of a CT transformation is independent of the patch (filter) size and grows gracefully with the number of channels, outperforming comparable convolutional layers. A CT is shown to have a better capacity:compute ratio than dot-product neurons, and deep CT networks are shown to exhibit a universal approximation property similar to neural networks. As the transformation involves computing discrete indices, we derive a soft relaxation and gradient-based approach for training the CT hierarchy. Deep CT networks have been experimentally shown to have accuracy comparable to that of CNNs of similar architectures. In the low compute regime, they enable an error:speed trade-off superior to alternative efficient CNN architectures.
Submitted 23 April, 2023;
originally announced April 2023.
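The fern operation described in the abstract above (encode the neighbourhood of a location as a binary index, then look the output up in a table) can be sketched as follows. This is a minimal illustration under assumptions, not the authors' implementation: each bit is taken to be a thresholded difference of two pixels at fixed offsets, borders are handled by clamping, the outputs of multiple tables are combined by summation, and the names `fern_index` and `ct_layer` are hypothetical.

```python
from typing import List, Sequence, Tuple

Offset = Tuple[int, int]
BitTest = Tuple[Offset, Offset, float]             # (offset a, offset b, threshold)
Fern = Tuple[Sequence[BitTest], Sequence[float]]   # (K bit tests, table of 2**K outputs)


def _pixel(image: Sequence[Sequence[float]], row: int, col: int) -> float:
    """Read a pixel, clamping coordinates to the image border (an assumption)."""
    h, w = len(image), len(image[0])
    return image[min(max(row, 0), h - 1)][min(max(col, 0), w - 1)]


def fern_index(image: Sequence[Sequence[float]], row: int, col: int,
               bit_tests: Sequence[BitTest]) -> int:
    """Encode the neighbourhood of (row, col) as a K-bit integer index."""
    index = 0
    for (dr1, dc1), (dr2, dc2), threshold in bit_tests:
        diff = _pixel(image, row + dr1, col + dc1) - _pixel(image, row + dr2, col + dc2)
        index = (index << 1) | (1 if diff > threshold else 0)
    return index


def ct_layer(image: Sequence[Sequence[float]], ferns: Sequence[Fern]) -> List[List[float]]:
    """Apply every fern at every location and sum the looked-up table entries."""
    h, w = len(image), len(image[0])
    out = [[0.0] * w for _ in range(h)]
    for bit_tests, table in ferns:
        for r in range(h):
            for c in range(w):
                out[r][c] += table[fern_index(image, r, c, bit_tests)]
    return out


image = [[5.0, 1.0],
         [9.0, 5.0]]
tests = [((0, 0), (0, 1), 0.0),   # brighter than the right neighbour?
         ((0, 0), (1, 0), 0.0)]   # brighter than the neighbour below?
table = [0.0, 1.0, 2.0, 3.0]      # one output per 2-bit index
result = ct_layer(image, [(tests, table)])
```

In this sketch the work per location is K comparisons and one table lookup per fern, regardless of how far apart the compared pixels are, which is in line with the abstract's remark that the cost does not depend on the patch size.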
-
Estimating Extreme 3D Image Rotation with Transformer Cross-Attention
Authors:
Shay Dekel,
Yosi Keller,
Martin Cadik
Abstract:
The estimation of large and extreme image rotation plays a key role in multiple computer vision domains, where the rotated images are related by a limited or a non-overlapping field of view. Contemporary approaches apply convolutional neural networks to compute a 4D correlation volume to estimate the relative rotation between image pairs. In this work, we propose a cross-attention-based approach that utilizes CNN feature maps and a Transformer-Encoder to compute the cross-attention between the activation maps of the image pairs, which is shown to be an improved equivalent of the 4D correlation volume used in previous works. In the suggested approach, higher attention scores are associated with image regions that encode visual cues of rotation. Our approach is end-to-end trainable and optimizes a simple regression loss. It is experimentally shown to outperform contemporary state-of-the-art schemes when applied to commonly used image rotation datasets and benchmarks, and establishes a new state-of-the-art accuracy on these datasets. We make our code publicly available.
Submitted 8 March, 2024; v1 submitted 5 March, 2023;
originally announced March 2023.
-
Numerical Methods For PDEs Over Manifolds Using Spectral Physics Informed Neural Networks
Authors:
Yuval Zelig,
Shai Dekel
Abstract:
We introduce an approach for solving PDEs over manifolds using physics informed neural networks whose architecture aligns with spectral methods. The networks are trained to take in as input samples of an initial condition, a time stamp and point(s) on the manifold and then output the solution's value at the given time and point(s). We provide proofs of our method for the heat equation on the interval and examples of unique network architectures that are adapted to nonlinear equations on the sphere and the torus. We also show that our spectral-inspired neural network architectures outperform the standard physics informed architectures. Our extensive experimental results include generalization studies where the testing dataset of initial conditions is randomly sampled from a significantly larger space than the training set.
Submitted 3 September, 2023; v1 submitted 10 February, 2023;
originally announced February 2023.
-
Exploring the Approximation Capabilities of Multiplicative Neural Networks for Smooth Functions
Authors:
Ido Ben-Shaul,
Tomer Galanti,
Shai Dekel
Abstract:
Multiplication layers are a key component in various influential neural network modules, including self-attention and hypernetwork layers. In this paper, we investigate the approximation capabilities of deep neural networks with intermediate neurons connected by simple multiplication operations. We consider two classes of target functions: generalized bandlimited functions, which are frequently used to model real-world signals with finite bandwidth, and Sobolev-Type balls, which are embedded in the Sobolev Space $\mathcal{W}^{r,2}$. Our results demonstrate that multiplicative neural networks can approximate these functions with significantly fewer layers and neurons compared to standard ReLU neural networks, with respect to both input dimension and approximation error. These findings suggest that multiplicative gates can outperform standard feed-forward layers and have potential for improving neural network design.
Submitted 11 January, 2023;
originally announced January 2023.
-
PR-DAD: Phase Retrieval Using Deep Auto-Decoders
Authors:
Leon Gugel,
Shai Dekel
Abstract:
Phase retrieval is a well-known ill-posed inverse problem where one tries to recover images given only the magnitude values of their Fourier transform as input. In recent years, new algorithms based on deep learning have been proposed, providing breakthrough results that surpass the results of the classical methods. In this work we provide a novel deep learning architecture PR-DAD (Phase Retrieval Using Deep Auto-Decoders), whose components are carefully designed based on mathematical modeling of the phase retrieval problem. The architecture provides experimental results that surpass all current results.
Submitted 18 April, 2022;
originally announced April 2022.
-
Nearest Class-Center Simplification through Intermediate Layers
Authors:
Ido Ben-Shaul,
Shai Dekel
Abstract:
Recent advances in theoretical Deep Learning have introduced geometric properties that occur during training, past the Interpolation Threshold -- where the training error reaches zero. We inquire into the phenomenon coined Neural Collapse in the intermediate layers of the networks, and emphasize the inner workings of Nearest Class-Center Mismatch inside the deepnet. We further show that these processes occur both in vision and language model architectures. Lastly, we propose a Stochastic Variability-Simplification Loss (SVSL) that encourages better geometrical features in intermediate layers, and improves both train metrics and generalization.
Submitted 11 June, 2022; v1 submitted 21 January, 2022;
originally announced January 2022.
-
Sparsity-Probe: Analysis tool for Deep Learning Models
Authors:
Ido Ben-Shaul,
Shai Dekel
Abstract:
We propose a probe for the analysis of deep learning architectures that is based on machine learning and approximation theoretical principles. Given a deep learning architecture and a training set, during or after training, the Sparsity Probe allows one to analyze the performance of intermediate layers by quantifying the geometrical features of representations of the training set. We show how the Sparsity Probe enables measuring the contribution of adding depth to a given architecture, detecting under-performing layers, etc., all without any auxiliary test data set.
Submitted 14 May, 2021;
originally announced May 2021.
-
Wavelet Decomposition of Gradient Boosting
Authors:
Shai Dekel,
Oren Elisha,
Ohad Morgan
Abstract:
In this paper we introduce a significant improvement to the popular tree-based Stochastic Gradient Boosting algorithm using a wavelet decomposition of the trees. This approach is based on harmonic analysis and approximation theoretical elements, and as we show through extensive experimentation, our wavelet based method generally outperforms existing methods, particularly in difficult scenarios of class unbalance and mislabeling in the training data.
Submitted 3 May, 2019; v1 submitted 7 May, 2018;
originally announced May 2018.
-
Function space analysis of deep learning representation layers
Authors:
Oren Elisha,
Shai Dekel
Abstract:
In this paper we propose a function space approach to Representation Learning and the analysis of the representation layers in deep learning architectures. We show how to compute a weak-type Besov smoothness index that quantifies the geometry of the clustering in the feature space. This approach was already applied successfully to improve the performance of machine learning algorithms such as the Random Forest and tree-based Gradient Boosting. Our experiments demonstrate that in well-known and well-performing trained networks, the Besov smoothness of the training set, measured in the corresponding hidden layer feature map representation, increases from layer to layer. We also contribute to the understanding of generalization by showing how the Besov smoothness of the representations decreases as we add more mislabeling to the training data. We hope this approach will contribute to the de-mystification of some aspects of deep learning.
Submitted 9 October, 2017;
originally announced October 2017.
-
Machine olfaction using time scattering of sensor multiresolution graphs
Authors:
Leonid Gugel,
Yoel Shkolnisky,
Shai Dekel
Abstract:
In this paper we construct a learning architecture for high dimensional time series sampled by sensor arrangements. Using a redundant wavelet decomposition on a graph constructed over the sensor locations, our algorithm is able to construct discriminative features that exploit the mutual information between the sensors. The algorithm then applies scattering networks to the time series graphs to create the feature space. We demonstrate our method on a machine olfaction problem, where one needs to classify the gas type and the location where it originates, using data sampled by an array of sensors. Our experimental results clearly demonstrate that our method outperforms classical machine learning techniques used in previous studies.
Submitted 13 February, 2016;
originally announced February 2016.
-
Stable Support Recovery of Stream of Pulses with Application to Ultrasound Imaging
Authors:
Tamir Bendory,
Avinoam Bar-Zion,
Dan Adam,
Shai Dekel,
Arie Feuer
Abstract:
This paper considers the problem of estimating the delays of a weighted superposition of pulses, called stream of pulses, in a noisy environment. We show that the delays can be estimated using a tractable convex optimization problem with a localization error proportional to the square root of the noise level. Furthermore, all false detections produced by the algorithm have small amplitudes. Numerical and in-vitro ultrasound experiments corroborate the theoretical results and demonstrate their applicability to ultrasound imaging signal processing.
Submitted 29 December, 2015; v1 submitted 26 July, 2015;
originally announced July 2015.
-
Unified Convex Optimization Approach to Super-Resolution Based on Localized Kernels
Authors:
Tamir Bendory,
Shai Dekel,
Arie Feuer
Abstract:
The problem of resolving the fine details of a signal from its coarse scale measurements or, as it is commonly referred to in the literature, the super-resolution problem, arises naturally in engineering and physics in a variety of settings. We suggest a unified convex optimization approach for super-resolution. The key is the construction of an interpolating polynomial based on localized kernels. We also show that the localized kernels act as the connecting thread to another widespread problem, that of stream of pulses.
Submitted 12 April, 2015; v1 submitted 8 January, 2015;
originally announced January 2015.
-
Exact recovery of non-uniform splines from the projection onto spaces of algebraic polynomials
Authors:
Tamir Bendory,
Shai Dekel,
Arie Feuer
Abstract:
In this work we consider the problem of recovering non-uniform splines from their projection onto spaces of algebraic polynomials. We show that under a certain Chebyshev-type separation condition on its knots, a spline whose inner-products with a polynomial basis and boundary conditions are known, can be recovered using Total Variation norm minimization. The proof of the uniqueness of the solution uses the method of 'dual' interpolating polynomials and is based on \cite{SR}, where the theory was developed for trigonometric polynomials. We also show results for the multivariate case.
Submitted 19 December, 2014;
originally announced December 2014.
-
Exact recovery of Dirac ensembles from the projection onto spaces of spherical harmonics
Authors:
Tamir Bendory,
Shai Dekel,
Arie Feuer
Abstract:
In this work we consider the problem of recovering an ensemble of Diracs on the sphere from its projection onto spaces of spherical harmonics. We show that under an appropriate separation condition on the unknown locations of the Diracs, the ensemble can be recovered through Total Variation norm minimization. The proof of the uniqueness of the solution uses the method of 'dual' interpolating polynomials and is based on [8], where the theory was developed for trigonometric polynomials. We also show that in the special case of non-negative ensembles, a sparsity condition is sufficient for exact recovery.
Submitted 10 December, 2014;
originally announced December 2014.
-
Super-resolution on the Sphere using Convex Optimization
Authors:
Tamir Bendory,
Shai Dekel,
Arie Feuer
Abstract:
This paper considers the problem of recovering an ensemble of Diracs on a sphere from its low resolution measurements. The Diracs can be located anywhere on the sphere, not necessarily on a grid. We show that under a separation condition, one can recover the ensemble with high precision by a three-stage algorithm, which consists of solving a semi-definite program, root finding and least-squares fitting. The algorithm's computation time depends solely on the number of measurements, and not on the required solution accuracy. We also show that in the special case of non-negative ensembles, a sparsity condition is sufficient for recovery. Furthermore, in the discrete setting, we estimate the recovery error in the presence of noise as a function of the noise level and the super-resolution factor.
Submitted 7 January, 2015; v1 submitted 10 December, 2014;
originally announced December 2014.
-
Robust Recovery of Stream of Pulses using Convex Optimization
Authors:
Tamir Bendory,
Shai Dekel,
Arie Feuer
Abstract:
This paper considers the problem of recovering the delays and amplitudes of a weighted superposition of pulses. This problem is motivated by a variety of applications such as ultrasound and radar. We show that for univariate and bivariate stream of pulses, one can recover the delays and weights to any desired accuracy by solving a tractable convex optimization problem, provided that a pulse-dependent separation condition is satisfied. The main result of this paper states that the recovery is robust to additive noise or model mismatch.
Submitted 27 April, 2016; v1 submitted 10 December, 2014;
originally announced December 2014.