Showing 1–26 of 26 results for author: Galanti, T

Searching in archive cs.
  1. arXiv:2405.14105  [pdf, other]

    cs.DC cs.AI cs.CL cs.LG

    Distributed Speculative Inference of Large Language Models

    Authors: Nadav Timor, Jonathan Mamou, Daniel Korat, Moshe Berchansky, Oren Pereg, Moshe Wasserblat, Tomer Galanti, Michal Gordon, David Harel

    Abstract: Accelerating the inference of large language models (LLMs) is an important challenge in artificial intelligence. This paper introduces distributed speculative inference (DSI), a novel distributed inference algorithm that is provably faster than speculative inference (SI) [leviathan2023fast, chen2023accelerating, miao2023specinfer] and traditional autoregressive inference (non-SI). Like other SI al…

    Submitted 28 June, 2024; v1 submitted 22 May, 2024; originally announced May 2024.

  2. arXiv:2306.01610  [pdf, other]

    cs.LG

    Centered Self-Attention Layers

    Authors: Ameen Ali, Tomer Galanti, Lior Wolf

    Abstract: The self-attention mechanism in transformers and the message-passing mechanism in graph neural networks are repeatedly applied within deep learning architectures. We show that this application inevitably leads to oversmoothing, i.e., to similar representations at the deeper layers for different tokens in transformers and different nodes in graph neural networks. Based on our analysis, we present a…

    Submitted 2 June, 2023; originally announced June 2023.

  3. arXiv:2305.15614  [pdf, other]

    cs.LG cs.AI

    Reverse Engineering Self-Supervised Learning

    Authors: Ido Ben-Shaul, Ravid Shwartz-Ziv, Tomer Galanti, Shai Dekel, Yann LeCun

    Abstract: Self-supervised learning (SSL) is a powerful tool in machine learning, but understanding the learned representations and their underlying mechanisms remains a challenge. This paper presents an in-depth empirical analysis of SSL-trained representations, encompassing diverse models, architectures, and hyperparameters. Our study reveals an intriguing aspect of the SSL training process: it inherently…

    Submitted 31 May, 2023; v1 submitted 24 May, 2023; originally announced May 2023.

  4. arXiv:2303.13093  [pdf, other]

    cs.LG math.OC physics.data-an

    Type-II Saddles and Probabilistic Stability of Stochastic Gradient Descent

    Authors: Liu Ziyin, Botao Li, Tomer Galanti, Masahito Ueda

    Abstract: Characterizing and understanding the dynamics of stochastic gradient descent (SGD) around saddle points remains an open problem. We first show that saddle points in neural networks can be divided into two types, among which the Type-II saddles are especially difficult to escape from because the gradient noise vanishes at the saddle. The dynamics of SGD around these saddles are thus to leading orde…

    Submitted 2 July, 2024; v1 submitted 23 March, 2023; originally announced March 2023.

    Comments: preprint

  5. arXiv:2301.12033  [pdf, other]

    cs.LG

    Norm-based Generalization Bounds for Compositionally Sparse Neural Networks

    Authors: Tomer Galanti, Mengjia Xu, Liane Galanti, Tomaso Poggio

    Abstract: In this paper, we investigate the Rademacher complexity of deep sparse neural networks, where each neuron receives a small number of inputs. We prove generalization bounds for multilayered sparse ReLU neural networks, including convolutional neural networks. These bounds differ from previous ones, as they consider the norms of the convolutional filters instead of the norms of the associated Toepli…

    Submitted 27 January, 2023; originally announced January 2023.

  6. arXiv:2301.04605  [pdf, ps, other]

    cs.LG cs.NE math.FA

    Exploring the Approximation Capabilities of Multiplicative Neural Networks for Smooth Functions

    Authors: Ido Ben-Shaul, Tomer Galanti, Shai Dekel

    Abstract: Multiplication layers are a key component in various influential neural network modules, including self-attention and hypernetwork layers. In this paper, we investigate the approximation capabilities of deep neural networks with intermediate neurons connected by simple multiplication operations. We consider two classes of target functions: generalized bandlimited functions, which are frequently us…

    Submitted 11 January, 2023; originally announced January 2023.

    MSC Class: 41A25; 68Q32; 68T07

  7. arXiv:2212.12532  [pdf, other]

    cs.LG

    Generalization Bounds for Few-Shot Transfer Learning with Pretrained Classifiers

    Authors: Tomer Galanti, András György, Marcus Hutter

    Abstract: We study the ability of foundation models to learn representations for classification that are transferable to new, unseen classes. Recent results in the literature show that representations learned by a single classifier over many classes are competitive on few-shot learning problems with representations learned by special-purpose algorithms designed for such problems. We offer a theoretical expl…

    Submitted 16 July, 2023; v1 submitted 23 December, 2022; originally announced December 2022.

    Comments: arXiv admin note: substantial text overlap with arXiv:2112.15121

  8. arXiv:2206.05794  [pdf, other]

    cs.LG stat.ML

    Characterizing the Implicit Bias of Regularized SGD in Rank Minimization

    Authors: Tomer Galanti, Zachary S. Siegel, Aparna Gupte, Tomaso Poggio

    Abstract: We study the bias of Stochastic Gradient Descent (SGD) to learn low-rank weight matrices when training deep neural networks. Our results show that training neural networks with mini-batch SGD and weight decay causes a bias towards rank minimization over the weight matrices. Specifically, we show, both theoretically and empirically, that this bias is more pronounced when using smaller batch sizes,…

    Submitted 25 October, 2023; v1 submitted 12 June, 2022; originally announced June 2022.

  9. arXiv:2202.09028  [pdf, other]

    cs.LG

    On the Implicit Bias Towards Minimal Depth of Deep Neural Networks

    Authors: Tomer Galanti, Liane Galanti, Ido Ben-Shaul

    Abstract: Recent results in the literature suggest that the penultimate (second-to-last) layer representations of neural networks that are trained for classification exhibit a clustering property called neural collapse (NC). We study the implicit bias of stochastic gradient descent (SGD) in favor of low-depth solutions when training deep neural networks. We characterize a notion of effective depth that meas…

    Submitted 27 September, 2022; v1 submitted 18 February, 2022; originally announced February 2022.

  10. arXiv:2112.15121  [pdf, other]

    cs.LG

    On the Role of Neural Collapse in Transfer Learning

    Authors: Tomer Galanti, András György, Marcus Hutter

    Abstract: We study the ability of foundation models to learn representations for classification that are transferable to new, unseen classes. Recent results in the literature show that representations learned by a single classifier over many classes are competitive on few-shot learning problems with representations learned by special-purpose algorithms designed for such problems. In this paper we provide an…

    Submitted 3 January, 2022; v1 submitted 30 December, 2021; originally announced December 2021.

  11. arXiv:2110.02900  [pdf, other]

    cs.CV

    Meta Internal Learning

    Authors: Raphael Bensadoun, Shir Gur, Tomer Galanti, Lior Wolf

    Abstract: Internal learning for single-image generation is a framework, where a generator is trained to produce novel images based on a single image. Since these models are trained on a single image, they are limited in their scale and application. To overcome these issues, we propose a meta-learning approach that enables training over a collection of images, in order to model the internal statistics of the…

    Submitted 6 October, 2021; originally announced October 2021.

  12. arXiv:2106.04180  [pdf, other]

    cs.CV cs.AI cs.RO

    Image2Point: 3D Point-Cloud Understanding with 2D Image Pretrained Models

    Authors: Chenfeng Xu, Shijia Yang, Tomer Galanti, Bichen Wu, Xiangyu Yue, Bohan Zhai, Wei Zhan, Peter Vajda, Kurt Keutzer, Masayoshi Tomizuka

    Abstract: 3D point-clouds and 2D images are different visual representations of the physical world. While human vision can understand both representations, computer vision models designed for 2D image and 3D point-cloud understanding are quite different. Our paper explores the potential of transferring 2D model architectures and weights to understand 3D point-clouds, by empirically investigating the feasibi…

    Submitted 23 April, 2022; v1 submitted 8 June, 2021; originally announced June 2021.

    Comments: The code is available at: https://github.com/chenfengxu714/image2point

  13. arXiv:2103.11888  [pdf, other]

    cs.LG

    Weakly Supervised Recovery of Semantic Attributes

    Authors: Ameen Ali, Tomer Galanti, Evgeniy Zheltonozhskiy, Chaim Baskin, Lior Wolf

    Abstract: We consider the problem of the extraction of semantic attributes, supervised only with classification labels. For example, when learning to classify images of birds into species, we would like to observe the emergence of features that zoologists use to classify birds. To tackle this problem, we propose training a neural network with discrete features in the last layer, which is followed by two hea…

    Submitted 11 June, 2021; v1 submitted 22 March, 2021; originally announced March 2021.

  14. arXiv:2004.12361  [pdf, other]

    cs.CV cs.LG eess.IV

    Evaluation Metrics for Conditional Image Generation

    Authors: Yaniv Benny, Tomer Galanti, Sagie Benaim, Lior Wolf

    Abstract: We present two new metrics for evaluating generative models in the class-conditional image generation setting. These metrics are obtained by generalizing the two most popular unconditional metrics: the Inception Score (IS) and the Fréchet Inception Distance (FID). A theoretical analysis shows the motivation behind each proposed metric and links the novel metrics to their unconditional counterpart…

    Submitted 8 February, 2021; v1 submitted 26 April, 2020; originally announced April 2020.

    Comments: To be published in "INTERNATIONAL JOURNAL OF COMPUTER VISION"

  15. arXiv:2003.12193  [pdf, other]

    cs.LG stat.ML

    On Infinite-Width Hypernetworks

    Authors: Etai Littwin, Tomer Galanti, Lior Wolf, Greg Yang

    Abstract: Hypernetworks are architectures that produce the weights of a task-specific primary network. A notable application of hypernetworks in the recent literature involves learning to output functional representations. In these scenarios, the hypernetwork learns a representation corresponding to the weights of a shallow MLP, which typically encodes shape or image information. While such repr…

    Submitted 22 February, 2021; v1 submitted 26 March, 2020; originally announced March 2020.

    Comments: The first two authors contributed equally

  16. arXiv:2002.10007  [pdf, other]

    cs.LG cs.AI stat.ML

    A Critical View of the Structural Causal Model

    Authors: Tomer Galanti, Ofir Nabati, Lior Wolf

    Abstract: In the univariate case, we show that by comparing the individual complexities of univariate cause and effect, one can identify the cause and the effect, without considering their interaction at all. In our framework, complexities are captured by the reconstruction error of an autoencoder that operates on the quantiles of the distribution. Comparing the reconstruction errors of the two autoencoders…

    Submitted 23 February, 2020; originally announced February 2020.

  17. arXiv:2002.10006  [pdf, other]

    cs.LG stat.ML

    On the Modularity of Hypernetworks

    Authors: Tomer Galanti, Lior Wolf

    Abstract: In the context of learning to map an input $I$ to a function $h_I:\mathcal{X}\to \mathbb{R}$, two alternative methods are compared: (i) an embedding-based method, which learns a fixed function in which $I$ is encoded as a conditioning signal $e(I)$ and the learned function takes the form $h_I(x) = q(x,e(I))$, and (ii) hypernetworks, in which the weights $θ_I$ of the function $h_I(x) = g(x;θ_I)$ ar…

    Submitted 2 November, 2020; v1 submitted 23 February, 2020; originally announced February 2020.

    Comments: Accepted to Advances in Neural Information Processing Systems (NeurIPS) 2020

  18. arXiv:2001.10460  [pdf, other]

    cs.LG stat.ML

    On Random Kernels of Residual Architectures

    Authors: Etai Littwin, Tomer Galanti, Lior Wolf

    Abstract: We derive finite width and depth corrections for the Neural Tangent Kernel (NTK) of ResNets and DenseNets. Our analysis reveals that finite size residual architectures are initialized much closer to the "kernel regime" than their vanilla counterparts: while in networks that do not use skip connections, convergence to the NTK requires one to fix the depth, while increasing the layers' width. Our fi…

    Submitted 17 June, 2020; v1 submitted 28 January, 2020; originally announced January 2020.

  19. arXiv:2001.05207  [pdf, ps, other]

    cs.LG stat.ML

    A Formal Approach to Explainability

    Authors: Lior Wolf, Tomer Galanti, Tamir Hazan

    Abstract: We regard explanations as a blending of the input sample and the model's output and offer a few definitions that capture various desired properties of the function that generates these explanations. We study the links between these properties and between explanation-generating functions and intermediate representations of learned models and are able to show, for example, that if the activations of…

    Submitted 15 January, 2020; originally announced January 2020.

    Journal ref: Proceedings of the 2019 AAAI/ACM Conference on AI, Ethics, and Society, January 2019, Pages 255-261

  20. arXiv:2001.05026  [pdf, other]

    cs.LG stat.ML

    Unsupervised Learning of the Set of Local Maxima

    Authors: Lior Wolf, Sagie Benaim, Tomer Galanti

    Abstract: This paper describes a new form of unsupervised learning, whose input is a set of unlabeled points that are assumed to be local maxima of an unknown value function v in an unknown subset of the vector space. Two functions are learned: (i) a set indicator c, which is a binary classifier, and (ii) a comparator function h that given two nearby samples, predicts which sample has the higher value of th…

    Submitted 14 January, 2020; originally announced January 2020.

    Comments: ICLR 2019

  21. arXiv:2001.05017  [pdf, other]

    cs.CV cs.LG

    Emerging Disentanglement in Auto-Encoder Based Unsupervised Image Content Transfer

    Authors: Ori Press, Tomer Galanti, Sagie Benaim, Lior Wolf

    Abstract: We study the problem of learning to map, in an unsupervised way, between domains A and B, such that the samples b in B contain all the information that exists in samples a in A and some additional information. For example, ignoring occlusions, B can be people with glasses, A people without, and the glasses, would be the added information. When mapping a sample a from the first domain to the other…

    Submitted 14 January, 2020; originally announced January 2020.

    Journal ref: ICLR 2019

  22. arXiv:1908.11628  [pdf, other]

    cs.CV

    Domain Intersection and Domain Difference

    Authors: Sagie Benaim, Michael Khaitov, Tomer Galanti, Lior Wolf

    Abstract: We present a method for recovering the shared content between two visual domains as well as the content that is unique to each domain. This allows us to map from one domain to the other, in a way in which the content that is specific for the first domain is removed and the content that is specific for the second is imported from any image in the second domain. In addition, our method enables gener…

    Submitted 30 August, 2019; originally announced August 2019.

    Journal ref: ICCV 2019

  23. arXiv:1807.08501  [pdf, other]

    cs.LG stat.ML

    Risk Bounds for Unsupervised Cross-Domain Mapping with IPMs

    Authors: Tomer Galanti, Sagie Benaim, Lior Wolf

    Abstract: The recent empirical success of unsupervised cross-domain mapping algorithms, between two domains that share common characteristics, is not well-supported by theoretical justifications. This lacuna is especially troubling, given the clear ambiguity in such mappings. We work with adversarial training methods based on IPMs and derive a novel risk bound, which upper bounds the risk between the lear…

    Submitted 2 November, 2020; v1 submitted 23 July, 2018; originally announced July 2018.

    Comments: arXiv admin note: text overlap with arXiv:1709.00074

  24. arXiv:1712.07886  [pdf, other]

    cs.LG

    Estimating the Success of Unsupervised Image to Image Translation

    Authors: Sagie Benaim, Tomer Galanti, Lior Wolf

    Abstract: While in supervised learning, the validation error is an unbiased estimator of the generalization (test) error and complexity-based generalization bounds are abundant, no such bounds exist for learning a mapping in an unsupervised way. As a result, when training GANs and specifically when using GANs for learning to map between domains in a completely unsupervised way, one is forced to select the h…

    Submitted 22 March, 2018; v1 submitted 21 December, 2017; originally announced December 2017.

    Comments: The first and second authors contributed equally

  25. arXiv:1709.00074  [pdf, other]

    cs.LG

    The Role of Minimal Complexity Functions in Unsupervised Learning of Semantic Mappings

    Authors: Tomer Galanti, Lior Wolf, Sagie Benaim

    Abstract: We discuss the feasibility of the following learning problem: given unmatched samples from two domains and nothing else, learn a mapping between the two, which preserves semantics. Due to the lack of paired samples and without any definition of the semantic information, the problem might seem ill-posed. Specifically, in typical cases, it seems possible to build infinitely many alternative mappings…

    Submitted 15 January, 2020; v1 submitted 31 August, 2017; originally announced September 2017.

  26. arXiv:1703.01606  [pdf, ps, other]

    cs.LG stat.ML

    A Theory of Output-Side Unsupervised Domain Adaptation

    Authors: Tomer Galanti, Lior Wolf

    Abstract: When learning a mapping from an input space to an output space, the assumption that the sample distribution of the training data is the same as that of the test data is often violated. Unsupervised domain shift methods adapt the learned function in order to correct for this shift. Previous work has focused on utilizing unlabeled samples from the target distribution. We consider the complementary p…

    Submitted 5 March, 2017; originally announced March 2017.