-
Sparse Linear Networks with a Fixed Butterfly Structure: Theory and Practice
Authors:
Nir Ailon,
Omer Leibovich,
Vineet Nair
Abstract:
A butterfly network consists of logarithmically many layers, each with a linear number of non-zero weights (pre-specified). The fast Johnson-Lindenstrauss transform (FJLT) can be represented as a butterfly network followed by a projection onto a random subset of the coordinates. Moreover, a random matrix based on FJLT with high probability approximates the action of any matrix on a vector. Motivat…
▽ More
A butterfly network consists of logarithmically many layers, each with a linear number of non-zero weights (pre-specified). The fast Johnson-Lindenstrauss transform (FJLT) can be represented as a butterfly network followed by a projection onto a random subset of the coordinates. Moreover, a random matrix based on FJLT with high probability approximates the action of any matrix on a vector. Motivated by these facts, we propose to replace a dense linear layer in any neural network by an architecture based on the butterfly network. The proposed architecture significantly improves upon the quadratic number of weights required in a standard dense layer to nearly linear with little compromise in expressibility of the resulting operator. In a collection of wide variety of experiments, including supervised prediction on both the NLP and vision data, we show that this not only produces results that match and at times outperform existing well-known architectures, but it also offers faster training and prediction in deployment. To understand the optimization problems posed by neural networks with a butterfly network, we also study the optimization landscape of the encoder-decoder network, where the encoder is replaced by a butterfly network followed by a dense linear layer in smaller dimension. Theoretical result presented in the paper explains why the training speed and outcome are not compromised by our proposed approach.
△ Less
Submitted 4 July, 2021; v1 submitted 17 July, 2020;
originally announced July 2020.
-
Spatial contrasting for deep unsupervised learning
Authors:
Elad Hoffer,
Itay Hubara,
Nir Ailon
Abstract:
Convolutional networks have marked their place over the last few years as the best performing model for various visual tasks. They are, however, most suited for supervised learning from large amounts of labeled data. Previous attempts have been made to use unlabeled data to improve model performance by applying unsupervised techniques. These attempts require different architectures and training me…
▽ More
Convolutional networks have marked their place over the last few years as the best performing model for various visual tasks. They are, however, most suited for supervised learning from large amounts of labeled data. Previous attempts have been made to use unlabeled data to improve model performance by applying unsupervised techniques. These attempts require different architectures and training methods. In this work we present a novel approach for unsupervised training of Convolutional networks that is based on contrasting between spatial regions within images. This criterion can be employed within conventional neural networks and trained using standard techniques such as SGD and back-propagation, thus complementing supervised methods.
△ Less
Submitted 21 November, 2016;
originally announced November 2016.
-
Deep unsupervised learning through spatial contrasting
Authors:
Elad Hoffer,
Itay Hubara,
Nir Ailon
Abstract:
Convolutional networks have marked their place over the last few years as the best performing model for various visual tasks. They are, however, most suited for supervised learning from large amounts of labeled data. Previous attempts have been made to use unlabeled data to improve model performance by applying unsupervised techniques. These attempts require different architectures and training me…
▽ More
Convolutional networks have marked their place over the last few years as the best performing model for various visual tasks. They are, however, most suited for supervised learning from large amounts of labeled data. Previous attempts have been made to use unlabeled data to improve model performance by applying unsupervised techniques. These attempts require different architectures and training methods. In this work we present a novel approach for unsupervised training of Convolutional networks that is based on contrasting between spatial regions within images. This criterion can be employed within conventional neural networks and trained using standard techniques such as SGD and back-propagation, thus complementing supervised methods.
△ Less
Submitted 4 December, 2018; v1 submitted 2 October, 2016;
originally announced October 2016.
-
Deep metric learning using Triplet network
Authors:
Elad Hoffer,
Nir Ailon
Abstract:
Deep learning has proven itself as a successful set of models for learning useful semantic representations of data. These, however, are mostly implicitly learned as part of a classification task. In this paper we propose the triplet network model, which aims to learn useful representations by distance comparisons. A similar model was defined by Wang et al. (2014), tailor made for learning a rankin…
▽ More
Deep learning has proven itself as a successful set of models for learning useful semantic representations of data. These, however, are mostly implicitly learned as part of a classification task. In this paper we propose the triplet network model, which aims to learn useful representations by distance comparisons. A similar model was defined by Wang et al. (2014), tailor made for learning a ranking for image information retrieval. Here we demonstrate using various datasets that our model learns a better representation than that of its immediate competitor, the Siamese network. We also discuss future possible usage as a framework for unsupervised learning.
△ Less
Submitted 4 December, 2018; v1 submitted 20 December, 2014;
originally announced December 2014.
-
Online Ranking: Discrete Choice, Spearman Correlation and Other Feedback
Authors:
Nir Ailon
Abstract:
Given a set $V$ of $n$ objects, an online ranking system outputs at each time step a full ranking of the set, observes a feedback of some form and suffers a loss. We study the setting in which the (adversarial) feedback is an element in $V$, and the loss is the position (0th, 1st, 2nd...) of the item in the outputted ranking. More generally, we study a setting in which the feedback is a subset…
▽ More
Given a set $V$ of $n$ objects, an online ranking system outputs at each time step a full ranking of the set, observes a feedback of some form and suffers a loss. We study the setting in which the (adversarial) feedback is an element in $V$, and the loss is the position (0th, 1st, 2nd...) of the item in the outputted ranking. More generally, we study a setting in which the feedback is a subset $U$ of at most $k$ elements in $V$, and the loss is the sum of the positions of those elements.
We present an algorithm of expected regret $O(n^{3/2}\sqrt{Tk})$ over a time horizon of $T$ steps with respect to the best single ranking in hindsight. This improves previous algorithms and analyses either by a factor of either $Ω(\sqrt{k})$, a factor of $Ω(\sqrt{\log n})$ or by improving running time from quadratic to $O(n\log n)$ per round. We also prove a matching lower bound. Our techniques also imply an improved regret bound for online rank aggregation over the Spearman correlation measure, and to other more complex ranking loss functions.
△ Less
Submitted 14 October, 2013; v1 submitted 30 August, 2013;
originally announced August 2013.
-
Breaking the Small Cluster Barrier of Graph Clustering
Authors:
Nir Ailon,
Yudong Chen,
Xu Huan
Abstract:
This paper investigates graph clustering in the planted cluster model in the presence of {\em small clusters}. Traditional results dictate that for an algorithm to provably correctly recover the clusters, {\em all} clusters must be sufficiently large (in particular, $\tildeΩ(\sqrt{n})$ where $n$ is the number of nodes of the graph). We show that this is not really a restriction: by a more refined…
▽ More
This paper investigates graph clustering in the planted cluster model in the presence of {\em small clusters}. Traditional results dictate that for an algorithm to provably correctly recover the clusters, {\em all} clusters must be sufficiently large (in particular, $\tildeΩ(\sqrt{n})$ where $n$ is the number of nodes of the graph). We show that this is not really a restriction: by a more refined analysis of the trace-norm based recovery approach proposed in Jalali et al. (2011) and Chen et al. (2012), we prove that small clusters, under certain mild assumptions, do not hinder recovery of large ones.
Based on this result, we further devise an iterative algorithm to recover {\em almost all clusters} via a "peeling strategy", i.e., recover large clusters first, leading to a reduced problem, and repeat this procedure. These results are extended to the {\em partial observation} setting, in which only a (chosen) part of the graph is observed.The peeling strategy gives rise to an active learning algorithm, in which edges adjacent to smaller clusters are queried more often as large clusters are learned (and removed).
From a high level, this paper sheds novel insights on high-dimensional statistics and learning structured data, by presenting a structured matrix learning problem for which a one shot convex relaxation approach necessarily fails, but a carefully constructed sequence of convex relaxationsdoes the job.
△ Less
Submitted 20 February, 2013; v1 submitted 19 February, 2013;
originally announced February 2013.