-
Fronts under arrest II: analytical foundations
Authors:
James H. von Brecht,
Scott G. McCalla,
Eun Heui Kim
Abstract:
We study a class of minimal geometric partial differential equations that serves as a framework to understand the evolution of boundaries between states in different pattern forming systems. The framework combines normal growth, curvature flow and nonlocal interaction terms to track the motion of these interfaces. This approach was first developed to understand arrested fronts in a bacterial system. These are fronts that become stationary as they grow into each other. This paper establishes analytic foundations and geometric insight for studying this class of equations. In so doing, an efficient numerical scheme is developed and employed to gain further insight into the dynamics of these general pattern forming systems.
Submitted 2 November, 2023;
originally announced November 2023.
-
Feature Collapse
Authors:
Thomas Laurent,
James H. von Brecht,
Xavier Bresson
Abstract:
We formalize and study a phenomenon called feature collapse that makes precise the intuitive idea that entities playing a similar role in a learning task receive similar representations. As feature collapse requires a notion of task, we leverage a simple but prototypical NLP task to study it. We start by showing experimentally that feature collapse goes hand in hand with generalization. We then prove that, in the large sample limit, distinct words that play identical roles in this NLP task receive identical local feature representations in a neural network. This analysis reveals the crucial role that normalization mechanisms, such as LayerNorm, play in feature collapse and in generalization.
Submitted 25 May, 2023;
originally announced May 2023.
-
Long-Tailed Learning Requires Feature Learning
Authors:
Thomas Laurent,
James H. von Brecht,
Xavier Bresson
Abstract:
We propose a simple data model inspired by natural data such as text or images, and use it to study the importance of learning features in order to achieve good generalization. Our data model follows a long-tailed distribution in the sense that some rare subcategories have few representatives in the training set. In this context we provide evidence that a learner succeeds if and only if it identifies the correct features, and moreover derive non-asymptotic generalization error bounds that precisely quantify the penalty that one must pay for not learning features.
Submitted 28 December, 2022; v1 submitted 28 May, 2022;
originally announced May 2022.
-
The Multilinear Structure of ReLU Networks
Authors:
Thomas Laurent,
James von Brecht
Abstract:
We study the loss surface of neural networks equipped with a hinge loss criterion and ReLU or leaky ReLU nonlinearities. Any such network defines a piecewise multilinear form in parameter space. By appealing to harmonic analysis we show that all local minima of such networks are non-differentiable, except for those minima that occur in a region of parameter space where the loss surface is perfectly flat. Non-differentiable minima are therefore not technicalities or pathologies; they are the heart of the problem when investigating the loss of ReLU networks. As a consequence, we must employ techniques from nonsmooth analysis to study these loss surfaces. We show how to apply these techniques in some illustrative cases.
Submitted 23 July, 2018; v1 submitted 29 December, 2017;
originally announced December 2017.
-
Deep linear neural networks with arbitrary loss: All local minima are global
Authors:
Thomas Laurent,
James von Brecht
Abstract:
We consider deep linear networks with arbitrary convex differentiable loss. We provide a short and elementary proof of the fact that all local minima are global minima if the hidden layers are either 1) at least as wide as the input layer, or 2) at least as wide as the output layer. This result is the strongest possible in the following sense: If the loss is convex and Lipschitz but not differentiable then deep linear networks can have sub-optimal local minima.
Submitted 24 July, 2018; v1 submitted 4 December, 2017;
originally announced December 2017.
-
Dynamics of Embedded Curves by Doubly-Nonlocal Reaction-Diffusion Systems
Authors:
James H. von Brecht,
Ryan Blair
Abstract:
We study a class of nonlocal, energy-driven dynamical models that govern the motion of closed, embedded curves from both an energetic and dynamical perspective. Our energetic results provide a variety of ways to understand physically motivated energetic models in terms of more classical, combinatorial measures of complexity for embedded curves. This line of investigation culminates in a family of complexity bounds that relate a rather broad class of models to a generalized, or weighted, variant of the crossing number. Our dynamic results include global well-posedness of the associated partial differential equations, regularity of equilibria for these flows as well as a more detailed investigation of dynamics near such equilibria. Finally, we explore a few global dynamical properties of these models numerically.
Submitted 21 November, 2017;
originally announced November 2017.
-
A recurrent neural network without chaos
Authors:
Thomas Laurent,
James von Brecht
Abstract:
We introduce an exceptionally simple gated recurrent neural network (RNN) that achieves performance comparable to well-known gated architectures, such as LSTMs and GRUs, on the word-level language modeling task. We prove that our model has simple, predictable and non-chaotic dynamics. This stands in stark contrast to more standard gated architectures, whose underlying dynamical systems exhibit chaotic behavior.
Submitted 19 December, 2016;
originally announced December 2016.
-
Estimating perimeter using graph cuts
Authors:
Nicolás García Trillos,
Dejan Slepčev,
James von Brecht
Abstract:
We investigate the estimation of the perimeter of a set by a graph cut of a random geometric graph. For $\Omega \subset D = (0,1)^d$, with $d \geq 2$, we are given $n$ random i.i.d. points on $D$ whose membership in $\Omega$ is known. We consider the sample as a random geometric graph with connection distance $\varepsilon>0$. We estimate the perimeter of $\Omega$ (relative to $D$) by the appropriately rescaled graph cut between the vertices in $\Omega$ and the vertices in $D \backslash \Omega$. We obtain bias and variance estimates on the error, which are optimal in scaling with respect to $n$ and $\varepsilon$. We consider two scaling regimes: the dense regime (when the average degree of the vertices goes to $\infty$) and the sparse regime (when the degree goes to $0$). In the dense regime there is a crossover in the nature of approximation at dimension $d=5$: we show that in low dimensions $d=2,3,4$ one can obtain confidence intervals for the approximation error, while in higher dimensions one can only obtain error estimates for testing the hypothesis that the perimeter is less than a given number.
Submitted 14 August, 2016; v1 submitted 12 February, 2016;
originally announced February 2016.
-
Enhanced Lasso Recovery on Graph
Authors:
Xavier Bresson,
Thomas Laurent,
James von Brecht
Abstract:
This work aims at recovering signals that are sparse on graphs. Compressed sensing offers techniques for signal recovery from a few linear measurements, and graph Fourier analysis provides a signal representation on graphs. In this paper, we leverage these two frameworks to introduce a new Lasso recovery algorithm on graphs. More precisely, we present a non-convex, non-smooth algorithm that outperforms the standard convex Lasso technique. We carry out numerical experiments on three benchmark graph datasets.
Submitted 19 June, 2015;
originally announced June 2015.
-
Consistency of Cheeger and Ratio Graph Cuts
Authors:
Nicolas Garcia Trillos,
Dejan Slepcev,
James von Brecht,
Thomas Laurent,
Xavier Bresson
Abstract:
This paper establishes the consistency of a family of graph-cut-based algorithms for clustering of data clouds. We consider point clouds obtained as samples of a ground-truth measure. We investigate approaches to clustering based on minimizing objective functionals defined on proximity graphs of the given sample. Our focus is on functionals based on graph cuts like the Cheeger and ratio cuts. We show that minimizers of these cuts converge as the sample size increases to a minimizer of a corresponding continuum cut (which partitions the ground-truth measure). Moreover, we obtain sharp conditions on how the connectivity radius can be scaled with respect to the number of sample points for the consistency to hold. We provide results for two-way and for multiway cuts. Furthermore, we provide numerical experiments that illustrate the results and explore the optimality of scaling in dimension two.
Submitted 24 November, 2014;
originally announced November 2014.
-
An Incremental Reseeding Strategy for Clustering
Authors:
Xavier Bresson,
Huiyi Hu,
Thomas Laurent,
Arthur Szlam,
James von Brecht
Abstract:
In this work we propose a simple and easily parallelizable algorithm for multiway graph partitioning. The algorithm alternates between three basic components: diffusing seed vertices over the graph, thresholding the diffused seeds, and then randomly reseeding the thresholded clusters. We demonstrate experimentally that the proper combination of these ingredients leads to an algorithm that achieves state-of-the-art performance in terms of cluster purity on standard benchmark datasets. Moreover, the algorithm runs an order of magnitude faster than the other algorithms that achieve comparable results in terms of accuracy. We also describe a coarsen, cluster and refine approach similar to GRACLUS and METIS that removes an additional order of magnitude from the runtime of our algorithm while still maintaining competitive accuracy.
Submitted 15 June, 2014;
originally announced June 2014.
-
Multiclass Total Variation Clustering
Authors:
Xavier Bresson,
Thomas Laurent,
David Uminsky,
James H. von Brecht
Abstract:
Ideas from the image processing literature have recently motivated a new set of clustering algorithms that rely on the concept of total variation. While these algorithms perform well for bi-partitioning tasks, their recursive extensions yield unimpressive results for multiclass clustering tasks. This paper presents a general framework for multiclass total variation clustering that does not rely on recursion. The results greatly outperform previous total variation algorithms and compare well with state-of-the-art NMF approaches.
Submitted 5 June, 2013;
originally announced June 2013.
-
Stability Analysis of Flock and Mill rings for 2nd Order Models in Swarming
Authors:
G. Albi,
D. Balagué,
J. A. Carrillo,
J. von Brecht
Abstract:
We study the linear stability of flock and mill ring solutions of two individual-based models for biological swarming. The individuals interact via a nonlocal interaction potential that is repulsive in the short range and attractive in the long range. We relate the instability of the flock rings to the instability of the ring solution of the first order model. We observe that repulsive-attractive interactions lead to new configurations for the flock rings such as clustering and fattening formation. Finally, we numerically explore mill patterns arising from this kind of interaction together with the asymptotic speed of the system.
Submitted 19 April, 2013;
originally announced April 2013.
-
An Adaptive Total Variation Algorithm for Computing the Balanced Cut of a Graph
Authors:
Xavier Bresson,
Thomas Laurent,
David Uminsky,
James H. von Brecht
Abstract:
We propose an adaptive version of the total variation algorithm proposed in [3] for computing the balanced cut of a graph. The algorithm from [3] used a sequence of inner total variation minimizations to guarantee descent of the balanced cut energy as well as convergence of the algorithm. In practice the total variation minimization step is never solved exactly. Instead, an accuracy parameter is specified and the total variation minimization terminates once this level of accuracy is reached. The choice of this parameter can vastly impact both the computational time of the overall algorithm and the accuracy of the result. Moreover, since the total variation minimization step is not solved exactly, the algorithm is not guaranteed to be monotonic. In the present work we introduce a new adaptive stopping condition for the total variation minimization that guarantees monotonicity. This results in an algorithm that is actually monotonic in practice and is also significantly faster than previous, non-adaptive algorithms.
Submitted 12 February, 2013;
originally announced February 2013.
-
Convergence of a Steepest Descent Algorithm for Ratio Cut Clustering
Authors:
Xavier Bresson,
Thomas Laurent,
David Uminsky,
James H. von Brecht
Abstract:
Unsupervised clustering of scattered, noisy and high-dimensional data points is an important and difficult problem. Tight continuous relaxations of balanced cut problems have recently been shown to provide excellent clustering results. In this paper, we present an explicit-implicit gradient flow scheme for the relaxed ratio cut problem, and prove that the algorithm converges to a critical point of the energy. We also show the efficiency of the proposed algorithm on the two moons dataset.
Submitted 29 April, 2012;
originally announced April 2012.