Search | arXiv e-print repository

Fronts under arrest II: analytical foundations

Authors: James H. von Brecht, Scott G. McCalla, Eun Heui Kim

Abstract: We study a class of minimal geometric partial differential equations that serves as a framework to understand the evolution of boundaries between states in different pattern forming systems. The framework combines normal growth, curvature flow and nonlocal interaction terms to track the motion of these interfaces. This approach was first developed to understand arrested fronts in a bacterial system. These are fronts that become stationary as they grow into each other. This paper establishes analytic foundations and geometric insight for studying this class of equations. In so doing, an efficient numerical scheme is developed and employed to gain further insight into the dynamics of these general pattern forming systems.

Submitted 2 November, 2023; originally announced November 2023.

Comments: Code can be found at: https://github.com/scottgmccalla/AnalyticFoundations

MSC Class: 35A99

arXiv:2305.16162 [pdf, other]

Feature Collapse

Authors: Thomas Laurent, James H. von Brecht, Xavier Bresson

Abstract: We formalize and study a phenomenon called feature collapse that makes precise the intuitive idea that entities playing a similar role in a learning task receive similar representations. As feature collapse requires a notion of task, we leverage a simple but prototypical NLP task to study it. We start by showing experimentally that feature collapse goes hand in hand with generalization. We then prove that, in the large sample limit, distinct words that play identical roles in this NLP task receive identical local feature representations in a neural network. This analysis reveals the crucial role that normalization mechanisms, such as LayerNorm, play in feature collapse and in generalization.

Submitted 25 May, 2023; originally announced May 2023.

arXiv:2205.14553 [pdf, other]

Long-Tailed Learning Requires Feature Learning

Authors: Thomas Laurent, James H. von Brecht, Xavier Bresson

Abstract: We propose a simple data model inspired by natural data such as text or images, and use it to study the importance of learning features in order to achieve good generalization. Our data model follows a long-tailed distribution in the sense that some rare subcategories have few representatives in the training set. In this context we provide evidence that a learner succeeds if and only if it identifies the correct features, and moreover derive non-asymptotic generalization error bounds that precisely quantify the penalty that one must pay for not learning features.

Submitted 28 December, 2022; v1 submitted 28 May, 2022; originally announced May 2022.

arXiv:1712.10132 [pdf, other]

The Multilinear Structure of ReLU Networks

Authors: Thomas Laurent, James von Brecht

Abstract: We study the loss surface of neural networks equipped with a hinge loss criterion and ReLU or leaky ReLU nonlinearities. Any such network defines a piecewise multilinear form in parameter space. By appealing to harmonic analysis we show that all local minima of such a network are non-differentiable, except for those minima that occur in a region of parameter space where the loss surface is perfectly flat. Non-differentiable minima are therefore not technicalities or pathologies; they are the heart of the problem when investigating the loss of ReLU networks. As a consequence, we must employ techniques from nonsmooth analysis to study these loss surfaces. We show how to apply these techniques in some illustrative cases.

Submitted 23 July, 2018; v1 submitted 29 December, 2017; originally announced December 2017.

arXiv:1712.01473 [pdf, ps, other]

Deep linear neural networks with arbitrary loss: All local minima are global

Authors: Thomas Laurent, James von Brecht

Abstract: We consider deep linear networks with arbitrary convex differentiable loss. We provide a short and elementary proof of the fact that all local minima are global minima if the hidden layers are either 1) at least as wide as the input layer, or 2) at least as wide as the output layer. This result is the strongest possible in the following sense: If the loss is convex and Lipschitz but not differentiable then deep linear networks can have sub-optimal local minima.

Submitted 24 July, 2018; v1 submitted 4 December, 2017; originally announced December 2017.

arXiv:1711.08104 [pdf, other]

doi 10.1088/1751-8121/aa9109

Dynamics of Embedded Curves by Doubly-Nonlocal Reaction-Diffusion Systems

Authors: James H. von Brecht, Ryan Blair

Abstract: We study a class of nonlocal, energy-driven dynamical models that govern the motion of closed, embedded curves from both an energetic and dynamical perspective. Our energetic results provide a variety of ways to understand physically motivated energetic models in terms of more classical, combinatorial measures of complexity for embedded curves. This line of investigation culminates in a family of complexity bounds that relate a rather broad class of models to a generalized, or weighted, variant of the crossing number. Our dynamic results include global well-posedness of the associated partial differential equations, regularity of equilibria for these flows as well as a more detailed investigation of dynamics near such equilibria. Finally, we explore a few global dynamical properties of these models numerically.

Submitted 21 November, 2017; originally announced November 2017.

Comments: 49 pages, 3 figures

MSC Class: 35K57; 57M27

arXiv:1612.06212 [pdf, other]

A recurrent neural network without chaos

Authors: Thomas Laurent, James von Brecht

Abstract: We introduce an exceptionally simple gated recurrent neural network (RNN) that achieves performance comparable to well-known gated architectures, such as LSTMs and GRUs, on the word-level language modeling task. We prove that our model has simple, predictable and non-chaotic dynamics. This stands in stark contrast to more standard gated architectures, whose underlying dynamical systems exhibit chaotic behavior.

Submitted 19 December, 2016; originally announced December 2016.

arXiv:1602.04102 [pdf, other]

Estimating perimeter using graph cuts

Authors: Nicolás García Trillos, Dejan Slepčev, James von Brecht

Abstract: We investigate the estimation of the perimeter of a set by a graph cut of a random geometric graph. For $Ω\subset D = (0,1)^d$, with $d \geq 2$, we are given $n$ random i.i.d. points on $D$ whose membership in $Ω$ is known. We consider the sample as a random geometric graph with connection distance $\varepsilon>0$. We estimate the perimeter of $Ω$ (relative to $D$) by the appropriately rescaled graph cut between the vertices in $Ω$ and the vertices in $D \backslash Ω$. We obtain bias and variance estimates on the error, which are optimal in scaling with respect to $n$ and $\varepsilon$. We consider two scaling regimes: the dense (when the average degree of the vertices goes to $\infty$) and the sparse one (when the degree goes to $0$). In the dense regime there is a crossover in the nature of approximation at dimension $d=5$: we show that in low dimensions $d=2,3,4$ one can obtain confidence intervals for the approximation error, while in higher dimensions one can only obtain error estimates for testing the hypothesis that the perimeter is less than a given number.

Submitted 14 August, 2016; v1 submitted 12 February, 2016; originally announced February 2016.

MSC Class: 60D05; 62G20; 68R10
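
The graph-cut perimeter estimator described in the abstract above can be illustrated with a minimal sketch in dimension $d = 2$. The abstract does not state the rescaling constant, so the factor used here is an assumption derived for the unit square with uniformly sampled points: the expected number of cut edges across a straight interface of length $L$ is approximately $\tfrac{2}{3} n^2 \varepsilon^3 L$. The function name `estimate_perimeter` and all parameter choices are hypothetical and are not taken from the paper.

```python
import numpy as np

def estimate_perimeter(points, inside, eps):
    """Rescaled graph-cut estimate of a perimeter in the unit square (d = 2).

    points : (n, 2) array of sample points in (0, 1)^2
    inside : (n,) boolean array, True where the point lies in the set
    eps    : connection distance of the random geometric graph
    """
    n = len(points)
    p_in = points[inside]      # vertices inside the set
    p_out = points[~inside]    # vertices in the complement
    # Squared distances between every inside/outside pair of vertices.
    diff = p_in[:, None, :] - p_out[None, :, :]
    dist2 = np.sum(diff ** 2, axis=-1)
    # Graph cut: number of edges (distance < eps) joining the two vertex sets.
    cut = np.count_nonzero(dist2 < eps ** 2)
    # Assumed rescaling for d = 2 with uniform density:
    # E[cut] is roughly (2/3) * n^2 * eps^3 * perimeter.
    return cut / ((2.0 / 3.0) * n ** 2 * eps ** 3)

# Usage: the half-square {x < 0.5} has perimeter 1 relative to the unit square.
rng = np.random.default_rng(0)
pts = rng.random((3000, 2))
estimate = estimate_perimeter(pts, pts[:, 0] < 0.5, eps=0.1)
print(estimate)
```

The estimate should land near 1 for this example, with a small downward bias because neighborhoods are truncated where the interface meets the boundary of the square.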

arXiv:1506.05985 [pdf, ps, other]

Enhanced Lasso Recovery on Graph

Authors: Xavier Bresson, Thomas Laurent, James von Brecht

Abstract: This work aims at recovering signals that are sparse on graphs. Compressed sensing offers techniques for signal recovery from a few linear measurements and graph Fourier analysis provides a signal representation on graphs. In this paper, we leverage these two frameworks to introduce a new Lasso recovery algorithm on graphs. More precisely, we present a non-convex, non-smooth algorithm that outperforms the standard convex Lasso technique. We carry out numerical experiments on three benchmark graph datasets.

Submitted 19 June, 2015; originally announced June 2015.

arXiv:1411.6590 [pdf, other]

Consistency of Cheeger and Ratio Graph Cuts

Authors: Nicolas Garcia Trillos, Dejan Slepcev, James von Brecht, Thomas Laurent, Xavier Bresson

Abstract: This paper establishes the consistency of a family of graph-cut-based algorithms for clustering of data clouds. We consider point clouds obtained as samples of a ground-truth measure. We investigate approaches to clustering based on minimizing objective functionals defined on proximity graphs of the given sample. Our focus is on functionals based on graph cuts like the Cheeger and ratio cuts. We show that minimizers of these cuts converge as the sample size increases to a minimizer of a corresponding continuum cut (which partitions the ground truth measure). Moreover, we obtain sharp conditions on how the connectivity radius can be scaled with respect to the number of sample points for the consistency to hold. We provide results for two-way and for multiway cuts. Furthermore, we provide numerical experiments that illustrate the results and explore the optimality of scaling in dimension two.

Submitted 24 November, 2014; originally announced November 2014.

MSC Class: 62H30; 62G20; 49J55; 91C20; 68R10; 60D05

arXiv:1406.3837 [pdf, other]

An Incremental Reseeding Strategy for Clustering

Authors: Xavier Bresson, Huiyi Hu, Thomas Laurent, Arthur Szlam, James von Brecht

Abstract: In this work we propose a simple and easily parallelizable algorithm for multiway graph partitioning. The algorithm alternates between three basic components: diffusing seed vertices over the graph, thresholding the diffused seeds, and then randomly reseeding the thresholded clusters. We demonstrate experimentally that the proper combination of these ingredients leads to an algorithm that achieves state-of-the-art performance in terms of cluster purity on standard benchmark datasets. Moreover, the algorithm runs an order of magnitude faster than the other algorithms that achieve comparable results in terms of accuracy. We also describe a coarsen, cluster and refine approach similar to GRACLUS and METIS that removes an additional order of magnitude from the runtime of our algorithm while still maintaining competitive accuracy.

Submitted 15 June, 2014; originally announced June 2014.
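
The three components named in the abstract above (diffuse, threshold, reseed) can be sketched as follows. This is only an illustration of the alternation, not the authors' implementation: the diffusion is assumed to be a plain random walk, the number of seeds per cluster is held fixed rather than increased over time, and the function name `incremental_reseeding` and all parameters are hypothetical.

```python
import numpy as np

def incremental_reseeding(W, k, n_rounds=20, n_diffusion_steps=10,
                          seeds_per_cluster=2, rng=None):
    """Partition a graph into k clusters by alternating reseed/diffuse/threshold.

    W : (n, n) symmetric nonnegative weight matrix with no isolated vertices
    k : number of clusters
    """
    rng = np.random.default_rng() if rng is None else rng
    n = W.shape[0]
    # Column-stochastic random-walk matrix (assumed diffusion operator):
    # broadcasting divides column j of W by the degree of vertex j.
    P = W / W.sum(axis=0)
    labels = rng.integers(k, size=n)  # random initial partition
    for _ in range(n_rounds):
        # Reseed: draw a few random seed vertices from each current cluster.
        F = np.zeros((n, k))
        for c in range(k):
            members = np.flatnonzero(labels == c)
            if members.size == 0:
                members = np.arange(n)  # revive an empty cluster anywhere
            size = min(seeds_per_cluster, members.size)
            seeds = rng.choice(members, size=size, replace=False)
            F[seeds, c] = 1.0
        # Diffuse: spread each cluster's seed mass over the graph.
        for _ in range(n_diffusion_steps):
            F = P @ F
        # Threshold: assign each vertex to the cluster with the largest mass.
        labels = np.argmax(F, axis=1)
    return labels

# Usage: two 5-vertex cliques joined by a single edge.
W = np.zeros((10, 10))
W[:5, :5] = 1.0
W[5:, 5:] = 1.0
np.fill_diagonal(W, 0.0)
W[4, 5] = W[5, 4] = 1.0
print(incremental_reseeding(W, k=2, rng=np.random.default_rng(0)))
```

Each round costs a handful of matrix products on an `n × k` array, which is why the alternation is easy to parallelize; with a sparse `W` the same loop applies unchanged apart from the matrix type.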

arXiv:1306.1185 [pdf, other]

Multiclass Total Variation Clustering

Authors: Xavier Bresson, Thomas Laurent, David Uminsky, James H. von Brecht

Abstract: Ideas from the image processing literature have recently motivated a new set of clustering algorithms that rely on the concept of total variation. While these algorithms perform well for bi-partitioning tasks, their recursive extensions yield unimpressive results for multiclass clustering tasks. This paper presents a general framework for multiclass total variation clustering that does not rely on recursion. The results greatly outperform previous total variation algorithms and compare well with state-of-the-art NMF approaches.

Submitted 5 June, 2013; originally announced June 2013.

arXiv:1304.5459 [pdf, other]

Stability Analysis of Flock and Mill rings for 2nd Order Models in Swarming

Authors: G. Albi, D. Balagué, J. A. Carrillo, J. von Brecht

Abstract: We study the linear stability of flock and mill ring solutions of two individual-based models for biological swarming. The individuals interact via a nonlocal interaction potential that is repulsive in the short range and attractive in the long range. We relate the instability of the flock rings with the instability of the ring solution of the first order model. We observe that repulsive-attractive interactions lead to new configurations for the flock rings such as clustering and fattening formation. Finally, we numerically explore mill patterns arising from this kind of interaction together with the asymptotic speed of the system.

Submitted 19 April, 2013; originally announced April 2013.

arXiv:1302.2717 [pdf, ps, other]

An Adaptive Total Variation Algorithm for Computing the Balanced Cut of a Graph

Authors: Xavier Bresson, Thomas Laurent, David Uminsky, James H. von Brecht

Abstract: We propose an adaptive version of the total variation algorithm proposed in [3] for computing the balanced cut of a graph. The algorithm from [3] used a sequence of inner total variation minimizations to guarantee descent of the balanced cut energy as well as convergence of the algorithm. In practice the total variation minimization step is never solved exactly. Instead, an accuracy parameter is specified and the total variation minimization terminates once this level of accuracy is reached. The choice of this parameter can vastly impact both the computational time of the overall algorithm as well as the accuracy of the result. Moreover, since the total variation minimization step is not solved exactly, the algorithm is not guaranteed to be monotonic. In the present work we introduce a new adaptive stopping condition for the total variation minimization that guarantees monotonicity. This results in an algorithm that is actually monotonic in practice and is also significantly faster than previous, non-adaptive algorithms.

Submitted 12 February, 2013; originally announced February 2013.

arXiv:1204.6545 [pdf, ps, other]

Convergence of a Steepest Descent Algorithm for Ratio Cut Clustering

Authors: Xavier Bresson, Thomas Laurent, David Uminsky, James H. von Brecht

Abstract: Unsupervised clustering of scattered, noisy and high-dimensional data points is an important and difficult problem. Tight continuous relaxations of balanced cut problems have recently been shown to provide excellent clustering results. In this paper, we present an explicit-implicit gradient flow scheme for the relaxed ratio cut problem, and prove that the algorithm converges to a critical point of the energy. We also show the efficiency of the proposed algorithm on the two moons dataset.

Submitted 29 April, 2012; originally announced April 2012.

Showing 1–15 of 15 results for author: von Brecht, J