-
Towards a theory of how the structure of language is acquired by deep neural networks
Authors:
Francesco Cagnetta,
Matthieu Wyart
Abstract:
How much data is required to learn the structure of a language via next-token prediction? We study this question for synthetic datasets generated via a Probabilistic Context-Free Grammar (PCFG) -- a hierarchical generative model that captures the tree-like structure of natural languages. We determine token-token correlations analytically in our model and show that they can be used to build a representation of the grammar's hidden variables, the longer the range the deeper the variable. In addition, a finite training set limits the resolution of correlations to an effective range, whose size grows with that of the training set. As a result, a Language Model trained with increasingly many examples can build a deeper representation of the grammar's structure, thus reaching good performance despite the high dimensionality of the problem. We conjecture that the relationship between training set size and effective range of correlations holds beyond our synthetic datasets. In particular, our conjecture predicts how the scaling law for the test loss behaviour with training set size depends on the length of the context window, which we confirm empirically for a collection of lines from Shakespeare's plays.
Submitted 28 May, 2024;
originally announced June 2024.
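As a rough illustration of the kind of generative model the abstract above refers to, the following minimal sketch samples token sequences from a Probabilistic Context-Free Grammar. The grammar `TOY_PCFG`, its symbols, and its rule probabilities are hypothetical and chosen only for readability; they are not the synthetic grammars studied in the paper.

```python
import random

# Hypothetical toy grammar (an assumption for illustration only):
# each nonterminal maps to a list of (probability, right-hand side) rules.
TOY_PCFG = {
    "S": [(1.0, ["NP", "VP"])],
    "NP": [(0.5, ["the", "N"]), (0.5, ["a", "N"])],
    "VP": [(0.7, ["V", "NP"]), (0.3, ["V"])],
    "N": [(0.5, ["cat"]), (0.5, ["dog"])],
    "V": [(0.5, ["sees"]), (0.5, ["chases"])],
}


def sample(symbol, rng):
    """Recursively expand `symbol` into a list of terminal tokens."""
    if symbol not in TOY_PCFG:
        # Anything without production rules is treated as a terminal token.
        return [symbol]
    rules = TOY_PCFG[symbol]
    weights = [prob for prob, _ in rules]
    # Pick one production rule according to its probability.
    _, rhs = rng.choices(rules, weights=weights, k=1)[0]
    tokens = []
    for child in rhs:
        tokens.extend(sample(child, rng))
    return tokens


if __name__ == "__main__":
    rng = random.Random(0)
    for _ in range(3):
        print(" ".join(sample("S", rng)))
```

Each sampled sentence is the leaf sequence of a derivation tree, so tokens that share a nearby ancestor in the tree are statistically coupled; this is the tree-like structure the abstract describes.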
-
Density and geometry of excitations in supercooled liquids up to the activation energy
Authors:
Wencheng Ji,
Massimo Pica Ciamarra,
Matthieu Wyart
Abstract:
We introduce an algorithm to uncover the activated particle rearrangements, or excitations, regulating structural relaxation in glasses at much higher energies than previously achieved. We use it to investigate the density and geometric properties of excitations in a model system. We find that the density of excitations behaves as a shifted power-law, and confirm that this shift accounts for the increase in the activation energy controlling the relaxation dynamics. Remarkably, we find that excitations comprise a core whose properties, including the displacement of the particle moving the most, scale as a power-law of their activation energy and do not depend on temperature. Excitations also present an outer deformation field that depends on the material stability and, hence, on temperature. Our analysis suggests that while excitations suppress the transition of dynamical arrest predicted by mean-field theories, they are strongly influenced by it.
Submitted 1 May, 2024;
originally announced May 2024.
-
How Deep Networks Learn Sparse and Hierarchical Data: the Sparse Random Hierarchy Model
Authors:
Umberto Tomasini,
Matthieu Wyart
Abstract:
Understanding what makes high-dimensional data learnable is a fundamental question in machine learning. On the one hand, it is believed that the success of deep learning lies in its ability to build a hierarchy of representations that become increasingly more abstract with depth, going from simple features like edges to more complex concepts. On the other hand, learning to be insensitive to invariances of the task, such as smooth transformations for image datasets, has been argued to be important for deep networks and it strongly correlates with their performance. In this work, we aim to explain this correlation and unify these two viewpoints. We show that by introducing sparsity to generative hierarchical models of data, the task acquires insensitivity to spatial transformations that are discrete versions of smooth transformations. In particular, we introduce the Sparse Random Hierarchy Model (SRHM), where we observe and rationalize that a hierarchical representation mirroring the hierarchical model is learnt precisely when such insensitivity is learnt, thereby explaining the strong correlation between the latter and performance. Moreover, we quantify how the sample complexity of CNNs learning the SRHM depends on both the sparsity and hierarchical structure of the task.
Submitted 2 May, 2024; v1 submitted 16 April, 2024;
originally announced April 2024.
-
A Phase Transition in Diffusion Models Reveals the Hierarchical Nature of Data
Authors:
Antonio Sclocchi,
Alessandro Favero,
Matthieu Wyart
Abstract:
Understanding the structure of real data is paramount in advancing modern deep-learning methodologies. Natural data such as images are believed to be composed of features organised in a hierarchical and combinatorial manner, which neural networks capture during learning. Recent advancements show that diffusion models can generate high-quality images, hinting at their ability to capture this underlying structure. We study this phenomenon in a hierarchical generative model of data. We find that the backward diffusion process acting after a time $t$ is governed by a phase transition at some threshold time, where the probability of reconstructing high-level features, like the class of an image, suddenly drops. Instead, the reconstruction of low-level features, such as specific details of an image, evolves smoothly across the whole diffusion process. This result implies that at times beyond the transition, the class has changed but the generated sample may still be composed of low-level elements of the initial image. We validate these theoretical insights through numerical experiments on class-unconditional ImageNet diffusion models. Our analysis characterises the relationship between time and scale in diffusion models and puts forward generative models as powerful tools to model combinatorial data properties.
Submitted 4 March, 2024; v1 submitted 26 February, 2024;
originally announced February 2024.
-
Dynamical heterogeneities of thermal creep in pinned interfaces
Authors:
Tom W. J. de Geus,
Alberto Rosso,
Matthieu Wyart
Abstract:
Disordered systems under applied loading display slow creep flows at finite temperature, which can lead to the material rupture. Renormalization group arguments predicted that creep proceeds via thermal avalanches of activated events. Recently, thermal avalanches were argued to control the dynamics of liquids near their glass transition. Both theoretical approaches are markedly different. Here we provide a scaling description that seeks to unify dynamical heterogeneities in both phenomena, confirm it in simple models of pinned elastic interfaces, and discuss its experimental implications.
Submitted 18 January, 2024;
originally announced January 2024.
-
Ductile-to-brittle transition and yielding in soft amorphous materials: perspectives and open questions
Authors:
Thibaut Divoux,
Elisabeth Agoritsas,
Stefano Aime,
Catherine Barentin,
Jean-Louis Barrat,
Roberto Benzi,
Ludovic Berthier,
Dapeng Bi,
Giulio Biroli,
Daniel Bonn,
Philippe Bourrianne,
Mehdi Bouzid,
Emanuela Del Gado,
Hélène Delanoë-Ayari,
Kasra Farain,
Suzanne Fielding,
Matthias Fuchs,
Jasper van der Gucht,
Silke Henkes,
Maziyar Jalaal,
Yogesh M. Joshi,
Anaël Lemaître,
Robert L. Leheny,
Sébastien Manneville,
Kirsten Martens, et al. (15 additional authors not shown)
Abstract:
Soft amorphous materials are viscoelastic solids ubiquitously found around us, from clays and cementitious pastes to emulsions and physical gels encountered in food or biomedical engineering. Under an external deformation, these materials undergo a noteworthy transition from a solid to a liquid state that reshapes the material microstructure. This yielding transition was the main theme of a workshop held from January 9 to 13, 2023 at the Lorentz Center in Leiden. The manuscript presented here offers a critical perspective on the subject, synthesizing insights from the various brainstorming sessions and informal discussions that unfolded during this week of vibrant exchange of ideas. The result of these exchanges takes the form of a series of open questions that represent outstanding experimental, numerical, and theoretical challenges to be tackled in the near future.
Submitted 21 December, 2023;
originally announced December 2023.
-
On the different regimes of Stochastic Gradient Descent
Authors:
Antonio Sclocchi,
Matthieu Wyart
Abstract:
Modern deep networks are trained with stochastic gradient descent (SGD) whose key hyperparameters are the number of data considered at each step or batch size $B$, and the step size or learning rate $η$. For small $B$ and large $η$, SGD corresponds to a stochastic evolution of the parameters, whose noise amplitude is governed by the "temperature" $T\equiv η/B$. Yet this description is observed to break down for sufficiently large batches $B\geq B^*$, or simplifies to gradient descent (GD) when the temperature is sufficiently small. Understanding where these cross-overs take place remains a central challenge. Here, we resolve these questions for a teacher-student perceptron classification model and show empirically that our key predictions still apply to deep networks. Specifically, we obtain a phase diagram in the $B$-$η$ plane that separates three dynamical phases: (i) a noise-dominated SGD governed by temperature, (ii) a large-first-step-dominated SGD and (iii) GD. These different phases also correspond to different regimes of generalization error. Remarkably, our analysis reveals that the batch size $B^*$ separating regimes (i) and (ii) scales with the size $P$ of the training set, with an exponent that characterizes the hardness of the classification problem.
Submitted 27 February, 2024; v1 submitted 19 September, 2023;
originally announced September 2023.
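The temperature $T\equiv η/B$ defined in the abstract above can be made concrete with a small sketch. The function name `sgd_temperature` and the example hyperparameter values are assumptions for illustration; the sketch only shows that different (batch size, learning rate) pairs with the same ratio share one temperature, and says nothing about where the phase boundaries lie.

```python
def sgd_temperature(learning_rate, batch_size):
    """Return T = learning_rate / batch_size, the ratio called
    'temperature' in the abstract (function name is hypothetical)."""
    if batch_size <= 0:
        raise ValueError("batch_size must be positive")
    return learning_rate / batch_size


if __name__ == "__main__":
    # Illustrative (learning rate, batch size) pairs, not values from the paper.
    for lr, batch in [(0.1, 16), (0.2, 32), (0.4, 64), (0.4, 16)]:
        print(f"eta={lr}, B={batch}, T={sgd_temperature(lr, batch):.5f}")
```

In the noise-dominated regime described in the abstract, the first three pairs would be expected to behave alike because they share the same $T$, while the last pair has a temperature four times larger.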
-
Testing theories of the glass transition with the same liquid, but many kinetic rules
Authors:
Cristina Gavazzoni,
Carolina Brito,
Matthieu Wyart
Abstract:
We study the glass transition by exploring a broad class of kinetic rules that can significantly modify the normal dynamics of super-cooled liquids, while maintaining thermal equilibrium. Beyond the usual dynamics of liquids, this class includes dynamics in which a fraction $(1-f_R)$ of the particles can perform pairwise exchange or 'swap moves', while a fraction $f_P$ of the particles can only move along restricted directions. We find that (i) the location of the glass transition varies greatly but smoothly as $f_P$ and $f_R$ change and (ii) it is governed by a linear combination of $f_P$ and $f_R$. (iii) Dynamical heterogeneities are not governed by the static structure of the material. Instead, they are similar at the glass transition across the ($f_R$, $f_P$) diagram. These observations are negative items for some existing theories of the glass transition, particularly those reliant on growing thermodynamic order or locally favored structure, and open new avenues to test other approaches.
Submitted 27 February, 2024; v1 submitted 31 July, 2023;
originally announced August 2023.
-
How Deep Neural Networks Learn Compositional Data: The Random Hierarchy Model
Authors:
Francesco Cagnetta,
Leonardo Petrini,
Umberto M. Tomasini,
Alessandro Favero,
Matthieu Wyart
Abstract:
Deep learning algorithms demonstrate a surprising ability to learn high-dimensional tasks from limited examples. This is commonly attributed to the depth of neural networks, enabling them to build a hierarchy of abstract, low-dimensional data representations. However, how many training examples are required to learn such representations remains unknown. To quantitatively study this question, we introduce the Random Hierarchy Model: a family of synthetic tasks inspired by the hierarchical structure of language and images. The model is a classification task where each class corresponds to a group of high-level features, chosen among several equivalent groups associated with the same class. In turn, each feature corresponds to a group of sub-features chosen among several equivalent ones and so on, following a hierarchy of composition rules. We find that deep networks learn the task by developing internal representations invariant to exchanging equivalent groups. Moreover, the number of data required corresponds to the point where correlations between low-level features and classes become detectable. Overall, our results indicate how deep networks overcome the curse of dimensionality by building invariant representations, and provide an estimate of the number of data required to learn a hierarchical task.
Submitted 3 July, 2024; v1 submitted 5 July, 2023;
originally announced July 2023.
-
Scaling Description of Dynamical Heterogeneity and Avalanches of Relaxation in Glass-Forming Liquids
Authors:
Ali Tahaei,
Giulio Biroli,
Misaki Ozawa,
Marko Popović,
Matthieu Wyart
Abstract:
We provide a theoretical description of dynamical heterogeneities in glass-forming liquids, based on the premise that relaxation occurs via local rearrangements coupled by elasticity. In our framework, the growth of the dynamical correlation length $ξ$ and of the correlation volume $χ_4$ are controlled by a zero-temperature fixed point. We connect this critical behavior to the properties of the distribution of local energy barriers at zero temperature. Our description makes a direct connection between dynamical heterogeneities and avalanche-type relaxation associated to dynamic facilitation, allowing us to relate the size distribution of heterogeneities to their time evolution. Within an avalanche, a local region relaxes multiple times, the more the larger is the avalanche. This property, related to the nature of the zero-temperature fixed point, directly leads to decoupling of particle diffusion and relaxation time (the so-called Stokes-Einstein violation). Our most salient predictions are tested and confirmed by numerical simulations of scalar and tensorial thermal elasto-plastic models.
Submitted 3 August, 2023; v1 submitted 29 April, 2023;
originally announced May 2023.
-
Local vs. Cooperative: Unraveling Glass Transition Mechanisms with SEER
Authors:
Massimo Pica Ciamarra,
Wencheng Ji,
Matthieu Wyart
Abstract:
Which phenomenon slows down the dynamics in super-cooled liquids and turns them into glasses is a long-standing question of condensed-matter. Most popular theories posit that as the temperature decreases, many events must occur in a coordinated fashion on a growing length scale for relaxation to occur. Instead, other approaches consider that local barriers associated with the elementary rearrangement of a few particles or `excitations' govern the dynamics. To resolve this conundrum, our central result is to introduce an algorithm, SEER, which can systematically extract hundreds of excitations and their energy from any given configuration. We also provide a novel measurement of the activation energy, characterizing the liquid dynamics, based on fast quenching and reheating. We use these two methods in a popular liquid model of polydisperse particles. Such polydisperse models are known to capture the hallmarks of the glass transition and can be equilibrated efficiently up to millisecond time scales. The analysis reveals that cooperative effects do not control the fragility of such liquids: the change of energy of local barriers determines the change of activation energy. More generally, these methods can now be used to measure the degree of cooperativity of any liquid model.
Submitted 21 March, 2024; v1 submitted 10 February, 2023;
originally announced February 2023.
-
Armouring of a frictional interface by mechanical noise
Authors:
Elisa El Sergany,
Matthieu Wyart,
Tom W. J. de Geus
Abstract:
A dry frictional interface loaded in shear often displays stick-slip. The amplitude of this cycle depends on the probability that a slip event nucleates into a rupture, and on the rate at which slip events are triggered. This rate is determined by the distribution $P(x)$ of soft spots that yield if the shear stress is increased by some amount $x$. In minimal models of a frictional interface that include disorder, inertia and long-range elasticity, we discovered an 'armouring' mechanism, by which the interface is greatly stabilised after a large slip event: $P(x)$ then vanishes at small arguments, as $P(x)\sim x^θ$ [1]. The exponent $θ>0$, which exists only in the presence of inertia (otherwise $θ=0$), was found to depend on the statistics of the disorder in the model, a phenomenon that was not explained. Here, we show that a single-particle toy model with inertia and disorder captures the existence of a non-trivial exponent $θ>0$, which we can analytically relate to the statistics of the disorder.
Submitted 31 January, 2023;
originally announced January 2023.
-
Dissecting the Effects of SGD Noise in Distinct Regimes of Deep Learning
Authors:
Antonio Sclocchi,
Mario Geiger,
Matthieu Wyart
Abstract:
Understanding when the noise in stochastic gradient descent (SGD) affects generalization of deep neural networks remains a challenge, complicated by the fact that networks can operate in distinct training regimes. Here we study how the magnitude of this noise $T$ affects performance as the size of the training set $P$ and the scale of initialization $α$ are varied. For gradient descent, $α$ is a key parameter that controls if the network is `lazy' ($α\gg1$) or instead learns features ($α\ll1$). For classification of MNIST and CIFAR10 images, our central results are: (i) obtaining phase diagrams for performance in the $(α,T)$ plane. They show that SGD noise can be detrimental or instead useful depending on the training regime. Moreover, although increasing $T$ or decreasing $α$ both allow the net to escape the lazy regime, these changes can have opposite effects on performance. (ii) Most importantly, we find that the characteristic temperature $T_c$ where the noise of SGD starts affecting the trained model (and eventually performance) is a power law of $P$. We relate this finding with the observation that key dynamical quantities, such as the total variation of weights during training, depend on both $T$ and $P$ as power laws. These results indicate that a key effect of SGD noise occurs late in training by affecting the stopping process whereby all data are fitted. Indeed, we argue that due to SGD noise, nets must develop a stronger `signal', i.e. larger informative weights, to fit the data, leading to a longer training time. A stronger signal and a longer training time are also required when the size of the training set $P$ increases. We confirm these views in the perceptron model, where signal and noise can be precisely measured. Interestingly, exponents characterizing the effect of SGD depend on the density of data near the decision boundary, as we explain.
Submitted 30 May, 2023; v1 submitted 31 January, 2023;
originally announced January 2023.
-
How deep convolutional neural networks lose spatial information with training
Authors:
Umberto M. Tomasini,
Leonardo Petrini,
Francesco Cagnetta,
Matthieu Wyart
Abstract:
A central question of machine learning is how deep nets manage to learn tasks in high dimensions. An appealing hypothesis is that they achieve this feat by building a representation of the data where information irrelevant to the task is lost. For image datasets, this view is supported by the observation that after (and not before) training, the neural representation becomes less and less sensitive to diffeomorphisms acting on images as the signal propagates through the net. This loss of sensitivity correlates with performance, and surprisingly correlates with a gain of sensitivity to white noise acquired during training. These facts are unexplained, and as we demonstrate still hold when white noise is added to the images of the training set. Here, we (i) show empirically for various architectures that stability to image diffeomorphisms is achieved by both spatial and channel pooling, (ii) introduce a model scale-detection task which reproduces our empirical observations on spatial pooling and (iii) compute analytically how the sensitivity to diffeomorphisms and noise scales with depth due to spatial pooling. The scalings are found to depend on the presence of strides in the net architecture. We find that the increased sensitivity to noise is due to the perturbing noise piling up during pooling, after being rectified by ReLU units.
Submitted 23 November, 2022; v1 submitted 4 October, 2022;
originally announced October 2022.
-
Avalanches and deformation in glasses and disordered systems
Authors:
Alberto Rosso,
James P. Sethna,
Matthieu Wyart
Abstract:
In this chapter, we discuss avalanches in glasses and disordered systems, and the macroscopic dynamical behavior that they mediate. We briefly review three classes of systems where avalanches are observed: depinning transition of disordered interfaces, yielding of amorphous materials, and the jamming transition. Without extensive formalism, we discuss results gleaned from theoretical approaches -- mean-field theory, scaling and exponent relations, the renormalization group, and a few results from replica theory. We focus both on the remarkably sophisticated physics of avalanches and on relatively new approaches to the macroscopic flow behavior exhibited past the depinning/yielding transition.
Submitted 1 September, 2022; v1 submitted 8 August, 2022;
originally announced August 2022.
-
What Can Be Learnt With Wide Convolutional Neural Networks?
Authors:
Francesco Cagnetta,
Alessandro Favero,
Matthieu Wyart
Abstract:
Understanding how convolutional neural networks (CNNs) can efficiently learn high-dimensional functions remains a fundamental challenge. A popular belief is that these models harness the local and hierarchical structure of natural data such as images. Yet, we lack a quantitative understanding of how such structure affects performance, e.g., the rate of decay of the generalisation error with the number of training samples. In this paper, we study infinitely-wide deep CNNs in the kernel regime. First, we show that the spectrum of the corresponding kernel inherits the hierarchical structure of the network, and we characterise its asymptotics. Then, we use this result together with generalisation bounds to prove that deep CNNs adapt to the spatial scale of the target function. In particular, we find that if the target function depends on low-dimensional subsets of adjacent input variables, then the decay of the error is controlled by the effective dimensionality of these subsets. Conversely, if the target function depends on the full set of input variables, then the error decay is controlled by the input dimension. We conclude by computing the generalisation error of a deep CNN trained on the output of another deep CNN with randomly-initialised parameters. Interestingly, we find that, despite their hierarchical structure, the functions generated by infinitely-wide deep CNNs are too rich to be efficiently learnable in high dimension.
Submitted 31 May, 2023; v1 submitted 1 August, 2022;
originally announced August 2022.
-
Learning sparse features can lead to overfitting in neural networks
Authors:
Leonardo Petrini,
Francesco Cagnetta,
Eric Vanden-Eijnden,
Matthieu Wyart
Abstract:
It is widely believed that the success of deep networks lies in their ability to learn a meaningful representation of the features of the data. Yet, understanding when and how this feature learning improves performance remains a challenge: for example, it is beneficial for modern architectures trained to classify images, whereas it is detrimental for fully-connected networks trained for the same task on the same data. Here we propose an explanation for this puzzle, by showing that feature learning can perform worse than lazy training (via random feature kernel or the NTK) as the former can lead to a sparser neural representation. Although sparsity is known to be essential for learning anisotropic data, it is detrimental when the target function is constant or smooth along certain directions of input space. We illustrate this phenomenon in two settings: (i) regression of Gaussian random functions on the d-dimensional unit sphere and (ii) classification of benchmark datasets of images. For (i), we compute the scaling of the generalization error with number of training points, and show that methods that do not learn features generalize better, even when the dimension of the input space is large. For (ii), we show empirically that learning features can indeed lead to sparse and thereby less smooth representations of the image predictors. This fact is plausibly responsible for deteriorating the performance, which is known to be correlated with smoothness along diffeomorphisms.
Submitted 12 October, 2022; v1 submitted 24 June, 2022;
originally announced June 2022.
-
Scaling theory for the statistics of slip at frictional interfaces
Authors:
Tom W. J. de Geus,
Matthieu Wyart
Abstract:
Slip at a frictional interface occurs via intermittent events. Understanding how these events are nucleated, can propagate, or stop spontaneously remains a challenge, central to earthquake science and tribology. In the absence of disorder, rate-and-state approaches predict a diverging nucleation length at some stress $σ^*$, beyond which cracks can propagate. Here we argue that, for a flat interface, disorder is a relevant perturbation to this description. We justify why the distribution of slip contains two parts: a power law corresponding to `avalanches', and a `narrow' distribution of system-spanning `fracture' events. We derive novel scaling relations for avalanches, including a relation between the stress drop and the spatial extension of a slip event. We compute the cut-off length beyond which avalanches cannot be stopped by disorder, leading to a system-spanning fracture, and successfully test these predictions in a minimal model of frictional interfaces.
Submitted 8 October, 2022; v1 submitted 6 April, 2022;
originally announced April 2022.
-
Failure and success of the spectral bias prediction for Kernel Ridge Regression: the case of low-dimensional data
Authors:
Umberto M. Tomasini,
Antonio Sclocchi,
Matthieu Wyart
Abstract:
Recently, several theories including the replica method made predictions for the generalization error of Kernel Ridge Regression. In some regimes, they predict that the method has a `spectral bias': decomposing the true function $f^*$ on the eigenbasis of the kernel, it fits well the coefficients associated with the O(P) largest eigenvalues, where $P$ is the size of the training set. This prediction works very well on benchmark data sets such as images, yet the assumptions these approaches make on the data are never satisfied in practice. To clarify when the spectral bias prediction holds, we first focus on a one-dimensional model where rigorous results are obtained and then use scaling arguments to generalize and test our findings in higher dimensions. Our predictions include the classification case $f(x)=$sign$(x_1)$ with a data distribution that vanishes at the decision boundary $p(x)\sim x_1^χ$. For $χ>0$ and a Laplace kernel, we find that (i) there exists a cross-over ridge $λ^*_{d,χ}(P)\sim P^{-\frac{1}{d+χ}}$ such that for $λ\gg λ^*_{d,χ}(P)$, the replica method applies, but not for $λ\llλ^*_{d,χ}(P)$, (ii) in the ridge-less case, spectral bias predicts the correct training curve exponent only in the limit $d\rightarrow\infty$.
Submitted 16 February, 2022; v1 submitted 7 February, 2022;
originally announced February 2022.
-
Scaling description of creep flow in amorphous solids
Authors:
Marko Popović,
Tom W. J. de Geus,
Wencheng Ji,
Alberto Rosso,
Matthieu Wyart
Abstract:
Amorphous solids such as coffee foam, toothpaste or mayonnaise display a transient creep flow when a stress $Σ$ is suddenly imposed. The associated strain rate is commonly found to decay in time as $\dotγ \sim t^{-ν}$, followed either by arrest or by a sudden fluidisation. Various empirical laws have been suggested for the creep exponent $ν$ and fluidisation time $τ_f$ in experimental and numerical studies. Here, we postulate that plastic flow is governed by the difference between $Σ$ and the transient yield stress $Σ_t(γ)$ that characterises the stability of configurations visited by the system at strain $γ$. Assuming the analyticity of $Σ_t(γ)$ allows us to predict $ν$ and asymptotic behaviours of $τ_f$ in terms of properties of stationary flows. We successfully test our predictions using elastoplastic models and published experimental results.
Submitted 6 October, 2022; v1 submitted 7 November, 2021;
originally announced November 2021.
-
Mean-field description for the architecture of low-energy excitations in glasses
Authors:
Wencheng Ji,
Tom W. J. de Geus,
Elisabeth Agoritsas,
Matthieu Wyart
Abstract:
In amorphous materials, groups of particles can rearrange locally into a new stable configuration. Such elementary excitations are key as they determine the response to external stresses, as well as to thermal and quantum fluctuations. Yet, understanding what controls their geometry remains a challenge. Here we build a scaling description of the geometry and energy of low-energy excitations in terms of the distance to an instability, as predicted for instance at the dynamical transition in mean field approaches of supercooled liquids. We successfully test our predictions in ultrastable computer glasses, with a gapped and ungapped (regular) spectrum. Overall, our approach explains why excitations become less extended, with a higher energy and displacement scale upon cooling.
Submitted 24 April, 2022; v1 submitted 24 June, 2021;
originally announced June 2021.
-
How memory architecture affects learning in a simple POMDP: the two-hypothesis testing problem
Authors:
Mario Geiger,
Christophe Eloy,
Matthieu Wyart
Abstract:
Reinforcement learning is generally difficult for partially observable Markov decision processes (POMDPs), which arise when the agent's observation is partial or noisy. To seek good performance in POMDPs, one strategy is to endow the agent with a finite memory, whose update is governed by the policy. However, policy optimization is non-convex in that case and can lead to poor training performance for random initialization. The performance can be empirically improved by constraining the memory architecture, thereby sacrificing optimality to facilitate training. Here we study this trade-off in a two-hypothesis testing problem, akin to the two-arm bandit problem. We compare two extreme cases: (i) the random access memory where any transitions between $M$ memory states are allowed and (ii) a fixed memory where the agent can access its last $m$ actions and rewards. For (i), the probability $q$ of playing the worst arm is known to be exponentially small in $M$ for the optimal policy. Our main result is to show that similar performance can be reached for (ii) as well, despite the simplicity of the memory architecture: using a conjecture on Gray-ordered binary necklaces, we find policies for which $q$ is exponentially small in $2^m$, i.e. $q\simα^{2^m}$ with $α< 1$. In addition, we observe empirically that training from random initialization leads to very poor results for (i), and significantly better results for (ii) thanks to the constraints on the memory architecture.
Submitted 18 November, 2021; v1 submitted 16 June, 2021;
originally announced June 2021.
-
Locality defeats the curse of dimensionality in convolutional teacher-student scenarios
Authors:
Alessandro Favero,
Francesco Cagnetta,
Matthieu Wyart
Abstract:
Convolutional neural networks perform a local and translationally-invariant treatment of the data: quantifying which of these two aspects is central to their success remains a challenge. We study this problem within a teacher-student framework for kernel regression, using `convolutional' kernels inspired by the neural tangent kernel of simple convolutional architectures of given filter size. Using heuristic methods from physics, we find in the ridgeless case that locality is key in determining the learning curve exponent $β$ (that relates the test error $ε_t\sim P^{-β}$ to the size of the training set $P$), whereas translational invariance is not. In particular, if the filter size of the teacher $t$ is smaller than that of the student $s$, $β$ is a function of $s$ only and does not depend on the input dimension. We confirm our predictions on $β$ empirically. We conclude by proving, using a natural universality assumption, that performing kernel regression with a ridge that decreases with the size of the training set leads to similar learning curve exponents to those we obtain in the ridgeless case.
Submitted 12 November, 2021; v1 submitted 16 June, 2021;
originally announced June 2021.
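As an illustration of how a learning curve exponent such as $β$ in $ε_t\sim P^{-β}$ can be estimated from measurements, here is a minimal sketch in Python. It is not the authors' code: the function name `fit_power_law_exponent` and the synthetic data are assumptions made for this example, and the fit is a plain least-squares line in log-log coordinates.

```python
import math


def fit_power_law_exponent(train_sizes, test_errors):
    """Return beta such that test_error ~ P**(-beta).

    Hypothetical helper: fits a straight line to log(error) versus log(P)
    by ordinary least squares and returns minus its slope.
    """
    xs = [math.log(p) for p in train_sizes]
    ys = [math.log(e) for e in test_errors]
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var = sum((x - mean_x) ** 2 for x in xs)
    return -cov / var


# Synthetic learning curve with a known exponent (assumed values, not data
# from the paper): error = 3 * P**(-0.4).
sizes = [2 ** k for k in range(7, 14)]
errors = [3.0 * p ** -0.4 for p in sizes]
beta_hat = fit_power_law_exponent(sizes, errors)
print(f"estimated beta = {beta_hat:.3f}")
```

On real measurements the fit would typically be restricted to the large-$P$ range where the power law is expected to hold.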
-
Relative stability toward diffeomorphisms indicates performance in deep nets
Authors:
Leonardo Petrini,
Alessandro Favero,
Mario Geiger,
Matthieu Wyart
Abstract:
Understanding why deep nets can classify data in large dimensions remains a challenge. It has been proposed that they do so by becoming stable to diffeomorphisms, yet existing empirical measurements suggest that this is often not the case. We revisit this question by defining a maximum-entropy distribution on diffeomorphisms, which allows us to study typical diffeomorphisms of a given norm. We confirm that stability toward diffeomorphisms does not strongly correlate with performance on benchmark data sets of images. By contrast, we find that the stability toward diffeomorphisms relative to that of generic transformations $R_f$ correlates remarkably with the test error $ε_t$. $R_f$ is of order unity at initialization but decreases by several decades during training for state-of-the-art architectures. For CIFAR10 and 15 known architectures, we find $ε_t\approx 0.2\sqrt{R_f}$, suggesting that obtaining a small $R_f$ is important to achieve good performance. We study how $R_f$ depends on the size of the training set and compare it to a simple model of invariant learning.
Submitted 4 November, 2021; v1 submitted 6 May, 2021;
originally announced May 2021.
-
Non-local effects reflect the jamming criticality in granular flows of frictionless particles
Authors:
Hugo Perrin,
Matthieu Wyart,
Bloen Metzger,
Yoël Forterre
Abstract:
The jamming transition is accompanied by a rich phenomenology, such as hysteresis or non-local effects, which is still not well understood. Here we experimentally investigate a model frictionless granular layer flowing down an inclined plane, as a way to disentangle generic collective effects from those arising from frictional interactions. We find that thin frictionless granular layers are devoid of hysteresis, yet the layer stability is increased as it gets thinner. Rheological laws obtained for different layer thicknesses can be collapsed into a unique master curve, supporting the view that non-local effects are the consequence of the usual finite-size effects associated with the presence of a critical point. This collapse indicates that the so-called isostatic length $l^*$ governs the effect of boundaries on flow, and rules out other propositions made in the past.
Submitted 29 January, 2021; v1 submitted 5 January, 2021;
originally announced January 2021.
-
Perspective: A Phase Diagram for Deep Learning unifying Jamming, Feature Learning and Lazy Training
Authors:
Mario Geiger,
Leonardo Petrini,
Matthieu Wyart
Abstract:
Deep learning algorithms are responsible for a technological revolution in a variety of tasks including image recognition or Go playing. Yet, why they work is not understood. Ultimately, they manage to classify data lying in high dimension -- a feat generically impossible due to the geometry of high dimensional space and the associated curse of dimensionality. Understanding what kind of structure, symmetry or invariance makes data such as images learnable is a fundamental challenge. Other puzzles include that (i) learning corresponds to minimizing a loss in high dimension, which is in general not convex and could well get stuck in bad minima, and (ii) the predictive power of deep learning increases with the number of fitting parameters, even in a regime where data are perfectly fitted. In this manuscript, we review recent results elucidating (i,ii) and the perspective they offer on the (still unexplained) curse of dimensionality paradox. We base our theoretical discussion on the $(h,α)$ plane where $h$ is the network width and $α$ the scale of the output of the network at initialization, and provide new systematic measures of performance in that plane for MNIST and CIFAR 10. We argue that different learning regimes can be organized into a phase diagram. A line of critical points sharply delimits an under-parametrized phase from an over-parametrized one. In over-parametrized nets, learning can operate in two regimes separated by a smooth cross-over. At large initialization, it corresponds to a kernel method, whereas for small initializations features can be learnt, together with invariants in the data. We review the properties of these different phases, of the transition separating them and some open questions. Our treatment emphasizes analogies with physical systems, scaling arguments and the development of numerical observables to quantitatively test these results empirically.
Submitted 30 December, 2020;
originally announced December 2020.
-
Thermally activated flow in models of amorphous solids
Authors:
Marko Popović,
Tom W. J. de Geus,
Wencheng Ji,
Matthieu Wyart
Abstract:
Amorphous solids yield at a critical value $Σ_c$ of the imposed stress $Σ$ through a dynamical phase transition. While sharp in athermal systems, the presence of thermal fluctuations leads to the rounding of the transition and thermally activated flow even below $Σ_c$. Here, we study the steady state thermal flow of amorphous solids using a mesoscopic elasto-plastic model. In the Hébraud-Lequeux (HL) model we provide an analytical solution of the thermally activated flow at low temperature. We then propose a general scaling law that also describes the transition rounding. Finally, we find that the scaling law holds in numerical simulations of the HL model, a 2D elasto-plastic model, and in previously published molecular dynamics simulations of 2D Lennard-Jones glass.
Submitted 22 September, 2020; v1 submitted 10 September, 2020;
originally announced September 2020.
-
Geometric compression of invariant manifolds in neural nets
Authors:
Jonas Paccolat,
Leonardo Petrini,
Mario Geiger,
Kevin Tyloo,
Matthieu Wyart
Abstract:
We study how neural networks compress uninformative input space in models where data lie in $d$ dimensions, but whose labels only vary within a linear manifold of dimension $d_\parallel < d$. We show that for a one-hidden-layer network initialized with infinitesimal weights (i.e. in the feature learning regime) trained with gradient descent, the first layer of weights evolves to become nearly insensitive to the $d_\perp=d-d_\parallel$ uninformative directions. These are effectively compressed by a factor $λ\sim \sqrt{p}$, where $p$ is the size of the training set. We quantify the benefit of such a compression on the test error $ε$. For large initialization of the weights (the lazy training regime), no compression occurs and for regular boundaries separating labels we find that $ε\sim p^{-β}$, with $β_\text{Lazy} = d / (3d-2)$. Compression improves the learning curves so that $β_\text{Feature} = (2d-1)/(3d-2)$ if $d_\parallel = 1$ and $β_\text{Feature} = (d + d_\perp/2)/(3d-2)$ if $d_\parallel > 1$. We test these predictions for a stripe model where boundaries are parallel interfaces ($d_\parallel=1$) as well as for a cylindrical boundary ($d_\parallel=2$). Next we show that compression shapes the Neural Tangent Kernel (NTK) evolution in time, so that its top eigenvectors become more informative and display a larger projection on the labels. Consequently, kernel learning with the frozen NTK at the end of training outperforms the initial NTK. We confirm these predictions both for a one-hidden-layer FC network trained on the stripe model and for a 16-layer CNN trained on MNIST, for which we also find $β_\text{Feature}>β_\text{Lazy}$.
Submitted 11 March, 2021; v1 submitted 22 July, 2020;
originally announced July 2020.
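To make the comparison between the two regimes in the abstract above concrete, the following Python sketch simply evaluates the quoted formulas for $β_\text{Lazy}$ and $β_\text{Feature}$. It is an illustration only: the function names are hypothetical, and exact fractions are used so that the two exponents can be compared without rounding.

```python
from fractions import Fraction


def beta_lazy(d):
    """Lazy-regime exponent quoted in the abstract: d / (3d - 2)."""
    return Fraction(d, 3 * d - 2)


def beta_feature(d, d_parallel):
    """Feature-regime exponent quoted in the abstract.

    (2d - 1) / (3d - 2)           if d_parallel == 1
    (d + d_perp / 2) / (3d - 2)   if d_parallel > 1, with d_perp = d - d_parallel
    """
    if not 1 <= d_parallel < d:
        raise ValueError("expected 1 <= d_parallel < d")
    if d_parallel == 1:
        return Fraction(2 * d - 1, 3 * d - 2)
    d_perp = d - d_parallel
    # (d + d_perp / 2) / (3d - 2) written with integer numerator and denominator
    return Fraction(2 * d + d_perp, 2 * (3 * d - 2))


# Example values of d chosen for illustration only.
for d in (3, 5, 10):
    print(d, beta_lazy(d), beta_feature(d, 1), beta_feature(d, 2))
```

For every $d > d_\parallel \geq 1$ these formulas give a feature exponent larger than the lazy one, consistent with the statement that compression improves the learning curves.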
-
How isotropic kernels perform on simple invariants
Authors:
Jonas Paccolat,
Stefano Spigler,
Matthieu Wyart
Abstract:
We investigate how the training curve of isotropic kernel methods depends on the symmetry of the task to be learned, in several settings. (i) We consider a regression task, where the target function is a Gaussian random field that depends only on $d_\parallel$ variables, fewer than the input dimension $d$. We compute the expected test error $ε$ that follows $ε\sim p^{-β}$ where $p$ is the size of the training set. We find that $β\sim 1/d$ independently of $d_\parallel$, supporting previous findings that the presence of invariants does not resolve the curse of dimensionality for kernel regression. (ii) Next we consider support-vector binary classification and introduce the stripe model where the data label depends on a single coordinate $y(\underline{x}) = y(x_1)$, corresponding to parallel decision boundaries separating labels of different signs, and consider that there is no margin at these interfaces. We argue and confirm numerically that for large bandwidth, $β= \frac{d-1+ξ}{3d-3+ξ}$, where $ξ\in (0,2)$ is the exponent characterizing the singularity of the kernel at the origin. This estimation improves classical bounds obtainable from Rademacher complexity. In this setting there is no curse of dimensionality since $β\rightarrow 1 / 3$ as $d\rightarrow\infty$. (iii) We confirm these findings for the spherical model for which $y(\underline{x}) = y(|\underline{x}|)$. (iv) In the stripe model, we show that if the data are compressed along their invariants by some factor $λ$ (an operation believed to take place in deep networks), the test error is reduced by a factor $λ^{-\frac{2(d-1)}{3d-3+ξ}}$.
Submitted 14 December, 2020; v1 submitted 17 June, 2020;
originally announced June 2020.
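The stripe-model formulas in the abstract above can be checked numerically. The Python sketch below evaluates the exponent $β= \frac{d-1+ξ}{3d-3+ξ}$ and the error-reduction factor $λ^{-\frac{2(d-1)}{3d-3+ξ}}$ as stated there; the function names and the sample values of $d$, $ξ$ and $λ$ are assumptions chosen for illustration.

```python
def stripe_beta(d, xi):
    """Exponent quoted for the stripe model: (d - 1 + xi) / (3d - 3 + xi)."""
    return (d - 1 + xi) / (3 * d - 3 + xi)


def compression_gain(lam, d, xi):
    """Factor multiplying the test error when data are compressed by lam
    along the invariant directions: lam ** (-2(d - 1) / (3d - 3 + xi))."""
    return lam ** (-2 * (d - 1) / (3 * d - 3 + xi))


xi = 1.0  # sample value inside the stated range (0, 2)
for d in (2, 10, 100, 10_000):
    print(d, stripe_beta(d, xi), compression_gain(10.0, d, xi))
```

As $d$ grows the printed exponent approaches $1/3$, which is the sense in which the abstract says there is no curse of dimensionality in this setting.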
-
Inferring the flow properties of epithelial tissues from their geometry
Authors:
Marko Popović,
Valentin Druelle,
Natalie A. Dye,
Frank Jülicher,
Matthieu Wyart
Abstract:
Amorphous materials exhibit complex material properties with strongly nonlinear behaviors. Below a yield stress they behave as plastic solids, while they start to yield above a critical stress $Σ_c$. A key quantity controlling plasticity which is, however, hard to measure is the density $P(x)$ of weak spots, where $x$ is the additional stress required for local plastic failure. In the thermodynamic limit $P(x)\sim x^θ$ is singular at $x= 0$ in the solid phase below the yield stress $Σ_c$. This singularity is related to the presence of system-spanning avalanches of plastic events. Here we address the question of whether the density of weak spots and the flow properties of a material can be determined from the geometry of an amorphous structure alone. We show that a vertex model for cell packings in tissues exhibits the phenomenology of plastic amorphous systems. As the yield stress is approached from above, the strain rate vanishes and the avalanche size $S$ and duration $τ$ diverge. We then show that in general, in materials where the energy functional depends on topology, the value $x$ is proportional to the length $L$ of a bond that vanishes in a plastic event. For this class of models $P(x)$ is therefore readily measurable from geometry alone. Applying this approach to a quantification of the cell packing geometry in the developing wing epithelium of the fruit fly, we find that in this tissue $P(L)$ exhibits a power law with exponents similar to those found numerically for a vertex model in its solid phase. This suggests that this tissue exhibits plasticity and non-linear material properties that emerge from collective cell behaviors and that these material properties govern developmental processes. Our approach based on the relation between topology and energetics suggests a new route to outstanding questions associated with the yielding transition.
Submitted 5 May, 2020; v1 submitted 12 February, 2020;
originally announced February 2020.
-
Jamming with tunable roughness
Authors:
Harukuni Ikeda,
Carolina Brito,
Matthieu Wyart,
Francesco Zamponi
Abstract:
We introduce a new model to study the effect of surface roughness on the jamming transition. By performing numerical simulations, we show that for a smooth surface, the jamming transition density and the contact number at the transition point both increase upon increasing asphericity, as for ellipsoids and spherocylinders. Conversely, for a rough surface, both quantities decrease, in quantitative agreement with the behavior of frictional particles. Furthermore, in the limit corresponding to the Coulomb friction law, the model satisfies a generalized isostaticity criterion proposed in previous studies. We introduce a counting argument that justifies this criterion and interprets it geometrically. Finally, we propose a simple theory to predict the contact number at finite friction from the knowledge of the force distribution in the infinite friction limit.
Submitted 18 May, 2020; v1 submitted 27 January, 2020;
originally announced January 2020.
-
Thermal origin of quasi-localised excitations in glasses
Authors:
Wencheng Ji,
Tom W. J. de Geus,
Marko Popović,
Elisabeth Agoritsas,
Matthieu Wyart
Abstract:
Key aspects of glasses are controlled by the presence of excitations in which a group of particles can rearrange. Surprisingly, recent observations indicate that their density is dramatically reduced and their size decreases as the temperature of the supercooled liquid is lowered. Some theories predict these excitations to cause a gap in the spectrum of quasi-localised modes of the Hessian that grows upon cooling, while others predict a pseudo-gap ${D_L(ω)} \sim ω^α$. To unify these views and observations, we generate glassy configurations of controlled gap magnitude $ω_c$ at temperature ${T=0}$, using so-called `breathing' particles, and study how such gapped states respond to thermal fluctuations. We find that \textit{(i)}~the gap always fills up at finite $T$ with ${D_L(ω) \approx A_4(T) \, ω^4}$ and ${A_4 \sim \exp(-E_a / T)}$ at low $T$, \textit{(ii)}~$E_a$ rapidly grows with $ω_c$, in reasonable agreement with a simple scaling prediction ${E_a\sim ω_c^4}$ and \textit{(iii)}~at larger $ω_c$ excitations involve fewer particles, as we rationalise, and eventually become string-like. We propose an interpretation of mean-field theories of the glass transition, in which the modes beyond the gap act as an excitation reservoir, from which a pseudo-gap distribution is populated with its magnitude rapidly decreasing at lower $T$. We discuss how this picture unifies the rarefaction as well as the decreasing size of excitations upon cooling, together with a string-like relaxation occurring near the glass transition.
Submitted 20 December, 2020; v1 submitted 22 December, 2019;
originally announced December 2019.
-
Infinitesimal asphericity changes the universality of the jamming transition
Authors:
Harukuni Ikeda,
Carolina Brito,
Matthieu Wyart
Abstract:
The jamming transition of non-spherical particles is fundamentally different from the spherical case. Non-spherical particles are hypostatic at their jamming points, while isostaticity is ensured in the case of the jamming of spherical particles. This structural difference implies that the presence of asphericity affects the critical exponents related to the contact number and the vibrational density of states. Moreover, while the force and gap distributions of isostatic jamming present power-law behaviors, even an infinitesimal asphericity is enough to smooth out these singularities. In a recent work [PNAS 115(46), 11736], we have used a combination of marginal stability arguments and the replica method to explain these observations. We argued that systems with internal degrees of freedom, like the rotations in ellipsoids, or the variation of the radii in the case of the \textit{breathing} particles fall in the same universality class. In this paper, we review comprehensively the results about the jamming with internal degrees of freedom in addition to the translational degrees of freedom. We use a variational argument to derive the critical exponents of the contact number, shear modulus, and the characteristic frequencies of the density of states. Moreover, we present additional numerical data supporting the theoretical results, which were not shown in the previous work.
Submitted 12 November, 2019; v1 submitted 6 August, 2019;
originally announced August 2019.
-
Disentangling feature and lazy training in deep neural networks
Authors:
Mario Geiger,
Stefano Spigler,
Arthur Jacot,
Matthieu Wyart
Abstract:
Two distinct limits for deep learning have been derived as the network width $h\rightarrow \infty$, depending on how the weights of the last layer scale with $h$. In the Neural Tangent Kernel (NTK) limit, the dynamics becomes linear in the weights and is described by a frozen kernel $Θ$. By contrast, in the Mean-Field limit, the dynamics can be expressed in terms of the distribution of the parameters associated with a neuron, that follows a partial differential equation. In this work we consider deep networks where the weights in the last layer scale as $αh^{-1/2}$ at initialization. By varying $α$ and $h$, we probe the crossover between the two limits. We observe the previously identified regimes of lazy training and feature training. In the lazy-training regime, the dynamics is almost linear and the NTK barely changes after initialization. The feature-training regime includes the mean-field formulation as a limiting case and is characterized by a kernel that evolves in time, and learns some features. We perform numerical experiments on MNIST, Fashion-MNIST, EMNIST and CIFAR10 and consider various architectures. We find that (i) The two regimes are separated by an $α^*$ that scales as $h^{-1/2}$. (ii) Network architecture and data structure play an important role in determining which regime is better: in our tests, fully-connected networks perform generally better in the lazy-training regime, unlike convolutional networks. (iii) In both regimes, the fluctuations $δF$ induced on the learned function by initial conditions decay as $δF\sim 1/\sqrt{h}$, leading to a performance that increases with $h$. The same improvement can also be obtained at an intermediate width by ensemble-averaging several networks. (iv) In the feature-training regime we identify a time scale $t_1\sim\sqrt{h}α$, such that for $t\ll t_1$ the dynamics is linear.
Submitted 4 October, 2020; v1 submitted 19 June, 2019;
originally announced June 2019.
-
Mechanics of allostery: contrasting the induced fit and population shift scenarios
Authors:
Riccardo Ravasio,
Solange Flatt,
Le Yan,
Stefano Zamuner,
Carolina Brito,
Matthieu Wyart
Abstract:
In allosteric proteins, binding a ligand can affect function at a distant location, for example by changing the binding affinity of a substrate at the active site. The induced fit and population shift models, which differ by the assumed number of stable configurations, explain such cooperative binding from a thermodynamic viewpoint. Yet, understanding what mechanical principles constrain these models remains a challenge. Here we provide an empirical study on 34 proteins supporting the idea that allosteric conformational change generally occurs along a soft elastic mode presenting extended regions of high shear. We argue, based on a detailed analysis of how the energy profile along such a mode depends on binding, that in the induced fit scenario there is an optimal stiffness $k_a^*\sim 1/N$ for cooperative binding, where $N$ is the number of residues involved in the allosteric response. We find that the population shift scenario is more robust to mutations affecting stiffness, as binding becomes more and more cooperative with stiffness up to the same characteristic value $k_a^*$, beyond which cooperativity saturates instead of decaying. We numerically confirm these findings in a non-linear mechanical model. Dynamical considerations suggest that a stiffness of order $k_a^*$ is favorable in that scenario as well, supporting the idea that for proper function proteins must evolve a functional elastic mode that is softer as their size increases. Consistent with this view, we find a significant anticorrelation between the stiffness of the allosteric response and protein size in our data set.
Submitted 25 October, 2019; v1 submitted 12 June, 2019;
originally announced June 2019.
-
Asymptotic learning curves of kernel methods: empirical data v.s. Teacher-Student paradigm
Authors:
Stefano Spigler,
Mario Geiger,
Matthieu Wyart
Abstract:
How many training data are needed to learn a supervised task? It is often observed that the generalization error decreases as $n^{-β}$ where $n$ is the number of training examples and $β$ an exponent that depends on both data and algorithm. In this work we measure $β$ when applying kernel methods to real datasets. For MNIST we find $β\approx 0.4$ and for CIFAR10 $β\approx 0.1$, for both regression and classification tasks, and for Gaussian or Laplace kernels. To rationalize the existence of non-trivial exponents that can be independent of the specific kernel used, we study the Teacher-Student framework for kernels. In this scheme, a Teacher generates data according to a Gaussian random field, and a Student learns them via kernel regression. With a simplifying assumption -- namely that the data are sampled from a regular lattice -- we derive analytically $β$ for translation invariant kernels, using previous results from the kriging literature. Provided that the Student is not too sensitive to high frequencies, $β$ depends only on the smoothness and dimension of the training data. We confirm numerically that these predictions hold when the training points are sampled at random on a hypersphere. Overall, the test error is found to be controlled by the magnitude of the projection of the true function on the kernel eigenvectors whose rank is larger than $n$. Using this idea we relate the exponent $β$ to an exponent $a$ describing how the coefficients of the true function in the eigenbasis of the kernel decay with rank. We extract $a$ from real data by performing kernel PCA, leading to $β\approx0.36$ for MNIST and $β\approx0.07$ for CIFAR10, in good agreement with observations. We argue that these rather large exponents are possible due to the small effective dimension of the data.
Submitted 18 August, 2020; v1 submitted 26 May, 2019;
originally announced May 2019.
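As an illustration of how an exponent $β$ in a learning curve of the form $n^{-β}$ can be read off, here is a minimal sketch that fits the slope of log-error against log-$n$ by ordinary least squares. The function name `fit_power_law_exponent` and the synthetic curve are assumptions made for this example; they are not the authors' code or data, and the paper's own measurement procedure may differ.

```python
import math

def fit_power_law_exponent(ns, errors):
    """Return beta such that error ~ n**(-beta), from a least-squares
    fit of log(error) against log(n). Hypothetical helper for illustration."""
    xs = [math.log(n) for n in ns]
    ys = [math.log(e) for e in errors]
    mean_x = sum(xs) / len(xs)
    mean_y = sum(ys) / len(ys)
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    # The fitted slope is -beta, so flip the sign.
    return -sxy / sxx

# Synthetic learning curve with a known exponent (not data from the paper).
ns = [100, 200, 400, 800, 1600]
errors = [2.0 * n ** -0.4 for n in ns]
beta = fit_power_law_exponent(ns, errors)
print(f"fitted beta = {beta:.3f}")
```

On real learning curves the fit would normally be restricted to the range of $n$ where the decay is actually a power law.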
-
How collective asperity detachments nucleate slip at frictional interfaces
Authors:
Tom W. J. de Geus,
Marko Popović,
Wencheng Ji,
Alberto Rosso,
Matthieu Wyart
Abstract:
Sliding at a quasi-statically loaded frictional interface can occur via macroscopic slip events, which nucleate locally before propagating as rupture fronts very similar to fracture. We introduce a novel microscopic model of a frictional interface that includes asperity-level disorder, elastic interaction between local slip events, and inertia. For a perfectly flat and homogeneously loaded interface, we find that slip is nucleated by avalanches of asperity detachments of extension larger than a critical radius $A_c$ governed by a Griffith criterion. We find that after slip, the density of asperities at a local distance to yielding $x_σ$ presents a pseudo-gap $P(x_σ) \sim (x_σ)^θ$, where $θ$ is a non-universal exponent that depends on the statistics of the disorder. This result makes a link between friction and the plasticity of amorphous materials where a pseudo-gap is also present. For friction, we find that a consequence is that stick-slip is an extremely slowly decaying finite size effect, while the slip nucleation radius $A_c$ diverges as a $θ$-dependent power law of the system size. We discuss how these predictions can be tested experimentally.
Submitted 20 December, 2019; v1 submitted 16 April, 2019;
originally announced April 2019.
-
Interparticle friction leads to non-monotonic flow curves and hysteresis in viscous suspensions
Authors:
Hugo Perrin,
Cécile Clavaud,
Matthieu Wyart,
Bloen Metzger,
Yoël Forterre
Abstract:
Hysteresis is a major feature of the solid-liquid transition in granular materials. This property, by allowing metastable states, can potentially yield catastrophic phenomena such as earthquakes or aerial landslides. The origin of hysteresis in granular flows is still debated. However, most mechanisms put forward so far rely on the presence of inertia at the particle level. In this paper, we study the avalanche dynamics of non-Brownian suspensions in slowly rotating drums and reveal large hysteresis of the avalanche angle even in the absence of inertia. By using micro-silica particles whose interparticle friction coefficient can be turned off, we show that microscopic friction, unlike inertia, is key to triggering hysteresis in granular suspensions. To understand this link between friction and hysteresis, we use the rotating drum as a rheometer to extract the suspension rheology close to the flow onset for both frictional and frictionless suspensions. This analysis shows that the flow rule for frictionless particles is monotonic and follows a power law of exponent $α\!= \! 0.37 \pm 0.05$, in close agreement with the previous theoretical prediction, $α\!=\! 0.35$. By contrast, the flow rule for frictional particles suggests a velocity-weakening behavior, thereby explaining the flow instability and the emergence of hysteresis. These findings show that hysteresis can also occur in particulate media without inertia, questioning the intimate nature of this phenomenon. By highlighting the role of microscopic friction, our results may be of interest in the geophysical context to understand the failure mechanism at the origin of undersea landslides.
Submitted 8 April, 2019;
originally announced April 2019.
-
Scaling description of generalization with number of parameters in deep learning
Authors:
Mario Geiger,
Arthur Jacot,
Stefano Spigler,
Franck Gabriel,
Levent Sagun,
Stéphane d'Ascoli,
Giulio Biroli,
Clément Hongler,
Matthieu Wyart
Abstract:
Supervised deep learning involves the training of neural networks with a large number $N$ of parameters. For large enough $N$, in the so-called over-parametrized regime, one can essentially fit the training data points. Sparsity-based arguments would suggest that the generalization error increases as $N$ grows past a certain threshold $N^{*}$. Instead, empirical studies have shown that in the over-parametrized regime, generalization error keeps decreasing with $N$. We resolve this paradox through a new framework. We rely on the so-called Neural Tangent Kernel, which connects large neural nets to kernel methods, to show that the initialization causes finite-size random fluctuations $\|f_{N}-\bar{f}_{N}\|\sim N^{-1/4}$ of the neural net output function $f_{N}$ around its expectation $\bar{f}_{N}$. These affect the generalization error $ε_{N}$ for classification: under natural assumptions, it decays to a plateau value $ε_{\infty}$ in a power-law fashion $\sim N^{-1/2}$. This description breaks down at a so-called jamming transition $N=N^{*}$. At this threshold, we argue that $\|f_{N}\|$ diverges. This result leads to a plausible explanation for the cusp in test error known to occur at $N^{*}$. Our results are confirmed by extensive empirical observations on the MNIST and CIFAR image datasets. Our analysis finally suggests that, given a computational envelope, the smallest generalization error is obtained using several networks of intermediate sizes, just beyond $N^{*}$, and averaging their outputs.
Submitted 8 October, 2019; v1 submitted 6 January, 2019;
originally announced January 2019.
-
Direct Coupling Analysis of Epistasis in Allosteric Materials
Authors:
Barbara Bravi,
Riccardo Ravasio,
Carolina Brito,
Matthieu Wyart
Abstract:
In allosteric proteins, the binding of a ligand modifies function at a distant active site. Such allosteric pathways can be used as targets for drug design, generating considerable interest in inferring them from sequence alignment data. Currently, different methods lead to conflicting results, in particular on the existence of long-range evolutionary couplings between distant amino-acids mediating allostery. Here we propose a resolution of this conundrum, by studying epistasis and its inference in models where an allosteric material is evolved in silico to perform a mechanical task. We find in our model the four types of epistasis (Synergistic, Sign, Antagonistic, Saturation), which can be either short- or long-range and have a simple mechanical interpretation. We perform a Direct Coupling Analysis (DCA) and find that DCA predicts well the cost of point mutations but is a rather poor generative model. Strikingly, it can predict short-range epistasis but fails to capture long-range epistasis, consistent with empirical findings. We propose that such failure is generic when function requires subparts to work in concert. We illustrate this idea with a simple model, which suggests that other methods may be better suited to capture long-range effects.
Submitted 13 March, 2020; v1 submitted 26 November, 2018;
originally announced November 2018.
-
A jamming transition from under- to over-parametrization affects loss landscape and generalization
Authors:
Stefano Spigler,
Mario Geiger,
Stéphane d'Ascoli,
Levent Sagun,
Giulio Biroli,
Matthieu Wyart
Abstract:
We argue that in fully-connected networks a phase transition delimits the over- and under-parametrized regimes where fitting can or cannot be achieved. Under some general conditions, we show that this transition is sharp for the hinge loss. In the whole over-parametrized regime, poor minima of the loss are not encountered during training since the number of constraints to satisfy is too small to hamper minimization. Our findings support a link between this transition and the generalization properties of the network: as we increase the number of parameters of a given model, starting from an under-parametrized network, we observe that the generalization error displays three phases: (i) initial decay, (ii) increase until the transition point --- where it displays a cusp --- and (iii) slow decay toward a constant for the rest of the over-parametrized regime. Thereby we identify the region where the classical phenomenon of over-fitting takes place, and the region where the model keeps improving, in line with previous empirical observations for modern neural networks.
Submitted 18 June, 2019; v1 submitted 22 October, 2018;
originally announced October 2018.
-
The jamming transition as a paradigm to understand the loss landscape of deep neural networks
Authors:
Mario Geiger,
Stefano Spigler,
Stéphane d'Ascoli,
Levent Sagun,
Marco Baity-Jesi,
Giulio Biroli,
Matthieu Wyart
Abstract:
Deep learning has been immensely successful at a variety of tasks, ranging from classification to AI. Learning corresponds to fitting training data, which is implemented by descending a very high-dimensional loss function. Understanding under which conditions neural networks do not get stuck in poor minima of the loss, and how the landscape of that loss evolves as depth is increased remains a challenge. Here we predict, and test empirically, an analogy between this landscape and the energy landscape of repulsive ellipses. We argue that in FC networks a phase transition delimits the over- and under-parametrized regimes where fitting can or cannot be achieved. In the vicinity of this transition, properties of the curvature of the minima of the loss are critical. This transition shares direct similarities with the jamming transition by which particles form a disordered solid as the density is increased, which also occurs in certain classes of computational optimization and learning problems such as the perceptron. Our analysis gives a simple explanation as to why poor minima of the loss cannot be encountered in the overparametrized regime, and puts forward the surprising result that the ability of fully connected networks to fit random data is independent of their depth. Our observations suggest that this independence also holds for real data. We also study a quantity $Δ$ which characterizes how well ($Δ<0$) or badly ($Δ>0$) a datum is learned. At the critical point it is power-law distributed, $P_+(Δ)\simΔ^θ$ for $Δ>0$ and $P_-(Δ)\sim(-Δ)^{-γ}$ for $Δ<0$, with $θ\approx0.3$ and $γ\approx0.2$. This observation suggests that near the transition the loss landscape has a hierarchical structure and that the learning dynamics is prone to avalanche-like dynamics, with abrupt changes in the set of patterns that are learned.
Submitted 17 June, 2019; v1 submitted 25 September, 2018;
originally announced September 2018.
-
Fast generation of ultrastable computer glasses by minimization of an augmented potential energy
Authors:
Geert Kapteijns,
Wencheng Ji,
Carolina Brito,
Matthieu Wyart,
Edan Lerner
Abstract:
We present a model and protocol that enable the generation of extremely stable computer glasses at minimal computational cost. The protocol consists of an instantaneous quench in an augmented potential energy landscape, with particle radii as additional degrees of freedom. We demonstrate how our glasses' mechanical stability, which is readily tunable in our approach, is reflected both in microscopic and macroscopic observables. Our observations indicate that the stability of our computer glasses is at least comparable to that of computer glasses generated by the celebrated Swap Monte Carlo algorithm. Strikingly, some key properties support even qualitatively enhanced stability in our scheme: the density of quasilocalized excitations displays a gap in our most stable computer glasses, whose magnitude scales with the polydispersity of the particles. We explain this observation, which is consistent with the lack of plasticity we observe at small stress. It also suggests that these glasses are depleted from two-level systems, similarly to experimental vapor-deposited ultrastable glasses.
Submitted 4 January, 2019; v1 submitted 31 July, 2018;
originally announced August 2018.
-
Universality of jamming of non-spherical particles
Authors:
Carolina Brito,
Harukuni Ikeda,
Pierfrancesco Urbani,
Matthieu Wyart,
Francesco Zamponi
Abstract:
Amorphous packings of non-spherical particles such as ellipsoids and spherocylinders are known to be hypostatic: the number of mechanical contacts between particles is smaller than the number of degrees of freedom, thus violating Maxwell's mechanical stability criterion. In this work, we propose a general theory of hypostatic amorphous packings and the associated jamming transition. First, we show that many systems fall into the same universality class. As an example, we explicitly map ellipsoids into a system of `breathing' particles. We show by using a marginal stability argument that in both cases jammed packings are hypostatic, and that the critical exponents related to the contact number and the vibrational density of states are the same. Furthermore, we introduce a generalized perceptron model which can be solved analytically by the replica method. The analytical solution predicts critical exponents in the same hypostatic jamming universality class. Our analysis further reveals that the force and gap distributions of hypostatic jamming do not show power-law behavior, in marked contrast to the isostatic jamming of spherical particles. Finally, we confirm our theoretical predictions by numerical simulations.
Submitted 3 November, 2018; v1 submitted 5 July, 2018;
originally announced July 2018.
-
Theory for the density of interacting quasi-localised modes in amorphous solids
Authors:
Wencheng Ji,
Marko Popović,
Tom W. J. de Geus,
Edan Lerner,
Matthieu Wyart
Abstract:
Quasi-localised modes appear in the vibrational spectrum of amorphous solids at low-frequency. Though never formalised, these modes are believed to have a close relationship with other important local excitations, including shear transformations and two-level systems. We provide a theory for their frequency density, $D_{L}(ω)\simω^α$, that establishes this link for systems at zero temperature under quasi-static loading. It predicts two regimes depending on the density of shear transformations $P(x)\sim x^θ$ (with $x$ the additional stress needed to trigger a shear transformation). If $θ>1/4$, $α=4$ and a finite fraction of quasi-localised modes form shear transformations, whose amplitudes vanish at low frequencies. If $θ<1/4$, $α=3+ 4 θ$ and all quasi-localised modes form shear transformations with a finite amplitude at vanishing frequencies. We confirm our predictions numerically.
Submitted 4 March, 2019; v1 submitted 5 June, 2018;
originally announced June 2018.
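The two regimes quoted in the abstract above can be collected into a single piecewise expression. The compact `min` form and the continuity check at $θ=1/4$ are a reader's restatement of the stated exponents, not a formula taken from the paper:

```latex
\alpha(\theta) =
\begin{cases}
  4, & \theta > 1/4, \\
  3 + 4\theta, & \theta < 1/4,
\end{cases}
\qquad\text{equivalently}\qquad
\alpha(\theta) = \min\!\left(4,\; 3 + 4\theta\right).
% Continuity check at the crossover: 3 + 4 \cdot (1/4) = 4,
% so the two branches agree at \theta = 1/4.
```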
-
Spatial structure of quasi-localized vibrations in nearly jammed amorphous solids
Authors:
Masanari Shimada,
Hideyuki Mizuno,
Matthieu Wyart,
Atsushi Ikeda
Abstract:
The low-temperature properties of amorphous solids are widely believed to be controlled by low-frequency quasi-localized modes. What governs their spatial structure and density is however debated. We study these questions numerically in very large systems as the jamming transition is approached and the pressure $p$ vanishes. We find that these modes consist of an unstable core in which particles undergo buckling motions and decrease the energy, and a stable far-field component which increases the energy and prevents the buckling of the core. The size of the core diverges as $p^{-1/4}$ and its characteristic volume as $p^{-1/2}$. These features are precisely those of the anomalous modes known to cause the Boson peak in the vibrational spectrum of weakly-coordinated materials. From this correspondence we deduce that the density of quasi-localized modes must go as $g_{\mathrm{loc}}(ω) \sim ω^4/p^2$, in agreement with previous observations. Our analysis thus unravels the nature of quasi-localized modes in a class of amorphous materials.
Submitted 25 April, 2018; v1 submitted 24 April, 2018;
originally announced April 2018.
-
Elasto-plastic description of brittle failure in amorphous materials
Authors:
Marko Popović,
Tom W. J. de Geus,
Matthieu Wyart
Abstract:
The response of amorphous materials to an applied strain can be continuous, or instead display a macroscopic stress drop when a shear band nucleates. Such discontinuous response can be observed if the initial configuration is very stable. We study theoretically how such brittleness emerges in athermal, quasi-statically driven, materials as their initial stability is increased. We show that this emergence is well reproduced by elasto-plastic models and is predicted by a mean field approximation, where it corresponds to a continuous transition. In mean field, failure can be forecasted from the avalanche statistics. We show that this is not the case for very brittle materials in finite dimensions due to rare weak regions where a shear band nucleates. Their critical radius is predicted to follow $a_c\sim (Σ-Σ_b)^{-2}$, where $Σ$ is the stress and $Σ_b$ the stress a shear band can carry.
Submitted 30 March, 2018;
originally announced March 2018.
-
Comparing Dynamics: Deep Neural Networks versus Glassy Systems
Authors:
M. Baity-Jesi,
L. Sagun,
M. Geiger,
S. Spigler,
G. Ben Arous,
C. Cammarota,
Y. LeCun,
M. Wyart,
G. Biroli
Abstract:
We analyze numerically the training dynamics of deep neural networks (DNN) by using methods developed in statistical physics of glassy systems. The two main issues we address are (1) the complexity of the loss landscape and of the dynamics within it, and (2) to what extent DNNs share similarities with glassy systems. Our findings, obtained for different architectures and datasets, suggest that during the training process the dynamics slows down because of an increasingly large number of flat directions. At large times, when the loss is approaching zero, the system diffuses at the bottom of the landscape. Despite some similarities with the dynamics of mean-field glassy systems, in particular, the absence of barrier crossing, we find distinctive dynamical behaviors in the two cases, showing that the statistical properties of the corresponding loss and energy landscapes are different. In contrast, when the network is under-parametrized we observe a typical glassy behavior, thus suggesting the existence of different phases depending on whether the network is under-parametrized or over-parametrized.
Submitted 7 June, 2018; v1 submitted 19 March, 2018;
originally announced March 2018.
-
Theory for Swap Acceleration near the Glass and Jamming Transitions
Authors:
Carolina Brito,
Edan Lerner,
Matthieu Wyart
Abstract:
Swap algorithms can shift the glass transition to lower temperatures, a recent unexplained observation constraining the nature of this phenomenon. Here we show that swap dynamics is governed by an effective potential describing both particle interactions as well as their ability to change size. Requiring its stability is more demanding than for the potential energy alone. This result implies that stable configurations appear at lower energies with swap dynamics, and thus at lower temperatures when the liquid is cooled. The magnitude of this effect is proportional to the width of the radii distribution, and decreases with compression for finite-range purely repulsive interaction potentials. We test these predictions numerically and discuss the implications of these findings for the glass transition. We extend these results to the case of hard spheres where swap is argued to destroy meta-stable states of the free energy coarse-grained on vibrational time scales. Our analysis unravels the soft elastic modes responsible for the speed-up that swap induces, and allows us to predict the structure and the vibrational properties of glass configurations reachable with swap. In particular for continuously poly-disperse systems we predict the jamming transition to be dramatically altered, as we confirm numerically. A surprising practical outcome of our analysis is a new algorithm that generates ultra-stable glasses by simple descent in an appropriate effective potential.
Submitted 10 April, 2018; v1 submitted 11 January, 2018;
originally announced January 2018.
-
Constitutive relations for shear fronts in shear-thickening suspensions
Authors:
Endao Han,
Matthieu Wyart,
Ivo R. Peters,
Heinrich M. Jaeger
Abstract:
We study the fronts that appear when a shear-thickening suspension is submitted to a sudden driving force at a boundary. Using a quasi-one-dimensional experimental geometry, we extract the front shape and the propagation speed from the suspension flow field and map out their dependence on applied shear. We find that the relation between stress and velocity is quadratic, as is generally true for inertial effects in liquids, but with a pre-factor that can be much larger than the material density. We show that these experimental findings can be explained by an extension of the Wyart-Cates model, which was originally developed to describe steady-state shear-thickening. This is achieved by introducing a sole additional parameter: the characteristic strain scale that controls the crossover from start-up response to steady-state behavior. The theoretical framework we obtain unifies both transient and steady-state properties of shear-thickening materials.
Submitted 6 November, 2017;
originally announced November 2017.