-
Stochastic Gradient Descent-like relaxation is equivalent to Metropolis dynamics in discrete optimization and inference problems
Authors:
Maria Chiara Angelini,
Angelo Giorgio Cavaliere,
Raffaele Marino,
Federico Ricci-Tersenghi
Abstract:
Is Stochastic Gradient Descent (SGD) substantially different from Metropolis Monte Carlo dynamics? This is a fundamental question for understanding the most widely used training algorithm in Machine Learning, yet it has received no answer until now. Here we show that in discrete optimization and inference problems, the dynamics of an SGD-like algorithm closely resemble those of Metropolis Monte Carlo with a properly chosen temperature, which depends on the mini-batch size. This quantitative matching holds both at equilibrium and in the out-of-equilibrium regime, despite the two algorithms having fundamental differences (e.g. SGD does not satisfy detailed balance). Such equivalence allows us to use results on the performance and limits of Monte Carlo algorithms to optimize the mini-batch size in the SGD-like algorithm and make it efficient at recovering the signal in hard inference problems.
Submitted 30 May, 2024; v1 submitted 11 September, 2023;
originally announced September 2023.
-
Biased thermodynamics can explain the behaviour of smart optimization algorithms that work above the dynamical threshold
Authors:
Angelo Giorgio Cavaliere,
Federico Ricci-Tersenghi
Abstract:
Random constraint satisfaction problems can display a very rich structure in the space of solutions, often with an ergodicity-breaking transition (also known as the clustering or dynamical transition) preceding the satisfiability threshold when the constraint-to-variable ratio $α$ is increased. However, smart algorithms start failing to find solutions in polynomial time at some threshold $α_{\rm alg}$, which is algorithm-dependent and generally larger than the dynamical one $α_d$. This discrepancy arises because $α_d$ is traditionally computed according to the uniform measure over all the solutions. Thus, while it bounds the region where uniform sampling of the solutions is easy, it cannot predict the performance of off-equilibrium processes, which are still able to find atypical solutions even beyond $α_d$. Here we show that a reconciliation between algorithmic behaviour and thermodynamic prediction is nonetheless possible at least up to some threshold $α_d^{\rm opt}\geq α_d$, which is defined as the maximum value of the dynamical threshold computed over all possible probability measures over the solutions. We consider a simple Monte Carlo-based optimization algorithm, which is restricted to the solution space, and we demonstrate that sampling the equilibrium distribution of a biased measure improving on $α_d$ is still possible even beyond the ergodicity-breaking point for the uniform measure, where other algorithms hopelessly enter the out-of-equilibrium regime. The conjecture we put forward is that many smart algorithms sample the solution space according to a biased measure: once this measure is identified, the algorithmic threshold is given by the corresponding ergodicity-breaking transition.
Submitted 26 March, 2023;
originally announced March 2023.
-
Optimization of the dynamic transition in the continuous coloring problem
Authors:
Angelo Giorgio Cavaliere,
Thibault Lesieur,
Federico Ricci-Tersenghi
Abstract:
Random constraint satisfaction problems can exhibit a phase where the number of constraints per variable $α$ makes the system solvable in theory, but also makes the search for a solution hard, meaning that common algorithms such as Monte Carlo methods fail to find a solution. The onset of this hardness is deeply linked to the appearance of a dynamical phase transition where the phase space of the problem breaks into an exponential number of clusters. The exact position of this dynamical phase transition is not universal with respect to the details of the Hamiltonian one chooses to represent a given problem. In this paper, we develop some theoretical tools in order to find a systematic way to build a Hamiltonian that maximizes the dynamical threshold $α_{\rm d}$. To illustrate our techniques, we concentrate on the problem of continuous coloring, where one tries to set an angle $x_i \in [0;2π]$ on each node of a network in such a way that no adjacent nodes are closer than some threshold angle $θ$, that is $\cos(x_i - x_j) \leq \cos θ$. This problem can be seen both as a continuous version of the discrete graph coloring problem and as a one-dimensional version of the Mari-Krzakala-Kurchan (MKK) model. The relevance of this model stems from the fact that continuous constraint satisfaction problems on sparse random graphs remain largely unexplored in statistical physics. We show that for sufficiently small angle $θ$ this model presents a random first order transition and compute the dynamical, condensation and Kesten-Stigum transitions; we also compare the analytical predictions with Monte Carlo simulations for values of $θ= 2π/q$, $q \in \mathbb{N}$. Choosing such values of $q$ allows us to easily compare our results with the renowned problem of discrete coloring.
Submitted 28 September, 2021;
originally announced September 2021.
-
Disordered Ising model with correlated frustration
Authors:
Angelo Giorgio Cavaliere,
Andrea Pelissetto
Abstract:
We consider the $\pm J$ Ising model on a cubic lattice with a gauge-invariant disorder distribution. Disorder depends on a parameter $β_G$ that plays the role of a chemical potential for the amount of frustration. We study the model at a specific value of the disorder parameter $β_G$, where frustration shows long-range correlations. We characterize the universality class, obtaining accurate estimates of the critical exponents: $ν= 0.655(15)$ and $η_q = 1.05(5)$, where $η_q$ is the overlap susceptibility exponent.
Submitted 29 April, 2019;
originally announced April 2019.