
doi 10.1038/s41598-024-62625-8

Stochastic Gradient Descent-like relaxation is equivalent to Metropolis dynamics in discrete optimization and inference problems

Authors: Maria Chiara Angelini, Angelo Giorgio Cavaliere, Raffaele Marino, Federico Ricci-Tersenghi

Abstract: Is Stochastic Gradient Descent (SGD) substantially different from Metropolis Monte Carlo dynamics? This is a fundamental question for understanding the most widely used training algorithm in the field of Machine Learning, but it has received no answer until now. Here we show that in discrete optimization and inference problems, the dynamics of an SGD-like algorithm closely resemble those of Metropolis Monte Carlo with a properly chosen temperature, which depends on the mini-batch size. This quantitative matching holds both at equilibrium and in the out-of-equilibrium regime, despite the two algorithms having fundamental differences (e.g. SGD does not satisfy detailed balance). Such equivalence allows us to use results about the performance and limits of Monte Carlo algorithms to optimize the mini-batch size in the SGD-like algorithm and make it efficient at recovering the signal in hard inference problems.

Submitted 30 May, 2024; v1 submitted 11 September, 2023; originally announced September 2023.

Comments: 19 pages, 9 figures

Journal ref: Scientific Reports 14, 11638 (2024)

arXiv:2303.14879 [pdf, other]

Biased thermodynamics can explain the behaviour of smart optimization algorithms that work above the dynamical threshold

Authors: Angelo Giorgio Cavaliere, Federico Ricci-Tersenghi

Abstract: Random constraint satisfaction problems can display a very rich structure in the space of solutions, often with an ergodicity-breaking transition (also known as the clustering or dynamical transition) preceding the satisfiability threshold when the constraint-to-variables ratio $α$ is increased. However, smart algorithms start to fail to find solutions in polynomial time at some threshold $α_{\rm alg}$, which is algorithm dependent and generally bigger than the dynamical one $α_d$. This discrepancy arises because $α_d$ is traditionally computed according to the uniform measure over all the solutions. Thus, while bounding the region where a uniform sampling of the solutions is easy, it cannot predict the performance of off-equilibrium processes, which are still able to find atypical solutions even beyond $α_d$. Here we show that a reconciliation between algorithmic behaviour and thermodynamic prediction is nonetheless possible at least up to some threshold $α_d^{\rm opt} \geq α_d$, which is defined as the maximum value of the dynamical threshold computed on all possible probability measures over the solutions. We consider a simple Monte Carlo-based optimization algorithm, which is restricted to the solution space, and we demonstrate that sampling the equilibrium distribution of a biased measure improving on $α_d$ is still possible even beyond the ergodicity-breaking point for the uniform measure, where other algorithms hopelessly enter the out-of-equilibrium regime. The conjecture we put forward is that many smart algorithms sample the solution space according to a biased measure: once this measure is identified, the algorithmic threshold is given by the corresponding ergodicity-breaking transition.

Submitted 26 March, 2023; originally announced March 2023.

arXiv:2109.13645 [pdf, other]

doi 10.1088/1742-5468/ac382e

Optimization of the dynamic transition in the continuous coloring problem

Authors: Angelo Giorgio Cavaliere, Thibault Lesieur, Federico Ricci-Tersenghi

Abstract: Random constraint satisfaction problems can exhibit a phase where the number of constraints per variable $α$ makes the system solvable in theory on the one hand, but also makes the search for a solution hard, meaning that common algorithms such as the Monte Carlo method fail to find a solution. The onset of this hardness is deeply linked to the appearance of a dynamical phase transition where the phase space of the problem breaks into an exponential number of clusters. The exact position of this dynamical phase transition is not universal with respect to the details of the Hamiltonian one chooses to represent a given problem. In this paper, we develop some theoretical tools in order to find a systematic way to build a Hamiltonian that maximizes the dynamic threshold $α_{\rm d}$. To illustrate our techniques, we will concentrate on the problem of continuous coloring, where one tries to set an angle $x_i \in [0;2π]$ on each node of a network in such a way that no adjacent nodes are closer than some threshold angle $θ$, that is $\cos(x_i - x_j) \leq \cos θ$. This problem can be seen both as a continuous version of the discrete graph coloring problem and as a one-dimensional version of the Mari-Krzakala-Kurchan (MKK) model. The relevance of this model stems from the fact that continuous constraint satisfaction problems on sparse random graphs remain largely unexplored in statistical physics. We show that for sufficiently small angle $θ$ this model presents a random first order transition and compute the dynamical, condensation and Kesten-Stigum transitions; we also compare the analytical predictions with Monte Carlo simulations for values of $θ = 2π/q$, $q \in \mathbb{N}$. Choosing such values of $q$ allows us to easily compare our results with the renowned problem of discrete coloring.

Submitted 28 September, 2021; originally announced September 2021.

Journal ref: J. Stat. Mech. 113302 (2021)

arXiv:1904.12725 [pdf, other]

doi 10.1088/1751-8121/ab10f9

Disordered Ising model with correlated frustration

Authors: Angelo Giorgio Cavaliere, Andrea Pelissetto

Abstract: We consider the $\pm J$ Ising model on a cubic lattice with a gauge-invariant disorder distribution. Disorder depends on a parameter $β_G$ that plays the role of a chemical potential for the amount of frustration. We study the model at a specific value of the disorder parameter $β_G$, where frustration shows long-range correlations. We characterize the universality class, obtaining accurate estimates of the critical exponents: $ν = 0.655(15)$ and $η_q = 1.05(5)$, where $η_q$ is the overlap susceptibility exponent.

Submitted 29 April, 2019; originally announced April 2019.

Journal ref: 2019 J. Phys. A: Math. Theor. 52 174002

Showing 1–4 of 4 results for author: Cavaliere, A G