Training neural networks using Metropolis Monte Carlo and an adaptive variant

Authors: Stephen Whitelam, Viktor Selin, Ian Benlolo, Corneel Casert, Isaac Tamblyn

Abstract: We examine the zero-temperature Metropolis Monte Carlo algorithm as a tool for training a neural network by minimizing a loss function. We find that, as expected on theoretical grounds and shown empirically by other authors, Metropolis Monte Carlo can train a neural net with an accuracy comparable to that of gradient descent, if not necessarily as quickly. The Metropolis algorithm does not fail automatically when the number of parameters of a neural network is large. It can fail when a neural network's structure or neuron activations are strongly heterogeneous, and we introduce an adaptive Monte Carlo algorithm, aMC, to overcome these limitations. The intrinsic stochasticity and numerical stability of the Monte Carlo method allow aMC to train deep neural networks and recurrent neural networks in which the gradient is too small or too large to allow training by gradient descent. Monte Carlo methods offer a complement to gradient-based methods for training neural networks, allowing access to a distinct set of network architectures and principles.

Submitted 9 August, 2022; v1 submitted 15 May, 2022; originally announced May 2022.

arXiv:2008.06643 [pdf, other]

doi 10.1038/s41467-021-26568-2

Correspondence between neuroevolution and gradient descent

Authors: Stephen Whitelam, Viktor Selin, Sang-Won Park, Isaac Tamblyn

Abstract: We show analytically that training a neural network by conditioned stochastic mutation or neuroevolution of its weights is equivalent, in the limit of small mutations, to gradient descent on the loss function in the presence of Gaussian white noise. Averaged over independent realizations of the learning process, neuroevolution is equivalent to gradient descent on the loss function. We use numerical simulation to show that this correspondence can be observed for finite mutations, for shallow and deep neural networks. Our results provide a connection between two families of neural-network training methods that are usually considered to be fundamentally different.

Submitted 10 September, 2021; v1 submitted 14 August, 2020; originally announced August 2020.

arXiv:nucl-ex/0512041 [pdf, ps, other]

doi 10.1103/PhysRevC.73.045805

Measurement of the response of a Ga solar neutrino experiment to neutrinos from an 37Ar source

Authors: J. N. Abdurashitov, V. N. Gavrin, S. V. Girin, V. V. Gorbachev, P. P. Gurkina, T. V. Ibragimova, A. V. Kalikhov, N. G. Khairnasov, T. V. Knodel, V. A. Matveev, I. N. Mirmov, A. A. Shikhin, E. P. Veretenkin, V. M. Vermul, V. E. Yants, G. T. Zatsepin, T. J. Bowles, S. R. Elliott, W. A. Teasdale, B. T. Cleveland, W. C. Haxton, J. F. Wilkerson, J. S. Nico, A. Suzuki, K. Lande, et al. (20 additional authors not shown)

Abstract: An intense source of 37Ar was produced by the (n,alpha) reaction on 40Ca by irradiating 330 kg of calcium oxide in the fast neutron breeder reactor at Zarechny, Russia. The 37Ar was released from the solid target by dissolution in acid, collected from this solution, purified, sealed into a small source, and brought to the Baksan Neutrino Observatory where it was used to irradiate 13 tonnes of gallium metal in the Russian-American gallium solar neutrino experiment SAGE. Ten exposures of the gallium to the source, whose initial strength was 409 +/- 2 kCi, were carried out during the period April to September 2004. The 71Ge produced by the reaction 71Ga(nu_e,e^-)71Ge was extracted, purified, and counted. The measured production rate was 11.0 +1.0/-0.9 (stat) +/- 0.6 (syst) atoms of 71Ge/d, which is 0.79 +0.09/-0.10 of the theoretically calculated production rate. When all neutrino source experiments with gallium are considered together, there is an indication the theoretical cross section has been overestimated.

Submitted 25 December, 2005; originally announced December 2005.

Journal ref: Phys.Rev.C73:045805,2006

arXiv:math/0512545 [pdf, ps, other]

doi 10.1137/06065667X

The a priori tan θ theorem for eigenvectors

Authors: Sergio Albeverio, Alexander K. Motovilov, Alexei V. Selin

Abstract: Let $A$ be a self-adjoint operator on a Hilbert space $\fH$. Assume that the spectrum of $A$ consists of two disjoint components $σ_0$ and $σ_1$ such that the convex hull of the set $σ_0$ does not intersect the set $σ_1$. Let $V$ be a bounded self-adjoint operator on $\fH$ off-diagonal with respect to the orthogonal decomposition $\fH=\fH_0\oplus\fH_1$ where $\fH_0$ and $\fH_1$ are the spectral subspaces of $A$ associated with the spectral sets $σ_0$ and $σ_1$, respectively. It is known that if $\|V\|<\sqrt{2}d$ where $d=\dist(σ_0,σ_1)>0$ then the perturbation $V$ does not close the gaps between $σ_0$ and $σ_1$. Assuming that $f$ is an eigenvector of the perturbed operator $A+V$ associated with its eigenvalue in the interval $(\min(σ_0)-d,\max(σ_0)+d)$ we prove that under the condition $\|V\|<\sqrt{2}d$ the (acute) angle $θ$ between $f$ and the orthogonal projection of $f$ onto $\fH_0$ satisfies the bound $\tanθ\leq\frac{\|V\|}{d}$ and this bound is sharp.

Submitted 23 December, 2005; originally announced December 2005.

MSC Class: 47A55; 47B25

Journal ref: SIAM J. Matrix Anal. Appl. (SIMAX) 29 (2007), 685-697

arXiv:math/0409558 [pdf, ps, other]

doi 10.1007/s00020-006-1437-1

Some sharp norm estimates in the subspace perturbation problem

Authors: Alexander K. Motovilov, Alexei V. Selin

Abstract: We discuss the spectral subspace perturbation problem for a self-adjoint operator. Assuming that the convex hull of a part of its spectrum does not intersect the remainder of the spectrum, we establish an \textit{a priori} sharp bound on variation of the corresponding spectral subspace under off-diagonal perturbations. This bound represents a new, \textit{a priori}, $\tanΘ$ Theorem. We also extend the Davis--Kahan $\tan 2Θ$ Theorem to the case of some unbounded perturbations.

Submitted 8 April, 2006; v1 submitted 28 September, 2004; originally announced September 2004.

Report number: JINR E5-2004-154

MSC Class: Primary 47A55; Secondary 47B25

Journal ref: Integral Equations and Operator Theory 56 (2006), 511-542

Showing 1–5 of 5 results for author: Selin, V