-
Training neural networks using Metropolis Monte Carlo and an adaptive variant
Authors:
Stephen Whitelam,
Viktor Selin,
Ian Benlolo,
Corneel Casert,
Isaac Tamblyn
Abstract:
We examine the zero-temperature Metropolis Monte Carlo algorithm as a tool for training a neural network by minimizing a loss function. We find that, as expected on theoretical grounds and shown empirically by other authors, Metropolis Monte Carlo can train a neural net with an accuracy comparable to that of gradient descent, if not necessarily as quickly. The Metropolis algorithm does not fail automatically when the number of parameters of a neural network is large. It can fail when a neural network's structure or neuron activations are strongly heterogeneous, and we introduce an adaptive Monte Carlo algorithm, aMC, to overcome these limitations. The intrinsic stochasticity and numerical stability of the Monte Carlo method allow aMC to train deep neural networks and recurrent neural networks in which the gradient is too small or too large to allow training by gradient descent. Monte Carlo methods offer a complement to gradient-based methods for training neural networks, allowing access to a distinct set of network architectures and principles.
Submitted 9 August, 2022; v1 submitted 15 May, 2022;
originally announced May 2022.
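The zero-temperature Metropolis rule described in the abstract above can be sketched in a few lines: propose a random change to every weight and keep it only if the loss does not increase. The sketch below is an illustration under stated assumptions, not the authors' implementation. The one-hidden-layer tanh network, the Gaussian proposal with fixed width `sigma`, the target function, and all names (`mse_loss`, `metropolis_zero_temperature`) are hypothetical choices made here; the adaptive variant aMC is not shown.

```python
import numpy as np

def mse_loss(params, x, y, hidden):
    """Mean-squared error of a one-hidden-layer tanh network on scalar inputs."""
    w1 = params[:hidden]                 # input-to-hidden weights
    b1 = params[hidden:2 * hidden]       # hidden biases
    w2 = params[2 * hidden:3 * hidden]   # hidden-to-output weights
    pred = np.tanh(np.outer(x, w1) + b1) @ w2
    return float(np.mean((pred - y) ** 2))

def metropolis_zero_temperature(x, y, hidden=8, sigma=0.05, steps=5000, seed=0):
    """Zero-temperature Metropolis: accept a trial move only if the loss does not increase.

    Assumption: every parameter receives an independent Gaussian perturbation
    of fixed standard deviation `sigma` at each step.
    """
    rng = np.random.default_rng(seed)
    params = rng.normal(0.0, 0.5, size=3 * hidden)
    current = mse_loss(params, x, y, hidden)
    history = [current]
    for _ in range(steps):
        trial = params + rng.normal(0.0, sigma, size=params.shape)
        trial_loss = mse_loss(trial, x, y, hidden)
        if trial_loss <= current:        # zero temperature: never accept an uphill move
            params, current = trial, trial_loss
        history.append(current)
    return params, history

# Toy regression problem (hypothetical): fit sin(pi * x) on [-1, 1].
x = np.linspace(-1.0, 1.0, 64)
y = np.sin(np.pi * x)
params, history = metropolis_zero_temperature(x, y)
print(f"initial loss {history[0]:.4f}, final loss {history[-1]:.4f}")
```

No gradient is evaluated anywhere in the loop, which is why the method remains usable when gradients are too small or too large. By construction the recorded loss can never increase from one step to the next.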
-
Correspondence between neuroevolution and gradient descent
Authors:
Stephen Whitelam,
Viktor Selin,
Sang-Won Park,
Isaac Tamblyn
Abstract:
We show analytically that training a neural network by conditioned stochastic mutation or neuroevolution of its weights is equivalent, in the limit of small mutations, to gradient descent on the loss function in the presence of Gaussian white noise. Averaged over independent realizations of the learning process, neuroevolution is equivalent to gradient descent on the loss function. We use numerical simulation to show that this correspondence can be observed for finite mutations, for shallow and deep neural networks. Our results provide a connection between two families of neural-network training methods that are usually considered to be fundamentally different.
Submitted 10 September, 2021; v1 submitted 14 August, 2020;
originally announced August 2020.
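The averaged correspondence stated in the abstract above can be checked numerically on a toy loss. The sketch below assumes that "conditioned" mutation means a Metropolis-type acceptance rule with probability min(1, exp(-beta * dU)); the abstract does not state the rule, so this is an assumption. Under it, a leading-order expansion in the mutation size sigma gives a mean accepted step of -(beta * sigma^2 / 2) * grad U, which is a gradient-descent step. That prefactor is derived here under the stated assumption, not quoted from the paper, and the quadratic loss `U` and all names are hypothetical.

```python
import numpy as np

def U(x):
    """Toy quadratic loss (hypothetical); accepts a point or a batch of points."""
    return 0.5 * (x[..., 0] ** 2 + 4.0 * x[..., 1] ** 2)

def grad_U(x):
    """Analytic gradient of the toy loss."""
    return np.array([1.0, 4.0]) * x

def mean_accepted_step(x, beta, sigma, n_samples, seed=0):
    """Average displacement produced by one mutate-and-accept step from x."""
    rng = np.random.default_rng(seed)
    eps = rng.normal(0.0, sigma, size=(n_samples, x.size))   # Gaussian mutations
    delta_u = U(x + eps) - U(x)
    # min(1, exp(-beta * dU)) written as exp(-beta * max(dU, 0))
    accept_prob = np.exp(-beta * np.maximum(delta_u, 0.0))
    accepted = rng.random(n_samples) < accept_prob
    # Rejected mutations contribute zero displacement to the average.
    return (eps * accepted[:, None]).mean(axis=0)

x0 = np.array([1.0, 0.5])
beta, sigma = 5.0, 0.01
measured = mean_accepted_step(x0, beta, sigma, n_samples=1_000_000)
predicted = -0.5 * beta * sigma ** 2 * grad_U(x0)   # small-mutation estimate

cosine = measured @ predicted / (np.linalg.norm(measured) * np.linalg.norm(predicted))
ratio = np.linalg.norm(measured) / np.linalg.norm(predicted)
print("cosine similarity:", cosine, "norm ratio:", ratio)
```

The measured mean step should point almost exactly along the negative gradient. Its length falls somewhat short of the leading-order estimate because beta * sigma * |grad U| is small but not zero here, so higher-order terms are still visible.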
-
Measurement of the response of a Ga solar neutrino experiment to neutrinos from an 37Ar source
Authors:
J. N. Abdurashitov,
V. N. Gavrin,
S. V. Girin,
V. V. Gorbachev,
P. P. Gurkina,
T. V. Ibragimova,
A. V. Kalikhov,
N. G. Khairnasov,
T. V. Knodel,
V. A. Matveev,
I. N. Mirmov,
A. A. Shikhin,
E. P. Veretenkin,
V. M. Vermul,
V. E. Yants,
G. T. Zatsepin,
T. J. Bowles,
S. R. Elliott,
W. A. Teasdale,
B. T. Cleveland,
W. C. Haxton,
J. F. Wilkerson,
J. S. Nico,
A. Suzuki,
K. Lande
, et al. (20 additional authors not shown)
Abstract:
An intense source of 37Ar was produced by the (n,alpha) reaction on 40Ca by irradiating 330 kg of calcium oxide in the fast neutron breeder reactor at Zarechny, Russia. The 37Ar was released from the solid target by dissolution in acid, collected from this solution, purified, sealed into a small source, and brought to the Baksan Neutrino Observatory where it was used to irradiate 13 tonnes of gallium metal in the Russian-American gallium solar neutrino experiment SAGE. Ten exposures of the gallium to the source, whose initial strength was 409 +/- 2 kCi, were carried out during the period April to September 2004. The 71Ge produced by the reaction 71Ga(nu_e,e^-)71Ge was extracted, purified, and counted. The measured production rate was 11.0 +1.0/-0.9 (stat) +/- 0.6 (syst) atoms of 71Ge/d, which is 0.79 +0.09/-0.10 of the theoretically calculated production rate. When all neutrino source experiments with gallium are considered together, there is an indication that the theoretical cross section has been overestimated.
Submitted 25 December, 2005;
originally announced December 2005.
-
The a priori tan θ theorem for eigenvectors
Authors:
Sergio Albeverio,
Alexander K. Motovilov,
Alexei V. Selin
Abstract:
Let $A$ be a self-adjoint operator on a Hilbert space $\mathfrak{H}$. Assume that the spectrum of $A$ consists of two disjoint components $σ_0$ and $σ_1$ such that the convex hull of the set $σ_0$ does not intersect the set $σ_1$. Let $V$ be a bounded self-adjoint operator on $\mathfrak{H}$ that is off-diagonal with respect to the orthogonal decomposition $\mathfrak{H}=\mathfrak{H}_0\oplus\mathfrak{H}_1$, where $\mathfrak{H}_0$ and $\mathfrak{H}_1$ are the spectral subspaces of $A$ associated with the spectral sets $σ_0$ and $σ_1$, respectively. It is known that if $\|V\|<\sqrt{2}d$, where $d=\operatorname{dist}(σ_0,σ_1)>0$, then the perturbation $V$ does not close the gaps between $σ_0$ and $σ_1$. Assuming that $f$ is an eigenvector of the perturbed operator $A+V$ associated with its eigenvalue in the interval $(\min(σ_0)-d,\max(σ_0)+d)$, we prove that under the condition $\|V\|<\sqrt{2}d$ the (acute) angle $θ$ between $f$ and the orthogonal projection of $f$ onto $\mathfrak{H}_0$ satisfies the bound $\tan θ\leq\frac{\|V\|}{d}$, and this bound is sharp.
Submitted 23 December, 2005;
originally announced December 2005.
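The bound in the abstract above can be probed numerically in finite dimensions, where the operators are symmetric matrices. The sketch below is a sanity check, not a proof, and is not taken from the paper. The particular spectral sets, the 3+3 block sizes, the random off-diagonal blocks, and the function name `max_tan_ratio` are all hypothetical choices. It reports the largest observed value of tan θ / (‖V‖/d), which the theorem says cannot exceed 1.

```python
import numpy as np

def max_tan_ratio(trials=200, seed=0):
    """Largest observed tan(theta) * d / ||V|| over random off-diagonal perturbations."""
    rng = np.random.default_rng(seed)
    sigma0 = np.array([-0.5, 0.0, 0.5])   # convex hull [-0.5, 0.5] ...
    sigma1 = np.array([-2.0, 2.5, 3.0])   # ... contains no point of sigma1
    d = np.min(np.abs(sigma0[:, None] - sigma1[None, :]))   # dist(sigma0, sigma1)
    A = np.diag(np.concatenate([sigma0, sigma1]))
    n0, n1 = sigma0.size, sigma1.size
    worst = 0.0
    for _ in range(trials):
        # Random off-diagonal block, rescaled so that ||V|| < sqrt(2) * d.
        B = rng.normal(size=(n0, n1))
        target = rng.uniform(0.1, 0.99) * np.sqrt(2.0) * d
        B *= target / np.linalg.norm(B, 2)
        V = np.block([[np.zeros((n0, n0)), B], [B.T, np.zeros((n1, n1))]])
        norm_v = np.linalg.norm(V, 2)     # spectral norm of V
        evals, evecs = np.linalg.eigh(A + V)
        # Keep eigenvectors whose eigenvalue lies in (min(sigma0) - d, max(sigma0) + d).
        inside = (evals > sigma0.min() - d) & (evals < sigma0.max() + d)
        for f in evecs[:, inside].T:
            # tan(theta) = ||P_1 f|| / ||P_0 f||, with P_0 the projection onto H_0.
            tan_theta = np.linalg.norm(f[n0:]) / np.linalg.norm(f[:n0])
            worst = max(worst, tan_theta * d / norm_v)
    return worst

worst = max_tan_ratio()
print("largest tan(theta) / (||V|| / d):", worst)
```

Because $A$ is diagonal here, the first three coordinates of an eigenvector are its component in $\mathfrak{H}_0$ and the last three its component in $\mathfrak{H}_1$, so the angle can be read off directly from the two partial norms.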
-
Some sharp norm estimates in the subspace perturbation problem
Authors:
Alexander K. Motovilov,
Alexei V. Selin
Abstract:
We discuss the spectral subspace perturbation problem for a self-adjoint operator. Assuming that the convex hull of a part of its spectrum does not intersect the remainder of the spectrum, we establish an \textit{a priori} sharp bound on the variation of the corresponding spectral subspace under off-diagonal perturbations. This bound represents a new, \textit{a priori}, $\tanΘ$ Theorem. We also extend the Davis--Kahan $\tan 2Θ$ Theorem to the case of some unbounded perturbations.
Submitted 8 April, 2006; v1 submitted 28 September, 2004;
originally announced September 2004.