Showing 1–8 of 8 results for author: DeWeese, M R

Searching in archive cs.

  1. arXiv:2110.03922  [pdf, other]

    cs.LG stat.ML

    The Eigenlearning Framework: A Conservation Law Perspective on Kernel Regression and Wide Neural Networks

    Authors: James B. Simon, Madeline Dickens, Dhruva Karkada, Michael R. DeWeese

    Abstract: We derive simple closed-form estimates for the test risk and other generalization metrics of kernel ridge regression (KRR). Relative to prior work, our derivations are greatly simplified and our final expressions are more readily interpreted. These improvements are enabled by our identification of a sharp conservation law which limits the ability of KRR to learn any orthonormal basis of functions.…

    Submitted 26 October, 2023; v1 submitted 8 October, 2021; originally announced October 2021.

    Comments: 12 pages (main text) + 25 pages (refs + appendices). A previous version of this manuscript was entitled "Neural Tangent Kernel Eigenvalues Accurately Predict Generalization."

  2. arXiv:2106.03186  [pdf, other]

    cs.LG

    Reverse Engineering the Neural Tangent Kernel

    Authors: James B. Simon, Sajant Anand, Michael R. DeWeese

    Abstract: The development of methods to guide the design of neural networks is an important open challenge for deep learning theory. As a paradigm for principled neural architecture design, we propose the translation of high-performing kernels, which are better-understood and amenable to first-principles design, into equivalent network architectures, which have superior efficiency, flexibility, and feature…

    Submitted 13 August, 2022; v1 submitted 6 June, 2021; originally announced June 2021.

    Comments: 15 pages, 5 figures

  3. arXiv:2007.09240  [pdf, other]

    cs.LG stat.ML

    A new method for parameter estimation in probabilistic models: Minimum probability flow

    Authors: Jascha Sohl-Dickstein, Peter Battaglino, Michael R. DeWeese

    Abstract: Fitting probabilistic models to data is often difficult, due to the general intractability of the partition function. We propose a new parameter fitting method, Minimum Probability Flow (MPF), which is applicable to any parametric model. We demonstrate parameter estimation using MPF in two cases: a continuous state space model, and an Ising spin glass. In the latter case it outperforms current tec…

    Submitted 17 July, 2020; originally announced July 2020.

    Comments: Originally published 2011. Uploaded to arXiv 2020. arXiv admin note: text overlap with arXiv:0906.4779, arXiv:1205.4295

  4. arXiv:2003.10397  [pdf, other]

    cs.LG cs.NE stat.ML

    Critical Point-Finding Methods Reveal Gradient-Flat Regions of Deep Network Losses

    Authors: Charles G. Frye, James Simon, Neha S. Wadia, Andrew Ligeralde, Michael R. DeWeese, Kristofer E. Bouchard

    Abstract: Despite the fact that the loss functions of deep neural networks are highly non-convex, gradient-based optimization algorithms converge to approximately the same performance from many random initial points. One thread of work has focused on explaining this phenomenon by characterizing the local curvature near critical points of the loss function, where the gradients are near zero, and demonstratin…

    Submitted 23 March, 2020; originally announced March 2020.

    Comments: 18 pages, 5 figures

  5. arXiv:2001.01681  [pdf, other]

    cs.NE cs.ET physics.optics

    Design of optical neural networks with component imprecisions

    Authors: Michael Y.-S. Fang, Sasikanth Manipatruni, Casimir Wierzynski, Amir Khosrowshahi, Michael R. DeWeese

    Abstract: For the benefit of designing scalable, fault resistant optical neural networks (ONNs), we investigate the effects architectural designs have on the ONNs' robustness to imprecise components. We train two ONNs -- one with a more tunable design (GridNet) and one with better fault tolerance (FFTNet) -- to classify handwritten digits. When simulated without any imperfections, GridNet yields a better ac…

    Submitted 13 December, 2019; originally announced January 2020.

    Journal ref: Optics express 27.10 (2019): 14009-14029

  6. arXiv:1901.10603  [pdf, ps, other]

    cs.LG cs.NE stat.ML

    Numerically Recovering the Critical Points of a Deep Linear Autoencoder

    Authors: Charles G. Frye, Neha S. Wadia, Michael R. DeWeese, Kristofer E. Bouchard

    Abstract: Numerically locating the critical points of non-convex surfaces is a long-standing problem central to many fields. Recently, the loss surfaces of deep neural networks have been explored to gain insight into outstanding questions in optimization, generalization, and network architecture design. However, the degree to which recently-proposed methods for numerically recovering critical points actuall…

    Submitted 29 January, 2019; originally announced January 2019.

  7. arXiv:1504.04756  [pdf, other]

    q-bio.NC cond-mat.dis-nn cs.NE math.PR nlin.CD

    Time Resolution Dependence of Information Measures for Spiking Neurons: Atoms, Scaling, and Universality

    Authors: Sarah E. Marzen, Michael R. DeWeese, James P. Crutchfield

    Abstract: The mutual information between stimulus and spike-train response is commonly used to monitor neural coding efficiency, but neuronal computation broadly conceived requires more refined and targeted information measures of input-output joint processes. A first step towards that larger goal is to develop information measures for individual output processes, including information generation (entropy r…

    Submitted 18 April, 2015; originally announced April 2015.

    Comments: 20 pages, 6 figures; http://csc.ucdavis.edu/~cmg/compmech/pubs/trdctim.htm

  8. arXiv:0906.4779  [pdf, other]

    cs.LG physics.data-an stat.ML

    Minimum Probability Flow Learning

    Authors: Jascha Sohl-Dickstein, Peter Battaglino, Michael R. DeWeese

    Abstract: Fitting probabilistic models to data is often difficult, due to the general intractability of the partition function and its derivatives. Here we propose a new parameter estimation technique that does not require computing an intractable normalization factor or sampling from the equilibrium distribution of the model. This is achieved by establishing dynamics that would transform the observed data…

    Submitted 24 September, 2011; v1 submitted 25 June, 2009; originally announced June 2009.

    Comments: Updated to match ICML conference proceedings