
Showing 51–88 of 88 results for author: Sohl-dickstein, J

  1. arXiv:1806.09597  [pdf, other]

    cs.LG cs.AI stat.ML

    Stochastic natural gradient descent draws posterior samples in function space

    Authors: Samuel L. Smith, Daniel Duckworth, Semon Rezchikov, Quoc V. Le, Jascha Sohl-Dickstein

    Abstract: Recent work has argued that stochastic gradient descent can approximate the Bayesian uncertainty in model parameters near local minima. In this work we develop a similar correspondence for minibatch natural gradient descent (NGD). We prove that for sufficiently small learning rates, if the model predictions on the training set approach the true conditional distribution of labels given inputs, the…

    Submitted 28 November, 2018; v1 submitted 25 June, 2018; originally announced June 2018.

    Comments: Workshop on Bayesian Deep Learning (NeurIPS 2018)

  2. arXiv:1806.08805  [pdf, ps, other]

    stat.ML cs.LG

    PCA of high dimensional random walks with comparison to neural network training

    Authors: Joseph M. Antognini, Jascha Sohl-Dickstein

    Abstract: One technique to visualize the training of neural networks is to perform PCA on the parameters over the course of training and to project to the subspace spanned by the first few PCA components. In this paper we compare this technique to the PCA of a high dimensional random walk. We compute the eigenvalues and eigenvectors of the covariance of the trajectory and prove that in the long trajectory a…

    Submitted 22 June, 2018; originally announced June 2018.

  3. arXiv:1806.05393  [pdf, other]

    stat.ML cs.LG

    Dynamical Isometry and a Mean Field Theory of CNNs: How to Train 10,000-Layer Vanilla Convolutional Neural Networks

    Authors: Lechao Xiao, Yasaman Bahri, Jascha Sohl-Dickstein, Samuel S. Schoenholz, Jeffrey Pennington

    Abstract: In recent years, state-of-the-art methods in computer vision have utilized increasingly deep convolutional neural network architectures (CNNs), with some of the most successful models employing hundreds or even thousands of layers. A variety of pathologies such as vanishing/exploding gradients make training such deep networks challenging. While residual connections and batch normalization do enabl…

    Submitted 10 July, 2018; v1 submitted 14 June, 2018; originally announced June 2018.

    Comments: ICML 2018 Conference Proceedings

  4. arXiv:1804.00222  [pdf, other]

    cs.LG cs.NE stat.ML

    Meta-Learning Update Rules for Unsupervised Representation Learning

    Authors: Luke Metz, Niru Maheswaranathan, Brian Cheung, Jascha Sohl-Dickstein

    Abstract: A major goal of unsupervised learning is to discover data representations that are useful for subsequent tasks, without access to supervised labels during training. Typically, this involves minimizing a surrogate objective, such as the negative log likelihood of a generative model, with the hope that representations useful for subsequent tasks will arise as a side effect. In this work, we propose…

    Submitted 26 February, 2019; v1 submitted 31 March, 2018; originally announced April 2018.

  5. arXiv:1802.08760  [pdf, other]

    stat.ML cs.AI cs.LG cs.NE

    Sensitivity and Generalization in Neural Networks: an Empirical Study

    Authors: Roman Novak, Yasaman Bahri, Daniel A. Abolafia, Jeffrey Pennington, Jascha Sohl-Dickstein

    Abstract: In practice it is often found that large over-parameterized neural networks generalize better than their smaller counterparts, an observation that appears to conflict with classical notions of function complexity, which typically favor smaller models. In this work, we investigate this tension between complexity and generalization through an extensive empirical exploration of two natural metrics of…

    Submitted 18 June, 2018; v1 submitted 23 February, 2018; originally announced February 2018.

    Comments: Published as a conference paper at ICLR 2018

  6. arXiv:1802.08195  [pdf, other]

    cs.LG cs.CV q-bio.NC stat.ML

    Adversarial Examples that Fool both Computer Vision and Time-Limited Humans

    Authors: Gamaleldin F. Elsayed, Shreya Shankar, Brian Cheung, Nicolas Papernot, Alex Kurakin, Ian Goodfellow, Jascha Sohl-Dickstein

    Abstract: Machine learning models are vulnerable to adversarial examples: small changes to images can cause computer vision models to make mistakes such as identifying a school bus as an ostrich. However, it is still an open question whether humans are prone to similar mistakes. Here, we address this question by leveraging recent techniques that transfer adversarial examples from computer vision models with…

    Submitted 21 May, 2018; v1 submitted 22 February, 2018; originally announced February 2018.

    Journal ref: Advances in Neural Information Processing Systems, 2018

  7. arXiv:1711.09268  [pdf, other]

    stat.ML cs.AI cs.LG

    Generalizing Hamiltonian Monte Carlo with Neural Networks

    Authors: Daniel Levy, Matthew D. Hoffman, Jascha Sohl-Dickstein

    Abstract: We present a general-purpose method to train Markov chain Monte Carlo kernels, parameterized by deep neural networks, that converge and mix quickly to their target distribution. Our method generalizes Hamiltonian Monte Carlo and is trained to maximize expected squared jumped distance, a proxy for mixing speed. We demonstrate large empirical gains on a collection of simple but challenging distribut…

    Submitted 2 March, 2018; v1 submitted 25 November, 2017; originally announced November 2017.

    Comments: ICLR 2018

  8. arXiv:1711.00165  [pdf, other]

    stat.ML cs.LG

    Deep Neural Networks as Gaussian Processes

    Authors: Jaehoon Lee, Yasaman Bahri, Roman Novak, Samuel S. Schoenholz, Jeffrey Pennington, Jascha Sohl-Dickstein

    Abstract: It has long been known that a single-layer fully-connected neural network with an i.i.d. prior over its parameters is equivalent to a Gaussian process (GP), in the limit of infinite network width. This correspondence enables exact Bayesian inference for infinite width neural networks on regression tasks by means of evaluating the corresponding GP. Recently, kernel functions which mimic multi-layer…

    Submitted 2 March, 2018; v1 submitted 31 October, 2017; originally announced November 2017.

    Comments: Published version in ICLR 2018. 10 pages + appendix

  9. arXiv:1710.06570  [pdf, other]

    stat.ML cond-mat.dis-nn cs.LG

    A Correspondence Between Random Neural Networks and Statistical Field Theory

    Authors: Samuel S. Schoenholz, Jeffrey Pennington, Jascha Sohl-Dickstein

    Abstract: A number of recent papers have provided evidence that practical design questions about neural networks may be tackled theoretically by studying the behavior of random networks. However, until now the tools available for analyzing random neural networks have been relatively ad-hoc. In this work, we show that the distribution of pre-activations in random neural networks can be exactly mapped onto la…

    Submitted 17 October, 2017; originally announced October 2017.

  10. arXiv:1706.05806  [pdf, other]

    stat.ML cs.LG

    SVCCA: Singular Vector Canonical Correlation Analysis for Deep Learning Dynamics and Interpretability

    Authors: Maithra Raghu, Justin Gilmer, Jason Yosinski, Jascha Sohl-Dickstein

    Abstract: We propose a new technique, Singular Vector Canonical Correlation Analysis (SVCCA), a tool for quickly comparing two representations in a way that is both invariant to affine transform (allowing comparison between different layers and networks) and fast to compute (allowing more comparisons to be calculated than with previous methods). We deploy this tool to measure the intrinsic dimensionality of…

    Submitted 8 November, 2017; v1 submitted 19 June, 2017; originally announced June 2017.

    Comments: Accepted to NIPS 2017, code: https://github.com/google/svcca/ , new plots on Imagenet

  11. arXiv:1703.07370  [pdf, other]

    cs.LG stat.ML

    REBAR: Low-variance, unbiased gradient estimates for discrete latent variable models

    Authors: George Tucker, Andriy Mnih, Chris J. Maddison, Dieterich Lawson, Jascha Sohl-Dickstein

    Abstract: Learning in models with discrete latent variables is challenging due to high variance gradient estimators. Generally, approaches have relied on control variates to reduce the variance of the REINFORCE estimator. Recent work (Jang et al. 2016, Maddison et al. 2016) has taken a different approach, introducing a continuous relaxation of discrete variables to produce low-variance, but biased, gradient…

    Submitted 6 November, 2017; v1 submitted 21 March, 2017; originally announced March 2017.

    Comments: NIPS 2017

  12. arXiv:1703.04813  [pdf, other]

    cs.LG cs.NE stat.ML

    Learned Optimizers that Scale and Generalize

    Authors: Olga Wichrowska, Niru Maheswaranathan, Matthew W. Hoffman, Sergio Gomez Colmenarejo, Misha Denil, Nando de Freitas, Jascha Sohl-Dickstein

    Abstract: Learning to learn has emerged as an important direction for achieving artificial intelligence. Two of the primary barriers to its adoption are an inability to scale to larger problems and a limited ability to generalize to new tasks. We introduce a learned gradient descent optimizer that generalizes well to new tasks, and which has significantly reduced memory and computation overhead. We achieve…

    Submitted 7 September, 2017; v1 submitted 14 March, 2017; originally announced March 2017.

    Comments: Final ICML paper after reviewer suggestions

  13. arXiv:1612.02780  [pdf, other]

    cs.LG stat.ML

    Improved generator objectives for GANs

    Authors: Ben Poole, Alexander A. Alemi, Jascha Sohl-Dickstein, Anelia Angelova

    Abstract: We present a framework to understand GAN training as alternating density ratio estimation and approximate divergence minimization. This provides an interpretation for the mismatched GAN generator and discriminator objectives often used in practice, and explains the problem of poor sample diversity. We also derive a family of generator objectives that target arbitrary $f$-divergences without minimi…

    Submitted 8 December, 2016; originally announced December 2016.

    Comments: NIPS 2016 Workshop on Adversarial Training

  14. arXiv:1611.09913  [pdf, other]

    stat.ML cs.AI cs.LG cs.NE

    Capacity and Trainability in Recurrent Neural Networks

    Authors: Jasmine Collins, Jascha Sohl-Dickstein, David Sussillo

    Abstract: Two potential bottlenecks on the expressiveness of recurrent neural networks (RNNs) are their ability to store information about the task in their parameters, and to store information about the input history in their units. We show experimentally that all common RNN architectures achieve nearly the same per-task and per-unit capacity bounds with careful training, for a variety of tasks and stackin…

    Submitted 3 March, 2017; v1 submitted 29 November, 2016; originally announced November 2016.

    Comments: Published as a conference paper at ICLR 2017

  15. arXiv:1611.09434  [pdf, other]

    cs.AI cs.CL cs.LG cs.NE

    Input Switched Affine Networks: An RNN Architecture Designed for Interpretability

    Authors: Jakob N. Foerster, Justin Gilmer, Jan Chorowski, Jascha Sohl-Dickstein, David Sussillo

    Abstract: There exist many problem domains where the interpretability of neural network models is essential for deployment. Here we introduce a recurrent architecture composed of input-switched affine transformations - in other words an RNN without any explicit nonlinearities, but with input-dependent recurrent weights. This simple form allows the RNN to be analyzed via straightforward linear methods: we ca…

    Submitted 12 June, 2017; v1 submitted 28 November, 2016; originally announced November 2016.

    Comments: ICLR 2017 submission: https://openreview.net/forum?id=H1MjAnqxg

  16. arXiv:1611.08083  [pdf, other]

    stat.ML cs.LG cs.NE

    Survey of Expressivity in Deep Neural Networks

    Authors: Maithra Raghu, Ben Poole, Jon Kleinberg, Surya Ganguli, Jascha Sohl-Dickstein

    Abstract: We survey results on neural network expressivity described in "On the Expressive Power of Deep Neural Networks". The paper motivates and develops three natural measures of expressiveness, which all display an exponential dependence on the depth of the network. In fact, all of these measures are related to a fourth quantity, trajectory length. This quantity grows exponentially in the depth of the n…

    Submitted 24 November, 2016; originally announced November 2016.

    Comments: Presented at NIPS 2016 Workshop on Interpretable Machine Learning in Complex Systems

  17. arXiv:1611.02163  [pdf, other]

    cs.LG stat.ML

    Unrolled Generative Adversarial Networks

    Authors: Luke Metz, Ben Poole, David Pfau, Jascha Sohl-Dickstein

    Abstract: We introduce a method to stabilize Generative Adversarial Networks (GANs) by defining the generator objective with respect to an unrolled optimization of the discriminator. This allows training to be adjusted between using the optimal discriminator in the generator's objective, which is ideal but infeasible in practice, and using the current value of the discriminator, which is often unstable and…

    Submitted 12 May, 2017; v1 submitted 7 November, 2016; originally announced November 2016.

  18. arXiv:1611.01232  [pdf, other]

    stat.ML cs.LG

    Deep Information Propagation

    Authors: Samuel S. Schoenholz, Justin Gilmer, Surya Ganguli, Jascha Sohl-Dickstein

    Abstract: We study the behavior of untrained neural networks whose weights and biases are randomly distributed using mean field theory. We show the existence of depth scales that naturally limit the maximum depth of signal propagation through these random networks. Our main practical result is to show that random networks may be trained precisely when information can travel through them. Thus, the depth sca…

    Submitted 4 April, 2017; v1 submitted 3 November, 2016; originally announced November 2016.

  19. arXiv:1606.05340  [pdf, other]

    stat.ML cond-mat.dis-nn cs.LG

    Exponential expressivity in deep neural networks through transient chaos

    Authors: Ben Poole, Subhaneil Lahiri, Maithra Raghu, Jascha Sohl-Dickstein, Surya Ganguli

    Abstract: We combine Riemannian geometry with the mean field theory of high dimensional chaos to study the nature of signal propagation in generic, deep neural networks with random weights. Our results reveal an order-to-chaos expressivity phase transition, with networks in the chaotic phase computing nonlinear functions whose global curvature grows exponentially with depth but not width. We prove this gene…

    Submitted 17 June, 2016; v1 submitted 16 June, 2016; originally announced June 2016.

    Comments: Fixed equation references

  20. arXiv:1606.05336  [pdf, other]

    stat.ML cs.AI cs.LG

    On the Expressive Power of Deep Neural Networks

    Authors: Maithra Raghu, Ben Poole, Jon Kleinberg, Surya Ganguli, Jascha Sohl-Dickstein

    Abstract: We propose a new approach to the problem of neural network expressivity, which seeks to characterize how structural properties of a neural network family affect the functions it is able to compute. Our approach is based on an interrelated set of measures of expressivity, unified by the novel notion of trajectory length, which measures how the output of a network changes as the input sweeps along a…

    Submitted 18 June, 2017; v1 submitted 16 June, 2016; originally announced June 2016.

    Comments: Accepted to ICML 2017

  21. arXiv:1605.08803  [pdf, other]

    cs.LG cs.AI cs.NE stat.ML

    Density estimation using Real NVP

    Authors: Laurent Dinh, Jascha Sohl-Dickstein, Samy Bengio

    Abstract: Unsupervised learning of probabilistic models is a central yet challenging problem in machine learning. Specifically, designing models with tractable learning, sampling, inference and evaluation is crucial in solving this task. We extend the space of such models using real-valued non-volume preserving (real NVP) transformations, a set of powerful invertible and learnable transformations, resulting…

    Submitted 27 February, 2017; v1 submitted 27 May, 2016; originally announced May 2016.

    Comments: 10 pages of main content, 3 pages of bibliography, 18 pages of appendix. Accepted at ICLR 2017

  22. arXiv:1603.07758  [pdf, other]

    cond-mat.stat-mech cs.IT physics.bio-ph q-bio.NC stat.ML

    A universal tradeoff between power, precision and speed in physical communication

    Authors: Subhaneil Lahiri, Jascha Sohl-Dickstein, Surya Ganguli

    Abstract: Maximizing the speed and precision of communication while minimizing power dissipation is a fundamental engineering design goal. Also, biological systems achieve remarkable speed, precision and power efficiency using poorly understood physical design principles. Powerful theories like information theory and thermodynamics do not provide general limits on power, precision and speed. Here we go beyo…

    Submitted 24 March, 2016; originally announced March 2016.

    Comments: 15 pages, 3 figures

  23. arXiv:1509.03808  [pdf, other]

    stat.ML stat.CO

    A Markov Jump Process for More Efficient Hamiltonian Monte Carlo

    Authors: Andrew B. Berger, Mayur Mudigonda, Michael R. DeWeese, Jascha Sohl-Dickstein

    Abstract: In most sampling algorithms, including Hamiltonian Monte Carlo, transition rates between states correspond to the probability of making a transition in a single time step, and are constrained to be less than or equal to 1. We derive a Hamiltonian Monte Carlo algorithm using a continuous time Markov jump process, and are thus able to escape this constraint. Transition rates in a Markov jump process…

    Submitted 11 October, 2015; v1 submitted 13 September, 2015; originally announced September 2015.

  24. arXiv:1506.05908  [pdf, other]

    cs.AI cs.CY cs.LG

    Deep Knowledge Tracing

    Authors: Chris Piech, Jonathan Spencer, Jonathan Huang, Surya Ganguli, Mehran Sahami, Leonidas Guibas, Jascha Sohl-Dickstein

    Abstract: Knowledge tracing, where a machine models the knowledge of a student as they interact with coursework, is a well established problem in computer supported education. Though effectively modeling student knowledge would have high educational impact, the task has many inherent challenges. In this paper we explore the utility of using Recurrent Neural Networks (RNNs) to model student learning. The R…

    Submitted 19 June, 2015; originally announced June 2015.

    ACM Class: K.3.1

  25. arXiv:1504.08025  [pdf, ps, other]

    cs.LG

    Note on Equivalence Between Recurrent Neural Network Time Series Models and Variational Bayesian Models

    Authors: Jascha Sohl-Dickstein, Diederik P. Kingma

    Abstract: We observe that the standard log likelihood training objective for a Recurrent Neural Network (RNN) model of time series data is equivalent to a variational Bayesian training objective, given the proper choice of generative and inference models. This perspective may motivate extensions to both RNNs and variational Bayesian models. We propose one such extension, where multiple particles are used fo…

    Submitted 18 June, 2016; v1 submitted 29 April, 2015; originally announced April 2015.

  26. arXiv:1503.03585  [pdf, other]

    cs.LG cond-mat.dis-nn q-bio.NC stat.ML

    Deep Unsupervised Learning using Nonequilibrium Thermodynamics

    Authors: Jascha Sohl-Dickstein, Eric A. Weiss, Niru Maheswaranathan, Surya Ganguli

    Abstract: A central problem in machine learning involves modeling complex data-sets using highly flexible families of probability distributions in which learning, sampling, inference, and evaluation are still analytically or computationally tractable. Here, we develop an approach that simultaneously achieves both flexibility and tractability. The essential idea, inspired by non-equilibrium statistical physi…

    Submitted 18 November, 2015; v1 submitted 12 March, 2015; originally announced March 2015.

  27. arXiv:1409.5191  [pdf, other]

    stat.CO stat.ML

    Hamiltonian Monte Carlo Without Detailed Balance

    Authors: Jascha Sohl-Dickstein, Mayur Mudigonda, Michael R. DeWeese

    Abstract: We present a method for performing Hamiltonian Monte Carlo that largely eliminates sample rejection for typical hyperparameters. In situations that would normally lead to rejection, instead a longer trajectory is computed until a new state is reached that can be accepted. This is achieved using Markov chain transitions that satisfy the fixed point equation, but do not satisfy detailed balance. The…

    Submitted 25 March, 2016; v1 submitted 18 September, 2014; originally announced September 2014.

    Comments: Accepted conference submission to ICML 2014 and also featured in a special edition of JMLR. Since updated to include additional literature citations

  28. arXiv:1406.1831  [pdf, other]

    cs.NE cs.LG

    Analyzing noise in autoencoders and deep networks

    Authors: Ben Poole, Jascha Sohl-Dickstein, Surya Ganguli

    Abstract: Autoencoders have emerged as a useful framework for unsupervised learning of internal representations, and a wide variety of apparently conceptually disparate regularization techniques have been proposed to generate useful features. Here we extend existing denoising autoencoders to additionally inject noise before the nonlinearity, and at the hidden unit activations. We show that a wide variety of…

    Submitted 6 June, 2014; originally announced June 2014.

  29. arXiv:1311.2115  [pdf, other]

    cs.LG

    Fast large-scale optimization by unifying stochastic gradient and quasi-Newton methods

    Authors: Jascha Sohl-Dickstein, Ben Poole, Surya Ganguli

    Abstract: We present an algorithm for minimizing a sum of functions that combines the computational efficiency of stochastic gradient descent (SGD) with the second order curvature information leveraged by quasi-Newton methods. We unify these disparate approaches by maintaining an independent Hessian approximation for each contributing function in the sum. We maintain computational tractability and limit mem…

    Submitted 29 November, 2014; v1 submitted 8 November, 2013; originally announced November 2013.

    MSC Class: 90C26 ACM Class: G.1.6

  30. arXiv:1301.0050  [pdf, other]

    q-bio.NC

    Higher Order Correlations within Cortical Layers Dominate Functional Connectivity in Microcolumns

    Authors: Urs Köster, Jascha Sohl-Dickstein, Charles M. Gray, Bruno A. Olshausen

    Abstract: We report on simultaneous recordings from cells in all layers of visual cortex and models developed to capture the higher order structure of population spiking activity. Specifically, we use Ising, Restricted Boltzmann Machine (RBM) and semi-Restricted Boltzmann Machine (sRBM) models to reveal laminar patterns of activity. While the Ising model describes only pairwise couplings, the RBM and sRBM c…

    Submitted 31 December, 2012; originally announced January 2013.

  31. arXiv:1209.3744  [pdf, other]

    physics.bio-ph cond-mat.stat-mech q-bio.NC

    Minimum and maximum entropy distributions for binary systems with known means and pairwise correlations

    Authors: Badr F. Albanna, Christopher Hillar, Jascha Sohl-Dickstein, Michael R. DeWeese

    Abstract: Maximum entropy models are increasingly being used to describe the collective activity of neural populations with measured mean neural activities and pairwise correlations, but the full space of probability distributions consistent with these constraints has not been explored. We provide upper and lower bounds on the entropy for the minimum entropy distribution over arbitrarily large collect…

    Submitted 21 August, 2017; v1 submitted 17 September, 2012; originally announced September 2012.

    Comments: 34 pages, 7 figures

    Journal ref: Entropy 2017, 19, 427

  32. arXiv:1205.4295  [pdf, other]

    cs.LG cs.AI cs.IT cs.NE physics.data-an

    Efficient Methods for Unsupervised Learning of Probabilistic Models

    Authors: Jascha Sohl-Dickstein

    Abstract: In this thesis I develop a variety of techniques to train, evaluate, and sample from intractable and high dimensional probabilistic models. Abstract exceeds arXiv space limitations -- see PDF.

    Submitted 19 May, 2012; originally announced May 2012.

  33. arXiv:1205.1939  [pdf, other]

    physics.data-an cs.LG

    Hamiltonian Monte Carlo with Reduced Momentum Flips

    Authors: Jascha Sohl-Dickstein

    Abstract: Hamiltonian Monte Carlo (or hybrid Monte Carlo) with partial momentum refreshment explores the state space more slowly than it otherwise would due to the momentum reversals which occur on proposal rejection. These cause trajectories to double back on themselves, leading to random walk behavior on timescales longer than the typical rejection time, and leading to slower mixing. I present a technique…

    Submitted 9 May, 2012; originally announced May 2012.

  34. arXiv:1205.1925  [pdf, other]

    cs.LG physics.data-an

    Hamiltonian Annealed Importance Sampling for partition function estimation

    Authors: Jascha Sohl-Dickstein, Benjamin J. Culpepper

    Abstract: We introduce an extension to annealed importance sampling that uses Hamiltonian dynamics to rapidly estimate normalization constants. We demonstrate this method by computing log likelihoods in directed and undirected probabilistic image models. We compare the performance of linear generative models with both Gaussian and Laplace priors, product of experts models with Laplace and Student's t expert…

    Submitted 9 May, 2012; originally announced May 2012.

  35. arXiv:1205.1828  [pdf, other]

    cs.LG stat.ML

    The Natural Gradient by Analogy to Signal Whitening, and Recipes and Tricks for its Use

    Authors: Jascha Sohl-Dickstein

    Abstract: The natural gradient allows for more efficient gradient descent by removing dependencies and biases inherent in a function's parameterization. Several papers present the topic thoroughly and precisely. It remains a very difficult idea to get your head around however. The intent of this note is to provide simple intuition for the natural gradient and its use. We review how an ill conditioned parame…

    Submitted 8 May, 2012; originally announced May 2012.

  36. arXiv:1204.2916  [pdf, other]

    nlin.AO q-bio.NC

    Efficient and optimal binary Hopfield associative memory storage using minimum probability flow

    Authors: Christopher Hillar, Jascha Sohl-Dickstein, Kilian Koepsell

    Abstract: We present an algorithm to store binary memories in a Hopfield neural network using minimum probability flow, a recent technique to fit parameters in energy-based probabilistic models. In the case of memories without noise, our algorithm provably achieves optimal pattern storage (which we show is at least one pattern per neuron) and outperforms classical methods both in speed and memory recovery.…

    Submitted 19 May, 2015; v1 submitted 13 April, 2012; originally announced April 2012.

    Comments: 6 pages, 4 figures, 2012 Neural Information Processing Systems (NIPS) workshop on Discrete Optimization in Machine Learning (DISCML)

  37. arXiv:1001.1027  [pdf, other]

    cs.CV cs.LG

    An Unsupervised Algorithm For Learning Lie Group Transformations

    Authors: Jascha Sohl-Dickstein, Ching Ming Wang, Bruno A. Olshausen

    Abstract: We present several theoretical contributions which allow Lie groups to be fit to high dimensional datasets. Transformation operators are represented in their eigen-basis, reducing the computational complexity of parameter estimation to that of training a linear transformation model. A transformation specific "blurring" operator is introduced that allows inference to escape local minima via a smoot…

    Submitted 7 June, 2017; v1 submitted 7 January, 2010; originally announced January 2010.

  38. arXiv:0906.4779  [pdf, other]

    cs.LG physics.data-an stat.ML

    Minimum Probability Flow Learning

    Authors: Jascha Sohl-Dickstein, Peter Battaglino, Michael R. DeWeese

    Abstract: Fitting probabilistic models to data is often difficult, due to the general intractability of the partition function and its derivatives. Here we propose a new parameter estimation technique that does not require computing an intractable normalization factor or sampling from the equilibrium distribution of the model. This is achieved by establishing dynamics that would transform the observed data…

    Submitted 24 September, 2011; v1 submitted 25 June, 2009; originally announced June 2009.

    Comments: Updated to match ICML conference proceedings