-
Self Normalizing Flows
Authors:
T. Anderson Keller,
Jorn W. T. Peters,
Priyank Jaini,
Emiel Hoogeboom,
Patrick Forré,
Max Welling
Abstract:
Efficient gradient computation of the Jacobian determinant term is a core problem in many machine learning settings, and especially so in the normalizing flow framework. Most proposed flow models therefore either restrict to a function class with easy evaluation of the Jacobian determinant, or an efficient estimator thereof. However, these restrictions limit the performance of such density models, frequently requiring significant depth to reach desired performance levels. In this work, we propose Self Normalizing Flows, a flexible framework for training normalizing flows by replacing expensive terms in the gradient by learned approximate inverses at each layer. This reduces the computational complexity of each layer's exact update from $\mathcal{O}(D^3)$ to $\mathcal{O}(D^2)$, allowing for the training of flow architectures which were otherwise computationally infeasible, while also providing efficient sampling. We show experimentally that such models are remarkably stable and optimize to similar data likelihood values as their exact gradient counterparts, while training more quickly and surpassing the performance of functionally constrained counterparts.
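The substitution described in the abstract can be illustrated for a single linear layer z = Wx. This is a minimal sketch, not the authors' implementation. It relies on the standard identity that the gradient of log|det W| with respect to W is the inverse transpose of W, and it assumes a second matrix R that approximates the inverse of W. Here R is a hypothetical stand-in built by perturbing the true inverse, where a real model would learn it. The 2x2 size and all function names are chosen for illustration only.

```python
def inverse_2x2(w):
    """Closed-form inverse of a 2x2 matrix given as nested lists."""
    (a, b), (c, d) = w
    det = a * d - b * c
    return [[d / det, -b / det], [-c / det, a / det]]


def transpose(m):
    return [list(row) for row in zip(*m)]


def exact_logdet_grad(w):
    # Exact gradient: d/dW log|det W| = W^{-T}. This needs a matrix inverse,
    # which is the O(D^3) step for a general D x D matrix.
    return transpose(inverse_2x2(w))


def approx_logdet_grad(r):
    # Approximation: if R is close to W^{-1}, then R^T is close to W^{-T}.
    # No inversion is performed here, only a transpose.
    return transpose(r)


def max_abs_diff(m1, m2):
    return max(abs(x - y) for r1, r2 in zip(m1, m2) for x, y in zip(r1, r2))


W = [[2.0, 1.0], [0.5, 3.0]]
# Hypothetical "learned" inverse: the true inverse plus a small error of 0.01.
R = [[v + 0.01 for v in row] for row in inverse_2x2(W)]

gap = max_abs_diff(exact_logdet_grad(W), approx_logdet_grad(R))
print(gap)  # error of the approximate gradient, driven by the error in R
```

The quality of the approximate gradient depends entirely on how close R is to the true inverse. Any training scheme built on this idea would therefore need some way to keep R close to the inverse of W as W changes.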
Submitted 9 June, 2021; v1 submitted 14 November, 2020;
originally announced November 2020.
-
Universal Free Energy Landscape Produces Efficient and Reversible Electron Bifurcation
Authors:
Jonathon L. Yuly,
Peng Zhang,
Carolyn E. Lubner,
John W. Peters,
David N. Beratan
Abstract:
For decades, it was unknown how electron bifurcating systems in Nature prevented energy-wasting short-circuiting reactions that have large driving forces, so synthetic electron bifurcating molecular machines could not be designed and built. The underpinning free energy landscapes for electron bifurcation were also enigmatic. We predict that a simple and universal free energy landscape enables electron bifurcation, and we show that it enables high-efficiency bifurcation with limited short-circuiting (the EB-scheme). The landscape relies on steep free energy slopes in the two redox branches to insulate against short-circuiting without relying on nuanced changes in the microscopic rate constants for the short-circuiting reactions. The EB-scheme thus provides a blueprint for future campaigns to establish synthetic electron bifurcating machines.
Submitted 21 July, 2020; v1 submitted 28 May, 2020;
originally announced May 2020.
-
Integer Discrete Flows and Lossless Compression
Authors:
Emiel Hoogeboom,
Jorn W. T. Peters,
Rianne van den Berg,
Max Welling
Abstract:
Lossless compression methods shorten the expected representation size of data without loss of information, using a statistical model. Flow-based models are attractive in this setting because they admit exact likelihood optimization, which is equivalent to minimizing the expected number of bits per message. However, conventional flows assume continuous data, which may lead to reconstruction errors when quantized for compression. For that reason, we introduce a flow-based generative model for ordinal discrete data called Integer Discrete Flow (IDF): a bijective integer map that can learn rich transformations on high-dimensional data. As building blocks for IDFs, we introduce a flexible transformation layer called integer discrete coupling. Our experiments show that IDFs are competitive with other flow-based generative models. Furthermore, we demonstrate that IDF based compression achieves state-of-the-art lossless compression rates on CIFAR10, ImageNet32, and ImageNet64. To the best of our knowledge, this is the first lossless compression method that uses invertible neural networks.
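The following sketch shows why an additive coupling on integers can be inverted exactly. It is an illustration under stated assumptions, not the authors' code. The input is split into two halves, a translation is computed from the first half and rounded to integers, and the result is added to the second half. The `translation` function is a hypothetical stand-in for a learned network, and the even split is an arbitrary choice.

```python
def translation(x_a):
    # Hypothetical stand-in for a learned network t(x_a); returns floats.
    return [0.7 * v - 1.3 for v in x_a]


def coupling_forward(x):
    """Map integers to integers: z_a = x_a, z_b = x_b + round(t(x_a))."""
    half = len(x) // 2
    x_a, x_b = x[:half], x[half:]
    shift = [round(t) for t in translation(x_a)]
    return x_a + [b + s for b, s in zip(x_b, shift)]


def coupling_inverse(z):
    """Undo the forward map: x_b = z_b - round(t(z_a)), since z_a equals x_a."""
    half = len(z) // 2
    z_a, z_b = z[:half], z[half:]
    shift = [round(t) for t in translation(z_a)]
    return z_a + [b - s for b, s in zip(z_b, shift)]


x = [3, -1, 4, 10]          # even length, so both halves have equal size
z = coupling_forward(x)
x_rec = coupling_inverse(z)
print(z, x_rec == x)
```

Because the first half passes through unchanged, the inverse recomputes the identical rounded shift and subtracts it. The round trip is therefore exact in integer arithmetic, whichever tie-breaking rule the rounding function uses.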
Submitted 6 December, 2019; v1 submitted 17 May, 2019;
originally announced May 2019.
-
Probabilistic Binary Neural Networks
Authors:
Jorn W. T. Peters,
Max Welling
Abstract:
Low bit-width weights and activations are an effective way of combating the increasing need for both memory and compute power of Deep Neural Networks. In this work, we present a probabilistic training method for Neural Networks with both binary weights and activations, called BLRNet. By embracing stochasticity during training, we circumvent the need to approximate the gradient of non-differentiable functions such as sign(), while still obtaining a fully Binary Neural Network at test time. Moreover, it allows for anytime ensemble predictions for improved performance and uncertainty estimates by sampling from the weight distribution. Since all operations in a layer of the BLRNet operate on random variables, we introduce stochastic versions of Batch Normalization and max pooling, which transfer well to a deterministic network at test time. We evaluate the BLRNet on multiple standardized benchmarks.
Submitted 10 September, 2018;
originally announced September 2018.
-
HexaConv
Authors:
Emiel Hoogeboom,
Jorn W. T. Peters,
Taco S. Cohen,
Max Welling
Abstract:
The effectiveness of Convolutional Neural Networks stems in large part from their ability to exploit the translation invariance that is inherent in many learning problems. Recently, it was shown that CNNs can exploit other invariances, such as rotation invariance, by using group convolutions instead of planar convolutions. However, for reasons of performance and ease of implementation, it has been necessary to limit the group convolution to transformations that can be applied to the filters without interpolation. Thus, for images with square pixels, only integer translations, rotations by multiples of 90 degrees, and reflections are admissible.
Whereas the square tiling provides a 4-fold rotational symmetry, a hexagonal tiling of the plane has a 6-fold rotational symmetry. In this paper we show how one can efficiently implement planar convolution and group convolution over hexagonal lattices, by re-using existing highly optimized convolution routines. We find that, due to the reduced anisotropy of hexagonal filters, planar HexaConv provides better accuracy than planar convolution with square filters, given a fixed parameter budget. Furthermore, we find that the increased degree of symmetry of the hexagonal grid increases the effectiveness of group convolutions, by allowing for more parameter sharing. We show that our method significantly outperforms conventional CNNs on the AID aerial scene classification dataset, even outperforming ImageNet pre-trained models.
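The 6-fold symmetry mentioned above can be made concrete with a small sketch. It assumes hexagonal cells indexed by axial coordinates (q, r), a common convention for hexagonal grids that the abstract itself does not specify. In these coordinates a 60 degree rotation about the origin is the integer map (q, r) -> (-r, q + r), so a filter can be rotated without interpolation. The dictionary-based filter representation is hypothetical and chosen for readability.

```python
def rotate60(q, r):
    """Rotate one axial hex coordinate by 60 degrees about the origin."""
    return (-r, q + r)


def rotate_filter(weights):
    """Rotate a hexagonal filter stored as {(q, r): weight} by 60 degrees."""
    return {rotate60(q, r): w for (q, r), w in weights.items()}


# Centre cell plus its six neighbours, each with a distinct weight.
ring = [(1, 0), (0, 1), (-1, 1), (-1, 0), (0, -1), (1, -1)]
hex_filter = {(0, 0): 0.0}
hex_filter.update({cell: float(i + 1) for i, cell in enumerate(ring)})

rotated = hex_filter
orbit = [rotated]
for _ in range(6):
    rotated = rotate_filter(rotated)
    orbit.append(rotated)

print(orbit[6] == hex_filter)  # six rotations give back the original filter
```

Every rotation maps grid cells onto grid cells, so the six rotated copies of a filter share the same weights exactly. This is the kind of parameter sharing that a larger symmetry group makes available to group convolutions.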
Submitted 6 March, 2018;
originally announced March 2018.