InvertibleNetworks.jl: A Julia package for scalable normalizing flows

Rafael Orozco
Georgia Institute of Technology
[email protected]
Philipp Witte
Microsoft Research
Mathias Louboutin
Devito Codes
Ali Siahkoohi
Rice University
Gabrio Rizzuti
Shearwater GeoServices
Bas Peters
Computational Geosciences Inc
Felix J. Herrmann
Georgia Institute of Technology

Keywords Julia $\cdot$ inverse problems $\cdot$ Bayesian inference $\cdot$ imaging $\cdot$ normalizing flows

1 Summary

Normalizing flows are a density estimation method that provides efficient exact likelihood estimation and sampling [1] from high-dimensional distributions. The method relies on the change of variables formula, which requires an invertible transform, so normalizing flow architectures are built to be invertible by design [1]. In theory, requiring invertibility constrains the expressiveness of an architecture, but coupling layers allow normalizing flows to exploit the power of arbitrary neural networks that need not themselves be invertible [2]. Moreover, because each layer is invertible, many layers can be stacked to increase expressiveness without creating a memory bottleneck during training, provided the layers are implemented properly.
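For reference, the change of variables formula can be written as follows. The notation is chosen here for illustration and is not taken from the package: $f_\theta$ denotes the invertible network, $x$ a data sample, and $p_Z$ a simple base density (commonly a standard Gaussian) on the latent variable $z = f_\theta(x)$.

$$
\log p_X(x) = \log p_Z\big(f_\theta(x)\big) + \log \left| \det \frac{\partial f_\theta(x)}{\partial x} \right|
$$

An affine coupling layer in the style of [2] splits its input as $x = (x_1, x_2)$ and uses two arbitrary networks $s$ and $t$, which are never inverted themselves:

$$
\begin{aligned}
y_1 &= x_1, & y_2 &= x_2 \odot \exp\big(s(x_1)\big) + t(x_1), \\
x_1 &= y_1, & x_2 &= \big(y_2 - t(y_1)\big) \odot \exp\big(-s(y_1)\big), \\
\log \left| \det \frac{\partial y}{\partial x} \right| &= \sum_i s(x_1)_i .
\end{aligned}
$$

The additive coupling layer of [1] is the special case $s \equiv 0$, for which the log-determinant vanishes.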

The package we present, InvertibleNetworks.jl, is a pure Julia [3] implementation of normalizing flows. We have implemented many relevant neural network layers, including GLOW 1x1 invertible convolutions [4], affine/additive coupling layers [1], Haar wavelet multiscale transforms [5], and hierarchical invertible neural transport (HINT) [6], among others. These modular layers are easily composed and modified to create different types of normalizing flows. As starting points, we have implemented RealNVP, GLOW, HINT, hyperbolic networks [7], and their conditional counterparts so that users can quickly build networks for their own applications.

2 Statement of need

This software package focuses on memory efficiency. The promise of neural networks lies in learning high-dimensional distributions from examples, so normalizing flow packages should allow easy application to high-dimensional inputs such as images or 3D volumes. Interestingly, the invertibility of normalizing flows naturally alleviates memory concerns, since intermediate network activations can be recomputed instead of saved in memory, greatly reducing the memory needed during backpropagation. The problem is that directly implementing normalizing flows in automatic differentiation frameworks such as PyTorch [8] will not automatically exploit this invertibility. The available packages for normalizing flows, such as nflows [9], normflows [10] and FrEIA [11], are built on automatic differentiation frameworks and thus do not exploit invertibility for memory efficiency.
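The recomputation idea can be illustrated with a minimal sketch of an additive coupling layer in plain Python. This is not the package's API: `shift_net`, `coupling_forward` and `coupling_inverse` are hypothetical names introduced here, and `shift_net` is an arbitrary stand-in for a neural network that is itself not invertible. The point is only that the layer input can be recovered from the layer output, so it does not have to be kept in memory.

```python
import math

def shift_net(x1):
    """Hypothetical stand-in for an arbitrary, non-invertible network t(x1)."""
    return [math.tanh(3.0 * v) + v * v for v in x1]

def coupling_forward(x):
    """Additive coupling: y1 = x1, y2 = x2 + t(x1)."""
    half = len(x) // 2
    x1, x2 = x[:half], x[half:]
    t = shift_net(x1)
    return x1 + [b + s for b, s in zip(x2, t)]

def coupling_inverse(y):
    """Exact inverse: x1 = y1, x2 = y2 - t(y1). shift_net is never inverted."""
    half = len(y) // 2
    y1, y2 = y[:half], y[half:]
    t = shift_net(y1)
    return y1 + [b - s for b, s in zip(y2, t)]

x = [0.5, -1.0, 2.0, 0.25]
y = coupling_forward(x)

# The input is recomputed from the output instead of being stored.
x_rec = coupling_inverse(y)
max_err = max(abs(a - b) for a, b in zip(x, x_rec))
print(max_err)  # agreement up to floating-point round-off
```

A general-purpose automatic differentiation framework would instead typically keep `x` (and the intermediate values inside `shift_net`) alive until the backward pass.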

3 Memory efficiency

By implementing gradients by hand instead of depending completely on automatic differentiation, our layers are capable of scaling to large inputs. By scaling, we mean that these codes are not prone to out-of-memory errors when training on GPU accelerators. Indeed, previous literature has described memory problems when using normalizing flows as their invertibility requires the latent code to maintain the same dimensionality as the input [12].

Figure 1: Our package InvertibleNetworks.jl provides memory-frugal implementations of normalizing flows. Here, we compare our implementation of GLOW with an equivalent implementation in a PyTorch package. Using a 40GB A100 GPU, the PyTorch package cannot train on image sizes larger than 480x480, while our package can handle sizes larger than 1024x1024.

In Figure 1, we show the relation between input size and the memory required for a gradient calculation in a PyTorch normalizing flow package (normflows [10]) compared with our package. The two tests were run with identical normalizing flow architectures. The memory load of the PyTorch implementation increases quickly, and it throws an out-of-memory error on the 40GB A100 GPU at a spatial image size of 480x480, while our InvertibleNetworks.jl implementation has still not run out of memory at a spatial size of 1024x1024. Note that this is in the context of a typical learning routine, so the images include 3 channels (RGB) and we used a batch size of 8.
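To give a sense of scale, the short calculation below estimates the size of a single activation tensor at the two image sizes mentioned above. It assumes 32-bit floating-point storage (4 bytes per value), which the text does not state, and `tensor_mib` is a helper name introduced here for illustration. Because a normalizing flow preserves the dimensionality of its input, every stored intermediate activation is of roughly this size.

```python
def tensor_mib(batch, channels, height, width, bytes_per_value=4):
    """Size of one dense activation tensor in MiB (float32 is an assumption)."""
    return batch * channels * height * width * bytes_per_value / 2**20

# Batch size 8 and 3 channels (RGB), as in the experiment described above.
size_480 = tensor_mib(8, 3, 480, 480)
size_1024 = tensor_mib(8, 3, 1024, 1024)
print(size_480, size_1024)  # 21.09375 96.0
```

A framework that stores one such tensor per intermediate operation therefore needs memory proportional to both the input size and the number of stored activations, whereas recomputation removes the second factor.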

Since traditional normalizing flow architectures need to be invertible, they might be less expressive than their non-invertible counterparts. To compensate, practitioners stack many invertible layers to increase the overall expressive power. Increasing the depth of a neural network would in most cases increase its memory consumption, but because normalizing flows are invertible, activations can be recomputed rather than stored and the memory consumption does not increase with depth. Our package displays this phenomenon, as shown in Figure 2, while the PyTorch (normflows) package, which is implemented with automatic differentiation, does not display this constant-memory behavior.
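The following sketch shows, under simplifying assumptions, how a hand-written backward pass can traverse a stack of invertible layers while holding only the current activation. It is an illustration in plain Python, not the package's implementation: each hypothetical layer is an additive coupling with a single scalar weight `w` followed by a swap of the two halves, and the loss is the standard Gaussian negative log-likelihood up to a constant (the log-determinant of an additive coupling is zero). The hand-derived gradients are checked against central finite differences.

```python
import math

def layer_forward(a, b, w):
    """(a, b) -> (b + w*tanh(a), a): additive coupling followed by a swap."""
    c = [bi + w * math.tanh(ai) for ai, bi in zip(a, b)]
    return c, list(a)

def layer_inverse(c, d, w):
    """Recover the layer input (a, b) from its output (c, d)."""
    a = list(d)
    b = [ci - w * math.tanh(ai) for ai, ci in zip(a, c)]
    return a, b

def flow_forward(a, b, weights):
    """Run all layers; only the final output is returned, nothing is cached."""
    for w in weights:
        a, b = layer_forward(a, b, w)
    return a, b

def loss(a, b, weights):
    c, d = flow_forward(a, b, weights)
    return 0.5 * sum(v * v for v in c + d)

def flow_backward(c, d, weights):
    """Hand-written gradients; inputs are recomputed layer by layer."""
    gc, gd = list(c), list(d)            # dL/dz = z for L = 0.5 * ||z||^2
    grads = [0.0] * len(weights)
    for k in reversed(range(len(weights))):
        w = weights[k]
        a, b = layer_inverse(c, d, w)    # recompute instead of store
        th = [math.tanh(ai) for ai in a]
        grads[k] = sum(g * t for g, t in zip(gc, th))
        ga = [gdi + gci * w * (1.0 - t * t) for gdi, gci, t in zip(gd, gc, th)]
        gb = gc
        c, d, gc, gd = a, b, ga, gb      # move to the previous layer
    return grads, (c, d)

a0, b0 = [0.5, -1.0], [2.0, 0.25]
weights = [0.3, -0.7, 1.1, 0.4]          # one hypothetical weight per layer
z1, z2 = flow_forward(a0, b0, weights)
grads, (a_rec, b_rec) = flow_backward(z1, z2, weights)

# Check the hand-written gradients against central finite differences.
eps = 1e-6
fd = []
for k in range(len(weights)):
    wp = list(weights); wp[k] += eps
    wm = list(weights); wm[k] -= eps
    fd.append((loss(a0, b0, wp) - loss(a0, b0, wm)) / (2 * eps))
grad_err = max(abs(g - f) for g, f in zip(grads, fd))
recon_err = max(abs(u - v) for u, v in zip(a0 + b0, a_rec + b_rec))
print(grad_err, recon_err)
```

In this sketch the backward loop keeps one activation pair and one gradient pair regardless of `len(weights)`, which is the constant-memory behavior described above; the price is one extra inverse evaluation per layer.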

4 Ease of use

Although the gradients of the normalizing flow layers are hand-written, the package is fully compatible with ChainRules [13] in order to integrate with automatic differentiation frameworks in Julia such as Zygote [14]. This integration allows users to add arbitrary neural networks, which will be differentiated by automatic differentiation, while the memory bottleneck created by normalizing flow gradients is handled by InvertibleNetworks.jl. The typical use case for this combination is the summary networks used in amortized variational inference such as BayesFlow [15], which has been implemented in our package.

All implemented layers are tested for invertibility and correctness of their gradients with continuous integration testing via GitHub Actions. There are many examples for layers, networks and application workflows, allowing new users to quickly build networks for a variety of applications. The ease of use is demonstrated by the publications that made use of the package.

Many publications have used InvertibleNetworks.jl for diverse applications, including change point detection [16], acoustic data denoising [17], seismic imaging [18, 19, 20, 21, 22, 23], fluid flow dynamics [24], medical imaging [25, 26, 27, 28] and monitoring CO2 for combating climate change [29].

5 Future work

The neural network primitives (convolutions, non-linearities, pooling, etc.) are implemented with NNlib.jl abstractions, so support for AMD, Intel and Apple GPUs can be trivially extended. Also, while our package can currently handle 3D inputs and has been used on large volume-based medical imaging [30], there are interesting avenues of research regarding the "channel explosion" seen in the invertible down- and upsampling used in invertible networks [31].

References

[1] Laurent Dinh, David Krueger, and Yoshua Bengio. NICE: Non-linear independent components estimation. arXiv preprint arXiv:1410.8516, 2014.
[2] Laurent Dinh, Jascha Sohl-Dickstein, and Samy Bengio. Density estimation using Real NVP. arXiv preprint arXiv:1605.08803, 2016.
[3] Jeff Bezanson, Stefan Karpinski, Viral B Shah, and Alan Edelman. Julia: A fast dynamic language for technical computing. arXiv preprint arXiv:1209.5145, 2012.
[4] Durk P Kingma and Prafulla Dhariwal. Glow: Generative flow with invertible 1x1 convolutions. Advances in neural information processing systems, 31, 2018.
[5] Alfred Haar. Zur Theorie der orthogonalen Funktionensysteme. Georg-August-Universität, Göttingen, 1909.
[6] Jakob Kruse, Gianluca Detommaso, Ullrich Köthe, and Robert Scheichl. HINT: Hierarchical invertible neural transport for density estimation and Bayesian inference. In Proceedings of the AAAI Conference on Artificial Intelligence, volume 35, pages 8191–8199, 2021.
[7] Keegan Lensink, Bas Peters, and Eldad Haber. Fully hyperbolic convolutional neural networks. Research in the Mathematical Sciences, 9(4):60, 2022.
[8] Adam Paszke, Sam Gross, Soumith Chintala, Gregory Chanan, Edward Yang, Zachary DeVito, Zeming Lin, Alban Desmaison, Luca Antiga, and Adam Lerer. Automatic differentiation in PyTorch. 2017.
[9] Conor Durkan, Artur Bekasov, Iain Murray, and George Papamakarios. nflows: normalizing flows in PyTorch, November 2020.
[10] Vincent Stimper, David Liu, Andrew Campbell, Vincent Berenz, Lukas Ryll, Bernhard Schölkopf, and José Miguel Hernández-Lobato. normflows: A PyTorch package for normalizing flows. arXiv preprint arXiv:2302.12014, 2023.
[11] Lynton Ardizzone, Till Bungert, Felix Draxler, Ullrich Köthe, Jakob Kruse, Robert Schmier, and Peter Sorrenson. Framework for Easily Invertible Architectures (FrEIA), 2018-2022.
[12] AmirEhsan Khorashadizadeh, Konik Kothari, Leonardo Salsi, Ali Aghababaei Harandi, Maarten de Hoop, and Ivan Dokmanić. Conditional injective flows for Bayesian imaging. IEEE Transactions on Computational Imaging, 9:224–237, 2023.
[13] Frames White, Michael Abbott, Miha Zgubic, Jarrett Revels, Seth Axen, Alex Arslan, Simeon Schaub, Nick Robinson, Yingbo Ma, Sam, Gaurav Dhingra, Will Tebbutt, David Widmann, Niklas Heim, Niklas Schmitz, Christopher Rackauckas, Carlo Lucibello, Keno Fischer, Rainer Heintzmann, frankschae, Andreas Noack, Alex Robson, cossio, Jerry Ling, mattBrzezinski, Rory Finnegan, Andrei Zhabinski, and Daniel Wennberg. JuliaDiff/ChainRules.jl: v1.58.0, November 2023.
[14] Mike Innes, Alan Edelman, Keno Fischer, Chris Rackauckas, Elliot Saba, Viral B Shah, and Will Tebbutt. A differentiable programming system to bridge machine learning and scientific computing. arXiv preprint arXiv:1907.07587, 2019.
[15] Stefan T Radev, Ulf K Mertens, Andreas Voss, Lynton Ardizzone, and Ullrich Köthe. BayesFlow: Learning complex stochastic models with invertible neural networks. IEEE Transactions on Neural Networks and Learning Systems, 33(4):1452–1466, 2020.
[16] Bas Peters. Point-to-set distance functions for output-constrained neural networks. Journal of Applied & Numerical Optimization, 4(2), 2022.
[17] Rajiv Kumar, Maria Kotsi, Ali Siahkoohi, and Alison Malcolm. Enabling uncertainty quantification for seismic data preprocessing using normalizing flows (nf)—an interpolation example. In First International Meeting for Applied Geoscience & Energy, pages 1515–1519. Society of Exploration Geophysicists, 2021.
[18] Gabrio Rizzuti, Ali Siahkoohi, Philipp A Witte, and Felix J Herrmann. Parameterizing uncertainty by deep invertible networks: An application to reservoir characterization. In SEG International Exposition and Annual Meeting, page D031S057R006. SEG, 2020.
[19] Ali Siahkoohi, Gabrio Rizzuti, Mathias Louboutin, Philipp A Witte, and Felix J Herrmann. Preconditioned training of normalizing flows for variational inference in inverse problems. arXiv preprint arXiv:2101.03709, 2021.
[20] Ali Siahkoohi, Rafael Orozco, Gabrio Rizzuti, and Felix J Herrmann. Wave-equation-based inversion with amortized variational Bayesian inference. arXiv preprint arXiv:2203.15881, 2022.
[21] Ali Siahkoohi, Gabrio Rizzuti, Rafael Orozco, and Felix J Herrmann. Reliable amortized variational inference with physics-based latent distribution correction. Geophysics, 88(3):R297–R322, 2023.
[22] Mathias Louboutin, Ziyi Yin, Rafael Orozco, Thomas J Grady, Ali Siahkoohi, Gabrio Rizzuti, Philipp A Witte, Olav Møyner, Gerard J Gorman, and Felix J Herrmann. Learned multiphysics inversion with differentiable programming and machine learning. The Leading Edge, 42(7):474–486, 2023.
[23] Sina Alemohammad, Josue Casco-Rodriguez, Lorenzo Luzi, Ahmed Imtiaz Humayun, Hossein Babaei, Daniel LeJeune, Ali Siahkoohi, and Richard G Baraniuk. Self-consuming generative models go MAD. arXiv preprint arXiv:2307.01850, 2023.
[24] Ziyi Yin, Rafael Orozco, Mathias Louboutin, and Felix J Herrmann. Solving multiphysics-based inverse problems with learned surrogates and constraints. Advanced Modeling and Simulation in Engineering Sciences, 10(1):14, 2023.
[25] Rafael Orozco, Ali Siahkoohi, Gabrio Rizzuti, Tristan van Leeuwen, and Felix J Herrmann. Adjoint operators enable fast and amortized machine learning based Bayesian uncertainty quantification. In Medical Imaging 2023: Image Processing, volume 12464, pages 357–367. SPIE, 2023.
[26] Rafael Orozco, Mathias Louboutin, Ali Siahkoohi, Gabrio Rizzuti, Tristan van Leeuwen, and Felix Herrmann. Amortized normalizing flows for transcranial ultrasound with uncertainty quantification. arXiv preprint arXiv:2303.03478, 2023.
[27] Rafael Orozco, Ali Siahkoohi, Gabrio Rizzuti, Tristan van Leeuwen, and Felix Johan Herrmann. Photoacoustic imaging with conditional priors from normalizing flows. In NeurIPS 2021 Workshop on Deep Learning and Inverse Problems, 2021.
[28] Rafael Orozco, Ali Siahkoohi, Mathias Louboutin, and Felix J Herrmann. Refining amortized posterior approximations using gradient-based summary statistics. arXiv preprint arXiv:2305.08733, 2023.
[29] Abhinav Prakash Gahlot, Huseyin Tuna Erdinc, Rafael Orozco, Ziyi Yin, and Felix J Herrmann. Inference of CO2 flow patterns–a feasibility study. arXiv preprint arXiv:2311.00290, 2023.
[30] Rafael Orozco, Mathias Louboutin, and Felix J Herrmann. Memory efficient invertible neural networks for 3D photoacoustic imaging. arXiv preprint arXiv:2204.11850, 2022.
[31] Bas Peters, Eldad Haber, and Keegan Lensink. Symmetric block-low-rank layers for fully reversible multilevel neural networks. arXiv preprint arXiv:1912.12137, 2019.