Search | arXiv e-print repository

Taming Score-Based Diffusion Priors for Infinite-Dimensional Nonlinear Inverse Problems

Authors: Lorenzo Baldassari, Ali Siahkoohi, Josselin Garnier, Knut Solna, Maarten V. de Hoop

Abstract: This work introduces a sampling method capable of solving Bayesian inverse problems in function space. It does not assume the log-concavity of the likelihood, meaning that it is compatible with nonlinear inverse problems. The method leverages the recently defined infinite-dimensional score-based diffusion models as a learning-based prior, while enabling provable posterior sampling through a Langev… ▽ More This work introduces a sampling method capable of solving Bayesian inverse problems in function space. It does not assume the log-concavity of the likelihood, meaning that it is compatible with nonlinear inverse problems. The method leverages the recently defined infinite-dimensional score-based diffusion models as a learning-based prior, while enabling provable posterior sampling through a Langevin-type MCMC algorithm defined on function spaces. A novel convergence analysis is conducted, inspired by the fixed-point methods established for traditional regularization-by-denoising algorithms and compatible with weighted annealing. The obtained convergence bound explicitly depends on the approximation error of the score; a well-approximated score is essential to obtain a well-approximated posterior. Stylized and PDE-based examples are provided, demonstrating the validity of our convergence analysis. We conclude by presenting a discussion of the method's challenges related to learning the score and computational complexity. △ Less

Submitted 24 May, 2024; originally announced May 2024.

MSC Class: 62F15; 65N21; 68Q32; 60Hxx; 60Jxx; 65C05; 82C31

arXiv:2405.15643 [pdf, other]

Reducing the cost of posterior sampling in linear inverse problems via task-dependent score learning

Authors: Fabian Schneider, Duc-Lam Duong, Matti Lassas, Maarten V. de Hoop, Tapio Helin

Abstract: Score-based diffusion models (SDMs) offer a flexible approach to sample from the posterior distribution in a variety of Bayesian inverse problems. In the literature, the prior score is utilized to sample from the posterior by different methods that require multiple evaluations of the forward map** in order to generate a single posterior sample. These methods are often designed with the objective… ▽ More Score-based diffusion models (SDMs) offer a flexible approach to sample from the posterior distribution in a variety of Bayesian inverse problems. In the literature, the prior score is utilized to sample from the posterior by different methods that require multiple evaluations of the forward map** in order to generate a single posterior sample. These methods are often designed with the objective of enabling the direct use of the unconditional prior score and, therefore, task-independent training. In this paper, we focus on linear inverse problems, when evaluation of the forward map** is computationally expensive and frequent posterior sampling is required for new measurement data, such as in medical imaging. We demonstrate that the evaluation of the forward map** can be entirely bypassed during posterior sample generation. Instead, without introducing any error, the computational effort can be shifted to an offline task of training the score of a specific diffusion-like random process. In particular, the training is task-dependent requiring information about the forward map** but not about the measurement data. It is shown that the conditional score corresponding to the posterior can be obtained from the auxiliary score by suitable affine transformations. We prove that this observation generalizes to the framework of infinite-dimensional diffusion models introduced recently and provide numerical analysis of the method. Moreover, we validate our findings with numerical experiments. △ Less

Submitted 24 May, 2024; originally announced May 2024.

Comments: 23 pages, 2 figues

MSC Class: 62F15; 65N21; 68Q32; 60Hxx; 60Jxx

arXiv:2404.09101 [pdf, ps, other]

Mixture of Experts Soften the Curse of Dimensionality in Operator Learning

Authors: Anastasis Kratsios, Takashi Furuya, Jose Antonio Lara Benitez, Matti Lassas, Maarten de Hoop

Abstract: In this paper, we construct a mixture of neural operators (MoNOs) between function spaces whose complexity is distributed over a network of expert neural operators (NOs), with each NO satisfying parameter scaling restrictions. Our main result is a \textit{distributed} universal approximation theorem guaranteeing that any Lipschitz non-linear operator between $L^2([0,1]^d)$ spaces can be approximat… ▽ More In this paper, we construct a mixture of neural operators (MoNOs) between function spaces whose complexity is distributed over a network of expert neural operators (NOs), with each NO satisfying parameter scaling restrictions. Our main result is a \textit{distributed} universal approximation theorem guaranteeing that any Lipschitz non-linear operator between $L^2([0,1]^d)$ spaces can be approximated uniformly over the Sobolev unit ball therein, to any given $\varepsilon>0$ accuracy, by an MoNO while satisfying the constraint that: each expert NO has a depth, width, and rank of $\mathcal{O}(\varepsilon^{-1})$. Naturally, our result implies that the required number of experts must be large, however, each NO is guaranteed to be small enough to be loadable into the active memory of most computers for reasonable accuracies $\varepsilon$. During our analysis, we also obtain new quantitative expression rates for classical NOs approximating uniformly continuous non-linear operators uniformly on compact subsets of $L^2([0,1]^d)$. △ Less

Submitted 13 April, 2024; originally announced April 2024.

arXiv:2310.00545 [pdf, other]

Implicit Neural Representations and the Algebra of Complex Wavelets

Authors: T. Mitchell Roddenberry, Vishwanath Saragadam, Maarten V. de Hoop, Richard G. Baraniuk

Abstract: Implicit neural representations (INRs) have arisen as useful methods for representing signals on Euclidean domains. By parameterizing an image as a multilayer perceptron (MLP) on Euclidean space, INRs effectively represent signals in a way that couples spatial and spectral features of the signal that is not obvious in the usual discrete representation, paving the way for continuous signal processi… ▽ More Implicit neural representations (INRs) have arisen as useful methods for representing signals on Euclidean domains. By parameterizing an image as a multilayer perceptron (MLP) on Euclidean space, INRs effectively represent signals in a way that couples spatial and spectral features of the signal that is not obvious in the usual discrete representation, paving the way for continuous signal processing and machine learning approaches that were not previously possible. Although INRs using sinusoidal activation functions have been studied in terms of Fourier theory, recent works have shown the advantage of using wavelets instead of sinusoids as activation functions, due to their ability to simultaneously localize in both frequency and space. In this work, we approach such INRs and demonstrate how they resolve high-frequency features of signals from coarse approximations done in the first layer of the MLP. This leads to multiple prescriptions for the design of INR architectures, including the use of complex wavelets, decoupling of low and band-pass approximations, and initialization schemes based on the singularities of the desired signal. △ Less

Submitted 30 September, 2023; originally announced October 2023.

Comments: 10 pages, 6 figures. 2 appendix pages, 1 appendix figure

arXiv:2307.07572 [pdf, other]

High-Rate Phase Association with Travel Time Neural Fields

Authors: Cheng Shi, Maarten V. de Hoop, Ivan Dokmanić

Abstract: Our understanding of regional seismicity from multi-station seismograms relies on the ability to associate arrival phases with their originating earthquakes. Deep-learning-based phase detection now detects small, high-rate arrivals from seismicity clouds, even at negative magnitudes. This new data could give important insight into earthquake dynamics, but it is presents a challenging association t… ▽ More Our understanding of regional seismicity from multi-station seismograms relies on the ability to associate arrival phases with their originating earthquakes. Deep-learning-based phase detection now detects small, high-rate arrivals from seismicity clouds, even at negative magnitudes. This new data could give important insight into earthquake dynamics, but it is presents a challenging association task. Existing techniques relying on coarsely approximated, fixed wave speed models fail in this unexplored dense regime where the complexity of unknown wave speed cannot be ignored. We introduce Harpa, a high-rate association framework built on deep generative modeling and neural fields. Harpa incorporates wave physics by using optimal transport to compare arrival sequences. It is thus robust to unknown wave speeds and estimates the wave speed model as a by-product of association. Experiments with realistic, complex synthetic models show that Harpa is the first seismic phase association framework which is accurate in the high-rate regime, paving the way for new avenues in exploratory Earth science and improved understanding of seismicity. △ Less

Submitted 26 March, 2024; v1 submitted 14 July, 2023; originally announced July 2023.

arXiv:2306.03982 [pdf, ps, other]

Globally injective and bijective neural operators

Authors: Takashi Furuya, Michael Puthawala, Matti Lassas, Maarten V. de Hoop

Abstract: Recently there has been great interest in operator learning, where networks learn operators between function spaces from an essentially infinite-dimensional perspective. In this work we present results for when the operators learned by these networks are injective and surjective. As a warmup, we combine prior work in both the finite-dimensional ReLU and operator learning setting by giving sharp co… ▽ More Recently there has been great interest in operator learning, where networks learn operators between function spaces from an essentially infinite-dimensional perspective. In this work we present results for when the operators learned by these networks are injective and surjective. As a warmup, we combine prior work in both the finite-dimensional ReLU and operator learning setting by giving sharp conditions under which ReLU layers with linear neural operators are injective. We then consider the case the case when the activation function is pointwise bijective and obtain sufficient conditions for the layer to be injective. We remark that this question, while trivial in the finite-rank case, is subtler in the infinite-rank case and is proved using tools from Fredholm theory. Next, we prove that our supplied injective neural operators are universal approximators and that their implementation, with finite-rank neural networks, are still injective. This ensures that injectivity is not `lost' in the transcription from analytical operators to their finite-rank implementation with networks. Finally, we conclude with an increase in abstraction and consider general conditions when subnetworks, which may be many layers deep, are injective and surjective and provide an exact inversion from a `linearization.' This section uses general arguments from Fredholm theory and Leray-Schauder degree theory for non-linear integral equations to analyze the map** properties of neural operators in function spaces. These results apply to subnetworks formed from the layers considered in this work, under natural conditions. We believe that our work has applications in Bayesian UQ where injectivity enables likelihood estimation and in inverse problems where surjectivity and injectivity corresponds to existence and uniqueness, respectively. △ Less

Submitted 6 June, 2023; originally announced June 2023.

Comments: 39 pages

arXiv:2305.19147 [pdf, other]

Conditional score-based diffusion models for Bayesian inference in infinite dimensions

Authors: Lorenzo Baldassari, Ali Siahkoohi, Josselin Garnier, Knut Solna, Maarten V. de Hoop

Abstract: Since their initial introduction, score-based diffusion models (SDMs) have been successfully applied to solve a variety of linear inverse problems in finite-dimensional vector spaces due to their ability to efficiently approximate the posterior distribution. However, using SDMs for inverse problems in infinite-dimensional function spaces has only been addressed recently, primarily through methods… ▽ More Since their initial introduction, score-based diffusion models (SDMs) have been successfully applied to solve a variety of linear inverse problems in finite-dimensional vector spaces due to their ability to efficiently approximate the posterior distribution. However, using SDMs for inverse problems in infinite-dimensional function spaces has only been addressed recently, primarily through methods that learn the unconditional score. While this approach is advantageous for some inverse problems, it is mostly heuristic and involves numerous computationally costly forward operator evaluations during posterior sampling. To address these limitations, we propose a theoretically grounded method for sampling from the posterior of infinite-dimensional Bayesian linear inverse problems based on amortized conditional SDMs. In particular, we prove that one of the most successful approaches for estimating the conditional score in finite dimensions - the conditional denoising estimator - can also be applied in infinite dimensions. A significant part of our analysis is dedicated to demonstrating that extending infinite-dimensional SDMs to the conditional setting requires careful consideration, as the conditional score typically blows up for small times, contrarily to the unconditional score. We conclude by presenting stylized and large-scale numerical examples that validate our approach, offer additional insights, and demonstrate that our method enables large-scale, discretization-invariant Bayesian inference. △ Less

Submitted 27 October, 2023; v1 submitted 28 May, 2023; originally announced May 2023.

Comments: NeurIPS 2023 (Spotlight)

MSC Class: 62F15; 65N21; 68Q32; 60Hxx; 60Jxx

arXiv:2305.16189 [pdf, other]

Martian time-series unraveled: A multi-scale nested approach with factorial variational autoencoders

Authors: Ali Siahkoohi, Rudy Morel, Randall Balestriero, Erwan Allys, Grégory Sainton, Taichi Kawamura, Maarten V. de Hoop

Abstract: Unsupervised source separation involves unraveling an unknown set of source signals recorded through a mixing operator, with limited prior knowledge about the sources, and only access to a dataset of signal mixtures. This problem is inherently ill-posed and is further challenged by the variety of timescales exhibited by sources. Existing methods typically rely on a preselected window size that det… ▽ More Unsupervised source separation involves unraveling an unknown set of source signals recorded through a mixing operator, with limited prior knowledge about the sources, and only access to a dataset of signal mixtures. This problem is inherently ill-posed and is further challenged by the variety of timescales exhibited by sources. Existing methods typically rely on a preselected window size that determines their operating timescale, limiting their capacity to handle multi-scale sources. To address this issue, we propose an unsupervised multi-scale clustering and source separation framework by leveraging wavelet scattering spectra that provide a low-dimensional representation of stochastic processes, capable of distinguishing between different non-Gaussian stochastic processes. Nested within this representation space, we develop a factorial Gaussian-mixture variational autoencoder that is trained to (1) probabilistically cluster sources at different timescales and (2) independently sample scattering spectra representations associated with each cluster. As the final stage, using samples from each cluster as prior information, we formulate source separation as an optimization problem in the wavelet scattering spectra representation space, aiming to separate sources in the time domain. When applied to the entire seismic dataset recorded during the NASA InSight mission on Mars, containing sources varying greatly in timescale, our multi-scale nested approach proves to be a powerful tool for disentangling such different sources, e.g., minute-long transient one-sided pulses (known as ``glitches'') and structured ambient noises resulting from atmospheric activities that typically last for tens of minutes. These results provide an opportunity to conduct further investigations into the isolated sources related to atmospheric-surface interactions, thermal relaxations, and other complex phenomena. △ Less

Submitted 19 February, 2024; v1 submitted 25 May, 2023; originally announced May 2023.

arXiv:2304.12231 [pdf, other]

An Approximation Theory for Metric Space-Valued Functions With A View Towards Deep Learning

Authors: Anastasis Kratsios, Chong Liu, Matti Lassas, Maarten V. de Hoop, Ivan Dokmanić

Abstract: Motivated by the develo** mathematics of deep learning, we build universal functions approximators of continuous maps between arbitrary Polish metric spaces $\mathcal{X}$ and $\mathcal{Y}$ using elementary functions between Euclidean spaces as building blocks. Earlier results assume that the target space $\mathcal{Y}$ is a topological vector space. We overcome this limitation by ``randomization'… ▽ More Motivated by the develo** mathematics of deep learning, we build universal functions approximators of continuous maps between arbitrary Polish metric spaces $\mathcal{X}$ and $\mathcal{Y}$ using elementary functions between Euclidean spaces as building blocks. Earlier results assume that the target space $\mathcal{Y}$ is a topological vector space. We overcome this limitation by ``randomization'': our approximators output discrete probability measures over $\mathcal{Y}$. When $\mathcal{X}$ and $\mathcal{Y}$ are Polish without additional structure, we prove very general qualitative guarantees; when they have suitable combinatorial structure, we prove quantitative guarantees for Hölder-like maps, including maps between finite graphs, solution operators to rough differential equations between certain Carnot groups, and continuous non-linear operators between Banach spaces arising in inverse problems. In particular, we show that the required number of Dirac measures is determined by the combinatorial structure of $\mathcal{X}$ and $\mathcal{Y}$. For barycentric $\mathcal{Y}$, including Banach spaces, $\mathbb{R}$-trees, Hadamard manifolds, or Wasserstein spaces on Polish metric spaces, our approximators reduce to $\mathcal{Y}$-valued functions. When the Euclidean approximators are neural networks, our constructions generalize transformer networks, providing a new probabilistic viewpoint of geometric deep learning. △ Less

Submitted 24 July, 2023; v1 submitted 24 April, 2023; originally announced April 2023.

Comments: 14 Figures, 3 Tables, 78 Pages (Main 40, Proofs 26, Acknowledgments and References 12)

MSC Class: 41A65; 68T07; 60L50; 65N21; 46T99

arXiv:2301.11981 [pdf, other]

Unearthing InSights into Mars: Unsupervised Source Separation with Limited Data

Authors: Ali Siahkoohi, Rudy Morel, Maarten V. de Hoop, Erwan Allys, Grégory Sainton, Taichi Kawamura

Abstract: Source separation involves the ill-posed problem of retrieving a set of source signals that have been observed through a mixing operator. Solving this problem requires prior knowledge, which is commonly incorporated by imposing regularity conditions on the source signals, or implicitly learned through supervised or unsupervised methods from existing data. While data-driven methods have shown great… ▽ More Source separation involves the ill-posed problem of retrieving a set of source signals that have been observed through a mixing operator. Solving this problem requires prior knowledge, which is commonly incorporated by imposing regularity conditions on the source signals, or implicitly learned through supervised or unsupervised methods from existing data. While data-driven methods have shown great promise in source separation, they often require large amounts of data, which rarely exists in planetary space missions. To address this challenge, we propose an unsupervised source separation scheme for domains with limited data access that involves solving an optimization problem in the wavelet scattering covariance representation space$\unicode{x2014}$an interpretable, low-dimensional representation of stationary processes. We present a real-data example in which we remove transient, thermally-induced microtilts$\unicode{x2014}$known as glitches$\unicode{x2014}$from data recorded by a seismometer during NASA's InSight mission on Mars. Thanks to the wavelet scattering covariances' ability to capture non-Gaussian properties of stochastic processes, we are able to separate glitches using only a few glitch-free data snippets. △ Less

Submitted 31 May, 2023; v1 submitted 27 January, 2023; originally announced January 2023.

Comments: ICML 2023

arXiv:2301.11509 [pdf, other]

Out-of-distributional risk bounds for neural operators with applications to the Helmholtz equation

Authors: J. Antonio Lara Benitez, Takashi Furuya, Florian Faucher, Anastasis Kratsios, Xavier Tricoche, Maarten V. de Hoop

Abstract: Despite their remarkable success in approximating a wide range of operators defined by PDEs, existing neural operators (NOs) do not necessarily perform well for all physics problems. We focus here on high-frequency waves to highlight possible shortcomings. To resolve these, we propose a subfamily of NOs enabling an enhanced empirical approximation of the nonlinear operator map** wave speed to so… ▽ More Despite their remarkable success in approximating a wide range of operators defined by PDEs, existing neural operators (NOs) do not necessarily perform well for all physics problems. We focus here on high-frequency waves to highlight possible shortcomings. To resolve these, we propose a subfamily of NOs enabling an enhanced empirical approximation of the nonlinear operator map** wave speed to solution, or boundary values for the Helmholtz equation on a bounded domain. The latter operator is commonly referred to as the ''forward'' operator in the study of inverse problems. Our methodology draws inspiration from transformers and techniques such as stochastic depth. Our experiments reveal certain surprises in the generalization and the relevance of introducing stochastic depth. Our NOs show superior performance as compared with standard NOs, not only for testing within the training distribution but also for out-of-distribution scenarios. To delve into this observation, we offer an in-depth analysis of the Rademacher complexity associated with our modified models and prove an upper bound tied to their stochastic depth that existing NOs do not satisfy. Furthermore, we obtain a novel out-of-distribution risk bound tailored to Gaussian measures on Banach spaces, again relating stochastic depth with the bound. We conclude by proposing a hypernetwork version of the subfamily of NOs as a surrogate model for the mentioned forward operator. △ Less

Submitted 4 July, 2023; v1 submitted 26 January, 2023; originally announced January 2023.

arXiv:2211.02922 [pdf, other]

Beyond Hawkes: Neural Multi-event Forecasting on Spatio-temporal Point Processes

Authors: Negar Erfanian, Santiago Segarra, Maarten de Hoop

Abstract: Predicting discrete events in time and space has many scientific applications, such as predicting hazardous earthquakes and outbreaks of infectious diseases. History-dependent spatio-temporal Hawkes processes are often used to mathematically model these point events. However, previous approaches have faced numerous challenges, particularly when attempting to forecast one or multiple future events.… ▽ More Predicting discrete events in time and space has many scientific applications, such as predicting hazardous earthquakes and outbreaks of infectious diseases. History-dependent spatio-temporal Hawkes processes are often used to mathematically model these point events. However, previous approaches have faced numerous challenges, particularly when attempting to forecast one or multiple future events. In this work, we propose a new neural architecture for simultaneous multi-event forecasting of spatio-temporal point processes, utilizing transformers, augmented with normalizing flows and probabilistic layers. Our network makes batched predictions of complex history-dependent spatio-temporal distributions of future discrete events, achieving state-of-the-art performance on a variety of benchmark datasets including the South California Earthquakes, Citibike, Covid-19, and Hawkes synthetic pinwheel datasets. More generally, we illustrate how our network can be applied to any dataset of discrete events with associated markers, even when no underlying physics is known. △ Less

Submitted 28 January, 2023; v1 submitted 5 November, 2022; originally announced November 2022.

Comments: Submitted to ICML2023

arXiv:2210.00577 [pdf, other]

Deep Invertible Approximation of Topologically Rich Maps between Manifolds

Authors: Michael Puthawala, Matti Lassas, Ivan Dokmanic, Pekka Pankka, Maarten de Hoop

Abstract: How can we design neural networks that allow for stable universal approximation of maps between topologically interesting manifolds? The answer is with a coordinate projection. Neural networks based on topological data analysis (TDA) use tools such as persistent homology to learn topological signatures of data and stabilize training but may not be universal approximators or have stable inverses. O… ▽ More How can we design neural networks that allow for stable universal approximation of maps between topologically interesting manifolds? The answer is with a coordinate projection. Neural networks based on topological data analysis (TDA) use tools such as persistent homology to learn topological signatures of data and stabilize training but may not be universal approximators or have stable inverses. Other architectures universally approximate data distributions on submanifolds but only when the latter are given by a single chart, making them unable to learn maps that change topology. By exploiting the topological parallels between locally bilipschitz maps, covering spaces, and local homeomorphisms, and by using universal approximation arguments from machine learning, we find that a novel network of the form $\mathcal{T} \circ p \circ \mathcal{E}$, where $\mathcal{E}$ is an injective network, $p$ a fixed coordinate projection, and $\mathcal{T}$ a bijective network, is a universal approximator of local diffeomorphisms between compact smooth submanifolds embedded in $\mathbb{R}^n$. We emphasize the case when the target map changes topology. Further, we find that by constraining the projection $p$, multivalued inversions of our networks can be computed without sacrificing universality. As an application, we show that learning a group invariant function with unknown group action naturally reduces to the question of learning local diffeomorphisms for finite groups. Our theory permits us to recover orbits of the group action. We also outline possible extensions of our architecture to address molecular imaging of molecules with symmetries. Finally, our analysis informs the choice of topologically expressive starting spaces in generative problems. △ Less

Submitted 2 October, 2022; originally announced October 2022.

arXiv:2204.07664 [pdf, other]

doi 10.1109/TCI.2023.3248949

Conditional Injective Flows for Bayesian Imaging

Authors: AmirEhsan Khorashadizadeh, Konik Kothari, Leonardo Salsi, Ali Aghababaei Harandi, Maarten de Hoop, Ivan Dokmanić

Abstract: Most deep learning models for computational imaging regress a single reconstructed image. In practice, however, ill-posedness, nonlinearity, model mismatch, and noise often conspire to make such point estimates misleading or insufficient. The Bayesian approach models images and (noisy) measurements as jointly distributed random vectors and aims to approximate the posterior distribution of unknowns… ▽ More Most deep learning models for computational imaging regress a single reconstructed image. In practice, however, ill-posedness, nonlinearity, model mismatch, and noise often conspire to make such point estimates misleading or insufficient. The Bayesian approach models images and (noisy) measurements as jointly distributed random vectors and aims to approximate the posterior distribution of unknowns. Recent variational inference methods based on conditional normalizing flows are a promising alternative to traditional MCMC methods, but they come with drawbacks: excessive memory and compute demands for moderate to high resolution images and underwhelming performance on hard nonlinear problems. In this work, we propose C-Trumpets -- conditional injective flows specifically designed for imaging problems, which greatly diminish these challenges. Injectivity reduces memory footprint and training time while low-dimensional latent space together with architectural innovations like fixed-volume-change layers and skip-connection revnet layers, C-Trumpets outperform regular conditional flow models on a variety of imaging and image restoration tasks, including limited-view CT and nonlinear inverse scattering, with a lower compute and memory budget. C-Trumpets enable fast approximation of point estimates like MMSE or MAP as well as physically-meaningful uncertainty quantification. △ Less

Submitted 3 April, 2023; v1 submitted 15 April, 2022; originally announced April 2022.

Comments: 23 pages, 23 figures

Journal ref: IEEE Transactions on Computational Imaging, vol. 9, pp. 224-237, 2023

arXiv:2110.04227 [pdf, other]

Universal Joint Approximation of Manifolds and Densities by Simple Injective Flows

Authors: Michael Puthawala, Matti Lassas, Ivan Dokmanić, Maarten de Hoop

Abstract: We study approximation of probability measures supported on $n$-dimensional manifolds embedded in $\mathbb{R}^m$ by injective flows -- neural networks composed of invertible flows and injective layers. We show that in general, injective flows between $\mathbb{R}^n$ and $\mathbb{R}^m$ universally approximate measures supported on images of extendable embeddings, which are a subset of standard embed… ▽ More We study approximation of probability measures supported on $n$-dimensional manifolds embedded in $\mathbb{R}^m$ by injective flows -- neural networks composed of invertible flows and injective layers. We show that in general, injective flows between $\mathbb{R}^n$ and $\mathbb{R}^m$ universally approximate measures supported on images of extendable embeddings, which are a subset of standard embeddings: when the embedding dimension m is small, topological obstructions may preclude certain manifolds as admissible targets. When the embedding dimension is sufficiently large, $m \ge 3n+1$, we use an argument from algebraic topology known as the clean trick to prove that the topological obstructions vanish and injective flows universally approximate any differentiable embedding. Along the way we show that the studied injective flows admit efficient projections on the range, and that their optimality can be established "in reverse," resolving a conjecture made in Brehmer and Cranmer 2020. △ Less

Submitted 27 June, 2022; v1 submitted 8 October, 2021; originally announced October 2021.

Comments: 26 pages, 5 figures

arXiv:2108.12515 [pdf, other]

doi 10.1137/21M1442942

Convergence Rates for Learning Linear Operators from Noisy Data

Authors: Maarten V. de Hoop, Nikola B. Kovachki, Nicholas H. Nelsen, Andrew M. Stuart

Abstract: This paper studies the learning of linear operators between infinite-dimensional Hilbert spaces. The training data comprises pairs of random input vectors in a Hilbert space and their noisy images under an unknown self-adjoint linear operator. Assuming that the operator is diagonalizable in a known basis, this work solves the equivalent inverse problem of estimating the operator's eigenvalues give… ▽ More This paper studies the learning of linear operators between infinite-dimensional Hilbert spaces. The training data comprises pairs of random input vectors in a Hilbert space and their noisy images under an unknown self-adjoint linear operator. Assuming that the operator is diagonalizable in a known basis, this work solves the equivalent inverse problem of estimating the operator's eigenvalues given the data. Adopting a Bayesian approach, the theoretical analysis establishes posterior contraction rates in the infinite data limit with Gaussian priors that are not directly linked to the forward map of the inverse problem. The main results also include learning-theoretic generalization error guarantees for a wide range of distribution shifts. These convergence rates quantify the effects of data smoothness and true eigenvalue decay or growth, for compact or unbounded operators, respectively, on sample complexity. Numerical evidence supports the theory in diagonal and non-diagonal settings. △ Less

Submitted 2 November, 2022; v1 submitted 27 August, 2021; originally announced August 2021.

Comments: To appear in SIAM/ASA Journal on Uncertainty Quantification (JUQ); 34 pages, 5 figures, 2 tables

MSC Class: 62G20; 62C10; 68T05; 47A62

Journal ref: SIAM/ASA J. Uncertainty Quantification Vol. 11 No. 2 (2023) pp. 480-513

arXiv:2102.10461 [pdf, other]

Trumpets: Injective Flows for Inference and Inverse Problems

Authors: Konik Kothari, AmirEhsan Khorashadizadeh, Maarten de Hoop, Ivan Dokmanić

Abstract: We propose injective generative models called Trumpets that generalize invertible normalizing flows. The proposed generators progressively increase dimension from a low-dimensional latent space. We demonstrate that Trumpets can be trained orders of magnitudes faster than standard flows while yielding samples of comparable or better quality. They retain many of the advantages of the standard flows… ▽ More We propose injective generative models called Trumpets that generalize invertible normalizing flows. The proposed generators progressively increase dimension from a low-dimensional latent space. We demonstrate that Trumpets can be trained orders of magnitudes faster than standard flows while yielding samples of comparable or better quality. They retain many of the advantages of the standard flows such as training based on maximum likelihood and a fast, exact inverse of the generator. Since Trumpets are injective and have fast inverses, they can be effectively used for downstream Bayesian inference. To wit, we use Trumpet priors for maximum a posteriori estimation in the context of image reconstruction from compressive measurements, outperforming competitive baselines in terms of reconstruction quality and speed. We then propose an efficient method for posterior characterization and uncertainty quantification with Trumpets by taking advantage of the low-dimensional latent space. △ Less

Submitted 20 February, 2021; originally announced February 2021.

Comments: 16 pages

Journal ref: Uncertainty in Artificial Intelligence (UAI 2021)

arXiv:2006.08464 [pdf, other]

Globally Injective ReLU Networks

Authors: Michael Puthawala, Konik Kothari, Matti Lassas, Ivan Dokmanić, Maarten de Hoop

Abstract: Injectivity plays an important role in generative models where it enables inference; in inverse problems and compressed sensing with generative priors it is a precursor to well posedness. We establish sharp characterizations of injectivity of fully-connected and convolutional ReLU layers and networks. First, through a layerwise analysis, we show that an expansivity factor of two is necessary and s… ▽ More Injectivity plays an important role in generative models where it enables inference; in inverse problems and compressed sensing with generative priors it is a precursor to well posedness. We establish sharp characterizations of injectivity of fully-connected and convolutional ReLU layers and networks. First, through a layerwise analysis, we show that an expansivity factor of two is necessary and sufficient for injectivity by constructing appropriate weight matrices. We show that global injectivity with iid Gaussian matrices, a commonly used tractable model, requires larger expansivity between 3.4 and 10.5. We also characterize the stability of inverting an injective network via worst-case Lipschitz constants of the inverse. We then use arguments from differential topology to study injectivity of deep networks and prove that any Lipschitz map can be approximated by an injective ReLU network. Finally, using an argument based on random projections, we show that an end-to-end -- rather than layerwise -- doubling of the dimension suffices for injectivity. Our results establish a theoretical basis for the study of nonlinear inverse and inference problems using neural networks. △ Less

Submitted 8 October, 2021; v1 submitted 15 June, 2020; originally announced June 2020.

Comments: 48 pages, 18 figures, submitted to JMLR

arXiv:2006.05854 [pdf, other]

Learning the geometry of wave-based imaging

Authors: Konik Kothari, Maarten de Hoop, Ivan Dokmanić

Abstract: We propose a general physics-based deep learning architecture for wave-based imaging problems. A key difficulty in imaging problems with a varying background wave speed is that the medium "bends" the waves differently depending on their position and direction. This space-bending geometry makes the equivariance to translations of convolutional networks an undesired inductive bias. We build an inter… ▽ More We propose a general physics-based deep learning architecture for wave-based imaging problems. A key difficulty in imaging problems with a varying background wave speed is that the medium "bends" the waves differently depending on their position and direction. This space-bending geometry makes the equivariance to translations of convolutional networks an undesired inductive bias. We build an interpretable neural architecture inspired by Fourier integral operators (FIOs) which approximate the wave physics. FIOs model a wide range of imaging modalities, from seismology and radar to Doppler and ultrasound. We focus on learning the geometry of wave propagation captured by FIOs, which is implicit in the data, via a loss based on optimal transport. The proposed FIONet performs significantly better than the usual baselines on a number of imaging inverse problems, especially in out-of-distribution tests. △ Less

Submitted 10 November, 2020; v1 submitted 10 June, 2020; originally announced June 2020.

Comments: Accepted as spotlight presentation to NeurIPS '20

arXiv:1912.11090 [pdf, other]

Deep learning architectures for nonlinear operator functions and nonlinear inverse problems

Authors: Maarten V. de Hoop, Matti Lassas, Christopher A. Wong

Abstract: We develop a theoretical analysis for special neural network architectures, termed operator recurrent neural networks, for approximating nonlinear functions whose inputs are linear operators. Such functions commonly arise in solution algorithms for inverse boundary value problems. Traditional neural networks treat input data as vectors, and thus they do not effectively capture the multiplicative s… ▽ More We develop a theoretical analysis for special neural network architectures, termed operator recurrent neural networks, for approximating nonlinear functions whose inputs are linear operators. Such functions commonly arise in solution algorithms for inverse boundary value problems. Traditional neural networks treat input data as vectors, and thus they do not effectively capture the multiplicative structure associated with the linear operators that correspond to the data in such inverse problems. We therefore introduce a new family that resembles a standard neural network architecture, but where the input data acts multiplicatively on vectors. Motivated by compact operators appearing in boundary control and the analysis of inverse boundary value problems for the wave equation, we promote structure and sparsity in selected weight matrices in the network. After describing this architecture, we study its representation properties as well as its approximation properties. We furthermore show that an explicit regularization can be introduced that can be derived from the mathematical analysis of the mentioned inverse problems, and which leads to certain guarantees on the generalization properties. We observe that the sparsity of the weight matrices improves the generalization estimates. Lastly, we discuss how operator recurrent networks can be viewed as a deep learning analogue to deterministic algorithms such as boundary control for reconstructing the unknown wavespeed in the acoustic wave equation from boundary measurements. △ Less

Submitted 3 January, 2022; v1 submitted 23 December, 2019; originally announced December 2019.

Comments: To appear in Mathematical Statistics and Learning

MSC Class: 68T05; 35R30; 62M45

arXiv:1912.06087 [pdf, other]

Attention network forecasts time-to-failure in laboratory shear experiments

Authors: Hope Jasperson, David C. Bolton, Paul Johnson, Robert Guyer, Chris Marone, Maarten V. de Hoop

Abstract: Rocks under stress deform by creep mechanisms that include formation and slip on small-scale internal cracks. Intragranular cracks and slip along grain contacts release energy as elastic waves termed acoustic emissions (AE). AEs are thought to contain predictive information that can be used for fault failure forecasting. Here we present a method using unsupervised classification and an attention n… ▽ More Rocks under stress deform by creep mechanisms that include formation and slip on small-scale internal cracks. Intragranular cracks and slip along grain contacts release energy as elastic waves termed acoustic emissions (AE). AEs are thought to contain predictive information that can be used for fault failure forecasting. Here we present a method using unsupervised classification and an attention network to forecast labquakes using AE waveform features. Our data were generated in a laboratory setting using a biaxial shearing device with granular fault gouge intended to mimic the conditions of tectonic faults. Here we analyzed the temporal evolution of AEs generated throughout several hundred laboratory earthquake cycles. We used a Conscience Self-Organizing Map (CSOM) to perform topologically ordered vector quantization based on waveform properties. The resulting map was used to interactively cluster AEs. We examined the clusters over time to identify those with predictive ability. Finally, we used a variety of LSTM and attention-based networks to test the predictive power of the AE clusters. By tracking cumulative waveform features over the seismic cycle, the network is able to forecast the time-to-failure (TTF) of lab earthquakes. Our results show that analyzing the data to isolate predictive signals and using a more sophisticated network architecture are key to robustly forecasting labquakes. In the future, this method could be applied on tectonic faults monitor earthquakes and augment current early warning systems. △ Less

Submitted 20 September, 2021; v1 submitted 12 December, 2019; originally announced December 2019.

arXiv:1901.10076 [pdf, other]

Learning Schatten--von Neumann Operators

Authors: Puoya Tabaghi, Maarten de Hoop, Ivan Dokmanić

Abstract: We study the learnability of a class of compact operators known as Schatten--von Neumann operators. These operators between infinite-dimensional function spaces play a central role in a variety of applications in learning theory and inverse problems. We address the question of sample complexity of learning Schatten-von Neumann operators and provide an upper bound on the number of measurements requ… ▽ More We study the learnability of a class of compact operators known as Schatten--von Neumann operators. These operators between infinite-dimensional function spaces play a central role in a variety of applications in learning theory and inverse problems. We address the question of sample complexity of learning Schatten-von Neumann operators and provide an upper bound on the number of measurements required for the empirical risk minimizer to generalize with arbitrary precision and probability, as a function of class parameter $p$. Our results give generalization guarantees for regression of infinite-dimensional signals from infinite-dimensional data. Next, we adapt the representer theorem of Abernethy \emph{et al.} to show that empirical risk minimization over an a priori infinite-dimensional, non-compact set, can be converted to a convex finite dimensional optimization problem over a compact set. In summary, the class of $p$-Schatten--von Neumann operators is probably approximately correct (PAC)-learnable via a practical convex program for any $p < \infty$. △ Less

Submitted 22 February, 2019; v1 submitted 28 January, 2019; originally announced January 2019.

arXiv:1805.11718 [pdf, other]

Random mesh projectors for inverse problems

Authors: Sidharth Gupta, Konik Kothari, Maarten V. de Hoop, Ivan Dokmanić

Abstract: We propose a new learning-based approach to solve ill-posed inverse problems in imaging. We address the case where ground truth training samples are rare and the problem is severely ill-posed - both because of the underlying physics and because we can only get few measurements. This setting is common in geophysical imaging and remote sensing. We show that in this case the common approach to direct… ▽ More We propose a new learning-based approach to solve ill-posed inverse problems in imaging. We address the case where ground truth training samples are rare and the problem is severely ill-posed - both because of the underlying physics and because we can only get few measurements. This setting is common in geophysical imaging and remote sensing. We show that in this case the common approach to directly learn the map** from the measured data to the reconstruction becomes unstable. Instead, we propose to first learn an ensemble of simpler map**s from the data to projections of the unknown image into random piecewise-constant subspaces. We then combine the projections to form a final reconstruction by solving a deconvolution-like problem. We show experimentally that the proposed method is more robust to measurement noise and corruptions not seen during training than a directly learned inverse. △ Less

Submitted 5 December, 2018; v1 submitted 29 May, 2018; originally announced May 2018.

Comments: S. Gupta and K. Kothari contributed equally

arXiv:1609.05502 [pdf, other]

Inverse Problems with Invariant Multiscale Statistics

Authors: Ivan Dokmanić, Joan Bruna, Stéphane Mallat, Maarten de Hoop

Abstract: We propose a new approach to linear ill-posed inverse problems. Our algorithm alternates between enforcing two constraints: the measurements and the statistical correlation structure in some transformed space. We use a non-linear multiscale scattering transform which discards the phase and thus exposes strong spectral correlations otherwise hidden beneath the phase fluctuations. As a result, both… ▽ More We propose a new approach to linear ill-posed inverse problems. Our algorithm alternates between enforcing two constraints: the measurements and the statistical correlation structure in some transformed space. We use a non-linear multiscale scattering transform which discards the phase and thus exposes strong spectral correlations otherwise hidden beneath the phase fluctuations. As a result, both constraints may be put into effect by linear projections in their respective spaces. We apply the algorithm to super-resolution and tomography and show that it outperforms ad hoc convex regularizers and stably recovers the missing spectrum. △ Less

Submitted 2 December, 2018; v1 submitted 18 September, 2016; originally announced September 2016.

Showing 1–24 of 24 results for author: de Hoop, M