arXiv:2406.15354 [pdf, other]

Can Specific THz Fields Induce Collective Base-Flipping in DNA? A Stochastic Averaging and Resonant Enhancement Investigation Based on a New Mesoscopic Model

Authors: Wang Sang Koon, Houman Owhadi, Molei Tao, Tomohiro Yanao

Abstract: We study the metastability, internal frequencies, activation mechanism, energy transfer, and the collective base-flipping in a mesoscopic DNA via resonance with specific electric fields. Our new mesoscopic DNA model takes into account not only the issues of helicity and the coupling of an electric field with the base dipole moments, but also includes environmental effects such as fluid viscosity and thermal noise. All the parameter values are chosen to best represent the typical values for the opening and closing dynamics of a DNA. Our study shows that while the mesoscopic DNA is metastable and robust to environmental effects, it is vulnerable to certain frequencies that could be targeted by specific THz fields for triggering its collective base-flipping dynamics and causing large amplitude separation of base pairs. Based on applying the Freidlin-Wentzell method of stochastic averaging and the newly developed theory of resonant enhancement to our mesoscopic DNA model, our semi-analytic estimates show that the required fields should be THz fields with frequencies around 0.28 THz and with amplitudes on the order of 450 kV/cm. These estimates compare well with the experimental data of Titova et al., who demonstrated that the function of DNA in human skin tissues could be affected by THz pulses with frequencies around 0.5 THz and a peak electric field of 220 kV/cm. Moreover, our estimates also conform to a number of other experimental results which appeared in the last couple of years.

Submitted 18 March, 2024; originally announced June 2024.

Comments: 37 pages, 8 figures

arXiv:2405.13149 [pdf, other]

Gaussian Measures Conditioned on Nonlinear Observations: Consistency, MAP Estimators, and Simulation

Authors: Yifan Chen, Bamdad Hosseini, Houman Owhadi, Andrew M Stuart

Abstract: The article presents a systematic study of the problem of conditioning a Gaussian random variable $ξ$ on nonlinear observations of the form $F \circ φ(ξ)$ where $φ: \mathcal{X} \to \mathbb{R}^N$ is a bounded linear operator and $F$ is nonlinear. Such problems arise in the context of Bayesian inference and recent machine learning-inspired PDE solvers. We give a representer theorem for the conditioned random variable $ξ\mid F\circ φ(ξ)$, stating that it decomposes as the sum of an infinite-dimensional Gaussian (which is identified analytically) as well as a finite-dimensional non-Gaussian measure. We also introduce a novel notion of the mode of a conditional measure by taking the limit of the natural relaxation of the problem, to which we can apply the existing notion of maximum a posteriori estimators of posterior measures. Finally, we introduce a variant of the Laplace approximation for the efficient simulation of the aforementioned conditioned Gaussian random variables towards uncertainty quantification.

Submitted 21 May, 2024; originally announced May 2024.

arXiv:2402.11126 [pdf, other]

Kolmogorov n-Widths for Multitask Physics-Informed Machine Learning (PIML) Methods: Towards Robust Metrics

Authors: Michael Penwarden, Houman Owhadi, Robert M. Kirby

Abstract: Physics-informed machine learning (PIML) as a means of solving partial differential equations (PDE) has garnered much attention in the Computational Science and Engineering (CS&E) world. This topic encompasses a broad array of methods and models aimed at solving a single or a collection of PDE problems, called multitask learning. PIML is characterized by the incorporation of physical laws into the training process of machine learning models in lieu of large data when solving PDE problems. Despite the overall success of this collection of methods, it remains incredibly difficult to analyze, benchmark, and generally compare one approach to another. Using Kolmogorov n-widths as a measure of effectiveness of approximating functions, we judiciously apply this metric in the comparison of various multitask PIML architectures. We compute lower accuracy bounds and analyze the model's learned basis functions on various PDE problems. This is the first objective metric for comparing multitask PIML architectures and helps remove uncertainty in model validation from selective sampling and overfitting. We also identify avenues of improvement for model architectures, such as the choice of activation function, which can drastically affect model generalization to "worst-case" scenarios, which is not observed when reporting task-specific errors. We also incorporate this metric into the optimization process through regularization, which improves the models' generalizability over the multitask PDE problem.

Submitted 16 February, 2024; originally announced February 2024.

arXiv:2402.08077 [pdf, other]

Diffeomorphic Measure Matching with Kernels for Generative Modeling

Authors: Biraj Pandey, Bamdad Hosseini, Pau Batlle, Houman Owhadi

Abstract: This article presents a general framework for the transport of probability measures towards minimum divergence generative modeling and sampling using ordinary differential equations (ODEs) and Reproducing Kernel Hilbert Spaces (RKHSs), inspired by ideas from diffeomorphic matching and image registration. A theoretical analysis of the proposed method is presented, giving a priori error bounds in terms of the complexity of the model, the number of samples in the training set, and model misspecification. An extensive suite of numerical experiments further highlights the properties, strengths, and weaknesses of the method and extends its applicability to other tasks, such as conditional simulation and inference.

Submitted 12 February, 2024; originally announced February 2024.

MSC Class: 35Q68 49Q22 62F15 68T07 62R07

arXiv:2311.17007 [pdf, other]

Computational Hypergraph Discovery, a Gaussian Process framework for connecting the dots

Authors: Théo Bourdais, Pau Batlle, Xianjin Yang, Ricardo Baptista, Nicolas Rouquette, Houman Owhadi

Abstract: Most scientific challenges can be framed into one of the following three levels of complexity of function approximation. Type 1: Approximate an unknown function given input/output data. Type 2: Consider a collection of variables and functions, some of which are unknown, indexed by the nodes and hyperedges of a hypergraph (a generalized graph where edges can connect more than two vertices). Given partial observations of the variables of the hypergraph (satisfying the functional dependencies imposed by its structure), approximate all the unobserved variables and unknown functions. Type 3: Expanding on Type 2, if the hypergraph structure itself is unknown, use partial observations of the variables of the hypergraph to discover its structure and approximate its unknown functions. While most Computational Science and Engineering and Scientific Machine Learning challenges can be framed as Type 1 and Type 2 problems, many scientific problems can only be categorized as Type 3. Despite their prevalence, these Type 3 challenges have been largely overlooked due to their inherent complexity. Although Gaussian Process (GP) methods are sometimes perceived as well-founded but old technology limited to Type 1 curve fitting, their scope has recently been expanded to Type 2 problems. In this paper, we introduce an interpretable GP framework for Type 3 problems, targeting the data-driven discovery and completion of computational hypergraphs.
Our approach is based on a kernel generalization of Row Echelon Form reduction from linear systems to nonlinear ones and variance-based analysis. Here, variables are linked via GPs and those contributing to the highest data variance unveil the hypergraph's structure. We illustrate the scope and efficiency of the proposed approach with applications to (algebraic) equation discovery, network discovery (gene pathways, chemical, and mechanical) and raw data analysis.

Submitted 28 November, 2023; originally announced November 2023.

Comments: The code for the algorithm introduced in this paper and its application to various examples are available for download (and as an installable Python library/package) at https://github.com/TheoBourdais/ComputationalHypergraphDiscovery

MSC Class: 62A09; 62H22; 65S05; 90C35; 94C15; 46E22; 62J02; 15A83; 62D20; 68R10

arXiv:2311.12624 [pdf, ps, other]

doi 10.13140/RG.2.2.36344.01285

doi 10.1016/j.physd.2024.134153

Bridging Algorithmic Information Theory and Machine Learning: A New Approach to Kernel Learning

Authors: Boumediene Hamzi, Marcus Hutter, Houman Owhadi

Abstract: Machine Learning (ML) and Algorithmic Information Theory (AIT) look at Complexity from different points of view. We explore the interface between AIT and Kernel Methods (that are prevalent in ML) by adopting an AIT perspective on the problem of learning kernels from data, in kernel ridge regression, through the method of Sparse Kernel Flows. In particular, by looking at the differences and commonalities between Minimal Description Length (MDL) and Regularization in Machine Learning (RML), we prove that the method of Sparse Kernel Flows is the natural approach to adopt to learn kernels from data. This approach aligns naturally with the MDL principle, offering a more robust theoretical basis than the existing reliance on cross-validation. The study reveals that deriving Sparse Kernel Flows does not require a statistical approach; instead, one can directly engage with code-lengths and complexities, concepts central to AIT. Thereby, this approach opens the door to reformulating algorithms in machine learning using tools from AIT, with the aim of providing them a more solid theoretical foundation.

Submitted 10 April, 2024; v1 submitted 21 November, 2023; originally announced November 2023.

Comments: An earlier version of this paper appeared at https://www.researchgate.net/publication/371875631_A_note_on_learning_kernels_from_data_from_an_Algorithmic_Information_Theoretic_point_of_view. arXiv admin note: text overlap with arXiv:2111.13037, arXiv:2007.05074

Journal ref: Physica D: Nonlinear Phenomena, 2024, 134153

arXiv:2310.02461 [pdf, other]

Optimization-based frequentist confidence intervals for functionals in constrained inverse problems: Resolving the Burrus conjecture

Authors: Pau Batlle, Pratik Patil, Michael Stanley, Houman Owhadi, Mikael Kuusela

Abstract: We present an optimization-based framework to construct confidence intervals for functionals in constrained inverse problems, ensuring valid one-at-a-time frequentist coverage guarantees. Our approach builds upon the now-called strict bounds intervals, originally pioneered by Burrus (1965) and Rust and Burrus (1972), which offer ways to directly incorporate any side information about the parameters during inference without introducing external biases. This family of methods allows for uncertainty quantification in ill-posed inverse problems without needing to select a regularizing prior. By tying optimization-based intervals to an inversion of a constrained likelihood ratio test, we translate interval coverage guarantees into type I error control and characterize the resulting interval via solutions to optimization problems. Along the way, we refute the Burrus conjecture, which posited that, for possibly rank-deficient linear Gaussian models with positivity constraints, a correction based on the quantile of the chi-squared distribution with one degree of freedom suffices to shorten intervals while maintaining frequentist coverage guarantees. Our framework provides a novel approach to analyzing the conjecture, and we construct a counterexample employing a stochastic dominance argument, which we also use to disprove a general form of the conjecture. We illustrate our framework with several numerical examples and provide directions for extensions beyond the Rust-Burrus method for nonlinear, non-Gaussian settings with general constraints.

Submitted 16 April, 2024; v1 submitted 3 October, 2023; originally announced October 2023.

Comments: 54 pages, V3: minor changes in related work and discussion

arXiv:2307.11648 [pdf, other]

Sparse Cholesky factorization by greedy conditional selection

Authors: Stephen Huan, Joseph Guinness, Matthias Katzfuss, Houman Owhadi, Florian Schäfer

Abstract: Dense kernel matrices resulting from pairwise evaluations of a kernel function arise naturally in machine learning and statistics. Previous work in constructing sparse approximate inverse Cholesky factors of such matrices by minimizing Kullback-Leibler divergence recovers the Vecchia approximation for Gaussian processes. These methods rely only on the geometry of the evaluation points to construct the sparsity pattern. In this work, we instead construct the sparsity pattern by leveraging a greedy selection algorithm that maximizes mutual information with target points, conditional on all points previously selected. For selecting $k$ points out of $N$, the naive time complexity is $\mathcal{O}(N k^4)$, but by maintaining a partial Cholesky factor we reduce this to $\mathcal{O}(N k^2)$. Furthermore, for multiple ($m$) targets we achieve a time complexity of $\mathcal{O}(N k^2 + N m^2 + m^3)$, which is maintained in the setting of aggregated Cholesky factorization where a selected point need not condition every target. We apply the selection algorithm to image classification and recovery of sparse Cholesky factors. By minimizing Kullback-Leibler divergence, we apply the algorithm to Cholesky factorization, Gaussian process regression, and preconditioning with the conjugate gradient, improving over $k$-nearest neighbors selection.

Submitted 21 July, 2023; originally announced July 2023.

MSC Class: 65F08; 65F55; 62-08
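
The greedy conditional selection described in the abstract above can be illustrated for a single target point. The following is a minimal sketch, not the authors' implementation: it assumes a jointly Gaussian model with a given covariance matrix `K`, in which case maximizing conditional mutual information with the target is equivalent to maximizing the squared conditional covariance with the target divided by the candidate's conditional variance. The function name `greedy_select` and its arguments are hypothetical, and this naive version updates the full conditional covariance rather than maintaining the partial Cholesky factor mentioned in the abstract.

```python
import numpy as np

def greedy_select(K, target, candidates, k):
    """Naively pick k candidate indices that are most informative about `target`.

    K is a symmetric positive-definite covariance matrix over all points.
    Hypothetical helper for illustration; requires k <= len(candidates).
    """
    C = np.array(K, dtype=float)  # covariance conditional on the points selected so far
    remaining = list(candidates)
    selected = []
    for _ in range(k):
        # Reduction in the target's conditional variance if we also condition on j.
        scores = [C[target, j] ** 2 / C[j, j] for j in remaining]
        best = remaining[int(np.argmax(scores))]
        selected.append(best)
        remaining.remove(best)
        # Rank-one downdate: condition every remaining variable on `best`.
        col = C[:, best].copy()
        C -= np.outer(col, col) / col[best]
    return selected

# Usage: five points on a line, Gaussian kernel, target is the point at 0.5.
x = np.array([0.5, 0.0, 0.45, 0.9, 0.2])
K = np.exp(-(x[:, None] - x[None, :]) ** 2 / (2 * 0.3 ** 2))
print(greedy_select(K, target=0, candidates=[1, 2, 3, 4], k=2))
```

The first pick is always the candidate most correlated with the target; later picks can differ from nearest-neighbor selection because points that are redundant given earlier selections score lower.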

arXiv:2306.00307 [pdf, other]

A Mini-Batch Method for Solving Nonlinear PDEs with Gaussian Processes

Authors: Xianjin Yang, Houman Owhadi

Abstract: Gaussian process (GP) based methods for solving partial differential equations (PDEs) demonstrate great promise by bridging the gap between the theoretical rigor of traditional numerical algorithms and the flexible design of machine learning solvers. The main bottleneck of GP methods lies in the inversion of a covariance matrix, whose cost grows cubically with the number of samples. Drawing inspiration from neural networks, we propose a mini-batch algorithm combined with GPs to solve nonlinear PDEs. A naive deployment of a stochastic gradient descent method for solving PDEs with GPs is challenging, as the objective function in the requisite minimization problem cannot be depicted as the expectation of a finite-dimensional random function. To address this issue, we apply a mini-batch method to the corresponding infinite-dimensional minimization problem over function spaces. The algorithm takes a mini-batch of samples at each step to update the GP model. Thus, the computational cost is allotted to each iteration. Using stability analysis and convexity arguments, we show that the mini-batch method steadily reduces a natural measure of errors towards zero at the rate of $O(1/K+1/M)$, where $K$ is the number of iterations and $M$ is the batch size.

Submitted 1 February, 2024; v1 submitted 31 May, 2023; originally announced June 2023.

Comments: 19 pages, 3 figures

MSC Class: 68W20; 65M70; 60G15

arXiv:2305.04962 [pdf, other]

Error Analysis of Kernel/GP Methods for Nonlinear and Parametric PDEs

Authors: Pau Batlle, Yifan Chen, Bamdad Hosseini, Houman Owhadi, Andrew M Stuart

Abstract: We introduce a priori Sobolev-space error estimates for the solution of nonlinear, and possibly parametric, PDEs using Gaussian process and kernel based methods. The primary assumptions are: (1) a continuous embedding of the reproducing kernel Hilbert space of the kernel into a Sobolev space of sufficient regularity; and (2) the stability of the differential operator and the solution map of the PDE between corresponding Sobolev spaces. The proof is articulated around Sobolev norm error estimates for kernel interpolants and relies on the minimizing norm property of the solution. The error estimates demonstrate dimension-benign convergence rates if the solution space of the PDE is smooth enough. We illustrate these points with applications to high-dimensional nonlinear elliptic PDEs and parametric PDEs. Although some recent machine learning methods have been presented as breaking the curse of dimensionality in solving high-dimensional PDEs, our analysis suggests a more nuanced picture: there is a trade-off between the regularity of the solution and the presence of the curse of dimensionality. Therefore, our results are in line with the understanding that the curse is absent when the solution is regular enough.

Submitted 8 May, 2023; originally announced May 2023.

MSC Class: 60G15; 65M75; 65N75; 65N35; 47B34; 41A15; 35R30; 34B15

arXiv:2304.13202 [pdf, other]

Kernel Methods are Competitive for Operator Learning

Authors: Pau Batlle, Matthieu Darcy, Bamdad Hosseini, Houman Owhadi

Abstract: We present a general kernel-based framework for learning operators between Banach spaces along with a priori error analysis and comprehensive numerical comparisons with popular neural net (NN) approaches such as Deep Operator Net (DeepONet) [Lu et al.] and Fourier Neural Operator (FNO) [Li et al.]. We consider the setting where the input/output spaces of target operator $\mathcal{G}^\dagger\,:\, \mathcal{U}\to \mathcal{V}$ are reproducing kernel Hilbert spaces (RKHS), the data comes in the form of partial observations $φ(u_i), \varphi(v_i)$ of input/output functions $v_i=\mathcal{G}^\dagger(u_i)$ ($i=1,\ldots,N$), and the measurement operators $φ\,:\, \mathcal{U}\to \mathbb{R}^n$ and $\varphi\,:\, \mathcal{V} \to \mathbb{R}^m$ are linear. Writing $ψ\,:\, \mathbb{R}^n \to \mathcal{U}$ and $χ\,:\, \mathbb{R}^m \to \mathcal{V}$ for the optimal recovery maps associated with $φ$ and $\varphi$, we approximate $\mathcal{G}^\dagger$ with $\bar{\mathcal{G}}=χ\circ \bar{f} \circ φ$ where $\bar{f}$ is an optimal recovery approximation of $f^\dagger:=\varphi \circ \mathcal{G}^\dagger \circ ψ\,:\,\mathbb{R}^n \to \mathbb{R}^m$. We show that, even when using vanilla kernels (e.g., linear or Matérn), our approach is competitive in terms of cost-accuracy trade-off and either matches or beats the performance of NN methods on a majority of benchmarks. Additionally, our framework offers several advantages inherited from kernel methods: simplicity, interpretability, convergence guarantees, a priori error estimates, and Bayesian uncertainty quantification.
As such, it can serve as a natural benchmark for operator learning.

Submitted 8 October, 2023; v1 submitted 25 April, 2023; originally announced April 2023.

Comments: 35 pages, 10 figures

arXiv:2304.01294 [pdf, other]

Sparse Cholesky Factorization for Solving Nonlinear PDEs via Gaussian Processes

Authors: Yifan Chen, Houman Owhadi, Florian Schäfer

Abstract: In recent years, there has been widespread adoption of machine learning-based approaches to automate the solving of partial differential equations (PDEs). Among these approaches, Gaussian processes (GPs) and kernel methods have garnered considerable interest due to their flexibility, robust theoretical guarantees, and close ties to traditional methods. They can transform the solving of general nonlinear PDEs into solving quadratic optimization problems with nonlinear, PDE-induced constraints. However, the complexity bottleneck lies in computing with dense kernel matrices obtained from pointwise evaluations of the covariance kernel, and its \textit{partial derivatives}, a result of the PDE constraint and for which fast algorithms are scarce. The primary goal of this paper is to provide a near-linear complexity algorithm for working with such kernel matrices. We present a sparse Cholesky factorization algorithm for these matrices based on the near-sparsity of the Cholesky factor under a novel ordering of pointwise and derivative measurements. The near-sparsity is rigorously justified by directly connecting the factor to GP regression and exponential decay of basis functions in numerical homogenization. We then employ the Vecchia approximation of GPs, which is optimal in the Kullback-Leibler divergence, to compute the approximate factor. This enables us to compute $ε$-approximate inverse Cholesky factors of the kernel matrices with complexity $O(N\log^d(N/ε))$ in space and $O(N\log^{2d}(N/ε))$ in time.
We integrate sparse Cholesky factorizations into optimization algorithms to obtain fast solvers of the nonlinear PDE. We numerically illustrate our algorithm's near-linear space/time complexity for a broad class of nonlinear PDEs such as the nonlinear elliptic, Burgers, and Monge-Ampère equations.

Submitted 8 March, 2024; v1 submitted 3 April, 2023; originally announced April 2023.

Comments: typo corrected

MSC Class: 65F30; 60G15; 65N75; 65M75; 65F50; 68W40

arXiv:2301.10321 [pdf, other]

Learning Dynamical Systems from Data: A Simple Cross-Validation Perspective, Part V: Sparse Kernel Flows for 132 Chaotic Dynamical Systems

Authors: Lu Yang, Xiuwen Sun, Boumediene Hamzi, Houman Owhadi, Naiming Xie

Abstract: Regressing the vector field of a dynamical system from a finite number of observed states is a natural way to learn surrogate models for such systems. A simple and interpretable way to learn a dynamical system from data is to interpolate its vector-field with a data-adapted kernel which can be learned by using Kernel Flows. The method of Kernel Flows is a trainable machine learning method that learns the optimal parameters of a kernel based on the premise that a kernel is good if there is no significant loss in accuracy if half of the data is used. (The objective function could be a short-term prediction or some other objective for other variants of Kernel Flows.) However, this method is limited by the choice of the base kernel. In this paper, we introduce the method of \emph{Sparse Kernel Flows} in order to learn the ``best'' kernel by starting from a large dictionary of kernels. It is based on sparsifying a kernel that is a linear combination of elemental kernels. We apply this approach to a library of 132 chaotic systems.

Submitted 27 February, 2023; v1 submitted 24 January, 2023; originally announced January 2023.
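
The Kernel Flows premise quoted in the abstract above (a kernel is good if using only half of the data causes no significant loss in accuracy) is commonly measured by the relative drop in the squared RKHS norm of the kernel interpolant when half of the points are removed. The following is a minimal sketch of that quantity, not code from the paper: the Gaussian kernel, the `nugget` regularization, and the names `gaussian_kernel` and `kernel_flow_rho` are assumptions made for illustration.

```python
import numpy as np

def gaussian_kernel(X, Y, lengthscale):
    """Gaussian (RBF) kernel matrix between the rows of X and the rows of Y."""
    d2 = ((X[:, None, :] - Y[None, :, :]) ** 2).sum(-1)
    return np.exp(-d2 / (2.0 * lengthscale ** 2))

def kernel_flow_rho(X, y, lengthscale, rng, nugget=1e-6):
    """Relative loss 1 - |u_half|^2 / |u_full|^2 for one random half of the data.

    Values near 0 mean the half-data interpolant is nearly as rich as the
    full-data one, i.e. the kernel is "good" in the Kernel Flows sense.
    """
    n = X.shape[0]
    half = rng.choice(n, size=n // 2, replace=False)

    def rkhs_norm_sq(Xs, ys):
        # Squared RKHS norm of the interpolant: y^T (K + nugget * I)^{-1} y.
        K = gaussian_kernel(Xs, Xs, lengthscale) + nugget * np.eye(len(Xs))
        return float(ys @ np.linalg.solve(K, ys))

    return 1.0 - rkhs_norm_sq(X[half], y[half]) / rkhs_norm_sq(X, y)

# Usage: compare two candidate lengthscales on noiseless samples of a sine wave.
rng = np.random.default_rng(0)
X = rng.uniform(0.0, 1.0, size=(20, 1))
y = np.sin(2 * np.pi * X[:, 0])
for ell in (0.05, 0.3):
    print(ell, kernel_flow_rho(X, y, ell, rng))
```

In practice this loss would be averaged over many random halves and minimized over the kernel parameters; the sparse variant in the abstract additionally sparsifies the weights of a linear combination of elemental kernels, which this sketch does not attempt.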

arXiv:2212.07426 [pdf, other]

doi 10.1016/j.physd.2023.133713

Multiclass classification utilising an estimated algorithmic probability prior

Authors: Kamaludin Dingle, Pau Batlle, Houman Owhadi

Abstract: Methods of pattern recognition and machine learning are applied extensively in science, technology, and society. Hence, any advances in related theory may translate into large-scale impact. Here we explore how algorithmic information theory, especially algorithmic probability, may aid in a machine learning task. We study a multiclass supervised classification problem, namely learning the RNA molecule sequence-to-shape map, where the different possible shapes are taken to be the classes. The primary motivation for this work is a proof of concept example, where a concrete, well-motivated machine learning task can be aided by approximations to algorithmic probability. Our approach is based on directly estimating the class (i.e., shape) probabilities from shape complexities, and using the estimated probabilities as a prior in a Gaussian process learning problem. Naturally, with a large amount of training data, the prior has no significant influence on classification accuracy, but in the very small training data regime, we show that using the prior can substantially improve classification accuracy. To our knowledge, this work is one of the first to demonstrate how algorithmic probability can aid in a concrete, real-world, machine learning problem.

Submitted 14 December, 2022; originally announced December 2022.

arXiv:2209.12086 [pdf, other]

doi 10.1016/j.physd.2022.133583

One-Shot Learning of Stochastic Differential Equations with Data Adapted Kernels

Authors: Matthieu Darcy, Boumediene Hamzi, Giulia Livieri, Houman Owhadi, Peyman Tavallali

Abstract: We consider the problem of learning Stochastic Differential Equations of the form $dX_t = f(X_t)dt+σ(X_t)dW_t $ from one sample trajectory. This problem is more challenging than learning deterministic dynamical systems because one sample trajectory only provides indirect information on the unknown functions $f$, $σ$, and stochastic process $dW_t$ representing the drift, the diffusion, and the stochastic forcing terms, respectively. We propose a method that combines Computational Graph Completion and data adapted kernels learned via a new variant of cross validation. Our approach can be decomposed as follows: (1) Represent the time-increment map $X_t \rightarrow X_{t+dt}$ as a Computational Graph in which $f$, $σ$ and $dW_t$ appear as unknown functions and random variables. (2) Complete the graph (approximate unknown functions and random variables) via Maximum a Posteriori Estimation (given the data) with Gaussian Process (GP) priors on the unknown functions. (3) Learn the covariance functions (kernels) of the GP priors from data with randomized cross-validation. Numerical experiments illustrate the efficacy, robustness, and scope of our method.

Submitted 1 December, 2022; v1 submitted 24 September, 2022; originally announced September 2022.

Comments: 22 pages, 21 figures
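
As a worked illustration of step (1) in the abstract above, one standard way to write a time-increment map for an SDE is the Euler-Maruyama discretization. The abstract does not state which discretization the authors use, so the form below, the step size $\Delta t$, and the increment symbol $\Delta W_t$ are assumptions for illustration only.

```latex
% Assumed Euler-Maruyama form of the time-increment map X_t -> X_{t+\Delta t}
X_{t+\Delta t} \approx X_t + f(X_t)\,\Delta t + \sigma(X_t)\,\Delta W_t,
\qquad \Delta W_t \sim \mathcal{N}(0, \Delta t).
```

Under this assumed form, the unknowns named in the abstract appear explicitly: $f$ and $\sigma$ are the unknown functions and $\Delta W_t$ is the unknown random variable linking consecutive observations.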

arXiv:2209.10707 [pdf, other]

Gaussian Process Hydrodynamics

Authors: Houman Owhadi

Abstract: We present a Gaussian Process (GP) approach (Gaussian Process Hydrodynamics, GPH) for approximating the solution of the Euler and Navier-Stokes equations. As in Smoothed Particle Hydrodynamics (SPH), GPH is a Lagrangian particle-based approach involving the tracking of a finite number of particles transported by the flow. However, these particles do not represent mollified particles of matter but… ▽ More We present a Gaussian Process (GP) approach (Gaussian Process Hydrodynamics, GPH) for approximating the solution of the Euler and Navier-Stokes equations. As in Smoothed Particle Hydrodynamics (SPH), GPH is a Lagrangian particle-based approach involving the tracking of a finite number of particles transported by the flow. However, these particles do not represent mollified particles of matter but carry discrete/partial information about the continuous flow. Closure is achieved by placing a divergence-free GP prior $ξ$ on the velocity field and conditioning on vorticity at particle locations. Known physics (e.g., the Richardson cascade and velocity-increments power laws) is incorporated into the GP prior through physics-informed additive kernels. This approach allows us to coarse-grain turbulence in a statistical manner rather than a deterministic one. By enforcing incompressibility and fluid/structure boundary conditions through the selection of the kernel, GPH requires much fewer particles than SPH. Since GPH has a natural probabilistic interpretation, numerical results come with uncertainty estimates enabling their incorporation into a UQ pipeline and the adding/removing of particles (quantas of information) in an adapted manner. The proposed approach is amenable to analysis, it inherits the complexity of state-of-the-art solvers for dense kernel matrices, and it leads to a natural definition of turbulence as information loss. 
Numerical experiments support the importance of selecting physics-informed kernels and illustrate the major impact of such kernels on accuracy and stability. Since the proposed approach has a Bayesian interpretation, it naturally enables data assimilation and making predictions and estimations based on mixing simulation data with experimental data.

Submitted 28 January, 2023; v1 submitted 21 September, 2022; originally announced September 2022.

Comments: 26 pages. See https://www.youtube.com/user/HoumanOwhadi for animations

MSC Class: 35Q30; 76D05; 60G15; 65M75; 65N75; 65N35; 47B34; 41A15; 34B15

Journal ref: Applied Mathematics and Mechanics, 2023

arXiv:2206.02563 [pdf, other]

doi 10.1016/j.jcp.2022.111595

Learning "best" kernels from data in Gaussian process regression. With application to aerodynamics

Authors: Jean-Luc Akian, Luc Bonnet, Houman Owhadi, Éric Savin

Abstract: This paper introduces algorithms to select/design kernels in Gaussian process regression/kriging surrogate modeling techniques. We adopt the setting of kernel method solutions in ad hoc functional spaces, namely Reproducing Kernel Hilbert Spaces (RKHS), to solve the problem of approximating a regular target function given observations of it, i.e. supervised learning. A first class of algorithms is kernel flow, which was introduced in the context of classification in machine learning. It can be seen as a cross-validation procedure whereby a "best" kernel is selected such that the loss of accuracy incurred by removing some part of the dataset (typically half of it) is minimized. A second class of algorithms is called spectral kernel ridge regression, and aims at selecting a "best" kernel such that the norm of the function to be approximated is minimal in the associated RKHS. Within Mercer's theorem framework, we obtain an explicit construction of that "best" kernel in terms of the main features of the target function. Both approaches of learning kernels from data are illustrated by numerical examples on synthetic test functions, and on a classical test case in turbulence modeling validation for transonic flows about a two-dimensional airfoil.

Submitted 31 August, 2022; v1 submitted 3 June, 2022; originally announced June 2022.

Journal ref: J. Comput. Phys. 2022

arXiv:2112.04161 [pdf, ps, other]

Aggregation of Pareto optimal models

Authors: Hamed Hamze Bajgiran, Houman Owhadi

Abstract: In statistical decision theory, a model is said to be Pareto optimal (or admissible) if no other model carries less risk for at least one state of nature while presenting no more risk for others. How can you rationally aggregate/combine a finite set of Pareto optimal models while preserving Pareto efficiency? This question is nontrivial because weighted model averaging does not, in general, preserve Pareto efficiency. This paper presents an answer in four logical steps: (1) A rational aggregation rule should preserve Pareto efficiency. (2) Due to the complete class theorem, Pareto optimal models must be Bayesian, i.e., they minimize a risk where the true state of nature is averaged with respect to some prior. Therefore each Pareto optimal model can be associated with a prior, and Pareto efficiency can be maintained by aggregating Pareto optimal models through their priors. (3) A prior can be interpreted as a preference ranking over models: prior $π$ prefers model A over model B if the average risk of A is lower than the average risk of B. (4) A rational/consistent aggregation rule should preserve this preference ranking: If both priors $π$ and $π'$ prefer model A over model B, then the prior obtained by aggregating $π$ and $π'$ must also prefer A over B.
Under these four steps, we show that all rational/consistent aggregation rules are as follows: Give each individual Pareto optimal model a weight, introduce a weak order/ranking over the set of Pareto optimal models, aggregate a finite set of models S as the model associated with the prior obtained as the weighted average of the priors of the highest-ranked models in S. This result shows that all rational/consistent aggregation rules must follow a generalization of hierarchical Bayesian modeling. Following our main result, we present applications to Kernel smoothing, time-depreciating models, and voting mechanisms.

Submitted 8 December, 2021; originally announced December 2021.
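The aggregation rule stated in the abstract above (weights, a weak ranking, and a weighted average of the priors of the highest-ranked models) can be illustrated with a minimal sketch. It assumes a finite set of states of nature so that each prior is a probability vector; the function name `aggregate_priors`, the dictionary layout, and the convention that a larger integer means a higher rank are hypothetical choices made for illustration, not taken from the paper.

```python
import numpy as np

def aggregate_priors(priors, weights, ranks, subset):
    """Weighted average of the priors of the highest-ranked models in `subset`.

    priors  : dict name -> probability vector over a finite set of states (assumption)
    weights : dict name -> positive weight attached to that model
    ranks   : dict name -> integer rank; larger means higher-ranked, ties allowed (assumption)
    subset  : iterable of model names to aggregate
    """
    subset = list(subset)
    top_rank = max(ranks[m] for m in subset)
    # Only the models sharing the highest rank contribute to the aggregate.
    best = [m for m in subset if ranks[m] == top_rank]
    total_weight = sum(weights[m] for m in best)
    mixture = sum(weights[m] * np.asarray(priors[m], dtype=float) for m in best)
    return mixture / total_weight

# Illustrative data: three models over two states of nature.
priors = {"A": [1.0, 0.0], "B": [0.0, 1.0], "C": [0.5, 0.5]}
weights = {"A": 1.0, "B": 3.0, "C": 10.0}
ranks = {"A": 2, "B": 2, "C": 1}

print(aggregate_priors(priors, weights, ranks, ["A", "B", "C"]))
```

In this sketch model C is ignored whenever A or B is present, however large its weight, because it sits lower in the ranking. Plain weighted averaging is the special case where all models share one rank.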

arXiv:2111.13037 [pdf, other]

Learning dynamical systems from data: A simple cross-validation perspective, part III: Irregularly-Sampled Time Series

Authors: Jonghyeon Lee, Edward De Brouwer, Boumediene Hamzi, Houman Owhadi

Abstract: A simple and interpretable way to learn a dynamical system from data is to interpolate its vector-field with a kernel. In particular, this strategy is highly efficient (both in terms of accuracy and complexity) when the kernel is data-adapted using Kernel Flows (KF)~\cite{Owhadi19} (which uses gradient-based optimization to learn a kernel based on the premise that a kernel is good if there is no significant loss in accuracy if half of the data is used for interpolation). Despite its previous successes, this strategy (based on interpolating the vector field driving the dynamical system) breaks down when the observed time series is not regularly sampled in time. In this work, we propose to address this problem by directly approximating the vector field of the dynamical system by incorporating time differences between observations in the (KF) data-adapted kernels. We compare our approach with the classical one over different benchmark dynamical systems and show that it significantly improves the forecasting accuracy while remaining simple, fast, and robust.

Submitted 25 November, 2021; originally announced November 2021.

Comments: Kernel Methods, Kernel Flows, Irregularly-Sampled Time Series
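The Kernel Flows premise quoted in the abstract above (a kernel is good if interpolating with half of the data loses little accuracy) is commonly measured by the ratio rho = 1 - (y_c^T K_c^{-1} y_c) / (y^T K^{-1} y), where the subscript c marks a half-sized subsample. The sketch below only evaluates that ratio for a few candidate lengthscales of a Gaussian kernel. The function names, the kernel choice, the nugget value, and the comparison over a small grid (in place of the gradient-based optimization the abstract mentions) are all assumptions made for illustration.

```python
import numpy as np

def gaussian_kernel(X, Y, lengthscale):
    """Gaussian (RBF) kernel matrix between the rows of X and the rows of Y."""
    sq_dists = ((X[:, None, :] - Y[None, :, :]) ** 2).sum(axis=-1)
    return np.exp(-sq_dists / (2.0 * lengthscale ** 2))

def kf_rho(X, y, lengthscale, half_idx, nugget=1e-6):
    """Relative loss in squared RKHS norm when only the points in half_idx are kept.

    rho = 1 - (y_c^T K_c^{-1} y_c) / (y^T K^{-1} y); smaller values mean the
    subsample interpolant is closer to the full-data interpolant.
    """
    K = gaussian_kernel(X, X, lengthscale) + nugget * np.eye(len(X))
    K_c = K[np.ix_(half_idx, half_idx)]  # principal submatrix for the subsample
    y_c = y[half_idx]
    full_norm = y @ np.linalg.solve(K, y)
    half_norm = y_c @ np.linalg.solve(K_c, y_c)
    return 1.0 - half_norm / full_norm

# Illustrative data (hypothetical): noiseless samples of a sine on [0, 1].
rng = np.random.default_rng(0)
X = rng.uniform(0.0, 1.0, size=(20, 1))
y = np.sin(2.0 * np.pi * X[:, 0])
half_idx = rng.choice(len(X), size=len(X) // 2, replace=False)

for lengthscale in (0.05, 0.3, 1.0):
    print(lengthscale, kf_rho(X, y, lengthscale, half_idx))
```

In practice the subsample would be redrawn at each step and the kernel parameters updated along the gradient of rho. This sketch fixes one subsample so that the values for different lengthscales are directly comparable.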

arXiv:2111.11630 [pdf, ps, other]

Aggregation of Models, Choices, Beliefs, and Preferences

Authors: Hamed Hamze Bajgiran, Houman Owhadi

Abstract: A natural notion of rationality/consistency for aggregating models is that, for all (possibly aggregated) models $A$ and $B$, if the output of model $A$ is $f(A)$ and if the output of model $B$ is $f(B)$, then the output of the model obtained by aggregating $A$ and $B$ must be a weighted average of $f(A)$ and $f(B)$. Similarly, a natural notion of rationality for aggregating preferences of ensembles of experts is that, for all (possibly aggregated) experts $A$ and $B$, and all possible choices $x$ and $y$, if both $A$ and $B$ prefer $x$ over $y$, then the expert obtained by aggregating $A$ and $B$ must also prefer $x$ over $y$. Rational aggregation is an important element of uncertainty quantification, and it lies behind many seemingly different results in economic theory: spanning social choice, belief formation, and individual decision making. Three examples of rational aggregation rules are as follows. (1) Give each individual model (expert) a weight (a score) and use weighted averaging to aggregate individual or finite ensembles of models (experts). (2) Order/rank individual models (experts) and let the aggregation of a finite ensemble of individual models (experts) be the highest-ranked individual model (expert) in that ensemble. (3) Give each individual model (expert) a weight, introduce a weak order/ranking over the set of models/experts, aggregate $A$ and $B$ as the weighted average of the highest-ranked models (experts) in $A$ or $B$. Note that (1) and (2) are particular cases of (3).
In this paper, we show that all rational aggregation rules are of the form (3). This result unifies aggregation procedures across different economic environments. Following the main representation, we show applications and extensions of our representation in various separate economics topics such as belief formation, choice theory, and social welfare economics.

Submitted 22 November, 2021; originally announced November 2021.

arXiv:2110.10323 [pdf, other]

Computational Graph Completion

Authors: Houman Owhadi

Abstract: We introduce a framework for generating, organizing, and reasoning with computational knowledge. It is motivated by the observation that most problems in Computational Sciences and Engineering (CSE) can be formulated as that of completing (from data) a computational graph (or hypergraph) representing dependencies between functions and variables. Nodes represent variables, and edges represent functions. Functions and variables may be known, unknown, or random. Data comes in the form of observations of distinct values of a finite number of subsets of the variables of the graph (satisfying its functional dependencies). The underlying problem combines a regression problem (approximating unknown functions) with a matrix completion problem (recovering unobserved variables in the data). Replacing unknown functions by Gaussian Processes (GPs) and conditioning on observed data provides a simple but efficient approach to completing such graphs. Since this completion process can be reduced to an algorithm, as one solves $\sqrt{2}$ on a pocket calculator without thinking about it, one could, with the automation of the proposed framework, solve a complex CSE problem by drawing a diagram. Compared to traditional kriging, the proposed framework can be used to recover unknown functions with much scarcer data by exploiting interdependencies between multiple functions and variables.
The underlying problem could therefore also be interpreted as a generalization of that of solving linear systems of equations to that of approximating unknown variables and functions with noisy, incomplete, and nonlinear dependencies. Numerous examples illustrate the flexibility, scope, efficacy, and robustness of the proposed framework and show how it can be used as a pathway to identifying simple solutions to classical CSE problems (digital twin modeling, dimension reduction, mode decomposition, etc.).

Submitted 29 March, 2022; v1 submitted 19 October, 2021; originally announced October 2021.

Comments: 34 pages. To appear in Research in the Mathematical Sciences

MSC Class: 62A09; 62H22; 65S05; 90C35; 94C15; 46E22; 62J02; 15A83

arXiv:2110.05351 [pdf, other]

Sparse recovery of elliptic solvers from matrix-vector products

Authors: Florian Schäfer, Houman Owhadi

Abstract: In this work, we show that solvers of elliptic boundary value problems in $d$ dimensions can be approximated to accuracy $ε$ from only $\mathcal{O}\left(\log(N)\log^{d}(N / ε)\right)$ matrix-vector products with carefully chosen vectors (right-hand sides). The solver is only accessed as a black box, and the underlying operator may be unknown and of an arbitrarily high order. Our algorithm (1) has complexity $\mathcal{O}\left(N\log^2(N)\log^{2d}(N / ε)\right)$ and represents the solution operator as a sparse Cholesky factorization with $\mathcal{O}\left(N\log(N)\log^{d}(N / ε)\right)$ nonzero entries, (2) allows for embarrassingly parallel evaluation of the solution operator and the computation of its log-determinant, (3) allows for $\mathcal{O}\left(\log(N)\log^{d}(N / ε)\right)$ complexity computation of individual entries of the matrix representation of the solver that, in turn, enables its recompression to an $\mathcal{O}\left(N\log^{d}(N / ε)\right)$ complexity representation. As a byproduct, our compression scheme produces a homogenized solution operator with near-optimal approximation accuracy. By polynomial approximation, we can also approximate the continuous Green's function (in operator and Hilbert-Schmidt norm) to accuracy $ε$ from $\mathcal{O}\left(\log^{1 + d}\left(ε^{-1}\right)\right)$ solutions of the PDE. We include rigorous proofs of these results. To the best of our knowledge, our algorithm achieves the best known trade-off between accuracy $ε$ and the number of required matrix-vector products.

Submitted 1 October, 2023; v1 submitted 11 October, 2021; originally announced October 2021.

Comments: Accepted for publication in SISC. This version updates the link of the code repository and corrects some minor typos

MSC Class: 65N55; 65N22; 65N15

arXiv:2108.10517 [pdf, other]

Uncertainty Quantification of the 4th kind; optimal posterior accuracy-uncertainty tradeoff with the minimum enclosing ball

Authors: Hamed Hamze Bajgiran, Pau Batlle Franch, Houman Owhadi, Mostafa Samir, Clint Scovel, Mahdy Shirdel, Michael Stanley, Peyman Tavallali

Abstract: There are essentially three kinds of approaches to Uncertainty Quantification (UQ): (A) robust optimization, (B) Bayesian, (C) decision theory. Although (A) is robust, it is unfavorable with respect to accuracy and data assimilation. (B) requires a prior, it is generally brittle and posterior estimations can be slow. Although (C) leads to the identification of an optimal prior, its approximation suffers from the curse of dimensionality and the notion of risk is one that is averaged with respect to the distribution of the data. We introduce a 4th kind which is a hybrid between (A), (B), (C), and hypothesis testing. It can be summarized as, after observing a sample $x$, (1) defining a likelihood region through the relative likelihood and (2) playing a minmax game in that region to define optimal estimators and their risk. The resulting method has several desirable properties (a) an optimal prior is identified after measuring the data, and the notion of risk is a posterior one, (b) the determination of the optimal estimate and its risk can be reduced to computing the minimum enclosing ball of the image of the likelihood region under the quantity of interest map (which is fast and not subject to the curse of dimensionality). The method is characterized by a parameter in $[0,1]$ acting as an assumed lower bound on the rarity of the observed data (the relative likelihood). When that parameter is near $1$, the method produces a posterior distribution concentrated around a maximum likelihood estimate with tight but low confidence UQ estimates.
When that parameter is near $0$, the method produces a maximal risk posterior distribution with high confidence UQ estimates. In addition to navigating the accuracy-uncertainty tradeoff, the proposed method addresses the brittleness of Bayesian inference by navigating the robustness-accuracy tradeoff associated with data assimilation.

Submitted 13 September, 2022; v1 submitted 24 August, 2021; originally announced August 2021.

Comments: 49 pages. To appear in the Journal of Computational Physics

MSC Class: 62C20; 62F03; 62F35; 62F25; 68T37
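The two steps summarized in the abstract above (a likelihood region defined by the relative likelihood, then a minimum enclosing ball of its image under the quantity of interest) can be illustrated in the simplest scalar setting. The sketch assumes a single observation x from a Gaussian N(θ, σ²) with known σ and a scalar quantity of interest, so the minimum enclosing ball of the image reduces to an interval with a midpoint and a half-width. The function name `likelihood_region_meb`, the grid resolution, and the Gaussian model are hypothetical choices made for illustration; the paper's setting is more general.

```python
import numpy as np

def likelihood_region_meb(x, sigma, alpha, qoi, num_grid=10001):
    """Center and radius of the smallest interval containing qoi(theta)
    over the region where the relative likelihood is at least alpha.

    Assumed model: x ~ N(theta, sigma^2), so the relative likelihood is
    exp(-(x - theta)^2 / (2 sigma^2)) and the region is an interval around x.
    alpha must lie in (0, 1].
    """
    half_width = sigma * np.sqrt(2.0 * np.log(1.0 / alpha))
    thetas = np.linspace(x - half_width, x + half_width, num_grid)
    values = qoi(thetas)
    lo, hi = values.min(), values.max()
    center = 0.5 * (lo + hi)   # candidate estimate: center of the enclosing interval
    radius = 0.5 * (hi - lo)   # worst-case absolute error of that estimate over the region
    return center, radius

# Illustrative call: quantity of interest theta^2, observation x = 1.
center, radius = likelihood_region_meb(1.0, 1.0, 0.5, lambda t: t ** 2)
print(center, radius)
```

As alpha approaches 1 the region shrinks to the maximum likelihood estimate and the radius goes to zero, and as alpha approaches 0 the region and the radius grow. This matches the tradeoff described in the abstract. Under a squared-error loss the worst-case error of the center over the region is the squared radius.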

arXiv:2103.12959 [pdf, other]

Solving and Learning Nonlinear PDEs with Gaussian Processes

Authors: Yifan Chen, Bamdad Hosseini, Houman Owhadi, Andrew M Stuart

Abstract: We introduce a simple, rigorous, and unified framework for solving nonlinear partial differential equations (PDEs), and for solving inverse problems (IPs) involving the identification of parameters in PDEs, using the framework of Gaussian processes. The proposed approach: (1) provides a natural generalization of collocation kernel methods to nonlinear PDEs and IPs; (2) has guaranteed convergence for a very general class of PDEs, and comes equipped with a path to compute error bounds for specific PDE approximations; (3) inherits the state-of-the-art computational complexity of linear solvers for dense kernel matrices. The main idea of our method is to approximate the solution of a given PDE as the maximum a posteriori (MAP) estimator of a Gaussian process conditioned on solving the PDE at a finite number of collocation points. Although this optimization problem is infinite-dimensional, it can be reduced to a finite-dimensional one by introducing additional variables corresponding to the values of the derivatives of the solution at collocation points; this generalizes the representer theorem arising in Gaussian process regression. The reduced optimization problem has the form of a quadratic objective function subject to nonlinear constraints; it is solved with a variant of the Gauss--Newton method. The resulting algorithm (a) can be interpreted as solving successive linearizations of the nonlinear PDE, and (b) in practice is found to converge in a small number of iterations (2 to 10), for a wide range of PDEs.
Most traditional approaches to IPs interleave parameter updates with numerical solution of the PDE; our algorithm solves for both parameter and PDE solution simultaneously. Experiments on nonlinear elliptic PDEs, Burgers' equation, a regularized Eikonal equation, and an IP for permeability identification in Darcy flow illustrate the efficacy and scope of our framework.

Submitted 10 August, 2021; v1 submitted 23 March, 2021; originally announced March 2021.

Comments: 41 pages

MSC Class: 60G15; 65M75; 65N75; 65N35; 47B34; 41A15; 35R30; 34B15

arXiv:2103.10935 [pdf, other]

Data-driven geophysical forecasting: Simple, low-cost, and accurate baselines with kernel methods

Authors: Boumediene Hamzi, Romit Maulik, Houman Owhadi

Abstract: Modeling geophysical processes as low-dimensional dynamical systems and regressing their vector field from data is a promising approach for learning emulators of such systems. We show that when the kernel of these emulators is also learned from data (using kernel flows, a variant of cross-validation), then the resulting data-driven models are not only faster than equation-based models but are easier to train than neural networks such as the long short-term memory neural network. In addition, they are also more accurate and predictive than the latter. When trained on geophysical observational data, for example, the weekly averaged global sea-surface temperature, considerable gains are also observed by the proposed technique in comparison to classical partial differential equation-based models in terms of forecast computational cost and accuracy. When trained on publicly available re-analysis data for the daily temperature of the North-American continent, we see significant improvements over classical baselines such as climatology and persistence-based forecast techniques. Although our experiments concern specific examples, the proposed approach is general, and our results support the viability of kernel methods (with learned kernels) for interpretable and computationally efficient geophysical forecasting for a large diversity of processes.

Submitted 9 August, 2021; v1 submitted 13 February, 2021; originally announced March 2021.

arXiv:2103.09982 [pdf, other]

Decision Theoretic Bootstrapping

Authors: Peyman Tavallali, Hamed Hamze Bajgiran, Danial J. Esaid, Houman Owhadi

Abstract: The design and testing of supervised machine learning models combine two fundamental distributions: (1) the training data distribution (2) the testing data distribution. Although these two distributions are identical and identifiable when the data set is infinite, they are imperfectly known (and possibly distinct) when the data is finite (and possibly corrupted) and this uncertainty must be taken into account for robust Uncertainty Quantification (UQ). We present a general decision-theoretic bootstrapping solution to this problem: (1) partition the available data into a training subset and a UQ subset (2) take $m$ subsampled subsets of the training set and train $m$ models (3) partition the UQ set into $n$ sorted subsets and take a random fraction of them to define $n$ corresponding empirical distributions $μ_{j}$ (4) consider the adversarial game where Player I selects a model $i\in\left\{ 1,\ldots,m\right\} $, Player II selects the UQ distribution $μ_{j}$ and Player I receives a loss defined by evaluating the model $i$ against data points sampled from $μ_{j}$ (5) identify optimal mixed strategies (probability distributions over models and UQ distributions) for both players. These randomized optimal mixed strategies provide optimal model mixtures and UQ estimates given the adversarial uncertainty of the training and testing distributions represented by the game.
The proposed approach provides (1) some degree of robustness to distributional shift in both the distribution of training data and that of the testing data (2) conditional probability distributions on the output space forming aleatory representations of the uncertainty on the output as a function of the input variable.

Submitted 17 March, 2021; originally announced March 2021.

arXiv:2008.03920 [pdf, other]

Do ideas have shape? Idea registration as the continuous limit of artificial neural networks

Authors: Houman Owhadi

Abstract: We introduce a GP generalization of ResNets (including ResNets as a particular case). We show that ResNets (and their GP generalization) converge, in the infinite depth limit, to a generalization of image registration variational algorithms. Whereas computational anatomy aligns images via warping of the material space, this generalization aligns ideas (or abstract shapes as in Plato's theory of forms) via the warping of the RKHS of functions mapping the input space to the output space. While the Hamiltonian interpretation of ResNets is not new, it was based on an Ansatz. We do not rely on this Ansatz and present the first rigorous proof of convergence of ResNets with trained weights and biases towards a Hamiltonian dynamics driven flow. Our constructive proof reveals several remarkable properties of ResNets and their GP generalization. ResNets regressors are kernel regressors with data-dependent warping kernels. Minimizers of $L_2$ regularized ResNets satisfy a discrete least action principle implying the near preservation of the norm of weights and biases across layers. The trained weights of ResNets with $L^2$ regularization can be identified by solving an autonomous Hamiltonian system. The trained ResNet parameters are unique up to the initial momentum whose representation is generally sparse. The kernel regularization strategy provides a provably robust alternative to Dropout for ANNs. We introduce a functional generalization of GPs leading to error estimates for ResNets. We identify the (EPDiff) mean fields limit of trained ResNet parameters.
We show that the composition of warping regression blocks with reduced equivariant multichannel kernels (introduced here) recovers and generalizes CNNs to arbitrary spaces and groups of transformations.

Submitted 27 October, 2022; v1 submitted 10 August, 2020; originally announced August 2020.

Comments: 65 pages. To appear in Physica D (special issue on Machine Learning and Dynamical Systems)

MSC Class: 62J02; 68T01; 62M45

arXiv:2007.05074 [pdf, other]

doi 10.1016/j.physd.2020.132817

Learning dynamical systems from data: a simple cross-validation perspective

Authors: Boumediene Hamzi, Houman Owhadi

Abstract: Regressing the vector field of a dynamical system from a finite number of observed states is a natural way to learn surrogate models for such systems. We present variants of cross-validation (Kernel Flows \cite{Owhadi19} and its variants based on Maximum Mean Discrepancy and Lyapunov exponents) as simple approaches for learning the kernel used in these emulators.

Submitted 9 July, 2020; originally announced July 2020.

Comments: File uploaded on arxiv on Sunday, July 5th, 2020. Got delayed due to tex problems on ArXiv. Original version at https://www.researchgate.net/publication/342693818_Learning_dynamical_systems_from_data_a_simple_cross-validation_perspective

arXiv:2006.10179 [pdf, other]

Competitive Mirror Descent

Authors: Florian Schäfer, Anima Anandkumar, Houman Owhadi

Abstract: Constrained competitive optimization involves multiple agents trying to minimize conflicting objectives, subject to constraints. This is a highly expressive modeling language that subsumes most of modern machine learning. In this work we propose competitive mirror descent (CMD): a general method for solving such problems based on first order information that can be obtained by automatic differentiation. First, by adding Lagrange multipliers, we obtain a simplified constraint set with an associated Bregman potential. At each iteration, we then solve for the Nash equilibrium of a regularized bilinear approximation of the full problem to obtain a direction of movement of the agents. Finally, we obtain the next iterate by following this direction according to the dual geometry induced by the Bregman potential. By using the dual geometry we obtain feasible iterates despite only solving a linear system at each iteration, eliminating the need for projection steps while still accounting for the global nonlinear structure of the constraint set. As a special case we obtain a novel competitive multiplicative weights algorithm for problems on the positive cone.

Submitted 17 June, 2020; originally announced June 2020.

Comments: The code used to produce the numerical experiments can be found under https://github.com/f-t-s/CMD

arXiv:2005.11375 [pdf, other]

Consistency of Empirical Bayes And Kernel Flow For Hierarchical Parameter Estimation

Authors: Yifan Chen, Houman Owhadi, Andrew M. Stuart

Abstract: Gaussian process regression has proven very powerful in statistics, machine learning and inverse problems. A crucial aspect of the success of this methodology, in a wide range of applications to complex and real-world problems, is hierarchical modeling and learning of hyperparameters. The purpose of this paper is to study two paradigms of learning hierarchical parameters: one is from the probabilistic Bayesian perspective, in particular, the empirical Bayes approach that has been largely used in Bayesian statistics; the other is from the deterministic and approximation theoretic view, and in particular the kernel flow algorithm that was proposed recently in the machine learning literature. Analysis of their consistency in the large data limit, as well as explicit identification of their implicit bias in parameter learning, are established in this paper for a Matérn-like model on the torus. A particular technical challenge we overcome is the learning of the regularity parameter in the Matérn-like field, for which consistency results have been very scarce in the spatial statistics literature. Moreover, we conduct extensive numerical experiments beyond the Matérn-like model, comparing the two algorithms further. These experiments demonstrate learning of other hierarchical parameters, such as amplitude and lengthscale; they also illustrate the setting of model misspecification in which the kernel flow approach could show superior performance to the more traditional empirical Bayes approach.

Submitted 16 March, 2021; v1 submitted 22 May, 2020; originally announced May 2020.

Comments: to appear in Mathematics of Computation

MSC Class: 65F12 62C10 41A05 35Q62

arXiv:2004.14455 [pdf, other]

Sparse Cholesky factorization by Kullback-Leibler minimization

Authors: Florian Schäfer, Matthias Katzfuss, Houman Owhadi

Abstract: We propose to compute a sparse approximate inverse Cholesky factor $L$ of a dense covariance matrix $Θ$ by minimizing the Kullback-Leibler divergence between the Gaussian distributions $\mathcal{N}(0, Θ)$ and $\mathcal{N}(0, L^{-\top} L^{-1})$, subject to a sparsity constraint. Surprisingly, this problem has a closed-form solution that can be computed efficiently, recovering the popular Vecchia approximation in spatial statistics. Based on recent results on the approximate sparsity of inverse Cholesky factors of $Θ$ obtained from pairwise evaluation of Green's functions of elliptic boundary-value problems at points $\{x_{i}\}_{1 \leq i \leq N} \subset \mathbb{R}^{d}$, we propose an elimination ordering and sparsity pattern that allows us to compute $ε$-approximate inverse Cholesky factors of such $Θ$ in computational complexity $\mathcal{O}(N \log(N/ε)^d)$ in space and $\mathcal{O}(N \log(N/ε)^{2d})$ in time. To the best of our knowledge, this is the best asymptotic complexity for this class of problems. Furthermore, our method is embarrassingly parallel, automatically exploits low-dimensional structure in the data, and can perform Gaussian-process regression in linear (in $N$) space complexity. Motivated by the optimality properties of our methods, we propose methods for applying it to the joint covariance of training and prediction points in Gaussian-process regression, greatly improving stability and computational cost.
Finally, we show how to apply our method to the important setting of Gaussian processes with additive noise, sacrificing neither accuracy nor computational complexity.
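
The closed-form solution mentioned in the abstract can be illustrated with a small dense sketch. It assumes the column-wise formula $L_{s_i,i} = Θ_{s_i,s_i}^{-1} e_1 / \sqrt{e_1^\top Θ_{s_i,s_i}^{-1} e_1}$, where $s_i$ is the set of allowed nonzero rows of column $i$ with $i$ listed first. The function name `kl_optimal_factor`, the test kernel, and the point set are hypothetical choices made for illustration; this is not the authors' code, which is linked in the Comments line of this entry.

```python
import numpy as np

def kl_optimal_factor(theta, sparsity):
    """Sparse factor L with L @ L.T approximating inv(theta).

    sparsity[i] lists the allowed nonzero rows of column i; the first
    entry must be i itself and the remaining entries must be larger than i.
    """
    n = theta.shape[0]
    L = np.zeros((n, n))
    for i, s in enumerate(sparsity):
        s = np.asarray(s)
        e1 = np.zeros(len(s))
        e1[0] = 1.0
        # Solve theta[s, s] @ col = e1; col[0] equals e1^T theta[s, s]^{-1} e1.
        col = np.linalg.solve(theta[np.ix_(s, s)], e1)
        L[s, i] = col / np.sqrt(col[0])
    return L

# Illustrative covariance: exponential kernel on random 1-D points (assumed setup).
rng = np.random.default_rng(0)
x = np.sort(rng.uniform(size=8))
theta = np.exp(-np.abs(x[:, None] - x[None, :]))

# With the full lower-triangular pattern the factor is exact.
full_pattern = [list(range(i, len(x))) for i in range(len(x))]
L_full = kl_optimal_factor(theta, full_pattern)

# A banded pattern (each column keeps itself and its next two neighbours).
banded_pattern = [list(range(i, min(i + 3, len(x)))) for i in range(len(x))]
L_banded = kl_optimal_factor(theta, banded_pattern)
```

Each column depends only on its own sub-block of $Θ$, which is why the columns can be computed independently of one another, consistent with the "embarrassingly parallel" remark in the abstract.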

Submitted 22 October, 2021; v1 submitted 29 April, 2020; originally announced April 2020.

Comments: The code used to run the numerical experiments can be found under https://github.com/f-t-s/cholesky_by_KL_minimization. Appeared in SIAM Journal on Scientific Computing

arXiv:2002.08335 [pdf, other]

doi 10.1016/j.physd.2021.132952

Deep regularization and direct training of the inner layers of Neural Networks with Kernel Flows

Authors: Gene Ryan Yoo, Houman Owhadi

Abstract: We introduce a new regularization method for Artificial Neural Networks (ANNs) based on Kernel Flows (KFs). KFs were introduced as a method for kernel selection in regression/kriging based on the minimization of the loss of accuracy incurred by halving the number of interpolation points in random batches of the dataset. Writing $f_θ(x) = \big(f^{(n)}_{θ_n}\circ f^{(n-1)}_{θ_{n-1}} \circ \dots \circ f^{(1)}_{θ_1}\big)(x)$ for the functional representation of compositional structure of the ANN, the inner-layer outputs $h^{(i)}(x) = \big(f^{(i)}_{θ_i}\circ f^{(i-1)}_{θ_{i-1}} \circ \dots \circ f^{(1)}_{θ_1}\big)(x)$ define a hierarchy of feature maps and kernels $k^{(i)}(x,x')=\exp(- γ_i \|h^{(i)}(x)-h^{(i)}(x')\|_2^2)$. When combined with a batch of the dataset these kernels produce KF losses $e_2^{(i)}$ (the $L^2$ regression error incurred by using a random half of the batch to predict the other half) depending on parameters of inner layers $θ_1,\ldots,θ_i$ (and $γ_i$). The proposed method simply consists in aggregating a subset of these KF losses with a classical output loss. We test the proposed method on CNNs and WRNs without alteration of structure or output classifier and report reduced test errors, decreased generalization gaps, and increased robustness to distribution shift without significant increase in computational complexity.
We suspect that these results might be explained by the fact that while conventional training only employs a linear functional (a generalized moment) of the empirical distribution defined by the dataset and can be prone to trapping in the Neural Tangent Kernel regime (under over-parameterizations), the proposed loss function (defined as a nonlinear functional of the empirical distribution) effectively trains the underlying kernel defined by the CNN beyond regressing the data with that kernel.

Submitted 6 August, 2020; v1 submitted 19 February, 2020; originally announced February 2020.

arXiv:1907.08592 [pdf, other]

Kernel Mode Decomposition and programmable/interpretable regression networks

Authors: Houman Owhadi, Clint Scovel, Gene Ryan Yoo

Abstract: Mode decomposition is a prototypical pattern recognition problem that can be addressed from the (a priori distinct) perspectives of numerical approximation, statistical inference and deep learning. Could its analysis through these combined perspectives be used as a Rosetta stone for deciphering mechanisms at play in deep learning? Motivated by this question we introduce programmable and interpretable regression networks for pattern recognition and address mode decomposition as a prototypical problem. The programming of these networks is achieved by assembling elementary modules decomposing and recomposing kernels and data. These elementary steps are repeated across levels of abstraction and interpreted from the equivalent perspectives of optimal recovery, game theory and Gaussian process regression (GPR). The prototypical mode/kernel decomposition module produces an optimal approximation $(w_1,w_2,\cdots,w_m)$ of an element $(v_1,v_2,\ldots,v_m)$ of a product of Hilbert subspaces of a common Hilbert space from the observation of the sum $v:=v_1+\cdots+v_m$. The prototypical mode/kernel recomposition module performs partial sums of the recovered modes $w_i$ based on the alignment between each recovered mode $w_i$ and the data $v$.
We illustrate the proposed framework by programming regression networks approximating the modes $v_i= a_i(t)y_i\big(θ_i(t)\big)$ of a (possibly noisy) signal $\sum_i v_i$ when the amplitudes $a_i$, instantaneous phases $θ_i$ and periodic waveforms $y_i$ may all be unknown and show near machine precision recovery under regularity and separation assumptions on the instantaneous amplitudes $a_i$ and frequencies $\dotθ_i$. The structure of some of these networks shares intriguing similarities with convolutional neural networks while being interpretable, programmable and amenable to theoretical analysis.

Submitted 5 August, 2020; v1 submitted 19 July, 2019; originally announced July 2019.

Comments: 102 pages, 39 figures. Python source codes available at https://github.com/kernel-enthusiasts/Kernel-Mode-Decomposition-1D

MSC Class: 62J02; 68T01; 68T10; 42C15

arXiv:1808.04475 [pdf, other]

doi 10.1016/j.jcp.2019.03.040

Kernel Flows: from learning kernels from data into the abyss

Authors: Houman Owhadi, Gene Ryan Yoo

Abstract: Learning can be seen as approximating an unknown function by interpolating the training data. Kriging offers a solution to this problem based on the prior specification of a kernel. We explore a numerical approximation approach to kernel selection/construction based on the simple premise that a kernel must be good if the number of interpolation points can be halved without significant loss in accuracy (measured using the intrinsic RKHS norm $\|\cdot\|$ associated with the kernel). We first test and motivate this idea on a simple problem of recovering the Green's function of an elliptic PDE (with inhomogeneous coefficients) from the sparse observation of one of its solutions. Next we consider the problem of learning non-parametric families of deep kernels of the form $K_1(F_n(x),F_n(x'))$ with $F_{n+1}=(I_d+εG_{n+1})\circ F_n$ and $G_{n+1} \in \operatorname{Span}\{K_1(F_n(x_i),\cdot)\}$. With the proposed approach constructing the kernel becomes equivalent to integrating a stochastic data driven dynamical system, which allows for the training of very deep (bottomless) networks and the exploration of their properties.
These networks learn by constructing flow maps in the kernel and input spaces via incremental data-dependent deformations/perturbations (appearing as the cooperative counterpart of adversarial examples) and, at profound depths, they (1) can achieve accurate classification from only one data point per class, (2) appear to learn archetypes of each class, and (3) expand distances between points that are in different classes and contract distances between points in the same class. For kernels parameterized by the weights of Convolutional Neural Networks, minimizing approximation errors incurred by halving random subsets of interpolation points appears to outperform training (the same CNN architecture) with relative entropy and dropout.
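
The "halving" premise in the abstract can be made concrete with a minimal sketch. It assumes the loss takes the form $ρ = 1 - (y_c^\top K_c^{-1} y_c)/(y_b^\top K_b^{-1} y_b)$, the relative squared RKHS-norm difference between the interpolant of a batch $b$ and that of a random half $c$ of the batch. The Gaussian kernel, the nugget term, and the names `gaussian_kernel` and `kf_rho` are illustrative assumptions, not taken from the listing.

```python
import numpy as np

def gaussian_kernel(a, b, gamma=1.0):
    """Gaussian kernel matrix between the rows of a and the rows of b."""
    d2 = ((a[:, None, :] - b[None, :, :]) ** 2).sum(axis=-1)
    return np.exp(-gamma * d2)

def kf_rho(x, y, rng, gamma=1.0, nugget=1e-6):
    """Relative RKHS-norm loss incurred by keeping a random half of the batch."""
    n = len(x)
    half = rng.choice(n, size=n // 2, replace=False)

    def rkhs_sq_norm(idx):
        # Squared RKHS norm of the interpolant of (x[idx], y[idx]);
        # the nugget is a small regularisation added for numerical stability.
        k = gaussian_kernel(x[idx], x[idx], gamma) + nugget * np.eye(len(idx))
        return y[idx] @ np.linalg.solve(k, y[idx])

    return 1.0 - rkhs_sq_norm(half) / rkhs_sq_norm(np.arange(n))

# Toy batch: 20 points of a sine curve (assumed data, for illustration only).
rng = np.random.default_rng(0)
x = rng.uniform(size=(20, 1))
y = np.sin(2 * np.pi * x[:, 0])
rho = kf_rho(x, y, rng, gamma=10.0)
```

In exact arithmetic this quantity lies between 0 and 1, with values near 0 indicating that halving the data loses little; in a kernel-learning loop one would minimize it over kernel parameters such as `gamma`, averaged over random batches.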

Submitted 22 September, 2018; v1 submitted 13 August, 2018; originally announced August 2018.

Comments: 42 pages, 31 figures. See https://www.youtube.com/watch?v=h9wB8FVH7YM&list=PLdWd7x7FVuLphAODzEvj2KRNws7z7Sv87 for animations of the flows. See http://users.cms.caltech.edu/~owhadi/index_htm_files/kf-static.pdf for slides. See http://users.cms.caltech.edu/~owhadi/KF/ for high resolution videos of the flows in the setting of the Swiss roll cheesecake classification problem

MSC Class: 62J02; 68T01; 91C20

arXiv:1806.00565 [pdf, other]

Fast eigenpairs computation with operator adapted wavelets and hierarchical subspace correction

Authors: Hehu Xie, Lei Zhang, Houman Owhadi

Abstract: We present a method for the fast computation of the eigenpairs of a bijective positive symmetric linear operator $\mathcal{L}$. The method is based on a combination of operator adapted wavelets (gamblets) with hierarchical subspace correction. First, gamblets provide a raw but fast approximation of the eigensubspaces of $\mathcal{L}$ by block-diagonalizing $\mathcal{L}$ into sparse and well-conditioned blocks. Next, the hierarchical subspace correction method computes the eigenpairs associated with the Galerkin restriction of $\mathcal{L}$ to a coarse (low dimensional) gamblet subspace, and then corrects those eigenpairs by solving a hierarchy of linear problems in the finer gamblet subspaces (from coarse to fine, using multigrid iteration). The proposed algorithm is robust to the presence of multiple (a continuum of) scales and is shown to be of near-linear complexity when $\mathcal{L}$ is an (arbitrary local, e.g. differential) operator mapping $\mathcal{H}^s_0(Ω)$ to $\mathcal{H}^{-s}(Ω)$ (e.g. an elliptic PDE with rough coefficients).

Submitted 4 September, 2019; v1 submitted 1 June, 2018; originally announced June 2018.

MSC Class: 65N30; 65N25; 65L15; 65B99

arXiv:1805.10736 [pdf, other]

De-noising by thresholding operator adapted wavelets

Authors: Gene Ryan Yoo, Houman Owhadi

Abstract: Donoho and Johnstone proposed a method for reconstructing an unknown smooth function $u$ from noisy data $u+ζ$ by translating the empirical wavelet coefficients of $u+ζ$ towards zero. We consider the situation where the prior information on the unknown function $u$ may not be the regularity of $u$ but that of $\mathcal{L}u$ where $\mathcal{L}$ is a linear operator (such as a PDE or a graph Laplacian). We show that the approximation of $u$ obtained by thresholding the gamblet (operator adapted wavelet) coefficients of $u+ζ$ is near minimax optimal (up to a multiplicative constant), and with high probability, its energy norm (defined by the operator) is bounded by that of $u$ up to a constant depending on the amplitude of the noise. Since gamblets can be computed in $\mathcal{O}(N \operatorname{polylog} N)$ complexity and are localized both in space and eigenspace, the proposed method is of near-linear complexity and generalizable to non-homogeneous noise.
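
For readers unfamiliar with the shrinkage step, "translating coefficients towards zero" is commonly realized as soft thresholding. The sketch below shows that rule on a generic coefficient array; the function name `soft_threshold` and the sample values are illustrative assumptions, and the gamblet transform that would produce the coefficients is not shown.

```python
import numpy as np

def soft_threshold(coeffs, t):
    """Move every coefficient towards zero by t; zero out those with |c| <= t."""
    return np.sign(coeffs) * np.maximum(np.abs(coeffs) - t, 0.0)

# Small coefficients (likely noise) vanish; large ones are shrunk by t.
noisy_coeffs = np.array([-3.0, -0.5, 0.5, 3.0])
shrunk = soft_threshold(noisy_coeffs, 1.0)
```

A denoised signal would then be obtained by applying the inverse transform to the shrunk coefficients.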

Submitted 27 May, 2018; originally announced May 2018.

MSC Class: 60G35; 62M20; 62C20; 65N99; 65F99

arXiv:1706.02205 [pdf, other]

Compression, inversion, and approximate PCA of dense kernel matrices at near-linear computational complexity

Authors: Florian Schäfer, T. J. Sullivan, Houman Owhadi

Abstract: Dense kernel matrices $Θ\in \mathbb{R}^{N \times N}$ obtained from point evaluations of a covariance function $G$ at locations $\{ x_{i} \}_{1 \leq i \leq N} \subset \mathbb{R}^{d}$ arise in statistics, machine learning, and numerical analysis. For covariance functions that are Green's functions of elliptic boundary value problems and homogeneously-distributed sampling points, we show how to identify a subset $S \subset \{ 1 , \dots , N \}^2$, with $\# S = O ( N \log (N) \log^{d} ( N /ε) )$, such that the zero fill-in incomplete Cholesky factorisation of the sparse matrix $Θ_{ij} 1_{( i, j ) \in S}$ is an $ε$-approximation of $Θ$. This factorisation can provably be obtained in complexity $O ( N \log( N ) \log^{d}( N /ε) )$ in space and $O ( N \log^{2}( N ) \log^{2d}( N /ε) )$ in time, improving upon the state of the art for general elliptic operators; we further present numerical evidence that $d$ can be taken to be the intrinsic dimension of the data set rather than that of the ambient space. The algorithm only needs to know the spatial configuration of the $x_{i}$ and does not require an analytic representation of $G$. Furthermore, this factorization straightforwardly provides an approximate sparse PCA with optimal rate of convergence in the operator norm. Hence, by using only subsampling and the incomplete Cholesky factorization, we obtain, at nearly linear complexity, the compression, inversion and approximate PCA of a large class of covariance matrices.
By inverting the order of the Cholesky factorization we also obtain a solver for elliptic PDE with complexity $O ( N \log^{d}( N /ε) )$ in space and $O ( N \log^{2d}( N /ε) )$ in time, improving upon the state of the art for general elliptic operators.

Submitted 30 October, 2020; v1 submitted 7 June, 2017; originally announced June 2017.

Comments: 52 pages. A high level summary of this work can be found under https://f-t-s.github.io/projects/cholesky/

MSC Class: 65F30; 42C40; 65F50; 65N55; 65N75; 60G42; 68Q25; 68W40

arXiv:1703.10761 [pdf, other]

Universal Scalable Robust Solvers from Computational Information Games and fast eigenspace adapted Multiresolution Analysis

Authors: Houman Owhadi, Clint Scovel

Abstract: We show how the discovery of robust scalable numerical solvers for arbitrary bounded linear operators can be automated as a Game Theory problem by reformulating the process of computing with partial information and limited resources as that of playing underlying hierarchies of adversarial information games. When the solution space is a Banach space $B$ endowed with a quadratic norm $\|\cdot\|$, the optimal measure (mixed strategy) for such games (e.g. the adversarial recovery of $u\in B$, given partial measurements $[φ_i, u]$ with $φ_i\in B^*$, using relative error in $\|\cdot\|$-norm as a loss) is a centered Gaussian field $ξ$ solely determined by the norm $\|\cdot\|$, whose conditioning (on measurements) produces optimal bets. When measurements are hierarchical, the process of conditioning this Gaussian field produces a hierarchy of elementary bets (gamblets). These gamblets generalize the notion of Wavelets and Wannier functions in the sense that they are adapted to the norm $\|\cdot\|$ and induce a multi-resolution decomposition of $B$ that is adapted to the eigensubspaces of the operator defining the norm $\|\cdot\|$. When the operator is localized, we show that the resulting gamblets are localized both in space and frequency and introduce the Fast Gamblet Transform (FGT) with rigorous accuracy and (near-linear) complexity estimates.
As the FFT can be used to solve and diagonalize arbitrary PDEs with constant coefficients, the FGT can be used to decompose a wide range of continuous linear operators (including arbitrary continuous linear bijections from $H^s_0$ to $H^{-s}$ or to $L^2$) into a sequence of independent linear systems with uniformly bounded condition numbers and leads to $\mathcal{O}(N \operatorname{polylog} N)$ solvers and eigenspace adapted Multiresolution Analysis (resulting in near linear complexity approximation of all eigensubspaces).

Submitted 29 May, 2017; v1 submitted 31 March, 2017; originally announced March 2017.

Comments: 142 pages. 14 Figures. Presented at AFOSR (Aug 2016), DARPA (Sep 2016), IPAM (Apr 3, 2017), Hausdorff (April 13, 2017) and ICERM (June 5, 2017)

MSC Class: 68T99; 65T60; 65M55; 65N55; 65F99; 65N75; 62C99; 62C20; 62C10; 42C40; 60G42; 68Q25; 15A18; 35Q91

arXiv:1703.00058 [pdf, other]

On testing the simulation theory

Authors: Tom Campbell, Houman Owhadi, Joe Sauvageau, David Watkinson

Abstract: Can the theory that reality is a simulation be tested? We investigate this question based on the assumption that if the system performing the simulation is finite (i.e. has limited resources), then to achieve low computational complexity, such a system would, as in a video game, render content (reality) only at the moment that information becomes available for observation by a player and not at the moment of detection by a machine (that would be part of the simulation and whose detection would also be part of the internal computation performed by the Virtual Reality server before rendering content to the player). Guided by this principle we describe conceptual wave/particle duality experiments aimed at testing the simulation theory.

Submitted 6 June, 2017; v1 submitted 28 February, 2017; originally announced March 2017.

Comments: 22 pages, 8 figures, final version to appear in IJQF

arXiv:1606.07686 [pdf, other]

doi 10.1016/j.jcp.2017.06.037

Gamblets for opening the complexity-bottleneck of implicit schemes for hyperbolic and parabolic ODEs/PDEs with rough coefficients

Authors: Houman Owhadi, Lei Zhang

Abstract: Implicit schemes are popular methods for the integration of time dependent PDEs such as hyperbolic and parabolic PDEs. However the necessity to solve corresponding linear systems at each time step constitutes a complexity bottleneck in their application to PDEs with rough coefficients. We present a generalization of gamblets introduced in \cite{OwhadiMultigrid:2015} enabling the resolution of these implicit systems in near-linear complexity and provide rigorous a-priori error bounds on the resulting numerical approximations of hyperbolic and parabolic PDEs. These generalized gamblets induce a multiresolution decomposition of the solution space that is adapted to both the underlying (hyperbolic and parabolic) PDE (and the system of ODEs resulting from space discretization) and to the time-steps of the numerical scheme.

Submitted 30 June, 2017; v1 submitted 24 June, 2016; originally announced June 2016.

Comments: 55 pages. 26 figures

MSC Class: 65T60; 65N55; 65N75; 62C99; 42C40; 62M86

Journal ref: Journal of Computational Physics, 347, 99-128, 2017

arXiv:1508.02449 [pdf, ps, other]

doi 10.1007/978-3-319-11259-6_3-1

Towards Machine Wald

Authors: Houman Owhadi, Clint Scovel

Abstract: The past century has seen a steady increase in the need of estimating and predicting complex systems and making (possibly critical) decisions with limited information. Although computers have made possible the numerical evaluation of sophisticated statistical models, these models are still designed \emph{by humans} because there is currently no known recipe or algorithm for dividing the design of a statistical model into a sequence of arithmetic operations. Indeed enabling computers to \emph{think} as \emph{humans} have the ability to do when faced with uncertainty is challenging in several major ways: (1) Finding optimal statistical models remains to be formulated as a well posed problem when information on the system of interest is incomplete and comes in the form of a complex combination of sample data, partial knowledge of constitutive relations and a limited description of the distribution of input random variables. (2) The space of admissible scenarios along with the space of relevant information, assumptions, and/or beliefs, tend to be infinite dimensional, whereas calculus on a computer is necessarily discrete and finite. With this purpose, this paper explores the foundations of a rigorous framework for the scientific computation of optimal statistical estimators/models and reviews their connections with Decision Theory, Machine Learning, Bayesian Inference, Stochastic Optimization, Robust Optimization, Optimal Uncertainty Quantification and Information Based Complexity.

Submitted 1 October, 2015; v1 submitted 10 August, 2015; originally announced August 2015.

Comments: 37 pages

MSC Class: 62C99; 68Q32

arXiv:1506.04288 [pdf, ps, other]

Separability of reproducing kernel spaces

Authors: Houman Owhadi, Clint Scovel

Abstract: We demonstrate that a reproducing kernel Hilbert or Banach space of functions on a separable absolute Borel space or an analytic subset of a Polish space is separable if it possesses a Borel measurable feature map.

Submitted 5 July, 2016; v1 submitted 13 June, 2015; originally announced June 2015.

MSC Class: 46E22

arXiv:1506.04208 [pdf, ps, other]

Conditioning Gaussian measure on Hilbert space

Authors: Houman Owhadi, Clint Scovel

Abstract: For a Gaussian measure on a separable Hilbert space with covariance operator $C$, we show that the family of conditional measures associated with conditioning on a closed subspace $S^{\perp}$ are Gaussian with covariance operator the short $\mathcal{S}(C)$ of the operator $C$ to $S$. We provide two proofs. The first uses the theory of Gaussian Hilbert spaces and a characterization of the shorted operator by Andersen and Trapp. The second uses recent developments by Corach, Maestripieri and Stojanoff on the relationship between the shorted operator and $C$-symmetric oblique projections onto $S^{\perp}$. To obtain the assertion when such projections do not exist, we develop an approximation result for the shorted operator by showing, for any positive operator $A$, how to construct a sequence of approximating operators $A^{n}$ which possess $A^{n}$-symmetric oblique projections onto $S^{\perp}$ such that the sequence of shorted operators $\mathcal{S}(A^{n})$ converges to $\mathcal{S}(A)$ in the weak operator topology. This result combined with the martingale convergence of random variables associated with the corresponding approximations $C^{n}$ establishes the main assertion in general. Moreover, it in turn strengthens the approximation theorem for shorted operator when the operator is trace class; then the sequence of shorted operators $\mathcal{S}(A^{n})$ converges to $\mathcal{S}(A)$ in trace norm.

Submitted 1 September, 2015; v1 submitted 12 June, 2015; originally announced June 2015.

MSC Class: 28C20

arXiv:1504.06745 [pdf, ps, other]

Extreme points of a ball about a measure with finite support

Authors: Houman Owhadi, Clint Scovel

Abstract: We show that, for the space of Borel probability measures on a Borel subset of a Polish metric space, the extreme points of the Prokhorov, Monge-Wasserstein and Kantorovich metric balls about a measure whose support has at most n points, consist of measures whose supports have at most n+2 points. Moreover, we use the Strassen and Kantorovich-Rubinstein duality theorems to develop representations of supersets of the extreme points based on linear programming, and then develop these representations towards the goal of their efficient computation.

Submitted 28 March, 2016; v1 submitted 25 April, 2015; originally announced April 2015.

MSC Class: 60A10

arXiv:1503.03467 [pdf, other]

Multigrid with rough coefficients and Multiresolution operator decomposition from Hierarchical Information Games

Authors: Houman Owhadi

Abstract: We introduce a near-linear complexity (geometric and meshless/algebraic) multigrid/multiresolution method for PDEs with rough ($L^\infty$) coefficients with rigorous a-priori accuracy and performance estimates. The method is discovered through a decision/game theory formulation of the problems of (1) identifying restriction and interpolation operators (2) recovering a signal from incomplete measurements based on norm constraints on its image under a linear operator (3) gambling on the value of the solution of the PDE based on a hierarchy of nested measurements of its solution or source term. The resulting elementary gambles form a hierarchy of (deterministic) basis functions of $H^1_0(Ω)$ (gamblets) that (1) are orthogonal across subscales/subbands with respect to the scalar product induced by the energy norm of the PDE (2) enable sparse compression of the solution space in $H^1_0(Ω)$ (3) induce an orthogonal multiresolution operator decomposition. The operating diagram of the multigrid method is that of an inverted pyramid in which gamblets are computed locally (by virtue of their exponential decay), hierarchically (from fine to coarse scales) and the PDE is decomposed into a hierarchy of independent linear systems with uniformly bounded condition numbers. The resulting algorithm is parallelizable both in space (via localization) and in bandwidth/subscale (subscales can be computed independently from each other).
Although the method is deterministic it has a natural Bayesian interpretation under the measure of probability emerging (as a mixed strategy) from the information game formulation and multiresolution approximations form a martingale with respect to the filtration induced by the hierarchy of nested measurements.

Submitted 10 February, 2017; v1 submitted 11 March, 2015; originally announced March 2015.

Comments: Presented at SIAM CSE 15. Final (published) version. http://epubs.siam.org/doi/abs/10.1137/15M1013894

MSC Class: 68T99; 65N55; 65F99; 65N75; 62C99; 42C40; 60G42; 68Q25

Journal ref: SIAM Rev. 59-1, pp. 99-149 (2017)

arXiv:1411.3984 [pdf, other]

Qualitative Robustness in Bayesian Inference

Authors: Houman Owhadi, Clint Scovel

Abstract: The practical implementation of Bayesian inference requires numerical approximation when closed-form expressions are not available. What types of accuracy (convergence) of the numerical approximations guarantee robustness and what types do not? In particular, is the recursive application of Bayes' rule robust when subsequent data or posteriors are approximated? When the prior is the push forward of a distribution by the map induced by the solution of a PDE, in which norm should that solution be approximated? Motivated by such questions, we investigate the sensitivity of the distribution of posterior distributions (i.e. posterior distribution-valued random variables, randomized through the data) with respect to perturbations of the prior and data generating distributions in the limit when the number of data points grows towards infinity.

Submitted 20 April, 2016; v1 submitted 14 November, 2014; originally announced November 2014.

arXiv:1406.6668 [pdf, ps, other]

Bayesian Numerical Homogenization

Authors: Houman Owhadi

Abstract: Numerical homogenization, i.e. the finite-dimensional approximation of solution spaces of PDEs with arbitrary rough coefficients, requires the identification of accurate basis elements. These basis elements are oftentimes found after a laborious process of scientific investigation and plain guesswork. Can this identification problem be facilitated? Is there a general recipe/decision framework for guiding the design of basis elements? We suggest that the answer to the above questions could be positive based on the reformulation of numerical homogenization as a Bayesian Inference problem in which a given PDE with rough coefficients (or multi-scale operator) is excited with noise (random right-hand side/source term) and one tries to estimate the value of the solution at a given point based on a finite number of observations. We apply this reformulation to the identification of bases for the numerical homogenization of arbitrary integro-differential equations and show that these bases have optimal recovery properties. In particular, we show how Rough Polyharmonic Splines can be re-discovered as the optimal solution of a Gaussian filtering problem.

Submitted 9 May, 2015; v1 submitted 25 June, 2014; originally announced June 2014.

Comments: 22 pages. To appear in SIAM Multiscale Modeling and Simulation

MSC Class: 41A15; 34E13; 62C10; 60H30

arXiv:1311.7130 [pdf, ps, other]

Convex Optimal Uncertainty Quantification

Authors: Shuo Han, Molei Tao, Ufuk Topcu, Houman Owhadi, Richard M. Murray

Abstract: Optimal uncertainty quantification (OUQ) is a framework for numerical extreme-case analysis of stochastic systems with imperfect knowledge of the underlying probability distribution. This paper presents sufficient conditions under which an OUQ problem can be reformulated as a finite-dimensional convex optimization problem, for which efficient numerical solutions can be obtained. The sufficient conditions include that the objective function is piecewise concave and the constraints are piecewise convex. In particular, we show that piecewise concave objective functions may appear in applications where the objective is defined by the optimal value of a parameterized linear program.

Submitted 27 April, 2015; v1 submitted 27 November, 2013; originally announced November 2013.

Comments: Accepted for publication in SIAM Journal on Optimization

arXiv:1310.6460 [pdf, ps, other]

doi 10.1007/s00205-015-0932-4

Temporal homogenization of linear ODEs, with applications to parametric super-resonance and energy harvest

Authors: Molei Tao, Houman Owhadi

Abstract: We consider the temporal homogenization of linear ODEs of the form $\dot{x}=Ax+εP(t)x+f(t)$, where $P(t)$ is periodic and $ε$ is small. Using a 2-scale expansion approach, we obtain the long-time approximation $x(t)\approx \exp(At) \left( Ω(t)+\int_0^t \exp(-A τ) f(τ) \, dτ\right)$, where $Ω$ solves the cell problem $\dot{Ω}=εB Ω+ εF(t)$ with an effective matrix $B$ and an explicitly-known $F(t)$. We provide a necessary and sufficient condition for the accuracy of the approximation (over a $\mathcal{O}(ε^{-1})$ time-scale), and show how $B$ can be computed (at a cost independent of $ε$). As a direct application, we investigate the possibility of using RLC circuits to harvest the energy contained in small scale oscillations of ambient electromagnetic fields (such as Schumann resonances). Although an RLC circuit parametrically coupled to the field may achieve such energy extraction via parametric resonance, its resistance $R$ needs to be smaller than a threshold $κ$ proportional to the fluctuations of the field, thereby limiting practical applications. We show that if $n$ RLC circuits are appropriately coupled via mutual capacitances or inductances, then energy extraction can be achieved when the resistance of each circuit is smaller than $nκ$. Hence, if the resistance of each circuit has a non-zero fixed value, energy extraction can be made possible through the coupling of a sufficiently large number $n$ of circuits ($n\approx 1000$ for the first mode of Schumann resonances and contemporary values of capacitances, inductances and resistances). The theory is also applied to the control of the oscillation amplitude of a (damped) oscillator.

Submitted 25 September, 2015; v1 submitted 23 October, 2013; originally announced October 2013.

arXiv:1308.6306 [pdf, other]

doi 10.1137/130938633

On the Brittleness of Bayesian Inference

Authors: Houman Owhadi, Clint Scovel, Tim Sullivan

Abstract: With the advent of high-performance computing, Bayesian methods are increasingly popular tools for the quantification of uncertainty throughout science and industry. Since these methods impact the making of sometimes critical decisions in increasingly complicated contexts, the sensitivity of their posterior conclusions with respect to the underlying models and prior beliefs is a pressing question for which there currently exist positive and negative results. We report new results suggesting that, although Bayesian methods are robust when the number of possible outcomes is finite or when only a finite number of marginals of the data-generating distribution are unknown, they could be generically brittle when applied to continuous systems (and their discretizations) with finite information on the data-generating distribution. If closeness is defined in terms of the total variation metric or the matching of a finite system of generalized moments, then (1) two practitioners who use arbitrarily close models and observe the same (possibly arbitrarily large amount of) data may reach opposite conclusions; and (2) any given prior and model can be slightly perturbed to achieve any desired posterior conclusions. The mechanism causing brittleness/robustness suggests that learning and robustness are antagonistic requirements and raises the question of a missing stability condition for using Bayesian Inference in a continuous world under finite information.

Submitted 10 April, 2015; v1 submitted 28 August, 2013; originally announced August 2013.

Comments: 20 pages, 2 figures. To appear in SIAM Review (Research Spotlights). arXiv admin note: text overlap with arXiv:1304.6772

MSC Class: 62A01; 62E20; 62F12; 62F15; 62G20; 62G35

Journal ref: SIAM Rev. 57(4):566–582, 2015

Showing 1–50 of 83 results for author: Owhadi, H