-
Probabilistic Learning on Manifolds (PLoM) with Partition
Authors:
Christian Soize,
Roger Ghanem
Abstract:
The probabilistic learning on manifolds (PLoM) introduced in 2016 has solved difficult supervised problems for the ``small data'' limit where the number N of points in the training set is small. Many extensions have since been proposed, making it possible to deal with increasingly complex cases. However, the performance limit has been observed and explained for applications for which $N$ is very s…
▽ More
The probabilistic learning on manifolds (PLoM) introduced in 2016 has solved difficult supervised problems for the ``small data'' limit where the number N of points in the training set is small. Many extensions have since been proposed, making it possible to deal with increasingly complex cases. However, the performance limit has been observed and explained for applications for which $N$ is very small (50 for example) and for which the dimension of the diffusion-map basis is close to $N$. For these cases, we propose a novel extension based on the introduction of a partition in independent random vectors. We take advantage of this novel development to present improvements of the PLoM such as a simplified algorithm for constructing the diffusion-map basis and a new mathematical result for quantifying the concentration of the probability measure in terms of a probability upper bound. The analysis of the efficiency of this novel extension is presented through two applications.
△ Less
Submitted 22 February, 2021;
originally announced February 2021.
-
Probabilistic learning on manifolds constrained by nonlinear partial differential equations for small datasets
Authors:
Christian Soize,
Roger Ghanem
Abstract:
A novel extension of the Probabilistic Learning on Manifolds (PLoM) is presented. It makes it possible to synthesize solutions to a wide range of nonlinear stochastic boundary value problems described by partial differential equations (PDEs) for which a stochastic computational model (SCM) is available and depends on a vector-valued random control parameter. The cost of a single numerical evaluati…
▽ More
A novel extension of the Probabilistic Learning on Manifolds (PLoM) is presented. It makes it possible to synthesize solutions to a wide range of nonlinear stochastic boundary value problems described by partial differential equations (PDEs) for which a stochastic computational model (SCM) is available and depends on a vector-valued random control parameter. The cost of a single numerical evaluation of this SCM is assumed to be such that only a limited number of points can be computed for constructing the training dataset (small data). Each point of the training dataset is made up realizations from a vector-valued stochastic process (the stochastic solution) and the associated random control parameter on which it depends. The presented PLoM constrained by PDE allows for generating a large number of learned realizations of the stochastic process and its corresponding random control parameter. These learned realizations are generated so as to minimize the vector-valued random residual of the PDE in the mean-square sense. Appropriate novel methods are developed to solve this challenging problem. Three applications are presented. The first one is a simple uncertain nonlinear dynamical system with a nonstationary stochastic excitation. The second one concerns the 2D nonlinear unsteady Navier-Stokes equations for incompressible flows in which the Reynolds number is the random control parameter. The last one deals with the nonlinear dynamics of a 3D elastic structure with uncertainties. The results obtained make it possible to validate the PLoM constrained by stochastic PDE but also provide further validation of the PLoM without constraint.
△ Less
Submitted 27 October, 2020;
originally announced October 2020.
-
Normal-bundle Bootstrap
Authors:
Ruda Zhang,
Roger Ghanem
Abstract:
Probabilistic models of data sets often exhibit salient geometric structure. Such a phenomenon is summed up in the manifold distribution hypothesis, and can be exploited in probabilistic learning. Here we present normal-bundle bootstrap (NBB), a method that generates new data which preserve the geometric structure of a given data set. Inspired by algorithms for manifold learning and concepts in di…
▽ More
Probabilistic models of data sets often exhibit salient geometric structure. Such a phenomenon is summed up in the manifold distribution hypothesis, and can be exploited in probabilistic learning. Here we present normal-bundle bootstrap (NBB), a method that generates new data which preserve the geometric structure of a given data set. Inspired by algorithms for manifold learning and concepts in differential geometry, our method decomposes the underlying probability measure into a marginalized measure on a learned data manifold and conditional measures on the normal spaces. The algorithm estimates the data manifold as a density ridge, and constructs new data by bootstrap** projection vectors and adding them to the ridge. We apply our method to the inference of density ridge and related statistics, and data augmentation to reduce overfitting.
△ Less
Submitted 27 July, 2020;
originally announced July 2020.
-
Environmental Economics and Uncertainty: Review and a Machine Learning Outlook
Authors:
Ruda Zhang,
Patrick Wingo,
Rodrigo Duran,
Kelly Rose,
Jennifer Bauer,
Roger Ghanem
Abstract:
Economic assessment in environmental science concerns the measurement or valuation of environmental impacts, adaptation, and vulnerability. Integrated assessment modeling is a unifying framework of environmental economics, which attempts to combine key elements of physical, ecological, and socioeconomic systems. Uncertainty characterization in integrated assessment varies by component models: unce…
▽ More
Economic assessment in environmental science concerns the measurement or valuation of environmental impacts, adaptation, and vulnerability. Integrated assessment modeling is a unifying framework of environmental economics, which attempts to combine key elements of physical, ecological, and socioeconomic systems. Uncertainty characterization in integrated assessment varies by component models: uncertainties associated with mechanistic physical models are often assessed with an ensemble of simulations or Monte Carlo sampling, while uncertainties associated with impact models are evaluated by conjecture or econometric analysis. Manifold sampling is a machine learning technique that constructs a joint probability model of all relevant variables which may be concentrated on a low-dimensional geometric structure. Compared with traditional density estimation methods, manifold sampling is more efficient especially when the data is generated by a few latent variables. The manifold-constrained joint probability model helps answer policy-making questions from prediction, to response, and prevention. Manifold sampling is applied to assess risk of offshore drilling in the Gulf of Mexico.
△ Less
Submitted 24 April, 2020;
originally announced April 2020.
-
Sampling of Bayesian posteriors with a non-Gaussian probabilistic learning on manifolds from a small dataset
Authors:
Christian Soize,
Roger Ghanem
Abstract:
This paper tackles the challenge presented by small-data to the task of Bayesian inference. A novel methodology, based on manifold learning and manifold sampling, is proposed for solving this computational statistics problem under the following assumptions: 1) neither the prior model nor the likelihood function are Gaussian and neither can be approximated by a Gaussian measure; 2) the number of fu…
▽ More
This paper tackles the challenge presented by small-data to the task of Bayesian inference. A novel methodology, based on manifold learning and manifold sampling, is proposed for solving this computational statistics problem under the following assumptions: 1) neither the prior model nor the likelihood function are Gaussian and neither can be approximated by a Gaussian measure; 2) the number of functional input (system parameters) and functional output (quantity of interest) can be large; 3) the number of available realizations of the prior model is small, leading to the small-data challenge typically associated with expensive numerical simulations; the number of experimental realizations is also small; 4) the number of the posterior realizations required for decision is much larger than the available initial dataset. The method and its mathematical aspects are detailed. Three applications are presented for validation: The first two involve mathematical constructions aimed to develop intuition around the method and to explore its performance. The third example aims to demonstrate the operational value of the method using a more complex application related to the statistical inverse identification of the non-Gaussian matrix-valued random elasticity field of a damaged biological tissue (osteoporosis in a cortical bone) using ultrasonic waves.
△ Less
Submitted 28 October, 2019;
originally announced October 2019.
-
Data-driven discovery of free-form governing differential equations
Authors:
Steven Atkinson,
Waad Subber,
Li** Wang,
Genghis Khan,
Philippe Hawi,
Roger Ghanem
Abstract:
We present a method of discovering governing differential equations from data without the need to specify a priori the terms to appear in the equation. The input to our method is a dataset (or ensemble of datasets) corresponding to a particular solution (or ensemble of particular solutions) of a differential equation. The output is a human-readable differential equation with parameters calibrated…
▽ More
We present a method of discovering governing differential equations from data without the need to specify a priori the terms to appear in the equation. The input to our method is a dataset (or ensemble of datasets) corresponding to a particular solution (or ensemble of particular solutions) of a differential equation. The output is a human-readable differential equation with parameters calibrated to the individual particular solutions provided. The key to our method is to learn differentiable models of the data that subsequently serve as inputs to a genetic programming algorithm in which graphs specify computation over arbitrary compositions of functions, parameters, and (potentially differential) operators on functions. Differential operators are composed and evaluated using recursive application of automatic differentiation, allowing our algorithm to explore arbitrary compositions of operators without the need for human intervention. We also demonstrate an active learning process to identify and remedy deficiencies in the proposed governing equations.
△ Less
Submitted 11 November, 2019; v1 submitted 26 September, 2019;
originally announced October 2019.
-
Entropy-based closure for probabilistic learning on manifolds
Authors:
C. Soizea,
R. Ghanem,
C. Safta,
X. Huan,
Z. P. Vane,
J. Oefelein,
G. Lacaz,
H. N. Najm,
Q. Tang,
X. Chen
Abstract:
In a recent paper, the authors proposed a general methodology for probabilistic learning on manifolds. The method was used to generate numerical samples that are statistically consistent with an existing dataset construed as a realization from a non-Gaussian random vector. The manifold structure is learned using diffusion manifolds and the statistical sample generation is accomplished using a proj…
▽ More
In a recent paper, the authors proposed a general methodology for probabilistic learning on manifolds. The method was used to generate numerical samples that are statistically consistent with an existing dataset construed as a realization from a non-Gaussian random vector. The manifold structure is learned using diffusion manifolds and the statistical sample generation is accomplished using a projected Ito stochastic differential equation. This probabilistic learning approach has been extended to polynomial chaos representation of databases on manifolds and to probabilistic nonconvex constrained optimization with a fixed budget of function evaluations. The methodology introduces an isotropic-diffusion kernel with hyperparameter ε. Currently, ε is more or less arbitrarily chosen. In this paper, we propose a selection criterion for identifying an optimal value of ε, based on a maximum entropy argument. The result is a comprehensive, closed, probabilistic model for characterizing data sets with hidden constraints. This entropy argument ensures that out of all possible models, this is the one that is the most uncertain beyond any specified constraints, which is selected. Applications are presented for several databases.
△ Less
Submitted 28 March, 2018; v1 submitted 21 March, 2018;
originally announced March 2018.
-
Compressive sensing adaptation for polynomial chaos expansions
Authors:
Panagiotis Tsilifis,
Xun Huan,
Cosmin Safta,
Khachik Sargsyan,
Guilhem Lacaze,
Joseph C. Oefelein,
Habib N. Najm,
Roger G. Ghanem
Abstract:
Basis adaptation in Homogeneous Chaos spaces rely on a suitable rotation of the underlying Gaussian germ. Several rotations have been proposed in the literature resulting in adaptations with different convergence properties. In this paper we present a new adaptation mechanism that builds on compressive sensing algorithms, resulting in a reduced polynomial chaos approximation with optimal sparsity.…
▽ More
Basis adaptation in Homogeneous Chaos spaces rely on a suitable rotation of the underlying Gaussian germ. Several rotations have been proposed in the literature resulting in adaptations with different convergence properties. In this paper we present a new adaptation mechanism that builds on compressive sensing algorithms, resulting in a reduced polynomial chaos approximation with optimal sparsity. The developed adaptation algorithm consists of a two-step optimization procedure that computes the optimal coefficients and the input projection matrix of a low dimensional chaos expansion with respect to an optimally rotated basis. We demonstrate the attractive features of our algorithm through several numerical examples including the application on Large-Eddy Simulation (LES) calculations of turbulent combustion in a HIFiRE scramjet engine.
△ Less
Submitted 27 November, 2018; v1 submitted 5 January, 2018;
originally announced January 2018.
-
Reduced Wiener Chaos representation of random fields via basis adaptation and projection
Authors:
Panagiotis Tsilifis,
Roger Ghanem
Abstract:
A new characterization of random fields appearing in physical models is presented that is based on their well-known Homogeneous Chaos expansions. We take advantage of the adaptation capabilities of these expansions where the core idea is to rotate the basis of the underlying Gaussian Hilbert space, in order to achieve reduced functional representations that concentrate the induced probability meas…
▽ More
A new characterization of random fields appearing in physical models is presented that is based on their well-known Homogeneous Chaos expansions. We take advantage of the adaptation capabilities of these expansions where the core idea is to rotate the basis of the underlying Gaussian Hilbert space, in order to achieve reduced functional representations that concentrate the induced probability measure in a lower dimensional subspace. For a smooth family of rotations along the domain of interest, the uncorrelated Gaussian inputs are transformed into a Gaussian process, thus introducing a mesoscale that captures intermediate characteristics of the quantity of interest.
△ Less
Submitted 21 March, 2016; v1 submitted 15 March, 2016;
originally announced March 2016.
-
Efficient Bayesian experimentation using an expected information gain lower bound
Authors:
Panagiotis Tsilifis,
Roger G. Ghanem,
Paris Hajali
Abstract:
Experimental design is crucial for inference where limitations in the data collection procedure are present due to cost or other restrictions. Optimal experimental designs determine parameters that in some appropriate sense make the data the most informative possible. In a Bayesian setting this is translated to updating to the best possible posterior. Information theoretic arguments have led to th…
▽ More
Experimental design is crucial for inference where limitations in the data collection procedure are present due to cost or other restrictions. Optimal experimental designs determine parameters that in some appropriate sense make the data the most informative possible. In a Bayesian setting this is translated to updating to the best possible posterior. Information theoretic arguments have led to the formation of the expected information gain as a design criterion. This can be evaluated mainly by Monte Carlo sampling and maximized by using stochastic approximation methods, both known for being computationally expensive tasks. We propose a framework where a lower bound of the expected information gain is used as an alternative design criterion. In addition to alleviating the computational burden, this also addresses issues concerning estimation bias. The problem of permeability inference in a large contaminated area is used to demonstrate the validity of our approach where we employ the massively parallel version of the multiphase multicomponent simulator TOUGH2 to simulate contaminant transport and a Polynomial Chaos approximation of the forward model that further accelerates the objective function evaluations. The proposed methodology is demonstrated to a setting where field measurements are available.
△ Less
Submitted 10 March, 2016; v1 submitted 29 May, 2015;
originally announced June 2015.