Search | arXiv e-print repository

Probabilistic Learning on Manifolds (PLoM) with Partition

Abstract: The probabilistic learning on manifolds (PLoM) introduced in 2016 has solved difficult supervised problems in the "small data" limit, where the number $N$ of points in the training set is small. Many extensions have since been proposed, making it possible to deal with increasingly complex cases. However, a performance limit has been observed and explained for applications in which $N$ is very small (50, for example) and the dimension of the diffusion-map basis is close to $N$. For these cases, we propose a novel extension based on the introduction of a partition into independent random vectors. We take advantage of this development to present improvements of the PLoM, such as a simplified algorithm for constructing the diffusion-map basis and a new mathematical result that quantifies the concentration of the probability measure in terms of a probability upper bound. The efficiency of this novel extension is analyzed through two applications.

Submitted 22 February, 2021; originally announced February 2021.

Comments: 20 pages, 13 figures, preprint

MSC Class: 68Q32; 60E05; 62H10; 65C20; 62P35
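The abstract above refers to a simplified algorithm for constructing the diffusion-map basis, which is not reproduced here. As a rough illustration only, the sketch below builds a standard diffusion-map basis from $N$ points with an isotropic Gaussian kernel of bandwidth `eps`. The function name `diffusion_map_basis`, the kernel scaling by `4 * eps`, and the `(N, nu)` data layout are assumptions, not the paper's construction.

```python
import numpy as np


def diffusion_map_basis(eta, eps, m):
    """Leading m eigenpairs of a diffusion-map transition matrix.

    eta : (N, nu) array, one training point per row (assumed layout).
    eps : bandwidth of the isotropic Gaussian kernel (hypothetical parameter name).
    m   : number of basis vectors to keep, m <= N.
    """
    # Pairwise squared Euclidean distances between the N points.
    sq = np.sum((eta[:, None, :] - eta[None, :, :]) ** 2, axis=-1)
    K = np.exp(-sq / (4.0 * eps))        # isotropic diffusion kernel
    b = K.sum(axis=1)                    # row sums (degree of each point)
    # Symmetric matrix B^{-1/2} K B^{-1/2}; it shares its eigenvalues with P = B^{-1} K.
    Ps = K / np.sqrt(np.outer(b, b))
    lam, phi = np.linalg.eigh(Ps)
    order = np.argsort(lam)[::-1]        # sort eigenvalues in decreasing order
    lam, phi = lam[order], phi[:, order]
    # Right eigenvectors of P are B^{-1/2} times those of the symmetric matrix.
    psi = phi / np.sqrt(b)[:, None]
    return lam[:m], psi[:, :m]


rng = np.random.default_rng(0)
eta = rng.standard_normal((50, 3))       # N = 50 points, the "very small N" regime
lam, g = diffusion_map_basis(eta, eps=1.0, m=5)
```

Here the first eigenvalue equals 1 and the first basis vector is constant, as expected for a Markov transition matrix. Solving the symmetric eigenproblem and mapping back keeps the spectrum real and numerically stable.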

arXiv:2010.14324

doi 10.1016/j.cma.2021.113777

Probabilistic learning on manifolds constrained by nonlinear partial differential equations for small datasets

Authors: Christian Soize, Roger Ghanem

Abstract: A novel extension of the Probabilistic Learning on Manifolds (PLoM) is presented. It makes it possible to synthesize solutions to a wide range of nonlinear stochastic boundary value problems described by partial differential equations (PDEs) for which a stochastic computational model (SCM) is available and depends on a vector-valued random control parameter. The cost of a single numerical evaluation of this SCM is assumed to be such that only a limited number of points can be computed for constructing the training dataset (small data). Each point of the training dataset is made up of realizations of a vector-valued stochastic process (the stochastic solution) and the associated random control parameter on which it depends. The presented PLoM constrained by PDE allows for generating a large number of learned realizations of the stochastic process and its corresponding random control parameter. These learned realizations are generated so as to minimize the vector-valued random residual of the PDE in the mean-square sense. Appropriate novel methods are developed to solve this challenging problem. Three applications are presented. The first one is a simple uncertain nonlinear dynamical system with a nonstationary stochastic excitation. The second one concerns the 2D nonlinear unsteady Navier-Stokes equations for incompressible flows in which the Reynolds number is the random control parameter. The last one deals with the nonlinear dynamics of a 3D elastic structure with uncertainties. The results obtained make it possible to validate the PLoM constrained by stochastic PDE and also provide further validation of the PLoM without constraint.

Submitted 27 October, 2020; originally announced October 2020.

arXiv:2007.13869

Normal-bundle Bootstrap

Authors: Ruda Zhang, Roger Ghanem

Abstract: Probabilistic models of data sets often exhibit salient geometric structure. This phenomenon is summed up in the manifold distribution hypothesis and can be exploited in probabilistic learning. Here we present normal-bundle bootstrap (NBB), a method that generates new data that preserve the geometric structure of a given data set. Inspired by algorithms for manifold learning and concepts in differential geometry, our method decomposes the underlying probability measure into a marginalized measure on a learned data manifold and conditional measures on the normal spaces. The algorithm estimates the data manifold as a density ridge and constructs new data by bootstrapping projection vectors and adding them to the ridge. We apply our method to the inference of density ridges and related statistics, and to data augmentation to reduce overfitting.

Submitted 27 July, 2020; originally announced July 2020.

MSC Class: 37M22; 53-08; 53A07; 62F40; 62G09
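To make the decomposition in the NBB abstract concrete (ridge point plus a bootstrapped projection vector in the normal space), here is a toy sketch. It deliberately skips the hard part of NBB, estimating the density ridge: it assumes the ridge is known to be the unit circle, so projection is just normalization. The function name `nbb_circle` and the neighbourhood size `k` are hypothetical choices for illustration.

```python
import numpy as np


def nbb_circle(x, k=10, n_new=5, rng=None):
    """Toy normal-bundle bootstrap for 2-D points scattered around the unit circle.

    Simplifying assumption (not part of NBB): the density ridge is known to be
    the unit circle, so projecting onto it is just normalization, and the normal
    space at a ridge point p is spanned by p itself.
    """
    rng = np.random.default_rng() if rng is None else rng
    r = np.linalg.norm(x, axis=1)
    p = x / r[:, None]          # projection of each point onto the ridge
    s = r - 1.0                 # projection vector x - p, in normal-space coordinates
    # The k nearest ridge points (including the point itself) approximate the
    # conditional measure on the normal space at each ridge point.
    d = np.linalg.norm(p[:, None, :] - p[None, :, :], axis=-1)
    nbrs = np.argsort(d, axis=1)[:, :k]
    new = []
    for i in range(len(x)):
        # Resample normal coordinates from the neighbours and add them at p_i.
        draws = s[rng.choice(nbrs[i], size=n_new)]
        new.append(p[i] + draws[:, None] * p[i])
    return np.concatenate(new)


rng = np.random.default_rng(1)
theta = rng.uniform(0.0, 2.0 * np.pi, 200)
radius = 1.0 + 0.05 * rng.standard_normal(200)
x = np.column_stack([radius * np.cos(theta), radius * np.sin(theta)])
x_new = nbb_circle(x, k=10, n_new=5, rng=rng)
```

Every generated point keeps the angular position of an existing ridge point and reuses an observed normal offset, so the augmented set stays concentrated around the same curve.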

arXiv:2004.11780

doi 10.1093/acrefore/9780199389414.013.572

Environmental Economics and Uncertainty: Review and a Machine Learning Outlook

Authors: Ruda Zhang, Patrick Wingo, Rodrigo Duran, Kelly Rose, Jennifer Bauer, Roger Ghanem

Abstract: Economic assessment in environmental science concerns the measurement or valuation of environmental impacts, adaptation, and vulnerability. Integrated assessment modeling is a unifying framework of environmental economics that attempts to combine key elements of physical, ecological, and socioeconomic systems. Uncertainty characterization in integrated assessment varies by component model: uncertainties associated with mechanistic physical models are often assessed with an ensemble of simulations or Monte Carlo sampling, while uncertainties associated with impact models are evaluated by conjecture or econometric analysis. Manifold sampling is a machine learning technique that constructs a joint probability model of all relevant variables, which may be concentrated on a low-dimensional geometric structure. Compared with traditional density estimation methods, manifold sampling is more efficient, especially when the data are generated by a few latent variables. The manifold-constrained joint probability model helps answer policy-making questions ranging from prediction to response and prevention. Manifold sampling is applied to assess the risk of offshore drilling in the Gulf of Mexico.

Submitted 24 April, 2020; originally announced April 2020.

Comments: 24 pages, 7 figures, 1 table. In Oxford Research Encyclopedia of Environmental Science. Oxford University Press

MSC Class: 58J65; 62H05

arXiv:1910.12717

Sampling of Bayesian posteriors with a non-Gaussian probabilistic learning on manifolds from a small dataset

Authors: Christian Soize, Roger Ghanem

Abstract: This paper tackles the challenge that small data pose for Bayesian inference. A novel methodology, based on manifold learning and manifold sampling, is proposed for solving this computational statistics problem under the following assumptions: 1) neither the prior model nor the likelihood function is Gaussian, and neither can be approximated by a Gaussian measure; 2) the numbers of functional inputs (system parameters) and functional outputs (quantities of interest) can be large; 3) the number of available realizations of the prior model is small, leading to the small-data challenge typically associated with expensive numerical simulations, and the number of experimental realizations is also small; 4) the number of posterior realizations required for decision-making is much larger than the size of the available initial dataset. The method and its mathematical aspects are detailed. Three applications are presented for validation. The first two are mathematical constructions aimed at developing intuition about the method and exploring its performance. The third demonstrates the operational value of the method on a more complex application: the statistical inverse identification of the non-Gaussian matrix-valued random elasticity field of a damaged biological tissue (osteoporosis in cortical bone) using ultrasonic waves.

Submitted 28 October, 2019; originally announced October 2019.

MSC Class: 60J20; 62F15; 68F15

arXiv:1910.05117

Data-driven discovery of free-form governing differential equations

Authors: Steven Atkinson, Waad Subber, Liping Wang, Genghis Khan, Philippe Hawi, Roger Ghanem

Abstract: We present a method of discovering governing differential equations from data without the need to specify a priori the terms to appear in the equation. The input to our method is a dataset (or ensemble of datasets) corresponding to a particular solution (or ensemble of particular solutions) of a differential equation. The output is a human-readable differential equation with parameters calibrated to the individual particular solutions provided. The key to our method is to learn differentiable models of the data that subsequently serve as inputs to a genetic programming algorithm in which graphs specify computation over arbitrary compositions of functions, parameters, and (potentially differential) operators on functions. Differential operators are composed and evaluated using recursive application of automatic differentiation, allowing our algorithm to explore arbitrary compositions of operators without the need for human intervention. We also demonstrate an active learning process to identify and remedy deficiencies in the proposed governing equations.

Submitted 11 November, 2019; v1 submitted 26 September, 2019; originally announced October 2019.

Comments: Approved for public release; distribution is unlimited

arXiv:1803.08161

Entropy-based closure for probabilistic learning on manifolds

Authors: C. Soize, R. Ghanem, C. Safta, X. Huan, Z. P. Vane, J. Oefelein, G. Lacaze, H. N. Najm, Q. Tang, X. Chen

Abstract: In a recent paper, the authors proposed a general methodology for probabilistic learning on manifolds. The method was used to generate numerical samples that are statistically consistent with an existing dataset construed as a realization from a non-Gaussian random vector. The manifold structure is learned using diffusion manifolds, and the statistical sample generation is accomplished using a projected Itô stochastic differential equation. This probabilistic learning approach has been extended to polynomial chaos representation of databases on manifolds and to probabilistic nonconvex constrained optimization with a fixed budget of function evaluations. The methodology introduces an isotropic-diffusion kernel with hyperparameter ε. Currently, ε is chosen more or less arbitrarily. In this paper, we propose a selection criterion for identifying an optimal value of ε, based on a maximum entropy argument. The result is a comprehensive, closed, probabilistic model for characterizing data sets with hidden constraints. This entropy argument ensures that, out of all possible models, the one selected is the most uncertain beyond any specified constraints. Applications are presented for several databases.

Submitted 28 March, 2018; v1 submitted 21 March, 2018; originally announced March 2018.

Comments: A co-author is not happy with the paper and would like to withdraw the submission and improve the paper

arXiv:1801.01961

doi 10.1016/j.jcp.2018.12.010

Compressive sensing adaptation for polynomial chaos expansions

Authors: Panagiotis Tsilifis, Xun Huan, Cosmin Safta, Khachik Sargsyan, Guilhem Lacaze, Joseph C. Oefelein, Habib N. Najm, Roger G. Ghanem

Abstract: Basis adaptation in Homogeneous Chaos spaces relies on a suitable rotation of the underlying Gaussian germ. Several rotations have been proposed in the literature, resulting in adaptations with different convergence properties. In this paper we present a new adaptation mechanism that builds on compressive sensing algorithms, resulting in a reduced polynomial chaos approximation with optimal sparsity. The developed adaptation algorithm consists of a two-step optimization procedure that computes the optimal coefficients and the input projection matrix of a low-dimensional chaos expansion with respect to an optimally rotated basis. We demonstrate the attractive features of our algorithm through several numerical examples, including an application to Large-Eddy Simulation (LES) calculations of turbulent combustion in a HIFiRE scramjet engine.

Submitted 27 November, 2018; v1 submitted 5 January, 2018; originally announced January 2018.

Comments: Submitted to Journal of Computational Physics

Journal ref: Journal of Computational Physics 380 (2019) 29-47
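The compressive-sensing ingredient mentioned in this abstract can be illustrated with a generic sparse-recovery step. The sketch below uses orthogonal matching pursuit (OMP) to recover a sparse Hermite polynomial chaos expansion in two Gaussian variables. It omits the paper's two-step optimization and the rotation of the basis entirely; OMP is a swapped-in standard technique, and the helper names `hermite_design` and `omp` are hypothetical.

```python
import itertools

import numpy as np
from numpy.polynomial.hermite_e import hermeval


def hermite_design(xi, degree):
    """Design matrix of products of probabilists' Hermite polynomials He_k.

    Columns cover every multi-index with total degree <= degree.
    """
    d = xi.shape[1]
    idx = [a for a in itertools.product(range(degree + 1), repeat=d) if sum(a) <= degree]
    cols = []
    for a in idx:
        col = np.ones(len(xi))
        for j, k in enumerate(a):
            c = np.zeros(k + 1)
            c[k] = 1.0                      # coefficient vector that selects He_k
            col = col * hermeval(xi[:, j], c)
        cols.append(col)
    return np.column_stack(cols), idx


def omp(A, y, n_terms):
    """Orthogonal matching pursuit: greedily add the column most correlated
    with the residual, then refit all selected coefficients by least squares."""
    support, resid = [], y.copy()
    norms = np.linalg.norm(A, axis=0)
    for _ in range(n_terms):
        j = int(np.argmax(np.abs(A.T @ resid) / norms))
        if j not in support:
            support.append(j)
        sol, *_ = np.linalg.lstsq(A[:, support], y, rcond=None)
        resid = y - A[:, support] @ sol
    coef = np.zeros(A.shape[1])
    coef[support] = sol
    return coef


rng = np.random.default_rng(0)
xi = rng.standard_normal((200, 2))
A, idx = hermite_design(xi, degree=3)
# Sparse target: 1 + 2 He_1(xi_1) + 0.5 He_2(xi_2), with He_2(t) = t^2 - 1.
y = 1.0 + 2.0 * xi[:, 0] + 0.5 * (xi[:, 1] ** 2 - 1.0)
coef = omp(A, y, n_terms=5)
```

With a noise-free sparse target, OMP selects the three active terms and the least-squares refit returns their coefficients, leaving the other seven entries at zero.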

arXiv:1603.04803

doi 10.1016/j.jcp.2017.04.009

Reduced Wiener Chaos representation of random fields via basis adaptation and projection

Authors: Panagiotis Tsilifis, Roger Ghanem

Abstract: A new characterization of random fields appearing in physical models is presented that is based on their well-known Homogeneous Chaos expansions. We take advantage of the adaptation capabilities of these expansions where the core idea is to rotate the basis of the underlying Gaussian Hilbert space, in order to achieve reduced functional representations that concentrate the induced probability measure in a lower dimensional subspace. For a smooth family of rotations along the domain of interest, the uncorrelated Gaussian inputs are transformed into a Gaussian process, thus introducing a mesoscale that captures intermediate characteristics of the quantity of interest.

Submitted 21 March, 2016; v1 submitted 15 March, 2016; originally announced March 2016.

Comments: Submitted to the Journal of Computational Physics
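The core idea in this abstract, rotating the basis of the underlying Gaussian space so that a quantity of interest concentrates on few directions, can be sketched with one common choice of rotation: aligning the first new variable with the normalized first-order chaos coefficients. Whether this paper uses exactly this rotation is an assumption, and the function name `gaussian_adaptation_rotation` and the toy quantity of interest are hypothetical.

```python
import numpy as np


def gaussian_adaptation_rotation(xi, q):
    """Orthogonal matrix A whose first row is the normalized vector of
    first-order Hermite chaos coefficients of q, so eta = A xi is again
    standard Gaussian and eta_1 carries the linear part of q.

    xi : (N, d) samples of the standard Gaussian germ.
    q  : (N,) evaluations of the quantity of interest at those samples.
    """
    # First-order coefficients E[q xi_i], estimated by Monte Carlo.
    a = xi.T @ q / len(q)
    a = a / np.linalg.norm(a)
    # Complete a to an orthonormal basis: QR of [a | I] gives Q with Q[:, 0] = +/- a.
    Q, _ = np.linalg.qr(np.column_stack([a, np.eye(len(a))]))
    if Q[:, 0] @ a < 0:         # QR fixes columns only up to sign
        Q = -Q
    return Q.T


rng = np.random.default_rng(2)
w = np.array([1.0, 2.0, 0.0, 0.0]) / np.sqrt(5.0)
xi = rng.standard_normal((100_000, 4))
t = xi @ w
q = t + 0.3 * t ** 2            # toy QoI that depends on xi only through w . xi
A = gaussian_adaptation_rotation(xi, q)
eta = xi @ A.T                  # rotated germ, one sample per row
```

Because the rotation is orthogonal, the new variables remain independent standard Gaussians; in this toy case the first one recovers the single direction `w` on which the quantity of interest depends, so a one-dimensional expansion in `eta[:, 0]` suffices.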

arXiv:1506.00053

Efficient Bayesian experimentation using an expected information gain lower bound

Authors: Panagiotis Tsilifis, Roger G. Ghanem, Paris Hajali

Abstract: Experimental design is crucial for inference when the data collection procedure is limited by cost or other restrictions. Optimal experimental designs determine parameters that, in some appropriate sense, make the data as informative as possible. In a Bayesian setting, this translates to updating to the best possible posterior. Information-theoretic arguments have led to the formulation of the expected information gain as a design criterion. This criterion can be evaluated mainly by Monte Carlo sampling and maximized using stochastic approximation methods, both of which are known to be computationally expensive. We propose a framework in which a lower bound of the expected information gain is used as an alternative design criterion. In addition to alleviating the computational burden, this also addresses issues concerning estimation bias. The problem of permeability inference in a large contaminated area is used to demonstrate the validity of our approach: we employ the massively parallel version of the multiphase multicomponent simulator TOUGH2 to simulate contaminant transport, together with a Polynomial Chaos approximation of the forward model that further accelerates the objective function evaluations. The proposed methodology is demonstrated in a setting where field measurements are available.

Submitted 10 March, 2016; v1 submitted 29 May, 2015; originally announced June 2015.
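For reference, the expensive baseline that this abstract contrasts with its lower bound is the Monte Carlo estimate of the expected information gain. The sketch below shows the standard nested Monte Carlo estimator, not the lower bound proposed in the paper, on a toy linear-Gaussian model (prior N(0, 1), observation y = theta + sigma * noise) whose exact value, 0.5 log(1 + 1/sigma^2), is known. The model, function names, and sample sizes are illustrative assumptions.

```python
import numpy as np


def log_lik(y, theta, sigma):
    """Gaussian log-likelihood for the toy model y = theta + sigma * noise."""
    return -0.5 * ((y - theta) / sigma) ** 2 - np.log(sigma * np.sqrt(2.0 * np.pi))


def eig_nested_mc(sigma, n_outer=2000, n_inner=2000, rng=None):
    """Nested Monte Carlo estimate of the expected information gain
    E[log p(y | theta) - log p(y)] for a standard normal prior on theta."""
    rng = np.random.default_rng() if rng is None else rng
    theta = rng.standard_normal(n_outer)                # outer prior draws
    y = theta + sigma * rng.standard_normal(n_outer)    # simulated observations
    theta_in = rng.standard_normal(n_inner)             # inner draws for the evidence p(y)
    ll = log_lik(y[:, None], theta_in[None, :], sigma)  # shape (n_outer, n_inner)
    m = ll.max(axis=1)
    # Log of the mean likelihood, computed stably (log-sum-exp trick).
    log_evidence = m + np.log(np.mean(np.exp(ll - m[:, None]), axis=1))
    return float(np.mean(log_lik(y, theta, sigma) - log_evidence))


sigma = 1.0
estimate = eig_nested_mc(sigma, rng=np.random.default_rng(3))
exact = 0.5 * np.log(1.0 + 1.0 / sigma ** 2)   # closed form for this linear-Gaussian model
```

Each outer sample requires `n_inner` likelihood evaluations to estimate the evidence, so the cost grows as `n_outer * n_inner` forward evaluations, and the finite inner sum biases the estimate upward. These are the cost and bias issues that motivate replacing the criterion with a cheaper bound.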

Showing 1–10 of 10 results for author: Ghanem, R