-
Causal disentanglement of multimodal data
Authors:
Elise Walker,
Jonas A. Actor,
Carianne Martinez,
Nathaniel Trask
Abstract:
Causal representation learning algorithms discover lower-dimensional representations of data that admit a decipherable interpretation of cause and effect; as achieving such interpretable representations is challenging, many causal learning algorithms utilize elements indicating prior information, such as (linear) structural causal models, interventional data, or weak supervision. Unfortunately, in…
▽ More
Causal representation learning algorithms discover lower-dimensional representations of data that admit a decipherable interpretation of cause and effect; as achieving such interpretable representations is challenging, many causal learning algorithms utilize elements indicating prior information, such as (linear) structural causal models, interventional data, or weak supervision. Unfortunately, in exploratory causal representation learning, such elements and prior information may not be available or warranted. Alternatively, scientific datasets often have multiple modalities or physics-based constraints, and the use of such scientific, multimodal data has been shown to improve disentanglement in fully unsupervised settings. Consequently, we introduce a causal representation learning algorithm (causalPIMA) that can use multimodal data and known physics to discover important features with causal relationships. Our innovative algorithm utilizes a new differentiable parametrization to learn a directed acyclic graph (DAG) together with a latent space of a variational autoencoder in an end-to-end differentiable framework via a single, tractable evidence lower bound loss function. We place a Gaussian mixture prior on the latent space and identify each of the mixtures with an outcome of the DAG nodes; this novel identification enables feature discovery with causal relationships. Tested against a synthetic and a scientific dataset, our results demonstrate the capability of learning an interpretable causal structure while simultaneously discovering key features in a fully unsupervised setting.
△ Less
Submitted 8 November, 2023; v1 submitted 27 October, 2023;
originally announced October 2023.
-
Unsupervised physics-informed disentanglement of multimodal data for high-throughput scientific discovery
Authors:
Nathaniel Trask,
Carianne Martinez,
Kook** Lee,
Brad Boyce
Abstract:
We introduce physics-informed multimodal autoencoders (PIMA) - a variational inference framework for discovering shared information in multimodal scientific datasets representative of high-throughput testing. Individual modalities are embedded into a shared latent space and fused through a product of experts formulation, enabling a Gaussian mixture prior to identify shared features. Sampling from…
▽ More
We introduce physics-informed multimodal autoencoders (PIMA) - a variational inference framework for discovering shared information in multimodal scientific datasets representative of high-throughput testing. Individual modalities are embedded into a shared latent space and fused through a product of experts formulation, enabling a Gaussian mixture prior to identify shared features. Sampling from clusters allows cross-modal generative modeling, with a mixture of expert decoder imposing inductive biases encoding prior scientific knowledge and imparting structured disentanglement of the latent space. This approach enables discovery of fingerprints which may be detected in high-dimensional heterogeneous datasets, avoiding traditional bottlenecks related to high-fidelity measurement and characterization. Motivated by accelerated co-design and optimization of materials manufacturing processes, a dataset of lattice metamaterials from metal additive manufacturing demonstrates accurate cross modal inference between images of mesoscale topology and mechanical stress-strain response.
△ Less
Submitted 7 February, 2022;
originally announced February 2022.
-
Probabilistic partition of unity networks: clustering based deep approximation
Authors:
Nat Trask,
Mamikon Gulian,
Andy Huang,
Kook** Lee
Abstract:
Partition of unity networks (POU-Nets) have been shown capable of realizing algebraic convergence rates for regression and solution of PDEs, but require empirical tuning of training parameters. We enrich POU-Nets with a Gaussian noise model to obtain a probabilistic generalization amenable to gradient-based minimization of a maximum likelihood loss. The resulting architecture provides spatial repr…
▽ More
Partition of unity networks (POU-Nets) have been shown capable of realizing algebraic convergence rates for regression and solution of PDEs, but require empirical tuning of training parameters. We enrich POU-Nets with a Gaussian noise model to obtain a probabilistic generalization amenable to gradient-based minimization of a maximum likelihood loss. The resulting architecture provides spatial representations of both noiseless and noisy data as Gaussian mixtures with closed form expressions for variance which provides an estimator of local error. The training process yields remarkably sharp partitions of input space based upon correlation of function values. This classification of training points is amenable to a hierarchical refinement strategy that significantly improves the localization of the regression, allowing for higher-order polynomial approximation to be utilized. The framework scales more favorably to large data sets as compared to Gaussian process regression and allows for spatially varying uncertainty, leveraging the expressive power of deep neural networks while bypassing expensive training associated with other probabilistic deep learning methods. Compared to standard deep neural networks, the framework demonstrates hp-convergence without the use of regularizers to tune the localization of partitions. We provide benchmarks quantifying performance in high/low-dimensions, demonstrating that convergence rates depend only on the latent dimension of data within high-dimensional space. Finally, we introduce a new open-source data set of PDE-based simulations of a semiconductor device and perform unsupervised extraction of a physically interpretable reduced-order basis.
△ Less
Submitted 7 July, 2021;
originally announced July 2021.
-
Partition of unity networks: deep hp-approximation
Authors:
Kook** Lee,
Nathaniel A. Trask,
Ravi G. Patel,
Mamikon A. Gulian,
Eric C. Cyr
Abstract:
Approximation theorists have established best-in-class optimal approximation rates of deep neural networks by utilizing their ability to simultaneously emulate partitions of unity and monomials. Motivated by this, we propose partition of unity networks (POUnets) which incorporate these elements directly into the architecture. Classification architectures of the type used to learn probability measu…
▽ More
Approximation theorists have established best-in-class optimal approximation rates of deep neural networks by utilizing their ability to simultaneously emulate partitions of unity and monomials. Motivated by this, we propose partition of unity networks (POUnets) which incorporate these elements directly into the architecture. Classification architectures of the type used to learn probability measures are used to build a meshfree partition of space, while polynomial spaces with learnable coefficients are associated to each partition. The resulting hp-element-like approximation allows use of a fast least-squares optimizer, and the resulting architecture size need not scale exponentially with spatial dimension, breaking the curse of dimensionality. An abstract approximation result establishes desirable properties to guide network design. Numerical results for two choices of architecture demonstrate that POUnets yield hp-convergence for smooth functions and consistently outperform MLPs for piecewise polynomial functions with large numbers of discontinuities.
△ Less
Submitted 27 January, 2021;
originally announced January 2021.
-
A physics-informed operator regression framework for extracting data-driven continuum models
Authors:
Ravi G. Patel,
Nathaniel A. Trask,
Mitchell A. Wood,
Eric C. Cyr
Abstract:
The application of deep learning toward discovery of data-driven models requires careful application of inductive biases to obtain a description of physics which is both accurate and robust. We present here a framework for discovering continuum models from high fidelity molecular simulation data. Our approach applies a neural network parameterization of governing physics in modal space, allowing a…
▽ More
The application of deep learning toward discovery of data-driven models requires careful application of inductive biases to obtain a description of physics which is both accurate and robust. We present here a framework for discovering continuum models from high fidelity molecular simulation data. Our approach applies a neural network parameterization of governing physics in modal space, allowing a characterization of differential operators while providing structure which may be used to impose biases related to symmetry, isotropy, and conservation form. We demonstrate the effectiveness of our framework for a variety of physics, including local and nonlocal diffusion processes and single and multiphase flows. For the flow physics we demonstrate this approach leads to a learned operator that generalizes to system characteristics not included in the training sets, such as variable particle sizes, densities, and concentration.
△ Less
Submitted 24 September, 2020;
originally announced September 2020.
-
A block coordinate descent optimizer for classification problems exploiting convexity
Authors:
Ravi G. Patel,
Nathaniel A. Trask,
Mamikon A. Gulian,
Eric C. Cyr
Abstract:
Second-order optimizers hold intriguing potential for deep learning, but suffer from increased cost and sensitivity to the non-convexity of the loss surface as compared to gradient-based approaches. We introduce a coordinate descent method to train deep neural networks for classification tasks that exploits global convexity of the cross-entropy loss in the weights of the linear layer. Our hybrid N…
▽ More
Second-order optimizers hold intriguing potential for deep learning, but suffer from increased cost and sensitivity to the non-convexity of the loss surface as compared to gradient-based approaches. We introduce a coordinate descent method to train deep neural networks for classification tasks that exploits global convexity of the cross-entropy loss in the weights of the linear layer. Our hybrid Newton/Gradient Descent (NGD) method is consistent with the interpretation of hidden layers as providing an adaptive basis and the linear layer as providing an optimal fit of the basis to data. By alternating between a second-order method to find globally optimal parameters for the linear layer and gradient descent to train the hidden layers, we ensure an optimal fit of the adaptive basis to data throughout training. The size of the Hessian in the second-order step scales only with the number weights in the linear layer and not the depth and width of the hidden layers; furthermore, the approach is applicable to arbitrary hidden layer architecture. Previous work applying this adaptive basis perspective to regression problems demonstrated significant improvements in accuracy at reduced training cost, and this work can be viewed as an extension of this approach to classification problems. We first prove that the resulting Hessian matrix is symmetric semi-definite, and that the Newton step realizes a global minimizer. By studying classification of manufactured two-dimensional point cloud data, we demonstrate both an improvement in validation error and a striking qualitative difference in the basis functions encoded in the hidden layer when trained using NGD. Application to image classification benchmarks for both dense and convolutional architectures reveals improved training accuracy, suggesting possible gains of second-order methods over gradient descent.
△ Less
Submitted 17 June, 2020;
originally announced June 2020.
-
Robust Training and Initialization of Deep Neural Networks: An Adaptive Basis Viewpoint
Authors:
Eric C. Cyr,
Mamikon A. Gulian,
Ravi G. Patel,
Mauro Perego,
Nathaniel A. Trask
Abstract:
Motivated by the gap between theoretical optimal approximation rates of deep neural networks (DNNs) and the accuracy realized in practice, we seek to improve the training of DNNs. The adoption of an adaptive basis viewpoint of DNNs leads to novel initializations and a hybrid least squares/gradient descent optimizer. We provide analysis of these techniques and illustrate via numerical examples dram…
▽ More
Motivated by the gap between theoretical optimal approximation rates of deep neural networks (DNNs) and the accuracy realized in practice, we seek to improve the training of DNNs. The adoption of an adaptive basis viewpoint of DNNs leads to novel initializations and a hybrid least squares/gradient descent optimizer. We provide analysis of these techniques and illustrate via numerical examples dramatic increases in accuracy and convergence rate for benchmarks characterizing scientific applications where DNNs are currently used, including regression problems and physics-informed neural networks for the solution of partial differential equations.
△ Less
Submitted 10 December, 2019;
originally announced December 2019.
-
GMLS-Nets: A framework for learning from unstructured data
Authors:
Nathaniel Trask,
Ravi G. Patel,
Ben J. Gross,
Paul J. Atzberger
Abstract:
Data fields sampled on irregularly spaced points arise in many applications in the sciences and engineering. For regular grids, Convolutional Neural Networks (CNNs) have been successfully used to gaining benefits from weight sharing and invariances. We generalize CNNs by introducing methods for data on unstructured point clouds based on Generalized Moving Least Squares (GMLS). GMLS is a non-parame…
▽ More
Data fields sampled on irregularly spaced points arise in many applications in the sciences and engineering. For regular grids, Convolutional Neural Networks (CNNs) have been successfully used to gaining benefits from weight sharing and invariances. We generalize CNNs by introducing methods for data on unstructured point clouds based on Generalized Moving Least Squares (GMLS). GMLS is a non-parametric technique for estimating linear bounded functionals from scattered data, and has recently been used in the literature for solving partial differential equations. By parameterizing the GMLS estimator, we obtain learning methods for operators with unstructured stencils. In GMLS-Nets the necessary calculations are local, readily parallelizable, and the estimator is supported by a rigorous approximation theory. We show how the framework may be used for unstructured physical data sets to perform functional regression to identify associated differential operators and to regress quantities of interest. The results suggest the architectures to be an attractive foundation for data-driven model development in scientific machine learning applications.
△ Less
Submitted 13 September, 2019; v1 submitted 6 September, 2019;
originally announced September 2019.