
Towards a Foundation Model for Partial Differential Equations: Multi-Operator Learning and Extrapolation

Authors: Jingmin Sun, Yuxuan Liu, Zecheng Zhang, Hayden Schaeffer

Abstract: Foundation models, such as large language models, have demonstrated success in addressing various language and image processing tasks. In this work, we introduce a multi-modal foundation model for scientific problems, named PROSE-PDE. Our model, designed for bi-modality to bi-modality learning, is a multi-operator learning approach which can predict future states of spatiotemporal systems while concurrently learning the underlying governing equations of the physical system. Specifically, we focus on multi-operator learning by training distinct one-dimensional time-dependent nonlinear constant-coefficient partial differential equations, with potential applications in many fields, including physics, geology, and biology. More importantly, we provide three extrapolation studies to demonstrate that PROSE-PDE can generalize physical features through the robust training of multiple operators and that the proposed model can extrapolate to predict PDE solutions whose models or data were unseen during training. Furthermore, we show through systematic numerical experiments that the utilization of the symbolic modality in our model effectively resolves the well-posedness problems that arise when training multiple operators and thus enhances our model's predictive capabilities.

Submitted 19 April, 2024; v1 submitted 18 April, 2024; originally announced April 2024.

arXiv:2310.18888

D2NO: Efficient Handling of Heterogeneous Input Function Spaces with Distributed Deep Neural Operators

Authors: Zecheng Zhang, Christian Moya, Lu Lu, Guang Lin, Hayden Schaeffer

Abstract: Neural operators have been applied in various scientific fields, such as solving parametric partial differential equations, dynamical systems with control, and inverse problems. However, challenges arise when dealing with input functions that exhibit heterogeneous properties, requiring multiple sensors to handle functions with minimal regularity. To address this issue, discretization-invariant neural operators have been used, allowing the sampling of diverse input functions with different sensor locations. However, existing frameworks still require an equal number of sensors for all functions. In our study, we propose a novel distributed approach to further relax the discretization requirements and solve the heterogeneous dataset challenges. Our method involves partitioning the input function space and processing individual input functions using independent and separate neural networks. A centralized neural network is used to handle shared information across all output functions. This distributed methodology reduces the number of gradient descent back-propagation steps, improving efficiency while maintaining accuracy. We demonstrate that the corresponding neural network is a universal approximator of continuous nonlinear operators and present four numerical examples to validate its performance.

Submitted 28 October, 2023; originally announced October 2023.
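The distributed structure described in the D2NO abstract can be sketched as a DeepONet-style model in NumPy. This is a minimal sketch under our own assumptions, not the authors' implementation: each class of the partitioned input space gets its own branch MLP (so classes may use different numbers of sensors), a single shared trunk MLP plays the role of the centralized network over output locations, and the two are combined by a dot product. The layer sizes, the class-to-sensor-count map `sensor_counts`, and the name `d2no_forward` are hypothetical, and the weights are random and untrained, so the example shows data flow only.

```python
import numpy as np

rng = np.random.default_rng(0)


def mlp_params(sizes, rng):
    """Random (untrained) weights and zero biases for an MLP with the given layer sizes."""
    return [(rng.normal(size=(m, n)) / np.sqrt(m), np.zeros(n))
            for m, n in zip(sizes[:-1], sizes[1:])]


def mlp(params, x):
    """Forward pass: tanh on hidden layers, linear output layer."""
    for i, (W, b) in enumerate(params):
        x = x @ W + b
        if i < len(params) - 1:
            x = np.tanh(x)
    return x


P = 16  # latent width shared by all branch nets and the trunk (hypothetical)

# Hypothetical partition of the input-function space: class 0 is observed at
# 10 sensors, class 1 at 40 sensors; each class gets its own branch net.
sensor_counts = {0: 10, 1: 40}
branches = {k: mlp_params([m, 32, P], rng) for k, m in sensor_counts.items()}

# Centralized trunk net over output locations y, shared by every class.
trunk = mlp_params([1, 32, P], rng)


def d2no_forward(u_sensors, k, y):
    """Evaluate G(u)(y) for a batch of class-k inputs at the query points y."""
    b = mlp(branches[k], u_sensors)  # (batch, P), class-specific branch
    t = mlp(trunk, y[:, None])       # (n_y, P), shared trunk
    return b @ t.T                   # (batch, n_y)
```

For example, `d2no_forward(rng.normal(size=(5, 10)), 0, np.linspace(0.0, 1.0, 7))` returns a `(5, 7)` array, while class-1 inputs are passed with 40 sensor values each.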

arXiv:2309.16816

PROSE: Predicting Operators and Symbolic Expressions using Multimodal Transformers

Authors: Yuxuan Liu, Zecheng Zhang, Hayden Schaeffer

Abstract: Approximating nonlinear differential equations using a neural network provides a robust and efficient tool for various scientific computing tasks, including real-time predictions, inverse problems, optimal controls, and surrogate modeling. Previous works have focused on embedding dynamical systems into networks through two approaches: learning a single solution operator (i.e., the mapping from input parametrized functions to solutions) or learning the governing system of equations (i.e., the constitutive model relative to the state variables). Both of these approaches yield different representations for the same underlying data or function. Additionally, observing that families of differential equations often share key characteristics, we seek one network representation across a wide range of equations. Our method, called Predicting Operators and Symbolic Expressions (PROSE), learns maps from multimodal inputs to multimodal outputs, capable of generating both numerical predictions and mathematical equations. By using a transformer structure and a feature fusion approach, our network can simultaneously embed sets of solution operators for various parametric differential equations using a single trained network. Detailed experiments demonstrate that the network benefits from its multimodal nature, resulting in improved prediction accuracy and better generalization.
The network is shown to be able to handle noise in the data and errors in the symbolic representation, including noisy numerical values, model misspecification, and erroneous addition or deletion of terms. PROSE provides a new neural network framework for differential equations which allows for more flexibility and generality in learning operators and governing equations from data.

Submitted 28 September, 2023; originally announced September 2023.

arXiv:2308.14188

Bayesian deep operator learning for homogenized to fine-scale maps for multiscale PDE

Authors: Zecheng Zhang, Christian Moya, Wing Tat Leung, Guang Lin, Hayden Schaeffer

Abstract: We present a new framework for computing fine-scale solutions of multiscale Partial Differential Equations (PDEs) using operator learning tools. Obtaining fine-scale solutions of multiscale PDEs can be challenging, but there are many inexpensive computational methods for obtaining coarse-scale solutions. Additionally, in many real-world applications, fine-scale solutions can only be observed at a limited number of locations. In order to obtain approximations or predictions of fine-scale solutions over general regions of interest, we propose to learn the operator mapping from coarse-scale solutions to fine-scale solutions using a limited number of (possibly noisy) observations of the fine-scale solutions. The approach is to train multi-fidelity homogenization maps using mathematically motivated neural operators. The operator learning framework can efficiently obtain the solution of multiscale PDEs at arbitrary points, making our proposed framework a mesh-free solver. We verify our results on multiple numerical examples showing that our approach is an efficient mesh-free solver for multiscale PDEs.

Submitted 27 August, 2023; originally announced August 2023.

arXiv:2307.09738

A discretization-invariant extension and analysis of some deep operator networks

Authors: Zecheng Zhang, Wing Tat Leung, Hayden Schaeffer

Abstract: We present a generalized version of the discretization-invariant neural operator and prove that the network is a universal approximator in the operator sense. Moreover, by incorporating additional terms in the architecture, we establish a connection between this discretization-invariant neural operator network and previously studied operator networks. The discretization-invariance property of the operator network implies that different input functions can be sampled using various sensor locations within the same training and testing phases. Additionally, since the network learns a "basis" for the input and output function spaces, our approach enables the evaluation of input functions on different discretizations. To evaluate the performance of the proposed discretization-invariant neural operator, we focus on challenging examples from multiscale partial differential equations. Our experimental results indicate that the method achieves lower prediction errors compared to previous networks and benefits from its discretization-invariant property.

Submitted 18 July, 2023; originally announced July 2023.

arXiv:2212.07336

BelNet: Basis enhanced learning, a mesh-free neural operator

Authors: Zecheng Zhang, Wing Tat Leung, Hayden Schaeffer

Abstract: Operator learning trains a neural network to map functions to functions. An ideal operator learning framework should be mesh-free in the sense that the training does not require a particular choice of discretization for the input functions, allows the input and output functions to be on different domains, and permits different grids between samples. We propose a mesh-free neural operator for solving parametric partial differential equations. The basis enhanced learning network (BelNet) projects the input function into a latent space and reconstructs the output functions. In particular, we construct part of the network to learn the "basis" functions in the training process. This generalizes the networks proposed in Chen and Chen's universal approximation theory for nonlinear operators to account for differences in input and output meshes. Through several challenging high-contrast and multiscale problems, we show that our approach outperforms other operator learning methods for these tasks and allows for more freedom in the sampling and/or discretization process.

Submitted 14 December, 2022; originally announced December 2022.
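To illustrate what mesh-free means here, the following sketch evaluates an operator from input samples taken at arbitrary sensor locations. It loosely follows the BelNet description (a projection onto learned "basis" functions evaluated at each sample's own sensors, followed by reconstruction with a second learned basis), but the exact architecture, the uniform-weight averaging over sensors, and all names (`belnet_like_forward`, `K`) are our assumptions; the weights are random and untrained.

```python
import numpy as np

rng = np.random.default_rng(1)
K = 8  # number of learned basis functions (hypothetical)


def init(n_in, n_out):
    """Random (untrained) weights and zero biases for one dense layer."""
    return rng.normal(size=(n_in, n_out)) / np.sqrt(n_in), np.zeros(n_out)


proj = [init(1, 32), init(32, K)]   # projection net: sensor location x -> R^K
recon = [init(1, 32), init(32, K)]  # reconstruction net: query location y -> R^K
mix_W, mix_b = init(K, K)           # nonlinear map acting on latent coefficients


def basis(params, pts):
    """Two-layer net evaluated at 1-D points; returns shape (len(pts), K)."""
    (W1, b1), (W2, b2) = params
    return np.tanh(pts[:, None] @ W1 + b1) @ W2 + b2


def belnet_like_forward(x_sensors, u_values, y):
    """Evaluate the operator at y from u sampled on this sample's own sensors.

    Sensors may differ in number and position between samples, so no fixed
    input mesh is needed. Averaging over sensors assumes roughly uniform sensors."""
    coeffs = u_values @ basis(proj, x_sensors) / len(x_sensors)  # (K,) projection
    latent = np.tanh(coeffs @ mix_W + mix_b)                       # (K,)
    return basis(recon, y) @ latent                                # (len(y),)
```

Because the projection sums over whatever sensors a sample provides, the same network accepts, say, 200 or 400 sensors; for a smooth input sampled on midpoint grids of both sizes, the two outputs nearly coincide.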

arXiv:2212.05591

doi 10.1098/rspa.2022.0835

Random Feature Models for Learning Interacting Dynamical Systems

Authors: Yuxuan Liu, Scott G. McCalla, Hayden Schaeffer

Abstract: Particle dynamics and multi-agent systems provide accurate dynamical models for studying and forecasting the behavior of complex interacting systems. They often take the form of a high-dimensional system of differential equations parameterized by an interaction kernel that models the underlying attractive or repulsive forces between agents. We consider the problem of constructing a data-based approximation of the interacting forces directly from noisy observations of the paths of the agents in time. The learned interaction kernels are then used to predict the agents' behavior over a longer time interval. The approximation developed in this work uses a randomized feature algorithm and a sparse randomized feature approach. Sparsity-promoting regression provides a mechanism for pruning the randomly generated features, which was observed to be beneficial when data are limited, in particular leading to less overfitting than other approaches. In addition, imposing sparsity reduces the kernel evaluation cost, which significantly lowers the simulation cost for forecasting the multi-agent systems. Our method is applied to various examples, including first-order systems with homogeneous and heterogeneous interactions, second-order homogeneous systems, and a new sheep swarming system.

Submitted 11 December, 2022; originally announced December 2022.
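A minimal sketch of the random-feature idea for a first-order system, assuming the model dx_i/dt = (1/n) sum_j phi(|x_j - x_i|) (x_j - x_i), a hypothetical kernel phi(r) = exp(-r) used only to generate data, exact (noise-free) velocities along one forward-Euler path, and random Fourier features cos(omega_k r + bias_k) fitted by ridge regression. Because the velocities are linear in phi, each feature contributes one column of a linear system. The sparsity-promoting pruning step described in the abstract is not shown, and none of the names or parameter values come from the paper.

```python
import numpy as np

rng = np.random.default_rng(2)


def true_kernel(r):
    """Hypothetical attractive kernel, used only to generate synthetic data."""
    return np.exp(-r)


def velocities(X, phi):
    """First-order model dx_i/dt = (1/n) sum_j phi(|x_j - x_i|) (x_j - x_i); X is (n, dim)."""
    diff = X[None, :, :] - X[:, None, :]   # diff[i, j] = x_j - x_i
    r = np.linalg.norm(diff, axis=-1)      # pairwise distances
    return (phi(r)[..., None] * diff).mean(axis=1)


n_features = 100
omega = rng.normal(size=n_features)              # random frequencies
bias = rng.uniform(0.0, 2 * np.pi, n_features)   # random phases


def feature_columns(X):
    """Column k holds the velocities produced by the single feature cos(omega_k r + bias_k)."""
    diff = X[None, :, :] - X[:, None, :]
    r = np.linalg.norm(diff, axis=-1)
    feats = np.cos(omega[:, None, None] * r + bias[:, None, None])  # (K, n, n)
    v = (feats[..., None] * diff).mean(axis=2)                      # (K, n, dim)
    return v.reshape(n_features, -1).T                              # (n * dim, K)


# Observe one forward-Euler path of the true system (20 agents in 2-D).
X = rng.normal(size=(20, 2))
snapshots = []
for _ in range(30):
    snapshots.append(X.copy())
    X = X + 0.05 * velocities(X, true_kernel)

A = np.vstack([feature_columns(S) for S in snapshots])
v_obs = np.concatenate([velocities(S, true_kernel).ravel() for S in snapshots])

# Ridge regression, solved as an augmented least-squares problem for stability.
lam = 1e-6
A_aug = np.vstack([A, np.sqrt(lam) * np.eye(n_features)])
c = np.linalg.lstsq(A_aug, np.concatenate([v_obs, np.zeros(n_features)]), rcond=None)[0]


def learned_kernel(r):
    """Random-feature approximation of the interaction kernel."""
    return np.cos(np.outer(r, omega) + bias) @ c
```

`learned_kernel` can then replace `true_kernel` inside `velocities` to forecast beyond the observed window. A sparse variant would prune small entries of `c` and refit, which also reduces the number of features evaluated at every simulation step.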

arXiv:2204.06935

Concentration of Random Feature Matrices in High-Dimensions

Authors: Zhijun Chen, Hayden Schaeffer, Rachel Ward

Abstract: The spectra of random feature matrices provide essential information on the conditioning of the linear system used in random feature regression problems and are thus connected to the consistency and generalization of random feature models. Random feature matrices are asymmetric rectangular nonlinear matrices depending on two input variables, the data and the weights, which can make their characterization challenging. We consider two settings for the two input variables: either both are random variables, or one is a random variable and the other is well-separated, i.e., there is a minimum distance between points. With conditions on the dimension, the complexity ratio, and the sampling variance, we show that the singular values of these matrices concentrate near their full expectation and near one with high probability. In particular, since the dimension requirement depends only on the logarithm of the number of random weights or the number of data points, our complexity bounds can be achieved even in moderate dimensions for many practical settings. The theoretical results are verified with numerical experiments.

Submitted 11 December, 2022; v1 submitted 14 April, 2022; originally announced April 2022.
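The concentration statement can be checked numerically. The sketch below assumes random Fourier features A[j, k] = exp(i <x_j, w_k>) with Gaussian data and Gaussian weights (our choice; the paper's precise feature map and its conditions on dimension, complexity ratio, and sampling variance are not reproduced). For well-separated data, E[A A^*] / N has ones on the diagonal and entries exp(-sigma^2 |x_j - x_l|^2 / 2) off the diagonal, which are tiny here, so the singular values of A / sqrt(N) should cluster around one.

```python
import numpy as np

rng = np.random.default_rng(3)

m, N, d, sigma = 20, 2000, 10, 1.0          # data points, random weights, dimension, weight scale
X = rng.normal(size=(m, d))                 # data x_j (rows)
W = rng.normal(scale=sigma, size=(N, d))    # random weights w_k (rows)
A = np.exp(1j * X @ W.T)                    # A[j, k] = exp(i <x_j, w_k>)

# Singular values of the normalized m x N feature matrix.
s = np.linalg.svd(A / np.sqrt(N), compute_uv=False)
print(s.min(), s.max())
```

Increasing `N` shrinks the finite-sample fluctuations of the off-diagonal entries (roughly like 1 / sqrt(N)), pulling the spread of `s` closer to one.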

arXiv:2204.06108

SRMD: Sparse Random Mode Decomposition

Authors: Nicholas Richardson, Hayden Schaeffer, Giang Tran

Abstract: Signal decomposition and multiscale signal analysis provide many useful tools for time-frequency analysis. We propose a random feature method for analyzing time-series data by constructing a sparse approximation to the spectrogram. The randomization is in both the time window locations and the frequency sampling, which lowers the overall sampling and computational cost. The sparsification of the spectrogram leads to a sharp separation between time-frequency clusters, which makes it easier to identify intrinsic modes and thus leads to a new data-driven mode decomposition. The applications include signal representation, outlier removal, and mode decomposition. On benchmark tests, we show that our approach outperforms other state-of-the-art decomposition methods.

Submitted 15 March, 2023; v1 submitted 12 April, 2022; originally announced April 2022.

arXiv:2202.02877

doi 10.1007/s43670-023-00063-9

HARFE: Hard-Ridge Random Feature Expansion

Authors: Esha Saha, Hayden Schaeffer, Giang Tran

Abstract: We propose a random feature model, called the hard-ridge random feature expansion method (HARFE), for approximating high-dimensional sparse additive functions. This method utilizes a hard-thresholding pursuit-based algorithm applied to the sparse ridge regression (SRR) problem to approximate the coefficients with respect to the random feature matrix. The SRR formulation balances obtaining sparse models that use fewer terms in their representation against ridge-based smoothing, which tends to be robust to noise and outliers. In addition, we use a random sparse connectivity pattern in the random feature matrix to match the additive function assumption. We prove that the HARFE method is guaranteed to converge with a given error bound depending on the noise and the parameters of the sparse ridge regression model. In numerical experiments on synthetic data as well as on real datasets, the HARFE approach obtains lower (or comparable) error than other state-of-the-art algorithms.

Submitted 2 May, 2023; v1 submitted 6 February, 2022; originally announced February 2022.

Journal ref: Sampling Theory, Signal Processing, and Data Analysis 21.2 (2023) 1-24
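The core HARFE iteration, hard-thresholding pursuit applied to sparse ridge regression, might be sketched as follows. We assume the objective ||A c - y||^2 + lam ||c||^2 subject to ||c||_0 <= s, a fixed step size `mu`, and a stop once the selected support repeats; these choices and the name `harfe_htp` are ours, and the random sparse-connectivity feature matrix from the abstract is replaced by a Gaussian test matrix so that recovery is easy to check.

```python
import numpy as np


def harfe_htp(A, y, s, lam=1e-6, mu=1.0, n_iter=50):
    """Hard-thresholding pursuit for the sparse ridge regression problem
        min ||A c - y||^2 + lam * ||c||^2  subject to  ||c||_0 <= s.
    Step size, iteration cap, and stopping rule are illustrative choices."""
    N = A.shape[1]
    c = np.zeros(N)
    support = None
    for _ in range(n_iter):
        # Gradient step on the ridge objective (constant factor absorbed into mu).
        u = c + mu * (A.T @ (y - A @ c) - lam * c)
        # Keep the indices of the s largest entries in magnitude.
        new_support = np.sort(np.argsort(np.abs(u))[-s:])
        # Ridge regression restricted to the selected columns.
        As = A[:, new_support]
        c = np.zeros(N)
        c[new_support] = np.linalg.solve(As.T @ As + lam * np.eye(s), As.T @ y)
        if support is not None and np.array_equal(new_support, support):
            break  # support has stopped changing
        support = new_support
    return c
```

For example, with `A = rng.normal(size=(150, 200)) / np.sqrt(150)` and a 3-sparse `c_true`, `harfe_htp(A, A @ c_true, s=3)` should return a vector supported on the same three indices, with values close to `c_true` because `lam` is small.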

arXiv:2112.04002

SHRIMP: Sparser Random Feature Models via Iterative Magnitude Pruning

Authors: Yuege Xie, Bobby Shi, Hayden Schaeffer, Rachel Ward

Abstract: Sparse shrunk additive models and sparse random feature models have been developed separately as methods to learn low-order functions, where there are few interactions between variables, but neither offers computational efficiency. On the other hand, $\ell_2$-based shrunk additive models are efficient but do not offer feature selection, as the resulting coefficient vectors are dense. Inspired by the success of the iterative magnitude pruning technique in finding lottery tickets of neural networks, we propose a new method, Sparser Random Feature Models via IMP (ShRIMP), to efficiently fit high-dimensional data with inherent low-dimensional structure in the form of sparse variable dependencies. Our method can be viewed as a combined process to construct and find sparse lottery tickets for two-layer dense networks. We explain the observed benefit of SHRIMP through a refined analysis of the generalization error for thresholded Basis Pursuit and resulting bounds on eigenvalues. From function approximation experiments on both synthetic data and real-world benchmark datasets, we show that SHRIMP obtains better or competitive test accuracy compared to state-of-the-art sparse feature and additive methods such as SRFE-S, SSAM, and SALSA. Meanwhile, SHRIMP performs feature selection with low computational complexity and is robust to the pruning rate, indicating robustness in the structure of the obtained subnetworks. We gain insight into the lottery ticket hypothesis through SHRIMP by noting a correspondence between our model and weight/neuron subnetworks.

Submitted 7 December, 2021; originally announced December 2021.
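The prune-and-refit loop behind SHRIMP can be sketched as alternating ridge (l2) fits on random features with magnitude pruning of the smallest coefficients, in the spirit of iterative magnitude pruning. This is an illustrative sketch, not the authors' code: the sine activation, Gaussian weights, prune rate, ridge parameter, and the names `shrimp_like_fit` and `predict` are assumptions.

```python
import numpy as np


def random_features(X, W, b):
    """Random feature map sin(W x + b); the sine activation is our assumption."""
    return np.sin(X @ W.T + b)


def shrimp_like_fit(X, y, n_features=500, prune_rate=0.2, n_rounds=5, lam=1e-3, seed=0):
    """Alternate ridge fits with magnitude pruning of random features.

    Returns the surviving weights, biases, and ridge coefficients."""
    rng = np.random.default_rng(seed)
    W = rng.normal(size=(n_features, X.shape[1]))
    b = rng.uniform(0.0, 2 * np.pi, n_features)
    active = np.arange(n_features)
    for r in range(n_rounds + 1):
        Phi = random_features(X, W[active], b[active])
        # l2 (ridge) fit on the currently active features
        c = np.linalg.solve(Phi.T @ Phi + lam * np.eye(active.size), Phi.T @ y)
        if r == n_rounds:
            break
        # Prune: keep the (1 - prune_rate) fraction with the largest |c|.
        n_keep = max(1, int(round(active.size * (1 - prune_rate))))
        keep = np.sort(np.argsort(np.abs(c))[-n_keep:])
        active = active[keep]
    return W[active], b[active], c


def predict(X, W, b, c):
    """Evaluate the pruned random feature model at the rows of X."""
    return random_features(X, W, b) @ c
```

With 100 initial features, a prune rate of 0.2, and three pruning rounds, the number of active features goes 100, 80, 64, 51, and the final ridge fit uses only the 51 survivors.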

arXiv:2110.11477

Conditioning of Random Feature Matrices: Double Descent and Generalization Error

Authors: Zhijun Chen, Hayden Schaeffer

Abstract: We provide (high probability) bounds on the condition number of random feature matrices. In particular, we show that if the complexity ratio $\frac{N}{m}$, where $N$ is the number of neurons and $m$ is the number of data samples, scales like $\log^{-1}(N)$ or $\log(m)$, then the random feature matrix is well-conditioned. This result holds without the need for regularization and relies on establishing various concentration bounds between dependent components of the random feature matrix. Additionally, we derive bounds on the restricted isometry constant of the random feature matrix. We prove that the risk associated with regression problems using a random feature matrix exhibits the double descent phenomenon and that this is an effect of the double descent behavior of the condition number. The risk bounds include the underparameterized setting using the least squares problem and the overparameterized setting using either the minimum-norm interpolation problem or a sparse regression problem. For the least squares or sparse regression cases, we show that the risk decreases as $m$ and $N$ increase, even in the presence of bounded or random noise. The risk bound matches the optimal scaling in the literature, and the constants in our results are explicit and independent of the dimension of the data.

Submitted 4 November, 2021; v1 submitted 21 October, 2021; originally announced October 2021.
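The double-descent behavior of the condition number can be observed directly. This sketch, which assumes random Fourier features exp(i <x_j, w_k>) with Gaussian data and weights (not necessarily the paper's setting), computes the condition number of the m x N feature matrix for several complexity ratios N/m; it should be largest near N = m and much smaller when N/m is well below or well above one.

```python
import numpy as np

rng = np.random.default_rng(5)
m, d = 100, 10
X = rng.normal(size=(m, d))   # data samples (rows)


def rf_matrix(N):
    """m x N random Fourier feature matrix A[j, k] = exp(i <x_j, w_k>)."""
    W = rng.normal(size=(N, d))
    return np.exp(1j * X @ W.T)


# Condition number (ratio of extreme singular values) versus complexity ratio N/m.
for N in (25, 50, 100, 200, 400):
    print(N / m, np.linalg.cond(rf_matrix(N)))
```

The spike at N/m = 1 mirrors the peak in the regression risk that the abstract attributes to the condition number, while ratios of 1/4 or 4 give well-conditioned matrices without regularization.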

arXiv:2103.03191

Generalization Bounds for Sparse Random Feature Expansions

Authors: Abolfazl Hashemi, Hayden Schaeffer, Robert Shi, Ufuk Topcu, Giang Tran, Rachel Ward

Abstract: Random feature methods have been successful in various machine learning tasks, are easy to compute, and come with theoretical accuracy bounds. They serve as an alternative approach to standard neural networks since they can represent similar function spaces without a costly training phase. However, for accuracy, random feature methods require more measurements than trainable parameters, limiting their use for data-scarce applications or problems in scientific machine learning. This paper introduces the sparse random feature expansion to obtain parsimonious random feature models. Specifically, we leverage ideas from compressive sensing to generate random feature expansions with theoretical guarantees even in the data-scarce setting. In particular, we provide generalization bounds for functions in a certain class (that is dense in a reproducing kernel Hilbert space) depending on the number of samples and the distribution of features. The generalization bounds improve with additional structural conditions, such as coordinate sparsity, compact clusters of the spectrum, or rapid spectral decay. In particular, by introducing sparse features, i.e., features with random sparse weights, we provide improved bounds for low-order functions. We show that the sparse random feature expansion outperforms shallow networks in several scientific machine learning tasks.

Submitted 20 August, 2021; v1 submitted 4 March, 2021; originally announced March 2021.
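Sparse features in the sense of this abstract can be sketched by drawing weight vectors with only q nonzero coordinates and fitting the coefficients with an l1 penalty. Here the l1 problem is a lasso solved by iterative soft thresholding (ISTA), a swapped-in solver chosen for brevity; the paper's exact optimization problem, feature map (we use cosines), and parameter values may differ, and all names and the target function are hypothetical.

```python
import numpy as np


def sparse_weights(N, d, q, rng):
    """N random weight vectors in R^d, each with exactly q nonzero Gaussian entries."""
    W = np.zeros((N, d))
    for k in range(N):
        idx = rng.choice(d, size=q, replace=False)
        W[k, idx] = rng.normal(size=q)
    return W


def ista_lasso(A, y, lam, n_iter=500):
    """Minimize 0.5 * ||A c - y||^2 + lam * ||c||_1 by iterative soft thresholding."""
    L = np.linalg.norm(A, 2) ** 2        # Lipschitz constant of the smooth part's gradient
    c = np.zeros(A.shape[1])
    for _ in range(n_iter):
        z = c - A.T @ (A @ c - y) / L     # gradient step
        c = np.sign(z) * np.maximum(np.abs(z) - lam / L, 0.0)  # soft threshold
    return c


def objective(A, y, c, lam):
    """Lasso objective minimized by ista_lasso."""
    return 0.5 * np.sum((A @ c - y) ** 2) + lam * np.sum(np.abs(c))


rng = np.random.default_rng(6)
m, d, N, q = 200, 10, 1000, 2      # samples, input dimension, features, nonzeros per weight
X = rng.uniform(-1.0, 1.0, size=(m, d))
y = np.sin(np.pi * X[:, 0]) + X[:, 1] * X[:, 2]   # hypothetical low-order target
W = sparse_weights(N, d, q, rng)
bias = rng.uniform(0.0, 2 * np.pi, N)
A = np.cos(X @ W.T + bias)          # m x N random feature matrix (N > m: data-scarce regime)
lam = 0.1
c = ista_lasso(A, y, lam)
print(np.count_nonzero(c), objective(A, y, c, lam))
```

Because each weight touches only q = 2 coordinates, every feature depends on at most two input variables, which matches low-order targets such as the one above.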

arXiv:2012.09940

Reduced Order Modeling using Shallow ReLU Networks with Grassmann Layers

Authors: Kayla Bollinger, Hayden Schaeffer

Abstract: This paper presents a nonlinear model reduction method for systems of equations using a structured neural network. The neural network takes the form of a "three-layer" network with the first layer constrained to lie on the Grassmann manifold and the first activation function set to the identity, while the remaining network is a standard two-layer ReLU neural network. The Grassmann layer determines the reduced basis for the input space, while the remaining layers approximate the nonlinear input-output system. The training alternates between learning the reduced basis and the nonlinear approximation, and is shown to be more effective than fixing the reduced basis and training only the network. An additional benefit of this approach is that, for data that lie on low-dimensional subspaces, the number of parameters in the network does not need to be large. We show that our method can be applied to scientific problems in the data-scarce regime, which is typically not well-suited for neural network approximations. Examples include reduced-order modeling for nonlinear dynamical systems and several aerospace engineering problems.

Submitted 17 December, 2020; originally announced December 2020.

Comments: 18 pages, 2 Figures

arXiv:1908.03190

NeuPDE: Neural Network Based Ordinary and Partial Differential Equations for Modeling Time-Dependent Data

Authors: Yifan Sun, Linan Zhang, Hayden Schaeffer

Abstract: We propose a neural network based approach for extracting models from dynamic data using ordinary and partial differential equations. In particular, given a time-series or spatio-temporal dataset, we seek to identify an accurate governing system which respects the intrinsic differential structure. The unknown governing model is parameterized by using both (shallow) multilayer perceptrons and nonlinear differential terms, in order to incorporate relevant correlations between spatio-temporal samples. We demonstrate the approach on several examples where the data is sampled from various dynamical systems and give a comparison to recurrent networks and other data-discovery methods. In addition, we show that for MNIST and Fashion MNIST, our approach lowers the parameter cost as compared to other deep neural networks.

Submitted 8 August, 2019; originally announced August 2019.

arXiv:1908.01753

Extending the step-size restriction for gradient descent to avoid strict saddle points

Authors: Hayden Schaeffer, Scott G. McCalla

Abstract: We provide larger step-size bounds under which gradient-descent-based algorithms (almost surely) avoid strict saddle points. In particular, consider a twice-differentiable (non-convex) objective function whose gradient has Lipschitz constant L and whose Hessian is well-behaved. We prove that, given one uniformly random initialization, the probability that gradient descent with step size up to 2/L converges to a strict saddle point is zero. This extends previous results up to the sharp limit imposed by the convex case. In addition, the arguments hold when a learning rate schedule is given, with either a continuously decaying rate or a piecewise constant schedule.

Submitted 5 August, 2019; originally announced August 2019.
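A toy check of the step-size range: f(x) = (x_1^2 - x_2^2) / 2 has a gradient with Lipschitz constant L = 1 and a strict saddle at the origin. With step size 1.9 / L, which lies between 1/L and 2/L, gradient descent from a uniformly random start still leaves the saddle, because the x_1 component is multiplied by 1 - 1.9 = -0.9 at each step while the x_2 component is multiplied by 1 + 1.9 = 2.9. This quadratic is our illustrative choice (it is not bounded below), not an example from the paper.

```python
import numpy as np


def grad(x):
    """Gradient of f(x) = 0.5 * (x[0]**2 - x[1]**2); Lipschitz constant L = 1, strict saddle at 0."""
    return np.array([x[0], -x[1]])


L = 1.0
eta = 1.9 / L                       # step size between 1/L and 2/L
rng = np.random.default_rng(7)
x = rng.uniform(-1.0, 1.0, size=2)  # one uniformly random initialization
for _ in range(20):
    x = x - eta * grad(x)           # x[0] -> -0.9 * x[0], x[1] -> 2.9 * x[1]
print(x)
```

Only initializations with x_2 = 0 exactly, a set of probability zero, converge to the saddle; at step sizes above 2/L the x_1 component would diverge as well.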

arXiv:1811.10115

Recovery guarantees for polynomial approximation from dependent data with outliers

Authors: Lam Si Tung Ho, Hayden Schaeffer, Giang Tran, Rachel Ward

Abstract: Learning nonlinear systems from noisy, limited, and/or dependent data is an important task across various scientific fields including statistics, engineering, computer science, mathematics, and many more. In general, this learning task is ill-posed; however, additional information about the data's structure or the behavior of the unknown function can make the task well-posed. In this work, we study the problem of learning nonlinear functions from corrupted and dependent data. The learning problem is recast as a sparse robust linear regression problem where we incorporate both the unknown coefficients and the corruptions in a basis pursuit framework. The main contribution of our paper is to provide a reconstruction guarantee for the associated $\ell_1$-optimization problem where the sampling matrix is formed from dependent data. Specifically, we prove that the sampling matrix satisfies the null space property and the stable null space property, provided that the data is compact and satisfies a suitable concentration inequality. We show that our recovery results are applicable to various types of dependent data such as exponentially strongly $\alpha$-mixing data, geometrically $\mathcal{C}$-mixing data, and uniformly ergodic Markov chains. Our theoretical results are verified via several numerical simulations.

Submitted 25 November, 2018; originally announced November 2018.

Comments: 17 pages, 1 figure

MSC Class: 68T05; 41A10; 60F05; 68Q32; 62G08; 94A15; 65K10
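The robust formulation can be sketched by stacking the sampling matrix with an identity block, so that coefficients c and corruptions e are recovered jointly from y = A c + e. This sketch assumes dependent samples from an orbit of the logistic map, a Legendre polynomial basis, and hypothetical sparse coefficients and outliers; it also replaces the paper's l1 (basis pursuit) program with a lasso relaxation solved by ISTA, so it illustrates the structure of the problem rather than the analyzed algorithm.

```python
import numpy as np
from numpy.polynomial import legendre

rng = np.random.default_rng(8)

# Dependent samples: an orbit of the logistic map x -> 4 x (1 - x), rescaled to [-1, 1].
T, deg = 300, 20
x = np.empty(T)
x[0] = rng.uniform(0.1, 0.9)
for t in range(T - 1):
    x[t + 1] = 4.0 * x[t] * (1.0 - x[t])
z = 2.0 * x - 1.0

A = legendre.legvander(z, deg)                        # (T, deg + 1) Legendre sampling matrix
c_true = np.zeros(deg + 1)
c_true[[1, 4]] = [1.0, -0.5]                          # hypothetical sparse coefficients
e_true = np.zeros(T)
e_true[rng.choice(T, size=10, replace=False)] = 5.0   # hypothetical outliers
y = A @ c_true + e_true

# Joint unknown v = [c; e] with sampling matrix [A, I].
B = np.hstack([A, np.eye(T)])
lam = 0.05


def objective(v):
    """Lasso relaxation: 0.5 * ||B v - y||^2 + lam * ||v||_1."""
    return 0.5 * np.sum((B @ v - y) ** 2) + lam * np.sum(np.abs(v))


step = 1.0 / np.linalg.norm(B, 2) ** 2
v = np.zeros(B.shape[1])
for _ in range(2000):
    w = v - step * (B.T @ (B @ v - y))                       # gradient step
    v = np.sign(w) * np.maximum(np.abs(w) - lam * step, 0.0)  # soft threshold
c_hat, e_hat = v[: deg + 1], v[deg + 1:]
```

Large entries of `e_hat` flag samples that the fit treats as corrupted, while `c_hat` holds the estimated polynomial coefficients.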

arXiv:1811.09885

Forward Stability of ResNet and Its Variants

Authors: Linan Zhang, Hayden Schaeffer

Abstract: The residual neural network (ResNet) is a popular deep network architecture that can obtain high-accuracy results on several image processing problems. In order to analyze the behavior and structure of ResNet, recent work has focused on establishing connections between ResNets and continuous-time optimal control problems. In this work, we show that the post-activation ResNet is related to an optimal control problem with differential inclusions, and provide continuous-time stability results for the differential inclusion associated with ResNet. Motivated by the stability conditions, we show that alterations of either the architecture or the optimization problem can generate variants of ResNet which improve the theoretical stability bounds. In addition, we establish stability bounds for the full (discrete) network associated with two variants of ResNet, in particular bounds on the growth of the features and a measure of the sensitivity of the features with respect to perturbations. These results also help to show the relationship between the depth, regularization, and stability of the feature space. Computational experiments on the proposed variants show that the accuracy of ResNet is preserved and that the accuracy seems to be monotone with respect to the depth and various corruptions.

Submitted 24 November, 2018; originally announced November 2018.

Comments: 35 pages, 8 figures, 5 tables

arXiv:1806.09451

doi 10.1088/1361-6420/aad1c7

Stability and Error Estimates of BV Solutions to the Abel Inverse Problem

Authors: Linan Zhang, Hayden Schaeffer

Abstract: Reconstructing images from ill-posed inverse problems often utilizes total variation regularization in order to recover discontinuities in the data while also removing noise and other artifacts. Total variation regularization has been successful in recovering images for (noisy) Abel transformed data, where object boundaries and data support will lead to sharp edges in the reconstructed image. In this work, we analyze the behavior of BV solutions to the Abel inverse problem, deriving a priori estimates on the recovery. In particular, we provide L2-stability bounds on BV solutions to the Abel inverse problem. These bounds yield error estimates on images reconstructed from a proposed total variation regularized minimization problem.

Submitted 18 June, 2018; originally announced June 2018.

Comments: 40 pages, 3 figures, 2 tables

arXiv:1805.06445

On the Convergence of the SINDy Algorithm

Authors: Linan Zhang, Hayden Schaeffer

Abstract: One way to understand time-series data is to identify the underlying dynamical system which generates it. This task can be done by selecting an appropriate model and a set of parameters which best fits the dynamics while providing the simplest representation (i.e., the smallest number of terms). One such approach is the sparse identification of nonlinear dynamics framework [6], which uses a sparsity-promoting algorithm that iterates between a partial least-squares fit and a thresholding (sparsity-promoting) step. In this work, we provide some theoretical results on the behavior and convergence of the algorithm proposed in [6]. In particular, we prove that the algorithm approximates local minimizers of an unconstrained $\ell^0$-penalized least-squares problem. From this, we provide sufficient conditions for general convergence, rate of convergence, and conditions for one-step recovery. Examples illustrate that the rates of convergence are sharp. In addition, our results extend to other algorithms related to the algorithm in [6], and provide theoretical verification of several observed phenomena.

Submitted 16 May, 2018; originally announced May 2018.

Comments: 24 pages, 4 figures, 3 tables
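The iteration described in this abstract, a least-squares fit alternating with a thresholding step, can be sketched as sequentially thresholded least squares. This is a minimal illustration, not the authors' code: the names `least_squares` and `stlsq`, the toy candidate library {1, x, x^2, x^3}, and the threshold value are hypothetical choices made only for this example, and a tiny normal-equations solver stands in for a proper least-squares routine.

```python
def least_squares(columns, y):
    """Solve min ||sum_j c_j * columns[j] - y||_2 via the normal equations.

    columns: list of equal-length lists (one per candidate function).
    Gaussian elimination with partial pivoting; adequate for tiny examples only.
    """
    k = len(columns)
    # Gram matrix G^T G and right-hand side G^T y.
    gram = [[sum(a * b for a, b in zip(columns[i], columns[j])) for j in range(k)]
            for i in range(k)]
    rhs = [sum(a * b for a, b in zip(columns[i], y)) for i in range(k)]
    # Forward elimination with partial pivoting.
    for p in range(k):
        piv = max(range(p, k), key=lambda r: abs(gram[r][p]))
        gram[p], gram[piv] = gram[piv], gram[p]
        rhs[p], rhs[piv] = rhs[piv], rhs[p]
        for r in range(p + 1, k):
            f = gram[r][p] / gram[p][p]
            for c in range(p, k):
                gram[r][c] -= f * gram[p][c]
            rhs[r] -= f * rhs[p]
    # Back substitution on the upper-triangular system.
    sol = [0.0] * k
    for p in reversed(range(k)):
        s = rhs[p] - sum(gram[p][c] * sol[c] for c in range(p + 1, k))
        sol[p] = s / gram[p][p]
    return sol


def stlsq(library, dxdt, threshold, max_iter=10):
    """Alternate a least-squares fit on the active terms with a thresholding step.

    library: dict mapping a term name to its sampled values.
    dxdt: sampled derivative values.
    Returns a dict mapping each term name to its coefficient (0.0 if pruned).
    """
    names = list(library)
    active = list(names)
    coef = dict.fromkeys(names, 0.0)
    for _ in range(max_iter):
        # Least-squares step restricted to the currently active terms.
        fit = least_squares([library[n] for n in active], dxdt)
        coef = dict.fromkeys(names, 0.0)
        coef.update(zip(active, fit))
        # Thresholding step: keep only terms with large enough coefficients.
        new_active = [n for n in active if abs(coef[n]) >= threshold]
        if new_active == active:  # support stopped changing
            break
        active = new_active
    return coef


# Toy data from dx/dt = 2x - 0.5x^3, sampled on a uniform grid.
xs = [i / 4 for i in range(-8, 9)]
lib = {"1": [1.0] * len(xs), "x": xs,
       "x^2": [x ** 2 for x in xs], "x^3": [x ** 3 for x in xs]}
dx = [2 * x - 0.5 * x ** 3 for x in xs]
coefficients = stlsq(lib, dx, threshold=0.1)
```

With noise-free data, the first fit already gives near-zero weights on the constant and quadratic terms; the thresholding step removes them and the refit on the remaining support returns the coefficients 2 and -0.5.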

arXiv:1805.04158

Extracting structured dynamical systems using sparse optimization with very few samples

Authors: Hayden Schaeffer, Giang Tran, Rachel Ward, Linan Zhang

Abstract: Learning governing equations allows for a deeper understanding of the structure and dynamics of data. We present a random sampling method for learning structured dynamical systems from under-sampled and possibly noisy state-space measurements. The learning problem takes the form of a sparse least-squares fit over a large set of candidate functions. Based on a Bernstein-like inequality for partly dependent random variables, we provide theoretical guarantees on the recovery rate of the sparse coefficients and the identification of the candidate functions for the corresponding problem. Computational results are demonstrated on datasets generated by the Lorenz 96 equation, the viscous Burgers' equation, and the two-component reaction-diffusion equations (which are challenging due to parameter sensitivities in the model). This formulation has several advantages, including ease of use, theoretical guarantees of success, and computational efficiency with respect to ambient dimension and number of candidate functions.

Submitted 10 May, 2018; originally announced May 2018.

Comments: 37 pages, 6 figures, 6 tables

arXiv:1709.01558

Learning Dynamical Systems and Bifurcation via Group Sparsity

Authors: Hayden Schaeffer, Giang Tran, Rachel Ward

Abstract: Learning governing equations from a family of data sets which share the same physical laws but differ in bifurcation parameters is challenging. This is due, in part, to the wide range of phenomena that could be represented in the data sets as well as the range of parameter values. On the other hand, it is common to assume that only a small number of candidate functions contribute to the observed dynamics. Based on these observations, we propose a group-sparse penalized method for model selection and parameter estimation for such data. We also provide convergence guarantees for our proposed numerical scheme. Various numerical experiments, including the 1D logistic equation, the 3D Lorenz system sampled from different bifurcation regions, and a switching system, provide numerical validation for our method and suggest potential applications to applied dynamical systems.

Submitted 5 September, 2017; originally announced September 2017.

Comments: 16 pages, 18 figures

MSC Class: 65P30; 65K10; 37N30; 15A12; 65L09; 65L99
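A common building block for group-sparse penalized least squares is the proximal step for the group penalty, often called group soft thresholding. The sketch below shows only that step, assuming (hypothetically) that each group collects the coefficients of one candidate function across the different data sets, so a candidate is kept or discarded for all data sets at once. The function name `group_soft_threshold` and the example values are illustrative; this is not claimed to be the exact numerical scheme analyzed in the paper.

```python
import math


def group_soft_threshold(coeffs, groups, lam):
    """Proximal operator of lam * sum_g ||coeffs[g]||_2 (group soft thresholding).

    coeffs: list of floats.
    groups: list of index lists that partition range(len(coeffs)).
    Each group is shrunk as a block: scaled by max(0, 1 - lam / ||group||_2),
    so a group whose norm is at most lam becomes entirely zero.
    """
    out = list(coeffs)
    for g in groups:
        norm = math.sqrt(sum(coeffs[i] ** 2 for i in g))
        scale = max(0.0, 1.0 - lam / norm) if norm > 0 else 0.0
        for i in g:
            out[i] = scale * coeffs[i]
    return out


# Two hypothetical candidate functions, each with one coefficient per data set.
shrunk = group_soft_threshold([3.0, 4.0, 0.3, 0.4], [[0, 1], [2, 3]], lam=1.0)
```

The first group has norm 5, so it is scaled by 0.8; the second has norm 0.5, below the threshold, so both of its coefficients are set to zero together.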

arXiv:1707.08528

Extracting Sparse High-Dimensional Dynamics from Limited Data

Authors: Hayden Schaeffer, Giang Tran, Rachel Ward

Abstract: Extracting governing equations from dynamic data is an essential task in model selection and parameter estimation. The form of the governing equation is rarely known a priori; however, based on the sparsity-of-effect principle, one may assume that the number of candidate functions needed to represent the dynamics is very small. In this work, we leverage the sparse structure of the governing equations along with recent results from random sampling theory to develop methods for selecting dynamical systems from under-sampled data. In particular, we detail three sampling strategies that lead to the exact recovery of first-order dynamical systems when we are given fewer samples than unknowns. The first method makes no assumptions on the behavior of the data and requires a certain number of random initial samples. The second method utilizes the structure of the governing equation to limit the number of random initializations needed. The third method leverages chaotic behavior in the data to construct a nearly deterministic sampling strategy. Using results from compressive sensing, we show that the strategies lead to exact recovery, which is stable to the sparse structure of the governing equations and robust to noise in the estimation of the velocity. Computational results validate each of the sampling strategies and highlight potential applications.

Submitted 18 October, 2018; v1 submitted 26 July, 2017; originally announced July 2017.

Comments: 22 pages, 2 figures, 4 tables

MSC Class: 34F05; 37H99; 65P99; 65L09; 65L99; 37N30

arXiv:1505.00552

Lexicographic Generation of Projective Spaces

Authors: Christoph Hering, Hans-Jörg Schaeffer

Abstract: Lexicographic or first choice constructions of geometric objects sometimes lead to amazingly good results. Usually it is difficult to determine the precise identity of these geometries. Here we find infinitely many cases where the identification actually can be accomplished.

Submitted 4 May, 2015; originally announced May 2015.

Comments: 3 pages

MSC Class: 51E15; 05B15

arXiv:1503.01988

doi 10.1051/0004-6361/201425529

The ALMA Band 9 receiver - Design, construction, characterization, and first light

Authors: A. M. Baryshev, R. Hesper, F. P. Mena, T. M. Klapwijk, T. A. van Kempen, M. R. Hogerheijde, B. D. Jackson, J. Adema, G. J. Gerlofsma, M. E. Bekema, J. Barkhof, L. H. R. de Haan-Stijkel, M. van den Bemt, A. Koops, K. Keizer, C. Pieters, J. Koops van het Jagt, H. H. A. Schaeffer, T. Zijlstra, M. Kroug, C. F. J. Lodewijk, K. Wielinga, W. Boland, M. W. M. de Graauw, E. F. van Dishoeck, et al. (2 additional authors not shown)

Abstract: We describe the design, construction, and characterization of the Band 9 heterodyne receivers (600-720 GHz) for the Atacama Large Millimeter / submillimeter Array (ALMA). The ALMA Band 9 receiver units ("cartridges"), which are installed in the telescope's front end, have been designed to detect and down-convert two orthogonal linear polarization components of the light collected by the ALMA antennas. The light entering the front end is refocused with a compact arrangement of mirrors, which is fully contained within the cartridge. The arrangement contains a grid to separate the polarizations and two beam splitters to combine each resulting beam with a local oscillator signal. The combined beams are fed into independent double-sideband mixers, each with a corrugated feedhorn coupling the radiation by way of a waveguide with backshort cavity into an impedance-tuned SIS junction that performs the heterodyne down-conversion. The generated intermediate frequency signals are then amplified by cryogenic and room-temperature HEMT amplifiers and exported to the telescope's back end for further processing and, finally, correlation. The receivers have been constructed and tested in the laboratory, and they show excellent performance, complying with ALMA requirements. Performance statistics on all 73 Band 9 receivers are reported. On-sky characterization and tests of the performance of the Band 9 cartridges are presented using commissioning data.

Submitted 6 March, 2015; originally announced March 2015.

Journal ref: A&A 577, A129 (2015)

arXiv:1404.1370

An L1 Penalty Method for General Obstacle Problems

Authors: Giang Tran, Hayden Schaeffer, William M. Feldman, Stanley J. Osher

Abstract: We construct an efficient numerical scheme for solving obstacle problems in divergence form. The numerical method is based on a reformulation of the obstacle in terms of an L1-like penalty on the variational problem. The reformulation is an exact regularizer in the sense that for a large (but finite) penalty parameter, we recover the exact solution. Our formulation is applied to classical elliptic obstacle problems as well as some related free boundary problems, for example, the two-phase membrane problem and the Hele-Shaw model. One advantage of the proposed method is that the free boundary inherent in the obstacle problem arises naturally in our energy minimization without any need for problem-specific or complicated discretization. In addition, our scheme works for nonlinear variational inequalities arising from convex minimization problems.

Submitted 4 April, 2014; originally announced April 2014.

Comments: 20 pages, 18 figures

arXiv:1311.5850

PDEs with Compressed Solutions

Authors: Russel E. Caflisch, Stanley J. Osher, Hayden Schaeffer, Giang Tran

Abstract: Sparsity plays a central role in recent developments in signal processing, linear algebra, statistics, optimization, and other fields. In these developments, sparsity is promoted through the addition of an $L^1$ norm (or related quantity) as a constraint or penalty in a variational principle. We apply this approach to partial differential equations that come from a variational quantity, either by minimization (to obtain an elliptic PDE) or by gradient flow (to obtain a parabolic PDE). We also show that some PDEs, such as the divisible sandpile problem and the signum-Gordon equation, can be rewritten in an $L^1$ form. Adding an $L^1$ term to the variational principle leads to a modified PDE in which a subgradient term appears. It is known that modified PDEs of this form often have solutions with compact support, which corresponds to the discrete solution being sparse. We show that this is advantageous numerically through the use of efficient algorithms for solving $L^1$-based problems.

Submitted 1 August, 2014; v1 submitted 22 November, 2013; originally announced November 2013.

Comments: 21 pages, 15 figures

arXiv:1212.4132

doi 10.1073/pnas.1302752110

Sparse Dynamics for Partial Differential Equations

Authors: Hayden Schaeffer, Stanley Osher, Russel Caflisch, Cory Hauck

Abstract: We investigate the approximate dynamics of several differential equations when the solutions are restricted to a sparse subset of a given basis. The restriction is enforced at every time step by simply applying soft thresholding to the coefficients of the basis approximation. By reducing or compressing the information needed to represent the solution at every step, only the essential dynamics are represented. In many cases, there are natural bases derived from the differential equations which promote sparsity. We find that our method successfully reduces the dynamics of convection equations, diffusion equations, weak shocks, and vorticity equations with high frequency source terms.

Submitted 17 December, 2012; originally announced December 2012.
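The idea of enforcing sparsity by soft thresholding the basis coefficients at every time step can be sketched on a simple case. The example below assumes, purely for illustration, the heat equation u_t = u_xx on [0, pi] with zero boundary values, expanded in the sine basis sin(kx), where each mode can be advanced exactly by the factor exp(-k^2 dt). The names `soft_threshold` and `sparse_heat_step` and the threshold value are hypothetical; this is not the authors' implementation.

```python
import math


def soft_threshold(c, lam):
    """Shrink c toward zero by lam; values with |c| <= lam become exactly zero."""
    return math.copysign(max(abs(c) - lam, 0.0), c)


def sparse_heat_step(coeffs, dt, lam):
    """One step for u_t = u_xx on [0, pi] in the basis sin(k x), k = 1, 2, ...

    Each mode is first advanced exactly (decay factor exp(-k^2 dt)), and the
    resulting coefficient is then soft-thresholded, so small modes are dropped.
    """
    return [soft_threshold(c * math.exp(-(k ** 2) * dt), lam)
            for k, c in enumerate(coeffs, start=1)]


# Keep-or-drop behavior on a two-mode state with a zero time step.
state = sparse_heat_step([1.0, 0.001], dt=0.0, lam=0.01)
```

Here the large first mode is only shrunk by the threshold (to 0.99), while the small second mode is removed entirely, which is the compression effect described in the abstract.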

Showing 1–28 of 28 results for author: Schaeffer, H