Search | arXiv e-print repository

UPS: Efficiently Building Foundation Models for PDE Solving via Cross-Modal Adaptation

Authors: Junhong Shen, Tanya Marwah, Ameet Talwalkar

Abstract: We present Unified PDE Solvers (UPS), a data- and compute-efficient approach to develo** unified neural operators for diverse families of spatiotemporal PDEs from various domains, dimensions, and resolutions. UPS embeds different PDEs into a shared representation space and processes them using a FNO-transformer architecture. Rather than training the network from scratch, which is data-demanding… ▽ More We present Unified PDE Solvers (UPS), a data- and compute-efficient approach to develo** unified neural operators for diverse families of spatiotemporal PDEs from various domains, dimensions, and resolutions. UPS embeds different PDEs into a shared representation space and processes them using a FNO-transformer architecture. Rather than training the network from scratch, which is data-demanding and computationally expensive, we warm-start the transformer from pretrained LLMs and perform explicit alignment to reduce the modality gap while improving data and compute efficiency. The cross-modal UPS achieves state-of-the-art results on a wide range of 1D and 2D PDE families from PDEBench, outperforming existing unified models using 4 times less data and 26 times less compute. Meanwhile, it is capable of few-shot transfer to unseen PDE families and coefficients. △ Less

Submitted 23 May, 2024; v1 submitted 11 March, 2024; originally announced March 2024.

arXiv:2312.00234 [pdf, other]

Deep Equilibrium Based Neural Operators for Steady-State PDEs

Authors: Tanya Marwah, Ashwini Pokle, J. Zico Kolter, Zachary C. Lipton, Jianfeng Lu, Andrej Risteski

Abstract: Data-driven machine learning approaches are being increasingly used to solve partial differential equations (PDEs). They have shown particularly striking successes when training an operator, which takes as input a PDE in some family, and outputs its solution. However, the architectural design space, especially given structural knowledge of the PDE family of interest, is still poorly understood. We… ▽ More Data-driven machine learning approaches are being increasingly used to solve partial differential equations (PDEs). They have shown particularly striking successes when training an operator, which takes as input a PDE in some family, and outputs its solution. However, the architectural design space, especially given structural knowledge of the PDE family of interest, is still poorly understood. We seek to remedy this gap by studying the benefits of weight-tied neural network architectures for steady-state PDEs. To achieve this, we first demonstrate that the solution of most steady-state PDEs can be expressed as a fixed point of a non-linear operator. Motivated by this observation, we propose FNO-DEQ, a deep equilibrium variant of the FNO architecture that directly solves for the solution of a steady-state PDE as the infinite-depth fixed point of an implicit operator layer using a black-box root solver and differentiates analytically through this fixed point resulting in $\mathcal{O}(1)$ training memory. Our experiments indicate that FNO-DEQ-based architectures outperform FNO-based baselines with $4\times$ the number of parameters in predicting the solution to steady-state PDEs such as Darcy Flow and steady-state incompressible Navier-Stokes. Finally, we show FNO-DEQ is more robust when trained with datasets with more noisy observations than the FNO-based baselines, demonstrating the benefits of using appropriate inductive biases in architectural design for different neural network based PDE solvers. Further, we show a universal approximation result that demonstrates that FNO-DEQ can approximate the solution to any steady-state PDE that can be written as a fixed point equation. △ Less

Submitted 30 November, 2023; originally announced December 2023.

Comments: NeurIPS 2023

arXiv:2211.15853 [pdf, other]

Disentangling the Mechanisms Behind Implicit Regularization in SGD

Authors: Zachary Novack, Simran Kaur, Tanya Marwah, Saurabh Garg, Zachary C. Lipton

Abstract: A number of competing hypotheses have been proposed to explain why small-batch Stochastic Gradient Descent (SGD)leads to improved generalization over the full-batch regime, with recent work crediting the implicit regularization of various quantities throughout training. However, to date, empirical evidence assessing the explanatory power of these hypotheses is lacking. In this paper, we conduct an… ▽ More A number of competing hypotheses have been proposed to explain why small-batch Stochastic Gradient Descent (SGD)leads to improved generalization over the full-batch regime, with recent work crediting the implicit regularization of various quantities throughout training. However, to date, empirical evidence assessing the explanatory power of these hypotheses is lacking. In this paper, we conduct an extensive empirical evaluation, focusing on the ability of various theorized mechanisms to close the small-to-large batch generalization gap. Additionally, we characterize how the quantities that SGD has been claimed to (implicitly) regularize change over the course of training. By using micro-batches, i.e. disjoint smaller subsets of each mini-batch, we empirically show that explicitly penalizing the gradient norm or the Fisher Information Matrix trace, averaged over micro-batches, in the large-batch regime recovers small-batch SGD generalization, whereas Jacobian-based regularizations fail to do so. This generalization performance is shown to often be correlated with how well the regularized model's gradient norms resemble those of small-batch SGD. We additionally show that this behavior breaks down as the micro-batch size approaches the batch size. Finally, we note that in this line of inquiry, positive experimental findings on CIFAR10 are often reversed on other datasets like CIFAR100, highlighting the need to test hypotheses on a wider collection of datasets. △ Less

Submitted 28 November, 2022; originally announced November 2022.

Comments: Accepted as Spotlight at the NeurIPS 2022 Workshop for Higher Order Optimization in Machine Learning

arXiv:2210.12101 [pdf, ps, other]

Neural Network Approximations of PDEs Beyond Linearity: A Representational Perspective

Authors: Tanya Marwah, Zachary C. Lipton, Jianfeng Lu, Andrej Risteski

Abstract: A burgeoning line of research leverages deep neural networks to approximate the solutions to high dimensional PDEs, opening lines of theoretical inquiry focused on explaining how it is that these models appear to evade the curse of dimensionality. However, most prior theoretical analyses have been limited to linear PDEs. In this work, we take a step towards studying the representational power of n… ▽ More A burgeoning line of research leverages deep neural networks to approximate the solutions to high dimensional PDEs, opening lines of theoretical inquiry focused on explaining how it is that these models appear to evade the curse of dimensionality. However, most prior theoretical analyses have been limited to linear PDEs. In this work, we take a step towards studying the representational power of neural networks for approximating solutions to nonlinear PDEs. We focus on a class of PDEs known as \emph{nonlinear elliptic variational PDEs}, whose solutions minimize an \emph{Euler-Lagrange} energy functional $\mathcal{E}(u) = \int_ΩL(x, u(x), \nabla u(x)) - f(x) u(x)dx$. We show that if composing a function with Barron norm $b$ with partial derivatives of $L$ produces a function of Barron norm at most $B_L b^p$, the solution to the PDE can be $ε$-approximated in the $L^2$ sense by a function with Barron norm $O\left(\left(dB_L\right)^{\max\{p \log(1/ ε), p^{\log(1/ε)}\}}\right)$. By a classical result due to Barron [1993], this correspondingly bounds the size of a 2-layer neural network needed to approximate the solution. Treating $p, ε, B_L$ as constants, this quantity is polynomial in dimension, thus showing neural networks can evade the curse of dimensionality. Our proof technique involves neurally simulating (preconditioned) gradient in an appropriate Hilbert space, which converges exponentially fast to the solution of the PDE, and such that we can bound the increase of the Barron norm at each iterate. Our results subsume and substantially generalize analogous prior results for linear elliptic PDEs over a unit hypercube. △ Less

Submitted 27 March, 2023; v1 submitted 21 October, 2022; originally announced October 2022.

arXiv:2103.02138 [pdf, ps, other]

Parametric Complexity Bounds for Approximating PDEs with Neural Networks

Authors: Tanya Marwah, Zachary C. Lipton, Andrej Risteski

Abstract: Recent experiments have shown that deep networks can approximate solutions to high-dimensional PDEs, seemingly esca** the curse of dimensionality. However, questions regarding the theoretical basis for such approximations, including the required network size, remain open. In this paper, we investigate the representational power of neural networks for approximating solutions to linear elliptic PD… ▽ More Recent experiments have shown that deep networks can approximate solutions to high-dimensional PDEs, seemingly esca** the curse of dimensionality. However, questions regarding the theoretical basis for such approximations, including the required network size, remain open. In this paper, we investigate the representational power of neural networks for approximating solutions to linear elliptic PDEs with Dirichlet boundary conditions. We prove that when a PDE's coefficients are representable by small neural networks, the parameters required to approximate its solution scale polynomially with the input dimension $d$ and proportionally to the parameter counts of the coefficient networks. To this we end, we develop a proof technique that simulates gradient descent (in an appropriate Hilbert space) by growing a neural network architecture whose iterates each participate as sub-networks in their (slightly larger) successors, and converge to the solution of the PDE. We bound the size of the solution, showing a polynomial dependence on $d$ and no dependence on the volume of the domain. △ Less

Submitted 6 July, 2021; v1 submitted 2 March, 2021; originally announced March 2021.

arXiv:1908.08147 [pdf, other]

Sentiment Dynamics in Social Media News Channels

Authors: Nagendra Kumar, Rakshita Nagalla, Tanya Marwah, Manish Singh

Abstract: Social media is currently one of the most important means of news communication. Since people are consuming a large fraction of their daily news through social media, most of the traditional news channels are using social media to catch the attention of users. Each news channel has its own strategies to attract more users. In this paper, we analyze how the news channels use sentiment to garner use… ▽ More Social media is currently one of the most important means of news communication. Since people are consuming a large fraction of their daily news through social media, most of the traditional news channels are using social media to catch the attention of users. Each news channel has its own strategies to attract more users. In this paper, we analyze how the news channels use sentiment to garner users' attention in social media. We compare the sentiment of social media news posts of television, radio and print media, to show the differences in the ways these channels cover the news. We also analyze users' reactions and opinion sentiment on news posts with different sentiments. We perform our experiments on a dataset extracted from Facebook Pages of five popular news channels. Our dataset contains 0.15 million news posts and 1.13 billion users reactions. The results of our experiments show that the sentiment of user opinion has a strong correlation with the sentiment of the news post and the type of information source. Our study also illustrates the differences among the social media news channels of different types of news sources. △ Less

Submitted 21 August, 2019; originally announced August 2019.

arXiv:1905.03743 [pdf, other]

Interactive Image Generation Using Scene Graphs

Authors: Gaurav Mittal, Shubham Agrawal, Anuva Agarwal, Sushant Mehta, Tanya Marwah

Abstract: Recent years have witnessed some exciting developments in the domain of generating images from scene-based text descriptions. These approaches have primarily focused on generating images from a static text description and are limited to generating images in a single pass. They are unable to generate an image interactively based on an incrementally additive text description (something that is more… ▽ More Recent years have witnessed some exciting developments in the domain of generating images from scene-based text descriptions. These approaches have primarily focused on generating images from a static text description and are limited to generating images in a single pass. They are unable to generate an image interactively based on an incrementally additive text description (something that is more intuitive and similar to the way we describe an image). We propose a method to generate an image incrementally based on a sequence of graphs of scene descriptions (scene-graphs). We propose a recurrent network architecture that preserves the image content generated in previous steps and modifies the cumulative image as per the newly provided scene information. Our model utilizes Graph Convolutional Networks (GCN) to cater to variable-sized scene graphs along with Generative Adversarial image translation networks to generate realistic multi-object images without needing any intermediate supervision during training. We experiment with Coco-Stuff dataset which has multi-object images along with annotations describing the visual scene and show that our model significantly outperforms other approaches on the same dataset in generating visually consistent images for incrementally growing scene graphs. △ Less

Submitted 9 May, 2019; originally announced May 2019.

Comments: Published at ICLR 2019 Deep Generative Models for Highly Structured Data Workshop

arXiv:1708.05980 [pdf, other]

Attentive Semantic Video Generation using Captions

Authors: Tanya Marwah, Gaurav Mittal, Vineeth N. Balasubramanian

Abstract: This paper proposes a network architecture to perform variable length semantic video generation using captions. We adopt a new perspective towards video generation where we allow the captions to be combined with the long-term and short-term dependencies between video frames and thus generate a video in an incremental manner. Our experiments demonstrate our network architecture's ability to disting… ▽ More This paper proposes a network architecture to perform variable length semantic video generation using captions. We adopt a new perspective towards video generation where we allow the captions to be combined with the long-term and short-term dependencies between video frames and thus generate a video in an incremental manner. Our experiments demonstrate our network architecture's ability to distinguish between objects, actions and interactions in a video and combine them to generate videos for unseen captions. The network also exhibits the capability to perform spatio-temporal style transfer when asked to generate videos for a sequence of captions. We also show that the network's ability to learn a latent representation allows it generate videos in an unsupervised manner and perform other tasks such as action recognition. (Accepted in International Conference in Computer Vision (ICCV) 2017) △ Less

Submitted 21 October, 2017; v1 submitted 20 August, 2017; originally announced August 2017.

Journal ref: Presented at ICCV 2017 (International Conference on Computer Vision)

arXiv:1611.10314 [pdf, other]

doi 10.1145/3123266.3123309

Sync-DRAW: Automatic Video Generation using Deep Recurrent Attentive Architectures

Authors: Gaurav Mittal, Tanya Marwah, Vineeth N. Balasubramanian

Abstract: This paper introduces a novel approach for generating videos called Synchronized Deep Recurrent Attentive Writer (Sync-DRAW). Sync-DRAW can also perform text-to-video generation which, to the best of our knowledge, makes it the first approach of its kind. It combines a Variational Autoencoder~(VAE) with a Recurrent Attention Mechanism in a novel manner to create a temporally dependent sequence of… ▽ More This paper introduces a novel approach for generating videos called Synchronized Deep Recurrent Attentive Writer (Sync-DRAW). Sync-DRAW can also perform text-to-video generation which, to the best of our knowledge, makes it the first approach of its kind. It combines a Variational Autoencoder~(VAE) with a Recurrent Attention Mechanism in a novel manner to create a temporally dependent sequence of frames that are gradually formed over time. The recurrent attention mechanism in Sync-DRAW attends to each individual frame of the video in sychronization, while the VAE learns a latent distribution for the entire video at the global level. Our experiments with Bouncing MNIST, KTH and UCF-101 suggest that Sync-DRAW is efficient in learning the spatial and temporal information of the videos and generates frames with high structural integrity, and can generate videos from simple captions on these datasets. (Accepted as oral paper in ACM-Multimedia 2017) △ Less

Submitted 21 October, 2017; v1 submitted 30 November, 2016; originally announced November 2016.

Showing 1–9 of 9 results for author: Marwah, T