Uncertainty Quantification of Graph Convolution Neural Network Models of Evolving Processes

Authors: Jeremiah Hauth, Cosmin Safta, Xun Huan, Ravi G. Patel, Reese E. Jones

Abstract: The application of neural network models to scientific machine learning tasks has proliferated in recent years. In particular, neural network models have proved to be adept at modeling processes with spatial-temporal complexity. Nevertheless, these highly parameterized models have garnered skepticism in their ability to produce outputs with quantified error bounds over the regimes of interest. Hence there is a need to find uncertainty quantification methods that are suitable for neural networks. In this work we present comparisons of the parametric uncertainty quantification of neural networks modeling complex spatial-temporal processes with Hamiltonian Monte Carlo and Stein variational gradient descent and its projected variant. Specifically, we apply these methods to graph convolutional neural network models of evolving systems modeled with recurrent neural network and neural ordinary differential equation architectures. We show that Stein variational inference is a viable alternative to Monte Carlo methods with some clear advantages for complex neural network models. For our exemplars, Stein variational inference gave similar uncertainty profiles through time compared to Hamiltonian Monte Carlo, albeit with generally more generous variance. Projected Stein variational gradient descent also produced similar uncertainty profiles to the non-projected counterpart, but large reductions in the active weight space were confounded by the stability of the neural network predictions and the convoluted likelihood landscape.
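Stein variational gradient descent (SVGD) moves a set of particles along a kernel-smoothed gradient of the log posterior. Each particle receives a driving term (kernel-weighted scores) and a repulsive term (kernel gradients). The repulsive term keeps the particles spread out, and that spread is where the variance estimate comes from.

The following is a minimal one-dimensional sketch, not the paper's setup. The paper applies SVGD to neural-network weights. The Gaussian target N(2, 1), the RBF bandwidth `h`, the step size and the helper names (`grad_log_p`, `svgd_step`) are all illustrative assumptions.

```python
import math
import random


def grad_log_p(x, mu=2.0, sigma=1.0):
    """Score (d/dx log p) of an assumed 1-D Gaussian target N(mu, sigma^2)."""
    return -(x - mu) / sigma**2


def svgd_step(particles, step=0.1, h=1.0):
    """One SVGD update with the RBF kernel k(x, y) = exp(-(x - y)^2 / h)."""
    n = len(particles)
    updated = []
    for xi in particles:
        phi = 0.0
        for xj in particles:
            k = math.exp(-((xj - xi) ** 2) / h)
            # Driving term (kernel-weighted score) plus repulsive term (d k / d xj).
            phi += k * grad_log_p(xj) + (-2.0 * (xj - xi) / h) * k
        updated.append(xi + step * phi / n)
    return updated


random.seed(0)
particles = [random.uniform(-3.0, 0.0) for _ in range(30)]  # deliberately off-target start
for _ in range(500):
    particles = svgd_step(particles)

mean = sum(particles) / len(particles)
var = sum((p - mean) ** 2 for p in particles) / len(particles)
print(f"mean={mean:.2f} var={var:.2f}")
```

The particle cloud's mean and variance act as the posterior summary. With a fixed bandwidth and few particles, the variance is only approximate.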

Submitted 16 February, 2024; originally announced February 2024.

Comments: 27 pages, 20 figures

arXiv:2305.14991 [pdf, other]

MuLER: Detailed and Scalable Reference-based Evaluation

Authors: Taelin Karidi, Leshem Choshen, Gal Patel, Omri Abend

Abstract: We propose a novel methodology (namely, MuLER) that transforms any reference-based evaluation metric for text generation, such as machine translation (MT), into a fine-grained analysis tool. Given a system and a metric, MuLER quantifies how much the chosen metric penalizes specific error types (e.g., errors in translating names of locations). MuLER thus enables a detailed error analysis which can lead to targeted improvement efforts for specific phenomena. We perform experiments in both synthetic and naturalistic settings to support MuLER's validity and showcase its usability in MT evaluation and other tasks, such as summarization. Analyzing all submissions to WMT in 2014-2020, we find consistent trends. For example, nouns and verbs are among the most frequent POS tags. However, they are among the hardest to translate. Performance on most POS tags improves with overall system performance, but a few are not thus correlated (their identity changes from language to language). Preliminary experiments with summarization reveal similar trends.

Submitted 29 November, 2023; v1 submitted 24 May, 2023; originally announced May 2023.

arXiv:2304.09750 [pdf, other]

Application of Tensor Neural Networks to Pricing Bermudan Swaptions

Authors: Raj G. Patel, Tomas Dominguez, Mohammad Dib, Samuel Palmer, Andrea Cadarso, Fernando De Lope Contreras, Abdelkader Ratnani, Francisco Gomez Casanova, Senaida Hernández-Santana, Álvaro Díaz-Fernández, Eva Andrés, Jorge Luis-Hita, Escolástico Sánchez-Martínez, Samuel Mugel, Roman Orus

Abstract: The Cheyette model is a quasi-Gaussian volatility interest rate model widely used to price interest rate derivatives such as European and Bermudan Swaptions, for which Monte Carlo simulation has become the industry standard. In low dimensions, these approaches provide accurate and robust prices for European Swaptions but, even in this computationally simple setting, they are known to underestimate the value of Bermudan Swaptions when using the state variables as regressors. This is mainly due to the use of a finite number of predetermined basis functions in the regression. Moreover, in high-dimensional settings, these approaches succumb to the Curse of Dimensionality. To address these issues, deep-learning techniques have been used to solve the backward Stochastic Differential Equation associated with the value process for European and Bermudan Swaptions; however, these methods are constrained by training time and memory. To overcome these limitations, we propose leveraging Tensor Neural Networks, as they can provide significant parameter savings while attaining the same accuracy as classical Dense Neural Networks. In this paper we rigorously benchmark the performance of Tensor Neural Networks and Dense Neural Networks for pricing European and Bermudan Swaptions, and we show that Tensor Neural Networks can be trained faster than Dense Neural Networks and provide more accurate and robust prices than their Dense counterparts.

Submitted 10 March, 2024; v1 submitted 18 April, 2023; originally announced April 2023.

Comments: 16 pages, 9 figures, 2 tables, minor changes

arXiv:2302.14290 [pdf, other]

Learning to Retain while Acquiring: Combating Distribution-Shift in Adversarial Data-Free Knowledge Distillation

Authors: Gaurav Patel, Konda Reddy Mopuri, Qiang Qiu

Abstract: Data-free Knowledge Distillation (DFKD) has gained popularity recently, with the fundamental idea of carrying out knowledge transfer from a Teacher neural network to a Student neural network in the absence of training data. However, in the Adversarial DFKD framework, the student network's accuracy suffers due to the non-stationary distribution of the pseudo-samples under multiple generator updates. To this end, at every generator update, we aim to maintain the student's performance on previously encountered examples while acquiring knowledge from samples of the current distribution. Thus, we propose a meta-learning inspired framework by treating the task of Knowledge-Acquisition (learning from newly generated samples) and Knowledge-Retention (retaining knowledge on previously met samples) as meta-train and meta-test, respectively. Hence, we dub our method Learning to Retain while Acquiring. Moreover, we identify an implicit aligning factor between the Knowledge-Retention and Knowledge-Acquisition tasks, indicating that the proposed student update strategy enforces a common gradient direction for both tasks, alleviating interference between the two objectives. Finally, we support our hypothesis by exhibiting extensive evaluation and comparison of our method with prior art on multiple datasets.

Submitted 27 February, 2023; originally announced February 2023.

Comments: Accepted at CVPR 2023

arXiv:2212.14076 [pdf, other]

Quantum-Inspired Tensor Neural Networks for Option Pricing

Authors: Raj G. Patel, Chia-Wei Hsing, Serkan Sahin, Samuel Palmer, Saeed S. Jahromi, Shivam Sharma, Tomas Dominguez, Kris Tziritas, Christophe Michel, Vincent Porte, Mustafa Abid, Stephane Aubert, Pierre Castellani, Samuel Mugel, Roman Orus

Abstract: Recent advances in deep learning have enabled us to address the curse of dimensionality (COD) by solving problems in higher dimensions. A subset of these approaches to the COD has led to methods for solving high-dimensional PDEs. This has opened doors to solving a variety of real-world problems ranging from mathematical finance to stochastic control for industrial applications. Although feasible, these deep learning methods are still constrained by training time and memory. To tackle these shortcomings, we demonstrate that Tensor Neural Networks (TNN) can provide significant parameter savings while attaining the same accuracy as the classical Dense Neural Network (DNN). In addition, we show how TNN can be trained faster than DNN for the same accuracy. Besides TNN, we also introduce Tensor Network Initializer (TNN Init), a weight initialization scheme that leads to faster convergence with smaller variance for an equivalent parameter count as compared to a DNN. We benchmark TNN and TNN Init by applying them to solve the parabolic PDE associated with the Heston model, which is widely used in financial pricing theory.
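Here is a small arithmetic sketch of what "significant parameter savings" can mean. It assumes the tensor layer is a matrix product operator (MPO), one common tensor-network layer; the abstract does not specify the layer type.

- A dense N×M layer stores N·M weights.
- An MPO layer splits the input and output sizes into factors n_k and m_k, joined by bond dimensions r_k. It stores Σ_k r_{k-1} n_k m_k r_k weights.

The 256 = 4·4·4·4 factorization and bond dimension 2 are illustrative choices, not the papers' configurations.

```python
from math import prod


def dense_params(n_in, n_out):
    """Weights in a dense layer (biases ignored)."""
    return n_in * n_out


def mpo_params(in_factors, out_factors, bond_dims):
    """Weights in an MPO layer: core k has shape (r_{k-1}, n_k, m_k, r_k)."""
    ranks = [1] + list(bond_dims) + [1]
    return sum(
        ranks[k] * n * m * ranks[k + 1]
        for k, (n, m) in enumerate(zip(in_factors, out_factors))
    )


in_f = out_f = [4, 4, 4, 4]  # 256 = 4 * 4 * 4 * 4
bonds = [2, 2, 2]            # assumed bond dimension between neighbouring cores
print(dense_params(prod(in_f), prod(out_f)))  # 65536
print(mpo_params(in_f, out_f, bonds))         # 192
```

The bond dimension controls the trade-off. Larger r_k restores expressiveness at quadratic cost in r. Whether a given r preserves accuracy is problem dependent, which is what the authors' benchmarks address.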

Submitted 10 March, 2024; v1 submitted 28 December, 2022; originally announced December 2022.

Comments: 11 pages, 8 figures, minor changes. arXiv admin note: substantial text overlap with arXiv:2208.02235

arXiv:2209.00641 [pdf, other]

Seq-UPS: Sequential Uncertainty-aware Pseudo-label Selection for Semi-Supervised Text Recognition

Authors: Gaurav Patel, Jan Allebach, Qiang Qiu

Abstract: This paper looks at semi-supervised learning (SSL) for image-based text recognition. One of the most popular SSL approaches is pseudo-labeling (PL). PL approaches assign labels to unlabeled data before re-training the model with a combination of labeled and pseudo-labeled data. However, PL methods are severely degraded by noise and are prone to over-fitting to noisy labels, due to the inclusion of erroneous high-confidence pseudo-labels generated from poorly calibrated models, thus rendering threshold-based selection ineffective. Moreover, the combinatorial complexity of the hypothesis space and the error accumulation due to multiple incorrect autoregressive steps make pseudo-labeling challenging for sequence models. To this end, we propose a pseudo-label generation and an uncertainty-based data selection framework for semi-supervised text recognition. We first use Beam-Search inference to yield highly probable hypotheses to assign pseudo-labels to the unlabeled examples. Then we adopt an ensemble of models, sampled by applying dropout, to obtain a robust estimate of the uncertainty associated with the prediction, considering both the character-level and word-level predictive distribution to select good-quality pseudo-labels. Extensive experiments on several benchmark handwriting and scene-text datasets show that our method outperforms the baseline approaches and the previous state-of-the-art semi-supervised text-recognition methods.
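The selection step can be sketched as follows. Several dropout passes score the same beam-search hypothesis. The hypothesis is kept only if its word-level confidence is high and stable, and every character position is stable across passes.

This is a minimal sketch, not the paper's criterion:

- Word confidence as a product of character probabilities is an assumption.
- The use of standard deviation as the uncertainty measure is an assumption.
- All thresholds are assumptions.
- `keep_pseudo_label` is a hypothetical helper.

```python
import statistics


def word_confidence(char_probs):
    """Word-level confidence: product of per-character probabilities (assumed)."""
    conf = 1.0
    for p in char_probs:
        conf *= p
    return conf


def keep_pseudo_label(samples, min_mean=0.8, max_word_std=0.05, max_char_std=0.1):
    """Decide whether to keep one beam-search hypothesis as a pseudo-label.

    `samples` holds K dropout passes; each pass is the list of per-character
    probabilities the sampled model assigns to the same hypothesis.
    Thresholds are illustrative assumptions.
    """
    words = [word_confidence(s) for s in samples]
    if statistics.mean(words) < min_mean or statistics.pstdev(words) > max_word_std:
        return False
    # Character-level check: every position must be stable across the passes.
    for position in zip(*samples):
        if statistics.pstdev(position) > max_char_std:
            return False
    return True


stable = [[0.99, 0.98, 0.97], [0.98, 0.99, 0.97], [0.99, 0.97, 0.98]]
unstable = [[0.99, 0.98, 0.97], [0.60, 0.90, 0.80], [0.95, 0.50, 0.90]]
print(keep_pseudo_label(stable), keep_pseudo_label(unstable))  # True False
```

A single pass cannot distinguish these two cases: the first pass of `unstable` is identical to the first pass of `stable`. Only the spread across dropout samples exposes the difference.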

Submitted 6 October, 2022; v1 submitted 30 August, 2022; originally announced September 2022.

Comments: Accepted at WACV 2023

arXiv:2207.02891 [pdf, other]

Don't overfit the history -- Recursive time series data augmentation

Authors: Amine Mohamed Aboussalah, Min-Jae Kwon, Raj G Patel, Cheng Chi, Chi-Guhn Lee

Abstract: Time series observations can be seen as realizations of an underlying dynamical system governed by rules that we typically do not know. For time series learning tasks, we need to understand that we fit our model on available data, which is a unique realized history. Training on a single realization often induces severe overfitting and poor generalization. To address this issue, we introduce a general recursive framework for time series augmentation, which we call the Recursive Interpolation Method (RIM). New samples are generated using a recursive interpolation function of all previous values in such a way that the enhanced samples preserve the original inherent time series dynamics. We perform theoretical analysis to characterize the proposed RIM and to guarantee its test performance. We apply RIM to diverse real-world time series cases to achieve strong performance over non-augmented data on regression, classification, and reinforcement learning tasks.

Submitted 28 January, 2023; v1 submitted 6 July, 2022; originally announced July 2022.

Comments: Accepted to ICLR 2023. Resubmitted here due to major change in proofs following conference submission

arXiv:2204.10909 [pdf, other]

Error-in-variables modelling for operator learning

Authors: Ravi G. Patel, Indu Manickam, Myoungkyu Lee, Mamikon Gulian

Abstract: Deep operator learning has emerged as a promising tool for reduced-order modelling and PDE model discovery. Leveraging the expressive power of deep neural networks, especially in high dimensions, such methods learn the mapping between functional state variables. While proposed methods have assumed noise only in the dependent variables, experimental and numerical data for operator learning typically exhibit noise in the independent variables as well, since both variables represent signals that are subject to measurement error. In regression on scalar data, failure to account for noisy independent variables can lead to biased parameter estimates. With noisy independent variables, linear models fitted via ordinary least squares (OLS) will show attenuation bias, wherein the slope will be underestimated. In this work, we derive an analogue of attenuation bias for linear operator regression with white noise in both the independent and dependent variables. In the nonlinear setting, we computationally demonstrate underprediction of the action of the Burgers operator in the presence of noise in the independent variable. We propose error-in-variables (EiV) models for two operator regression methods, MOR-Physics and DeepONet, and demonstrate that these new models reduce bias in the presence of noisy independent variables for a variety of operator learning problems. Considering the Burgers operator in 1D and 2D, we demonstrate that EiV operator learning robustly recovers operators in high-noise regimes that defeat OLS operator learning.
We also introduce an EiV model for time-evolving PDE discovery and show that OLS and EiV perform similarly in learning the Kuramoto-Sivashinsky evolution operator from corrupted data, suggesting that the effect of bias in OLS operator learning depends on the regularity of the target operator.
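The scalar attenuation bias mentioned above has a closed form. Suppose y = βx + ε, but we regress on the observed x̃ = x + η, where η is independent noise with variance σ_η². Then the OLS slope tends to β·σ_x² / (σ_x² + σ_η²).

Below is a small simulation of that scalar case only, not of the paper's operator-learning setting. The values β = 2 and σ_x = σ_η = 1 are assumptions, chosen so the predicted slope is exactly half the true one.

```python
import random


def ols_slope(xs, ys):
    """Ordinary least squares slope of ys on xs (with intercept)."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sxx = sum((x - mx) ** 2 for x in xs)
    return sxy / sxx


random.seed(1)
beta, sigma_x, sigma_eta = 2.0, 1.0, 1.0
x_true = [random.gauss(0.0, sigma_x) for _ in range(20000)]
y = [beta * x + random.gauss(0.0, 0.1) for x in x_true]          # noise in y
x_noisy = [x + random.gauss(0.0, sigma_eta) for x in x_true]     # noise in x

clean = ols_slope(x_true, y)
noisy = ols_slope(x_noisy, y)
predicted = beta * sigma_x**2 / (sigma_x**2 + sigma_eta**2)
print(f"clean={clean:.3f} noisy={noisy:.3f} predicted={predicted:.3f}")
```

Noise in y alone leaves the slope unbiased, while noise in x shrinks it toward zero. This asymmetry is what motivates error-in-variables models.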

Submitted 19 July, 2022; v1 submitted 22 April, 2022; originally announced April 2022.

Comments: 23 pages, 10 figures

arXiv:2110.03067 [pdf, other]

On Neurons Invariant to Sentence Structural Changes in Neural Machine Translation

Authors: Gal Patel, Leshem Choshen, Omri Abend

Abstract: We present a methodology that explores how sentence structure is reflected in neural representations of machine translation systems. We demonstrate our model-agnostic approach with the Transformer English-German translation model. We analyze neuron-level correlation of activations between paraphrases while discussing the methodology challenges and the need for confound analysis to isolate the effects of shallow cues. We find that similarity between activation patterns can be mostly accounted for by similarity in word choice and sentence length. Following that, we manipulate neuron activations to control the syntactic form of the output. We show this intervention to be somewhat successful, indicating that deep models capture sentence-structure distinctions, despite finding no such indication at the neuron level. To conduct our experiments, we develop a semi-automatic method to generate meaning-preserving minimal pair paraphrases (active-passive voice and adverbial clause-noun phrase) and compile a corpus of such pairs.

Submitted 2 November, 2022; v1 submitted 6 October, 2021; originally announced October 2021.

arXiv:2104.02488 [pdf, other]

Weakly supervised segmentation with cross-modality equivariant constraints

Authors: Gaurav Patel, Jose Dolz

Abstract: Weakly supervised learning has emerged as an appealing alternative to alleviate the need for large labeled datasets in semantic segmentation. Most current approaches exploit class activation maps (CAMs), which can be generated from image-level annotations. Nevertheless, the resulting maps have been shown to be highly discriminant, failing to serve as optimal proxy pixel-level labels. We present a novel learning strategy that leverages self-supervision in a multi-modal image scenario to significantly enhance original CAMs. In particular, the proposed method is based on two observations. First, the learning of fully-supervised segmentation networks implicitly imposes equivariance by means of data augmentation, whereas this implicit constraint disappears on CAMs generated with image tags. Second, the commonalities between image modalities can be employed as an efficient self-supervisory signal, correcting the inconsistency shown by CAMs obtained across multiple modalities. To effectively train our model, we integrate a novel loss function that includes a within-modality and a cross-modality equivariant term to explicitly impose these constraints during training. In addition, we add a KL-divergence on the class prediction distributions to facilitate the information exchange between modalities, which, combined with the equivariant regularizers, further improves the performance of our model.
Exhaustive experiments on the popular multi-modal BRATS dataset demonstrate that our approach outperforms relevant recent literature under the same learning conditions.

Submitted 13 January, 2022; v1 submitted 6 April, 2021; originally announced April 2021.

Comments: Under Review at MedIA. Code available

arXiv:2101.11256 [pdf, other]

Partition of unity networks: deep hp-approximation

Authors: Kookjin Lee, Nathaniel A. Trask, Ravi G. Patel, Mamikon A. Gulian, Eric C. Cyr

Abstract: Approximation theorists have established best-in-class optimal approximation rates of deep neural networks by utilizing their ability to simultaneously emulate partitions of unity and monomials. Motivated by this, we propose partition of unity networks (POUnets) which incorporate these elements directly into the architecture. Classification architectures of the type used to learn probability measures are used to build a meshfree partition of space, while polynomial spaces with learnable coefficients are associated with each partition. The resulting hp-element-like approximation allows use of a fast least-squares optimizer, and the resulting architecture size need not scale exponentially with spatial dimension, breaking the curse of dimensionality. An abstract approximation result establishes desirable properties to guide network design. Numerical results for two choices of architecture demonstrate that POUnets yield hp-convergence for smooth functions and consistently outperform MLPs for piecewise polynomial functions with large numbers of discontinuities.
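The two ingredients named above can be combined in a few lines:

- A softmax "classifier" produces weights φ_i(x) that sum to 1.
- Each partition carries a local polynomial whose coefficients are found by linear least squares.

This is a one-dimensional sketch under stated assumptions. The two affine logits (`LOGIT_W`, `LOGIT_B`) are fixed rather than trained. The polynomials are linear. The normal equations are solved by plain Gaussian elimination. Training of the partition itself is omitted.

```python
import math

# Assumed affine "classifier" logits w_i * x + b_i defining two soft partitions.
LOGIT_W = [-10.0, 10.0]
LOGIT_B = [0.0, 0.0]


def partition(x):
    """Softmax of affine logits: weights phi_i(x) that sum to 1 (a partition of unity)."""
    logits = [w * x + b for w, b in zip(LOGIT_W, LOGIT_B)]
    top = max(logits)
    exps = [math.exp(v - top) for v in logits]
    total = sum(exps)
    return [e / total for e in exps]


def features(x, degree=1):
    """Basis functions phi_i(x) * x**p for each partition i and monomial degree p."""
    return [phi * x**p for phi in partition(x) for p in range(degree + 1)]


def solve(a, b):
    """Solve the square system a z = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    m = [row[:] + [bi] for row, bi in zip(a, b)]
    for col in range(n):
        piv = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[piv] = m[piv], m[col]
        for r in range(col + 1, n):
            f = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= f * m[col][c]
    z = [0.0] * n
    for r in reversed(range(n)):
        z[r] = (m[r][n] - sum(m[r][c] * z[c] for c in range(r + 1, n))) / m[r][r]
    return z


def fit(xs, ys):
    """Least-squares coefficients of the local polynomials via the normal equations."""
    rows = [features(x) for x in xs]
    k = len(rows[0])
    ata = [[sum(r[i] * r[j] for r in rows) for j in range(k)] for i in range(k)]
    aty = [sum(r[i] * y for r, y in zip(rows, ys)) for i in range(k)]
    return solve(ata, aty)


def predict(x, coeffs):
    """Evaluate sum_i phi_i(x) * p_i(x) with the fitted coefficients."""
    return sum(c * f for c, f in zip(coeffs, features(x)))


xs = [i / 50 - 1.0 for i in range(101)]
ys = [abs(x) for x in xs]  # piecewise-linear target with a kink at 0
coeffs = fit(xs, ys)
err = max(abs(predict(x, coeffs) - y) for x, y in zip(xs, ys))
print(f"max error = {err:.3f}")
```

A single global line cannot fit |x| well. Two soft partitions with linear pieces can, and the remaining error sits near the kink where the partitions blend. Once the partition is fixed, fitting the coefficients is a linear problem, which is why a least-squares solver applies.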

Submitted 27 January, 2021; originally announced January 2021.

Comments: 8 pages, 5 figures

arXiv:2009.11992 [pdf, other]

doi 10.1016/j.cma.2020.113500

A physics-informed operator regression framework for extracting data-driven continuum models

Authors: Ravi G. Patel, Nathaniel A. Trask, Mitchell A. Wood, Eric C. Cyr

Abstract: The application of deep learning toward discovery of data-driven models requires careful application of inductive biases to obtain a description of physics which is both accurate and robust. We present here a framework for discovering continuum models from high fidelity molecular simulation data. Our approach applies a neural network parameterization of governing physics in modal space, allowing a characterization of differential operators while providing structure which may be used to impose biases related to symmetry, isotropy, and conservation form. We demonstrate the effectiveness of our framework for a variety of physics, including local and nonlocal diffusion processes and single and multiphase flows. For the flow physics we demonstrate this approach leads to a learned operator that generalizes to system characteristics not included in the training sets, such as variable particle sizes, densities, and concentration.

Submitted 24 September, 2020; originally announced September 2020.

Comments: 37 pages, 15 figures

arXiv:2008.03750 [pdf, other]

Switching Loss for Generalized Nucleus Detection in Histopathology

Authors: Deepak Anand, Gaurav Patel, Yaman Dang, Amit Sethi

Abstract: The accuracy of deep learning methods for two foundational tasks in medical image analysis, detection and segmentation, can suffer from class imbalance. We propose a 'switching loss' function that adaptively shifts the emphasis between foreground and background classes. While the existing loss functions to address this problem were motivated by the classification task, the switching loss is based on Dice loss, which is better suited for segmentation and detection. Furthermore, to get the most out of the training samples, we adapt the loss with each mini-batch, unlike previous proposals that adapt once for the entire training set. A nucleus detector trained using the proposed loss function on a source dataset outperformed those trained using cross-entropy, Dice, or focal losses. Remarkably, without retraining on target datasets, our pre-trained nucleus detector also outperformed existing nucleus detectors that were trained on at least some of the images from the target datasets. To establish the broad utility of the proposed loss, we also confirmed that it led to more accurate ventricle segmentation in MRI as compared to the other loss functions. Our GPU-enabled pre-trained nucleus detection software is also ready to process whole slide images right out of the box and is usably fast.
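The soft Dice loss that the switching loss builds on is standard: 1 − 2|P∩T| / (|P| + |T|) over predicted probabilities P and targets T. The abstract does not state the switching rule itself.

The sketch below therefore shows only one hypothetical way to blend foreground and background Dice per mini-batch. Its rule, which gives the rarer class in the current batch more weight, is an assumption, and `switching_dice_loss` is a hypothetical name. Neither should be read as the paper's loss.

```python
def soft_dice_loss(pred, target, eps=1e-7):
    """Soft Dice loss 1 - 2|P∩T| / (|P| + |T|) over flat lists of probabilities."""
    inter = sum(p * t for p, t in zip(pred, target))
    return 1.0 - (2.0 * inter + eps) / (sum(pred) + sum(target) + eps)


def switching_dice_loss(pred, target):
    """Hypothetical per-mini-batch blend of foreground and background Dice.

    The weight below (emphasize whichever class is rarer in this batch) is an
    illustrative assumption; the paper's actual switching rule is not given
    in the abstract.
    """
    fg_fraction = sum(target) / len(target)
    w_fg = 1.0 - fg_fraction  # rare foreground -> weight close to 1
    fg = soft_dice_loss(pred, target)
    bg = soft_dice_loss([1.0 - p for p in pred], [1.0 - t for t in target])
    return w_fg * fg + (1.0 - w_fg) * bg


target = [1, 0, 0, 0, 0, 0, 0, 0]   # one foreground pixel in the batch
perfect = [1.0, 0, 0, 0, 0, 0, 0, 0]
missed = [0.0] * 8
print(round(switching_dice_loss(perfect, target), 4))  # 0.0
print(round(switching_dice_loss(missed, target), 4))   # 0.8833
```

Background Dice alone scores the `missed` prediction at only about 0.067, because background pixels dominate. Recomputing the weight from each batch is what keeps the rare foreground error visible.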

Submitted 9 August, 2020; originally announced August 2020.

arXiv:2006.10123 [pdf, other]

A block coordinate descent optimizer for classification problems exploiting convexity

Authors: Ravi G. Patel, Nathaniel A. Trask, Mamikon A. Gulian, Eric C. Cyr

Abstract: Second-order optimizers hold intriguing potential for deep learning, but suffer from increased cost and sensitivity to the non-convexity of the loss surface as compared to gradient-based approaches. We introduce a coordinate descent method to train deep neural networks for classification tasks that exploits global convexity of the cross-entropy loss in the weights of the linear layer. Our hybrid Newton/Gradient Descent (NGD) method is consistent with the interpretation of hidden layers as providing an adaptive basis and the linear layer as providing an optimal fit of the basis to data. By alternating between a second-order method to find globally optimal parameters for the linear layer and gradient descent to train the hidden layers, we ensure an optimal fit of the adaptive basis to data throughout training. The size of the Hessian in the second-order step scales only with the number of weights in the linear layer and not the depth and width of the hidden layers; furthermore, the approach is applicable to arbitrary hidden layer architecture. Previous work applying this adaptive basis perspective to regression problems demonstrated significant improvements in accuracy at reduced training cost, and this work can be viewed as an extension of this approach to classification problems. We first prove that the resulting Hessian matrix is symmetric positive semi-definite, and that the Newton step realizes a global minimizer.
By studying classification of manufactured two-dimensional point cloud data, we demonstrate both an improvement in validation error and a striking qualitative difference in the basis functions encoded in the hidden layer when trained using NGD. Application to image classification benchmarks for both dense and convolutional architectures reveals improved training accuracy, suggesting possible gains of second-order methods over gradient descent.

Submitted 17 June, 2020; originally announced June 2020.

Comments: 10 pages, 4 figures

arXiv:1912.04862 [pdf, other]

Robust Training and Initialization of Deep Neural Networks: An Adaptive Basis Viewpoint

Authors: Eric C. Cyr, Mamikon A. Gulian, Ravi G. Patel, Mauro Perego, Nathaniel A. Trask

Abstract: Motivated by the gap between theoretical optimal approximation rates of deep neural networks (DNNs) and the accuracy realized in practice, we seek to improve the training of DNNs. The adoption of an adaptive basis viewpoint of DNNs leads to novel initializations and a hybrid least squares/gradient descent optimizer. We provide analysis of these techniques and illustrate via numerical examples dramatic increases in accuracy and convergence rate for benchmarks characterizing scientific applications where DNNs are currently used, including regression problems and physics-informed neural networks for the solution of partial differential equations.

Submitted 10 December, 2019; originally announced December 2019.

Comments: 26 pages

arXiv:1909.05371 [pdf, other]

GMLS-Nets: A framework for learning from unstructured data

Authors: Nathaniel Trask, Ravi G. Patel, Ben J. Gross, Paul J. Atzberger

Abstract: Data fields sampled on irregularly spaced points arise in many applications in the sciences and engineering. For regular grids, Convolutional Neural Networks (CNNs) have been used successfully to gain benefits from weight sharing and invariances. We generalize CNNs by introducing methods for data on unstructured point clouds based on Generalized Moving Least Squares (GMLS). GMLS is a non-parametric technique for estimating linear bounded functionals from scattered data, and has recently been used in the literature for solving partial differential equations. By parameterizing the GMLS estimator, we obtain learning methods for operators with unstructured stencils. In GMLS-Nets the necessary calculations are local, readily parallelizable, and the estimator is supported by a rigorous approximation theory. We show how the framework may be used for unstructured physical data sets to perform functional regression to identify associated differential operators and to regress quantities of interest. The results suggest the architectures to be an attractive foundation for data-driven model development in scientific machine learning applications.
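A GMLS estimate of a linear functional works in two steps:

1. Fit a weighted local polynomial around the evaluation point, using only the scattered samples inside a compact stencil.
2. Apply the functional (here d/dx) to that polynomial.

This minimal 1-D sketch is the classical, non-learned estimator only. GMLS-Nets additionally parameterize it, which is omitted here. The weight w(r) = (1 − r/ε)⁴, the radius ε = 0.3 and the quadratic basis are assumptions.

```python
import math
import random


def solve(a, b):
    """Solve the square system a z = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    m = [row[:] + [bi] for row, bi in zip(a, b)]
    for col in range(n):
        piv = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[piv] = m[piv], m[col]
        for r in range(col + 1, n):
            f = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= f * m[col][c]
    z = [0.0] * n
    for r in reversed(range(n)):
        z[r] = (m[r][n] - sum(m[r][c] * z[c] for c in range(r + 1, n))) / m[r][r]
    return z


def gmls_derivative(x0, xs, fs, eps=0.3):
    """Estimate f'(x0) from scattered samples with a weighted local quadratic fit.

    Minimizes sum_j w(|x_j - x0|) * (f_j - p(x_j - x0))**2 over quadratics p,
    with the compactly supported weight w(r) = (1 - r/eps)**4 for r < eps.
    """
    ata = [[0.0] * 3 for _ in range(3)]
    atf = [0.0] * 3
    for x, f in zip(xs, fs):
        r = abs(x - x0)
        if r >= eps:
            continue  # point lies outside the stencil
        w = (1.0 - r / eps) ** 4
        basis = [1.0, x - x0, (x - x0) ** 2]
        for i in range(3):
            atf[i] += w * basis[i] * f
            for j in range(3):
                ata[i][j] += w * basis[i] * basis[j]
    c = solve(ata, atf)
    return c[1]  # p'(0): the derivative estimate at x0


random.seed(0)
xs = sorted(random.uniform(0.0, 1.0) for _ in range(40))  # unstructured points
d_sq = gmls_derivative(0.5, xs, [x * x for x in xs])
d_sin = gmls_derivative(0.5, xs, [math.sin(x) for x in xs])
print(f"d/dx x^2 at 0.5 ~ {d_sq:.6f} (exact 1)")
print(f"d/dx sin at 0.5 ~ {d_sin:.4f} (exact {math.cos(0.5):.4f})")
```

Because the basis contains all quadratics, the estimate is exact for x², whatever the point layout. For smooth non-polynomial data the error shrinks with the stencil size. Only nearby points enter each solve, which is the locality the abstract refers to.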

Submitted 13 September, 2019; v1 submitted 6 September, 2019; originally announced September 2019.

Journal ref: AAAI-MLPS Proceedings, (2020)

arXiv:1810.08552

Nonlinear integro-differential operator regression with neural networks

Authors: Ravi G. Patel, Olivier Desjardins

Abstract: This note introduces a regression technique for finding a class of nonlinear integro-differential operators from data. The method parametrizes the spatial operator with neural networks and Fourier transforms such that it can fit a class of nonlinear operators without needing a library of a priori selected operators. We verify that this method can recover the spatial operators in the fractional heat equation and the Kuramoto-Sivashinsky equation from numerical solutions of the equations.

Submitted 19 October, 2018; originally announced October 2018.

Comments: 5 pages, 3 figures, preprint submitted to the Journal of Computational Physics

arXiv:1507.01818

Improved Upper Bounds on $a'(G\Box H)$

Authors: Punit Mehta, Rahul Muthu, Gaurav Patel, Om Thakkar, Devanshi Vyas

Abstract: The acyclic edge colouring problem is extensively studied in graph theory. The cornerstone of this field is a conjecture of Alon et al. \cite{alonacyclic} that $a'(G)\le \Delta(G)+2$. In that and subsequent work, $a'(G)$ is typically bounded in terms of $\Delta(G)$. Motivated by this, we introduce the quantity $gap(G)$, defined as $gap(G)=a'(G)-\Delta(G)$. Alon's conjecture can be rephrased as $gap(G)\le 2$ for all graphs $G$. In \cite{manusccartprod} it was shown that $a'(G\Box H)\le a'(G)+a'(H)$, under some assumptions. Based on Alon's conjecture, we conjecture that $a'(G\Box H)\le a'(G)+\Delta(H)$ under the same assumptions, resulting in a strengthening. The results of \cite{alonacyclic} validate our conjecture for the class of graphs it considers. We prove our conjecture for a significant subclass of sub-cubic graphs and state some generic conditions under which our conjecture can be proved. We suggest how our technique can potentially be applied by future researchers to expand the class of graphs for which our conjecture holds. Our results improve the understanding of the relationship between the Cartesian product and the acyclic chromatic index.

Submitted 7 July, 2015; originally announced July 2015.

Comments: 10 pages, 5 figures
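To make the definition $gap(G)=a'(G)-\Delta(G)$ from the entry above concrete, here is a brute-force sketch that computes $a'(G)$ directly from the definition of an acyclic edge colouring: a proper edge colouring in which no cycle uses only two colours. It is exponential and only usable on graphs with a handful of edges. The function names are hypothetical and do not come from the paper. For $K_4$ every proper 4-edge-colouring contains two perfect matchings whose union is a two-coloured 4-cycle, so $a'(K_4)=5$ and $gap(K_4)=2$, which is the largest value Alon's conjecture allows.

```python
from collections import Counter
from itertools import combinations, product


def max_degree(edges):
    # Delta(G): largest number of edges meeting at a vertex
    deg = Counter()
    for u, v in edges:
        deg[u] += 1
        deg[v] += 1
    return max(deg.values())


def is_forest(edge_list):
    # Union-find cycle check: an edge inside one component closes a cycle
    parent = {}

    def find(u):
        while parent.setdefault(u, u) != u:
            u = parent[u]
        return u

    for u, v in edge_list:
        ru, rv = find(u), find(v)
        if ru == rv:
            return False
        parent[ru] = rv
    return True


def is_acyclic_colouring(edges, colouring):
    # Proper: two edges sharing an endpoint never share a colour
    for (e, ce), (f, cf) in combinations(zip(edges, colouring), 2):
        if ce == cf and set(e) & set(f):
            return False
    # Acyclic: the edges of any two colours form a forest (no bichromatic cycle)
    for a, b in combinations(set(colouring), 2):
        if not is_forest([e for e, c in zip(edges, colouring) if c in (a, b)]):
            return False
    return True


def acyclic_chromatic_index(edges):
    # a'(G) by exhaustive search, starting from the lower bound Delta(G)
    k = max_degree(edges)
    while True:
        if any(is_acyclic_colouring(edges, col)
               for col in product(range(k), repeat=len(edges))):
            return k
        k += 1


def gap(edges):
    return acyclic_chromatic_index(edges) - max_degree(edges)


K4 = list(combinations(range(4), 2))
C4 = [(0, 1), (1, 2), (2, 3), (3, 0)]
print(gap(K4), gap(C4))  # 2 1
```

The 4-cycle shows the other direction: two colours give a properly coloured but bichromatic cycle, so a third colour is needed and $gap(C_4)=1$.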

arXiv:1212.5462

doi 10.1109/ACSSC.2012.6488952

On the Impact of Phase Noise on Active Cancellation in Wireless Full-Duplex

Authors: Achaleshwar Sahai, Gaurav Patel, Chris Dick, Ashutosh Sabharwal

Abstract: Recent experimental results have shown that full-duplex communication is possible for short-range communications. However, extending full-duplex to long-range communication remains a challenge, primarily due to residual self-interference even with a combination of passive suppression and active cancellation methods. In this paper, we investigate the root cause of performance bottlenecks in current full-duplex systems. We first classify all known full-duplex architectures based on how they compute their cancelling signal and where the cancelling signal is injected to cancel self-interference. Based on the classification, we analytically explain several published experimental results. The key bottleneck in current systems turns out to be the phase noise in the local oscillators in the transmit and receive chains of the full-duplex node. As a key by-product of our analysis, we propose signal models for wideband and MIMO full-duplex systems, capturing all the salient design parameters, and thus allowing future analytical development of advanced coding and signal design for full-duplex systems.

Submitted 21 December, 2012; originally announced December 2012.

Comments: 35 pages, Submitted to IEEE Transactions on Vehicular Technology, Dec 2012

arXiv:1107.0607

Pushing the limits of Full-duplex: Design and Real-time Implementation

Authors: Achaleshwar Sahai, Gaurav Patel, Ashutosh Sabharwal

Abstract: Recent work has shown the feasibility of a single-channel full-duplex wireless physical layer, allowing nodes to send and receive in the same frequency band at the same time. In this report, we first design and implement a real-time 64-subcarrier 10 MHz full-duplex OFDM physical layer, FD-PHY. The proposed FD-PHY allows not only synchronous full-duplex transmissions but also selective asynchronous full-duplex modes. Further, we show that in over-the-air experiments using optimal antenna placement on actual devices, the self-interference can be suppressed by up to 80 dB, which is 10 dB more than prior reported results. Then we propose a full-duplex MAC protocol, FD-MAC, which builds on IEEE 802.11 with three new mechanisms: shared random backoff, header snooping and virtual backoffs. The new mechanisms allow FD-MAC to discover and exploit full-duplex opportunities in a distributed manner. Our over-the-air tests show over 70% throughput gains from using full-duplex over half-duplex in realistic use cases.

Submitted 4 July, 2011; originally announced July 2011.

Comments: 12 page Rice University technical report

Report number: TREE1104

Showing 1–20 of 20 results for author: Patel, G