Search | arXiv e-print repository

DBO: Response Time Fairness for Cloud-Hosted Financial Exchanges

Authors: Prateesh Goyal, Eashan Gupta, Ilias Marinos, Chenxingyu Zhao, Radhika Mittal, Ranveer Chandra

Abstract: In this paper, we consider the problem of hosting financial exchanges in the cloud. Financial exchanges require predictable, equal latency to all market participants to ensure fairness for various tasks, such as high speed trading. However, it is extremely difficult to ensure equal latency to all market participants in existing cloud deployments, because of various reasons, such as congestion, and unequal network paths. In this paper, we address the unfairness that stems from lack of determinism in cloud networks. We argue that predictable or bounded latency is not necessary to achieve fairness. Inspired by the use of logical clocks in distributed systems, we present Delivery Based Ordering (DBO), a new approach that ensures fairness by instead correcting for differences in latency to the participants. We evaluate DBO both in our hardware test bed and in a public cloud deployment and demonstrate that it is feasible to achieve guaranteed fairness and sub-100 microsecond latency while operating at high transaction rates.

Submitted 29 March, 2023; v1 submitted 28 March, 2023; originally announced March 2023.

arXiv:2303.08705 [pdf, other]

Intrinsic optical absorption in Dirac metals

Authors: Adamya P. Goyal, Prachi Sharma, Dmitrii L. Maslov

Abstract: A Dirac metal is a doped (gated) Dirac material with the Fermi energy ($E_\text{F}$) lying either in the conduction or valence bands. In the non-interacting picture, optical absorption in gapless Dirac metals occurs only if the frequency of incident photons ($Ω$) exceeds the direct (Pauli) frequency threshold, equal to $2E_\text{F}$. In this work, we study, both analytically and numerically, the role of electron-electron ($ee$) and electron-hole ($eh$) interactions in optical absorption of two-dimensional (2D) and three-dimensional (3D) Dirac metals in the entire interval of frequencies below $2E_\text{F}$. We show that, for $Ω\ll E_\text{F}$, the optical conductivity, $\Reσ(Ω)$, arising from the combination of $ee$ and certain $eh$ scattering processes, scales as $Ω^2\lnΩ$ in 2D and as $Ω^2$ in 3D, respectively, both for short-range (Hubbard) and long-range (screened Coulomb) interactions. Another type of $eh$ processes, similar to Auger-Meitner (AM) processes in atomic physics, starts to contribute for $Ω$ above the indirect threshold, equal to $E_\text{F}$. Similar to the case of doped semiconductors with parabolic bands studied in prior literature, the AM contribution to $\Reσ(Ω)$ in Dirac metals is manifested by a threshold singularity, $\Reσ(Ω)\propto (Ω-E_\text{F})^{d+2}$, where $d$ is the spatial dimensionality and $0<Ω-E_\text{F}\ll E_\text{F}$. In contrast to doped semiconductors, however, the AM contribution in Dirac metals is completely overshadowed by the $ee$ and other $eh$ contributions. Numerically, $\Reσ(Ω)$ happens to be small in almost the entire range of $Ω<2E_\text{F}$. This finding may have important consequences for collective modes in Dirac metals lying below $2E_\text{F}$.

Submitted 15 March, 2023; originally announced March 2023.

Comments: 33 pages, 10 figures

arXiv:2302.09685 [pdf, other]

Intent Identification and Entity Extraction for Healthcare Queries in Indic Languages

Authors: Ankan Mullick, Ishani Mondal, Sourjyadip Ray, R Raghav, G Sai Chaitanya, Pawan Goyal

Abstract: Scarcity of data and technological limitations for resource-poor languages in developing countries like India pose a threat to the development of sophisticated NLU systems for healthcare. To assess the current status of various state-of-the-art language models in healthcare, this paper studies the problem by initially proposing two different Healthcare datasets, Indian Healthcare Query Intent-WebMD and 1mg (IHQID-WebMD and IHQID-1mg) and one real world Indian hospital query data in English and multiple Indic languages (Hindi, Bengali, Tamil, Telugu, Marathi and Gujarati) which are annotated with the query intents as well as entities. Our aim is to detect query intents and extract corresponding entities. We perform extensive experiments on a set of models in various realistic settings and explore two scenarios based on the access to English data only (less costly) and access to target language data (more expensive). We analyze context specific practical relevancy through empirical analysis. The results, expressed in terms of overall F1 score, show that our approach is practically useful to identify intents and entities.

Submitted 19 February, 2023; originally announced February 2023.

Journal ref: EACL 2023 Findings Full Paper

arXiv:2302.09527 [pdf, other]

SanskritShala: A Neural Sanskrit NLP Toolkit with Web-Based Interface for Pedagogical and Annotation Purposes

Authors: Jivnesh Sandhan, Anshul Agarwal, Laxmidhar Behera, Tushar Sandhan, Pawan Goyal

Abstract: We present a neural Sanskrit Natural Language Processing (NLP) toolkit named SanskritShala (a school of Sanskrit) to facilitate computational linguistic analyses for several tasks such as word segmentation, morphological tagging, dependency parsing, and compound type identification. Our systems currently report state-of-the-art performance on available benchmark datasets for all tasks. SanskritShala is deployed as a web-based application, which allows a user to get real-time analysis for the given input. It is built with easy-to-use interactive data annotation features that allow annotators to correct the system predictions when it makes mistakes. We publicly release the source codes of the 4 modules included in the toolkit, 7 word embedding models that have been trained on publicly available Sanskrit corpora and multiple annotated datasets such as word similarity, relatedness, categorization, analogy prediction to assess intrinsic properties of word embeddings. So far as we know, this is the first neural-based Sanskrit NLP toolkit that has a web-based interface and a number of NLP modules. We are sure that the people who are willing to work with Sanskrit will find it useful for pedagogical and annotative purposes. SanskritShala is available at: https://cnerg.iitkgp.ac.in/sanskritshala. The demo video of our platform can be accessed at: https://youtu.be/x0X31Y9k0mw4.

Submitted 29 May, 2023; v1 submitted 19 February, 2023; originally announced February 2023.

Comments: 7 pages, Accepted at ACL23 (Demo track) to be held at Toronto, Canada

arXiv:2302.09521 [pdf, other]

Rank-Minimizing and Structured Model Inference

Authors: Pawan Goyal, Benjamin Peherstorfer, Peter Benner

Abstract: While extracting information from data with machine learning plays an increasingly important role, physical laws and other first principles continue to provide critical insights about systems and processes of interest in science and engineering. This work introduces a method that infers models from data with physical insights encoded in the form of structure and that minimizes the model order so that the training data are fitted well while redundant degrees of freedom without conditions and sufficient data to fix them are automatically eliminated. The models are formulated via solution matrices of specific instances of generalized Sylvester equations that enforce interpolation of the training data and relate the model order to the rank of the solution matrices. The proposed method numerically solves the Sylvester equations for minimal-rank solutions and so obtains models of low order. Numerical experiments demonstrate that the combination of structure preservation and rank minimization leads to accurate models with orders of magnitude fewer degrees of freedom than models of comparable prediction quality that are learned with structure preservation alone.

Submitted 19 February, 2023; originally announced February 2023.

arXiv:2302.02086 [pdf, ps, other]

The Born Rule -- Axiom or Result?

Authors: Jay Lawrence, Philip Goyal

Abstract: The Born rule is part of the collapse axiom in the standard version of quantum theory, as presented by standard textbooks on the subject. We show here that its signature quadratic dependence follows from a single additional physical assumption beyond the other axioms - namely, that the probability of a particular measurement outcome (the state $φ_k$, say) is independent of the choice of observable to be measured, so long as one of its eigenstates corresponds to that outcome. We call this assumption ``observable independence.'' As a consequence, the Born rule cannot be completely eliminated from the list of axioms, but it can, in principle, be reduced to a more physical statement. Our presentation is suitable for advanced undergraduates or graduate students who have taken a standard course in quantum theory. It does not depend on any particular interpretation of the theory.

Submitted 3 February, 2023; originally announced February 2023.

Comments: Four pages

arXiv:2301.10060 [pdf, other]

Inference of Continuous Linear Systems from Data with Guaranteed Stability

Authors: Pawan Goyal, Igor Pontes Duff, Peter Benner

Abstract: Machine-learning technologies for learning dynamical systems from data play an important role in engineering design. This research focuses on learning continuous linear models from data. Stability, a key feature of dynamic systems, is especially important in design tasks such as prediction and control. Thus, there is a need to develop methodologies that provide stability guarantees. To that end, we leverage the parameterization of stable matrices proposed in [Gillis/Sharma, Automatica, 2017] to realize the desired models. Furthermore, to avoid the estimation of derivative information to learn continuous systems, we formulate the inference problem in an integral form. We also discuss a few extensions, including those related to control systems. Numerical experiments show that the combination of a stable matrix parameterization and an integral form of differential equations allows us to learn stable systems without requiring derivative information, which can be challenging to obtain in situations with noisy or limited data.

Submitted 24 January, 2023; originally announced January 2023.

arXiv:2301.09770 [pdf, other]

Language-guided Task Adaptation for Imitation Learning

Authors: Prasoon Goyal, Raymond J. Mooney, Scott Niekum

Abstract: We introduce a novel setting, wherein an agent needs to learn a task from a demonstration of a related task with the difference between the tasks communicated in natural language. The proposed setting allows reusing demonstrations from other tasks, by providing low effort language descriptions, and can also be used to provide feedback to correct agent errors, which are both important desiderata for building intelligent agents that assist humans in daily tasks. To enable progress in this proposed setting, we create two benchmarks -- Room Rearrangement and Room Navigation -- that cover a diverse set of task adaptations. Further, we propose a framework that uses a transformer-based model to reason about the entities in the tasks and their relationships, to learn a policy for the target task.

Submitted 23 January, 2023; originally announced January 2023.

arXiv:2301.09484 [pdf, other]

Dominant Subspaces of High-Fidelity Nonlinear Structured Parametric Dynamical Systems and Model Reduction

Authors: Pawan Goyal, Igor Pontes Duff, Peter Benner

Abstract: In this work, we investigate a model order reduction scheme for high-fidelity nonlinear structured parametric dynamical systems. More specifically, we consider a class of nonlinear dynamical systems whose nonlinear terms are polynomial functions, and the linear part corresponds to a linear structured model, such as second-order, time-delay, or fractional-order systems. Our approach relies on the Volterra series representation of these dynamical systems. Using this representation, we identify the kernels and, thus, the generalized multivariate transfer functions associated with these systems. Consequently, we present results allowing the construction of reduced-order models whose generalized transfer functions interpolate those of the original system at pre-defined frequency points. For efficient calculations, we also need the concept of a symmetric Kronecker product representation of a tensor and derive particular properties of them. Moreover, we propose an algorithm that extracts dominant subspaces from the prescribed interpolation conditions. This allows the construction of reduced-order models that preserve the structure. We also extend these results to parametric systems and a special case (delay in input/output). We demonstrate the efficiency of the proposed method by means of various numerical benchmarks.

Submitted 23 January, 2023; originally announced January 2023.

arXiv:2301.05852 [pdf, other]

CrysGNN : Distilling pre-trained knowledge to enhance property prediction for crystalline materials

Authors: Kishalay Das, Bidisha Samanta, Pawan Goyal, Seung-Cheol Lee, Satadeep Bhattacharjee, Niloy Ganguly

Abstract: In recent years, graph neural network (GNN) based approaches have emerged as a powerful technique to encode complex topological structure of crystal materials in an enriched representation space. These models are often supervised in nature and, using the property-specific training data, learn the relationship between crystal structure and different properties like formation energy, bandgap, bulk modulus, etc. Most of these methods require a huge amount of property-tagged data to train the system which may not be available for different properties. However, there is an availability of a huge amount of crystal data with its chemical composition and structural bonds. To leverage these untapped data, this paper presents CrysGNN, a new pre-trained GNN framework for crystalline materials, which captures both node and graph level structural information of crystal graphs using a huge amount of unlabelled material data. Further, we extract distilled knowledge from CrysGNN and inject into different state of the art property predictors to enhance their property prediction accuracy. We conduct extensive experiments to show that with distilled knowledge from the pre-trained model, all the SOTA algorithms are able to outperform their own vanilla version with good margins. We also observe that the distillation process provides a significant improvement over the conventional approach of finetuning the pre-trained model. We have released the pre-trained model along with the large dataset of 800K crystal graphs which we carefully curated, so that the pretrained model can be plugged into any existing and upcoming models to enhance their prediction accuracy.

Submitted 14 January, 2023; originally announced January 2023.

Comments: 16 pages, 5 figures

arXiv:2212.13436 [pdf, ps, other]

Almost commuting scheme of symplectic matrices and quantum Hamiltonian reduction

Authors: Pallav Goyal

Abstract: Losev introduced the scheme $X$ of almost commuting elements (i.e., elements commuting up to a rank one element) of $\mathfrak{g}=\mathfrak{sp}(V)$ for a symplectic vector space $V$ and discussed its algebro-geometric properties. We construct a Lagrangian subscheme $X^{nil}$ of $X$ and show that it is a complete intersection of dimension $\text{dim}(\mathfrak{g})+\frac{1}{2}\text{dim}(V)$ and compute its irreducible components. We also study the quantum Hamiltonian reduction of the algebra $\mathcal{D}(\mathfrak{g})$ of differential operators on the Lie algebra $\mathfrak{g}$ tensored with the Weyl algebra with respect to the action of the symplectic group, and show that it is isomorphic to the spherical subalgebra of a certain rational Cherednik algebra of Type $C$.

Submitted 27 December, 2022; originally announced December 2022.

arXiv:2211.12503 [pdf, other]

Is the Elephant Flying? Resolving Ambiguities in Text-to-Image Generative Models

Authors: Ninareh Mehrabi, Palash Goyal, Apurv Verma, Jwala Dhamala, Varun Kumar, Qian Hu, Kai-Wei Chang, Richard Zemel, Aram Galstyan, Rahul Gupta

Abstract: Natural language often contains ambiguities that can lead to misinterpretation and miscommunication. While humans can handle ambiguities effectively by asking clarifying questions and/or relying on contextual cues and common-sense knowledge, resolving ambiguities can be notoriously hard for machines. In this work, we study ambiguities that arise in text-to-image generative models. We curate a benchmark dataset covering different types of ambiguities that occur in these systems. We then propose a framework to mitigate ambiguities in the prompts given to the systems by soliciting clarifications from the user. Through automatic and human evaluations, we show the effectiveness of our framework in generating more faithful images aligned with human intention in the presence of ambiguities.

Submitted 17 November, 2022; originally announced November 2022.

arXiv:2211.00357 [pdf, other]

Generalized Quadratic Embeddings for Nonlinear Dynamics using Deep Learning

Authors: Pawan Goyal, Peter Benner

Abstract: The engineering design process often relies on mathematical modeling that can describe the underlying dynamic behavior. In this work, we present a data-driven methodology for modeling the dynamics of nonlinear systems. To simplify this task, we aim to identify a coordinate transformation that allows us to represent the dynamics of nonlinear systems using a common, simple model structure. The advantage of a common simple model is that customized design tools developed for it can be applied to study a large variety of nonlinear systems. The simplest common model one can think of is linear, but linear systems often fall short in accurately capturing the complex dynamics of nonlinear systems. In this work, we propose using quadratic systems as the common structure, inspired by the lifting principle. According to this principle, smooth nonlinear systems can be expressed as quadratic systems in suitable coordinates without approximation errors. However, finding these coordinates solely from data is challenging. Here, we leverage deep learning to identify such lifted coordinates using only data, enabling a quadratic dynamical system to describe the system's dynamics. Additionally, we discuss the asymptotic stability of these quadratic dynamical systems. We illustrate the approach using data collected from various numerical examples, demonstrating its superior performance compared with existing well-known techniques.

Submitted 4 January, 2024; v1 submitted 1 November, 2022; originally announced November 2022.

arXiv:2210.12467 [pdf, other]

ECTSum: A New Benchmark Dataset For Bullet Point Summarization of Long Earnings Call Transcripts

Authors: Rajdeep Mukherjee, Abhinav Bohra, Akash Banerjee, Soumya Sharma, Manjunath Hegde, Afreen Shaikh, Shivani Shrivastava, Koustuv Dasgupta, Niloy Ganguly, Saptarshi Ghosh, Pawan Goyal

Abstract: Despite tremendous progress in automatic summarization, state-of-the-art methods are predominantly trained to excel in summarizing short newswire articles, or documents with strong layout biases such as scientific articles or government reports. Efficient techniques to summarize financial documents, including facts and figures, have largely been unexplored, mainly due to the unavailability of suitable datasets. In this work, we present ECTSum, a new dataset with transcripts of earnings calls (ECTs), hosted by publicly traded companies, as documents, and short expert-written telegram-style bullet point summaries derived from corresponding Reuters articles. ECTs are long unstructured documents without any prescribed length limit or format. We benchmark our dataset with state-of-the-art summarizers across various metrics evaluating the content quality and factual consistency of the generated summaries. Finally, we present a simple-yet-effective approach, ECT-BPS, to generate a set of bullet points that precisely capture the important facts discussed in the calls.

Submitted 26 October, 2022; v1 submitted 22 October, 2022; originally announced October 2022.

Comments: 14 pages; Accepted as a Long Paper in EMNLP 2022 (Main Conference); Codes: https://github.com/rajdeep345/ECTSum

ACM Class: I.2.7

arXiv:2210.11753 [pdf, other]

TransLIST: A Transformer-Based Linguistically Informed Sanskrit Tokenizer

Authors: Jivnesh Sandhan, Rathin Singha, Narein Rao, Suvendu Samanta, Laxmidhar Behera, Pawan Goyal

Abstract: Sanskrit Word Segmentation (SWS) is essential in making digitized texts available and in deploying downstream tasks. It is, however, non-trivial because of the sandhi phenomenon that modifies the characters at the word boundaries, and needs special treatment. Existing lexicon driven approaches for SWS make use of Sanskrit Heritage Reader, a lexicon-driven shallow parser, to generate the complete candidate solution space, over which various methods are applied to produce the most valid solution. However, these approaches fail while encountering out-of-vocabulary tokens. On the other hand, purely engineering methods for SWS have made use of recent advances in deep learning, but cannot make use of the latent word information on availability. To mitigate the shortcomings of both families of approaches, we propose Transformer based Linguistically Informed Sanskrit Tokenizer (TransLIST) consisting of (1) a module that encodes the character input along with latent-word information, which takes into account the sandhi phenomenon specific to SWS and is apt to work with partial or no candidate solutions, (2) a novel soft-masked attention to prioritize potential candidate words and (3) a novel path ranking algorithm to rectify the corrupted predictions. Experiments on the benchmark datasets for SWS show that TransLIST outperforms the current state-of-the-art system by an average 7.2 points absolute gain in terms of perfect match (PM) metric. The codebase and datasets are publicly available at https://github.com/rsingha108/TransLIST

Submitted 21 October, 2022; originally announced October 2022.

Comments: Accepted at EMNLP22 (Findings)

arXiv:2210.07710 [pdf, ps, other]

doi 10.1016/j.ymssp.2023.110620

An Operator Inference Oriented Approach for Mechanical Systems

Authors: Yevgeniya Filanova, Igor Pontes Duff, Pawan Goyal, Peter Benner

Abstract: Model-order reduction techniques allow the construction of low-dimensional surrogate models that can accelerate engineering design processes. Often, these techniques are intrusive, meaning that they require direct access to underlying high-fidelity models. Accessing these models is laborious or may not even be possible in some cases. Therefore, there is an interest in developing non-intrusive model reduction techniques to construct low-dimensional models directly from simulated or experimental data. In this work, we focus on a recent data-driven methodology, namely operator inference, that aims at inferring the reduced operators using only trajectories of high-fidelity models. We present an extension of operator inference for mechanical systems, preserving the second-order structure. We also study a particular case in which complete information about the external forces is available. In this formulation, the reduced operators having certain properties inspired by the original system matrices are enforced by adding constraints to the optimization problem. We illustrate the presented methodology using three numerical examples.

Submitted 2 December, 2022; v1 submitted 14 October, 2022; originally announced October 2022.

arXiv:2210.07544 [pdf, other]

Legal Case Document Summarization: Extractive and Abstractive Methods and their Evaluation

Authors: Abhay Shukla, Paheli Bhattacharya, Soham Poddar, Rajdeep Mukherjee, Kripabandhu Ghosh, Pawan Goyal, Saptarshi Ghosh

Abstract: Summarization of legal case judgement documents is a challenging problem in Legal NLP. However, not many analyses exist on how different families of summarization models (e.g., extractive vs. abstractive) perform when applied to legal case documents. This question is particularly important since many recent transformer-based abstractive summarization models have restrictions on the number of input tokens, and legal documents are known to be very long. Also, it is an open question how best to evaluate legal case document summarization systems. In this paper, we carry out extensive experiments with several extractive and abstractive summarization methods (both supervised and unsupervised) over three legal summarization datasets that we have developed. Our analyses, which include evaluation by law practitioners, lead to several interesting insights on legal summarization in particular and long document summarization in general.

Submitted 14 October, 2022; originally announced October 2022.

Comments: Accepted at The 2nd Conference of the Asia-Pacific Chapter of the Association for Computational Linguistics and the 12th International Joint Conference on Natural Language Processing (AACL-IJCNLP), 2022

arXiv:2209.15412 [pdf, other]

A quadratic decoder approach to nonintrusive reduced-order modeling of nonlinear dynamical systems

Authors: Peter Benner, Pawan Goyal, Jan Heiland, Igor Pontes

Abstract: Linear projection schemes like Proper Orthogonal Decomposition can efficiently reduce the dimensions of dynamical systems but are naturally limited, e.g., for convection-dominated problems. Nonlinear approaches have been shown to outperform linear methods in terms of dimension reduction versus accuracy but, typically, come with a large computational overhead. In this work, we consider a quadratic reduction scheme which induces nonlinear structures that are well accessible to tensorized linear algebra routines. We discuss that nonintrusive approaches can be used to simultaneously reduce the complexity in the equations and propose an operator inference formulation that respects dynamics on nonlinear manifolds.

Submitted 30 September, 2022; originally announced September 2022.

MSC Class: 37N10; 68T05; 76D05; 65F22; 93A15; 93C10

arXiv:2209.10292 [pdf, other]

Fast Few shot Self-attentive Semi-supervised Political Inclination Prediction

Authors: Souvic Chakraborty, Pawan Goyal, Animesh Mukherjee

Abstract: With the rising participation of the common mass in social media, it is increasingly common now for policymakers/journalists to create online polls on social media to understand the political leanings of people in specific locations. The caveat here is that only influential people can conduct such online polls and reach out at a mass scale. Further, in such cases, the distribution of voters is not controllable and may be, in fact, biased. On the other hand, if we can interpret the publicly available data over social media to probe the political inclination of users, we will be able to have controllable insights about the survey population, keep the cost of survey low and also collect publicly available data without involving the concerned persons. Hence we introduce a self-attentive semi-supervised framework for political inclination detection to further that objective. The advantage of our model is that it neither needs huge training data nor does it need to store social network parameters. Nevertheless, it achieves an accuracy of 93.7% with no annotated data; further, with only a few annotated examples per class it achieves competitive performance. We found that the model is highly efficient even in resource-constrained settings, and insights drawn from its predictions match the manual survey outcomes when applied to diverse real-life scenarios.

Submitted 22 September, 2022; v1 submitted 21 September, 2022; originally announced September 2022.

Comments: Accepted to ICADL'22

ACM Class: K.4.1; I.2.7; I.2.6

arXiv:2209.06049 [pdf, other]

Pre-trained Language Models for the Legal Domain: A Case Study on Indian Law

Authors: Shounak Paul, Arpan Mandal, Pawan Goyal, Saptarshi Ghosh

Abstract: NLP in the legal domain has seen increasing success with the emergence of Transformer-based Pre-trained Language Models (PLMs) pre-trained on legal text. PLMs trained over European and US legal text are available publicly; however, legal text from other domains (countries), such as India, has a lot of distinguishing characteristics. With the rapidly increasing volume of Legal NLP applications in various countries, it has become necessary to pre-train such LMs over legal text of other countries as well. In this work, we attempt to investigate pre-training in the Indian legal domain. We re-train (continue pre-training) two popular legal PLMs, LegalBERT and CaseLawBERT, on Indian legal data, as well as train a model from scratch with a vocabulary based on Indian legal text. We apply these PLMs over three benchmark legal NLP tasks -- Legal Statute Identification from facts, Semantic Segmentation of Court Judgment Documents, and Court Appeal Judgment Prediction -- over both Indian and non-Indian (EU, UK) datasets. We observe that our approach not only enhances performance on the new domain (Indian texts) but also over the original domain (European and UK texts). We also conduct explainability experiments for a qualitative comparison of all these different PLMs.

Submitted 15 May, 2023; v1 submitted 13 September, 2022; originally announced September 2022.

Comments: To be published in the 19th International Conference on Artificial Intelligence and Law - ICAIL 2023

arXiv:2208.13626 [pdf, other]

CH-MARL: A Multimodal Benchmark for Cooperative, Heterogeneous Multi-Agent Reinforcement Learning

Authors: Vasu Sharma, Prasoon Goyal, Kaixiang Lin, Govind Thattai, Qiaozi Gao, Gaurav S. Sukhatme

Abstract: We propose a multimodal (vision-and-language) benchmark for cooperative and heterogeneous multi-agent learning. We introduce a benchmark multimodal dataset with tasks involving collaboration between multiple simulated heterogeneous robots in a rich multi-room home environment. We provide an integrated learning framework, multimodal implementations of state-of-the-art multi-agent reinforcement learning techniques, and a consistent evaluation protocol. Our experiments investigate the impact of different modalities on multi-agent learning performance. We also introduce a simple message passing method between agents. The results suggest that multimodality introduces unique challenges for cooperative multi-agent learning and there is significant room for advancing multi-agent reinforcement learning methods in such settings.

Submitted 25 August, 2022; originally announced August 2022.

arXiv:2208.13490 [pdf, other]

An artificial neural network for surrogate modeling of stress fields in viscoplastic polycrystalline materials

Authors: Mohammad S. Khorrami, Jaber R. Mianroodi, Nima H. Siboni, Pawan Goyal, Bob Svendsen, Peter Benner, Dierk Raabe

Abstract: The purpose of this work is the development of an artificial neural network (ANN) for surrogate modeling of the mechanical response of viscoplastic grain microstructures. To this end, a U-Net-based convolutional neural network (CNN) is trained to account for the history dependence of the material behavior. The training data take the form of numerical simulation results for the von Mises stress field under quasi-static tensile loading. The trained CNN (tCNN) can accurately reproduce both the average response as well as the local von Mises stress field. The tCNN calculates the von Mises stress field of grain microstructures not included in the training dataset about 500 times faster than its calculation based on the numerical solution with a spectral solver of the corresponding initial-boundary-value problem. The tCNN is also successfully applied to other types of microstructure morphologies (e.g., matrix-inclusion type topologies) and loading levels not contained in the training dataset.

Submitted 29 August, 2022; originally announced August 2022.

arXiv:2208.10310 [pdf, other]

A Novel Multi-Task Learning Approach for Context-Sensitive Compound Type Identification in Sanskrit

Authors: Jivnesh Sandhan, Ashish Gupta, Hrishikesh Terdalkar, Tushar Sandhan, Suvendu Samanta, Laxmidhar Behera, Pawan Goyal

Abstract: The phenomenon of compounding is ubiquitous in Sanskrit. It serves for achieving brevity in expressing thoughts, while simultaneously enriching the lexical and structural formation of the language. In this work, we focus on the Sanskrit Compound Type Identification (SaCTI) task, where we consider the problem of identifying semantic relations between the components of a compound word. Earlier approaches solely rely on the lexical information obtained from the components and ignore the most crucial contextual and syntactic information useful for SaCTI. However, the SaCTI task is challenging primarily due to the implicitly encoded context-sensitive semantic relation between the compound components. Thus, we propose a novel multi-task learning architecture which incorporates the contextual information and enriches the complementary syntactic information using morphological tagging and dependency parsing as two auxiliary tasks. Experiments on the benchmark datasets for SaCTI show 6.1 points (Accuracy) and 7.7 points (F1-score) absolute gain compared to the state-of-the-art system. Further, our multi-lingual experiments demonstrate the efficacy of the proposed architecture in English and Marathi languages. The code and datasets are publicly available at https://github.com/ashishgupta2598/SaCTI

Submitted 11 September, 2022; v1 submitted 22 August, 2022; originally announced August 2022.

Comments: The work is accepted at COLING22, Gyeongju, Republic of Korea

arXiv:2208.07130 [pdf, other]

Exploring Generative Models for Joint Attribute Value Extraction from Product Titles

Authors: Kalyani Roy, Tapas Nayak, Pawan Goyal

Abstract: Attribute values of the products are an essential component in any e-commerce platform. Attribute Value Extraction (AVE) deals with extracting the attributes of a product and their values from its title or description. In this paper, we propose to tackle the AVE task using generative frameworks. We present two types of generative paradigms, namely, word sequence-based and positional sequence-based, by formulating the AVE task as a generation problem. We conduct experiments on two datasets where the generative approaches achieve new state-of-the-art results. This shows that we can use the proposed framework for AVE tasks without additional tagging or task-specific model design.

Submitted 15 August, 2022; originally announced August 2022.

Comments: 6 pages

arXiv:2205.10558 [pdf, other]

CORAL: Contextual Response Retrievability Loss Function for Training Dialog Generation Models

Authors: Bishal Santra, Ravi Ghadia, Manish Gupta, Pawan Goyal

Abstract: In the field of Natural Language Processing, there are many tasks that can be tackled effectively using the cross-entropy (CE) loss function. However, the task of dialog generation poses unique challenges for CE loss. This is because CE loss assumes that, for any given input, the only possible output is the one available as the ground truth in the training dataset. But, in dialog generation, there can be multiple valid responses (for a given context) that not only have different surface forms but can also be semantically different. Furthermore, CE loss computation for the dialog generation task does not take the input context into consideration and, hence, it grades the response irrespective of the context. To grade the generated response for qualities like relevance, engagingness, etc., the loss function should depend on both the context and the generated response. To address these limitations, this paper proposes CORAL, a novel loss function based on a reinforcement learning (RL) view of the dialog generation task with a reward function that estimates human preference for generated responses while considering both the context and the response. Furthermore, to overcome challenges such as high sample complexity of RL training and a large action space, we propose a mix-policy training algorithm. Notably, using CORAL we can train dialog generation models without assuming the ground-truth as the only correct response. Extensive comparisons on benchmark datasets demonstrate that CORAL based models outperform strong state-of-the-art baseline models of different sizes.

Submitted 20 May, 2023; v1 submitted 21 May, 2022; originally announced May 2022.

Comments: 15 pages, 3 figures. TLDR: CORAL proposes a novel loss function for dialog generation, incorporating context and multiple valid responses. It outperforms existing models by optimizing human preference through reinforcement learning

arXiv:2205.09479 [pdf, other]

Neural ODEs with Irregular and Noisy Data

Authors: Pawan Goyal, Peter Benner

Abstract: Measurement noise is an integral part of collecting data from a physical process. Thus, noise removal is necessary to draw conclusions from these data, and it often becomes essential to construct dynamical models using these data. We discuss a methodology to learn differential equation(s) using noisy and irregularly sampled measurements. In our methodology, the main innovation can be seen in the integration of deep neural networks with the neural ordinary differential equations (ODEs) approach. Precisely, we aim at learning a neural network that provides (approximately) an implicit representation of the data and an additional neural network that models the vector fields of the dependent variables. We combine these two networks by constraining using neural ODEs. The proposed framework to learn a model describing the vector field is highly effective under noisy measurements. The approach can handle scenarios where dependent variables are not available at the same temporal grid. Moreover, a particular structure, e.g., second-order with respect to time, can easily be incorporated. We demonstrate the effectiveness of the proposed method for learning models using data obtained from various differential equations and present a comparison with the neural ODE method that does not apply any special treatment to noise.

Submitted 19 May, 2022; originally announced May 2022.

Comments: arXiv admin note: substantial text overlap with arXiv:2109.11446

arXiv:2205.02005 [pdf, other]

A Framework to Generate High-Quality Datapoints for Multiple Novel Intent Detection

Authors: Ankan Mullick, Sukannya Purkayastha, Pawan Goyal, Niloy Ganguly

Abstract: Systems like Voice-command based conversational agents are characterized by a pre-defined set of skills or intents to perform user specified tasks. In the course of time, newer intents may emerge requiring retraining. However, the newer intents may not be explicitly announced and need to be inferred dynamically. Thus, there are two important tasks at hand: (a) identifying emerging new intents, and (b) annotating data of the new intents so that the underlying classifier can be retrained efficiently. The tasks become especially challenging when a large number of new intents emerge simultaneously and there is a limited budget of manual annotation. In this paper, we propose MNID (Multiple Novel Intent Detection) which is a cluster based framework to detect multiple novel intents with budgeted human annotation cost. Empirical results on various benchmark datasets (of different sizes) demonstrate that MNID, by intelligently using the budget for annotation, outperforms the baseline methods in terms of accuracy and F1-score.

Submitted 4 May, 2022; originally announced May 2022.

Comments: Accepted as Full Paper at Findings of NAACL, 2022

arXiv:2205.01234 [pdf, other]

Scalable Tail Latency Estimation for Data Center Networks

Authors: Kevin Zhao, Prateesh Goyal, Mohammad Alizadeh, Thomas E. Anderson

Abstract: In this paper, we consider how to provide fast estimates of flow-level tail latency performance for very large scale data center networks. Network tail latency is often a crucial metric for cloud application performance that can be affected by a wide variety of factors, including network load, inter-rack traffic skew, traffic burstiness, flow size distributions, oversubscription, and topology asymmetry. Network simulators such as ns-3 and OMNeT++ can provide accurate answers, but are very hard to parallelize, taking hours or days to answer what-if questions for a single configuration at even moderate scale. Recent work with MimicNet has shown how to use machine learning to improve simulation performance, but at a cost of including a long training step per configuration, and with assumptions about workload and topology uniformity that typically do not hold in practice. We address this gap by developing a set of techniques to provide fast performance estimates for large scale networks with general traffic matrices and topologies. A key step is to decompose the problem into a large number of parallel independent single-link simulations; we carefully combine these link-level simulations to produce accurate estimates of end-to-end flow level performance distributions for the entire network. Like MimicNet, we exploit symmetry where possible to gain additional speedups, but without relying on machine learning, so there is no training delay. On large-scale networks where ns-3 takes 11 to 27 hours to simulate five seconds of network behavior, our techniques run in one to two minutes with 99th percentile accuracy within 9% for flow completion times.

Submitted 30 September, 2022; v1 submitted 2 May, 2022; originally announced May 2022.

arXiv:2204.08167 [pdf, other]

A Study on Prompt-based Few-Shot Learning Methods for Belief State Tracking in Task-oriented Dialog Systems

Authors: Debjoy Saha, Bishal Santra, Pawan Goyal

Abstract: We tackle the Dialogue Belief State Tracking (DST) problem of task-oriented conversational systems. Recent approaches to this problem leveraging Transformer-based models have yielded great results. However, training these models is expensive, both in terms of computational resources and time. Additionally, collecting high quality annotated dialogue datasets remains a challenge for researchers because of the extensive annotation required for training these models. Driven by the recent success of pre-trained language models and prompt-based learning, we explore prompt-based few-shot learning for Dialogue Belief State Tracking. We formulate the DST problem as a 2-stage prompt-based language modelling task and train language models for both tasks and present a comprehensive empirical analysis of their separate and joint performance. We demonstrate the potential of prompt-based methods in few-shot learning for DST and provide directions for future improvement.

Submitted 18 April, 2022; originally announced April 2022.

Comments: 9 pages, 12 figures

arXiv:2204.05674 [pdf, other]

A Generative Approach for Financial Causality Extraction

Authors: Tapas Nayak, Soumya Sharma, Yash Butala, Koustuv Dasgupta, Pawan Goyal, Niloy Ganguly

Abstract: Causality represents the foremost relation between events in financial documents such as financial news articles and financial reports. Each financial causality contains a cause span and an effect span. Previous works proposed sequence labeling approaches to solve this task. But sequence labeling models find it difficult to extract multiple causalities and overlapping causalities from the text segments. In this paper, we explore a generative approach for causality extraction using the encoder-decoder framework and pointer networks. We use a causality dataset from the financial domain, FinCausal, for our experiments and our proposed framework achieves very competitive performance on this dataset.

Submitted 12 April, 2022; originally announced April 2022.

Comments: Accepted at FinWeb 2022 workshop of WWW 2022

arXiv:2203.14267 [pdf, other]

bitsa_nlp@LT-EDI-ACL2022: Leveraging Pretrained Language Models for Detecting Homophobia and Transphobia in Social Media Comments

Authors: Vitthal Bhandari, Poonam Goyal

Abstract: Online social networks are ubiquitous and user-friendly. Nevertheless, it is vital to detect and moderate offensive content to maintain decency and empathy. However, mining social media texts is a complex task since users don't adhere to any fixed patterns. Comments can be written in any combination of languages and many of them may be low-resource. In this paper, we present our system for the LT-EDI shared task on detecting homophobia and transphobia in social media comments. We experiment with a number of monolingual and multilingual transformer based models such as mBERT along with a data augmentation technique for tackling class imbalance. Such pretrained large models have recently shown tremendous success on a variety of benchmark tasks in natural language processing. We observe their performance on a carefully annotated, real life dataset of YouTube comments in English as well as Tamil. Our submission achieved ranks 9, 6 and 3 with a macro-averaged F1-score of 0.42, 0.64 and 0.58 in the English, Tamil and Tamil-English subtasks respectively. The code for the system has been open sourced.

Submitted 9 April, 2022; v1 submitted 27 March, 2022; originally announced March 2022.

Comments: 6 pages, Accepted at LT-EDI workshop ACL 2022. Camera ready version. Addressed all reviewer comments. Added Baseline methods and Ablation study

arXiv:2202.10261 [pdf, other]

A Self-Supervised Descriptor for Image Copy Detection

Authors: Ed Pizzi, Sreya Dutta Roy, Sugosh Nagavara Ravindra, Priya Goyal, Matthijs Douze

Abstract: Image copy detection is an important task for content moderation. We introduce SSCD, a model that builds on a recent self-supervised contrastive training objective. We adapt this method to the copy detection task by changing the architecture and training objective, including a pooling operator from the instance matching literature, and adapting contrastive learning to augmentations that combine images. Our approach relies on an entropy regularization term, promoting consistent separation between descriptor vectors, and we demonstrate that this significantly improves copy detection accuracy. Our method produces a compact descriptor vector, suitable for real-world web scale applications. Statistical information from a background image distribution can be incorporated into the descriptor. On the recent DISC2021 benchmark, SSCD is shown to outperform both baseline copy detection models and self-supervised architectures designed for image classification by huge margins, in all settings. For example, SSCD out-performs SimCLR descriptors by 48% absolute. Code is available at https://github.com/facebookresearch/sscd-copy-detection.

Submitted 25 March, 2022; v1 submitted 21 February, 2022; originally announced February 2022.

arXiv:2202.08360 [pdf, other]

Vision Models Are More Robust And Fair When Pretrained On Uncurated Images Without Supervision

Authors: Priya Goyal, Quentin Duval, Isaac Seessel, Mathilde Caron, Ishan Misra, Levent Sagun, Armand Joulin, Piotr Bojanowski

Abstract: Discriminative self-supervised learning allows training models on any random group of internet images, and possibly recovering salient information that helps differentiate between the images. Applied to ImageNet, this leads to object centric features that perform on par with supervised features on most object-centric downstream tasks. In this work, we ask whether, using this ability, we can learn any salient and more representative information present in a diverse unbounded set of images from across the globe. To do so, we train models on billions of random images without any data pre-processing or prior assumptions about what we want the model to learn. We scale our model size to dense 10 billion parameters to avoid underfitting on a large data size. We extensively study and validate our model performance on over 50 benchmarks including fairness, robustness to distribution shift, geographical diversity, fine grained recognition, image copy detection and many image classification datasets. The resulting model not only captures semantic information well, it also captures information about artistic style and learns salient information such as geolocations and multilingual word embeddings based on visual content only. More importantly, we discover that such a model is more robust, more fair, less harmful and less biased than supervised models or models trained on object centric datasets such as ImageNet.

Submitted 22 February, 2022; v1 submitted 16 February, 2022; originally announced February 2022.

arXiv:2202.07603 [pdf, other]

Fairness Indicators for Systematic Assessments of Visual Feature Extractors

Authors: Priya Goyal, Adriana Romero Soriano, Caner Hazirbas, Levent Sagun, Nicolas Usunier

Abstract: Does everyone equally benefit from computer vision systems? Answers to this question become more and more important as computer vision systems are deployed at large scale, and can spark major concerns when they exhibit vast performance discrepancies between people from various demographic and social backgrounds. Systematic diagnosis of fairness, harms, and biases of computer vision systems is an important step towards building socially responsible systems. To initiate an effort towards standardized fairness audits, we propose three fairness indicators, which aim at quantifying harms and biases of visual systems. Our indicators use existing publicly available datasets collected for fairness evaluations, and focus on three main types of harms and bias identified in the literature, namely harmful label associations, disparity in learned representations of social and demographic traits, and biased performance on geographically diverse images from across the world. We define precise experimental protocols applicable to a wide range of computer vision models. These indicators are part of an ever-evolving suite of fairness probes and are not intended to be a substitute for a thorough analysis of the broader impact of the new computer vision technologies. Yet, we believe it is a necessary first step towards (1) facilitating the widespread adoption and mandate of the fairness assessments in computer vision research, and (2) tracking progress towards building socially responsible models. To study the practical effectiveness and broad applicability of our proposed indicators to any visual system, we apply them to off-the-shelf models built using widely adopted model training paradigms, which vary in whether they can predict labels on a given image or only produce the embeddings. We also systematically study the effect of data domain and model size.

Submitted 15 February, 2022; originally announced February 2022.

arXiv:2202.04321 [pdf, other]

Optimal Congestion Control for Time-varying Wireless Links

Authors: Prateesh Goyal, Mohammad Alizadeh, Thomas E. Anderson

Abstract: Modern networks exhibit a high degree of variability in link rates. Cellular network bandwidth inherently varies with receiver motion and orientation, while class-based packet scheduling in datacenter and service provider networks induces high variability in available capacity for network tenants. Recent work has proposed numerous congestion control protocols to cope with this variability, offering different tradeoffs between link utilization and queuing delay. In this paper, we develop a formal model of congestion control over time-varying links, and we use this model to derive a bound on the performance of any congestion control protocol running over a time-varying link with a given distribution of rate variation. Using the insights from this analysis, we derive an optimal control law that offers a smooth tradeoff between link utilization and queuing delay. We compare the performance of this control law to several existing control algorithms on cellular link traces to show that there is significant room for optimization.

Submitted 9 February, 2022; originally announced February 2022.

arXiv:2201.11391 [pdf, other]

Prabhupadavani: A Code-mixed Speech Translation Data for 25 Languages

Authors: Jivnesh Sandhan, Ayush Daksh, Om Adideva Paranjay, Laxmidhar Behera, Pawan Goyal

Abstract: Nowadays, the interest in code-mixing has become ubiquitous in Natural Language Processing (NLP); however, not much attention has been given to addressing this phenomenon for the Speech Translation (ST) task. This can be solely attributed to the lack of labelled data for the code-mixed ST task. Thus, we introduce Prabhupadavani, which is a multilingual code-mixed ST dataset for 25 languages. It is multi-domain, covers ten language families, and contains 94 hours of speech by 130+ speakers, manually aligned with corresponding text in the target language. Prabhupadavani is about Vedic culture and heritage from Indic literature, where code-switching in the case of quotation from literature is important in the context of humanities teaching. To the best of our knowledge, Prabhupadavani is the first multilingual code-mixed ST dataset available in the ST literature. This data can also be used for a code-mixed machine translation task. The full dataset can be accessed at https://github.com/frozentoad9/CMST.

Submitted 4 September, 2022; v1 submitted 27 January, 2022; originally announced January 2022.

Comments: The work is accepted at COLING22-SIGHUM Workshop on Computational Linguistics for Cultural Heritage, Social Sciences, Humanities and Literature

arXiv:2201.11374 [pdf, other]

Systematic Investigation of Strategies Tailored for Low-Resource Settings for Low-Resource Dependency Parsing

Authors: Jivnesh Sandhan, Laxmidhar Behera, Pawan Goyal

Abstract: In this work, we focus on low-resource dependency parsing for multiple languages. Several strategies are tailored to enhance performance in low-resource scenarios. While these are well-known to the community, it is not trivial to select the best-performing combination of these strategies for a low-resource language that we are interested in, and not much attention has been given to measuring the efficacy of these strategies. We experiment with 5 low-resource strategies for our ensembled approach on 7 Universal Dependency (UD) low-resource languages. Our exhaustive experimentation on these languages supports the effective improvements for languages not covered in pretrained models. We show a successful application of the ensembled system on a truly low-resource language Sanskrit. The code and data are available at: https://github.com/Jivnesh/SanDP

Submitted 29 January, 2023; v1 submitted 27 January, 2022; originally announced January 2022.

Comments: Accepted at EACL 2023, to be held in Croatia, Europe

arXiv:2201.09027 [pdf]

Effect of Magnetic Field on the Damping Behavior of a Ferrofluid based Damper

Authors: Durga N K P Rao Miriyala, P S Goyal

Abstract: This paper is an extension of our earlier work where we had reported a proof of concept for a ferrofluid based damper. The damper used ferrofluid as damping medium and it was seen that damping efficiency of the damper changes on application of magnetic field. The present paper deals with a systematic study of the effect of magnetic field on the damping efficiency of the damper. Results of these studies are reported. It is seen that damping ratio varies linearly with magnetic field (ζ / H = 0.028 per kG) for magnetic field in range of 0.0 to 4.5 kG. It may be mentioned that ferrofluid is different from magnetorheological fluid even though both of them are magnetic field-responsive fluids. The ferrofluid-dampers are better suited than MR Fluid-dampers for their use in automobiles.

Submitted 22 January, 2022; originally announced January 2022.

Comments: Conference on Technologies for Future Cities

arXiv:2201.04957 [pdf]

Dielectric Properties of Polysulfone Carbon Nanotube Composite Membranes

Authors: Bhakti Hirani, P. S. Goyal, Deepali Shrivastava, S. K. Deshpande

Abstract: Polymeric membranes, including Polysulfone (PSf) membranes, are routinely used for water treatment. To enhance water permeation of above membranes, it is common to synthesize polymeric membranes with carbon nanotubes (CNTs) embedded in them. It is seen that water permeability of membranes having vertically aligned CNTs is higher, as compared to those where CNTs are not aligned. It is of interest to examine if the dielectric constant of a CNT based nanocomposite membrane is sensitive to alignment of CNTs or not. This paper reports dielectric properties of PSf-MWCNT membranes, both, for aligned and unaligned MWCNTs. Multi Walled Carbon Nanotubes (MWCNTs) based polysulfone membranes were synthesized using standard methods. MWCNTs in above membranes were aligned by casting the membrane in presence of magnetic field. The present paper, for the first time, shows that the above result is valid for membranes also.

Submitted 6 January, 2022; originally announced January 2022.

Comments: Conference on Technologies for Future Cities 2021

arXiv:2112.14731 [pdf, other]

LeSICiN: A Heterogeneous Graph-based Approach for Automatic Legal Statute Identification from Indian Legal Documents

Authors: Shounak Paul, Pawan Goyal, Saptarshi Ghosh

Abstract: The task of Legal Statute Identification (LSI) aims to identify the legal statutes that are relevant to a given description of Facts or evidence of a legal case. Existing methods only utilize the textual content of Facts and legal articles to guide such a task. However, the citation network among case documents and legal statutes is a rich source of additional information, which is not considered by existing models. In this work, we take the first step towards utilising both the text and the legal citation network for the LSI task. We curate a large novel dataset for this task, including Facts of cases from several major Indian Courts of Law, and statutes from the Indian Penal Code (IPC). Modeling the statutes and training documents as a heterogeneous graph, our proposed model LeSICiN can learn rich textual and graphical features, and can also tune itself to correlate these features. Thereafter, the model can be used to inductively predict links between test documents (new nodes whose graphical features are not available to the model) and statutes (existing nodes). Extensive experiments on the dataset show that our model comfortably outperforms several state-of-the-art baselines, by exploiting the graphical structure along with textual features. The dataset and our codes are available at https://github.com/Law-AI/LeSICiN.

Submitted 29 December, 2021; originally announced December 2021.

Comments: This paper has been accepted at the Main Track of the AAAI Conference on Artificial Intelligence (AAAI) 2022. Dataset and codes are available at https://github.com/Law-AI/LeSICiN

ACM Class: I.2.1; I.2.7

arXiv:2112.05798 [pdf, other]

doi 10.1145/3488560.3498536

MTLTS: A Multi-Task Framework To Obtain Trustworthy Summaries From Crisis-Related Microblogs

Authors: Rajdeep Mukherjee, Uppada Vishnu, Hari Chandana Peruri, Sourangshu Bhattacharya, Koustav Rudra, Pawan Goyal, Niloy Ganguly

Abstract: Occurrences of catastrophes such as natural or man-made disasters trigger the spread of rumours over social media at a rapid pace. Presenting a trustworthy and summarized account of the unfolding event in near real-time to the consumers of such potentially unreliable information thus becomes an important task. In this work, we propose MTLTS, the first end-to-end solution for the task that jointly determines the credibility and summary-worthiness of tweets. Our credibility verifier is designed to recursively learn the structural properties of a Twitter conversation cascade, along with the stances of replies towards the source tweet. We then take a hierarchical multi-task learning approach, where the verifier is trained at a lower layer, and the summarizer is trained at a deeper layer where it utilizes the verifier predictions to determine the salience of a tweet. Different from existing disaster-specific summarizers, we model tweet summarization as a supervised task. Such an approach can automatically learn summary-worthy features, and can therefore generalize well across domains. When trained on the PHEME dataset [29], not only do we outperform the strongest baselines for the auxiliary task of verification/rumour detection, we also achieve 21 - 35% gains in the verified ratio of summary tweets, and 16 - 20% gains in ROUGE1-F1 scores over the existing state-of-the-art solutions for the primary task of trustworthy summarization.

Submitted 10 December, 2021; originally announced December 2021.

Comments: Accepted as a Full Paper at WSDM 2022; 9 pages; Codes: https://github.com/rajdeep345/MTLTS

ACM Class: H.3.3

arXiv:2112.05787 [pdf, other]

Representation Learning for Conversational Data using Discourse Mutual Information Maximization

Authors: Bishal Santra, Sumegh Roychowdhury, Aishik Mandal, Vasu Gurram, Atharva Naik, Manish Gupta, Pawan Goyal

Abstract: Although many pretrained models exist for text or images, there have been relatively fewer attempts to train representations specifically for dialog understanding. Prior works usually relied on finetuned representations based on generic text representation models like BERT or GPT-2. But such language modeling pretraining objectives do not take the structural information of conversational text into consideration. Although generative dialog models can learn structural features too, we argue that the structure-unaware word-by-word generation is not suitable for effective conversation modeling. We empirically demonstrate that such representations do not perform consistently across various dialog understanding tasks. Hence, we propose a structure-aware Mutual Information based loss-function DMI (Discourse Mutual Information) for training dialog-representation models, that additionally captures the inherent uncertainty in response prediction. Extensive evaluation on nine diverse dialog modeling tasks shows that our proposed DMI-based models outperform strong baselines by significant margins.

Submitted 3 May, 2022; v1 submitted 4 December, 2021; originally announced December 2021.

Comments: Preprint, 15 pages, To appear in NAACL 2022 (Main)

arXiv:2111.12995 [pdf, other]

Learning Low-Dimensional Quadratic-Embeddings of High-Fidelity Nonlinear Dynamics using Deep Learning

Authors: Pawan Goyal, Peter Benner

Abstract: Learning dynamical models from data plays a vital role in engineering design, optimization, and predictions. Building models describing dynamics of complex processes (e.g., weather dynamics, or reactive flows) using empirical knowledge or first principles are onerous or infeasible. Moreover, these models are high-dimensional but spatially correlated. It is, however, observed that the dynamics of high-fidelity models often evolve in low-dimensional manifolds. Furthermore, it is also known that for sufficiently smooth vector fields defining the nonlinear dynamics, a quadratic model can describe it accurately in an appropriate coordinate system, conferring to the McCormick relaxation idea in nonconvex optimization. Here, we aim at finding a low-dimensional embedding of high-fidelity dynamical data, ensuring a simple quadratic model to explain its dynamics. To that aim, this work leverages deep learning to identify low-dimensional quadratic embeddings for high-fidelity dynamical systems. Precisely, we identify the embedding of data using an autoencoder to have the desired property of the embedding. We also embed a Runge-Kutta method to avoid the time-derivative computations, which is often a challenge. We illustrate the ability of the approach by a couple of examples, arising in describing flow dynamics and the oscillatory tubular reactor model.

Submitted 25 November, 2021; originally announced November 2021.

arXiv:2111.01074 [pdf, ps, other]

doi 10.1109/COMSNETS53615.2022.9668376

FedFm: Towards a Robust Federated Learning Approach For Fault Mitigation at the Edge Nodes

Authors: Manupriya Gupta, Pavas Goyal, Rohit Verma, Rajeev Shorey, Huzur Saran

Abstract: Federated Learning deviates from the norm of "send data to model" to "send model to data". When used in an edge ecosystem, numerous heterogeneous edge devices collecting data through different means and connected through different network channels get involved in the training process. Failure of edge devices in such an ecosystem due to device fault or network issues is highly likely. In this paper, we first analyse the impact of the number of edge devices on an FL model and provide a strategy to select an optimal number of devices that would contribute to the model. We observe how the edge ecosystem behaves when the selected devices fail and provide a mitigation strategy to ensure a robust Federated Learning technique.

Submitted 1 November, 2021; originally announced November 2021.

arXiv:2110.14183 [pdf, other]

(Im)balance in the Representation of News? An Extensive Study on a Decade Long Dataset from India

Authors: Souvic Chakraborty, Pawan Goyal, Animesh Mukherjee

Abstract: (Im)balance in the representation of news has always been a topic of debate in political circles. The concept of balance has often been discussed and studied in the context of the social responsibility theory and the prestige press in the USA. While various qualitative, as well as quantitative measures of balance, have been suggested in the literature, a comprehensive analysis of all these measures across a large dataset of the post-truth era comprising different popular news media houses and over a sufficiently long temporal scale in a non-US democratic setting is lacking. We use this concept of balance to measure and understand the evolution of imbalance in Indian media on various journalistic metrics on a month-by-month basis. For this study, we amass a huge dataset of over four million political articles from India for 9+ years and analyze the extent and quality of coverage given to issues and political parties in the context of contemporary influential events for three leading newspapers. We use several state-of-the-art NLP tools to effectively understand political polarization (if any) manifesting in these articles over time. We find that two out of the three news outlets are more strongly clustered in their imbalance metrics. We also observe that only a few locations are extensively covered across all the news outlets and the situation is only slightly getting better for one of the three news outlets.
Cloze tests show that the changing landscape of events is reflected in all the news outlets, with border and terrorism issues dominating around 2010, while economic aspects like unemployment, GST, demonetization, etc. became more dominant in the period 2014–2018. Further, cloze tests clearly portray the changing popularity profile of the political parties over time.

Submitted 27 October, 2021; originally announced October 2021.

Comments: 14 pages, submitted to IEEE TCSS

Journal ref: International Conference on Social Informatics, SocInfo, 2022

arXiv:2110.13992 [pdf, other]

Leveraging Local Temporal Information for Multimodal Scene Classification

Authors: Saurabh Sahu, Palash Goyal

Abstract: Robust video scene classification models should capture the spatial (pixel-wise) and temporal (frame-wise) characteristics of a video effectively. Transformer models with self-attention, which are designed to get contextualized representations for individual tokens given a sequence of tokens, are becoming increasingly popular in many computer vision tasks. However, the use of Transformer based models for video understanding is still relatively unexplored. Moreover, these models fail to exploit the strong temporal relationships between the neighboring video frames to get potent frame-level representations. In this paper, we propose a novel self-attention block that leverages both local and global temporal relationships between the video frames to obtain better contextualized representations for the individual frames. This enables the model to understand the video at various granularities. We illustrate the performance of our models on the large scale YouTube-8M data set on the task of video categorization and further analyze the results to showcase improvement.

Submitted 26 October, 2021; originally announced October 2021.

arXiv:2110.13950 [pdf, other]

Can't Fool Me: Adversarially Robust Transformer for Video Understanding

Authors: Divya Choudhary, Palash Goyal, Saurabh Sahu

Abstract: Deep neural networks have been shown to perform poorly on adversarial examples. To address this, several techniques have been proposed to increase robustness of a model for image classification tasks. However, in video understanding tasks, developing adversarially robust models is still unexplored. In this paper, we aim to bridge this gap. We first show that simple extensions of image based adversarially robust models slightly improve the worst-case performance. Further, we propose a temporal attention regularization scheme in Transformer to improve the robustness of attention modules to adversarial examples. We illustrate using a large-scale video data set YouTube-8M that the final model (A-ART) achieves close to non-adversarial performance on its adversarial example set. We achieve 91% GAP on adversarial examples, whereas baseline Transformer and simple adversarial extensions achieve 72.9% and 82% respectively, showing significant improvement in robustness over the state-of-the-art.

Submitted 26 October, 2021; originally announced October 2021.

Comments: arXiv admin note: substantial text overlap with arXiv:2103.10043

arXiv:2110.04794 [pdf, other]

PASTE: A Tagging-Free Decoding Framework Using Pointer Networks for Aspect Sentiment Triplet Extraction

Authors: Rajdeep Mukherjee, Tapas Nayak, Yash Butala, Sourangshu Bhattacharya, Pawan Goyal

Abstract: Aspect Sentiment Triplet Extraction (ASTE) deals with extracting opinion triplets, consisting of an opinion target or aspect, its associated sentiment, and the corresponding opinion term/span explaining the rationale behind the sentiment. Existing research efforts are majorly tagging-based. Among the methods taking a sequence tagging approach, some fail to capture the strong interdependence between the three opinion factors, whereas others fall short of identifying triplets with overlapping aspect/opinion spans. A recent grid tagging approach on the other hand fails to capture the span-level semantics while predicting the sentiment between an aspect-opinion pair. Different from these, we present a tagging-free solution for the task, while addressing the limitations of the existing works. We adapt an encoder-decoder architecture with a Pointer Network-based decoding framework that generates an entire opinion triplet at each time step thereby making our solution end-to-end. Interactions between the aspects and opinions are effectively captured by the decoder by considering their entire detected spans while predicting their connecting sentiment. Extensive experiments on several benchmark datasets establish the better efficacy of our proposed approach, especially in the recall, and in predicting multiple and aspect/opinion-overlapped triplets from the same review sentence. We report our results both with and without BERT and also demonstrate the utility of domain-specific BERT post-training for the task.

Submitted 10 October, 2021; originally announced October 2021.

Comments: Accepted as a Long Paper at EMNLP 2021 (Main Conference); 13 pages; Codes: https://github.com/rajdeep345/PASTE

ACM Class: I.2.7

arXiv:2109.11446 [pdf, other]

Learning Dynamics from Noisy Measurements using Deep Learning with a Runge-Kutta Constraint

Authors: Pawan Goyal, Peter Benner

Abstract: Measurement noise is an integral part while collecting data of a physical process. Thus, noise removal is a necessary step to draw conclusions from these data, and it often becomes quite essential to construct dynamical models using these data. We discuss a methodology to learn differential equation(s) using noisy and sparsely sampled measurements. In our methodology, the main innovation can be seen in the integration of deep neural networks with a classical numerical integration method. Precisely, we aim at learning a neural network that implicitly represents the data and an additional neural network that models the vector fields of the dependent variables. We combine these two networks by enforcing the constraint that the data at the next time-steps can be given by following a numerical integration scheme such as the fourth-order Runge-Kutta scheme. The proposed framework to learn a model predicting the vector field is highly effective under noisy measurements. The approach can handle scenarios where dependent variables are not available at the same temporal grid. We demonstrate the effectiveness of the proposed method in learning models using data obtained from various differential equations. The proposed approach provides a promising methodology to learn dynamic models, where the first-principle understanding remains opaque.

Submitted 23 September, 2021; originally announced September 2021.

arXiv:2109.07140 [pdf, ps, other]

On the Universality of Deep Contextual Language Models

Authors: Shaily Bhatt, Poonam Goyal, Sandipan Dandapat, Monojit Choudhury, Sunayana Sitaram

Abstract: Deep Contextual Language Models (LMs) like ELMO, BERT, and their successors dominate the landscape of Natural Language Processing due to their ability to scale across multiple tasks rapidly by pre-training a single model, followed by task-specific fine-tuning. Furthermore, multilingual versions of such models like XLM-R and mBERT have given promising results in zero-shot cross-lingual transfer, potentially enabling NLP applications in many under-served and under-resourced languages. Due to this initial success, pre-trained models are being used as `Universal Language Models' as the starting point across diverse tasks, domains, and languages. This work explores the notion of `Universality' by identifying seven dimensions across which a universal model should be able to scale, that is, perform equally well or reasonably well, to be useful across diverse settings. We outline the current theoretical and empirical results that support model performance across these dimensions, along with extensions that may help address some of their current limitations. Through this survey, we lay the foundation for understanding the capabilities and limitations of massive contextual language models and help discern research gaps and directions for future work to make these LMs inclusive and fair to diverse applications, users, and linguistic phenomena.

Submitted 18 December, 2021; v1 submitted 15 September, 2021; originally announced September 2021.

Comments: 9 pages

Showing 51–100 of 255 results for author: Goyal, P