-
Question Answering over Electronic Devices: A New Benchmark Dataset and a Multi-Task Learning based QA Framework
Authors:
Abhilash Nandy,
Soumya Sharma,
Shubham Maddhashiya,
Kapil Sachdeva,
Pawan Goyal,
Niloy Ganguly
Abstract:
Answering questions asked from instructional corpora such as E-manuals, recipe books, etc., has been far less studied than open-domain factoid context-based question answering. This can be primarily attributed to the absence of standard benchmark datasets. In this paper we meticulously create a large amount of data connected with E-manuals and develop a suitable algorithm to exploit it. We collect E-Manual Corpus, a huge corpus of 307,957 E-manuals and pretrain RoBERTa on this large corpus. We create various benchmark QA datasets which include question answer pairs curated by experts based upon two E-manuals, real user questions from Community Question Answering Forum pertaining to E-manuals, etc. We introduce EMQAP (E-Manual Question Answering Pipeline) that answers questions pertaining to electronic devices. Built upon the pretrained RoBERTa, it harbors a supervised multi-task learning framework which efficiently performs the dual tasks of identifying the section in the E-manual where the answer can be found and the exact answer span within that section. For E-Manual annotated question-answer pairs, we show an improvement of about 40% in ROUGE-L F1 scores over the most competitive baseline. We perform a detailed ablation study and establish the versatility of EMQAP across different circumstances. The code and datasets are shared at https://github.com/abhi1nandy2/EMNLP-2021-Findings, and the corresponding project website is https://sites.google.com/view/emanualqa/home.
Submitted 14 September, 2021; v1 submitted 13 September, 2021;
originally announced September 2021.
-
The geometrical meaning of statistical isotropy of smooth random fields in two dimensions
Authors:
Pravabati Chingangbam,
Priya Goyal,
K. P. Yogendran,
Stephen Appleby
Abstract:
We revisit the geometrical meaning of statistical isotropy that is manifest in excursion sets of smooth random fields in two dimensions. Using the contour Minkowski tensor, $W_1$, as our basic tool we first examine geometrical properties of single structures. For simple closed curves in two dimensions we show that $W_1$ is proportional to the identity matrix if the curve has $m$-fold symmetry, with $m\ge 3$. Then we elaborate on how $W_1$ maps any arbitrarily shaped simple closed curve to an ellipse that is unique up to translations of its centroid. We also carry out a comparison of the shape parameters, $α$ and $β$, defined using $W_1$, with the filamentarity parameter defined using two scalar Minkowski functionals - area and contour length. We show that they contain complementary shape information, with $W_1$ containing additional information of orientation of structures. Next, we apply our method to boundaries of excursion sets of random fields and examine what statistical isotropy means for the geometry of the excursion sets. Focusing on Gaussian isotropic fields, and using a semi-numerical approach we quantify the effect of finite sampling of the field on the geometry of the excursion sets. In doing so we obtain an analytic expression for $α$ which takes into account the effect of finite sampling. Finally we derive an analytic expression for the ensemble expectation of $W_1$ for Gaussian anisotropic random fields. Our results provide insights that are useful for designing tests of statistical isotropy using cosmological data.
Submitted 13 September, 2021;
originally announced September 2021.
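The symmetry claim in the abstract above (a closed curve with m-fold symmetry, m >= 3, gives a contour tensor proportional to the identity) can be checked numerically for polygons. The following is a minimal sketch, not the authors' code: it assumes the tensor is the line integral of T ⊗ T ds over the curve, with T the unit tangent, and drops any overall normalization constant. The function names are hypothetical, and the eigenvalue ratio here is only a stand-in for the paper's shape parameters.

```python
import numpy as np

def contour_tensor(vertices):
    """Sum of T (x) T ds over the edges of a closed polygon (assumed, unnormalized form)."""
    pts = np.asarray(vertices, dtype=float)
    edges = np.roll(pts, -1, axis=0) - pts      # edge vectors e_i, closing the curve
    lengths = np.linalg.norm(edges, axis=1)
    # On a straight edge T = e/|e| and ds = |e|, so T (x) T ds = e e^T / |e|.
    return np.einsum("i,ij,ik->jk", 1.0 / lengths, edges, edges)

def eigenvalue_ratio(tensor):
    """Smaller eigenvalue divided by the larger one; 1 means no preferred direction."""
    lam = np.linalg.eigvalsh(tensor)            # eigenvalues in ascending order
    return lam[0] / lam[1]

def regular_polygon(m, radius=1.0):
    """Vertices of a regular m-gon, used here as a curve with m-fold symmetry."""
    angles = 2.0 * np.pi * np.arange(m) / m
    return np.column_stack([radius * np.cos(angles), radius * np.sin(angles)])

if __name__ == "__main__":
    for m in (3, 4, 5, 8):
        print(m, eigenvalue_ratio(contour_tensor(regular_polygon(m))))
    # A 2 x 1 rectangle has only 2-fold symmetry, so the ratio drops below 1.
    print("rectangle", eigenvalue_ratio(contour_tensor([(0, 0), (2, 0), (2, 1), (0, 1)])))
```

For the regular polygons the ratio should come out at 1 up to rounding, while the 2 x 1 rectangle gives a ratio of about 0.5, reflecting its elongation along one axis.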
-
Box Embeddings: An open-source library for representation learning using geometric structures
Authors:
Tejas Chheda,
Purujit Goyal,
Trang Tran,
Dhruvesh Patel,
Michael Boratko,
Shib Sankar Dasgupta,
Andrew McCallum
Abstract:
A major factor contributing to the success of modern representation learning is the ease of performing various vector operations. Recently, objects with geometric structures (eg. distributions, complex or hyperbolic vectors, or regions such as cones, disks, or boxes) have been explored for their alternative inductive biases and additional representational capacities. In this work, we introduce Box Embeddings, a Python library that enables researchers to easily apply and extend probabilistic box embeddings.
Submitted 10 September, 2021;
originally announced September 2021.
-
COMPARE: A Taxonomy and Dataset of Comparison Discussions in Peer Reviews
Authors:
Shruti Singh,
Mayank Singh,
Pawan Goyal
Abstract:
Comparing research papers is a conventional method to demonstrate progress in experimental research. We present COMPARE, a taxonomy and a dataset of comparison discussions in peer reviews of research papers in the domain of experimental deep learning. From a thorough observation of a large set of review sentences, we build a taxonomy of categories in comparison discussions and present a detailed annotation scheme to analyze this. Overall, we annotate 117 reviews covering 1,800 sentences. We experiment with various methods to identify comparison sentences in peer reviews and report a maximum F1 Score of 0.49. We also pretrain two language models specifically on ML, NLP, and CV paper abstracts and reviews to learn informative representations of peer reviews. The annotated dataset and the pretrained models are available at https://github.com/shruti-singh/COMPARE .
Submitted 9 August, 2021;
originally announced August 2021.
-
You too Brutus! Trapping Hateful Users in Social Media: Challenges, Solutions & Insights
Authors:
Mithun Das,
Punyajoy Saha,
Ritam Dutt,
Pawan Goyal,
Animesh Mukherjee,
Binny Mathew
Abstract:
Hate speech is regarded as one of the crucial issues plaguing the online social media. The current literature on hate speech detection leverages primarily the textual content to find hateful posts and subsequently identify hateful users. However, this methodology disregards the social connections between users. In this paper, we run a detailed exploration of the problem space and investigate an array of models ranging from purely textual to graph based to finally semi-supervised techniques using Graph Neural Networks (GNN) that utilize both textual and graph-based features. We run exhaustive experiments on two datasets -- Gab, which is loosely moderated and Twitter, which is strictly moderated. Overall the AGNN model achieves 0.791 macro F1-score on the Gab dataset and 0.780 macro F1-score on the Twitter dataset using only 5% of the labeled instances, considerably outperforming all the other models including the fully supervised ones. We perform detailed error analysis on the best performing text and graph based models and observe that hateful users have unique network neighborhood signatures and the AGNN model benefits by paying attention to these signatures. This property, as we observe, also allows the model to generalize well across platforms in a zero-shot setting. Lastly, we utilize the best performing GNN model to analyze the evolution of hateful users and their targets over time in Gab.
Submitted 1 August, 2021;
originally announced August 2021.
-
A Greedy Data Collection Scheme For Linear Dynamical Systems
Authors:
Karim Cherifi,
Pawan Goyal,
Peter Benner
Abstract:
Mathematical models are essential to analyze and understand the dynamics of complex systems. Recently, data-driven methodologies have got a lot of attention which is leveraged by advancements in sensor technology. However, the quality of obtained data plays a vital role in learning a good and reliable model. Therefore, in this paper, we propose an efficient heuristic methodology to collect data both in the frequency domain and time-domain, aiming at the best possible information gain from limited experimental data. The efficiency of the proposed methodology is illustrated by means of several examples, and also, its robustness in the presence of noisy data is shown.
Submitted 27 July, 2021;
originally announced July 2021.
-
ArgFuse: A Weakly-Supervised Framework for Document-Level Event Argument Aggregation
Authors:
Debanjana Kar,
Sudeshna Sarkar,
Pawan Goyal
Abstract:
Most of the existing information extraction frameworks (Wadden et al., 2019; Veyseh et al., 2020) focus on sentence-level tasks and are hardly able to capture the consolidated information from a given document. In our endeavour to generate precise document-level information frames from lengthy textual records, we introduce the task of Information Aggregation or Argument Aggregation. More specifically, our aim is to filter irrelevant and redundant argument mentions that were extracted at a sentence level and render a document-level information frame. The majority of the existing works have been observed to resolve related tasks of document-level event argument extraction (Yang et al., 2018a; Zheng et al., 2019a) and salient entity identification (Jain et al., 2020) using supervised techniques. To remove dependency on large amounts of labelled data, we explore the task of information aggregation using weakly-supervised techniques. In particular, we present an extractive algorithm with multiple sieves which adopts active learning strategies to work efficiently in low-resource settings. For this task, we have annotated our own test dataset comprising 131 document information frames and have released the code and dataset to further research prospects in this new domain. To the best of our knowledge, we are the first to establish baseline results for this task in English. Our data and code are publicly available at https://github.com/DebanjanaKar/ArgFuse.
Submitted 21 June, 2021;
originally announced June 2021.
-
Automatic Speech Recognition in Sanskrit: A New Speech Corpus and Modelling Insights
Authors:
Devaraja Adiga,
Rishabh Kumar,
Amrith Krishna,
Preethi Jyothi,
Ganesh Ramakrishnan,
Pawan Goyal
Abstract:
Automatic speech recognition (ASR) in Sanskrit is interesting, owing to the various linguistic peculiarities present in the language. The Sanskrit language is lexically productive, undergoes euphonic assimilation of phones at the word boundaries and exhibits variations in spelling conventions and in pronunciations. In this work, we propose the first large scale study of automatic speech recognition (ASR) in Sanskrit, with an emphasis on the impact of unit selection in Sanskrit ASR. In this work, we release a 78 hour ASR dataset for Sanskrit, which faithfully captures several of the linguistic characteristics expressed by the language. We investigate the role of different acoustic model and language model units in ASR systems for Sanskrit. We also propose a new modelling unit, inspired by the syllable level unit selection, that captures character sequences from one vowel in the word to the next vowel. We also highlight the importance of choosing graphemic representations for Sanskrit and show the impact of this choice on word error rates (WER). Finally, we extend these insights from Sanskrit ASR for building ASR systems in two other Indic languages, Gujarati and Telugu. For both these languages, our experimental results show that the use of phonetic based graphemic representations in ASR results in performance improvements as compared to ASR systems that use native scripts.
Submitted 23 July, 2021; v1 submitted 2 June, 2021;
originally announced June 2021.
-
Multi-VFL: A Vertical Federated Learning System for Multiple Data and Label Owners
Authors:
Vaikkunth Mugunthan,
Pawan Goyal,
Lalana Kagal
Abstract:
Vertical Federated Learning (VFL) refers to the collaborative training of a model on a dataset where the features of the dataset are split among multiple data owners, while label information is owned by a single data owner. In this paper, we propose a novel method, Multi Vertical Federated Learning (Multi-VFL), to train VFL models when there are multiple data and label owners. Our approach is the first to consider the setting where $D$-data owners (across which features are distributed) and $K$-label owners (across which labels are distributed) exist. This proposed configuration allows different entities to train and learn optimal models without having to share their data. Our framework makes use of split learning and adaptive federated optimizers to solve this problem. For empirical evaluation, we run experiments on the MNIST and FashionMNIST datasets. Our results show that using adaptive optimizers for model aggregation speeds up convergence and improves accuracy.
Submitted 16 June, 2021; v1 submitted 9 June, 2021;
originally announced June 2021.
-
Zero-shot Task Adaptation using Natural Language
Authors:
Prasoon Goyal,
Raymond J. Mooney,
Scott Niekum
Abstract:
Imitation learning and instruction-following are two common approaches to communicate a user's intent to a learning agent. However, as the complexity of tasks grows, it could be beneficial to use both demonstrations and language to communicate with an agent. In this work, we propose a novel setting where an agent is given both a demonstration and a description, and must combine information from both the modalities. Specifically, given a demonstration for a task (the source task), and a natural language description of the differences between the demonstrated task and a related but different task (the target task), our goal is to train an agent to complete the target task in a zero-shot setting, that is, without any demonstrations for the target task. To this end, we introduce Language-Aided Reward and Value Adaptation (LARVA) which, given a source demonstration and a linguistic description of how the target task differs, learns to output a reward / value function that accurately describes the target task. Our experiments show that on a diverse set of adaptations, our approach is able to complete more than 95% of target tasks when using template-based descriptions, and more than 70% when using free-form natural language.
Submitted 5 June, 2021;
originally announced June 2021.
-
Synthesis and water permeation studies of polysulfone based composite membranes having vertically aligned CNTs
Authors:
Bhakti Hirani,
P. S. Goyal
Abstract:
Polymeric membranes, including Polysulfone (PSf) membranes, are routinely used for water treatment. It has been known for quite some time that the water permeability of these membranes can be improved if one incorporates carbon nanotubes (single-walled, SWCNTs or multi-walled, MWCNTs) into the membrane and aligns them in the direction of water flow. This paper reports a method of synthesizing polymeric membranes having vertically aligned hollow CNTs embedded in them. This involves mixing of nanomagnetic particles in the dope solution and casting of the membrane in the presence of moderate magnetic fields.
A semi-automatic membrane casting machine which allows casting of the membrane in the presence of a magnetic field was designed and fabricated. PSf nanocomposite membranes, having vertically aligned MWCNTs, were synthesized using this machine. The effect of the magnetic field and the exposure time on the water permeation of these membranes was studied. It was seen that the water permeability of the membrane increases by a factor of 4 when the magnetic field is increased from 0 to 1500 Gauss. There was an additional 40% increase in water permeability when the time for which the film was exposed to the magnetic field was increased from 5 s to 10 s.
Submitted 21 May, 2021;
originally announced May 2021.
-
Discovery of Nonlinear Dynamical Systems using a Runge-Kutta Inspired Dictionary-based Sparse Regression Approach
Authors:
Pawan Goyal,
Peter Benner
Abstract:
Discovering dynamical models to describe underlying dynamical behavior is essential to draw decisive conclusions and engineering studies, e.g., optimizing a process. Experimental data availability notwithstanding has increased significantly, but interpretable and explainable models in science and engineering yet remain incomprehensible. In this work, we blend machine learning and dictionary-based learning with numerical analysis tools to discover governing differential equations from noisy and sparsely-sampled measurement data. We utilize the fact that given a dictionary containing huge candidate nonlinear functions, dynamical models can often be described by a few appropriately chosen candidates. As a result, we obtain interpretable and parsimonious models which are prone to generalize better beyond the sampling regime. Additionally, we integrate a numerical integration framework with dictionary learning that yields differential equations without requiring or approximating derivative information at any stage. Hence, it is utterly effective in corrupted and sparsely-sampled data. We discuss its extension to governing equations, containing rational nonlinearities that typically appear in biological networks. Moreover, we generalized the method to governing equations that are subject to parameter variations and externally controlled inputs. We demonstrate the efficiency of the method to discover a number of diverse differential equations using noisy measurements, including a model describing neural dynamics, chaotic Lorenz model, Michaelis-Menten Kinetics, and a parameterized Hopf normal form.
Submitted 11 May, 2021;
originally announced May 2021.
-
Event Argument Extraction using Causal Knowledge Structures
Authors:
Debanjana Kar,
Sudeshna Sarkar,
Pawan Goyal
Abstract:
Event Argument extraction refers to the task of extracting structured information from unstructured text for a particular event of interest. The existing works exhibit poor capabilities to extract causal event arguments like Reason and After Effects. Furthermore, most of the existing works model this task at a sentence level, restricting the context to a local scope. While it may be effective for short spans of text, for longer bodies of text such as news articles, it has often been observed that the arguments for an event do not necessarily occur in the same sentence as that containing an event trigger. To tackle the issue of argument scattering across sentences, the use of global context becomes imperative in this task. In our work, we propose an external knowledge aided approach to infuse document-level event information to aid the extraction of complex event arguments. We develop a causal network for our event-annotated dataset by extracting relevant event causal structures from ConceptNet and phrases from Wikipedia. We use the extracted event causal features in a bi-directional transformer encoder to effectively capture long-range inter-sentence dependencies. We report the effectiveness of our proposed approach through both qualitative and quantitative analysis. In this task, we establish our findings on an event annotated dataset in 5 Indian languages. This dataset adds further complexity to the task by labelling arguments of entity type (like Time, Place) as well as more complex argument types (like Reason, After-Effect). Our approach achieves state-of-the-art performance across all the five languages. Since our work does not rely on any language-specific features, it can be easily extended to other languages.
Submitted 2 May, 2021;
originally announced May 2021.
-
CrysXPP:An Explainable Property Predictor for Crystalline Materials
Authors:
Kishalay Das,
Bidisha Samanta,
Pawan Goyal,
Seung-Cheol Lee,
Satadeep Bhattacharjee,
Niloy Ganguly
Abstract:
We present a deep-learning framework, CrysXPP, to allow rapid prediction of electronic, magnetic and elastic properties of a wide range of materials with reasonable precision. Although our work is consistent with several recent attempts to build deep learning-based property predictors, it overcomes some of their limitations. CrysXPP lowers the need for a large volume of tagged data to train a deep learning model by intelligently designing an autoencoder CrysAE and passing the structural information to the property prediction process. The autoencoder in turn is trained on a huge volume of untagged crystal graphs, the designed loss function helps in capturing all their important structural and chemical information. Moreover, CrysXPP uses only a small amount of tagged data for property prediction, and also trains a feature selector that provides interpretability to the results obtained. We demonstrate that CrysXPP convincingly performs better than all the competing and recent baseline algorithms across seven diverse set of properties. Most notably, when given a small amount of experimental data, CrysXPP is consistently able to outperform conventional DFT. We release the large pretrained model CrysAE so that it could be fine-tuned using small amount of tagged data by the research community on various applications with restricted data source.
Submitted 2 February, 2022; v1 submitted 22 April, 2021;
originally announced April 2021.
-
Local patch analysis for testing statistical isotropy of the Planck convergence map
Authors:
Priya Goyal,
Pravabati Chingangbam
Abstract:
The small but measurable effect of weak gravitational lensing on the cosmic microwave background radiation provides information about the large-scale distribution of matter in the universe. We use the all-sky distribution of matter, as represented by the convergence map that is inferred from CMB lensing measurement by the Planck survey, to test the fundamental assumption of Statistical Isotropy (SI) of the universe. For the analysis we use the $α$ statistic that is devised from the contour Minkowski tensor, a tensorial generalization of the scalar Minkowski functional, the contour length. In essence, the $α$ statistic captures the ellipticity of isofield contours at any chosen threshold value of a smooth random field and provides a measure of anisotropy. The SI of the observed convergence map is tested against the suite of realistic simulations of the convergence map provided by the Planck collaboration. We first carry out a global analysis using the full sky data after applying the galactic and point sources mask. We find that the observed data is consistent with SI. Further we carry out a local search for departure from SI in small patches of the sky using $α$. This analysis reveals several sky patches which exhibit deviations from simulations with statistical significance higher than 95% confidence level (CL). Our analysis indicates that the source of the anomalous behaviour of most of the outlier patches is inaccurate estimation of noise. We identify two outlier patches which exhibit anomalous behaviour originating from departure from SI at higher than 95% CL. Most of the anomalous patches are found to be located roughly along the ecliptic plane or in proximity to the ecliptic poles.
Submitted 1 April, 2021;
originally announced April 2021.
-
Evaluating Neural Word Embeddings for Sanskrit
Authors:
Jivnesh Sandhan,
Om Adideva,
Digumarthi Komal,
Laxmidhar Behera,
Pawan Goyal
Abstract:
Recently, the supervised learning paradigm's surprisingly remarkable performance has garnered considerable attention from Sanskrit Computational Linguists. As a result, the Sanskrit community has put laudable efforts to build task-specific labeled data for various downstream Natural Language Processing (NLP) tasks. The primary component of these approaches comes from representations of word embeddings. Word embedding helps to transfer knowledge learned from readily available unlabelled data for improving task-specific performance in low-resource setting. Last decade, there has been much excitement in the field of digitization of Sanskrit. To effectively use such readily available resources, it is very much essential to perform a systematic study on word embedding approaches for the Sanskrit language. In this work, we investigate the effectiveness of word embeddings. We classify word embeddings in broad categories to facilitate systematic experimentation and evaluate them on four intrinsic tasks. We investigate the efficacy of embeddings approaches (originally proposed for languages other than Sanskrit) for Sanskrit along with various challenges posed by language.
Submitted 1 April, 2021;
originally announced April 2021.
-
Deep Neural Approaches to Relation Triplets Extraction: A Comprehensive Survey
Authors:
Tapas Nayak,
Navonil Majumder,
Pawan Goyal,
Soujanya Poria
Abstract:
Recently, with the advances made in continuous representation of words (word embeddings) and deep neural architectures, many research works are published in the area of relation extraction and it is very difficult to keep track of so many papers. To help future research, we present a comprehensive review of the recently published research works in relation extraction. We mostly focus on relation extraction using deep neural networks which have achieved state-of-the-art performance on publicly available datasets. In this survey, we cover sentence-level relation extraction to document-level relation extraction, pipeline-based approaches to joint extraction approaches, annotated datasets to distantly supervised datasets along with few very recent research directions such as zero-shot or few-shot relation extraction, noise mitigation in distantly supervised datasets. Regarding neural architectures, we cover convolutional models, recurrent network models, attention network models, and graph convolutional models in this survey.
Submitted 31 March, 2021;
originally announced March 2021.
-
Enhancing Transformer for Video Understanding Using Gated Multi-Level Attention and Temporal Adversarial Training
Authors:
Saurabh Sahu,
Palash Goyal
Abstract:
The introduction of Transformer model has led to tremendous advancements in sequence modeling, especially in text domain. However, the use of attention-based models for video understanding is still relatively unexplored. In this paper, we introduce Gated Adversarial Transformer (GAT) to enhance the applicability of attention-based models to videos. GAT uses a multi-level attention gate to model the relevance of a frame based on local and global contexts. This enables the model to understand the video at various granularities. Further, GAT uses adversarial training to improve model generalization. We propose temporal attention regularization scheme to improve the robustness of attention modules to adversarial examples. We illustrate the performance of GAT on the large-scale YouTube-8M data set on the task of video categorization. We further show ablation studies along with quantitative and qualitative analysis to showcase the improvement.
Submitted 18 March, 2021;
originally announced March 2021.
-
LQResNet: A Deep Neural Network Architecture for Learning Dynamic Processes
Authors:
Pawan Goyal,
Peter Benner
Abstract:
Mathematical modeling is an essential step, for example, to analyze the transient behavior of a dynamical process and to perform engineering studies such as optimization and control. With the help of first-principles and expert knowledge, a dynamic model can be built, but for complex dynamic processes, appearing, e.g., in biology, chemical plants, neuroscience, financial markets, this often remains an onerous task. Hence, data-driven modeling of the dynamics process becomes an attractive choice and is supported by the rapid advancement in sensor and measurement technology. A data-driven approach, namely operator inference framework, models a dynamic process, where a particular structure of the nonlinear term is assumed. In this work, we suggest combining the operator inference with certain deep neural network approaches to infer the unknown nonlinear dynamics of the system. The approach uses recent advancements in deep learning and possible prior knowledge of the process if possible. We also briefly discuss several extensions and advantages of the proposed methodology. We demonstrate that the proposed methodology accomplishes the desired tasks for dynamics processes encountered in neural dynamics and the glycolytic oscillator.
Submitted 27 March, 2021; v1 submitted 3 March, 2021;
originally announced March 2021.
-
Self-supervised Pretraining of Visual Features in the Wild
Authors:
Priya Goyal,
Mathilde Caron,
Benjamin Lefaudeux,
Min Xu,
Pengchao Wang,
Vivek Pai,
Mannat Singh,
Vitaliy Liptchinsky,
Ishan Misra,
Armand Joulin,
Piotr Bojanowski
Abstract:
Recently, self-supervised learning methods like MoCo, SimCLR, BYOL and SwAV have reduced the gap with supervised methods. These results have been achieved in a control environment, that is the highly curated ImageNet dataset. However, the premise of self-supervised learning is that it can learn from any random image and from any unbounded dataset. In this work, we explore if self-supervision lives to its expectation by training large models on random, uncurated images with no supervision. Our final SElf-supERvised (SEER) model, a RegNetY with 1.3B parameters trained on 1B random images with 512 GPUs achieves 84.2% top-1 accuracy, surpassing the best self-supervised pretrained model by 1% and confirming that self-supervised learning works in a real world setting. Interestingly, we also observe that self-supervised models are good few-shot learners achieving 77.9% top-1 with access to only 10% of ImageNet. Code: https://github.com/facebookresearch/vissl
Submitted 5 March, 2021; v1 submitted 2 March, 2021;
originally announced March 2021.
-
SWP: Microsecond Network SLOs Without Priorities
Authors:
Kevin Zhao,
Prateesh Goyal,
Mohammad Alizadeh,
Thomas E. Anderson
Abstract:
The increasing use of cloud computing for latency-sensitive applications has sparked renewed interest in providing tight bounds on network tail latency. Achieving this in practice at reasonable network utilization has proved elusive, due to a combination of highly bursty application demand, faster link speeds, and heavy-tailed message sizes. While priority scheduling can be used to reduce tail latency for some traffic, this comes at a cost of much worse delay behavior for all other traffic on the network. Most operators choose to run their networks at very low average utilization, despite the added cost, and yet still suffer poor tail behavior.
This paper takes a different approach. We build a system, swp, to help operators (and network designers) to understand and control tail latency without relying on priority scheduling. As network workload changes, swp is designed to give real-time advice on the network switch configurations needed to maintain tail latency objectives for each traffic class. The core of swp is an efficient model for simulating the combined effect of traffic characteristics, end-to-end congestion control, and switch scheduling on service-level objectives (SLOs), along with an optimizer that adjusts switch-level scheduling weights assigned to each class. Using simulation across a diverse set of workloads with different SLOs, we show that to meet the same SLOs as swp provides, FIFO would require 65% greater link capacity, and 79% more for scenarios with tight SLOs on bursty traffic classes.
Submitted 2 March, 2021; v1 submitted 1 March, 2021;
originally announced March 2021.
-
Random Sampling in Reproducing Kernel Subspace of Mixed Lebesgue Spaces
Authors:
Prashant Goyal,
Dhiraj Patel,
Sivananthan Sampath
Abstract:
In this article, we consider the random sampling in the image space $V$ of mixed Lebesgue space $L^{p,q}(\mathbb{R}^{n+1})$ under an idempotent integral operator. We assume some decay and regularity conditions of the kernel and approximate the unit sphere in $V$ on a bounded cube $C_{R,S}$ by a finite-dimensional subspace of $V$. Consequently, the set of concentrated functions is totally bounded. We prove with an overwhelming probability that the random sample set uniformly distributed over $C_{R,S}$ is a stable set of sampling for the set of concentrated functions on $C_{R,S}$. Moreover, we propose an iterative scheme to reconstruct the concentrated signal from its random measurements.
Submitted 20 July, 2021; v1 submitted 17 February, 2021;
originally announced February 2021.
-
A Little Pretraining Goes a Long Way: A Case Study on Dependency Parsing Task for Low-resource Morphologically Rich Languages
Authors:
Jivnesh Sandhan,
Amrith Krishna,
Ashim Gupta,
Laxmidhar Behera,
Pawan Goyal
Abstract:
Neural dependency parsing has achieved remarkable performance for many domains and languages. The bottleneck of massive labeled data limits the effectiveness of these approaches for low resource languages. In this work, we focus on dependency parsing for morphological rich languages (MRLs) in a low-resource setting. Although morphological information is essential for the dependency parsing task, the morphological disambiguation and lack of powerful analyzers pose challenges to get this information for MRLs. To address these challenges, we propose simple auxiliary tasks for pretraining. We perform experiments on 10 MRLs in low-resource settings to measure the efficacy of our proposed pretraining method and observe an average absolute gain of 2 points (UAS) and 3.6 points (LAS). Code and data available at: https://github.com/jivnesh/LCM
Submitted 12 April, 2021; v1 submitted 12 February, 2021;
originally announced February 2021.
-
A Novel Two-stage Framework for Extracting Opinionated Sentences from News Articles
Authors:
Rajkumar Pujari,
Swara Desai,
Niloy Ganguly,
Pawan Goyal
Abstract:
This paper presents a novel two-stage framework to extract opinionated sentences from a given news article. In the first stage, Naive Bayes classifier by utilizing the local features assigns a score to each sentence - the score signifies the probability of the sentence to be opinionated. In the second stage, we use this prior within the HITS (Hyperlink-Induced Topic Search) schema to exploit the global structure of the article and relation between the sentences. In the HITS schema, the opinionated sentences are treated as Hubs and the facts around these opinions are treated as the Authorities. The algorithm is implemented and evaluated against a set of manually marked data. We show that using HITS significantly improves the precision over the baseline Naive Bayes classifier. We also argue that the proposed method actually discovers the underlying structure of the article, thus extracting various opinions, grouped with supporting facts as well as other supporting opinions from the article.
Submitted 24 January, 2021;
originally announced January 2021.
-
Reproducibility, Replicability and Beyond: Assessing Production Readiness of Aspect Based Sentiment Analysis in the Wild
Authors:
Rajdeep Mukherjee,
Shreyas Shetty,
Subrata Chattopadhyay,
Subhadeep Maji,
Samik Datta,
Pawan Goyal
Abstract:
With the exponential growth of online marketplaces and user-generated content therein, aspect-based sentiment analysis has become more important than ever. In this work, we critically review a representative sample of the models published during the past six years through the lens of a practitioner, with an eye towards deployment in production. First, our rigorous empirical evaluation reveals poor reproducibility: an average 4-5% drop in test accuracy across the sample. Second, to further bolster our confidence in empirical evaluation, we report experiments on two challenging data slices, and observe a consistent 12-55% drop in accuracy. Third, we study the possibility of transfer across domains and observe that as little as 10-25% of the domain-specific training dataset, when used in conjunction with datasets from other domains within the same locale, largely closes the gap between complete cross-domain and complete in-domain predictive performance. Lastly, we open-source two large-scale annotated review corpora from a large e-commerce portal in India in order to aid the study of replicability and transfer, with the hope that it will fuel further growth of the field.
Submitted 23 January, 2021;
originally announced January 2021.
-
Joint Autoregressive and Graph Models for Software and Developer Social Networks
Authors:
Rima Hazra,
Hardik Aggarwal,
Pawan Goyal,
Animesh Mukherjee,
Soumen Chakrabarti
Abstract:
Social network research has focused on hyperlink graphs, bibliographic citations, friend/follow patterns, influence spread, etc. Large software repositories also form a highly valuable networked artifact, usually in the form of a collection of packages, their developers, dependencies among them, and bug reports. This "social network of code" is rarely studied by social network researchers. We introduce two new problems in this setting. These problems are well-motivated in the software engineering community but not closely studied by social network scientists. The first is to identify packages that are most likely to be troubled by bugs in the immediate future, thereby demanding the greatest attention. The second is to recommend developers to packages for the next development cycle. Simple autoregression can be applied to historical data for both problems, but we propose a novel method to integrate network-derived features and demonstrate that our method brings additional benefits. Apart from formalizing these problems and proposing new baseline approaches, we prepare and contribute a substantial dataset connecting multiple attributes built from the long-term history of 20 releases of Ubuntu, growing to over 25,000 packages with their dependency links, maintained by over 3,800 developers, with over 280k bug reports.
Submitted 21 January, 2021;
originally announced January 2021.
-
Medical Entity Linking using Triplet Network
Authors:
Ishani Mondal,
Sukannya Purkayastha,
Sudeshna Sarkar,
Pawan Goyal,
Jitesh Pillai,
Amitava Bhattacharyya,
Mahanandeeshwar Gattu
Abstract:
Entity linking (or Normalization) is an essential task in text mining that maps the entity mentions in the medical text to standard entities in a given Knowledge Base (KB). This task is of great importance in the medical domain. It can also be used for merging different medical and clinical ontologies. In this paper, we center around the problem of disease linking or normalization. This task is executed in two phases: candidate generation and candidate scoring. In this paper, we present an approach to rank the candidate Knowledge Base entries based on their similarity with disease mention. We make use of the Triplet Network for candidate ranking. While the existing methods have used carefully generated sieves and external resources for candidate generation, we introduce a robust and portable candidate generation scheme that does not make use of the hand-crafted rules. Experimental results on the standard benchmark NCBI disease dataset demonstrate that our system outperforms the prior methods by a significant margin.
Submitted 21 December, 2020;
originally announced December 2020.
-
HateXplain: A Benchmark Dataset for Explainable Hate Speech Detection
Authors:
Binny Mathew,
Punyajoy Saha,
Seid Muhie Yimam,
Chris Biemann,
Pawan Goyal,
Animesh Mukherjee
Abstract:
Hate speech is a challenging issue plaguing the online social media. While better models for hate speech detection are continuously being developed, there is little research on the bias and interpretability aspects of hate speech. In this paper, we introduce HateXplain, the first benchmark hate speech dataset covering multiple aspects of the issue. Each post in our dataset is annotated from three different perspectives: the basic, commonly used 3-class classification (i.e., hate, offensive or normal), the target community (i.e., the community that has been the victim of hate speech/offensive speech in the post), and the rationales, i.e., the portions of the post on which their labelling decision (as hate, offensive or normal) is based. We utilize existing state-of-the-art models and observe that even models that perform very well in classification do not score high on explainability metrics like model plausibility and faithfulness. We also observe that models, which utilize the human rationales for training, perform better in reducing unintended bias towards target communities. We have made our code and dataset public at https://github.com/punyajoy/HateXplain
Submitted 12 April, 2022; v1 submitted 18 December, 2020;
originally announced December 2020.
-
An End-to-End Solution for Named Entity Recognition in eCommerce Search
Authors:
Xiang Cheng,
Mitchell Bowden,
Bhushan Ramesh Bhange,
Priyanka Goyal,
Thomas Packer,
Faizan Javed
Abstract:
Named entity recognition (NER) is a critical step in modern search query understanding. In the domain of eCommerce, identifying the key entities, such as brand and product type, can help a search engine retrieve relevant products and therefore offer an engaging shopping experience. Recent research shows promising results on shared benchmark NER tasks using deep learning methods, but there are still unique challenges in the industry regarding domain knowledge, training data, and model production. This paper demonstrates an end-to-end solution to address these challenges. The core of our solution is a novel model training framework "TripleLearn" which iteratively learns from three separate training datasets, instead of one training set as is traditionally done. Using this approach, the best model lifts the F1 score from 69.5 to 93.3 on the holdout test data. In our offline experiments, TripleLearn improved the model performance compared to traditional training approaches which use a single set of training data. Moreover, in the online A/B test, we see significant improvements in user engagement and revenue conversion. The model has been live on homedepot.com for more than 9 months, boosting search conversions and revenue. Beyond our application, this TripleLearn framework, as well as the end-to-end process, is model-independent and problem-independent, so it can be generalized to more industrial applications, especially to the eCommerce industry which has similar data foundations and problems.
Submitted 10 December, 2020;
originally announced December 2020.
-
Finding Prerequisite Relations between Concepts using Textbook
Authors:
Shivam Pal,
Vipul Arora,
Pawan Goyal
Abstract:
A prerequisite is anything that you need to know or understand first before attempting to learn or understand something new. In the current work, we present a method of finding prerequisite relations between concepts using related textbooks. Previous researchers have focused on finding these relations using Wikipedia link structure through unsupervised and supervised learning approaches. In the current work, we have proposed two methods, one is statistical method and another is learning-based method. We mine the rich and structured knowledge available in the textbooks to find the content for those concepts and the order in which they are discussed. Using this information, proposed statistical method estimates explicit as well as implicit prerequisite relations between concepts. During experiments, we have found performance of proposed statistical method is better than the popular RefD method, which uses Wikipedia link structure. And proposed learning-based method has shown a significant increase in the efficiency of supervised learning method when compared with graph and text-based learning-based approaches.
Submitted 20 November, 2020;
originally announced November 2020.
-
Hierarchical Transformer for Task Oriented Dialog Systems
Authors:
Bishal Santra,
Potnuru Anusha,
Pawan Goyal
Abstract:
Generative models for dialog systems have gained much interest because of the recent success of RNN and Transformer based models in tasks like question answering and summarization. Although the task of dialog response generation is generally seen as a sequence-to-sequence (Seq2Seq) problem, researchers in the past have found it challenging to train dialog systems using the standard Seq2Seq models. Therefore, to help the model learn meaningful utterance and conversation level features, Sordoni et al. (2015b); Serban et al. (2016) proposed Hierarchical RNN architecture, which was later adopted by several other RNN based dialog systems. With the transformer-based models dominating the seq2seq problems lately, the natural question to ask is the applicability of the notion of hierarchy in transformer based dialog systems. In this paper, we propose a generalized framework for Hierarchical Transformer Encoders and show how a standard transformer can be morphed into any hierarchical encoder, including HRED and HIBERT like models, by using specially designed attention masks and positional encodings. We demonstrate that Hierarchical Encoding helps achieve better natural language understanding of the contexts in transformer-based models for task-oriented dialog systems through a wide range of experiments.
Submitted 9 May, 2021; v1 submitted 24 October, 2020;
originally announced November 2020.
-
Site-to-Site Internet Traffic Control
Authors:
Frank Cangialosi,
Akshay Narayan,
Prateesh Goyal,
Radhika Mittal,
Mohammad Alizadeh,
Hari Balakrishnan
Abstract:
Queues allow network operators to control traffic: where queues build, they can enforce scheduling and shaping policies. In the Internet today, however, there is a mismatch between where queues build and where control is most effectively enforced; queues build at bottleneck links that are often not under the control of the data sender. To resolve this mismatch, we propose a new kind of middlebox, called Bundler. Bundler uses a novel inner control loop between a sendbox (in the sender's site) and a receivebox (in the receiver's site) to determine the aggregate rate for the bundle, leaving the end-to-end connections and their control loops intact. Enforcing this sending rate ensures that bottleneck queues that would have built up from the bundle's packets now shift from the bottleneck to the sendbox. The sendbox then exercises control over its traffic by scheduling packets to achieve higher-level objectives. We have implemented Bundler in Linux and evaluated it with real-world and emulation experiments. We find that Bundler allows the sender-chosen policy to be effective: when configured to implement Stochastic Fairness Queueing (SFQ), it improves median flow completion time (FCT) by between 28% and 97% across various scenarios.
Submitted 27 April, 2021; v1 submitted 2 November, 2020;
originally announced November 2020.
-
Operator Inference and Physics-Informed Learning of Low-Dimensional Models for Incompressible Flows
Authors:
Peter Benner,
Pawan Goyal,
Jan Heiland,
Igor Pontes Duff
Abstract:
Reduced-order modeling has a long tradition in computational fluid dynamics. The ever-increasing significance of data for the synthesis of low-order models is well reflected in the recent successes of data-driven approaches such as Dynamic Mode Decomposition and Operator Inference. With this work, we suggest a new approach to learning structured low-order models for incompressible flow from data that can be used for engineering studies such as control, optimization, and simulation. To that end, we utilize the intrinsic structure of the Navier-Stokes equations for incompressible flows and show that learning dynamics of the velocity and pressure can be decoupled, thus leading to an efficient operator inference approach for learning the underlying dynamics of incompressible flows. Furthermore, we show the operator inference performance in learning low-order models using two benchmark problems and compare with an intrusive method, namely proper orthogonal decomposition, and other data-driven approaches.
Submitted 7 December, 2020; v1 submitted 13 October, 2020;
originally announced October 2020.
-
MatScIE: An automated tool for the generation of databases of methods and parameters used in the computational materials science literature
Authors:
Souradip Guha,
Ankan Mullick,
Jatin Agrawal,
Swetarekha Ram,
Samir Ghui,
Seung-Cheol Lee,
Satadeep Bhattacharjee,
Pawan Goyal
Abstract:
The number of published articles in the field of materials science is growing rapidly every year. This comparatively unstructured data source, which contains a large amount of information, has a restriction on its re-usability, as the information needed to carry out further calculations using the data in it must be extracted manually. It is very important to obtain valid and contextually correct information from the online (offline) data, as it can be useful not only to generate inputs for further calculations, but also to incorporate them into a querying framework. Retaining this context as a priority, we have developed an automated tool, MatScIE (Material Science Information Extractor) that can extract relevant information from material science literature and make a structured database that is much easier to use for material simulations. Specifically, we extract the material details, methods, code, parameters, and structure from the various research articles. Finally, we created a web application where users can upload published articles and view/download the information obtained from this tool and can create their own databases for their personal uses.
Submitted 22 January, 2021; v1 submitted 14 September, 2020;
originally announced September 2020.
-
PixL2R: Guiding Reinforcement Learning Using Natural Language by Mapping Pixels to Rewards
Authors:
Prasoon Goyal,
Scott Niekum,
Raymond J. Mooney
Abstract:
Reinforcement learning (RL), particularly in sparse reward settings, often requires prohibitively large numbers of interactions with the environment, thereby limiting its applicability to complex problems. To address this, several prior approaches have used natural language to guide the agent's exploration. However, these approaches typically operate on structured representations of the environment, and/or assume some structure in the natural language commands. In this work, we propose a model that directly maps pixels to rewards, given a free-form natural language description of the task, which can then be used for policy learning. Our experiments on the Meta-World robot manipulation domain show that language-based rewards significantly improves the sample efficiency of policy learning, both in sparse and dense reward settings.
Submitted 19 November, 2020; v1 submitted 30 July, 2020;
originally announced July 2020.
-
Data-Driven Learning of Reduced-order Dynamics for a Parametrized Shallow Water Equation
Authors:
Süleyman Yıldız,
Pawan Goyal,
Peter Benner,
Bülent Karasözen
Abstract:
This paper discusses a non-intrusive data-driven model order reduction method that learns low-dimensional dynamical models for a parametrized shallow water equation. We consider the shallow water equation in non-traditional form (NTSWE). We focus on learning low-dimensional models in a non-intrusive way. That means, we assume not to have access to a discretized form of the NTSWE in any form. Instead, we have snapshots that are obtained using a black-box solver. Consequently, we aim at learning reduced-order models only from the snapshots. Precisely, a reduced-order model is learnt by solving an appropriate least-squares optimization problem in a low-dimensional subspace. Furthermore, we discuss computational challenges that particularly arise from the optimization problem being ill-conditioned. Moreover, we extend the non-intrusive model order reduction framework to a parametric case where we make use of the parameter dependency at the level of the partial differential equation. We illustrate the efficiency of the proposed non-intrusive method to construct reduced-order models for NTSWE and compare it with an intrusive method (proper orthogonal decomposition). We furthermore discuss the predictive capabilities of both models outside the range of the training data.
Submitted 4 August, 2020; v1 submitted 28 July, 2020;
originally announced July 2020.
-
OccamNet: A Fast Neural Model for Symbolic Regression at Scale
Authors:
Owen Dugan,
Rumen Dangovski,
Allan Costa,
Samuel Kim,
Pawan Goyal,
Joseph Jacobson,
Marin Soljačić
Abstract:
Neural networks' expressiveness comes at the cost of complex, black-box models that often extrapolate poorly beyond the domain of the training dataset, conflicting with the goal of finding compact analytic expressions to describe scientific data. We introduce OccamNet, a neural network model that finds interpretable, compact, and sparse symbolic fits to data, à la Occam's razor. Our model defines a probability distribution over functions with efficient sampling and function evaluation. We train by sampling functions and biasing the probability mass toward better fitting solutions, backpropagating using cross-entropy matching in a reinforcement-learning loss. OccamNet can identify symbolic fits for a variety of problems, including analytic and non-analytic functions, implicit functions, and simple image classification, and can outperform state-of-the-art symbolic regression methods on real-world regression datasets. Our method requires a minimal memory footprint, fits complicated functions in minutes on a single CPU, and scales on a GPU.
Submitted 27 November, 2023; v1 submitted 16 July, 2020;
originally announced July 2020.
-
Logic Constrained Pointer Networks for Interpretable Textual Similarity
Authors:
Subhadeep Maji,
Rohan Kumar,
Manish Bansal,
Kalyani Roy,
Pawan Goyal
Abstract:
Systematically discovering semantic relationships in text is an important and extensively studied area in Natural Language Processing, with various tasks such as entailment, semantic similarity, etc. Decomposability of sentence-level scores via subsequence alignments has been proposed as a way to make models more interpretable. We study the problem of aligning components of sentences leading to an interpretable model for semantic textual similarity. In this paper, we introduce a novel pointer network based model with a sentinel gating function to align constituent chunks, which are represented using BERT. We improve this base model with a loss function to equally penalize misalignments in both sentences, ensuring the alignments are bidirectional. Finally, to guide the network with structured external knowledge, we introduce first-order logic constraints based on ConceptNet and syntactic knowledge. The model achieves an F1 score of 97.73 and 96.32 on the benchmark SemEval datasets for the chunk alignment task, showing large improvements over the existing solutions. Source code is available at https://github.com/manishb89/interpretable_sentence_similarity
Submitted 15 July, 2020;
originally announced July 2020.
-
Unsupervised Learning of Visual Features by Contrasting Cluster Assignments
Authors:
Mathilde Caron,
Ishan Misra,
Julien Mairal,
Priya Goyal,
Piotr Bojanowski,
Armand Joulin
Abstract:
Unsupervised image representations have significantly reduced the gap with supervised pretraining, notably with the recent achievements of contrastive learning methods. These contrastive methods typically work online and rely on a large number of explicit pairwise feature comparisons, which is computationally challenging. In this paper, we propose an online algorithm, SwAV, that takes advantage of contrastive methods without requiring to compute pairwise comparisons. Specifically, our method simultaneously clusters the data while enforcing consistency between cluster assignments produced for different augmentations (or views) of the same image, instead of comparing features directly as in contrastive learning. Simply put, we use a swapped prediction mechanism where we predict the cluster assignment of a view from the representation of another view. Our method can be trained with large and small batches and can scale to unlimited amounts of data. Compared to previous contrastive methods, our method is more memory efficient since it does not require a large memory bank or a special momentum network. In addition, we also propose a new data augmentation strategy, multi-crop, that uses a mix of views with different resolutions in place of two full-resolution views, without increasing the memory or compute requirements much. We validate our findings by achieving 75.3% top-1 accuracy on ImageNet with ResNet-50, as well as surpassing supervised pretraining on all the considered transfer tasks.
Submitted 8 January, 2021; v1 submitted 17 June, 2020;
originally announced June 2020.
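
The swapped prediction mechanism described in the abstract above can be illustrated with a small sketch. This is not the authors' implementation: the function names, the temperature value, and the hand-written cluster codes are all assumptions made for illustration. Each view's cluster scores are turned into a softmax prediction, and the loss asks the prediction from one view to match the cluster code assigned to the other view.

```python
import math

def softmax(scores, temperature=0.1):
    """Turn raw cluster scores into a probability distribution (temperature is an assumed value)."""
    scaled = [s / temperature for s in scores]
    peak = max(scaled)  # subtract the max for numerical stability
    exps = [math.exp(s - peak) for s in scaled]
    total = sum(exps)
    return [e / total for e in exps]

def cross_entropy(target, predicted):
    """Cross-entropy between a target code and a predicted distribution."""
    return -sum(t * math.log(p) for t, p in zip(target, predicted) if t > 0)

def swapped_prediction_loss(scores_a, scores_b, codes_a, codes_b, temperature=0.1):
    """Predict view B's code from view A's scores, and vice versa.

    scores_* are similarity scores of each view against the cluster prototypes;
    codes_* are the cluster assignments of each view (supplied by hand here).
    """
    pred_a = softmax(scores_a, temperature)
    pred_b = softmax(scores_b, temperature)
    return cross_entropy(codes_b, pred_a) + cross_entropy(codes_a, pred_b)

# Two augmented views of the same image that both score highest on cluster 0.
scores_view_a = [2.0, 0.0, 0.0]
scores_view_b = [1.5, 0.1, 0.0]
consistent = swapped_prediction_loss(scores_view_a, scores_view_b, [1, 0, 0], [1, 0, 0])
inconsistent = swapped_prediction_loss(scores_view_a, scores_view_b, [1, 0, 0], [0, 1, 0])
print(consistent < inconsistent)
```

The loss is low when both views agree on the cluster and rises when the assignments disagree, which is the consistency the abstract refers to. No pairwise feature comparison between images is needed at any point.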
-
Read what you need: Controllable Aspect-based Opinion Summarization of Tourist Reviews
Authors:
Rajdeep Mukherjee,
Hari Chandana Peruri,
Uppada Vishnu,
Pawan Goyal,
Sourangshu Bhattacharya,
Niloy Ganguly
Abstract:
Manually extracting relevant aspects and opinions from large volumes of user-generated text is a time-consuming process. Summaries, on the other hand, help readers with limited time budgets to quickly consume the key ideas from the data. State-of-the-art approaches for multi-document summarization, however, do not consider user preferences while generating summaries. In this work, we argue the need and propose a solution for generating personalized aspect-based opinion summaries from large collections of online tourist reviews. We let our readers decide and control several attributes of the summary such as the length and specific aspects of interest among others. Specifically, we take an unsupervised approach to extract coherent aspects from tourist reviews posted on TripAdvisor. We then propose an Integer Linear Programming (ILP) based extractive technique to select an informative subset of opinions around the identified aspects while respecting the user-specified values for various control parameters. Finally, we evaluate and compare our summaries using crowdsourcing and ROUGE-based metrics and obtain competitive results.
Submitted 9 June, 2020; v1 submitted 8 June, 2020;
originally announced June 2020.
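
For readers unfamiliar with the ROUGE-based metrics mentioned in the abstract above, the following is a minimal sketch of a ROUGE-L F1 score built on the longest common subsequence. It assumes lowercased whitespace tokenization and a plain F1 (equal weight on precision and recall); standard ROUGE toolkits apply their own tokenization and options, so scores from this sketch may differ from theirs. The function names are illustrative.

```python
def lcs_length(a, b):
    """Length of the longest common subsequence of two token sequences."""
    prev = [0] * (len(b) + 1)
    for x in a:
        curr = [0]
        for j, y in enumerate(b, start=1):
            if x == y:
                curr.append(prev[j - 1] + 1)
            else:
                curr.append(max(prev[j], curr[j - 1]))
        prev = curr
    return prev[-1]

def rouge_l_f1(candidate, reference):
    """Simplified ROUGE-L F1 between a candidate summary and a reference summary."""
    cand = candidate.lower().split()
    ref = reference.lower().split()
    if not cand or not ref:
        return 0.0
    lcs = lcs_length(cand, ref)
    if lcs == 0:
        return 0.0
    precision = lcs / len(cand)  # share of candidate tokens in the LCS
    recall = lcs / len(ref)      # share of reference tokens in the LCS
    return 2 * precision * recall / (precision + recall)

print(rouge_l_f1("the room was clean", "the hotel room was very clean"))
```

Because the longest common subsequence respects word order without requiring contiguity, the candidate in the usage line is credited for all four of its tokens even though the reference inserts extra words between them.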
-
Hierarchical Class-Based Curriculum Loss
Authors:
Palash Goyal,
Shalini Ghosh
Abstract:
Classification algorithms in machine learning often assume a flat label space. However, most real world data have dependencies between the labels, which can often be captured by using a hierarchy. Utilizing this relation can help develop a model capable of satisfying the dependencies and improving model accuracy and interpretability. Further, as different levels in the hierarchy correspond to different granularities, penalizing each label equally can be detrimental to model learning. In this paper, we propose a loss function, hierarchical curriculum loss, with two properties: (i) satisfy hierarchical constraints present in the label space, and (ii) provide non-uniform weights to labels based on their levels in the hierarchy, learned implicitly by the training paradigm. We theoretically show that the proposed loss function is a tighter bound of 0-1 loss compared to any other loss satisfying the hierarchical constraints. We test our loss function on real world image data sets, and show that it substantially outperforms multiple baselines.
Submitted 5 June, 2020;
originally announced June 2020.
-
Aspect-based Sentiment Analysis of Scientific Reviews
Authors:
Souvic Chakraborty,
Pawan Goyal,
Animesh Mukherjee
Abstract:
Scientific papers are complex and understanding the usefulness of these papers requires prior knowledge. Peer reviews are comments on a paper provided by designated experts in that field and hold a substantial amount of information, not only for the editors and chairs to make the final decision, but also to judge the potential impact of the paper. In this paper, we propose to use aspect-based sentiment analysis of scientific reviews to be able to extract useful information, which correlates well with the accept/reject decision.
While working on a dataset of close to 8k reviews from ICLR, one of the top conferences in the field of machine learning, we use an active learning framework to build a training dataset for aspect prediction, which is further used to obtain the aspects and sentiments for the entire dataset. We show that the distribution of aspect-based sentiments obtained from a review is significantly different for accepted and rejected papers. We use the aspect sentiments from these reviews to make an intriguing observation: certain aspects present in a paper and discussed in the review strongly determine the final recommendation. As a second objective, we quantify the extent of disagreement among the reviewers refereeing a paper. We also investigate the extent of disagreement between the reviewers and the chair and find that the inter-reviewer disagreement may have a link to the disagreement with the chair. One of the most interesting observations from this study is that reviews in which the reviewer score and the aspect sentiments extracted from the review text are consistent are also more likely to concur with the chair's decision.
Submitted 5 June, 2020;
originally announced June 2020.
-
Using Large Pretrained Language Models for Answering User Queries from Product Specifications
Authors:
Kalyani Roy,
Smit Shah,
Nithish Pai,
Jaidam Ramtej,
Prajit Prashant Nadkarn,
Jyotirmoy Banerjee,
Pawan Goyal,
Surender Kumar
Abstract:
While buying a product from e-commerce websites, customers generally have a plethora of questions. From the perspective of both the e-commerce service provider and the customers, there must be an effective question answering system to provide immediate answers to the user queries. While certain questions can only be answered after using the product, there are many questions which can be answered from the product specification itself. Our work takes a first step in this direction by identifying the relevant product specifications that can help answer the user questions. We propose an approach to automatically create a training dataset for this problem. We utilize the recently proposed XLNet and BERT architectures for this problem and find that they provide much better performance than the Siamese model previously applied to this problem. Our model performs well even when trained on one vertical and tested across different verticals.
Submitted 29 May, 2020;
originally announced May 2020.
-
Evaluating Neural Morphological Taggers for Sanskrit
Authors:
Ashim Gupta,
Amrith Krishna,
Pawan Goyal,
Oliver Hellwig
Abstract:
Neural sequence labelling approaches have achieved state of the art results in morphological tagging. We evaluate the efficacy of four standard sequence labelling models on Sanskrit, a morphologically rich, fusional Indian language. As its label space can theoretically contain more than 40,000 labels, systems that explicitly model the internal structure of a label are more suited for the task, because of their ability to generalise to labels not seen during training. We find that although some neural models perform better than others, one of the common causes for error for all of these models is mispredictions due to syncretism.
Submitted 21 May, 2020;
originally announced May 2020.
-
A Non-Intrusive Method to Inferring Linear Port-Hamiltonian Realizations using Time-Domain Data
Authors:
Karim Cherifi,
Pawan Goyal,
Peter Benner
Abstract:
Port-Hamiltonian systems have gained a lot of attention in recent years due to their inherent valuable properties in modeling and control. In this paper, we are interested in constructing linear port-Hamiltonian systems from time-domain input-output data. We discuss a non-intrusive methodology that is comprised of two main ingredients -- (a) inferring frequency response data from time-domain data and (b) constructing an underlying port-Hamiltonian realization using the inferred frequency response data. We illustrate the proposed methodology by means of two numerical examples and also compare it with two other system identification methods to infer the frequency response from the input-output data.
Submitted 17 November, 2020; v1 submitted 19 May, 2020;
originally announced May 2020.
-
Neural Approaches for Data Driven Dependency Parsing in Sanskrit
Authors:
Amrith Krishna,
Ashim Gupta,
Deepak Garasangi,
Jivnesh Sandhan,
Pavankumar Satuluri,
Pawan Goyal
Abstract:
Data-driven approaches for dependency parsing have been of great interest in Natural Language Processing for the past couple of decades. However, Sanskrit still lacks a robust purely data-driven dependency parser, with the possible exception of Krishna (2019). This can primarily be attributed to the lack of availability of task-specific labelled data and the morphologically rich nature of the language. In this work, we evaluate four different data-driven machine learning models, originally proposed for different languages, and compare their performances on Sanskrit data. We experiment with 2 graph based and 2 transition based parsers. We compare the performance of each of the models in a low-resource setting, with 1,500 sentences for training. Further, since our focus is on the learning power of each of the models, we do not incorporate any Sanskrit specific features explicitly into the models, and rather use the default settings in each of the papers for obtaining the feature functions. In this work, we analyse the performance of the parsers using both an in-domain and an out-of-domain test dataset. We also investigate the impact of word ordering in which the sentences are provided as input to these systems, by parsing verses and their corresponding prose order (anvaya) sentences.
Submitted 17 April, 2020;
originally announced April 2020.
-
Exploring Effects of Random Walk Based Minibatch Selection Policy on Knowledge Graph Completion
Authors:
Bishal Santra,
Prakhar Sharma,
Sumegh Roychowdhury,
Pawan Goyal
Abstract:
In this paper, we have explored the effects of different minibatch sampling techniques in Knowledge Graph Completion. Knowledge Graph Completion (KGC) or Link Prediction is the task of predicting missing facts in a knowledge graph. KGC models are usually trained using a margin, soft-margin or cross-entropy loss function that promotes assigning a higher score or probability to true fact triplets. Minibatch gradient descent is used to optimize these loss functions for training the KGC models. However, as each minibatch consists of only a few randomly sampled triplets from a large knowledge graph, any entity that occurs in a minibatch occurs only once in most cases. Because of this, these loss functions ignore all other neighbors of any entity whose embedding is being updated at some minibatch step. In this paper, we propose a new random-walk based minibatch sampling technique for training KGC models that optimizes the loss incurred by a minibatch of a closely connected subgraph of triplets instead of randomly selected ones. We have shown results of experiments for different models and datasets with our sampling technique and found that the proposed sampling algorithm has varying effects on these datasets/models. Specifically, we find that our proposed method achieves state-of-the-art performance on the DB100K dataset.
Submitted 12 April, 2020;
originally announced April 2020.
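
The idea of drawing a minibatch from a closely connected subgraph, as described in the abstract above, can be sketched as follows. This is an illustrative simplification and not the authors' sampling algorithm: the function names, the restart rule at dead ends, and the toy triplets are all assumptions. The walk starts at a random triplet and repeatedly steps to an unseen triplet that shares the current entity.

```python
import random
from collections import defaultdict

def build_entity_index(triplets):
    """Map each entity to the (head, relation, tail) triplets it takes part in."""
    index = defaultdict(list)
    for head, relation, tail in triplets:
        index[head].append((head, relation, tail))
        if tail != head:
            index[tail].append((head, relation, tail))
    return index

def random_walk_minibatch(triplets, batch_size, rng):
    """Collect up to batch_size distinct triplets by walking between neighbouring triplets."""
    index = build_entity_index(triplets)
    start = rng.choice(triplets)
    batch, seen = [start], {start}
    current = start[2]  # continue the walk from the tail entity
    while len(batch) < batch_size:
        candidates = [t for t in index[current] if t not in seen]
        if candidates:
            nxt = rng.choice(candidates)
            # Move to the entity at the other end of the chosen triplet.
            current = nxt[2] if nxt[0] == current else nxt[0]
        else:
            # Dead end (assumed rule): restart from a random unseen triplet.
            remaining = [t for t in triplets if t not in seen]
            if not remaining:
                break
            nxt = rng.choice(remaining)
            current = nxt[2]
        batch.append(nxt)
        seen.add(nxt)
    return batch

toy_graph = [
    ("paris", "capital_of", "france"),
    ("france", "member_of", "eu"),
    ("berlin", "capital_of", "germany"),
    ("germany", "member_of", "eu"),
    ("paris", "located_in", "europe"),
]
print(random_walk_minibatch(toy_graph, 3, random.Random(0)))
```

Compared with uniform sampling, a batch built this way tends to contain several triplets per entity, so the neighbours of an entity contribute to the same gradient step.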
-
Low-dimensional approximations of high-dimensional asset price models
Authors:
Martin Redmann,
Christian Bayer,
Pawan Goyal
Abstract:
We consider high-dimensional asset price models that are reduced in their dimension in order to reduce the complexity of the problem or the effect of the curse of dimensionality in the context of option pricing. We apply model order reduction (MOR) to obtain a reduced system. MOR has been previously studied for asymptotically stable controlled stochastic systems with zero initial conditions. However, stochastic differential equations modeling price processes are uncontrolled, have non-zero initial states and are often unstable. Therefore, we extend MOR schemes and combine ideas of techniques known for deterministic systems. This leads to a method providing a good pathwise approximation. After explaining the reduction procedure, the error of the approximation is analyzed and the performance of the algorithm is shown conducting several numerical experiments. Within the numerics section, the benefit of the algorithm in the context of option pricing is pointed out.
Submitted 1 April, 2021; v1 submitted 15 March, 2020;
originally announced March 2020.
-
Low-Rank and Total Variation Regularization and Its Application to Image Recovery
Authors:
Pawan Goyal,
Hussam Al Daas,
Peter Benner
Abstract:
In this paper, we study the problem of image recovery from given partial (corrupted) observations. Recovering an image using a low-rank model has been an active research area in data analysis and machine learning. But often, images are not only of low-rank but they also exhibit sparsity in a transformed space. In this work, we propose a new problem formulation in such a way that we seek to recover an image that is of low-rank and has sparsity in a transformed domain. We further discuss various non-convex non-smooth surrogates of the rank function, leading to a relaxed problem. Then, we present an efficient iterative scheme to solve the relaxed problem that essentially employs the (weighted) singular value thresholding at each iteration. Furthermore, we discuss the convergence properties of the proposed iterative method. We perform extensive experiments, showing that the proposed algorithm outperforms state-of-the-art methodologies in recovering images.
Submitted 12 March, 2020;
originally announced March 2020.
-
Cross-modal Learning for Multi-modal Video Categorization
Authors:
Palash Goyal,
Saurabh Sahu,
Shalini Ghosh,
Chul Lee
Abstract:
Multi-modal machine learning (ML) models can process data in multiple modalities (e.g., video, audio, text) and are useful for video content analysis in a variety of problems (e.g., object detection, scene understanding, activity recognition). In this paper, we focus on the problem of video categorization using a multi-modal ML technique. In particular, we have developed a novel multi-modal ML approach that we call "cross-modal learning", where one modality influences another but only when there is correlation between the modalities -- for that, we first train a correlation tower that guides the main multi-modal video categorization tower in the model. We show how this cross-modal principle can be applied to different types of models (e.g., RNN, Transformer, NetVLAD), and demonstrate through experiments how our proposed multi-modal video categorization models with cross-modal learning out-perform strong state-of-the-art baseline models.
Submitted 5 June, 2020; v1 submitted 6 March, 2020;
originally announced March 2020.