-
General Purpose Verification for Chain of Thought Prompting
Authors:
Robert Vacareanu,
Anurag Pratik,
Evangelia Spiliopoulou,
Zheng Qi,
Giovanni Paolini,
Neha Anna John,
Jie Ma,
Yassine Benajiba,
Miguel Ballesteros
Abstract:
Many of the recent capabilities demonstrated by Large Language Models (LLMs) arise primarily from their ability to exploit contextual information. In this paper, we explore ways to improve reasoning capabilities of LLMs through (1) exploration of different chains of thought and (2) validation of the individual steps of the reasoning process. We propose three general principles that a model should adhere to while reasoning: (i) Relevance, (ii) Mathematical Accuracy, and (iii) Logical Consistency. We apply these constraints to the reasoning steps generated by the LLM to improve the accuracy of the final generation. The constraints are applied in the form of verifiers: the model itself is asked to verify if the generated steps satisfy each constraint. To further steer the generations towards high-quality solutions, we use the perplexity of the reasoning steps as an additional verifier. We evaluate our method on 4 distinct types of reasoning tasks, spanning a total of 9 different datasets. Experiments show that our method is always better than vanilla generation, and, in 6 out of the 9 datasets, it is better than best-of-N sampling, which samples N reasoning chains and picks the lowest-perplexity generation.
Submitted 30 April, 2024;
originally announced May 2024.
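The best-of-N baseline named in the abstract above (sample N reasoning chains, keep the one with the lowest perplexity) can be sketched as follows. This is a minimal illustration, not the authors' implementation: the function names, the `(text, token_logprobs)` representation, and the log-probability values are all assumptions made for the example, and perplexity is taken to be the exponential of the negative mean token log-probability.

```python
import math

def perplexity(token_logprobs):
    """Perplexity as exp of the negative mean token log-probability."""
    return math.exp(-sum(token_logprobs) / len(token_logprobs))

def best_of_n(chains):
    """Return the (text, token_logprobs) pair with the lowest perplexity."""
    return min(chains, key=lambda chain: perplexity(chain[1]))

# Hypothetical sampled chains with made-up token log-probabilities.
chains = [
    ("chain A", [-0.9, -1.2, -0.7]),
    ("chain B", [-0.2, -0.3, -0.1]),
    ("chain C", [-1.5, -0.4, -2.0]),
]
best_text, best_logprobs = best_of_n(chains)
```

In practice the token log-probabilities would come from whatever model generated the chains; the selection step itself is independent of how they are obtained.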
-
NewsQs: Multi-Source Question Generation for the Inquiring Mind
Authors:
Alyssa Hwang,
Kalpit Dixit,
Miguel Ballesteros,
Yassine Benajiba,
Vittorio Castelli,
Markus Dreyer,
Mohit Bansal,
Kathleen McKeown
Abstract:
We present NewsQs (news-cues), a dataset that provides question-answer pairs for multiple news documents. To create NewsQs, we augment a traditional multi-document summarization dataset with questions automatically generated by a T5-Large model fine-tuned on FAQ-style news articles from the News On the Web corpus. We show that fine-tuning a model with control codes produces questions that are judged acceptable more often than the same model without them as measured through human evaluation. We use a QNLI model with high correlation with human annotations to filter our data. We release our final dataset of high-quality questions, answers, and document clusters as a resource for future work in query-based multi-document summarization.
Submitted 15 June, 2024; v1 submitted 28 February, 2024;
originally announced February 2024.
-
Detection of an Arbitrary Number of Communities in a Block Spin Ising Model
Authors:
Miguel Ballesteros,
Ramsés H. Mena,
José Luis Pérez,
Gabor Toth
Abstract:
We study the problem of community detection in a general version of the block spin Ising model featuring M groups, a model inspired by the Curie-Weiss model of ferromagnetism in statistical mechanics. We solve the general problem of identifying any number of groups with any possible coupling constants. Up to now, the problem was only solved for the specific situation with two groups of identical size and identical interactions. Our results can be applied to the most realistic situations, in which there are many groups of different sizes and different interactions. In addition, we give an explicit algorithm that permits the reconstruction of the structure of the model from a sample of observations based on the comparison of empirical correlations of the spin variables, thus unveiling easy applications of the model to real-world voting data and communities in biology.
Submitted 29 November, 2023;
originally announced November 2023.
-
Characterizing and Measuring Linguistic Dataset Drift
Authors:
Tyler A. Chang,
Kishaloy Halder,
Neha Anna John,
Yogarshi Vyas,
Yassine Benajiba,
Miguel Ballesteros,
Dan Roth
Abstract:
NLP models often degrade in performance when real world data distributions differ markedly from training data. However, existing dataset drift metrics in NLP have generally not considered specific dimensions of linguistic drift that affect model performance, and they have not been validated in their ability to predict model performance at the individual example level, where such metrics are often used in practice. In this paper, we propose three dimensions of linguistic dataset drift: vocabulary, structural, and semantic drift. These dimensions correspond to content word frequency divergences, syntactic divergences, and meaning changes not captured by word frequencies (e.g. lexical semantic change). We propose interpretable metrics for all three drift dimensions, and we modify past performance prediction methods to predict model performance at both the example and dataset level for English sentiment classification and natural language inference. We find that our drift metrics are more effective than previous metrics at predicting out-of-domain model accuracies (mean 16.8% root mean square error decrease), particularly when compared to popular fine-tuned embedding distances (mean 47.7% error decrease). Fine-tuned embedding distances are much more effective at ranking individual examples by expected performance, but decomposing into vocabulary, structural, and semantic drift produces the best example rankings of all considered model-agnostic drift metrics (mean 6.7% ROC AUC increase).
Submitted 26 May, 2023;
originally announced May 2023.
-
Taxonomy Expansion for Named Entity Recognition
Authors:
Karthikeyan K,
Yogarshi Vyas,
Jie Ma,
Giovanni Paolini,
Neha Anna John,
Shuai Wang,
Yassine Benajiba,
Vittorio Castelli,
Dan Roth,
Miguel Ballesteros
Abstract:
Training a Named Entity Recognition (NER) model often involves fixing a taxonomy of entity types. However, requirements evolve and we might need the NER model to recognize additional entity types. A simple approach is to re-annotate the entire dataset with both existing and additional entity types and then train the model on the re-annotated dataset. However, this is an extremely laborious task. To remedy this, we propose a novel approach called Partial Label Model (PLM) that uses only partially annotated datasets. We experiment with 6 diverse datasets and show that PLM consistently performs better than most other approaches (0.5 - 2.5 F1), including in novel settings for taxonomy expansion not considered in prior work. The gap between PLM and all other approaches is especially large in settings where there is limited data available for the additional entity types (as much as 11 F1), thus suggesting a more cost-effective approach to taxonomy expansion.
Submitted 22 May, 2023;
originally announced May 2023.
-
A Weak Supervision Approach for Few-Shot Aspect Based Sentiment
Authors:
Robert Vacareanu,
Siddharth Varia,
Kishaloy Halder,
Shuai Wang,
Giovanni Paolini,
Neha Anna John,
Miguel Ballesteros,
Smaranda Muresan
Abstract:
We explore how weak supervision on abundant unlabeled data can be leveraged to improve few-shot performance in aspect-based sentiment analysis (ABSA) tasks. We propose a pipeline approach to construct a noisy ABSA dataset, and we use it to adapt a pre-trained sequence-to-sequence model to the ABSA tasks. We test the resulting model on three widely used ABSA datasets, before and after fine-tuning. Our proposed method preserves the full fine-tuning performance while showing significant improvements (15.84% absolute F1) in the few-shot learning scenario for the harder tasks. In zero-shot (i.e., without fine-tuning), our method outperforms the previous state of the art on the aspect extraction sentiment classification (AESC) task and is, additionally, capable of performing the harder aspect sentiment triplet extraction (ASTE) task.
Submitted 19 May, 2023;
originally announced May 2023.
-
Comparing Biases and the Impact of Multilingual Training across Multiple Languages
Authors:
Sharon Levy,
Neha Anna John,
Ling Liu,
Yogarshi Vyas,
Jie Ma,
Yoshinari Fujinuma,
Miguel Ballesteros,
Vittorio Castelli,
Dan Roth
Abstract:
Studies in bias and fairness in natural language processing have primarily examined social biases within a single language and/or across few attributes (e.g. gender, race). However, biases can manifest differently across various languages for individual attributes. As a result, it is critical to examine biases within each language and attribute. Of equal importance is to study how these biases compare across languages and how the biases are affected when training a model on multilingual data versus monolingual data. We present a bias analysis across Italian, Chinese, English, Hebrew, and Spanish on the downstream sentiment analysis task to observe whether specific demographics are viewed more positively. We study bias similarities and differences across these languages and investigate the impact of multilingual vs. monolingual training data. We adapt existing sentiment bias templates in English to Italian, Chinese, Hebrew, and Spanish for four attributes: race, religion, nationality, and gender. Our results reveal similarities in bias expression such as favoritism of groups that are dominant in each language's culture (e.g. majority religions and nationalities). Additionally, we find an increased variation in predictions across protected groups, indicating bias amplification, after multilingual finetuning in comparison to multilingual pretraining.
Submitted 18 May, 2023;
originally announced May 2023.
-
Simple Yet Effective Synthetic Dataset Construction for Unsupervised Opinion Summarization
Authors:
Ming Shen,
Jie Ma,
Shuai Wang,
Yogarshi Vyas,
Kalpit Dixit,
Miguel Ballesteros,
Yassine Benajiba
Abstract:
Opinion summarization provides an important solution for summarizing opinions expressed among a large number of reviews. However, generating aspect-specific and general summaries is challenging due to the lack of annotated data. In this work, we propose two simple yet effective unsupervised approaches to generate both aspect-specific and general opinion summaries by training on synthetic datasets constructed with aspect-related review contents. Our first approach, Seed Words Based Leave-One-Out (SW-LOO), identifies aspect-related portions of reviews simply by exact-matching aspect seed words and outperforms existing methods by 3.4 ROUGE-L points on SPACE and 0.5 ROUGE-1 point on OPOSUM+ for aspect-specific opinion summarization. Our second approach, Natural Language Inference Based Leave-One-Out (NLI-LOO) identifies aspect-related sentences utilizing an NLI model in a more general setting without using seed words and outperforms existing approaches by 1.2 ROUGE-L points on SPACE for aspect-specific opinion summarization and remains competitive on other metrics.
Submitted 21 March, 2023;
originally announced March 2023.
-
Dynamic Benchmarking of Masked Language Models on Temporal Concept Drift with Multiple Views
Authors:
Katerina Margatina,
Shuai Wang,
Yogarshi Vyas,
Neha Anna John,
Yassine Benajiba,
Miguel Ballesteros
Abstract:
Temporal concept drift refers to the problem of data changing over time. In NLP, that would entail that language (e.g. new expressions, meaning shifts) and factual knowledge (e.g. new concepts, updated facts) evolve over time. Focusing on the latter, we benchmark $11$ pretrained masked language models (MLMs) on a series of tests designed to evaluate the effect of temporal concept drift, as it is crucial that widely used language models remain up-to-date with the ever-evolving factual updates of the real world. Specifically, we provide a holistic framework that (1) dynamically creates temporal test sets of any time granularity (e.g. month, quarter, year) of factual data from Wikidata, (2) constructs fine-grained splits of tests (e.g. updated, new, unchanged facts) to ensure comprehensive analysis, and (3) evaluates MLMs in three distinct ways (single-token probing, multi-token generation, MLM scoring). In contrast to prior work, our framework aims to unveil how robust an MLM is over time and thus to provide a signal in case it has become outdated, by leveraging multiple views of evaluation.
Submitted 23 February, 2023;
originally announced February 2023.
-
Levinson theorem for discrete Schrödinger operators on the line with matrix potentials having a first moment
Authors:
Miguel Ballesteros,
Gerardo Franco Córdova,
Ivan Naumkin,
Hermann Schulz-Baldes
Abstract:
This paper proves new results on spectral and scattering theory for matrix-valued Schrödinger operators on the discrete line with non-compactly supported perturbations whose first moments are assumed to exist. In particular, a Levinson theorem is proved, in which a relation between scattering data and spectral properties (bound and half bound states) of the corresponding Hamiltonians is derived. The proof is based on stationary scattering theory with prominent use of Jost solutions at complex energies that are controlled by Volterra-type integral equations.
Submitted 9 November, 2022;
originally announced November 2022.
-
Novel Chapter Abstractive Summarization using Spinal Tree Aware Sub-Sentential Content Selection
Authors:
Hardy Hardy,
Miguel Ballesteros,
Faisal Ladhak,
Muhammad Khalifa,
Vittorio Castelli,
Kathleen McKeown
Abstract:
Summarizing novel chapters is a difficult task due to the input length and the fact that sentences that appear in the desired summaries draw content from multiple places throughout the chapter. We present a pipelined extractive-abstractive approach where the extractive step filters the content that is passed to the abstractive component. Extremely lengthy input also results in a highly skewed dataset towards negative instances for extractive summarization; we thus adopt a margin ranking loss for extraction to encourage separation between positive and negative examples. Our extraction component operates at the constituent level; our approach to this problem enriches the text with spinal tree information which provides syntactic context (in the form of constituents) to the extraction model. We show an improvement of 3.71 Rouge-1 points over best results reported in prior work on an existing novel chapter dataset.
Submitted 9 November, 2022;
originally announced November 2022.
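The margin ranking loss mentioned in the abstract above can be illustrated with its standard pairwise form, which is zero once a positive example outscores a negative one by at least a margin. This is a sketch under stated assumptions, not the paper's training code: the function name, the scalar scores, and the default margin of 1.0 are hypothetical choices for the example.

```python
def margin_ranking_loss(pos_score, neg_score, margin=1.0):
    """Standard pairwise margin ranking loss.

    Returns 0 when the positive score exceeds the negative score by at
    least `margin`; otherwise returns the shortfall. The margin value is
    an assumption chosen for illustration.
    """
    return max(0.0, margin - (pos_score - neg_score))

# Hypothetical extractor scores for one positive and one negative constituent.
well_separated = margin_ranking_loss(2.0, 0.5)   # gap 1.5 >= margin
poorly_separated = margin_ranking_loss(0.6, 0.5)  # gap 0.1 < margin
```

Because the loss depends only on the gap between paired scores rather than on per-example labels, it is a natural fit when negatives heavily outnumber positives, which is the skew the abstract describes.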
-
Instruction Tuning for Few-Shot Aspect-Based Sentiment Analysis
Authors:
Siddharth Varia,
Shuai Wang,
Kishaloy Halder,
Robert Vacareanu,
Miguel Ballesteros,
Yassine Benajiba,
Neha Anna John,
Rishita Anubhai,
Smaranda Muresan,
Dan Roth
Abstract:
Aspect-based Sentiment Analysis (ABSA) is a fine-grained sentiment analysis task which involves four elements from user-generated texts: aspect term, aspect category, opinion term, and sentiment polarity. Most computational approaches focus on some of the ABSA sub-tasks such as tuple (aspect term, sentiment polarity) or triplet (aspect term, opinion term, sentiment polarity) extraction using either pipeline or joint modeling approaches. Recently, generative approaches have been proposed to extract all four elements as (one or more) quadruplets from text as a single task. In this work, we take a step further and propose a unified framework for solving ABSA and the associated sub-tasks to improve the performance in few-shot scenarios. To this end, we fine-tune a T5 model with instructional prompts in a multi-task learning fashion covering all the sub-tasks, as well as the entire quadruple prediction task. In experiments with multiple benchmark datasets, we show that the proposed multi-task prompting approach brings a performance boost (8.29 absolute F1) in the few-shot learning setting.
Submitted 11 June, 2023; v1 submitted 12 October, 2022;
originally announced October 2022.
-
Contrastive Training Improves Zero-Shot Classification of Semi-structured Documents
Authors:
Muhammad Khalifa,
Yogarshi Vyas,
Shuai Wang,
Graham Horwood,
Sunil Mallya,
Miguel Ballesteros
Abstract:
We investigate semi-structured document classification in a zero-shot setting. Classification of semi-structured documents is more challenging than that of standard unstructured documents, as positional, layout, and style information play a vital role in interpreting such documents. The standard classification setting where categories are fixed during both training and testing falls short in dynamic environments where new document categories could potentially emerge. We focus exclusively on the zero-shot setting where inference is done on new unseen classes. To address this task, we propose a matching-based approach that relies on a pairwise contrastive objective for both pretraining and fine-tuning. Our results show a significant boost in Macro F$_1$ from the proposed pretraining step in both supervised and unsupervised zero-shot settings.
Submitted 11 October, 2022;
originally announced October 2022.
-
Exploring the Role of Task Transferability in Large-Scale Multi-Task Learning
Authors:
Vishakh Padmakumar,
Leonard Lausen,
Miguel Ballesteros,
Sheng Zha,
He He,
George Karypis
Abstract:
Recent work has found that multi-task training with a large number of diverse tasks can uniformly improve downstream performance on unseen target tasks. In contrast, literature on task transferability has established that the choice of intermediate tasks can heavily affect downstream task performance. In this work, we aim to disentangle the effect of scale and relatedness of tasks in multi-task representation learning. We find that, on average, increasing the scale of multi-task learning, in terms of the number of tasks, indeed results in better learned representations than smaller multi-task setups. However, if the target tasks are known ahead of time, then training on a smaller set of related tasks is competitive with the large-scale multi-task training at a reduced computational cost.
Submitted 12 July, 2022; v1 submitted 23 April, 2022;
originally announced April 2022.
-
Label Semantics for Few Shot Named Entity Recognition
Authors:
Jie Ma,
Miguel Ballesteros,
Srikanth Doss,
Rishita Anubhai,
Sunil Mallya,
Yaser Al-Onaizan,
Dan Roth
Abstract:
We study the problem of few shot learning for named entity recognition. Specifically, we leverage the semantic information in the names of the labels as a way of giving the model additional signal and enriched priors. We propose a neural architecture that consists of two BERT encoders, one to encode the document and its tokens and another one to encode each of the labels in natural language format. Our model learns to match the representations of named entities computed by the first encoder with label representations computed by the second encoder. The label semantics signal is shown to support improved state-of-the-art results in multiple few shot NER benchmarks and on-par performance in standard benchmarks. Our model is especially effective in low resource settings.
Submitted 16 March, 2022;
originally announced March 2022.
-
Reliable Multi-Object Tracking in the Presence of Unreliable Detections
Authors:
Travis Mandel,
Mark Jimenez,
Emily Risley,
Taishi Nammoto,
Rebekka Williams,
Max Panoff,
Meynard Ballesteros,
Bobbie Suarez
Abstract:
Recent multi-object tracking (MOT) systems have leveraged highly accurate object detectors; however, training such detectors requires large amounts of labeled data. Although such data is widely available for humans and vehicles, it is significantly more scarce for other animal species. We present Robust Confidence Tracking (RCT), an algorithm designed to maintain robust performance even when detection quality is poor. In contrast to prior methods which discard detection confidence information, RCT takes a fundamentally different approach, relying on the exact detection confidence values to initialize tracks, extend tracks, and filter tracks. In particular, RCT is able to minimize identity switches by efficiently using low-confidence detections (along with a single object tracker) to keep continuous track of objects. To evaluate trackers in the presence of unreliable detections, we present a challenging real-world underwater fish tracking dataset, FISHTRAC. In an evaluation on FISHTRAC as well as the UA-DETRAC dataset, we find that RCT outperforms other algorithms when provided with imperfect detections, including state-of-the-art deep single and multi-object trackers as well as more classic approaches. Specifically, RCT has the best average HOTA across methods that successfully return results for all sequences, and has significantly fewer identity switches than other methods.
Submitted 7 November, 2022; v1 submitted 15 December, 2021;
originally announced December 2021.
-
A Bag of Tricks for Dialogue Summarization
Authors:
Muhammad Khalifa,
Miguel Ballesteros,
Kathleen McKeown
Abstract:
Dialogue summarization comes with its own peculiar challenges as opposed to news or scientific article summarization. In this work, we explore four different challenges of the task: handling and differentiating parts of the dialogue belonging to multiple speakers, negation understanding, reasoning about the situation, and informal language understanding. Using a pretrained sequence-to-sequence language model, we explore speaker name substitution, negation scope highlighting, multi-task learning with relevant tasks, and pretraining on in-domain data. Our experiments show that our proposed techniques indeed improve summarization performance, outperforming strong baselines.
Submitted 16 September, 2021;
originally announced September 2021.
-
How much pretraining data do language models need to learn syntax?
Authors:
Laura Pérez-Mayos,
Miguel Ballesteros,
Leo Wanner
Abstract:
Transformers-based pretrained language models achieve outstanding results in many well-known NLU benchmarks. However, while pretraining methods are very convenient, they are expensive in terms of time and resources. This calls for a study of the impact of pretraining data size on the knowledge of the models. We explore this impact on the syntactic capabilities of RoBERTa, using models trained on incremental sizes of raw text data. First, we use syntactic structural probes to determine whether models pretrained on more data encode a higher amount of syntactic information. Second, we perform a targeted syntactic evaluation to analyze the impact of pretraining data size on the syntactic generalization performance of the models. Third, we compare the performance of the different models on three downstream applications: part-of-speech tagging, dependency parsing and paraphrase identification. We complement our study with an analysis of the cost-benefit trade-off of training such models. Our experiments show that while models pretrained on more data encode more syntactic knowledge and perform better on downstream applications, they do not always offer a better performance across the different syntactic phenomena and come at a higher financial and environmental cost.
Submitted 9 September, 2021; v1 submitted 7 September, 2021;
originally announced September 2021.
-
Sequential Cross-Document Coreference Resolution
Authors:
Emily Allaway,
Shuai Wang,
Miguel Ballesteros
Abstract:
Relating entities and events in text is a key component of natural language understanding. Cross-document coreference resolution, in particular, is important for the growing interest in multi-document analysis tasks. In this work, we propose a new model that extends the efficient sequential prediction paradigm for coreference resolution to cross-document settings and achieves competitive results for both entity and event coreference while providing strong evidence of the efficacy of both sequential models and higher-order inference in cross-document settings. Our model incrementally composes mentions into cluster representations and predicts links between a mention and the already constructed clusters, approximating a higher-order model. In addition, we conduct extensive ablation studies that provide new insights into the importance of various inputs and representation types in coreference.
Submitted 16 April, 2021;
originally announced April 2021.
-
On the Evolution of Syntactic Information Encoded by BERT's Contextualized Representations
Authors:
Laura Pérez-Mayos,
Roberto Carlini,
Miguel Ballesteros,
Leo Wanner
Abstract:
The adaptation of pretrained language models to solve supervised tasks has become a baseline in NLP, and many recent works have focused on studying how linguistic information is encoded in the pretrained sentence representations. Among other information, it has been shown that entire syntax trees are implicitly embedded in the geometry of such models. As these models are often fine-tuned, it becomes increasingly important to understand how the encoded knowledge evolves during fine-tuning. In this paper, we analyze the evolution of the embedded syntax trees during the fine-tuning of BERT for six different tasks, covering all levels of the linguistic structure. Experimental results show that, depending on the task, the encoded syntactic information is forgotten (PoS tagging), reinforced (dependency and constituency parsing) or preserved (semantics-related tasks) in different ways during fine-tuning.
Submitted 10 February, 2021; v1 submitted 27 January, 2021;
originally announced January 2021.
-
Event-Driven News Stream Clustering using Entity-Aware Contextual Embeddings
Authors:
Kailash Karthik Saravanakumar,
Miguel Ballesteros,
Muthu Kumar Chandrasekaran,
Kathleen McKeown
Abstract:
We propose a method for online news stream clustering that is a variant of the non-parametric streaming K-means algorithm. Our model uses a combination of sparse and dense document representations, aggregates document-cluster similarity along these multiple representations and makes the clustering decision using a neural classifier. The weighted document-cluster similarity model is learned using a novel adaptation of the triplet loss into a linear classification objective. We show that the use of a suitable fine-tuning objective and external knowledge in pre-trained transformer models yields significant improvements in the effectiveness of contextual embeddings for clustering. Our model achieves a new state-of-the-art on a standard stream clustering dataset of English documents.
Submitted 26 January, 2021;
originally announced January 2021.
-
To BERT or Not to BERT: Comparing Task-specific and Task-agnostic Semi-Supervised Approaches for Sequence Tagging
Authors:
Kasturi Bhattacharjee,
Miguel Ballesteros,
Rishita Anubhai,
Smaranda Muresan,
Jie Ma,
Faisal Ladhak,
Yaser Al-Onaizan
Abstract:
Leveraging large amounts of unlabeled data using Transformer-like architectures, like BERT, has gained popularity in recent times owing to their effectiveness in learning general representations that can then be further fine-tuned for downstream tasks to much success. However, training these models can be costly both from an economic and environmental standpoint. In this work, we investigate how to use unlabeled data effectively by exploring the task-specific semi-supervised approach, Cross-View Training (CVT), and comparing it with task-agnostic BERT in multiple settings that include domain and task relevant English data. CVT uses a much lighter model architecture and we show that it achieves similar performance to BERT on a set of sequence tagging tasks, with lower financial and environmental impact.
Submitted 27 October, 2020;
originally announced October 2020.
-
Linking Entities to Unseen Knowledge Bases with Arbitrary Schemas
Authors:
Yogarshi Vyas,
Miguel Ballesteros
Abstract:
In entity linking, mentions of named entities in raw text are disambiguated against a knowledge base (KB). This work focuses on linking to unseen KBs that do not have training data and whose schema is unknown during training. Our approach relies on methods to flexibly convert entities from arbitrary KBs with several attribute-value pairs into flat strings, which we use in conjunction with state-of-the-art models for zero-shot linking. To improve the generalization of our model, we use two regularization schemes based on shuffling of entity attributes and handling of unseen attributes. Experiments on English datasets, where models are trained on the CoNLL dataset and tested on the TAC-KBP 2010 dataset, show that our models outperform baseline models by over 12 points of accuracy. Unlike prior work, our approach also allows for seamlessly combining multiple training datasets. We test this ability by adding both a completely different dataset (Wikia) and increasing amounts of training data from the TAC-KBP 2010 training set. Our models perform favorably across the board.
Submitted 21 October, 2020;
originally announced October 2020.
-
Transition-based Parsing with Stack-Transformers
Authors:
Ramon Fernandez Astudillo,
Miguel Ballesteros,
Tahira Naseem,
Austin Blodgett,
Radu Florian
Abstract:
Modeling the parser state is key to good performance in transition-based parsing. Recurrent Neural Networks considerably improved the performance of transition-based systems by modelling the global state, e.g. stack-LSTM parsers, or local state modeling of contextualized features, e.g. Bi-LSTM parsers. Given the success of Transformer architectures in recent parsing systems, this work explores modifications of the sequence-to-sequence Transformer architecture to model either global or local parser states in transition-based parsing. We show that modifications of the cross attention mechanism of the Transformer considerably strengthen performance both on dependency and Abstract Meaning Representation (AMR) parsing tasks, particularly for smaller models or limited training data.
Submitted 20 October, 2020;
originally announced October 2020.
-
Structural Supervision Improves Few-Shot Learning and Syntactic Generalization in Neural Language Models
Authors:
Ethan Wilcox,
Peng Qian,
Richard Futrell,
Ryosuke Kohita,
Roger Levy,
Miguel Ballesteros
Abstract:
Humans can learn structural properties about a word from minimal experience, and deploy their learned syntactic representations uniformly in different grammatical contexts. We assess the ability of modern neural language models to reproduce this behavior in English and evaluate the effect of structural supervision on learning outcomes. First, we assess few-shot learning capabilities by developing controlled experiments that probe models' syntactic nominal number and verbal argument structure generalizations for tokens seen as few as two times during training. Second, we assess invariance properties of learned representation: the ability of a model to transfer syntactic generalizations from a base context (e.g., a simple declarative active-voice sentence) to a transformed context (e.g., an interrogative sentence). We test four models trained on the same dataset: an n-gram baseline, an LSTM, and two LSTM-variants trained with explicit structural supervision (Dyer et al., 2016; Charniak et al., 2016). We find that in most cases, the neural models are able to induce the proper syntactic generalizations after minimal exposure, often from just two examples during training, and that the two structurally supervised models generalize more accurately than the LSTM model. All neural models are able to leverage information learned in base contexts to drive expectations in transformed contexts, indicating that they have learned some invariance properties of syntax.
Submitted 12 October, 2020;
originally announced October 2020.
-
Resource-Enhanced Neural Model for Event Argument Extraction
Authors:
Jie Ma,
Shuai Wang,
Rishita Anubhai,
Miguel Ballesteros,
Yaser Al-Onaizan
Abstract:
Event argument extraction (EAE) aims to identify the arguments of an event and classify the roles that those arguments play. Despite great efforts made in prior work, there remain many challenges: (1) Data scarcity. (2) Capturing the long-range dependency, specifically, the connection between an event trigger and a distant event argument. (3) Integrating event trigger information into candidate argument representation. For (1), we explore using unlabeled data in different ways. For (2), we propose to use a syntax-attending Transformer that can utilize dependency parses to guide the attention mechanism. For (3), we propose a trigger-aware sequence encoder with several types of trigger-dependent sequence representations. We also support argument extraction either from text annotated with gold entities or from plain text. Experiments on the English ACE2005 benchmark show that our approach achieves a new state-of-the-art.
Submitted 6 October, 2020;
originally announced October 2020.
-
Band edge limit of the scattering matrix for quasi-one-dimensional discrete Schrödinger operators
Authors:
Miguel Ballesteros,
Gerardo Franco Córdova,
Guillermo Garro,
Hermann Schulz-Baldes
Abstract:
This paper is about the scattering theory for one-dimensional matrix Schrödinger operators with a matrix potential having a finite first moment. The transmission coefficients are analytically continued and extended to the band edges. An explicit expression is given for these extensions. The limits of the reflection coefficients at the band edges are also calculated.
Submitted 28 March, 2022; v1 submitted 5 August, 2020;
originally announced August 2020.
-
The appearance of particle tracks in detectors
Authors:
Miguel Ballesteros,
Tristan Benoist,
Martin Fraas,
Jürg Fröhlich
Abstract:
The phenomenon that a quantum particle propagating in a detector, such as a Wilson cloud chamber, leaves a track close to a classical trajectory is analyzed. We introduce an idealized quantum-mechanical model of a charged particle that is periodically illuminated by pulses of laser light resulting in repeated indirect measurements of the approximate position of the particle. For this model we present a mathematically rigorous analysis of the appearance of particle tracks, assuming that the Hamiltonian of the particle is quadratic in the position- and momentum operators, as for a freely moving particle or a harmonic oscillator.
Submitted 1 July, 2020;
originally announced July 2020.
-
Analyticity properties of the scattering matrix for matrix Schrödinger operators on the discrete line
Authors:
Miguel Ballesteros,
Gerardo Franco Córdova,
Hermann Schulz-Baldes
Abstract:
Explicit formulas for the analytic extensions of the scattering matrix and the time delay of a quasi-one-dimensional discrete Schrödinger operator with a potential of finite support are derived. This includes a careful analysis of the band edge singularities and allows us to prove a Levinson-type theorem. The main algebraic tools are the plane wave transfer matrices.
Submitted 22 January, 2021; v1 submitted 27 April, 2020;
originally announced April 2020.
-
Severing the Edge Between Before and After: Neural Architectures for Temporal Ordering of Events
Authors:
Miguel Ballesteros,
Rishita Anubhai,
Shuai Wang,
Nima Pourdamghani,
Yogarshi Vyas,
Jie Ma,
Parminder Bhatia,
Kathleen McKeown,
Yaser Al-Onaizan
Abstract:
In this paper, we propose a neural architecture and a set of training methods for ordering events by predicting temporal relations. Our proposed models receive a pair of events within a span of text as input and they identify temporal relations (Before, After, Equal, Vague) between them. Given that a key challenge with this task is the scarcity of annotated data, our models rely on pretrained representations (i.e., RoBERTa, BERT or ELMo), transfer and multi-task learning (by leveraging complementary datasets), and self-training techniques. Experiments on the MATRES dataset of English documents establish a new state-of-the-art on this task.
Submitted 8 April, 2020;
originally announced April 2020.
-
Transition-Based Dependency Parsing using Perceptron Learner
Authors:
Rahul Radhakrishnan Iyer,
Miguel Ballesteros,
Chris Dyer,
Robert Frederking
Abstract:
Syntactic parsing using dependency structures has become a standard technique in natural language processing with many different parsing models, in particular data-driven models that can be trained on syntactically annotated corpora. In this paper, we tackle transition-based dependency parsing using a Perceptron Learner. Our proposed model, which adds more relevant features to the Perceptron Learner, outperforms a baseline arc-standard parser. We beat the UAS of the MALT and LSTM parsers. We also give possible ways to address parsing of non-projective trees.
Submitted 28 January, 2020; v1 submitted 22 January, 2020;
originally announced January 2020.
-
One-Boson Scattering Processes in the Massless Spin-Boson Model -- A Non-Perturbative Formula
Authors:
Miguel Ballesteros,
Dirk-André Deckert,
Felix Hänle
Abstract:
In scattering experiments, physicists observe so-called resonances as peaks at certain energy values in the measured scattering cross sections per solid angle. These peaks are usually associated with certain scattering processes, e.g., emission, absorption, or excitation of certain particles and systems. On the other hand, mathematicians define resonances as poles of an analytic continuation of the resolvent operator through complex dilations. A major challenge is to relate these scattering and resonance theoretical notions, e.g., to prove that the poles of the resolvent operator induce the above-mentioned peaks in the scattering matrix. In the case of quantum mechanics, this problem was addressed in numerous works that culminated in Simon's seminal paper [33] in which a general solution was presented for a large class of pair potentials. However, in quantum field theory the analogous problem has been open for several decades despite the fact that scattering and resonance theories have been well-developed for many models. In certain regimes these models describe very fundamental phenomena, such as emission and absorption of photons by atoms, from which quantum mechanics originated. In this work we present a first non-perturbative formula that relates the scattering matrix to the resolvent operator in the massless Spin-Boson model. This result can be seen as major progress compared to our previous works [13] and [12] in which we only managed to derive a perturbative formula.
Submitted 15 May, 2020; v1 submitted 5 July, 2019;
originally announced July 2019.
-
Rewarding Smatch: Transition-Based AMR Parsing with Reinforcement Learning
Authors:
Tahira Naseem,
Abhishek Shah,
Hui Wan,
Radu Florian,
Salim Roukos,
Miguel Ballesteros
Abstract:
Our work involves enriching the Stack-LSTM transition-based AMR parser (Ballesteros and Al-Onaizan, 2017) by augmenting training with Policy Learning and rewarding the Smatch score of sampled graphs. In addition, we combined several AMR-to-text alignments with an attention mechanism and we supplemented the parser with pre-processed concept identification, named entities and contextualized embeddings. We achieve a highly competitive performance that is comparable to the best published results. We show an in-depth study ablating each of the new components of the parser.
Submitted 30 May, 2019;
originally announced May 2019.
-
Neural Language Models as Psycholinguistic Subjects: Representations of Syntactic State
Authors:
Richard Futrell,
Ethan Wilcox,
Takashi Morita,
Peng Qian,
Miguel Ballesteros,
Roger Levy
Abstract:
We deploy the methods of controlled psycholinguistic experimentation to shed light on the extent to which the behavior of neural network language models reflects incremental representations of syntactic state. To do so, we examine model behavior on artificial sentences containing a variety of syntactically complex structures. We test four models: two publicly available LSTM sequence models of English (Jozefowicz et al., 2016; Gulordava et al., 2018) trained on large datasets; an RNNG (Dyer et al., 2016) trained on a small, parsed dataset; and an LSTM trained on the same small corpus as the RNNG. We find evidence that the LSTMs trained on large datasets represent syntactic state over large spans of text in a way that is comparable to the RNNG, while the LSTM trained on the small dataset does not or does so only weakly.
Submitted 7 March, 2019;
originally announced March 2019.
-
Structural Supervision Improves Learning of Non-Local Grammatical Dependencies
Authors:
Ethan Wilcox,
Peng Qian,
Richard Futrell,
Miguel Ballesteros,
Roger Levy
Abstract:
State-of-the-art LSTM language models trained on large corpora learn sequential contingencies in impressive detail and have been shown to acquire a number of non-local grammatical dependencies with some success. Here we investigate whether supervision with hierarchical structure enhances learning of a range of grammatical dependencies, a question that has previously been addressed only for subject-verb agreement. Using controlled experimental methods from psycholinguistics, we compare the performance of word-based LSTM models versus two models that represent hierarchical structure and deploy it in left-to-right processing: Recurrent Neural Network Grammars (RNNGs) (Dyer et al., 2016) and an incrementalized version of the Parsing-as-Language-Modeling configuration from Charniak et al. (2016). Models are tested on a diverse range of configurations for two classes of non-local grammatical dependencies in English: Negative Polarity licensing and Filler-Gap Dependencies. Using the same training data across models, we find that structurally-supervised models outperform the LSTM, with the RNNG demonstrating best results on both types of grammatical dependencies and even learning many of the Island Constraints on the filler-gap dependency. Structural supervision thus provides data efficiency advantages over purely string-based training of neural language models in acquiring human-like generalizations about non-local grammatical dependencies.
Submitted 6 April, 2019; v1 submitted 3 March, 2019;
originally announced March 2019.
-
Recursive Subtree Composition in LSTM-Based Dependency Parsing
Authors:
Miryam de Lhoneux,
Miguel Ballesteros,
Joakim Nivre
Abstract:
The need for tree structure modelling on top of sequence modelling is an open issue in neural dependency parsing. We investigate the impact of adding a tree layer on top of a sequential model by recursively composing subtree representations (composition) in a transition-based parser that uses features extracted by a BiLSTM. Composition seems superfluous with such a model, suggesting that BiLSTMs capture information about subtrees. We perform model ablations to tease out the conditions under which composition helps. When ablating the backward LSTM, performance drops and composition does not recover much of the gap. When ablating the forward LSTM, performance drops less dramatically and composition recovers a substantial part of the gap, indicating that a forward LSTM and composition capture similar information. We take the backward LSTM to be related to lookahead features and the forward LSTM to the rich history-based features, both crucial for transition-based parsers. To capture history-based information, composition is better than a forward LSTM on its own, but it is even better to have a forward LSTM as part of a BiLSTM. We correlate results with language properties, showing that the improved lookahead of a backward LSTM is especially important for head-final languages.
Submitted 26 February, 2019;
originally announced February 2019.
-
Conditionally Free Reduced Products of Hilbert Spaces
Authors:
Octavio Arizmendi,
Miguel Ballesteros,
Francisco Torres-Ayala
Abstract:
We present a product of pairs of pointed Hilbert spaces that, in the context of Bożejko, Leinert and Speicher's theory of conditionally free probability, plays the role of the reduced free product of pointed Hilbert spaces, and thus gives a unified construction for the natural notions of independence defined by Muraki.
We additionally provide important applications of this construction. We prove that, assuming minor restrictions, for any pair of conditionally free algebras there are copies of them that are conditionally free and also free, a property that is frequently assumed (as hypothesis) to prove several results in the literature. Finally, we give a short proof of the linearization property of the $^cR$-transform (the analog of Voiculescu's $R$-transform in the context of conditionally free probability).
Submitted 7 February, 2019;
originally announced February 2019.
-
One-Boson Scattering Processes in the massive Spin-Boson Model
Authors:
Miguel Ballesteros,
Dirk-André Deckert,
Jérémy Faupin,
Felix Hänle
Abstract:
The Spin-Boson model describes a two-level quantum system that interacts with a second-quantized boson scalar field. Recently the relation between the integral kernel of the scattering matrix and the resonance in this model has been established in [14] for the case of massless bosons. In the present work, we treat the massive case. On the one hand, one might rightfully expect that the massive case is easier to handle since, in contrast to the massless case, the corresponding Hamiltonian features a spectral gap. On the other hand, it turns out that the non-zero boson mass introduces a new complication as the spectrum of the complex dilated, free Hamiltonian exhibits lines of spectrum attached to every multiple of the boson rest mass energy starting from the ground and excited state energies. This leads to an absence of decay of the corresponding complex dilated resolvent close to the real line, which, in [14], was a crucial ingredient to control the time evolution in the scattering regime. With the new strategy presented here, we provide a proof of an analogous formula for the scattering kernel as compared to the massless case and use the opportunity to provide the required spectral information by a Mourre theory argument combined with a suitable application of the Feshbach-Schur map instead of complex dilation.
Submitted 10 May, 2019; v1 submitted 22 October, 2018;
originally announced October 2018.
-
Multilingual Neural Machine Translation with Task-Specific Attention
Authors:
Graeme Blackwood,
Miguel Ballesteros,
Todd Ward
Abstract:
Multilingual machine translation addresses the task of translating between multiple source and target languages. We propose task-specific attention models, a simple but effective technique for improving the quality of sequence-to-sequence neural multilingual translation. Our approach seeks to retain as much of the parameter sharing generalization of NMT models as possible, while still allowing for language-specific specialization of the attention model to a particular language-pair or task. Our experiments on four languages of the Europarl corpus show that using a target-specific model of attention provides consistent gains in translation quality for all possible translation directions, compared to a model in which all parameters are shared. We observe improved translation quality even in the (extreme) low-resource zero-shot translation directions for which the model never saw explicitly paired parallel data.
Submitted 8 June, 2018;
originally announced June 2018.
-
Scheduled Multi-Task Learning: From Syntax to Translation
Authors:
Eliyahu Kiperwasser,
Miguel Ballesteros
Abstract:
Neural encoder-decoder models of machine translation have achieved impressive results, while learning linguistic knowledge of both the source and target languages in an implicit end-to-end manner. We propose a framework in which our model begins learning syntax and translation interleaved, gradually putting more focus on translation. Using this approach, we achieve considerable improvements in ter…
▽ More
Neural encoder-decoder models of machine translation have achieved impressive results, while learning linguistic knowledge of both the source and target languages in an implicit end-to-end manner. We propose a framework in which our model begins learning syntax and translation interleaved, gradually putting more focus on translation. Using this approach, we achieve considerable improvements in terms of BLEU score on relatively large parallel corpus (WMT14 English to German) and a low-resource (WIT German to English) setup.
Submitted 24 April, 2018;
originally announced April 2018.
-
Pieces of Eight: 8-bit Neural Machine Translation
Authors:
Jerry Quinn,
Miguel Ballesteros
Abstract:
Neural machine translation has achieved levels of fluency and adequacy that would have been surprising a short time ago. Output quality is extremely relevant for industry purposes; however, it is equally important to produce results in the shortest time possible, mainly for latency-sensitive applications and to control cloud hosting costs. In this paper we show the effectiveness of translating with 8-bit quantization for models that have been trained using 32-bit floating point values. Results show that 8-bit translation makes a non-negligible impact in terms of speed with no degradation in accuracy and adequacy.
Submitted 13 April, 2018;
originally announced April 2018.
-
Multimodal Emoji Prediction
Authors:
Francesco Barbieri,
Miguel Ballesteros,
Francesco Ronzano,
Horacio Saggion
Abstract:
Emojis are small images that are commonly included in social media text messages. The combination of visual and textual content in the same message builds up a modern way of communication that automatic systems are not used to dealing with. In this paper we extend recent advances in emoji prediction by putting forward a multimodal approach that is able to predict emojis in Instagram posts. Instagram posts are composed of pictures together with texts which sometimes include emojis. We show that these emojis can be predicted by using the text, but also using the picture. Our main finding is that incorporating the two synergistic modalities, in a combined model, improves accuracy in an emoji prediction task. This result demonstrates that these two modalities (text and images) encode different information on the use of emojis and therefore can complement each other.
Submitted 17 April, 2018; v1 submitted 6 March, 2018;
originally announced March 2018.
-
Relation between the Resonance and the Scattering Matrix in the massless Spin-Boson Model
Authors:
Miguel Ballesteros,
Dirk-André Deckert,
Felix Hänle
Abstract:
We establish the precise relation between the integral kernel of the scattering matrix and the resonance in the massless Spin-Boson model which describes the interaction of a two-level quantum system with a second-quantized scalar field. For this purpose, we derive an explicit formula for the two-body scattering matrix. We impose an ultraviolet cut-off and assume a slightly less singular behavior of the boson form factor of the relativistic scalar field but no infrared cut-off. The purpose of this work is to bring together scattering and resonance theory and arrive at a similar result as provided by Simon in [38], where it was shown that the singularities of the meromorphic continuation of the integral kernel of the scattering matrix are located precisely at the resonance energies. The corresponding problem has been open in quantum field theory ever since. To the best of our knowledge, the presented formula provides the first rigorous connection between resonance and scattering theory in the sense of [38] in a model of quantum field theory.
Submitted 23 February, 2019; v1 submitted 15 January, 2018;
originally announced January 2018.
-
Analyticity of Resonances and Eigenvalues and Spectral Properties of the massless Spin-Boson Model
Authors:
Miguel Ballesteros,
Dirk-André Deckert,
Felix Hänle
Abstract:
We extend the method of multiscale analysis for resonances introduced in [5] in order to infer analytic properties of resonances and eigenvalues (and their eigenprojections) as well as estimates for the localization of the spectrum of dilated Hamiltonians and norm-bounds for the corresponding resolvent operators, in neighborhoods of resonances and eigenvalues. We apply our method to the massless Spin-Boson model assuming a slight infrared regularization. We prove that the resonance and the ground-state eigenvalue (and their eigenprojections) are analytic with respect to the dilation parameter and the coupling constant. Moreover, we prove that the spectrum of the dilated Spin-Boson Hamiltonian in the neighborhood of the resonance and the ground-state eigenvalue is localized in two cones in the complex plane with vertices at the location of the resonance and the ground-state eigenvalue, respectively. Additionally, we provide norm-estimates for the resolvent of the dilated Spin-Boson Hamiltonian near the resonance and the ground-state eigenvalue. The topic of analyticity of eigenvalues and resonances has led to several studies and advances in the past. However, to the best of our knowledge, this is the first time that it is addressed from the perspective of multiscale analysis. Once the multiscale analysis is set up, our method gives easy access to analyticity: essentially, it amounts to proving it for isolated eigenvalues only and using the fact that uniform limits of analytic functions are analytic. The type of spectral and resolvent estimates that we prove are needed to control the time evolution including the scattering regime. The latter will be demonstrated in a forthcoming publication. The introduced multiscale method to study spectral and resolvent estimates follows its own inductive scheme and is independent (and different) from the method we apply to construct resonances.
Submitted 17 February, 2019; v1 submitted 11 January, 2018;
originally announced January 2018.
-
Perturbation Theory for Weak Measurements in Quantum Mechanics, I -- Systems with Finite-Dimensional State Space
Authors:
M. Ballesteros,
N. Crawford,
M. Fraas,
J. Fröhlich,
B. Schubnel
Abstract:
The quantum theory of indirect measurements in physical systems is studied. The example of an indirect measurement of an observable represented by a self-adjoint operator $\mathcal{N}$ with finite spectrum is analysed in detail. The Hamiltonian generating the time evolution of the system in the absence of direct measurements is assumed to be given by the sum of a term commuting with $\mathcal{N}$ and a small perturbation not commuting with $\mathcal{N}$. The system is subject to repeated direct (projective) measurements using a single instrument whose action on the state of the system commutes with $\mathcal{N}$. If the Hamiltonian commutes with the observable $\mathcal{N}$ (i.e., if the perturbation vanishes) the state of the system approaches an eigenstate of $\mathcal{N}$, as the number of direct measurements tends to $\infty$. If the perturbation term in the Hamiltonian does \textit{not} commute with $\mathcal{N}$ the system exhibits "jumps" between different eigenstates of $\mathcal{N}$. We determine the rate of these jumps to leading order in the strength of the perturbation and show that if time is re-scaled appropriately a maximum likelihood estimate of $\mathcal{N}$ approaches a Markovian jump process on the spectrum of $\mathcal{N}$, as the strength of the perturbation tends to $0$.
Submitted 10 September, 2017;
originally announced September 2017.
-
Arc-Standard Spinal Parsing with Stack-LSTMs
Authors:
Miguel Ballesteros,
Xavier Carreras
Abstract:
We present a neural transition-based parser for spinal trees, a dependency representation of constituent trees. The parser uses Stack-LSTMs that compose constituent nodes with dependency-based derivations. In experiments, we show that this model adapts to different styles of dependency relations, but this choice has little effect for predicting constituent structure, suggesting that LSTMs induce useful states by themselves.
Submitted 1 September, 2017;
originally announced September 2017.
-
AMR Parsing using Stack-LSTMs
Authors:
Miguel Ballesteros,
Yaser Al-Onaizan
Abstract:
We present a transition-based AMR parser that directly generates AMR parses from plain text. We use Stack-LSTMs to represent our parser state and make decisions greedily. In our experiments, we show that our parser achieves very competitive scores on English using only AMR training data. Adding additional information, such as POS tags and dependency trees, improves the results further.
Submitted 2 August, 2017; v1 submitted 24 July, 2017;
originally announced July 2017.
-
Non-demolition measurements of observables with general spectra
Authors:
M. Ballesteros,
N. Crawford,
M. Fraas,
J. Fröhlich,
B. Schubnel
Abstract:
It has recently been established that, in a non-demolition measurement of an observable $\mathcal{N}$ with a finite point spectrum, the density matrix of the system approaches an eigenstate of $\mathcal{N}$, i.e., it "purifies" over the spectrum of $\mathcal{N}$. We extend this result to observables with general spectra. It is shown that the spectral density of the state of the system converges to a delta function exponentially fast, in an appropriate sense. Furthermore, for observables with absolutely continuous spectra, we show that the spectral density approaches a Gaussian distribution over the spectrum of $\mathcal{N}$. Our methods highlight the connection between the theory of non-demolition measurements and classical estimation theory.
Submitted 29 June, 2017;
originally announced June 2017.
-
Are Emojis Predictable?
Authors:
Francesco Barbieri,
Miguel Ballesteros,
Horacio Saggion
Abstract:
Emojis are ideograms which are naturally combined with plain text to visually complement or condense the meaning of a message. Despite being widely used in social media, their underlying semantics have received little attention from a Natural Language Processing standpoint. In this paper, we investigate the relation between words and emojis, studying the novel task of predicting which emojis are evoked by text-based tweet messages. We train several models based on Long Short-Term Memory networks (LSTMs) in this task. Our experimental results show that our neural model outperforms two baselines as well as humans solving the same task, suggesting that computational models are able to better capture the underlying semantics of emojis.
Submitted 24 February, 2017; v1 submitted 23 February, 2017;
originally announced February 2017.
-
DyNet: The Dynamic Neural Network Toolkit
Authors:
Graham Neubig,
Chris Dyer,
Yoav Goldberg,
Austin Matthews,
Waleed Ammar,
Antonios Anastasopoulos,
Miguel Ballesteros,
David Chiang,
Daniel Clothiaux,
Trevor Cohn,
Kevin Duh,
Manaal Faruqui,
Cynthia Gan,
Dan Garrette,
Yangfeng Ji,
Lingpeng Kong,
Adhiguna Kuncoro,
Gaurav Kumar,
Chaitanya Malaviya,
Paul Michel,
Yusuke Oda,
Matthew Richardson,
Naomi Saphra,
Swabha Swayamdipta,
Pengcheng Yin
Abstract:
We describe DyNet, a toolkit for implementing neural network models based on dynamic declaration of network structure. In the static declaration strategy that is used in toolkits like Theano, CNTK, and TensorFlow, the user first defines a computation graph (a symbolic representation of the computation), and then examples are fed into an engine that executes this computation and computes its derivatives. In DyNet's dynamic declaration strategy, computation graph construction is mostly transparent, being implicitly constructed by executing procedural code that computes the network outputs, and the user is free to use different network structures for each input. Dynamic declaration thus facilitates the implementation of more complicated network architectures, and DyNet is specifically designed to allow users to implement their models in a way that is idiomatic in their preferred programming language (C++ or Python). One challenge with dynamic declaration is that because the symbolic computation graph is defined anew for every training example, its construction must have low overhead. To achieve this, DyNet has an optimized C++ backend and lightweight graph representation. Experiments show that DyNet's speeds are faster than or comparable with static declaration toolkits, and significantly faster than Chainer, another dynamic declaration toolkit. DyNet is released open-source under the Apache 2.0 license and available at http://github.com/clab/dynet.
Submitted 14 January, 2017;
originally announced January 2017.