Search | arXiv e-print repository

General Purpose Verification for Chain of Thought Prompting

Authors: Robert Vacareanu, Anurag Pratik, Evangelia Spiliopoulou, Zheng Qi, Giovanni Paolini, Neha Anna John, Jie Ma, Yassine Benajiba, Miguel Ballesteros

Abstract: Many of the recent capabilities demonstrated by Large Language Models (LLMs) arise primarily from their ability to exploit contextual information. In this paper, we explore ways to improve reasoning capabilities of LLMs through (1) exploration of different chains of thought and (2) validation of the individual steps of the reasoning process. We propose three general principles that a model should adhere to while reasoning: (i) Relevance, (ii) Mathematical Accuracy, and (iii) Logical Consistency. We apply these constraints to the reasoning steps generated by the LLM to improve the accuracy of the final generation. The constraints are applied in the form of verifiers: the model itself is asked to verify if the generated steps satisfy each constraint. To further steer the generations towards high-quality solutions, we use the perplexity of the reasoning steps as an additional verifier. We evaluate our method on 4 distinct types of reasoning tasks, spanning a total of 9 different datasets. Experiments show that our method is always better than vanilla generation, and, in 6 out of the 9 datasets, it is better than best-of N sampling which samples N reasoning chains and picks the lowest perplexity generation.

Submitted 30 April, 2024; originally announced May 2024.

Comments: 22 pages, preprint

arXiv:2402.18479 [pdf, other]

NewsQs: Multi-Source Question Generation for the Inquiring Mind

Authors: Alyssa Hwang, Kalpit Dixit, Miguel Ballesteros, Yassine Benajiba, Vittorio Castelli, Markus Dreyer, Mohit Bansal, Kathleen McKeown

Abstract: We present NewsQs (news-cues), a dataset that provides question-answer pairs for multiple news documents. To create NewsQs, we augment a traditional multi-document summarization dataset with questions automatically generated by a T5-Large model fine-tuned on FAQ-style news articles from the News On the Web corpus. We show that fine-tuning a model with control codes produces questions that are judged acceptable more often than the same model without them as measured through human evaluation. We use a QNLI model with high correlation with human annotations to filter our data. We release our final dataset of high-quality questions, answers, and document clusters as a resource for future work in query-based multi-document summarization.

Submitted 15 June, 2024; v1 submitted 28 February, 2024; originally announced February 2024.

Comments: minor wording change

arXiv:2311.18112 [pdf, ps, other]

Detection of an Arbitrary Number of Communities in a Block Spin Ising Model

Authors: Miguel Ballesteros, Ramsés H. Mena, José Luis Pérez, Gabor Toth

Abstract: We study the problem of community detection in a general version of the block spin Ising model featuring M groups, a model inspired by the Curie-Weiss model of ferromagnetism in statistical mechanics. We solve the general problem of identifying any number of groups with any possible coupling constants. Up to now, the problem was only solved for the specific situation with two groups of identical size and identical interactions. Our results can be applied to the most realistic situations, in which there are many groups of different sizes and different interactions. In addition, we give an explicit algorithm that permits the reconstruction of the structure of the model from a sample of observations based on the comparison of empirical correlations of the spin variables, thus unveiling easy applications of the model to real-world voting data and communities in biology.

Submitted 29 November, 2023; originally announced November 2023.

Comments: 31 pages

MSC Class: 62H22; 82B20; 60F05

arXiv:2305.17127 [pdf, other]

Characterizing and Measuring Linguistic Dataset Drift

Authors: Tyler A. Chang, Kishaloy Halder, Neha Anna John, Yogarshi Vyas, Yassine Benajiba, Miguel Ballesteros, Dan Roth

Abstract: NLP models often degrade in performance when real world data distributions differ markedly from training data. However, existing dataset drift metrics in NLP have generally not considered specific dimensions of linguistic drift that affect model performance, and they have not been validated in their ability to predict model performance at the individual example level, where such metrics are often used in practice. In this paper, we propose three dimensions of linguistic dataset drift: vocabulary, structural, and semantic drift. These dimensions correspond to content word frequency divergences, syntactic divergences, and meaning changes not captured by word frequencies (e.g. lexical semantic change). We propose interpretable metrics for all three drift dimensions, and we modify past performance prediction methods to predict model performance at both the example and dataset level for English sentiment classification and natural language inference. We find that our drift metrics are more effective than previous metrics at predicting out-of-domain model accuracies (mean 16.8% root mean square error decrease), particularly when compared to popular fine-tuned embedding distances (mean 47.7% error decrease). Fine-tuned embedding distances are much more effective at ranking individual examples by expected performance, but decomposing into vocabulary, structural, and semantic drift produces the best example rankings of all considered model-agnostic drift metrics (mean 6.7% ROC AUC increase).

Submitted 26 May, 2023; originally announced May 2023.

Comments: Accepted to ACL 2023

arXiv:2305.13191 [pdf, other]

Taxonomy Expansion for Named Entity Recognition

Authors: Karthikeyan K, Yogarshi Vyas, Jie Ma, Giovanni Paolini, Neha Anna John, Shuai Wang, Yassine Benajiba, Vittorio Castelli, Dan Roth, Miguel Ballesteros

Abstract: Training a Named Entity Recognition (NER) model often involves fixing a taxonomy of entity types. However, requirements evolve and we might need the NER model to recognize additional entity types. A simple approach is to re-annotate the entire dataset with both existing and additional entity types and then train the model on the re-annotated dataset. However, this is an extremely laborious task. To remedy this, we propose a novel approach called Partial Label Model (PLM) that uses only partially annotated datasets. We experiment with 6 diverse datasets and show that PLM consistently performs better than most other approaches (0.5 - 2.5 F1), including in novel settings for taxonomy expansion not considered in prior work. The gap between PLM and all other approaches is especially large in settings where there is limited data available for the additional entity types (as much as 11 F1), thus suggesting a more cost-effective approach to taxonomy expansion.

Submitted 22 May, 2023; originally announced May 2023.

arXiv:2305.11979 [pdf, other]

A Weak Supervision Approach for Few-Shot Aspect Based Sentiment

Authors: Robert Vacareanu, Siddharth Varia, Kishaloy Halder, Shuai Wang, Giovanni Paolini, Neha Anna John, Miguel Ballesteros, Smaranda Muresan

Abstract: We explore how weak supervision on abundant unlabeled data can be leveraged to improve few-shot performance in aspect-based sentiment analysis (ABSA) tasks. We propose a pipeline approach to construct a noisy ABSA dataset, and we use it to adapt a pre-trained sequence-to-sequence model to the ABSA tasks. We test the resulting model on three widely used ABSA datasets, before and after fine-tuning. Our proposed method preserves the full fine-tuning performance while showing significant improvements (15.84% absolute F1) in the few-shot learning scenario for the harder tasks. In zero-shot (i.e., without fine-tuning), our method outperforms the previous state of the art on the aspect extraction sentiment classification (AESC) task and is, additionally, capable of performing the harder aspect sentiment triplet extraction (ASTE) task.

Submitted 19 May, 2023; originally announced May 2023.

arXiv:2305.11242 [pdf, other]

Comparing Biases and the Impact of Multilingual Training across Multiple Languages

Authors: Sharon Levy, Neha Anna John, Ling Liu, Yogarshi Vyas, Jie Ma, Yoshinari Fujinuma, Miguel Ballesteros, Vittorio Castelli, Dan Roth

Abstract: Studies in bias and fairness in natural language processing have primarily examined social biases within a single language and/or across few attributes (e.g. gender, race). However, biases can manifest differently across various languages for individual attributes. As a result, it is critical to examine biases within each language and attribute. Of equal importance is to study how these biases compare across languages and how the biases are affected when training a model on multilingual data versus monolingual data. We present a bias analysis across Italian, Chinese, English, Hebrew, and Spanish on the downstream sentiment analysis task to observe whether specific demographics are viewed more positively. We study bias similarities and differences across these languages and investigate the impact of multilingual vs. monolingual training data. We adapt existing sentiment bias templates in English to Italian, Chinese, Hebrew, and Spanish for four attributes: race, religion, nationality, and gender. Our results reveal similarities in bias expression such as favoritism of groups that are dominant in each language's culture (e.g. majority religions and nationalities). Additionally, we find an increased variation in predictions across protected groups, indicating bias amplification, after multilingual finetuning in comparison to multilingual pretraining.

Submitted 18 May, 2023; originally announced May 2023.

arXiv:2303.11660 [pdf, other]

Simple Yet Effective Synthetic Dataset Construction for Unsupervised Opinion Summarization

Authors: Ming Shen, Jie Ma, Shuai Wang, Yogarshi Vyas, Kalpit Dixit, Miguel Ballesteros, Yassine Benajiba

Abstract: Opinion summarization provides an important solution for summarizing opinions expressed among a large number of reviews. However, generating aspect-specific and general summaries is challenging due to the lack of annotated data. In this work, we propose two simple yet effective unsupervised approaches to generate both aspect-specific and general opinion summaries by training on synthetic datasets constructed with aspect-related review contents. Our first approach, Seed Words Based Leave-One-Out (SW-LOO), identifies aspect-related portions of reviews simply by exact-matching aspect seed words and outperforms existing methods by 3.4 ROUGE-L points on SPACE and 0.5 ROUGE-1 point on OPOSUM+ for aspect-specific opinion summarization. Our second approach, Natural Language Inference Based Leave-One-Out (NLI-LOO) identifies aspect-related sentences utilizing an NLI model in a more general setting without using seed words and outperforms existing approaches by 1.2 ROUGE-L points on SPACE for aspect-specific opinion summarization and remains competitive on other metrics.

Submitted 21 March, 2023; originally announced March 2023.

Comments: EACL 2023 Findings

arXiv:2302.12297 [pdf, other]

Dynamic Benchmarking of Masked Language Models on Temporal Concept Drift with Multiple Views

Authors: Katerina Margatina, Shuai Wang, Yogarshi Vyas, Neha Anna John, Yassine Benajiba, Miguel Ballesteros

Abstract: Temporal concept drift refers to the problem of data changing over time. In NLP, that would entail that language (e.g. new expressions, meaning shifts) and factual knowledge (e.g. new concepts, updated facts) evolve over time. Focusing on the latter, we benchmark 11 pretrained masked language models (MLMs) on a series of tests designed to evaluate the effect of temporal concept drift, as it is crucial that widely used language models remain up-to-date with the ever-evolving factual updates of the real world. Specifically, we provide a holistic framework that (1) dynamically creates temporal test sets of any time granularity (e.g. month, quarter, year) of factual data from Wikidata, (2) constructs fine-grained splits of tests (e.g. updated, new, unchanged facts) to ensure comprehensive analysis, and (3) evaluates MLMs in three distinct ways (single-token probing, multi-token generation, MLM scoring). In contrast to prior work, our framework aims to unveil how robust an MLM is over time and thus to provide a signal in case it has become outdated, by leveraging multiple views of evaluation.

Submitted 23 February, 2023; originally announced February 2023.

Comments: To appear at EACL 2023. Our code will be available at https://github.com/amazon-science/temporal-robustness

arXiv:2211.05021 [pdf, ps, other]

Levinson theorem for discrete Schrödinger operators on the line with matrix potentials having a first moment

Authors: Miguel Ballesteros, Gerardo Franco Córdova, Ivan Naumkin, Hermann Schulz-Baldes

Abstract: This paper proves new results on spectral and scattering theory for matrix-valued Schrödinger operators on the discrete line with non-compactly supported perturbations whose first moments are assumed to exist. In particular, a Levinson theorem is proved, in which a relation between scattering data and spectral properties (bound and half bound states) of the corresponding Hamiltonians is derived. The proof is based on stationary scattering theory with prominent use of Jost solutions at complex energies that are controlled by Volterra-type integral equations.

Submitted 9 November, 2022; originally announced November 2022.

arXiv:2211.04903 [pdf, other]

Novel Chapter Abstractive Summarization using Spinal Tree Aware Sub-Sentential Content Selection

Authors: Hardy Hardy, Miguel Ballesteros, Faisal Ladhak, Muhammad Khalifa, Vittorio Castelli, Kathleen McKeown

Abstract: Summarizing novel chapters is a difficult task due to the input length and the fact that sentences that appear in the desired summaries draw content from multiple places throughout the chapter. We present a pipelined extractive-abstractive approach where the extractive step filters the content that is passed to the abstractive component. Extremely lengthy input also results in a highly skewed dataset towards negative instances for extractive summarization; we thus adopt a margin ranking loss for extraction to encourage separation between positive and negative examples. Our extraction component operates at the constituent level; our approach to this problem enriches the text with spinal tree information which provides syntactic context (in the form of constituents) to the extraction model. We show an improvement of 3.71 Rouge-1 points over best results reported in prior work on an existing novel chapter dataset.

Submitted 9 November, 2022; originally announced November 2022.

arXiv:2210.06629 [pdf, other]

Instruction Tuning for Few-Shot Aspect-Based Sentiment Analysis

Authors: Siddharth Varia, Shuai Wang, Kishaloy Halder, Robert Vacareanu, Miguel Ballesteros, Yassine Benajiba, Neha Anna John, Rishita Anubhai, Smaranda Muresan, Dan Roth

Abstract: Aspect-based Sentiment Analysis (ABSA) is a fine-grained sentiment analysis task which involves four elements from user-generated texts: aspect term, aspect category, opinion term, and sentiment polarity. Most computational approaches focus on some of the ABSA sub-tasks such as tuple (aspect term, sentiment polarity) or triplet (aspect term, opinion term, sentiment polarity) extraction using either pipeline or joint modeling approaches. Recently, generative approaches have been proposed to extract all four elements as (one or more) quadruplets from text as a single task. In this work, we take a step further and propose a unified framework for solving ABSA and the associated sub-tasks to improve performance in few-shot scenarios. To this end, we fine-tune a T5 model with instructional prompts in a multi-task learning fashion covering all the sub-tasks, as well as the entire quadruple prediction task. In experiments with multiple benchmark datasets, we show that the proposed multi-task prompting approach brings a performance boost (8.29 absolute F1) in the few-shot learning setting.

Submitted 11 June, 2023; v1 submitted 12 October, 2022; originally announced October 2022.

Comments: Camera ready copy for WASSA at ACL 2023

arXiv:2210.05613 [pdf, other]

Contrastive Training Improves Zero-Shot Classification of Semi-structured Documents

Authors: Muhammad Khalifa, Yogarshi Vyas, Shuai Wang, Graham Horwood, Sunil Mallya, Miguel Ballesteros

Abstract: We investigate semi-structured document classification in a zero-shot setting. Classification of semi-structured documents is more challenging than that of standard unstructured documents, as positional, layout, and style information play a vital role in interpreting such documents. The standard classification setting where categories are fixed during both training and testing falls short in dynamic environments where new document categories could potentially emerge. We focus exclusively on the zero-shot setting where inference is done on new unseen classes. To address this task, we propose a matching-based approach that relies on a pairwise contrastive objective for both pretraining and fine-tuning. Our results show a significant boost in Macro F1 from the proposed pretraining step in both supervised and unsupervised zero-shot settings.

Submitted 11 October, 2022; originally announced October 2022.

arXiv:2204.11117 [pdf, other]

Exploring the Role of Task Transferability in Large-Scale Multi-Task Learning

Authors: Vishakh Padmakumar, Leonard Lausen, Miguel Ballesteros, Sheng Zha, He He, George Karypis

Abstract: Recent work has found that multi-task training with a large number of diverse tasks can uniformly improve downstream performance on unseen target tasks. In contrast, literature on task transferability has established that the choice of intermediate tasks can heavily affect downstream task performance. In this work, we aim to disentangle the effect of scale and relatedness of tasks in multi-task representation learning. We find that, on average, increasing the scale of multi-task learning, in terms of the number of tasks, indeed results in better learned representations than smaller multi-task setups. However, if the target tasks are known ahead of time, then training on a smaller set of related tasks is competitive with the large-scale multi-task training at a reduced computational cost.

Submitted 12 July, 2022; v1 submitted 23 April, 2022; originally announced April 2022.

Comments: NAACL 2022 - Camera ready version

arXiv:2203.08985 [pdf, other]

Label Semantics for Few Shot Named Entity Recognition

Authors: Jie Ma, Miguel Ballesteros, Srikanth Doss, Rishita Anubhai, Sunil Mallya, Yaser Al-Onaizan, Dan Roth

Abstract: We study the problem of few shot learning for named entity recognition. Specifically, we leverage the semantic information in the names of the labels as a way of giving the model additional signal and enriched priors. We propose a neural architecture that consists of two BERT encoders, one to encode the document and its tokens and another one to encode each of the labels in natural language format. Our model learns to match the representations of named entities computed by the first encoder with label representations computed by the second encoder. The label semantics signal is shown to support improved state-of-the-art results in multiple few shot NER benchmarks and on-par performance in standard benchmarks. Our model is especially effective in low resource settings.

Submitted 16 March, 2022; originally announced March 2022.

Comments: Findings of ACL 2022

arXiv:2112.08345 [pdf, other]

Reliable Multi-Object Tracking in the Presence of Unreliable Detections

Authors: Travis Mandel, Mark Jimenez, Emily Risley, Taishi Nammoto, Rebekka Williams, Max Panoff, Meynard Ballesteros, Bobbie Suarez

Abstract: Recent multi-object tracking (MOT) systems have leveraged highly accurate object detectors; however, training such detectors requires large amounts of labeled data. Although such data is widely available for humans and vehicles, it is significantly more scarce for other animal species. We present Robust Confidence Tracking (RCT), an algorithm designed to maintain robust performance even when detection quality is poor. In contrast to prior methods which discard detection confidence information, RCT takes a fundamentally different approach, relying on the exact detection confidence values to initialize tracks, extend tracks, and filter tracks. In particular, RCT is able to minimize identity switches by efficiently using low-confidence detections (along with a single object tracker) to keep continuous track of objects. To evaluate trackers in the presence of unreliable detections, we present a challenging real-world underwater fish tracking dataset, FISHTRAC. In an evaluation on FISHTRAC as well as the UA-DETRAC dataset, we find that RCT outperforms other algorithms when provided with imperfect detections, including state-of-the-art deep single and multi-object trackers as well as more classic approaches. Specifically, RCT has the best average HOTA across methods that successfully return results for all sequences, and has significantly fewer identity switches than other methods.

Submitted 7 November, 2022; v1 submitted 15 December, 2021; originally announced December 2021.

Comments: The full journal version of this article (published in Pattern Recognition, Vol. 135) can be found at https://www.sciencedirect.com/science/article/pii/S0031320322005878. The article is open access. The source code and dataset can be found at https://github.com/tmandel/fish-detrac

arXiv:2109.08232 [pdf, other]

A Bag of Tricks for Dialogue Summarization

Authors: Muhammad Khalifa, Miguel Ballesteros, Kathleen McKeown

Abstract: Dialogue summarization comes with its own peculiar challenges as opposed to news or scientific article summarization. In this work, we explore four different challenges of the task: handling and differentiating parts of the dialogue belonging to multiple speakers, negation understanding, reasoning about the situation, and informal language understanding. Using a pretrained sequence-to-sequence language model, we explore speaker name substitution, negation scope highlighting, multi-task learning with relevant tasks, and pretraining on in-domain data. Our experiments show that our proposed techniques indeed improve summarization performance, outperforming strong baselines.

Submitted 16 September, 2021; originally announced September 2021.

Comments: EMNLP 2021 - short paper

arXiv:2109.03160 [pdf, other]

How much pretraining data do language models need to learn syntax?

Authors: Laura Pérez-Mayos, Miguel Ballesteros, Leo Wanner

Abstract: Transformer-based pretrained language models achieve outstanding results in many well-known NLU benchmarks. However, while pretraining methods are very convenient, they are expensive in terms of time and resources. This calls for a study of the impact of pretraining data size on the knowledge of the models. We explore this impact on the syntactic capabilities of RoBERTa, using models trained on incremental sizes of raw text data. First, we use syntactic structural probes to determine whether models pretrained on more data encode a higher amount of syntactic information. Second, we perform a targeted syntactic evaluation to analyze the impact of pretraining data size on the syntactic generalization performance of the models. Third, we compare the performance of the different models on three downstream applications: part-of-speech tagging, dependency parsing and paraphrase identification. We complement our study with an analysis of the cost-benefit trade-off of training such models. Our experiments show that while models pretrained on more data encode more syntactic knowledge and perform better on downstream applications, they do not always offer better performance across the different syntactic phenomena and come at a higher financial and environmental cost.

Submitted 9 September, 2021; v1 submitted 7 September, 2021; originally announced September 2021.

Comments: To be published in proceedings of the 2021 Conference on Empirical Methods in Natural Language Processing (EMNLP 2021)

arXiv:2104.08413 [pdf, other]

Sequential Cross-Document Coreference Resolution

Authors: Emily Allaway, Shuai Wang, Miguel Ballesteros

Abstract: Relating entities and events in text is a key component of natural language understanding. Cross-document coreference resolution, in particular, is important for the growing interest in multi-document analysis tasks. In this work we propose a new model that extends the efficient sequential prediction paradigm for coreference resolution to cross-document settings and achieves competitive results for both entity and event coreference while providing strong evidence of the efficacy of both sequential models and higher-order inference in cross-document settings. Our model incrementally composes mentions into cluster representations and predicts links between a mention and the already constructed clusters, approximating a higher-order model. In addition, we conduct extensive ablation studies that provide new insights into the importance of various inputs and representation types in coreference.

Submitted 16 April, 2021; originally announced April 2021.

arXiv:2101.11492 [pdf, other]

On the Evolution of Syntactic Information Encoded by BERT's Contextualized Representations

Authors: Laura Pérez-Mayos, Roberto Carlini, Miguel Ballesteros, Leo Wanner

Abstract: The adaptation of pretrained language models to solve supervised tasks has become a baseline in NLP, and many recent works have focused on studying how linguistic information is encoded in the pretrained sentence representations. Among other information, it has been shown that entire syntax trees are implicitly embedded in the geometry of such models. As these models are often fine-tuned, it becomes increasingly important to understand how the encoded knowledge evolves along the fine-tuning. In this paper, we analyze the evolution of the embedded syntax trees along the fine-tuning process of BERT for six different tasks, covering all levels of the linguistic structure. Experimental results show that the encoded syntactic information is forgotten (PoS tagging), reinforced (dependency and constituency parsing) or preserved (semantics-related tasks) in different ways along the fine-tuning process depending on the task.

Submitted 10 February, 2021; v1 submitted 27 January, 2021; originally announced January 2021.

arXiv:2101.11059 [pdf, other]

Event-Driven News Stream Clustering using Entity-Aware Contextual Embeddings

Authors: Kailash Karthik Saravanakumar, Miguel Ballesteros, Muthu Kumar Chandrasekaran, Kathleen McKeown

Abstract: We propose a method for online news stream clustering that is a variant of the non-parametric streaming K-means algorithm. Our model uses a combination of sparse and dense document representations, aggregates document-cluster similarity along these multiple representations and makes the clustering decision using a neural classifier. The weighted document-cluster similarity model is learned using a novel adaptation of the triplet loss into a linear classification objective. We show that the use of a suitable fine-tuning objective and external knowledge in pre-trained transformer models yields significant improvements in the effectiveness of contextual embeddings for clustering. Our model achieves a new state-of-the-art on a standard stream clustering dataset of English documents.

Submitted 26 January, 2021; originally announced January 2021.

Comments: To appear in Proceedings of The 16th Conference of the European Chapter of the Association for Computational Linguistics

ACM Class: I.2.7

arXiv:2010.14042 [pdf, other]

To BERT or Not to BERT: Comparing Task-specific and Task-agnostic Semi-Supervised Approaches for Sequence Tagging

Authors: Kasturi Bhattacharjee, Miguel Ballesteros, Rishita Anubhai, Smaranda Muresan, Jie Ma, Faisal Ladhak, Yaser Al-Onaizan

Abstract: Leveraging large amounts of unlabeled data using Transformer-like architectures, like BERT, has gained popularity in recent times owing to their effectiveness in learning general representations that can then be further fine-tuned for downstream tasks to much success. However, training these models can be costly both from an economic and environmental standpoint. In this work, we investigate how to effectively use unlabeled data: by exploring the task-specific semi-supervised approach, Cross-View Training (CVT) and comparing it with task-agnostic BERT in multiple settings that include domain and task relevant English data. CVT uses a much lighter model architecture and we show that it achieves similar performance to BERT on a set of sequence tagging tasks, with lesser financial and environmental impact.

Submitted 27 October, 2020; originally announced October 2020.

Comments: Accepted in the Proceedings of 2020 Conference on Empirical Methods in Natural Language Processing (EMNLP 2020)(https://2020.emnlp.org/papers/main)

arXiv:2010.11333 [pdf, other]

Linking Entities to Unseen Knowledge Bases with Arbitrary Schemas

Authors: Yogarshi Vyas, Miguel Ballesteros

Abstract: In entity linking, mentions of named entities in raw text are disambiguated against a knowledge base (KB). This work focuses on linking to unseen KBs that do not have training data and whose schema is unknown during training. Our approach relies on methods to flexibly convert entities from arbitrary KBs with several attribute-value pairs into flat strings, which we use in conjunction with state-of-the-art models for zero-shot linking. To improve the generalization of our model, we use two regularization schemes based on shuffling of entity attributes and handling of unseen attributes. Experiments on English datasets where models are trained on the CoNLL dataset, and tested on the TAC-KBP 2010 dataset show that our models outperform baseline models by over 12 points of accuracy. Unlike prior work, our approach also allows for seamlessly combining multiple training datasets. We test this ability by adding both a completely different dataset (Wikia), as well as increasing amount of training data from the TAC-KBP 2010 training set. Our models perform favorably across the board.

Submitted 21 October, 2020; originally announced October 2020.

arXiv:2010.10669 [pdf, other]

Transition-based Parsing with Stack-Transformers

Authors: Ramon Fernandez Astudillo, Miguel Ballesteros, Tahira Naseem, Austin Blodgett, Radu Florian

Abstract: Modeling the parser state is key to good performance in transition-based parsing. Recurrent Neural Networks considerably improved the performance of transition-based systems by modelling the global state, e.g. stack-LSTM parsers, or local state modeling of contextualized features, e.g. Bi-LSTM parsers. Given the success of Transformer architectures in recent parsing systems, this work explores modifications of the sequence-to-sequence Transformer architecture to model either global or local parser states in transition-based parsing. We show that modifications of the cross attention mechanism of the Transformer considerably strengthen performance both on dependency and Abstract Meaning Representation (AMR) parsing tasks, particularly for smaller models or limited training data.

Submitted 20 October, 2020; originally announced October 2020.

Comments: Accepted to Findings of EMNLP2020, open review https://openreview.net/forum?id=b36spsuUAde, code https://github.com/IBM/transition-amr-parser

arXiv:2010.05725 [pdf, other]

Structural Supervision Improves Few-Shot Learning and Syntactic Generalization in Neural Language Models

Authors: Ethan Wilcox, Peng Qian, Richard Futrell, Ryosuke Kohita, Roger Levy, Miguel Ballesteros

Abstract: Humans can learn structural properties about a word from minimal experience, and deploy their learned syntactic representations uniformly in different grammatical contexts. We assess the ability of modern neural language models to reproduce this behavior in English and evaluate the effect of structural supervision on learning outcomes. First, we assess few-shot learning capabilities by developing controlled experiments that probe models' syntactic nominal number and verbal argument structure generalizations for tokens seen as few as two times during training. Second, we assess invariance properties of learned representation: the ability of a model to transfer syntactic generalizations from a base context (e.g., a simple declarative active-voice sentence) to a transformed context (e.g., an interrogative sentence). We test four models trained on the same dataset: an n-gram baseline, an LSTM, and two LSTM-variants trained with explicit structural supervision (Dyer et al., 2016; Charniak et al., 2016). We find that in most cases, the neural models are able to induce the proper syntactic generalizations after minimal exposure, often from just two examples during training, and that the two structurally supervised models generalize more accurately than the LSTM model. All neural models are able to leverage information learned in base contexts to drive expectations in transformed contexts, indicating that they have learned some invariance properties of syntax.

Submitted 12 October, 2020; originally announced October 2020.

Comments: To appear at EMNLP 2020

arXiv:2010.03022 [pdf, other]

Resource-Enhanced Neural Model for Event Argument Extraction

Authors: Jie Ma, Shuai Wang, Rishita Anubhai, Miguel Ballesteros, Yaser Al-Onaizan

Abstract: Event argument extraction (EAE) aims to identify the arguments of an event and classify the roles that those arguments play. Despite great efforts made in prior work, there remain many challenges: (1) Data scarcity. (2) Capturing the long-range dependency, specifically, the connection between an event trigger and a distant event argument. (3) Integrating event trigger information into candidate argument representation. For (1), we explore using unlabeled data in different ways. For (2), we propose to use a syntax-attending Transformer that can utilize dependency parses to guide the attention mechanism. For (3), we propose a trigger-aware sequence encoder with several types of trigger-dependent sequence representations. We also support argument extraction either from text annotated with gold entities or from plain text. Experiments on the English ACE2005 benchmark show that our approach achieves a new state-of-the-art.

Submitted 6 October, 2020; originally announced October 2020.

Comments: Findings of EMNLP 2020

arXiv:2008.02177 [pdf, ps, other]

Band edge limit of the scattering matrix for quasi-one-dimensional discrete Schrödinger operators

Authors: Miguel Ballesteros, Gerardo Franco Córdova, Guillermo Garro, Hermann Schulz-Baldes

Abstract: This paper is about the scattering theory for one-dimensional matrix Schrödinger operators with a matrix potential having a finite first moment. The transmission coefficients are analytically continued and extended to the band edges. An explicit expression is given for these extensions. The limits of the reflection coefficients at the band edges are also calculated.

Submitted 28 March, 2022; v1 submitted 5 August, 2020; originally announced August 2020.

Comments: minor corrections; appears in Complex Analysis and Operator Theory

arXiv:2007.00785 [pdf, ps, other]

doi 10.1007/s00220-021-03935-0

The appearance of particle tracks in detectors

Authors: Miguel Ballesteros, Tristan Benoist, Martin Fraas, Jürg Fröhlich

Abstract: The phenomenon that a quantum particle propagating in a detector, such as a Wilson cloud chamber, leaves a track close to a classical trajectory is analyzed. We introduce an idealized quantum-mechanical model of a charged particle that is periodically illuminated by pulses of laser light resulting in repeated indirect measurements of the approximate position of the particle. For this model we present a mathematically rigorous analysis of the appearance of particle tracks, assuming that the Hamiltonian of the particle is quadratic in the position- and momentum operators, as for a freely moving particle or a harmonic oscillator.

Submitted 1 July, 2020; originally announced July 2020.

Comments: 40 pages

arXiv:2004.13099 [pdf, other]

Analyticity properties of the scattering matrix for matrix Schrödinger operators on the discrete line

Authors: Miguel Ballesteros, Gerardo Franco Córdova, Hermann Schulz-Baldes

Abstract: Explicit formulas for the analytic extensions of the scattering matrix and the time delay of a quasi-one-dimensional discrete Schrödinger operator with a potential of finite support are derived. This includes a careful analysis of the band edge singularities and allows us to prove a Levinson-type theorem. The main algebraic tools are the plane wave transfer matrices.

Submitted 22 January, 2021; v1 submitted 27 April, 2020; originally announced April 2020.

Comments: minor corrections, to appear in J. Math. Analysis and its Applications

arXiv:2004.04295 [pdf, ps, other]

Severing the Edge Between Before and After: Neural Architectures for Temporal Ordering of Events

Authors: Miguel Ballesteros, Rishita Anubhai, Shuai Wang, Nima Pourdamghani, Yogarshi Vyas, Jie Ma, Parminder Bhatia, Kathleen McKeown, Yaser Al-Onaizan

Abstract: In this paper, we propose a neural architecture and a set of training methods for ordering events by predicting temporal relations. Our proposed models receive a pair of events within a span of text as input and they identify temporal relations (Before, After, Equal, Vague) between them. Given that a key challenge with this task is the scarcity of annotated data, our models rely on either pretrained representations (i.e. RoBERTa, BERT or ELMo), transfer and multi-task learning (by leveraging complementary datasets), and self-training techniques. Experiments on the MATRES dataset of English documents establish a new state-of-the-art on this task.

Submitted 8 April, 2020; originally announced April 2020.

arXiv:2001.08279

Transition-Based Dependency Parsing using Perceptron Learner

Authors: Rahul Radhakrishnan Iyer, Miguel Ballesteros, Chris Dyer, Robert Frederking

Abstract: Syntactic parsing using dependency structures has become a standard technique in natural language processing with many different parsing models, in particular data-driven models that can be trained on syntactically annotated corpora. In this paper, we tackle transition-based dependency parsing using a Perceptron Learner. Our proposed model, which adds more relevant features to the Perceptron Learner, outperforms a baseline arc-standard parser. We beat the UAS of the MALT and LSTM parsers. We also give possible ways to address parsing of non-projective trees.

Submitted 28 January, 2020; v1 submitted 22 January, 2020; originally announced January 2020.

Comments: This was part of an assignment at my graduate course at LTI. This does not offer any major novelties

arXiv:1907.03013 [pdf, other]

One-Boson Scattering Processes in the Massless Spin-Boson Model -- A Non-Perturbative Formula

Authors: Miguel Ballesteros, Dirk-André Deckert, Felix Hänle

Abstract: In scattering experiments, physicists observe so-called resonances as peaks at certain energy values in the measured scattering cross sections per solid angle. These peaks are usually associated with certain scattering processes, e.g., emission, absorption, or excitation of certain particles and systems. On the other hand, mathematicians define resonances as poles of an analytic continuation of the resolvent operator through complex dilations. A major challenge is to relate these scattering and resonance theoretical notions, e.g., to prove that the poles of the resolvent operator induce the above-mentioned peaks in the scattering matrix. In the case of quantum mechanics, this problem was addressed in numerous works that culminated in Simon's seminal paper [33] in which a general solution was presented for a large class of pair potentials. However, in quantum field theory the analogous problem has been open for several decades despite the fact that scattering and resonance theories have been well-developed for many models. In certain regimes these models describe very fundamental phenomena, such as emission and absorption of photons by atoms, from which quantum mechanics originated. In this work we present a first non-perturbative formula that relates the scattering matrix to the resolvent operator in the massless Spin-Boson model. This result can be seen as a major progress compared to our previous works [13] and [12] in which we only managed to derive a perturbative formula.

Submitted 15 May, 2020; v1 submitted 5 July, 2019; originally announced July 2019.

Comments: 26 pages, 3 figures. arXiv admin note: text overlap with arXiv:1801.04843

arXiv:1905.13370 [pdf, ps, other]

Rewarding Smatch: Transition-Based AMR Parsing with Reinforcement Learning

Authors: Tahira Naseem, Abhishek Shah, Hui Wan, Radu Florian, Salim Roukos, Miguel Ballesteros

Abstract: Our work involves enriching the Stack-LSTM transition-based AMR parser (Ballesteros and Al-Onaizan, 2017) by augmenting training with Policy Learning and rewarding the Smatch score of sampled graphs. In addition, we also combined several AMR-to-text alignments with an attention mechanism and we supplemented the parser with pre-processed concept identification, named entities and contextualized embeddings. We achieve a highly competitive performance that is comparable to the best published results. We show an in-depth study ablating each of the new components of the parser.

Submitted 30 May, 2019; originally announced May 2019.

Comments: Accepted as short paper at ACL 2019

arXiv:1903.03260 [pdf, other]

Neural Language Models as Psycholinguistic Subjects: Representations of Syntactic State

Authors: Richard Futrell, Ethan Wilcox, Takashi Morita, Peng Qian, Miguel Ballesteros, Roger Levy

Abstract: We deploy the methods of controlled psycholinguistic experimentation to shed light on the extent to which the behavior of neural network language models reflects incremental representations of syntactic state. To do so, we examine model behavior on artificial sentences containing a variety of syntactically complex structures. We test four models: two publicly available LSTM sequence models of English (Jozefowicz et al., 2016; Gulordava et al., 2018) trained on large datasets; an RNNG (Dyer et al., 2016) trained on a small, parsed dataset; and an LSTM trained on the same small corpus as the RNNG. We find evidence that the LSTMs trained on large datasets represent syntactic state over large spans of text in a way that is comparable to the RNNG, while the LSTM trained on the small dataset does not or does so only weakly.

Submitted 7 March, 2019; originally announced March 2019.

Comments: Accepted to NAACL 2019. Not yet edited into the camera-ready version

arXiv:1903.00943 [pdf, other]

Structural Supervision Improves Learning of Non-Local Grammatical Dependencies

Authors: Ethan Wilcox, Peng Qian, Richard Futrell, Miguel Ballesteros, Roger Levy

Abstract: State-of-the-art LSTM language models trained on large corpora learn sequential contingencies in impressive detail and have been shown to acquire a number of non-local grammatical dependencies with some success. Here we investigate whether supervision with hierarchical structure enhances learning of a range of grammatical dependencies, a question that has previously been addressed only for subject-verb agreement. Using controlled experimental methods from psycholinguistics, we compare the performance of word-based LSTM models versus two models that represent hierarchical structure and deploy it in left-to-right processing: Recurrent Neural Network Grammars (RNNGs) (Dyer et al., 2016) and an incrementalized version of the Parsing-as-Language-Modeling configuration from Charniak et al. (2016). Models are tested on a diverse range of configurations for two classes of non-local grammatical dependencies in English---Negative Polarity licensing and Filler--Gap Dependencies. Using the same training data across models, we find that structurally-supervised models outperform the LSTM, with the RNNG demonstrating best results on both types of grammatical dependencies and even learning many of the Island Constraints on the filler--gap dependency. Structural supervision thus provides data efficiency advantages over purely string-based training of neural language models in acquiring human-like generalizations about non-local grammatical dependencies.

Submitted 6 April, 2019; v1 submitted 3 March, 2019; originally announced March 2019.

Comments: To appear: Proceedings of the 2019 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies

arXiv:1902.09781 [pdf, other]

Recursive Subtree Composition in LSTM-Based Dependency Parsing

Authors: Miryam de Lhoneux, Miguel Ballesteros, Joakim Nivre

Abstract: The need for tree structure modelling on top of sequence modelling is an open issue in neural dependency parsing. We investigate the impact of adding a tree layer on top of a sequential model by recursively composing subtree representations (composition) in a transition-based parser that uses features extracted by a BiLSTM. Composition seems superfluous with such a model, suggesting that BiLSTMs capture information about subtrees. We perform model ablations to tease out the conditions under which composition helps. When ablating the backward LSTM, performance drops and composition does not recover much of the gap. When ablating the forward LSTM, performance drops less dramatically and composition recovers a substantial part of the gap, indicating that a forward LSTM and composition capture similar information. We take the backward LSTM to be related to lookahead features and the forward LSTM to the rich history-based features both crucial for transition-based parsers. To capture history-based information, composition is better than a forward LSTM on its own, but it is even better to have a forward LSTM as part of a BiLSTM. We correlate results with language properties, showing that the improved lookahead of a backward LSTM is especially important for head-final languages.

Submitted 26 February, 2019; originally announced February 2019.

Comments: Accepted at NAACL 2019

arXiv:1902.02848 [pdf, ps, other]

Conditionally Free Reduced Products of Hilbert Spaces

Authors: Octavio Arizmendi, Miguel Ballesteros, Francisco Torres-Ayala

Abstract: We present a product of pairs of pointed Hilbert spaces that, in the context of Bożejko, Leinert and Speicher's theory of conditionally free probability, plays the role of the reduced free product of pointed Hilbert spaces, and thus gives a unified construction for the natural notions of independence defined by Muraki. We additionally provide important applications of this construction. We prove that, assuming minor restrictions, for any pair of conditionally free algebras there are copies of them that are conditionally free and also free, a property that is frequently assumed (as hypothesis) to prove several results in the literature. Finally, we give a short proof of the linearization property of the $^cR$-transform (the analog of Voiculescu's $R$-transform in the context of conditionally free probability).

Submitted 7 February, 2019; originally announced February 2019.

arXiv:1810.09135 [pdf, ps, other]

One-Boson Scattering Processes in the massive Spin-Boson Model

Authors: Miguel Ballesteros, Dirk-André Deckert, Jérémy Faupin, Felix Hänle

Abstract: The Spin-Boson model describes a two-level quantum system that interacts with a second-quantized boson scalar field. Recently the relation between the integral kernel of the scattering matrix and the resonance in this model has been established in [14] for the case of massless bosons. In the present work, we treat the massive case. On the one hand, one might rightfully expect that the massive case is easier to handle since, in contrast to the massless case, the corresponding Hamiltonian features a spectral gap. On the other hand, it turns out that the non-zero boson mass introduces a new complication as the spectrum of the complex dilated, free Hamiltonian exhibits lines of spectrum attached to every multiple of the boson rest mass energy starting from the ground and excited state energies. This leads to an absence of decay of the corresponding complex dilated resolvent close to the real line, which, in [14], was a crucial ingredient to control the time evolution in the scattering regime. With the new strategy presented here, we provide a proof of an analogous formula for the scattering kernel as compared to the massless case and use the opportunity to provide the required spectral information by a Mourre theory argument combined with a suitable application of the Feshbach-Schur map instead of complex dilation.

Submitted 10 May, 2019; v1 submitted 22 October, 2018; originally announced October 2018.

Comments: 49 pages

arXiv:1806.03280 [pdf, other]

Multilingual Neural Machine Translation with Task-Specific Attention

Authors: Graeme Blackwood, Miguel Ballesteros, Todd Ward

Abstract: Multilingual machine translation addresses the task of translating between multiple source and target languages. We propose task-specific attention models, a simple but effective technique for improving the quality of sequence-to-sequence neural multilingual translation. Our approach seeks to retain as much of the parameter sharing generalization of NMT models as possible, while still allowing for language-specific specialization of the attention model to a particular language-pair or task. Our experiments on four languages of the Europarl corpus show that using a target-specific model of attention provides consistent gains in translation quality for all possible translation directions, compared to a model in which all parameters are shared. We observe improved translation quality even in the (extreme) low-resource zero-shot translation directions for which the model never saw explicitly paired parallel data.

Submitted 8 June, 2018; originally announced June 2018.

Comments: COLING 2018

arXiv:1804.08915 [pdf, other]

Scheduled Multi-Task Learning: From Syntax to Translation

Authors: Eliyahu Kiperwasser, Miguel Ballesteros

Abstract: Neural encoder-decoder models of machine translation have achieved impressive results, while learning linguistic knowledge of both the source and target languages in an implicit end-to-end manner. We propose a framework in which our model begins learning syntax and translation interleaved, gradually putting more focus on translation. Using this approach, we achieve considerable improvements in terms of BLEU score on a relatively large parallel corpus (WMT14 English to German) and a low-resource (WIT German to English) setup.

Submitted 24 April, 2018; originally announced April 2018.

Journal ref: Transactions of the Association for Computational Linguistics, 6:225-240 (2018)

arXiv:1804.05038 [pdf, ps, other]

Pieces of Eight: 8-bit Neural Machine Translation

Authors: Jerry Quinn, Miguel Ballesteros

Abstract: Neural machine translation has achieved levels of fluency and adequacy that would have been surprising a short time ago. Output quality is extremely relevant for industry purposes; however, it is equally important to produce results in the shortest time possible, mainly for latency-sensitive applications and to control cloud hosting costs. In this paper we show the effectiveness of translating with 8-bit quantization for models that have been trained using 32-bit floating point values. Results show that 8-bit translation makes a non-negligible impact in terms of speed with no degradation in accuracy and adequacy.

Submitted 13 April, 2018; originally announced April 2018.

Comments: To appear at NAACL 2018 Industry Track

arXiv:1803.02392 [pdf, other]

Multimodal Emoji Prediction

Authors: Francesco Barbieri, Miguel Ballesteros, Francesco Ronzano, Horacio Saggion

Abstract: Emojis are small images that are commonly included in social media text messages. The combination of visual and textual content in the same message builds up a modern way of communication, that automatic systems are not used to deal with. In this paper we extend recent advances in emoji prediction by putting forward a multimodal approach that is able to predict emojis in Instagram posts. Instagram posts are composed of pictures together with texts which sometimes include emojis. We show that these emojis can be predicted by using the text, but also using the picture. Our main finding is that incorporating the two synergistic modalities, in a combined model, improves accuracy in an emoji prediction task. This result demonstrates that these two modalities (text and images) encode different information on the use of emojis and therefore can complement each other.

Submitted 17 April, 2018; v1 submitted 6 March, 2018; originally announced March 2018.

Comments: NAACL 2018 (short)

arXiv:1801.04843 [pdf, ps, other]

doi 10.1007/s00220-019-03481-w

Relation between the Resonance and the Scattering Matrix in the massless Spin-Boson Model

Authors: Miguel Ballesteros, Dirk-André Deckert, Felix Hänle

Abstract: We establish the precise relation between the integral kernel of the scattering matrix and the resonance in the massless Spin-Boson model which describes the interaction of a two-level quantum system with a second-quantized scalar field. For this purpose, we derive an explicit formula for the two-body scattering matrix. We impose an ultraviolet cut-off and assume a slightly less singular behavior of the boson form factor of the relativistic scalar field but no infrared cut-off. The purpose of this work is to bring together scattering and resonance theory and arrive at a similar result as provided by Simon in [38], where it was shown that the singularities of the meromorphic continuation of the integral kernel of the scattering matrix are located precisely at the resonance energies. The corresponding problem has been open in quantum field theory ever since. To the best of our knowledge, the presented formula provides the first rigorous connection between resonance and scattering theory in the sense of [38] in a model of quantum field theory.

Submitted 23 February, 2019; v1 submitted 15 January, 2018; originally announced January 2018.

Comments: 46 pages, 2 figures. arXiv admin note: text overlap with arXiv:1810.09135

Journal ref: Commun. Math. Phys. (2019) 370: 249; The final publication is available at link.springer.com

arXiv:1801.04021 [pdf, ps, other]

doi 10.1016/j.jfa.2019.02.008

Analyticity of Resonances and Eigenvalues and Spectral Properties of the massless Spin-Boson Model

Authors: Miguel Ballesteros, Dirk-André Deckert, Felix Hänle

Abstract: We extend the method of multiscale analysis for resonances introduced in [5] in order to infer analytic properties of resonances and eigenvalues (and their eigenprojections) as well as estimates for the localization of the spectrum of dilated Hamiltonians and norm-bounds for the corresponding resolvent operators, in neighborhoods of resonances and eigenvalues. We apply our method to the massless Spin-Boson model assuming a slight infrared regularization. We prove that the resonance and the ground-state eigenvalue (and their eigenprojections) are analytic with respect to the dilation parameter and the coupling constant. Moreover, we prove that the spectrum of the dilated Spin-Boson Hamiltonian in the neighborhood of the resonance and the ground-state eigenvalue is localized in two cones in the complex plane with vertices at the location of the resonance and the ground-state eigenvalue, respectively. Additionally, we provide norm-estimates for the resolvent of the dilated Spin-Boson Hamiltonian near the resonance and the ground-state eigenvalue. The topic of analyticity of eigenvalues and resonances has led to several studies and advances in the past. However, to the best of our knowledge, this is the first time that it is addressed from the perspective of multiscale analysis. Once the multiscale analysis is set up, our method gives easy access to analyticity: essentially, it amounts to proving it for isolated eigenvalues only and using the fact that uniform limits of analytic functions are analytic.
The type of spectral and resolvent estimates that we prove are needed to control the time evolution including the scattering regime. The latter will be demonstrated in a forthcoming publication. The introduced multiscale method to study spectral and resolvent estimates follows its own inductive scheme and is independent (and different) from the method we apply to construct resonances.

Submitted 17 February, 2019; v1 submitted 11 January, 2018; originally announced January 2018.

Comments: 47 pages, 3 figures

Journal ref: Journal of Functional Analysis, Vol. 276(8), 2019, Pages 2524-2581

arXiv:1709.03149 [pdf, ps, other]

Perturbation Theory for Weak Measurements in Quantum Mechanics, I -- Systems with Finite-Dimensional State Space

Authors: M. Ballesteros, N. Crawford, M. Fraas, J. Fröhlich, B. Schubnel

Abstract: The quantum theory of indirect measurements in physical systems is studied. The example of an indirect measurement of an observable represented by a self-adjoint operator $\mathcal{N}$ with finite spectrum is analysed in detail. The Hamiltonian generating the time evolution of the system in the absence of direct measurements is assumed to be given by the sum of a term commuting with $\mathcal{N}$ and a small perturbation not commuting with $\mathcal{N}$. The system is subject to repeated direct (projective) measurements using a single instrument whose action on the state of the system commutes with $\mathcal{N}$. If the Hamiltonian commutes with the observable $\mathcal{N}$ (i.e., if the perturbation vanishes) the state of the system approaches an eigenstate of $\mathcal{N}$, as the number of direct measurements tends to $\infty$. If the perturbation term in the Hamiltonian does \textit{not} commute with $\mathcal{N}$ the system exhibits "jumps" between different eigenstates of $\mathcal{N}$. We determine the rate of these jumps to leading order in the strength of the perturbation and show that if time is re-scaled appropriately a maximum likelihood estimate of $\mathcal{N}$ approaches a Markovian jump process on the spectrum of $\mathcal{N}$, as the strength of the perturbation tends to $0$.

Submitted 10 September, 2017; originally announced September 2017.

Comments: 42 pages

arXiv:1709.00489 [pdf, ps, other]

Arc-Standard Spinal Parsing with Stack-LSTMs

Authors: Miguel Ballesteros, Xavier Carreras

Abstract: We present a neural transition-based parser for spinal trees, a dependency representation of constituent trees. The parser uses Stack-LSTMs that compose constituent nodes with dependency-based derivations. In experiments, we show that this model adapts to different styles of dependency relations, but this choice has little effect for predicting constituent structure, suggesting that LSTMs induce useful states by themselves.

Submitted 1 September, 2017; originally announced September 2017.

Comments: IWPT 2017

arXiv:1707.07755 [pdf, ps, other]

AMR Parsing using Stack-LSTMs

Authors: Miguel Ballesteros, Yaser Al-Onaizan

Abstract: We present a transition-based AMR parser that directly generates AMR parses from plain text. We use Stack-LSTMs to represent our parser state and make decisions greedily. In our experiments, we show that our parser achieves very competitive scores on English using only AMR training data. Adding additional information, such as POS tags and dependency trees, improves the results further.

Submitted 2 August, 2017; v1 submitted 24 July, 2017; originally announced July 2017.

Comments: EMNLP 2017

arXiv:1706.09584 [pdf, ps, other]

Non-demolition measurements of observables with general spectra

Authors: M. Ballesteros, N. Crawford, M. Fraas, J. Fröhlich, B. Schubnel

Abstract: It has recently been established that, in a non-demolition measurement of an observable $\mathcal{N}$ with a finite point spectrum, the density matrix of the system approaches an eigenstate of $\mathcal{N}$, i.e., it "purifies" over the spectrum of $\mathcal{N}$. We extend this result to observables with general spectra. It is shown that the spectral density of the state of the system converges to a delta function exponentially fast, in an appropriate sense. Furthermore, for observables with absolutely continuous spectra, we show that the spectral density approaches a Gaussian distribution over the spectrum of $\mathcal{N}$. Our methods highlight the connection between the theory of non-demolition measurements and classical estimation theory.

Submitted 29 June, 2017; originally announced June 2017.

Comments: 22 pages

arXiv:1702.07285 [pdf, other]

Are Emojis Predictable?

Authors: Francesco Barbieri, Miguel Ballesteros, Horacio Saggion

Abstract: Emojis are ideograms which are naturally combined with plain text to visually complement or condense the meaning of a message. Despite being widely used in social media, their underlying semantics have received little attention from a Natural Language Processing standpoint. In this paper, we investigate the relation between words and emojis, studying the novel task of predicting which emojis are evoked by text-based tweet messages. We train several models based on Long Short-Term Memory networks (LSTMs) in this task. Our experimental results show that our neural model outperforms two baselines as well as humans solving the same task, suggesting that computational models are able to better capture the underlying semantics of emojis.

Submitted 24 February, 2017; v1 submitted 23 February, 2017; originally announced February 2017.

Comments: To appear at EACL 2017

arXiv:1701.03980 [pdf, other]

DyNet: The Dynamic Neural Network Toolkit

Authors: Graham Neubig, Chris Dyer, Yoav Goldberg, Austin Matthews, Waleed Ammar, Antonios Anastasopoulos, Miguel Ballesteros, David Chiang, Daniel Clothiaux, Trevor Cohn, Kevin Duh, Manaal Faruqui, Cynthia Gan, Dan Garrette, Yangfeng Ji, Lingpeng Kong, Adhiguna Kuncoro, Gaurav Kumar, Chaitanya Malaviya, Paul Michel, Yusuke Oda, Matthew Richardson, Naomi Saphra, Swabha Swayamdipta, Pengcheng Yin

Abstract: We describe DyNet, a toolkit for implementing neural network models based on dynamic declaration of network structure. In the static declaration strategy that is used in toolkits like Theano, CNTK, and TensorFlow, the user first defines a computation graph (a symbolic representation of the computation), and then examples are fed into an engine that executes this computation and computes its derivatives. In DyNet's dynamic declaration strategy, computation graph construction is mostly transparent, being implicitly constructed by executing procedural code that computes the network outputs, and the user is free to use different network structures for each input. Dynamic declaration thus facilitates the implementation of more complicated network architectures, and DyNet is specifically designed to allow users to implement their models in a way that is idiomatic in their preferred programming language (C++ or Python). One challenge with dynamic declaration is that because the symbolic computation graph is defined anew for every training example, its construction must have low overhead. To achieve this, DyNet has an optimized C++ backend and lightweight graph representation. Experiments show that DyNet's speeds are faster than or comparable with static declaration toolkits, and significantly faster than Chainer, another dynamic declaration toolkit. DyNet is released open-source under the Apache 2.0 license and available at http://github.com/clab/dynet.

Submitted 14 January, 2017; originally announced January 2017.

Comments: 33 pages
