-
Scalable and Domain-General Abstractive Proposition Segmentation
Authors:
Mohammad Javad Hosseini,
Yang Gao,
Tim Baumgärtner,
Alex Fabrikant,
Reinald Kim Amplayo
Abstract:
Segmenting text into fine-grained units of meaning is important to a wide range of NLP applications. The default approach of segmenting text into sentences is often insufficient, especially since sentences are usually complex enough to include multiple units of meaning that merit separate treatment in the downstream task. We focus on the task of abstractive proposition segmentation: transforming t…
▽ More
Segmenting text into fine-grained units of meaning is important to a wide range of NLP applications. The default approach of segmenting text into sentences is often insufficient, especially since sentences are usually complex enough to include multiple units of meaning that merit separate treatment in the downstream task. We focus on the task of abstractive proposition segmentation: transforming text into simple, self-contained, well-formed sentences. Several recent works have demonstrated the utility of proposition segmentation with few-shot prompted LLMs for downstream tasks such as retrieval-augmented grounding and fact verification. However, this approach does not scale to large amounts of text and may not always extract all the facts from the input text. In this paper, we first introduce evaluation metrics for the task to measure several dimensions of quality. We then propose a scalable, yet accurate, proposition segmentation model. We model proposition segmentation as a supervised task by training LLMs on existing annotated datasets and show that training yields significantly improved results. We further show that by using the fine-tuned LLMs as teachers for annotating large amounts of multi-domain synthetic distillation data, we can train smaller student models with results similar to the teacher LLMs. We then demonstrate that our technique leads to effective domain generalization, by annotating data in two domains outside the original training data and evaluating on them. Finally, as a key contribution of the paper, we share an easy-to-use API for NLP practitioners to use.
△ Less
Submitted 28 June, 2024;
originally announced June 2024.
-
A synthetic data approach for domain generalization of NLI models
Authors:
Mohammad Javad Hosseini,
Andrey Petrov,
Alex Fabrikant,
Annie Louis
Abstract:
Natural Language Inference (NLI) remains an important benchmark task for LLMs. NLI datasets are a springboard for transfer learning to other semantic tasks, and NLI models are standard tools for identifying the faithfulness of model-generated text. There are several large scale NLI datasets today, and models have improved greatly by hill-climbing on these collections. Yet their realistic performan…
▽ More
Natural Language Inference (NLI) remains an important benchmark task for LLMs. NLI datasets are a springboard for transfer learning to other semantic tasks, and NLI models are standard tools for identifying the faithfulness of model-generated text. There are several large scale NLI datasets today, and models have improved greatly by hill-climbing on these collections. Yet their realistic performance on out-of-distribution/domain data is less well-understood. We explore the opportunity for synthetic high-quality datasets to adapt NLI models for zero-shot use in downstream applications across new and unseen text domains. We demonstrate a new approach for generating NLI data in diverse domains and lengths, so far not covered by existing training sets. The resulting examples have meaningful premises, the hypotheses are formed in creative ways rather than simple edits to a few premise tokens, and the labels have high accuracy. We show that models trained on this data ($685$K synthetic examples) have the best generalization to completely new downstream test settings. On the TRUE benchmark, a T5-small model trained with our data improves around $7\%$ on average compared to training on the best alternative dataset. The improvements are more pronounced for smaller models, while still meaningful on a T5 XXL model. We also demonstrate gains on test sets when in-domain training data is augmented with our domain-general synthetic data.
△ Less
Submitted 28 June, 2024; v1 submitted 19 February, 2024;
originally announced February 2024.
-
LAIT: Efficient Multi-Segment Encoding in Transformers with Layer-Adjustable Interaction
Authors:
Jeremiah Milbauer,
Annie Louis,
Mohammad Javad Hosseini,
Alex Fabrikant,
Donald Metzler,
Tal Schuster
Abstract:
Transformer encoders contextualize token representations by attending to all other tokens at each layer, leading to quadratic increase in compute effort with the input length. In practice, however, the input text of many NLP tasks can be seen as a sequence of related segments (e.g., the sequence of sentences within a passage, or the hypothesis and premise in NLI). While attending across these segm…
▽ More
Transformer encoders contextualize token representations by attending to all other tokens at each layer, leading to quadratic increase in compute effort with the input length. In practice, however, the input text of many NLP tasks can be seen as a sequence of related segments (e.g., the sequence of sentences within a passage, or the hypothesis and premise in NLI). While attending across these segments is highly beneficial for many tasks, we hypothesize that this interaction can be delayed until later encoding stages.
To this end, we introduce Layer-Adjustable Interactions in Transformers (LAIT). Within LAIT, segmented inputs are first encoded independently, and then jointly. This partial two-tower architecture bridges the gap between a Dual Encoder's ability to pre-compute representations for segments and a fully self-attentive Transformer's capacity to model cross-segment attention. The LAIT framework effectively leverages existing pretrained Transformers and converts them into the hybrid of the two aforementioned architectures, allowing for easy and intuitive control over the performance-efficiency tradeoff. Experimenting on a wide range of NLP tasks, we find LAIT able to reduce 30-50% of the attention FLOPs on many tasks, while preserving high accuracy; in some practical settings, LAIT could reduce actual latency by orders of magnitude.
△ Less
Submitted 31 May, 2023;
originally announced May 2023.
-
Sources of Hallucination by Large Language Models on Inference Tasks
Authors:
Nick McKenna,
Tianyi Li,
Liang Cheng,
Mohammad Javad Hosseini,
Mark Johnson,
Mark Steedman
Abstract:
Large Language Models (LLMs) are claimed to be capable of Natural Language Inference (NLI), necessary for applied tasks like question answering and summarization. We present a series of behavioral studies on several LLM families (LLaMA, GPT-3.5, and PaLM) which probe their behavior using controlled experiments. We establish two biases originating from pretraining which predict much of their behavi…
▽ More
Large Language Models (LLMs) are claimed to be capable of Natural Language Inference (NLI), necessary for applied tasks like question answering and summarization. We present a series of behavioral studies on several LLM families (LLaMA, GPT-3.5, and PaLM) which probe their behavior using controlled experiments. We establish two biases originating from pretraining which predict much of their behavior, and show that these are major sources of hallucination in generative LLMs. First, memorization at the level of sentences: we show that, regardless of the premise, models falsely label NLI test samples as entailing when the hypothesis is attested in training data, and that entities are used as ``indices'' to access the memorized data. Second, statistical patterns of usage learned at the level of corpora: we further show a similar effect when the premise predicate is less frequent than that of the hypothesis in the training data, a bias following from previous studies. We demonstrate that LLMs perform significantly worse on NLI test samples which do not conform to these biases than those which do, and we offer these as valuable controls for future LLM evaluation.
△ Less
Submitted 22 October, 2023; v1 submitted 23 May, 2023;
originally announced May 2023.
-
Resolving Indirect Referring Expressions for Entity Selection
Authors:
Mohammad Javad Hosseini,
Filip Radlinski,
Silvia Pareti,
Annie Louis
Abstract:
Recent advances in language modeling have enabled new conversational systems. In particular, it is often desirable for people to make choices among specified options when using such systems. We address this problem of reference resolution, when people use natural expressions to choose between the entities. For example, given the choice `Should we make a Simnel cake or a Pandan cake?' a natural res…
▽ More
Recent advances in language modeling have enabled new conversational systems. In particular, it is often desirable for people to make choices among specified options when using such systems. We address this problem of reference resolution, when people use natural expressions to choose between the entities. For example, given the choice `Should we make a Simnel cake or a Pandan cake?' a natural response from a dialog participant may be indirect: `let's make the green one'. Such natural expressions have been little studied for reference resolution. We argue that robustly understanding such language has large potential for improving naturalness in dialog, recommendation, and search systems. We create AltEntities (Alternative Entities), a new public dataset of 42K entity pairs and expressions (referring to one entity in the pair), and develop models for the disambiguation problem. Consisting of indirect referring expressions across three domains, our corpus enables for the first time the study of how language models can be adapted to this task. We find they achieve 82%-87% accuracy in realistic settings, which while reasonable also invites further advances.
△ Less
Submitted 26 May, 2023; v1 submitted 21 December, 2022;
originally announced December 2022.
-
Language Models Are Poor Learners of Directional Inference
Authors:
Tianyi Li,
Mohammad Javad Hosseini,
Sabine Weber,
Mark Steedman
Abstract:
We examine LMs' competence of directional predicate entailments by supervised fine-tuning with prompts. Our analysis shows that contrary to their apparent success on standard NLI, LMs show limited ability to learn such directional inference; moreover, existing datasets fail to test directionality, and/or are infested by artefacts that can be learnt as proxy for entailments, yielding over-optimisti…
▽ More
We examine LMs' competence of directional predicate entailments by supervised fine-tuning with prompts. Our analysis shows that contrary to their apparent success on standard NLI, LMs show limited ability to learn such directional inference; moreover, existing datasets fail to test directionality, and/or are infested by artefacts that can be learnt as proxy for entailments, yielding over-optimistic results. In response, we present BoOQA (Boolean Open QA), a robust multi-lingual evaluation benchmark for directional predicate entailments, extrinsic to existing training sets. On BoOQA, we establish baselines and show evidence of existing LM-prompting models being incompetent directional entailment learners, in contrast to entailment graphs, however limited by sparsity.
△ Less
Submitted 14 October, 2022; v1 submitted 10 October, 2022;
originally announced October 2022.
-
Organic log-domain integrator synapse
Authors:
Mohammad Javad Mirshojaeian Hosseini,
Elisa Donati,
Giacomo Indiveri,
Robert A. Nawrocki
Abstract:
Synapses play a critical role in memory, learning, and cognition. Their main functions include converting pre-synaptic voltage spikes to post-synaptic currents, as well as scaling the input signal. Several brain-inspired architectures have been proposed to emulate the behavior of biological synapses. While these are useful to explore the properties of nervous systems, the challenge of making bioco…
▽ More
Synapses play a critical role in memory, learning, and cognition. Their main functions include converting pre-synaptic voltage spikes to post-synaptic currents, as well as scaling the input signal. Several brain-inspired architectures have been proposed to emulate the behavior of biological synapses. While these are useful to explore the properties of nervous systems, the challenge of making biocompatible and flexible circuits with biologically plausible time constants and tunable gain remains. Here, a physically flexible organic log-domain integrator synaptic circuit is shown to address this challenge. In particular, the circuit is fabricated using organic-based materials that are electrically active, offer flexibility and biocompatibility, as well as time constants (critical in learning neural codes and encoding spatiotemporal patterns) that are biologically plausible. Using a 10 nF synaptic capacitor, the time constant reached 126 ms and 221 ms before and during bending, respectively. The flexible synaptic circuit is characterized before and during bending, followed by studies on the effects of weighting voltage, synaptic capacitance, and disparity in pre-synaptic signals on the time constant.
△ Less
Submitted 23 March, 2022;
originally announced March 2022.
-
Cross-lingual Inference with A Chinese Entailment Graph
Authors:
Tianyi Li,
Sabine Weber,
Mohammad Javad Hosseini,
Liane Guillou,
Mark Steedman
Abstract:
Predicate entailment detection is a crucial task for question-answering from text, where previous work has explored unsupervised learning of entailment graphs from typed open relation triples. In this paper, we present the first pipeline for building Chinese entailment graphs, which involves a novel high-recall open relation extraction (ORE) method and the first Chinese fine-grained entity ty**…
▽ More
Predicate entailment detection is a crucial task for question-answering from text, where previous work has explored unsupervised learning of entailment graphs from typed open relation triples. In this paper, we present the first pipeline for building Chinese entailment graphs, which involves a novel high-recall open relation extraction (ORE) method and the first Chinese fine-grained entity ty** dataset under the FIGER type ontology. Through experiments on the Levy-Holt dataset, we verify the strength of our Chinese entailment graph, and reveal the cross-lingual complementarity: on the parallel Levy-Holt dataset, an ensemble of Chinese and English entailment graphs outperforms both monolingual graphs, and raises unsupervised SOTA by 4.7 AUC points.
△ Less
Submitted 11 March, 2022;
originally announced March 2022.
-
Incorporating Temporal Information in Entailment Graph Mining
Authors:
Liane Guillou,
Sander Bijl de Vroe,
Mohammad Javad Hosseini,
Mark Johnson,
Mark Steedman
Abstract:
We present a novel method for injecting temporality into entailment graphs to address the problem of spurious entailments, which may arise from similar but temporally distinct events involving the same pair of entities. We focus on the sports domain in which the same pairs of teams play on different occasions, with different outcomes. We present an unsupervised model that aims to learn entailments…
▽ More
We present a novel method for injecting temporality into entailment graphs to address the problem of spurious entailments, which may arise from similar but temporally distinct events involving the same pair of entities. We focus on the sports domain in which the same pairs of teams play on different occasions, with different outcomes. We present an unsupervised model that aims to learn entailments such as win/lose $\rightarrow$ play, while avoiding the pitfall of learning non-entailments such as win $\not\rightarrow$ lose. We evaluate our model on a manually constructed dataset, showing that incorporating time intervals and applying a temporal window around them, are effective strategies.
△ Less
Submitted 20 September, 2021;
originally announced September 2021.
-
Multivalent Entailment Graphs for Question Answering
Authors:
Nick McKenna,
Liane Guillou,
Mohammad Javad Hosseini,
Sander Bijl de Vroe,
Mark Johnson,
Mark Steedman
Abstract:
Drawing inferences between open-domain natural language predicates is a necessity for true language understanding. There has been much progress in unsupervised learning of entailment graphs for this purpose. We make three contributions: (1) we reinterpret the Distributional Inclusion Hypothesis to model entailment between predicates of different valencies, like DEFEAT(Biden, Trump) entails WIN(Bid…
▽ More
Drawing inferences between open-domain natural language predicates is a necessity for true language understanding. There has been much progress in unsupervised learning of entailment graphs for this purpose. We make three contributions: (1) we reinterpret the Distributional Inclusion Hypothesis to model entailment between predicates of different valencies, like DEFEAT(Biden, Trump) entails WIN(Biden); (2) we actualize this theory by learning unsupervised Multivalent Entailment Graphs of open-domain predicates; and (3) we demonstrate the capabilities of these graphs on a novel question answering task. We show that directional entailment is more helpful for inference than bidirectional similarity on questions of fine-grained semantics. We also show that drawing on evidence across valencies answers more questions than by using only the same valency evidence.
△ Less
Submitted 19 September, 2021; v1 submitted 15 April, 2021;
originally announced April 2021.
-
Spatial Language Representation with Multi-Level Geocoding
Authors:
Sayali Kulkarni,
Shailee Jain,
Mohammad Javad Hosseini,
Jason Baldridge,
Eugene Ie,
Li Zhang
Abstract:
We present a multi-level geocoding model (MLG) that learns to associate texts to geographic locations. The Earth's surface is represented using space-filling curves that decompose the sphere into a hierarchy of similarly sized, non-overlap** cells. MLG balances generalization and accuracy by combining losses across multiple levels and predicting cells at each level simultaneously. Without using…
▽ More
We present a multi-level geocoding model (MLG) that learns to associate texts to geographic locations. The Earth's surface is represented using space-filling curves that decompose the sphere into a hierarchy of similarly sized, non-overlap** cells. MLG balances generalization and accuracy by combining losses across multiple levels and predicting cells at each level simultaneously. Without using any dataset-specific tuning, we show that MLG obtains state-of-the-art results for toponym resolution on three English datasets. Furthermore, it obtains large gains without any knowledge base metadata, demonstrating that it can effectively learn the connection between text spans and coordinates - and thus can be extended to toponymns not present in knowledge bases.
△ Less
Submitted 20 August, 2020;
originally announced August 2020.
-
Jointly Modeling Hierarchical and Horizontal Features for Relational Triple Extraction
Authors:
Zhepei Wei,
Yantao Jia,
Yuan Tian,
Mohammad Javad Hosseini,
Sujian Li,
Mark Steedman,
Yi Chang
Abstract:
Recent works on relational triple extraction have shown the superiority of jointly extracting entities and relations over the pipelined extraction manner. However, most existing joint models fail to balance the modeling of entity features and the joint decoding strategy, and thus the interactions between the entity level and triple level are not fully investigated. In this work, we first introduce…
▽ More
Recent works on relational triple extraction have shown the superiority of jointly extracting entities and relations over the pipelined extraction manner. However, most existing joint models fail to balance the modeling of entity features and the joint decoding strategy, and thus the interactions between the entity level and triple level are not fully investigated. In this work, we first introduce the hierarchical dependency and horizontal commonality between the two levels, and then propose an entity-enhanced dual tagging framework that enables the triple extraction (TE) task to utilize such interactions with self-learned entity features through an auxiliary entity extraction (EE) task, without breaking the joint decoding of relational triples. Specifically, we align the EE and TE tasks in a position-wise manner by formulating them as two sequence labeling problems with identical encoder-decoder structure. Moreover, the two tasks are organized in a carefully designed parameter sharing setting so that the learned entity features could be naturally shared via multi-task learning. Empirical experiments on the NYT benchmark demonstrate the effectiveness of the proposed framework compared to the state-of-the-art methods.
△ Less
Submitted 3 May, 2022; v1 submitted 23 August, 2019;
originally announced August 2019.