Search | arXiv e-print repository

Differentiable Reasoning about Knowledge Graphs with Region-based Graph Neural Networks

Authors: Aleksandar Pavlovic, Emanuel Sallinger, Steven Schockaert

Abstract: Methods for knowledge graph (KG) completion need to capture semantic regularities and use these regularities to infer plausible knowledge that is not explicitly stated. Most embedding-based methods are opaque in the kinds of regularities they can capture, although region-based KG embedding models have emerged as a more transparent alternative. By modeling relations as geometric regions in high-dim… ▽ More Methods for knowledge graph (KG) completion need to capture semantic regularities and use these regularities to infer plausible knowledge that is not explicitly stated. Most embedding-based methods are opaque in the kinds of regularities they can capture, although region-based KG embedding models have emerged as a more transparent alternative. By modeling relations as geometric regions in high-dimensional vector spaces, such models can explicitly capture semantic regularities in terms of the spatial arrangement of these regions. Unfortunately, existing region-based approaches are severely limited in the kinds of rules they can capture. We argue that this limitation arises because the considered regions are defined as the Cartesian product of two-dimensional regions. As an alternative, in this paper, we propose RESHUFFLE, a simple model based on ordering constraints that can faithfully capture a much larger class of rule bases than existing approaches. Moreover, the embeddings in our framework can be learned by a monotonic Graph Neural Network (GNN), which effectively acts as a differentiable rule base. This approach has the important advantage that embeddings can be easily updated as new knowledge is added to the KG. At the same time, since the resulting representations can be used similarly to standard KG embeddings, our approach is significantly more efficient than existing approaches to differentiable reasoning. △ Less

Submitted 13 June, 2024; originally announced June 2024.

arXiv:2403.17216 [pdf, other]

Ontology Completion with Natural Language Inference and Concept Embeddings: An Analysis

Authors: Na Li, Thomas Bailleux, Zied Bouraoui, Steven Schockaert

Abstract: We consider the problem of finding plausible knowledge that is missing from a given ontology, as a generalisation of the well-studied taxonomy expansion task. One line of work treats this task as a Natural Language Inference (NLI) problem, thus relying on the knowledge captured by language models to identify the missing knowledge. Another line of work uses concept embeddings to identify what diffe… ▽ More We consider the problem of finding plausible knowledge that is missing from a given ontology, as a generalisation of the well-studied taxonomy expansion task. One line of work treats this task as a Natural Language Inference (NLI) problem, thus relying on the knowledge captured by language models to identify the missing knowledge. Another line of work uses concept embeddings to identify what different concepts have in common, taking inspiration from cognitive models for category based induction. These two approaches are intuitively complementary, but their effectiveness has not yet been compared. In this paper, we introduce a benchmark for evaluating ontology completion methods and thoroughly analyse the strengths and weaknesses of both approaches. We find that both approaches are indeed complementary, with hybrid strategies achieving the best overall results. We also find that the task is highly challenging for Large Language Models, even after fine-tuning. △ Less

Submitted 25 March, 2024; originally announced March 2024.

arXiv:2403.16984 [pdf, other]

Modelling Commonsense Commonalities with Multi-Facet Concept Embeddings

Authors: Hanane Kteich, Na Li, Usashi Chatterjee, Zied Bouraoui, Steven Schockaert

Abstract: Concept embeddings offer a practical and efficient mechanism for injecting commonsense knowledge into downstream tasks. Their core purpose is often not to predict the commonsense properties of concepts themselves, but rather to identify commonalities, i.e.\ sets of concepts which share some property of interest. Such commonalities are the basis for inductive generalisation, hence high-quality conc… ▽ More Concept embeddings offer a practical and efficient mechanism for injecting commonsense knowledge into downstream tasks. Their core purpose is often not to predict the commonsense properties of concepts themselves, but rather to identify commonalities, i.e.\ sets of concepts which share some property of interest. Such commonalities are the basis for inductive generalisation, hence high-quality concept embeddings can make learning easier and more robust. Unfortunately, standard embeddings primarily reflect basic taxonomic categories, making them unsuitable for finding commonalities that refer to more specific aspects (e.g.\ the colour of objects or the materials they are made of). In this paper, we address this limitation by explicitly modelling the different facets of interest when learning concept embeddings. We show that this leads to embeddings which capture a more diverse range of commonsense properties, and consistently improves results in downstream tasks such as ultra-fine entity ty** and ontology completion. △ Less

Submitted 4 June, 2024; v1 submitted 25 March, 2024; originally announced March 2024.

arXiv:2402.15337 [pdf, other]

Ranking Entities along Conceptual Space Dimensions with LLMs: An Analysis of Fine-Tuning Strategies

Authors: Nitesh Kumar, Usashi Chatterjee, Steven Schockaert

Abstract: Conceptual spaces represent entities in terms of their primitive semantic features. Such representations are highly valuable but they are notoriously difficult to learn, especially when it comes to modelling perceptual and subjective features. Distilling conceptual spaces from Large Language Models (LLMs) has recently emerged as a promising strategy, but existing work has been limited to probing p… ▽ More Conceptual spaces represent entities in terms of their primitive semantic features. Such representations are highly valuable but they are notoriously difficult to learn, especially when it comes to modelling perceptual and subjective features. Distilling conceptual spaces from Large Language Models (LLMs) has recently emerged as a promising strategy, but existing work has been limited to probing pre-trained LLMs using relatively simple zero-shot strategies. We focus in particular on the task of ranking entities according to a given conceptual space dimension. Unfortunately, we cannot directly fine-tune LLMs on this task, because ground truth rankings for conceptual space dimensions are rare. We therefore use more readily available features as training data and analyse whether the ranking capabilities of the resulting models transfer to perceptual and subjective features. We find that this is indeed the case, to some extent, but having at least some perceptual and subjective features in the training data seems essential for achieving the best results. △ Less

Submitted 5 June, 2024; v1 submitted 23 February, 2024; originally announced February 2024.

Comments: Accepted in ACL 2024 (Findings)

arXiv:2401.16270 [pdf, other]

Capturing Knowledge Graphs and Rules with Octagon Embeddings

Authors: Victor Charpenay, Steven Schockaert

Abstract: Region based knowledge graph embeddings represent relations as geometric regions. This has the advantage that the rules which are captured by the model are made explicit, making it straightforward to incorporate prior knowledge and to inspect learned models. Unfortunately, existing approaches are severely restricted in their ability to model relational composition, and hence also their ability to… ▽ More Region based knowledge graph embeddings represent relations as geometric regions. This has the advantage that the rules which are captured by the model are made explicit, making it straightforward to incorporate prior knowledge and to inspect learned models. Unfortunately, existing approaches are severely restricted in their ability to model relational composition, and hence also their ability to model rules, thus failing to deliver on the main promise of region based models. With the aim of addressing these limitations, we investigate regions which are composed of axis-aligned octagons. Such octagons are particularly easy to work with, as intersections and compositions can be straightforwardly computed, while they are still sufficiently expressive to model arbitrary knowledge graphs. Among others, we also show that our octagon embeddings can properly capture a non-trivial class of rule bases. Finally, we show that our model achieves competitive experimental results. △ Less

Submitted 18 June, 2024; v1 submitted 29 January, 2024; originally announced January 2024.

Comments: Accepted for IJCAI 2024

arXiv:2312.11062 [pdf, other]

Entity or Relation Embeddings? An Analysis of Encoding Strategies for Relation Extraction

Authors: Frank Mtumbuka, Steven Schockaert

Abstract: Relation extraction is essentially a text classification problem, which can be tackled by fine-tuning a pre-trained language model (LM). However, a key challenge arises from the fact that relation extraction cannot straightforwardly be reduced to sequence or token classification. Existing approaches therefore solve the problem in an indirect way: they fine-tune an LM to learn embeddings of the hea… ▽ More Relation extraction is essentially a text classification problem, which can be tackled by fine-tuning a pre-trained language model (LM). However, a key challenge arises from the fact that relation extraction cannot straightforwardly be reduced to sequence or token classification. Existing approaches therefore solve the problem in an indirect way: they fine-tune an LM to learn embeddings of the head and tail entities, and then predict the relationship from these entity embeddings. Our hypothesis in this paper is that relation extraction models can be improved by capturing relationships in a more direct way. In particular, we experiment with appending a prompt with a [MASK] token, whose contextualised representation is treated as a relation embedding. While, on its own, this strategy significantly underperforms the aforementioned approach, we find that the resulting relation embeddings are highly complementary to what is captured by embeddings of the head and tail entity. By jointly considering both types of representations, we end up with a simple model that outperforms the state-of-the-art across several relation extraction benchmarks. △ Less

Submitted 18 December, 2023; originally announced December 2023.

arXiv:2310.14793 [pdf, other]

What do Deck Chairs and Sun Hats Have in Common? Uncovering Shared Properties in Large Concept Vocabularies

Authors: Amit Gajbhiye, Zied Bouraoui, Na Li, Usashi Chatterjee, Luis Espinosa Anke, Steven Schockaert

Abstract: Concepts play a central role in many applications. This includes settings where concepts have to be modelled in the absence of sentence context. Previous work has therefore focused on distilling decontextualised concept embeddings from language models. But concepts can be modelled from different perspectives, whereas concept embeddings typically mostly capture taxonomic structure. To address this… ▽ More Concepts play a central role in many applications. This includes settings where concepts have to be modelled in the absence of sentence context. Previous work has therefore focused on distilling decontextualised concept embeddings from language models. But concepts can be modelled from different perspectives, whereas concept embeddings typically mostly capture taxonomic structure. To address this issue, we propose a strategy for identifying what different concepts, from a potentially large concept vocabulary, have in common with others. We then represent concepts in terms of the properties they share with the other concepts. To demonstrate the practical usefulness of this way of modelling concepts, we consider the task of ultra-fine entity ty**, which is a challenging multi-label classification problem. We show that by augmenting the label set with shared properties, we can improve the performance of the state-of-the-art models for this task. △ Less

Submitted 23 October, 2023; originally announced October 2023.

Comments: Accepted for EMNLP 2023

arXiv:2310.12379 [pdf, other]

Solving Hard Analogy Questions with Relation Embedding Chains

Authors: Nitesh Kumar, Steven Schockaert

Abstract: Modelling how concepts are related is a central topic in Lexical Semantics. A common strategy is to rely on knowledge graphs (KGs) such as ConceptNet, and to model the relation between two concepts as a set of paths. However, KGs are limited to a fixed set of relation types, and they are incomplete and often noisy. Another strategy is to distill relation embeddings from a fine-tuned language model… ▽ More Modelling how concepts are related is a central topic in Lexical Semantics. A common strategy is to rely on knowledge graphs (KGs) such as ConceptNet, and to model the relation between two concepts as a set of paths. However, KGs are limited to a fixed set of relation types, and they are incomplete and often noisy. Another strategy is to distill relation embeddings from a fine-tuned language model. However, this is less suitable for words that are only indirectly related and it does not readily allow us to incorporate structured domain knowledge. In this paper, we aim to combine the best of both worlds. We model relations as paths but associate their edges with relation embeddings. The paths are obtained by first identifying suitable intermediate words and then selecting those words for which informative relation embeddings can be obtained. We empirically show that our proposed representations are useful for solving hard analogy questions. △ Less

Submitted 18 October, 2023; originally announced October 2023.

arXiv:2310.05481 [pdf, other]

Cabbage Sweeter than Cake? Analysing the Potential of Large Language Models for Learning Conceptual Spaces

Authors: Usashi Chatterjee, Amit Gajbhiye, Steven Schockaert

Abstract: The theory of Conceptual Spaces is an influential cognitive-linguistic framework for representing the meaning of concepts. Conceptual spaces are constructed from a set of quality dimensions, which essentially correspond to primitive perceptual features (e.g. hue or size). These quality dimensions are usually learned from human judgements, which means that applications of conceptual spaces tend to… ▽ More The theory of Conceptual Spaces is an influential cognitive-linguistic framework for representing the meaning of concepts. Conceptual spaces are constructed from a set of quality dimensions, which essentially correspond to primitive perceptual features (e.g. hue or size). These quality dimensions are usually learned from human judgements, which means that applications of conceptual spaces tend to be limited to narrow domains (e.g. modelling colour or taste). Encouraged by recent findings about the ability of Large Language Models (LLMs) to learn perceptually grounded representations, we explore the potential of such models for learning conceptual spaces. Our experiments show that LLMs can indeed be used for learning meaningful representations to some extent. However, we also find that fine-tuned models of the BERT family are able to match or even outperform the largest GPT-3 model, despite being 2 to 3 orders of magnitude smaller. △ Less

Submitted 9 October, 2023; originally announced October 2023.

Comments: Accepted for EMNLP 2023

arXiv:2310.00299 [pdf, other]

RelBERT: Embedding Relations with Language Models

Authors: Asahi Ushio, Jose Camacho-Collados, Steven Schockaert

Abstract: Many applications need access to background knowledge about how different concepts and entities are related. Although Knowledge Graphs (KG) and Large Language Models (LLM) can address this need to some extent, KGs are inevitably incomplete and their relational schema is often too coarse-grained, while LLMs are inefficient and difficult to control. As an alternative, we propose to extract relation… ▽ More Many applications need access to background knowledge about how different concepts and entities are related. Although Knowledge Graphs (KG) and Large Language Models (LLM) can address this need to some extent, KGs are inevitably incomplete and their relational schema is often too coarse-grained, while LLMs are inefficient and difficult to control. As an alternative, we propose to extract relation embeddings from relatively small language models. In particular, we show that masked language models such as RoBERTa can be straightforwardly fine-tuned for this purpose, using only a small amount of training data. The resulting model, which we call RelBERT, captures relational similarity in a surprisingly fine-grained way, allowing us to set a new state-of-the-art in analogy benchmarks. Crucially, RelBERT is capable of modelling relations that go well beyond what the model has seen during training. For instance, we obtained strong results on relations between named entities with a model that was only trained on lexical relations between concepts, and we observed that RelBERT can recognise morphological analogies despite not being trained on such examples. Overall, we find that RelBERT significantly outperforms strategies based on prompting language models that are several orders of magnitude larger, including recent GPT-based models and open source models. △ Less

Submitted 8 October, 2023; v1 submitted 30 September, 2023; originally announced October 2023.

arXiv:2309.15217 [pdf, other]

RAGAS: Automated Evaluation of Retrieval Augmented Generation

Authors: Shahul Es, Jithin James, Luis Espinosa-Anke, Steven Schockaert

Abstract: We introduce RAGAs (Retrieval Augmented Generation Assessment), a framework for reference-free evaluation of Retrieval Augmented Generation (RAG) pipelines. RAG systems are composed of a retrieval and an LLM based generation module, and provide LLMs with knowledge from a reference textual database, which enables them to act as a natural language layer between a user and textual databases, reducing… ▽ More We introduce RAGAs (Retrieval Augmented Generation Assessment), a framework for reference-free evaluation of Retrieval Augmented Generation (RAG) pipelines. RAG systems are composed of a retrieval and an LLM based generation module, and provide LLMs with knowledge from a reference textual database, which enables them to act as a natural language layer between a user and textual databases, reducing the risk of hallucinations. Evaluating RAG architectures is, however, challenging because there are several dimensions to consider: the ability of the retrieval system to identify relevant and focused context passages, the ability of the LLM to exploit such passages in a faithful way, or the quality of the generation itself. With RAGAs, we put forward a suite of metrics which can be used to evaluate these different dimensions \textit{without having to rely on ground truth human annotations}. We posit that such a framework can crucially contribute to faster evaluation cycles of RAG architectures, which is especially important given the fast adoption of LLMs. △ Less

Submitted 26 September, 2023; originally announced September 2023.

Comments: Reference-free (not tied to having ground truth available) evaluation framework for retrieval agumented generation

arXiv:2308.07942 [pdf, other]

Inductive Knowledge Graph Completion with GNNs and Rules: An Analysis

Authors: Akash Anil, Víctor Gutiérrez-Basulto, Yazmín Ibañéz-García, Steven Schockaert

Abstract: The task of inductive knowledge graph completion requires models to learn inference patterns from a training graph, which can then be used to make predictions on a disjoint test graph. Rule-based methods seem like a natural fit for this task, but in practice they significantly underperform state-of-the-art methods based on Graph Neural Networks (GNNs), such as NBFNet. We hypothesise that the under… ▽ More The task of inductive knowledge graph completion requires models to learn inference patterns from a training graph, which can then be used to make predictions on a disjoint test graph. Rule-based methods seem like a natural fit for this task, but in practice they significantly underperform state-of-the-art methods based on Graph Neural Networks (GNNs), such as NBFNet. We hypothesise that the underperformance of rule-based methods is due to two factors: (i) implausible entities are not ranked at all and (ii) only the most informative path is taken into account when determining the confidence in a given link prediction answer. To analyse the impact of these factors, we study a number of variants of a rule-based approach, which are specifically aimed at addressing the aforementioned issues. We find that the resulting models can achieve a performance which is close to that of NBFNet. Crucially, the considered variants only use a small fraction of the evidence that NBFNet relies on, which means that they largely keep the interpretability advantage of rule-based methods. Moreover, we show that a further variant, which does look at the full KG, consistently outperforms NBFNet. △ Less

Submitted 24 March, 2024; v1 submitted 14 August, 2023; originally announced August 2023.

arXiv:2305.15002 [pdf, other]

A RelEntLess Benchmark for Modelling Graded Relations between Named Entities

Authors: Asahi Ushio, Jose Camacho Collados, Steven Schockaert

Abstract: Relations such as "is influenced by", "is known for" or "is a competitor of" are inherently graded: we can rank entity pairs based on how well they satisfy these relations, but it is hard to draw a line between those pairs that satisfy them and those that do not. Such graded relations play a central role in many applications, yet they are typically not covered by existing Knowledge Graphs. In this… ▽ More Relations such as "is influenced by", "is known for" or "is a competitor of" are inherently graded: we can rank entity pairs based on how well they satisfy these relations, but it is hard to draw a line between those pairs that satisfy them and those that do not. Such graded relations play a central role in many applications, yet they are typically not covered by existing Knowledge Graphs. In this paper, we consider the possibility of using Large Language Models (LLMs) to fill this gap. To this end, we introduce a new benchmark, in which entity pairs have to be ranked according to how much they satisfy a given graded relation. The task is formulated as a few-shot ranking problem, where models only have access to a description of the relation and five prototypical instances. We use the proposed benchmark to evaluate state-of-the-art relation embedding strategies as well as several recent LLMs, covering both publicly available LLMs and closed models such as GPT-4. Overall, we find a strong correlation between model size and performance, with smaller Language Models struggling to outperform a naive baseline. The results of the largest Flan-T5 and OPT models are remarkably strong, although a clear gap with human performance remains. △ Less

Submitted 30 January, 2024; v1 submitted 24 May, 2023; originally announced May 2023.

Comments: EACL 2024 main conference

arXiv:2305.12924 [pdf, other]

EnCore: Fine-Grained Entity Ty** by Pre-Training Entity Encoders on Coreference Chains

Authors: Frank Mtumbuka, Steven Schockaert

Abstract: Entity ty** is the task of assigning semantic types to the entities that are mentioned in a text. In the case of fine-grained entity ty** (FET), a large set of candidate type labels is considered. Since obtaining sufficient amounts of manual annotations is then prohibitively expensive, FET models are typically trained using distant supervision. In this paper, we propose to improve on this proc… ▽ More Entity ty** is the task of assigning semantic types to the entities that are mentioned in a text. In the case of fine-grained entity ty** (FET), a large set of candidate type labels is considered. Since obtaining sufficient amounts of manual annotations is then prohibitively expensive, FET models are typically trained using distant supervision. In this paper, we propose to improve on this process by pre-training an entity encoder such that embeddings of coreferring entities are more similar to each other than to the embeddings of other entities. The main problem with this strategy, which helps to explain why it has not previously been considered, is that predicted coreference links are often too noisy. We show that this problem can be addressed by using a simple trick: we only consider coreference links that are predicted by two different off-the-shelf systems. With this prudent use of coreference links, our pre-training strategy allows us to improve the state-of-the-art in benchmarks on fine-grained entity ty**, as well as traditional entity extraction. △ Less

Submitted 27 January, 2024; v1 submitted 22 May, 2023; originally announced May 2023.

Comments: To appear at EACL 2024

arXiv:2305.12802 [pdf, other]

Ultra-Fine Entity Ty** with Prior Knowledge about Labels: A Simple Clustering Based Strategy

Authors: Na Li, Zied Bouraoui, Steven Schockaert

Abstract: Ultra-fine entity ty** (UFET) is the task of inferring the semantic types, from a large set of fine-grained candidates, that apply to a given entity mention. This task is especially challenging because we only have a small number of training examples for many of the types, even with distant supervision strategies. State-of-the-art models, therefore, have to rely on prior knowledge about the type… ▽ More Ultra-fine entity ty** (UFET) is the task of inferring the semantic types, from a large set of fine-grained candidates, that apply to a given entity mention. This task is especially challenging because we only have a small number of training examples for many of the types, even with distant supervision strategies. State-of-the-art models, therefore, have to rely on prior knowledge about the type labels in some way. In this paper, we show that the performance of existing methods can be improved using a simple technique: we use pre-trained label embeddings to cluster the labels into semantic domains and then treat these domains as additional types. We show that this strategy consistently leads to improved results, as long as high-quality label embeddings are used. We furthermore use the label clusters as part of a simple post-processing technique, which results in further performance gains. Both strategies treat the UFET model as a black box and can thus straightforwardly be used to improve a wide range of existing models. △ Less

Submitted 22 May, 2023; originally announced May 2023.

arXiv:2305.09785 [pdf, other]

Distilling Semantic Concept Embeddings from Contrastively Fine-Tuned Language Models

Authors: Na Li, Hanane Kteich, Zied Bouraoui, Steven Schockaert

Abstract: Learning vectors that capture the meaning of concepts remains a fundamental challenge. Somewhat surprisingly, perhaps, pre-trained language models have thus far only enabled modest improvements to the quality of such concept embeddings. Current strategies for using language models typically represent a concept by averaging the contextualised representations of its mentions in some corpus. This is… ▽ More Learning vectors that capture the meaning of concepts remains a fundamental challenge. Somewhat surprisingly, perhaps, pre-trained language models have thus far only enabled modest improvements to the quality of such concept embeddings. Current strategies for using language models typically represent a concept by averaging the contextualised representations of its mentions in some corpus. This is potentially sub-optimal for at least two reasons. First, contextualised word vectors have an unusual geometry, which hampers downstream tasks. Second, concept embeddings should capture the semantic properties of concepts, whereas contextualised word vectors are also affected by other factors. To address these issues, we propose two contrastive learning strategies, based on the view that whenever two sentences reveal similar properties, the corresponding contextualised vectors should also be similar. One strategy is fully unsupervised, estimating the properties which are expressed in a sentence from the neighbourhood structure of the contextualised word embeddings. The second strategy instead relies on a distant supervision signal from ConceptNet. Our experimental results show that the resulting vectors substantially outperform existing concept embeddings in predicting the semantic properties of concepts, with the ConceptNet-based strategy achieving the best results. These findings are furthermore confirmed in a clustering task and in the downstream task of ontology completion. △ Less

Submitted 16 May, 2023; originally announced May 2023.

arXiv:2305.08414 [pdf, other]

What's the Meaning of Superhuman Performance in Today's NLU?

Authors: Simone Tedeschi, Johan Bos, Thierry Declerck, Jan Hajic, Daniel Hershcovich, Eduard H. Hovy, Alexander Koller, Simon Krek, Steven Schockaert, Rico Sennrich, Ekaterina Shutova, Roberto Navigli

Abstract: In the last five years, there has been a significant focus in Natural Language Processing (NLP) on develo** larger Pretrained Language Models (PLMs) and introducing benchmarks such as SuperGLUE and SQuAD to measure their abilities in language understanding, reasoning, and reading comprehension. These PLMs have achieved impressive results on these benchmarks, even surpassing human performance in… ▽ More In the last five years, there has been a significant focus in Natural Language Processing (NLP) on develo** larger Pretrained Language Models (PLMs) and introducing benchmarks such as SuperGLUE and SQuAD to measure their abilities in language understanding, reasoning, and reading comprehension. These PLMs have achieved impressive results on these benchmarks, even surpassing human performance in some cases. This has led to claims of superhuman capabilities and the provocative idea that certain tasks have been solved. In this position paper, we take a critical look at these claims and ask whether PLMs truly have superhuman abilities and what the current benchmarks are really evaluating. We show that these benchmarks have serious limitations affecting the comparison between humans and PLMs and provide recommendations for fairer and more transparent benchmarks. △ Less

Submitted 15 May, 2023; originally announced May 2023.

Comments: 9 pages, long paper at ACL 2023 proceedings

arXiv:2210.05723 [pdf, other]

Embeddings as Epistemic States: Limitations on the Use of Pooling Operators for Accumulating Knowledge

Authors: Steven Schockaert

Abstract: Various neural network architectures rely on pooling operators to aggregate information coming from different sources. It is often implicitly assumed in such contexts that vectors encode epistemic states, i.e. that vectors capture the evidence that has been obtained about some properties of interest, and that pooling these vectors yields a vector that combines this evidence. We study, for a number… ▽ More Various neural network architectures rely on pooling operators to aggregate information coming from different sources. It is often implicitly assumed in such contexts that vectors encode epistemic states, i.e. that vectors capture the evidence that has been obtained about some properties of interest, and that pooling these vectors yields a vector that combines this evidence. We study, for a number of standard pooling operators, under what conditions they are compatible with this idea, which we call the epistemic pooling principle. While we find that all the considered pooling operators can satisfy the epistemic pooling principle, this only holds when embeddings are sufficiently high-dimensional and, for most pooling operators, when the embeddings satisfy particular constraints (e.g. having non-negative coordinates). We furthermore show that these constraints have important implications on how the embeddings can be used in practice. In particular, we find that when the epistemic pooling principle is satisfied, in most cases it is impossible to verify the satisfaction of propositional formulas using linear scoring functions, with two exceptions: (i) max-pooling with embeddings that are upper-bounded and (ii) Hadamard pooling with non-negative embeddings. This finding helps to clarify, among others, why Graph Neural Networks sometimes under-perform in reasoning tasks. Finally, we also study an extension of the epistemic pooling principle to weighted epistemic states, which are important in the context of non-monotonic reasoning, where max-pooling emerges as the most suitable operator. △ Less

Submitted 3 July, 2023; v1 submitted 11 October, 2022; originally announced October 2022.

arXiv:2210.02771 [pdf, other]

Modelling Commonsense Properties using Pre-Trained Bi-Encoders

Authors: Amit Gajbhiye, Luis Espinosa-Anke, Steven Schockaert

Abstract: Gras** the commonsense properties of everyday concepts is an important prerequisite to language understanding. While contextualised language models are reportedly capable of predicting such commonsense properties with human-level accuracy, we argue that such results have been inflated because of the high similarity between training and test concepts. This means that models which capture concept… ▽ More Gras** the commonsense properties of everyday concepts is an important prerequisite to language understanding. While contextualised language models are reportedly capable of predicting such commonsense properties with human-level accuracy, we argue that such results have been inflated because of the high similarity between training and test concepts. This means that models which capture concept similarity can perform well, even if they do not capture any knowledge of the commonsense properties themselves. In settings where there is no overlap between the properties that are considered during training and testing, we find that the empirical performance of standard language models drops dramatically. To address this, we study the possibility of fine-tuning language models to explicitly model concepts and their properties. In particular, we train separate concept and property encoders on two types of readily available data: extracted hyponym-hypernym pairs and generic sentences. Our experimental results show that the resulting encoders allow us to predict commonsense properties with much higher accuracy than is possible by directly fine-tuning language models. We also present experimental results for the related task of unsupervised hypernym discovery. △ Less

Submitted 6 October, 2022; originally announced October 2022.

Comments: COLING 2022

arXiv:2112.01037 [pdf, other]

Inferring Prototypes for Multi-Label Few-Shot Image Classification with Word Vector Guided Attention

Authors: Kun Yan, Chenbin Zhang, Jun Hou, ** Wang, Zied Bouraoui, Shoaib Jameel, Steven Schockaert

Abstract: Multi-label few-shot image classification (ML-FSIC) is the task of assigning descriptive labels to previously unseen images, based on a small number of training examples. A key feature of the multi-label setting is that images often have multiple labels, which typically refer to different regions of the image. When estimating prototypes, in a metric-based setting, it is thus important to determine… ▽ More Multi-label few-shot image classification (ML-FSIC) is the task of assigning descriptive labels to previously unseen images, based on a small number of training examples. A key feature of the multi-label setting is that images often have multiple labels, which typically refer to different regions of the image. When estimating prototypes, in a metric-based setting, it is thus important to determine which regions are relevant for which labels, but the limited amount of training data makes this highly challenging. As a solution, in this paper we propose to use word embeddings as a form of prior knowledge about the meaning of the labels. In particular, visual prototypes are obtained by aggregating the local feature maps of the support images, using an attention mechanism that relies on the label embeddings. As an important advantage, our model can infer prototypes for unseen labels without the need for fine-tuning any model parameters, which demonstrates its strong generalization abilities. Experiments on COCO and PASCAL VOC furthermore show that our model substantially improves the current state-of-the-art. △ Less

Submitted 7 December, 2021; v1 submitted 2 December, 2021; originally announced December 2021.

Comments: Accepted by AAAI2022

arXiv:2110.15705 [pdf, other]

doi 10.18653/v1/2021.emnlp-main.712

Distilling Relation Embeddings from Pre-trained Language Models

Authors: Asahi Ushio, Jose Camacho-Collados, Steven Schockaert

Abstract: Pre-trained language models have been found to capture a surprisingly rich amount of lexical knowledge, ranging from commonsense properties of everyday concepts to detailed factual knowledge about named entities. Among others, this makes it possible to distill high-quality word vectors from pre-trained language models. However, it is currently unclear to what extent it is possible to distill relat… ▽ More Pre-trained language models have been found to capture a surprisingly rich amount of lexical knowledge, ranging from commonsense properties of everyday concepts to detailed factual knowledge about named entities. Among others, this makes it possible to distill high-quality word vectors from pre-trained language models. However, it is currently unclear to what extent it is possible to distill relation embeddings, i.e. vectors that characterize the relationship between two words. Such relation embeddings are appealing because they can, in principle, encode relational knowledge in a more fine-grained way than is possible with knowledge graphs. To obtain relation embeddings from a pre-trained language model, we encode word pairs using a (manually or automatically generated) prompt, and we fine-tune the language model such that relationally similar word pairs yield similar output vectors. We find that the resulting relation embeddings are highly competitive on analogy (unsupervised) and relation classification (supervised) benchmarks, even without any task-specific fine-tuning. Source code to reproduce our experimental results and the model checkpoints are available in the following repository: https://github.com/asahi417/relbert △ Less

Submitted 21 September, 2021; originally announced October 2021.

Comments: EMNLP 2021 main conference

arXiv:2106.14431 [pdf, ps, other]

Modelling Monotonic and Non-Monotonic Attribute Dependencies with Embeddings: A Theoretical Analysis

Authors: Steven Schockaert

Abstract: During the last decade, entity embeddings have become ubiquitous in Artificial Intelligence. Such embeddings essentially serve as compact but semantically meaningful representations of the entities of interest. In most approaches, vectors are used for representing the entities themselves, as well as for representing their associated attributes. An important advantage of using attribute embeddings… ▽ More During the last decade, entity embeddings have become ubiquitous in Artificial Intelligence. Such embeddings essentially serve as compact but semantically meaningful representations of the entities of interest. In most approaches, vectors are used for representing the entities themselves, as well as for representing their associated attributes. An important advantage of using attribute embeddings is that (some of the) semantic dependencies between the attributes can thus be captured. However, little is known about what kinds of semantic dependencies can be modelled in this way. The aim of this paper is to shed light on this question, focusing on settings where the embedding of an entity is obtained by pooling the embeddings of its known attributes. Our particular focus is on studying the theoretical limitations of different embedding strategies, rather than their ability to effectively learn attribute dependencies in practice. We first show a number of negative results, revealing that some of the most popular embedding models are not able to capture even basic Horn rules. However, we also find that some embedding strategies are capable, in principle, of modelling both monotonic and non-monotonic attribute dependencies. △ Less

Submitted 14 September, 2021; v1 submitted 28 June, 2021; originally announced June 2021.

Comments: Accepted for AKBC 2021

arXiv:2106.07947 [pdf, other]

Deriving Word Vectors from Contextualized Language Models using Topic-Aware Mention Selection

Authors: Yixiao Wang, Zied Bouraoui, Luis Espinosa Anke, Steven Schockaert

Abstract: One of the long-standing challenges in lexical semantics consists in learning representations of words which reflect their semantic properties. The remarkable success of word embeddings for this purpose suggests that high-quality representations can be obtained by summarizing the sentence contexts of word mentions. In this paper, we propose a method for learning word representations that follows t… ▽ More One of the long-standing challenges in lexical semantics consists in learning representations of words which reflect their semantic properties. The remarkable success of word embeddings for this purpose suggests that high-quality representations can be obtained by summarizing the sentence contexts of word mentions. In this paper, we propose a method for learning word representations that follows this basic strategy, but differs from standard word embeddings in two important ways. First, we take advantage of contextualized language models (CLMs) rather than bags of word vectors to encode contexts. Second, rather than learning a word vector directly, we use a topic model to partition the contexts in which words appear, and then learn different topic-specific vectors for each word. Finally, we use a task-specific supervision signal to make a soft selection of the resulting vectors. We show that this simple strategy leads to high-quality word vectors, which are more predictive of semantic properties than word embeddings and existing CLM-based strategies. △ Less

Submitted 15 June, 2021; originally announced June 2021.

arXiv:2106.07285 [pdf, other]

Probing Pre-Trained Language Models for Disease Knowledge

Authors: Israa Alghanmi, Luis Espinosa-Anke, Steven Schockaert

Abstract: Pre-trained language models such as ClinicalBERT have achieved impressive results on tasks such as medical Natural Language Inference. At first glance, this may suggest that these models are able to perform medical reasoning tasks, such as map** symptoms to diseases. However, we find that standard benchmarks such as MedNLI contain relatively few examples that require such forms of reasoning. To… ▽ More Pre-trained language models such as ClinicalBERT have achieved impressive results on tasks such as medical Natural Language Inference. At first glance, this may suggest that these models are able to perform medical reasoning tasks, such as map** symptoms to diseases. However, we find that standard benchmarks such as MedNLI contain relatively few examples that require such forms of reasoning. To better understand the medical reasoning capabilities of existing language models, in this paper we introduce DisKnE, a new benchmark for Disease Knowledge Evaluation. To construct this benchmark, we annotated each positive MedNLI example with the types of medical reasoning that are needed. We then created negative examples by corrupting these positive examples in an adversarial way. Furthermore, we define training-test splits per disease, ensuring that no knowledge about test diseases can be learned from the training data, and we canonicalize the formulation of the hypotheses to avoid the presence of artefacts. This leads to a number of binary classification problems, one for each type of reasoning and each disease. When analysing pre-trained models for the clinical/biomedical domain on the proposed benchmark, we find that their performance drops considerably. △ Less

Submitted 14 June, 2021; originally announced June 2021.

Comments: Accepted by ACL 2021 Findings

arXiv:2105.10195 [pdf, other]

Aligning Visual Prototypes with BERT Embeddings for Few-Shot Learning

Authors: Kun Yan, Zied Bouraoui, ** Wang, Shoaib Jameel, Steven Schockaert

Abstract: Few-shot learning (FSL) is the task of learning to recognize previously unseen categories of images from a small number of training examples. This is a challenging task, as the available examples may not be enough to unambiguously determine which visual features are most characteristic of the considered categories. To alleviate this issue, we propose a method that additionally takes into account t… ▽ More Few-shot learning (FSL) is the task of learning to recognize previously unseen categories of images from a small number of training examples. This is a challenging task, as the available examples may not be enough to unambiguously determine which visual features are most characteristic of the considered categories. To alleviate this issue, we propose a method that additionally takes into account the names of the image classes. While the use of class names has already been explored in previous work, our approach differs in two key aspects. First, while previous work has aimed to directly predict visual prototypes from word embeddings, we found that better results can be obtained by treating visual and text-based prototypes separately. Second, we propose a simple strategy for learning class name embeddings using the BERT language model, which we found to substantially outperform the GloVe vectors that were used in previous work. We furthermore propose a strategy for dealing with the high dimensionality of these vectors, inspired by models for aligning cross-lingual word embeddings. We provide experiments on miniImageNet, CUB and tieredImageNet, showing that our approach consistently improves the state-of-the-art in metric-based FSL. △ Less

Submitted 21 May, 2021; originally announced May 2021.

Comments: Accepted by ICMR2021

arXiv:2105.04949 [pdf, other]

BERT is to NLP what AlexNet is to CV: Can Pre-Trained Language Models Identify Analogies?

Authors: Asahi Ushio, Luis Espinosa-Anke, Steven Schockaert, Jose Camacho-Collados

Abstract: Analogies play a central role in human commonsense reasoning. The ability to recognize analogies such as "eye is to seeing what ear is to hearing", sometimes referred to as analogical proportions, shape how we structure knowledge and understand language. Surprisingly, however, the task of identifying such analogies has not yet received much attention in the language model era. In this paper, we an… ▽ More Analogies play a central role in human commonsense reasoning. The ability to recognize analogies such as "eye is to seeing what ear is to hearing", sometimes referred to as analogical proportions, shape how we structure knowledge and understand language. Surprisingly, however, the task of identifying such analogies has not yet received much attention in the language model era. In this paper, we analyze the capabilities of transformer-based language models on this unsupervised task, using benchmarks obtained from educational settings, as well as more commonly used datasets. We find that off-the-shelf language models can identify analogies to a certain extent, but struggle with abstract and complex relations, and results are highly sensitive to model architecture and hyperparameters. Overall the best results were obtained with GPT-2 and RoBERTa, while configurations using BERT were not able to outperform word embedding models. Our results raise important questions for future work about how, and to what extent, pre-trained language models capture knowledge about abstract semantic relations. △ Less

Submitted 9 September, 2022; v1 submitted 11 May, 2021; originally announced May 2021.

Comments: Accepted by ACL 2021 main conference

arXiv:2105.04620 [pdf, ps, other]

A Description Logic for Analogical Reasoning

Authors: Steven Schockaert, Yazmín Ibáñez-García, Víctor Gutiérrez-Basulto

Abstract: Ontologies formalise how the concepts from a given domain are interrelated. Despite their clear potential as a backbone for explainable AI, existing ontologies tend to be highly incomplete, which acts as a significant barrier to their more widespread adoption. To mitigate this issue, we present a mechanism to infer plausible missing knowledge, which relies on reasoning by analogy. To the best of o… ▽ More Ontologies formalise how the concepts from a given domain are interrelated. Despite their clear potential as a backbone for explainable AI, existing ontologies tend to be highly incomplete, which acts as a significant barrier to their more widespread adoption. To mitigate this issue, we present a mechanism to infer plausible missing knowledge, which relies on reasoning by analogy. To the best of our knowledge, this is the first paper that studies analogical reasoning within the setting of description logic ontologies. After showing that the standard formalisation of analogical proportion has important limitations in this setting, we introduce an alternative semantics based on bijective map**s between sets of features. We then analyse the properties of analogies under the proposed semantics, and show among others how it enables two plausible inference patterns: rule translation and rule extrapolation. △ Less

Submitted 10 May, 2021; originally announced May 2021.

Comments: Accepted for IJCAI 2021

arXiv:2102.00801 [pdf, ps, other]

Few-shot Image Classification with Multi-Facet Prototypes

Authors: Kun Yan, Zied Bouraoui, ** Wang, Shoaib Jameel, Steven Schockaert

Abstract: The aim of few-shot learning (FSL) is to learn how to recognize image categories from a small number of training examples. A central challenge is that the available training examples are normally insufficient to determine which visual features are most characteristic of the considered categories. To address this challenge, we organize these visual features into facets, which intuitively group feat… ▽ More The aim of few-shot learning (FSL) is to learn how to recognize image categories from a small number of training examples. A central challenge is that the available training examples are normally insufficient to determine which visual features are most characteristic of the considered categories. To address this challenge, we organize these visual features into facets, which intuitively group features of the same kind (e.g. features that are relevant to shape, color, or texture). This is motivated from the assumption that (i) the importance of each facet differs from category to category and (ii) it is possible to predict facet importance from a pre-trained embedding of the category names. In particular, we propose an adaptive similarity measure, relying on predicted facet importance weights for a given set of categories. This measure can be used in combination with a wide array of existing metric-based methods. Experiments on miniImageNet and CUB show that our approach improves the state-of-the-art in metric-based FSL. △ Less

Submitted 1 February, 2021; originally announced February 2021.

Comments: Accepted by ICASSP2021

arXiv:2012.07580 [pdf, other]

Modelling General Properties of Nouns by Selectively Averaging Contextualised Embeddings

Authors: Na Li, Zied Bouraoui, Jose Camacho Collados, Luis Espinosa-Anke, Qing Gu, Steven Schockaert

Abstract: While the success of pre-trained language models has largely eliminated the need for high-quality static word vectors in many NLP applications, such vectors continue to play an important role in tasks where words need to be modelled in the absence of linguistic context. In this paper, we explore how the contextualised embeddings predicted by BERT can be used to produce high-quality word vectors fo… ▽ More While the success of pre-trained language models has largely eliminated the need for high-quality static word vectors in many NLP applications, such vectors continue to play an important role in tasks where words need to be modelled in the absence of linguistic context. In this paper, we explore how the contextualised embeddings predicted by BERT can be used to produce high-quality word vectors for such domains, in particular related to knowledge base completion, where our focus is on capturing the semantic properties of nouns. We find that a simple strategy of averaging the contextualised embeddings of masked word mentions leads to vectors that outperform the static word vectors learned by BERT, as well as those from standard word embedding models, in property induction tasks. We notice in particular that masking target words is critical to achieve this strong performance, as the resulting vectors focus less on idiosyncratic properties and more on general semantic properties. Inspired by this view, we propose a filtering strategy which is aimed at removing the most idiosyncratic mention vectors, allowing us to obtain further performance gains in property induction. △ Less

Submitted 17 May, 2021; v1 submitted 4 December, 2020; originally announced December 2020.

arXiv:2011.08320 [pdf, ps, other]

Don't Patronize Me! An Annotated Dataset with Patronizing and Condescending Language towards Vulnerable Communities

Authors: Carla Pérez-Almendros, Luis Espinosa-Anke, Steven Schockaert

Abstract: In this paper, we introduce a new annotated dataset which is aimed at supporting the development of NLP models to identify and categorize language that is patronizing or condescending towards vulnerable communities (e.g. refugees, homeless people, poor families). While the prevalence of such language in the general media has long been shown to have harmful effects, it differs from other types of h… ▽ More In this paper, we introduce a new annotated dataset which is aimed at supporting the development of NLP models to identify and categorize language that is patronizing or condescending towards vulnerable communities (e.g. refugees, homeless people, poor families). While the prevalence of such language in the general media has long been shown to have harmful effects, it differs from other types of harmful language, in that it is generally used unconsciously and with good intentions. We furthermore believe that the often subtle nature of patronizing and condescending language (PCL) presents an interesting technical challenge for the NLP community. Our analysis of the proposed dataset shows that identifying PCL is hard for standard NLP models, with language models such as BERT achieving the best results. △ Less

Submitted 16 November, 2020; originally announced November 2020.

arXiv:2006.14437 [pdf, other]

Plausible Reasoning about EL-Ontologies using Concept Interpolation

Authors: Yazmín Ibáñez-García, Víctor Gutiérrez-Basulto, Steven Schockaert

Abstract: Description logics (DLs) are standard knowledge representation languages for modelling ontologies, i.e. knowledge about concepts and the relations between them. Unfortunately, DL ontologies are difficult to learn from data and time-consuming to encode manually. As a result, ontologies for broad domains are almost inevitably incomplete. In recent years, several data-driven approaches have been prop… ▽ More Description logics (DLs) are standard knowledge representation languages for modelling ontologies, i.e. knowledge about concepts and the relations between them. Unfortunately, DL ontologies are difficult to learn from data and time-consuming to encode manually. As a result, ontologies for broad domains are almost inevitably incomplete. In recent years, several data-driven approaches have been proposed for automatically extending such ontologies. One family of methods rely on characterizations of concepts that are derived from text descriptions. While such characterizations do not capture ontological knowledge directly, they encode information about the similarity between different concepts, which can be exploited for filling in the gaps in existing ontologies. To this end, several inductive inference mechanisms have already been proposed, but these have been defined and used in a heuristic fashion. In this paper, we instead propose an inductive inference mechanism which is based on a clear model-theoretic semantics, and can thus be tightly integrated with standard deductive reasoning. We particularly focus on interpolation, a powerful commonsense reasoning mechanism which is closely related to cognitive models of category-based induction. Apart from the formalization of the underlying semantics, as our main technical contribution we provide computational complexity bounds for reasoning in EL with this interpolation mechanism. △ Less

Submitted 25 June, 2020; originally announced June 2020.

Comments: 16 pages, 3 figures, accepted at KR 2020

arXiv:1912.06612 [pdf, ps, other]

From Shallow to Deep Interactions Between Knowledge Representation, Reasoning and Machine Learning (Kay R. Amel group)

Authors: Zied Bouraoui, Antoine Cornuéjols, Thierry Denœux, Sébastien Destercke, Didier Dubois, Romain Guillaume, João Marques-Silva, Jérôme Mengin, Henri Prade, Steven Schockaert, Mathieu Serrurier, Christel Vrain

Abstract: This paper proposes a tentative and original survey of meeting points between Knowledge Representation and Reasoning (KRR) and Machine Learning (ML), two areas which have been develo** quite separately in the last three decades. Some common concerns are identified and discussed such as the types of used representation, the roles of knowledge and data, the lack or the excess of information, or th… ▽ More This paper proposes a tentative and original survey of meeting points between Knowledge Representation and Reasoning (KRR) and Machine Learning (ML), two areas which have been develo** quite separately in the last three decades. Some common concerns are identified and discussed such as the types of used representation, the roles of knowledge and data, the lack or the excess of information, or the need for explanations and causal understanding. Then some methodologies combining reasoning and learning are reviewed (such as inductive logic programming, neuro-symbolic reasoning, formal concept analysis, rule-based representations and ML, uncertainty in ML, or case-based reasoning and analogical reasoning), before discussing examples of synergies between KRR and ML (including topics such as belief functions on regression, EM algorithm versus revision, the semantic description of vector representations, the combination of deep learning with high level inference, knowledge graph completion, declarative frameworks for data mining, or preferences and recommendation). This paper is the first step of a work in progress aiming at a better mutual understanding of research in KRR and ML, and how they could cooperate. △ Less

Submitted 13 December, 2019; originally announced December 2019.

Comments: 53 pages

arXiv:1912.01220 [pdf, other]

Modelling Semantic Categories using Conceptual Neighborhood

Authors: Zied Bouraoui, Jose Camacho-Collados, Luis Espinosa-Anke, Steven Schockaert

Abstract: While many methods for learning vector space embeddings have been proposed in the field of Natural Language Processing, these methods typically do not distinguish between categories and individuals. Intuitively, if individuals are represented as vectors, we can think of categories as (soft) regions in the embedding space. Unfortunately, meaningful regions can be difficult to estimate, especially s… ▽ More While many methods for learning vector space embeddings have been proposed in the field of Natural Language Processing, these methods typically do not distinguish between categories and individuals. Intuitively, if individuals are represented as vectors, we can think of categories as (soft) regions in the embedding space. Unfortunately, meaningful regions can be difficult to estimate, especially since we often have few examples of individuals that belong to a given category. To address this issue, we rely on the fact that different categories are often highly interdependent. In particular, categories often have conceptual neighbors, which are disjoint from but closely related to the given category (e.g.\ fruit and vegetable). Our hypothesis is that more accurate category representations can be learned by relying on the assumption that the regions representing such conceptual neighbors should be adjacent in the embedding space. We propose a simple method for identifying conceptual neighbors and then show that incorporating these conceptual neighbors indeed leads to more accurate region based representations. △ Less

Submitted 3 December, 2019; originally announced December 2019.

Comments: Accepted to AAAI 2020

arXiv:1911.12753 [pdf, ps, other]

Inducing Relational Knowledge from BERT

Authors: Zied Bouraoui, Jose Camacho-Collados, Steven Schockaert

Abstract: One of the most remarkable properties of word embeddings is the fact that they capture certain types of semantic and syntactic relationships. Recently, pre-trained language models such as BERT have achieved groundbreaking results across a wide range of Natural Language Processing tasks. However, it is unclear to what extent such models capture relational knowledge beyond what is already captured b… ▽ More One of the most remarkable properties of word embeddings is the fact that they capture certain types of semantic and syntactic relationships. Recently, pre-trained language models such as BERT have achieved groundbreaking results across a wide range of Natural Language Processing tasks. However, it is unclear to what extent such models capture relational knowledge beyond what is already captured by standard word embeddings. To explore this question, we propose a methodology for distilling relational knowledge from a pre-trained language model. Starting from a few seed instances of a given relation, we first use a large text corpus to find sentences that are likely to express this relation. We then use a subset of these extracted sentences as templates. Finally, we fine-tune a language model to predict whether a given word pair is likely to be an instance of some relation, when given an instantiated template for that relation as input. △ Less

Submitted 28 November, 2019; originally announced November 2019.

Comments: Accepted to AAAI 2020

arXiv:1910.07221 [pdf, other]

Meemi: A Simple Method for Post-processing and Integrating Cross-lingual Word Embeddings

Authors: Yerai Doval, Jose Camacho-Collados, Luis Espinosa-Anke, Steven Schockaert

Abstract: Word embeddings have become a standard resource in the toolset of any Natural Language Processing practitioner. While monolingual word embeddings encode information about words in the context of a particular language, cross-lingual embeddings define a multilingual space where word embeddings from two or more languages are integrated together. Current state-of-the-art approaches learn these embeddi… ▽ More Word embeddings have become a standard resource in the toolset of any Natural Language Processing practitioner. While monolingual word embeddings encode information about words in the context of a particular language, cross-lingual embeddings define a multilingual space where word embeddings from two or more languages are integrated together. Current state-of-the-art approaches learn these embeddings by aligning two disjoint monolingual vector spaces through an orthogonal transformation which preserves the structure of the monolingual counterparts. In this work, we propose to apply an additional transformation after this initial alignment step, which aims to bring the vector representations of a given word and its translations closer to their average. Since this additional transformation is non-orthogonal, it also affects the structure of the monolingual spaces. We show that our approach both improves the integration of the monolingual spaces as well as the quality of the monolingual spaces themselves. Furthermore, because our transformation can be applied to an arbitrary number of languages, we are able to effectively obtain a truly multilingual space. The resulting (monolingual and multilingual) spaces show consistent gains over the current state-of-the-art in standard intrinsic tasks, namely dictionary induction and word similarity, as well as in extrinsic tasks such as cross-lingual hypernym discovery and cross-lingual natural language inference. △ Less

Submitted 11 November, 2020; v1 submitted 16 October, 2019; originally announced October 2019.

Comments: 22 pages, 2 figures, 9 tables. Preprint submitted to Natural Language Engineering

MSC Class: 68T50

arXiv:1909.06414 [pdf, other]

Learning Household Task Knowledge from WikiHow Descriptions

Authors: Yilun Zhou, Julie A. Shah, Steven Schockaert

Abstract: Commonsense procedural knowledge is important for AI agents and robots that operate in a human environment. While previous attempts at constructing procedural knowledge are mostly rule- and template-based, recent advances in deep learning provide the possibility of acquiring such knowledge directly from natural language sources. As a first step in this direction, we propose a model to learn embedd… ▽ More Commonsense procedural knowledge is important for AI agents and robots that operate in a human environment. While previous attempts at constructing procedural knowledge are mostly rule- and template-based, recent advances in deep learning provide the possibility of acquiring such knowledge directly from natural language sources. As a first step in this direction, we propose a model to learn embeddings for tasks, as well as the individual steps that need to be taken to solve them, based on WikiHow articles. We learn these embeddings such that they are predictive of both step relevance and step ordering. We also experiment with the use of integer programming for inferring consistent global step orderings from noisy pairwise predictions. △ Less

Submitted 13 September, 2019; originally announced September 2019.

Comments: IJCAI 2019 Workshop on Semantic Deep Learning

arXiv:1908.07742 [pdf, other]

On the Robustness of Unsupervised and Semi-supervised Cross-lingual Word Embedding Learning

Authors: Yerai Doval, Jose Camacho-Collados, Luis Espinosa-Anke, Steven Schockaert

Abstract: Cross-lingual word embeddings are vector representations of words in different languages where words with similar meaning are represented by similar vectors, regardless of the language. Recent developments which construct these embeddings by aligning monolingual spaces have shown that accurate alignments can be obtained with little or no supervision. However, the focus has been on a particular con… ▽ More Cross-lingual word embeddings are vector representations of words in different languages where words with similar meaning are represented by similar vectors, regardless of the language. Recent developments which construct these embeddings by aligning monolingual spaces have shown that accurate alignments can be obtained with little or no supervision. However, the focus has been on a particular controlled scenario for evaluation, and there is no strong evidence on how current state-of-the-art systems would fare with noisy text or for language pairs with major linguistic differences. In this paper we present an extensive evaluation over multiple cross-lingual embedding models, analyzing their strengths and limitations with respect to different variables such as target language, training corpora and amount of supervision. Our conclusions put in doubt the view that high-quality cross-lingual embeddings can always be learned without much supervision. △ Less

Submitted 3 March, 2020; v1 submitted 21 August, 2019; originally announced August 2019.

Comments: 11 pages, 2 figures, 7 tables. Camera-ready submitted to LREC 2020

arXiv:1906.01373 [pdf, other]

Relational Word Embeddings

Authors: Jose Camacho-Collados, Luis Espinosa-Anke, Steven Schockaert

Abstract: While word embeddings have been shown to implicitly encode various forms of attributional knowledge, the extent to which they capture relational information is far more limited. In previous work, this limitation has been addressed by incorporating relational knowledge from external knowledge bases when learning the word embedding. Such strategies may not be optimal, however, as they are limited by… ▽ More While word embeddings have been shown to implicitly encode various forms of attributional knowledge, the extent to which they capture relational information is far more limited. In previous work, this limitation has been addressed by incorporating relational knowledge from external knowledge bases when learning the word embedding. Such strategies may not be optimal, however, as they are limited by the coverage of available resources and conflate similarity with other forms of relatedness. As an alternative, in this paper we propose to encode relational knowledge in a separate word embedding, which is aimed to be complementary to a given standard word embedding. This relational word embedding is still learned from co-occurrence statistics, and can thus be used even when no external knowledge base is available. Our analysis shows that relational word vectors do indeed capture information that is complementary to what is encoded in standard word embeddings. △ Less

Submitted 4 June, 2019; originally announced June 2019.

Comments: To appear at ACL 2019. 11 pages

arXiv:1905.07358 [pdf, other]

Learning Cross-lingual Embeddings from Twitter via Distant Supervision

Authors: Jose Camacho-Collados, Yerai Doval, Eugenio Martínez-Cámara, Luis Espinosa-Anke, Francesco Barbieri, Steven Schockaert

Abstract: Cross-lingual embeddings represent the meaning of words from different languages in the same vector space. Recent work has shown that it is possible to construct such representations by aligning independently learned monolingual embedding spaces, and that accurate alignments can be obtained even without external bilingual data. In this paper we explore a research direction that has been surprising… ▽ More Cross-lingual embeddings represent the meaning of words from different languages in the same vector space. Recent work has shown that it is possible to construct such representations by aligning independently learned monolingual embedding spaces, and that accurate alignments can be obtained even without external bilingual data. In this paper we explore a research direction that has been surprisingly neglected in the literature: leveraging noisy user-generated text to learn cross-lingual embeddings particularly tailored towards social media applications. While the noisiness and informal nature of the social media genre poses additional challenges to cross-lingual embedding methods, we find that it also provides key opportunities due to the abundance of code-switching and the existence of a shared vocabulary of emoji and named entities. Our contribution consists of a very simple post-processing step that exploits these phenomena to significantly improve the performance of state-of-the-art alignment methods. △ Less

Submitted 31 March, 2020; v1 submitted 17 May, 2019; originally announced May 2019.

Comments: Accepted to ICWSM 2020. 11 pages, 1 appendix. Pre-trained embeddings available at https://github.com/pedrada88/crossembeddings-twitter

arXiv:1902.07831 [pdf, other]

doi 10.1145/3308558.3313486

Predicting ConceptNet Path Quality Using Crowdsourced Assessments of Naturalness

Authors: Yilun Zhou, Steven Schockaert, Julie A. Shah

Abstract: In many applications, it is important to characterize the way in which two concepts are semantically related. Knowledge graphs such as ConceptNet provide a rich source of information for such characterizations by encoding relations between concepts as edges in a graph. When two concepts are not directly connected by an edge, their relationship can still be described in terms of the paths that conn… ▽ More In many applications, it is important to characterize the way in which two concepts are semantically related. Knowledge graphs such as ConceptNet provide a rich source of information for such characterizations by encoding relations between concepts as edges in a graph. When two concepts are not directly connected by an edge, their relationship can still be described in terms of the paths that connect them. Unfortunately, many of these paths are uninformative and noisy, which means that the success of applications that use such path features crucially relies on their ability to select high-quality paths. In existing applications, this path selection process is based on relatively simple heuristics. In this paper we instead propose to learn to predict path quality from crowdsourced human assessments. Since we are interested in a generic task-independent notion of quality, we simply ask human participants to rank paths according to their subjective assessment of the paths' naturalness, without attempting to define naturalness or steering the participants towards particular indicators of quality. We show that a neural network model trained on these assessments is able to predict human judgments on unseen paths with near optimal performance. Most notably, we find that the resulting path selection method is substantially better than the current heuristic approaches at identifying meaningful paths. △ Less

Submitted 20 February, 2019; originally announced February 2019.

Comments: In Proceedings of the Web Conference (WWW) 2019

arXiv:1810.12091 [pdf, other]

Embedding Geographic Locations for Modelling the Natural Environment using Flickr Tags and Structured Data

Authors: Shelan S. Jeawak, Christopher B. Jones, Steven Schockaert

Abstract: Meta-data from photo-sharing websites such as Flickr can be used to obtain rich bag-of-words descriptions of geographic locations, which have proven valuable, among others, for modelling and predicting ecological features. One important insight from previous work is that the descriptions obtained from Flickr tend to be complementary to the structured information that is available from traditional… ▽ More Meta-data from photo-sharing websites such as Flickr can be used to obtain rich bag-of-words descriptions of geographic locations, which have proven valuable, among others, for modelling and predicting ecological features. One important insight from previous work is that the descriptions obtained from Flickr tend to be complementary to the structured information that is available from traditional scientific resources. To better integrate these two diverse sources of information, in this paper we consider a method for learning vector space embeddings of geographic locations. We show experimentally that this method improves on existing approaches, especially in cases where structured information is available. △ Less

Submitted 12 October, 2018; originally announced October 2018.

arXiv:1808.08780 [pdf, other]

Improving Cross-Lingual Word Embeddings by Meeting in the Middle

Authors: Yerai Doval, Jose Camacho-Collados, Luis Espinosa-Anke, Steven Schockaert

Abstract: Cross-lingual word embeddings are becoming increasingly important in multilingual NLP. Recently, it has been shown that these embeddings can be effectively learned by aligning two disjoint monolingual vector spaces through linear transformations, using no more than a small bilingual dictionary as supervision. In this work, we propose to apply an additional transformation after the initial alignmen… ▽ More Cross-lingual word embeddings are becoming increasingly important in multilingual NLP. Recently, it has been shown that these embeddings can be effectively learned by aligning two disjoint monolingual vector spaces through linear transformations, using no more than a small bilingual dictionary as supervision. In this work, we propose to apply an additional transformation after the initial alignment step, which moves cross-lingual synonyms towards a middle point between them. By applying this transformation our aim is to obtain a better cross-lingual integration of the vector spaces. In addition, and perhaps surprisingly, the monolingual spaces also improve by this transformation. This is in contrast to the original alignment, which is typically learned such that the structure of the monolingual spaces is preserved. Our experiments confirm that the resulting cross-lingual embeddings outperform state-of-the-art models in both monolingual and cross-lingual evaluation tasks. △ Less

Submitted 27 August, 2018; originally announced August 2018.

Comments: 11 pages, 4 tables, 1 figure. EMNLP 2018 camera-ready

arXiv:1808.06068 [pdf, ps, other]

SeVeN: Augmenting Word Embeddings with Unsupervised Relation Vectors

Authors: Luis Espinosa-Anke, Steven Schockaert

Abstract: We present SeVeN (Semantic Vector Networks), a hybrid resource that encodes relationships between words in the form of a graph. Different from traditional semantic networks, these relations are represented as vectors in a continuous vector space. We propose a simple pipeline for learning such relation vectors, which is based on word vector averaging in combination with an ad hoc autoencoder. We sh… ▽ More We present SeVeN (Semantic Vector Networks), a hybrid resource that encodes relationships between words in the form of a graph. Different from traditional semantic networks, these relations are represented as vectors in a continuous vector space. We propose a simple pipeline for learning such relation vectors, which is based on word vector averaging in combination with an ad hoc autoencoder. We show that by explicitly encoding relational information in a dedicated vector space we can capture aspects of word meaning that are complementary to what is captured by word embeddings. For example, by examining clusters of relation vectors, we observe that relational similarities can be identified at a more abstract level than with traditional word vector differences. Finally, we test the effectiveness of semantic vector networks in two tasks: measuring word similarity and neural text categorization. SeVeN is available at bitbucket.org/luisespinosa/seven. △ Less

Submitted 18 August, 2018; originally announced August 2018.

arXiv:1805.10461 [pdf, other]

From Knowledge Graph Embedding to Ontology Embedding? An Analysis of the Compatibility between Vector Space Representations and Rules

Authors: Víctor Gutiérrez-Basulto, Steven Schockaert

Abstract: Recent years have witnessed the successful application of low-dimensional vector space representations of knowledge graphs to predict missing facts or find erroneous ones. However, it is not yet well-understood to what extent ontological knowledge, e.g. given as a set of (existential) rules, can be embedded in a principled way. To address this shortcoming, in this paper we introduce a general fram… ▽ More Recent years have witnessed the successful application of low-dimensional vector space representations of knowledge graphs to predict missing facts or find erroneous ones. However, it is not yet well-understood to what extent ontological knowledge, e.g. given as a set of (existential) rules, can be embedded in a principled way. To address this shortcoming, in this paper we introduce a general framework based on a view of relations as regions, which allows us to study the compatibility between ontological knowledge and different types of vector space embeddings. Our technical contribution is two-fold. First, we show that some of the most popular existing embedding methods are not capable of modelling even very simple types of rules, which in particular also means that they are not able to learn the type of dependencies captured by such rules. Second, we study a model in which relations are modelled as convex regions. We show particular that ontologies which are expressed using so-called quasi-chained existential rules can be exactly represented using convex regions, such that any set of facts which is induced using that vector space embedding is logically consistent and deductively closed with respect to the input ontology. △ Less

Submitted 21 August, 2018; v1 submitted 26 May, 2018; originally announced May 2018.

Comments: Full version of a paper accepted at KR-2018

arXiv:1805.01276 [pdf, other]

Learning Conceptual Space Representations of Interrelated Concepts

Authors: Zied Bouraoui, Steven Schockaert

Abstract: Several recently proposed methods aim to learn conceptual space representations from large text collections. These learned representations asso- ciate each object from a given domain of interest with a point in a high-dimensional Euclidean space, but they do not model the concepts from this do- main, and can thus not directly be used for catego- rization and related cognitive tasks. A natural solu… ▽ More Several recently proposed methods aim to learn conceptual space representations from large text collections. These learned representations asso- ciate each object from a given domain of interest with a point in a high-dimensional Euclidean space, but they do not model the concepts from this do- main, and can thus not directly be used for catego- rization and related cognitive tasks. A natural solu- tion is to represent concepts as Gaussians, learned from the representations of their instances, but this can only be reliably done if sufficiently many in- stances are given, which is often not the case. In this paper, we introduce a Bayesian model which addresses this problem by constructing informative priors from background knowledge about how the concepts of interest are interrelated with each other. We show that this leads to substantially better pre- dictions in a knowledge base completion task. △ Less

Submitted 4 May, 2018; v1 submitted 3 May, 2018; originally announced May 2018.

arXiv:1804.06188 [pdf, ps, other]

VC-Dimension Based Generalization Bounds for Relational Learning

Authors: Ondrej Kuzelka, Yuyi Wang, Steven Schockaert

Abstract: In many applications of relational learning, the available data can be seen as a sample from a larger relational structure (e.g. we may be given a small fragment from some social network). In this paper we are particularly concerned with scenarios in which we can assume that (i) the domain elements appearing in the given sample have been uniformly sampled without replacement from the (unknown) ful… ▽ More In many applications of relational learning, the available data can be seen as a sample from a larger relational structure (e.g. we may be given a small fragment from some social network). In this paper we are particularly concerned with scenarios in which we can assume that (i) the domain elements appearing in the given sample have been uniformly sampled without replacement from the (unknown) full domain and (ii) the sample is complete for these domain elements (i.e. it is the full substructure induced by these elements). Within this setting, we study bounds on the error of sufficient statistics of relational models that are estimated on the available data. As our main result, we prove a bound based on a variant of the Vapnik-Chervonenkis dimension which is suitable for relational data. △ Less

Submitted 4 July, 2018; v1 submitted 17 April, 2018; originally announced April 2018.

Comments: Longer version of paper accepted at ECML PKDD 2018

arXiv:1803.05768 [pdf, ps, other]

PAC-Reasoning in Relational Domains

Authors: Ondrej Kuzelka, Yuyi Wang, Jesse Davis, Steven Schockaert

Abstract: We consider the problem of predicting plausible missing facts in relational data, given a set of imperfect logical rules. In particular, our aim is to provide bounds on the (expected) number of incorrect inferences that are made in this way. Since for classical inference it is in general impossible to bound this number in a non-trivial way, we consider two inference relations that weaken, but rema… ▽ More We consider the problem of predicting plausible missing facts in relational data, given a set of imperfect logical rules. In particular, our aim is to provide bounds on the (expected) number of incorrect inferences that are made in this way. Since for classical inference it is in general impossible to bound this number in a non-trivial way, we consider two inference relations that weaken, but remain close in spirit to classical inference. △ Less

Submitted 4 July, 2018; v1 submitted 15 March, 2018; originally announced March 2018.

Comments: Longer version of paper appearing in UAI 2018

arXiv:1711.05294 [pdf, other]

Modeling Semantic Relatedness using Global Relation Vectors

Authors: Shoaib Jameel, Zied Bouraoui, Steven Schockaert

Abstract: Word embedding models such as GloVe rely on co-occurrence statistics from a large corpus to learn vector representations of word meaning. These vectors have proven to capture surprisingly fine-grained semantic and syntactic information. While we may similarly expect that co-occurrence statistics can be used to capture rich information about the relationships between different words, existing appro… ▽ More Word embedding models such as GloVe rely on co-occurrence statistics from a large corpus to learn vector representations of word meaning. These vectors have proven to capture surprisingly fine-grained semantic and syntactic information. While we may similarly expect that co-occurrence statistics can be used to capture rich information about the relationships between different words, existing approaches for modeling such relationships have mostly relied on manipulating pre-trained word vectors. In this paper, we introduce a novel method which directly learns relation vectors from co-occurrence statistics. To this end, we first introduce a variant of GloVe, in which there is an explicit connection between word vectors and PMI weighted co-occurrence vectors. We then show how relation vectors can be naturally embedded into the resulting vector space. △ Less

Submitted 14 November, 2017; originally announced November 2017.

arXiv:1710.02221 [pdf, other]

Stacked Structure Learning for Lifted Relational Neural Networks

Authors: Gustav Sourek, Martin Svatos, Filip Zelezny, Steven Schockaert, Ondrej Kuzelka

Abstract: Lifted Relational Neural Networks (LRNNs) describe relational domains using weighted first-order rules which act as templates for constructing feed-forward neural networks. While previous work has shown that using LRNNs can lead to state-of-the-art results in various ILP tasks, these results depended on hand-crafted rules. In this paper, we extend the framework of LRNNs with structure learning, th… ▽ More Lifted Relational Neural Networks (LRNNs) describe relational domains using weighted first-order rules which act as templates for constructing feed-forward neural networks. While previous work has shown that using LRNNs can lead to state-of-the-art results in various ILP tasks, these results depended on hand-crafted rules. In this paper, we extend the framework of LRNNs with structure learning, thus enabling a fully automated learning process. Similarly to many ILP methods, our structure learning algorithm proceeds in an iterative fashion by top-down searching through the hypothesis space of all possible Horn clauses, considering the predicates that occur in the training examples as well as invented soft concepts entailed by the best weighted rules found so far. In the experiments, we demonstrate the ability to automatically induce useful hierarchical soft concepts leading to deep LRNNs with a competitive predictive power. △ Less

Submitted 5 October, 2017; originally announced October 2017.

Comments: Presented at ILP 2017

arXiv:1709.05825 [pdf, ps, other]

Relational Marginal Problems: Theory and Estimation

Authors: Ondrej Kuzelka, Yuyi Wang, Jesse Davis, Steven Schockaert

Abstract: In the propositional setting, the marginal problem is to find a (maximum-entropy) distribution that has some given marginals. We study this problem in a relational setting and make the following contributions. First, we compare two different notions of relational marginals. Second, we show a duality between the resulting relational marginal problems and the maximum likelihood estimation of the par… ▽ More In the propositional setting, the marginal problem is to find a (maximum-entropy) distribution that has some given marginals. We study this problem in a relational setting and make the following contributions. First, we compare two different notions of relational marginals. Second, we show a duality between the resulting relational marginal problems and the maximum likelihood estimation of the parameters of relational models, which generalizes a well-known duality from the propositional setting. Third, by exploiting the relational marginal formulation, we present a statistically sound method to learn the parameters of relational models that will be applied in settings where the number of constants differs between the training and test data. Furthermore, based on a relational generalization of marginal polytopes, we characterize cases where the standard estimators based on feature's number of true groundings needs to be adjusted and we quantitatively characterize the consequences of these adjustments. Fourth, we prove bounds on expected errors of the estimated parameters, which allows us to lower-bound, among other things, the effective sample size of relational training data. △ Less

Submitted 25 April, 2018; v1 submitted 18 September, 2017; originally announced September 2017.

Comments: Long version of a paper that appeared in AAAI 2018; added a paragraph to Related Work

Showing 1–50 of 64 results for author: Schockaert, S