-
Dissecting Paraphrases: The Impact of Prompt Syntax and supplementary Information on Knowledge Retrieval from Pretrained Language Models
Authors:
Stephan Linzbach,
Dimitar Dimitrov,
Laura Kallmeyer,
Kilian Evang,
Hajira Jabeen,
Stefan Dietze
Abstract:
Pre-trained Language Models (PLMs) are known to contain various kinds of knowledge. One method to infer relational knowledge is through the use of cloze-style prompts, where a model is tasked to predict missing subjects or objects. Typically, designing these prompts is a tedious task because small differences in syntax or semantics can have a substantial impact on knowledge retrieval performance. Simultaneously, evaluating the impact of either prompt syntax or information is challenging due to their interdependence. We designed CONPARE-LAMA - a dedicated probe, consisting of 34 million distinct prompts that facilitate comparison across minimal paraphrases. These paraphrases follow a unified meta-template enabling the controlled variation of syntax and semantics across arbitrary relations. CONPARE-LAMA enables insights into the independent impact of either syntactical form or semantic information of paraphrases on the knowledge retrieval performance of PLMs. Extensive knowledge retrieval experiments using our probe reveal that prompts following clausal syntax have several desirable properties in comparison to appositive syntax: i) they are more useful when querying PLMs with a combination of supplementary information, ii) knowledge is more consistently recalled across different combinations of supplementary information, and iii) they decrease response uncertainty when retrieving known facts. In addition, range information can boost knowledge retrieval performance more than domain information, even though domain information is more reliably helpful across syntactic forms.
Submitted 2 April, 2024;
originally announced April 2024.
-
Large Language Models and Knowledge Graphs: Opportunities and Challenges
Authors:
Jeff Z. Pan,
Simon Razniewski,
Jan-Christoph Kalo,
Sneha Singhania,
Jiaoyan Chen,
Stefan Dietze,
Hajira Jabeen,
Janna Omeliyanenko,
Wen Zhang,
Matteo Lissandrini,
Russa Biswas,
Gerard de Melo,
Angela Bonifati,
Edlira Vakaj,
Mauro Dragoni,
Damien Graux
Abstract:
Large Language Models (LLMs) have taken Knowledge Representation -- and the world -- by storm. This inflection point marks a shift from explicit knowledge representation to a renewed focus on the hybrid representation of both explicit knowledge and parametric knowledge. In this position paper, we will discuss some of the common debate points within the community on LLMs (parametric knowledge) and Knowledge Graphs (explicit knowledge) and speculate on opportunities and visions that the renewed focus brings, as well as related research topics and challenges.
Submitted 11 August, 2023;
originally announced August 2023.
-
The FAIRy Tale of Genetic Algorithms
Authors:
Fahad Maqbool,
Muhammad Saad Razzaq,
Hajira Jabeen
Abstract:
Genetic Algorithm (GA) is a popular meta-heuristic evolutionary algorithm that uses stochastic operators to find optimal solutions and has proved its effectiveness in solving many complex optimization problems (such as classification, optimization, and scheduling). However, despite its performance, popularity and simplicity, not much attention has been paid to the reproducibility and reusability of GA. In this paper, we have extended the Findable, Accessible, Interoperable and Reusable (FAIR) data principles to enable the reproducibility and reusability of algorithms. We have chosen GA as a use case to demonstrate the applicability of the proposed principles. We have also presented an overview of the methodological developments and variants of GA that make it challenging to reproduce or even find the right source. Additionally, to enable FAIR algorithms, we propose a vocabulary (i.e. $evo$) using a lightweight RDF format, facilitating reproducibility. Given the stochastic nature of GAs, this work can be extended to numerous optimization and machine learning algorithms/methods.
Submitted 29 April, 2023;
originally announced May 2023.
-
A Scalable Framework for Quality Assessment of RDF Datasets
Authors:
Gezim Sejdiu,
Anisa Rula,
Jens Lehmann,
Hajira Jabeen
Abstract:
Over the last few years, Linked Data has grown continuously. Today, more than 10,000 datasets are available online following Linked Data standards. These standards allow data to be machine-readable and interoperable. Nevertheless, many applications, such as data integration, search, and interlinking, cannot take full advantage of Linked Data if it is of low quality. There exist a few approaches for the quality assessment of Linked Data, but their performance degrades as data size increases, and the data quickly grows beyond the capabilities of a single machine. In this paper, we present DistQualityAssessment -- an open source implementation of quality assessment of large RDF datasets that can scale out to a cluster of machines. This is the first distributed, in-memory approach for computing different quality metrics for large RDF datasets using Apache Spark. We also provide a quality assessment pattern that can be used to generate new scalable metrics that can be applied to big data. The work presented here is integrated with the SANSA framework and has been applied to at least three use cases beyond the SANSA community. The results show that our approach is more generic, efficient, and scalable than previously proposed approaches.
Submitted 29 January, 2020;
originally announced January 2020.
-
The KEEN Universe: An Ecosystem for Knowledge Graph Embeddings with a Focus on Reproducibility and Transferability
Authors:
Mehdi Ali,
Hajira Jabeen,
Charles Tapley Hoyt,
Jens Lehmann
Abstract:
There is an emerging trend of embedding knowledge graphs (KGs) in continuous vector spaces in order to use those for machine learning tasks. Recently, many knowledge graph embedding (KGE) models have been proposed that learn low dimensional representations while trying to maintain the structural properties of the KGs such as the similarity of nodes depending on their edges to other nodes. KGEs can be used to address tasks within KGs such as the prediction of novel links and the disambiguation of entities. They can also be used for downstream tasks like question answering and fact-checking. Overall, these tasks are relevant for the semantic web community. Despite their popularity, the reproducibility of KGE experiments and the transferability of proposed KGE models to research fields outside the machine learning community can be a major challenge. Therefore, we present the KEEN Universe, an ecosystem for knowledge graph embeddings that we have developed with a strong focus on reproducibility and transferability. The KEEN Universe currently consists of the Python packages PyKEEN (Python KnowlEdge EmbeddiNgs), BioKEEN (Biological KnowlEdge EmbeddiNgs), and the KEEN Model Zoo for sharing trained KGE models with the community.
Submitted 28 January, 2020;
originally announced January 2020.
-
Improved Quality of Service Protocol for Real Time Traffic in MANET
Authors:
Iftikhar Ahmad,
Humaira Jabeen,
Faisal Riaz
Abstract:
Technologies such as Wi-Fi, Bluetooth, and WiMAX have made Mobile Ad hoc Networks (MANETs) common in everyday life. Multimedia applications need to be supported on MANETs. A certain level of QoS (Quality of Service) support is essential for real-time data. Our proposed protocol provides the required QoS without a negative impact on best-effort data traffic. An efficient route discovery mechanism for the AODV routing protocol, as well as a transmission technique for real-time data, are proposed. This technique gives more transmission opportunities to real-time data traffic, which decreases transmission delay and increases throughput. A modified version of the popular AODV routing protocol that provides a QoS guarantee for real-time traffic in MANETs is proposed. The simulation shows better performance for the proposed protocol than for basic AODV.
Submitted 13 August, 2013;
originally announced August 2013.