Search | arXiv e-print repository

arXiv:2404.01992

Dissecting Paraphrases: The Impact of Prompt Syntax and supplementary Information on Knowledge Retrieval from Pretrained Language Models

Authors: Stephan Linzbach, Dimitar Dimitrov, Laura Kallmeyer, Kilian Evang, Hajira Jabeen, Stefan Dietze

Abstract: Pre-trained Language Models (PLMs) are known to contain various kinds of knowledge. One method to infer relational knowledge is through the use of cloze-style prompts, where a model is tasked to predict missing subjects or objects. Typically, designing these prompts is a tedious task because small differences in syntax or semantics can have a substantial impact on knowledge retrieval performance. Simultaneously, evaluating the impact of either prompt syntax or information is challenging due to their interdependence. We designed CONPARE-LAMA - a dedicated probe, consisting of 34 million distinct prompts that facilitate comparison across minimal paraphrases. These paraphrases follow a unified meta-template enabling the controlled variation of syntax and semantics across arbitrary relations. CONPARE-LAMA enables insights into the independent impact of either syntactical form or semantic information of paraphrases on the knowledge retrieval performance of PLMs. Extensive knowledge retrieval experiments using our probe reveal that prompts following clausal syntax have several desirable properties in comparison to appositive syntax: i) they are more useful when querying PLMs with a combination of supplementary information, ii) knowledge is more consistently recalled across different combinations of supplementary information, and iii) they decrease response uncertainty when retrieving known facts. In addition, range information can boost knowledge retrieval performance more than domain information, even though domain information is more reliably helpful across syntactic forms.

Submitted 2 April, 2024; originally announced April 2024.

Comments: Accepted for NAACL 2024

arXiv:2308.06374

Large Language Models and Knowledge Graphs: Opportunities and Challenges

Authors: Jeff Z. Pan, Simon Razniewski, Jan-Christoph Kalo, Sneha Singhania, Jiaoyan Chen, Stefan Dietze, Hajira Jabeen, Janna Omeliyanenko, Wen Zhang, Matteo Lissandrini, Russa Biswas, Gerard de Melo, Angela Bonifati, Edlira Vakaj, Mauro Dragoni, Damien Graux

Abstract: Large Language Models (LLMs) have taken Knowledge Representation -- and the world -- by storm. This inflection point marks a shift from explicit knowledge representation to a renewed focus on the hybrid representation of both explicit knowledge and parametric knowledge. In this position paper, we will discuss some of the common debate points within the community on LLMs (parametric knowledge) and Knowledge Graphs (explicit knowledge) and speculate on opportunities and visions that the renewed focus brings, as well as related research topics and challenges.

Submitted 11 August, 2023; originally announced August 2023.

Comments: 30 pages

arXiv:2305.00238

The FAIRy Tale of Genetic Algorithms

Authors: Fahad Maqbool, Muhammad Saad Razzaq, Hajira Jabeen

Abstract: Genetic Algorithm (GA) is a popular meta-heuristic evolutionary algorithm that uses stochastic operators to find optimal solutions and has proved its effectiveness in solving many complex optimization problems (such as classification, optimization, and scheduling). However, despite its performance, popularity and simplicity, not much attention has been paid to the reproducibility and reusability of GA. In this paper, we have extended the Findable, Accessible, Interoperable and Reusable (FAIR) data principles to enable the reproducibility and reusability of algorithms. We have chosen GA as a use case to demonstrate the applicability of the proposed principles. We have also presented an overview of the methodological developments and variants of GA that make it challenging to reproduce or even find the right source. Additionally, to enable FAIR algorithms, we propose a vocabulary (i.e. $evo$) using a lightweight RDF format, facilitating reproducibility. Given the stochastic nature of GAs, this work can be extended to numerous optimization and machine learning algorithms/methods.

Submitted 29 April, 2023; originally announced May 2023.

arXiv:2001.11100

A Scalable Framework for Quality Assessment of RDF Datasets

Authors: Gezim Sejdiu, Anisa Rula, Jens Lehmann, Hajira Jabeen

Abstract: Over the last years, Linked Data has grown continuously. Today, we count more than 10,000 datasets being available online following Linked Data standards. These standards allow data to be machine readable and inter-operable. Nevertheless, many applications, such as data integration, search, and interlinking, cannot take full advantage of Linked Data if it is of low quality. There exist a few approaches for the quality assessment of Linked Data, but their performance degrades with the increase in data size and quickly grows beyond the capabilities of a single machine. In this paper, we present DistQualityAssessment -- an open source implementation of quality assessment of large RDF datasets that can scale out to a cluster of machines. This is the first distributed, in-memory approach for computing different quality metrics for large RDF datasets using Apache Spark. We also provide a quality assessment pattern that can be used to generate new scalable metrics that can be applied to big data. The work presented here is integrated with the SANSA framework and has been applied to at least three use cases beyond the SANSA community. The results show that our approach is more generic, efficient, and scalable as compared to previously proposed approaches.

Submitted 29 January, 2020; originally announced January 2020.

Comments: International Semantic Web Conference (ISWC 2019)

arXiv:2001.10560

The KEEN Universe: An Ecosystem for Knowledge Graph Embeddings with a Focus on Reproducibility and Transferability

Authors: Mehdi Ali, Hajira Jabeen, Charles Tapley Hoyt, Jens Lehmann

Abstract: There is an emerging trend of embedding knowledge graphs (KGs) in continuous vector spaces in order to use those for machine learning tasks. Recently, many knowledge graph embedding (KGE) models have been proposed that learn low dimensional representations while trying to maintain the structural properties of the KGs such as the similarity of nodes depending on their edges to other nodes. KGEs can be used to address tasks within KGs such as the prediction of novel links and the disambiguation of entities. They can also be used for downstream tasks like question answering and fact-checking. Overall, these tasks are relevant for the semantic web community. Despite their popularity, the reproducibility of KGE experiments and the transferability of proposed KGE models to research fields outside the machine learning community can be a major challenge. Therefore, we present the KEEN Universe, an ecosystem for knowledge graph embeddings that we have developed with a strong focus on reproducibility and transferability. The KEEN Universe currently consists of the Python packages PyKEEN (Python KnowlEdge EmbeddiNgs), BioKEEN (Biological KnowlEdge EmbeddiNgs), and the KEEN Model Zoo for sharing trained KGE models with the community.

Submitted 28 January, 2020; originally announced January 2020.

arXiv:1308.2797

DOI: 10.5121/ijcnc.2013.5407

Improved Quality of Service Protocol for Real Time Traffic in MANET

Authors: Iftikhar Ahmad, Humaira Jabeen, Faisal Riaz

Abstract: Technologies like Wi-Fi, Bluetooth, and WiMAX have made Mobile Ad hoc Networks (MANETs) common in real life. Multimedia applications need to be supported on MANETs. A certain level of QoS (Quality of Service) support is essential for real-time data. Our proposed protocol provides the required QoS without a negative impact on Best Effort data traffic. An efficient route discovery mechanism for the AODV routing protocol as well as a transmission technique for real-time data are proposed. This technique gives more transmission opportunities to real-time data traffic, which decreases transmission delay and increases throughput. A modified version of the popular AODV routing protocol that provides a QoS guarantee for real-time traffic in MANETs is proposed. The simulation shows better performance for the proposed protocol than for basic AODV.

Submitted 13 August, 2013; originally announced August 2013.

Comments: 13 pages

Journal ref: International Journal of Computer Networks & Communications (IJCNC) Vol.5, No.4, July 2013

Showing 1–6 of 6 results for author: Jabeen, H