-
Relation Extraction with Fine-Tuned Large Language Models in Retrieval Augmented Generation Frameworks
Authors:
Sefika Efeoglu,
Adrian Paschke
Abstract:
Information Extraction (IE) is crucial for converting unstructured data into structured formats like Knowledge Graphs (KGs). A key task within IE is Relation Extraction (RE), which identifies relationships between entities in text. Various RE methods exist, including supervised, unsupervised, weakly supervised, and rule-based approaches. Recent studies leveraging pre-trained language models (PLMs) have shown significant success in this area. In the current era dominated by Large Language Models (LLMs), fine-tuning these models can overcome limitations associated with zero-shot LLM prompting-based RE methods, especially regarding domain adaptation challenges and identifying implicit relations between entities in sentences. These implicit relations, which cannot be easily extracted from a sentence's dependency tree, require logical inference for accurate identification. This work explores the performance of fine-tuned LLMs and their integration into the Retrieval Augmented Generation (RAG)-based RE approach to address the challenges of identifying implicit relations at the sentence level, particularly when LLMs act as generators within the RAG framework. Empirical evaluations on the TACRED, TACRED-Revisited (TACREV), Re-TACRED, and SemEval datasets show significant performance improvements with fine-tuned LLMs, including Llama2-7B, Mistral-7B, and T5 (Large). Notably, our approach achieves substantial gains on SemEval, where implicit relations are common, surpassing previous results on this dataset. Additionally, our method outperforms previous works on TACRED, TACREV, and Re-TACRED, demonstrating exceptional performance across diverse evaluation scenarios.
Submitted 24 June, 2024; v1 submitted 20 June, 2024;
originally announced June 2024.
-
Quantum Architecture Search: A Survey
Authors:
Darya Martyniuk,
Johannes Jung,
Adrian Paschke
Abstract:
Quantum computing has made significant progress in recent years, attracting immense interest not only in research laboratories but also in various industries. However, the application of quantum computing to solve real-world problems is still hampered by a number of challenges, including hardware limitations and a relatively under-explored landscape of quantum algorithms, especially when compared to the extensive development of classical computing. The design of quantum circuits, in particular parameterized quantum circuits (PQCs), which contain learnable parameters optimized by classical methods, is a non-trivial and time-consuming task requiring expert knowledge. As a result, research on the automated generation of PQCs, known as quantum architecture search (QAS), has gained considerable interest. QAS focuses on the use of machine learning and optimization-driven techniques to generate PQCs tailored to specific problems and characteristics of quantum hardware. In this paper, we provide an overview of QAS methods by examining relevant research studies in the field. We discuss main challenges in designing and performing an automated search for an optimal PQC, and survey ways to address them to ease future research.
Submitted 10 June, 2024;
originally announced June 2024.
-
TR2MTL: LLM based framework for Metric Temporal Logic Formalization of Traffic Rules
Authors:
Kumar Manas,
Stefan Zwicklbauer,
Adrian Paschke
Abstract:
Traffic rule formalization is crucial for verifying the compliance and safety of autonomous vehicles (AVs). However, manual translation of natural language traffic rules into formal specifications requires domain knowledge and logic expertise, which limits its adaptation. This paper introduces TR2MTL, a framework that employs large language models (LLMs) to automatically translate traffic rules (TR) into metric temporal logic (MTL). It is envisioned as a human-in-the-loop system for AV rule formalization. It utilizes a chain-of-thought in-context learning approach to guide the LLM in step-by-step translation and in generating valid and grammatically correct MTL formulas. It can be extended to various forms of temporal logic and rules. We evaluated the framework on a challenging dataset of traffic rules we created from various sources and compared it against LLMs using different in-context learning methods. Results show that TR2MTL is domain-agnostic, achieving high accuracy and generalization capability even with a small dataset. Moreover, the method effectively predicts formulas with varying degrees of logical and semantic structure in unstructured traffic rules.
Submitted 9 June, 2024;
originally announced June 2024.
-
Fine-Grained Named Entities for Corona News
Authors:
Sefika Efeoglu,
Adrian Paschke
Abstract:
Information resources such as newspapers have produced unstructured text data in various languages related to the corona outbreak since December 2019. Analyzing these unstructured texts is time-consuming unless they are first represented in a structured format. An information extraction pipeline with essential tasks -- named entity tagging and relation extraction -- might be applied to these texts to accomplish this goal. This study proposes a data annotation pipeline to generate training data from corona news articles, including generic and domain-specific entities. Named entity recognition models are trained on this annotated corpus and then evaluated on test sentences manually annotated by domain experts. The code base and demonstration are available at https://github.com/sefeoglu/coronanews-ner.git.
Submitted 20 April, 2024;
originally announced April 2024.
-
Retrieval-Augmented Generation-based Relation Extraction
Authors:
Sefika Efeoglu,
Adrian Paschke
Abstract:
Information Extraction (IE) is a transformative process that converts unstructured text data into a structured format by employing entity and relation extraction (RE) methodologies. The identification of the relation between a pair of entities plays a crucial role within this framework. Despite the existence of various techniques for relation extraction, their efficacy heavily relies on access to labeled data and substantial computational resources. In addressing these challenges, Large Language Models (LLMs) emerge as promising solutions; however, they might return hallucinated responses because they rely on their own training data. To overcome these limitations, this work proposes Retrieval-Augmented Generation-based Relation Extraction (RAG4RE), offering a pathway to enhance the performance of relation extraction tasks.
This work evaluates the effectiveness of our RAG4RE approach using different LLMs and established benchmarks, namely the TACRED, TACREV, Re-TACRED, and SemEval RE datasets. In particular, we leverage prominent LLMs including Flan T5, Llama2, and Mistral in our investigation. The results of our study demonstrate that our RAG4RE approach surpasses the performance of traditional RE approaches based solely on LLMs, which is particularly evident on the TACRED dataset and its variations. Furthermore, our approach exhibits remarkable performance compared to previous RE methodologies across both the TACRED and TACREV datasets, underscoring its efficacy and potential for advancing RE tasks in natural language processing.
Submitted 20 April, 2024;
originally announced April 2024.
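The retrieval-augmented idea in the abstract above can be illustrated with a minimal sketch. The abstract does not describe the retriever, the prompt wording, or the data layout, so everything here is an assumption made for illustration: the token-overlap similarity (a toy stand-in for embedding similarity), the example sentences, the relation labels, and the function names `jaccard`, `retrieve`, and `build_prompt`. The step that sends the prompt to an LLM is omitted.

```python
def jaccard(a, b):
    """Token-overlap similarity; a toy stand-in for embedding similarity."""
    ta, tb = set(a.lower().split()), set(b.lower().split())
    union = ta | tb
    return len(ta & tb) / len(union) if union else 0.0


def retrieve(query, examples):
    """Return the labeled example whose sentence is most similar to the query."""
    return max(examples, key=lambda ex: jaccard(query, ex["sentence"]))


def build_prompt(query, head, tail, examples):
    """Augment the query sentence with the retrieved example (hypothetical layout)."""
    ex = retrieve(query, examples)
    return (
        f"Example sentence: {ex['sentence']}\n"
        f"Example relation: {ex['relation']}\n"
        f"Sentence: {query}\n"
        f"What is the relation between '{head}' and '{tail}'?"
    )


# Hypothetical labeled examples; not taken from any benchmark.
examples = [
    {"sentence": "Alice was born in Paris", "relation": "place_of_birth"},
    {"sentence": "Bob works for Acme", "relation": "employee_of"},
]

prompt = build_prompt("Carol was born in Rome", "Carol", "Rome", examples)
print(prompt)
```

The printed prompt carries the retrieved example ahead of the query sentence, which is the general shape of retrieval augmentation; a real system would likely swap the toy similarity for a learned sentence encoder.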
-
Hateful Messages: A Conversational Data Set of Hate Speech produced by Adolescents on Discord
Authors:
Jan Fillies,
Silvio Peikert,
Adrian Paschke
Abstract:
With the rise of social media, a rise of hateful content can be observed. Even though the understanding and definitions of hate speech vary, platforms, communities, and legislatures all acknowledge the problem. Adolescents are a new and active group of social media users, and the majority of them experience or witness online hate speech. Research in the field of automated hate speech classification has been on the rise and focuses on aspects such as bias, generalizability, and performance. To increase generalizability and performance, it is important to understand biases within the data. This research addresses the bias of youth language within hate speech classification and contributes by providing a modern and anonymized hate speech youth language data set consisting of 88,395 annotated chat messages. The data set consists of publicly available online messages from the chat platform Discord. About 6.42% of the messages were classified as hate speech using a self-developed annotation schema. For 35,553 messages, the user profiles provided age annotations, setting the average author age to under 20 years old.
Submitted 4 September, 2023;
originally announced September 2023.
-
Interpretable and intervenable ultrasonography-based machine learning models for pediatric appendicitis
Authors:
Ričards Marcinkevičs,
Patricia Reis Wolfertstetter,
Ugne Klimiene,
Kieran Chin-Cheong,
Alyssia Paschke,
Julia Zerres,
Markus Denzinger,
David Niederberger,
Sven Wellmann,
Ece Ozkan,
Christian Knorr,
Julia E. Vogt
Abstract:
Appendicitis is among the most frequent reasons for pediatric abdominal surgeries. Previous decision support systems for appendicitis have focused on clinical, laboratory, scoring, and computed tomography data and have ignored abdominal ultrasound, despite its noninvasive nature and widespread availability. In this work, we present interpretable machine learning models for predicting the diagnosis, management and severity of suspected appendicitis using ultrasound images. Our approach utilizes concept bottleneck models (CBM) that facilitate interpretation and interaction with high-level concepts understandable to clinicians. Furthermore, we extend CBMs to prediction problems with multiple views and incomplete concept sets. Our models were trained on a dataset comprising 579 pediatric patients with 1709 ultrasound images accompanied by clinical and laboratory data. Results show that our proposed method enables clinicians to utilize a human-understandable and intervenable predictive model without compromising performance or requiring time-consuming image annotation when deployed. For predicting the diagnosis, the extended multiview CBM attained an AUROC of 0.80 and an AUPR of 0.92, performing comparably to similar black-box neural networks trained and tested on the same dataset.
Submitted 24 November, 2023; v1 submitted 28 February, 2023;
originally announced February 2023.
-
ContCommRTD: A Distributed Content-based Misinformation-aware Community Detection System for Real-Time Disaster Reporting
Authors:
Elena-Simona Apostol,
Ciprian-Octavian Truică,
Adrian Paschke
Abstract:
Real-time social media data can provide useful information on evolving hazards. Alongside traditional methods of disaster detection, the integration of social media data can considerably enhance disaster management. In this paper, we investigate the problem of detecting geolocation-content communities on Twitter and propose a novel distributed system that provides near-real-time information on hazard-related events and their evolution. We show that content-based community analysis leads to better and faster dissemination of reports on hazards. Our distributed disaster reporting system analyzes the social relationship among worldwide geolocated tweets, and applies topic modeling to group tweets by topics. Considering the following information for each tweet: user, timestamp, geolocation, retweets, and replies, we create a publisher-subscriber distribution model for topics. We use content similarity and the proximity of nodes to create a new model for geolocation-content based communities. Users can subscribe to different topics in specific geographical areas or worldwide and receive real-time reports regarding these topics. As misinformation can lead to increased damage if propagated in hazard-related tweets, we propose a new deep learning model to detect fake news. The misinformed tweets are then removed from display. We also show empirically the scalability of the proposed system.
Submitted 30 January, 2023;
originally announced January 2023.
-
EDSA-Ensemble: an Event Detection Sentiment Analysis Ensemble Architecture
Authors:
Alexandru Petrescu,
Ciprian-Octavian Truică,
Elena-Simona Apostol,
Adrian Paschke
Abstract:
As global digitization continues to grow, technology becomes more affordable and easier to use, and social media platforms thrive, becoming the new means of spreading information and news. Communities are built around sharing and discussing current events. Within these communities, users are enabled to share their opinions about each event. Using Sentiment Analysis to understand the polarity of each message belonging to an event, as well as the entire event, can help to better understand the general and individual feelings of significant trends and the dynamics on online social networks. In this context, we propose a new ensemble architecture, EDSA-Ensemble (Event Detection Sentiment Analysis Ensemble), that uses Event Detection and Sentiment Analysis to improve the detection of the polarity for current events from Social Media. For Event Detection, we use techniques based on Information Diffusion taking into account both the time span and the topics. To detect the polarity of each event, we preprocess the text and employ several Machine and Deep Learning models to create an ensemble model. The preprocessing step includes several word representation models, i.e., raw frequency, TFIDF, Word2Vec, and Transformers. The proposed EDSA-Ensemble architecture improves the event sentiment classification over the individual Machine and Deep Learning models.
Submitted 30 January, 2023;
originally announced January 2023.
-
Knowledge Augmented Machine Learning with Applications in Autonomous Driving: A Survey
Authors:
Julian Wörmann,
Daniel Bogdoll,
Christian Brunner,
Etienne Bührle,
Han Chen,
Evaristus Fuh Chuo,
Kostadin Cvejoski,
Ludger van Elst,
Philip Gottschall,
Stefan Griesche,
Christian Hellert,
Christian Hesels,
Sebastian Houben,
Tim Joseph,
Niklas Keil,
Johann Kelsch,
Mert Keser,
Hendrik Königshof,
Erwin Kraft,
Leonie Kreuser,
Kevin Krone,
Tobias Latka,
Denny Mattern,
Stefan Matthes,
Franz Motzkus
, et al. (27 additional authors not shown)
Abstract:
The availability of representative datasets is an essential prerequisite for many successful artificial intelligence and machine learning models. However, in real life applications these models often encounter scenarios that are inadequately represented in the data used for training. There are various reasons for the absence of sufficient data, ranging from time and cost constraints to ethical considerations. As a consequence, the reliable usage of these models, especially in safety-critical applications, is still a tremendous challenge. Leveraging additional, already existing sources of knowledge is key to overcome the limitations of purely data-driven approaches. Knowledge augmented machine learning approaches offer the possibility of compensating for deficiencies, errors, or ambiguities in the data, thus increasing the generalization capability of the applied models. Even more, predictions that conform with knowledge are crucial for making trustworthy and safe decisions even in underrepresented scenarios. This work provides an overview of existing techniques and methods in the literature that combine data-driven models with existing knowledge. The identified approaches are structured according to the categories knowledge integration, extraction and conformity. In particular, we address the application of the presented methods in the field of autonomous driving.
Submitted 20 November, 2023; v1 submitted 10 May, 2022;
originally announced May 2022.
-
Variational Quanvolutional Neural Networks with enhanced image encoding
Authors:
Denny Mattern,
Darya Martyniuk,
Henri Willems,
Fabian Bergmann,
Adrian Paschke
Abstract:
Image classification is an important task in various machine learning applications. In recent years, a number of classification methods based on quantum machine learning and different quantum image encoding techniques have been proposed. In this paper, we study the effect of three different quantum image encoding approaches on the performance of a convolution-inspired hybrid quantum-classical image classification algorithm called quanvolutional neural network (QNN). We furthermore examine the effect of variational - i.e. trainable - quantum circuits on the classification results. Our experiments indicate that some image encodings are better suited for variational circuits. However, our experiments show as well that there is not one best image encoding, but that the choice of the encoding depends on the specific constraints of the application.
Submitted 23 June, 2021; v1 submitted 14 June, 2021;
originally announced June 2021.
-
TopicsRanksDC: Distance-based Topic Ranking applied on Two-Class Data
Authors:
Malik Yousef,
Jamal Al Qundus,
Silvio Peikert,
Adrian Paschke
Abstract:
In this paper, we introduce a novel approach named TopicsRanksDC for ranking topics based on the distance between two clusters that are generated by each topic. We assume that our data consists of text documents that are associated with two classes. Our approach ranks each topic contained in these text documents by its significance for separating the two classes. First, the algorithm detects topics using Latent Dirichlet Allocation (LDA). The words defining each topic are represented as two clusters, where each one is associated with one of the classes. We compute four distance metrics: Single Linkage, Complete Linkage, Average Linkage, and the distance between the centroids. We compare the results of LDA topics and random topics. The results show that the rank for LDA topics is much higher than that for random topics. The results of the TopicsRanksDC tool are promising for future work to enable search engines to suggest related topics.
Submitted 17 May, 2021;
originally announced May 2021.
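The four cluster distances named in the abstract above have standard definitions, which can be sketched as follows. This is an illustration under stated assumptions, not the TopicsRanksDC implementation: the Euclidean metric, the toy 2-D vectors, and the names `pairwise_distances`, `centroid`, and `cluster_distances` are all chosen here for the example, and the abstract does not say how topic words are turned into vectors.

```python
import math
from itertools import product


def pairwise_distances(cluster_a, cluster_b):
    """Euclidean distance for every (a, b) pair across the two clusters."""
    return [math.dist(a, b) for a, b in product(cluster_a, cluster_b)]


def centroid(cluster):
    """Component-wise mean of the vectors in a cluster."""
    n = len(cluster)
    return [sum(component) / n for component in zip(*cluster)]


def cluster_distances(cluster_a, cluster_b):
    """Single, complete, and average linkage plus centroid distance."""
    d = pairwise_distances(cluster_a, cluster_b)
    return {
        "single": min(d),              # closest cross-cluster pair
        "complete": max(d),            # farthest cross-cluster pair
        "average": sum(d) / len(d),    # mean over all cross-cluster pairs
        "centroid": math.dist(centroid(cluster_a), centroid(cluster_b)),
    }


# Toy 2-D vectors standing in for one topic's words in each class.
class_one = [(0.0, 0.0), (0.0, 2.0)]
class_two = [(3.0, 0.0), (3.0, 4.0)]

scores = cluster_distances(class_one, class_two)
print(scores)
```

Computing such a score for every topic yields one number per metric that could be used to order topics by how far apart the two class clusters lie.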
-
AI supported Topic Modeling using KNIME-Workflows
Authors:
Jamal Al Qundus,
Silvio Peikert,
Adrian Paschke
Abstract:
Topic modeling algorithms traditionally model topics as lists of weighted terms. These topic models can be used effectively to classify texts or to support text mining tasks such as text summarization or fact extraction. The general procedure relies on statistical analysis of term frequencies. The focus of this work is on the implementation of knowledge-based topic modeling services in a KNIME workflow. A brief description and evaluation of the DBpedia-based enrichment approach and the comparative evaluation of enriched topic models will be outlined based on our previous work. DBpedia-Spotlight is used to identify entities in the input text, and information from DBpedia is used to extend these entities. We provide a workflow developed in KNIME that implements this approach and compare the results of topic modeling supported by knowledge base information with those of traditional LDA. This topic modeling approach allows semantic interpretation both by algorithms and by humans.
Submitted 15 April, 2021;
originally announced April 2021.
-
ROC: An Ontology for Country Responses towards COVID-19
Authors:
Jamal Al Qundus,
Ralph Schäfermeier,
Naouel Karam,
Silvio Peikert,
Adrian Paschke
Abstract:
The ROC ontology for country responses to COVID-19 provides a model for collecting, linking and sharing data on the COVID-19 pandemic. It follows semantic standardization (W3C standards RDF, OWL, SPARQL) for the representation of concepts and creation of vocabularies. ROC focuses on country measures and enables the integration of data from heterogeneous data sources. The proposed ontology is intended to facilitate statistical analysis to study and evaluate the effectiveness and side effects of government responses to COVID-19 in different countries. The ontology contains data collected by OxCGRT from publicly available information. This data has been compiled from information provided by ECDC for most countries, as well as from various repositories used to collect data on COVID-19.
Submitted 15 April, 2021;
originally announced April 2021.
-
QURATOR: Innovative Technologies for Content and Data Curation
Authors:
Georg Rehm,
Peter Bourgonje,
Stefanie Hegele,
Florian Kintzel,
Julián Moreno Schneider,
Malte Ostendorff,
Karolina Zaczynska,
Armin Berger,
Stefan Grill,
Sören Räuchle,
Jens Rauenbusch,
Lisa Rutenburg,
André Schmidt,
Mikka Wild,
Henry Hoffmann,
Julian Fink,
Sarah Schulz,
Jurica Seva,
Joachim Quantz,
Joachim Böttger,
Josefine Matthey,
Rolf Fricke,
Jan Thomsen,
Adrian Paschke,
Jamal Al Qundus
, et al. (15 additional authors not shown)
Abstract:
In all domains and sectors, the demand for intelligent systems to support the processing and generation of digital content is rapidly increasing. The availability of vast amounts of content and the pressure to publish new content quickly and in rapid succession requires faster, more efficient and smarter processing and generation methods. With a consortium of ten partners from research and industry and a broad range of expertise in AI, Machine Learning and Language Technologies, the QURATOR project, funded by the German Federal Ministry of Education and Research, develops a sustainable and innovative technology platform that provides services to support knowledge workers in various industries to address the challenges they face when curating digital content. The project's vision and ambition is to establish an ecosystem for content curation technologies that significantly pushes the current state of the art and transforms its region, the metropolitan area Berlin-Brandenburg, into a global centre of excellence for curation technologies.
Submitted 25 April, 2020;
originally announced April 2020.
-
Investigating the Effect of Attributes on User Trust in Social Media
Authors:
Jamal Al Qundus,
Adrian Paschke
Abstract:
One main challenge in social media is to identify trustworthy information. If we cannot recognize information as trustworthy, that information may become useless or be lost. Conversely, we could consume wrong or fake information with major consequences. How does a user handle the information provided before consuming it? Are the comments on a post, the author, or the votes essential for making such a decision? Are these attributes considered together, and which attribute is more important? To answer these questions, we developed a trust model to support knowledge sharing of user content in social media. This trust model is based on the dimensions of stability, quality, and credibility. Each dimension contains metrics (user role, user IQ, votes, etc.) that are important to the user based on data analysis. In this paper, we present an evaluation of the proposed trust model using conjoint analysis (CA) as the evaluation method. The results obtained from 348 responses validate the trust model. A trust degree translator interprets the content as very trusted, trusted, untrusted, or very untrusted based on the calculated value of trust. Furthermore, the results show a different importance for each dimension: stability 24%, credibility 35%, and quality 41%.
Submitted 16 April, 2019;
originally announced April 2019.
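As a rough illustration of how a calculated trust value could be mapped to the four labels mentioned in the abstract above, consider the sketch below. It makes several assumptions that the abstract does not confirm: the reported importance percentages are reused as weights in a weighted sum, each dimension score lies in [0, 1], and the label cut-offs in `LABELS` are hypothetical. The names `trust_value` and `trust_label` are likewise invented for this example.

```python
# Importance values reported in the abstract, reused here as weights (an assumption).
WEIGHTS = {"stability": 0.24, "credibility": 0.35, "quality": 0.41}

# Hypothetical cut-offs; the abstract does not give the translator's thresholds.
LABELS = [(0.75, "very trusted"), (0.50, "trusted"), (0.25, "untrusted")]


def trust_value(scores):
    """Weighted sum of per-dimension scores, each expected in [0, 1]."""
    return sum(WEIGHTS[dim] * scores[dim] for dim in WEIGHTS)


def trust_label(value):
    """Map a trust value to one of the four labels named in the abstract."""
    for threshold, label in LABELS:
        if value >= threshold:
            return label
    return "very untrusted"


# Made-up dimension scores for a single piece of content.
example = {"stability": 0.5, "credibility": 1.0, "quality": 0.5}
value = trust_value(example)
label = trust_label(value)
print(round(value, 3), label)
```

With these made-up inputs the weighted sum is 0.12 + 0.35 + 0.205 = 0.675, which falls into the hypothetical "trusted" band.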
-
The Role of Pragmatics in Legal Norm Representation
Authors:
Shashishekar Ramakrishna,
Lukasz Gorski,
Adrian Paschke
Abstract:
Despite the 'apparent clarity' of a given legal provision, its application may result in an outcome that does not exactly conform to the semantic level of a statute. The vagueness within a legal text is induced intentionally to accommodate all possible scenarios under which such norms should be applied, thus making the role of pragmatics an important aspect also in the representation of a legal norm and reasoning on top of it. The notion of pragmatics considered in this paper does not focus on the aspects associated with judicial decision making. The paper aims to shed light on the aspects of pragmatics in legal linguistics, mainly focusing on the domain of patent law, only from a knowledge representation perspective. The philosophical discussions presented in this paper are grounded in the legal theories of Grice and Marmor.
Submitted 8 July, 2015;
originally announced July 2015.
-
Aspect OntoMaven - Aspect-Oriented Ontology Development and Configuration With OntoMaven
Authors:
Adrian Paschke,
Ralph Schaefermeier
Abstract:
In agile ontology-based software engineering projects, support for modular reuse of ontologies from large existing remote repositories, ontology project life cycle management, and transitive dependency management are important needs. The contribution of this paper is a new design artifact called OntoMaven combined with a unified approach to ontology modularization, aspect-oriented ontology development, which was inspired by aspect-oriented programming. OntoMaven adopts the Apache Maven-based development methodology and adapts its concepts to knowledge engineering for Maven-based ontology development and management of ontology artifacts in distributed ontology repositories. The combination with aspect-oriented ontology development allows for fine-grained, declarative configuration of ontology modules.
Submitted 1 July, 2015;
originally announced July 2015.
-
Rule reasoning for legal norm validation of FSTP facts
Authors:
Naouel Karam,
Shashishekar Ramakrishna,
Adrian Paschke
Abstract:
Non-obviousness or inventive step is a general requirement for patentability in most patent law systems. An invention should be at an adequate distance beyond its prior art in order to be patented. This short paper provides an overview of a methodology proposed for legal norm validation of FSTP facts using a rule reasoning approach.
Submitted 5 December, 2014;
originally announced December 2014.
-
Bridging the gap between Legal Practitioners and Knowledge Engineers using semi-formal KR
Authors:
Shashishekar Ramakrishna,
Adrian Paschke
Abstract:
The use of Structured English as a computation-independent knowledge representation format for non-technical users in business rules representation has been proposed in OMG's Semantics and Business Vocabulary Representation (SBVR). In the legal domain we face a similar problem. Formal representation languages, such as OASIS LegalRuleML and legal ontologies (LKIF, legal OWL2 ontologies etc.), support the technical knowledge engineer and automated reasoning. However, they can hardly be used directly by legal domain experts who do not have a computer science background. In this paper we adapt the SBVR Structured English approach for the legal domain and implement a proof-of-concept, called KR4IPLaw, which enables legal domain experts to represent their knowledge in Structured English in a computation-independent and hence, for them, more usable way. The benefit of this approach is that the underlying pre-defined semantics of the Structured English approach makes transformations into formal languages such as OASIS LegalRuleML and OWL2 ontologies possible. We exemplify our approach in the domain of patent law.
Submitted 31 May, 2014;
originally announced June 2014.
-
Semantic Jira - Semantic Expert Finder in the Bug Tracking Tool Jira
Authors:
Velten Heyn,
Adrian Paschke
Abstract:
The semantic expert recommender extension for the Jira bug tracking system semantically searches for similar tickets in Jira and recommends experts and links to existing organizational (Wiki) knowledge for each ticket. This helps to avoid redundant work and supports the search and collaboration with experts in the project management and maintenance phase based on semantically enriched tickets in Jira.
Submitted 18 December, 2013;
originally announced December 2013.
-
OntoMaven: Maven-based Ontology Development and Management of Distributed Ontology Repositories
Authors:
Adrian Paschke
Abstract:
In collaborative agile ontology development projects, support for modular reuse of ontologies from large existing remote repositories, ontology project life cycle management, and transitive dependency management are important needs. The Apache Maven approach has proven its success in distributed collaborative Software Engineering by its widespread adoption. The contribution of this paper is a new design artifact called OntoMaven. OntoMaven adopts the Maven-based development methodology and adapts its concepts to knowledge engineering for Maven-based ontology development and management of ontology artifacts in distributed ontology repositories.
Submitted 27 September, 2013;
originally announced September 2013.
-
An Inter-lingual Reference Approach For Multi-Lingual Ontology Matching
Authors:
Haytham Al-Feel,
Ralph Schafermeier,
Adrian Paschke
Abstract:
Ontologies are considered as the backbone of the Semantic Web. With the rising success of the Semantic Web, the number of participating communities from different countries is constantly increasing. The growing number of ontologies available in different natural languages leads to an interoperability problem. In this paper, we discuss several approaches for ontology matching; examine similarities and differences, identify weaknesses, and compare the existing automated approaches with the manual approaches for integrating multilingual ontologies. In addition to that, we propose a new architecture for a multilingual ontology matching service. As a case study we used an example of two multilingual enterprise ontologies - the university ontology of Freie Universitaet Berlin and the ontology for Fayoum University in Egypt.
Submitted 25 September, 2013;
originally announced September 2013.
-
The Rule Responder eScience Infrastructure
Authors:
Adrian Paschke,
Zhili Zhao
Abstract:
To a large degree information and services for chemical e-Science have become accessible - anytime, anywhere - but not necessarily useful. The Rule Responder eScience middleware is about providing information consumers with rule-based agents to transform existing information into relevant information of practical consequences, hence providing control to the end-users to express in a declarative rule-based way how to turn existing information into personally relevant information and how to react or make automated decisions on top of it.
Submitted 7 December, 2010;
originally announced December 2010.
-
Use of semantic technologies for the development of a dynamic trajectories generator in a Semantic Chemistry eLearning platform
Authors:
Richard Huber,
Kirsten Hantelmann,
Alexandru Todor,
Sebastian Krebs,
Ralf Heese,
Adrian Paschke
Abstract:
ChemgaPedia is a multimedia, web-based eLearning service platform that currently contains about 18,000 pages organized in 1,700 chapters covering the complete bachelor studies in chemistry and related topics of chemistry, pharmacy, and life sciences. The eLearning encyclopedia contains some 25,000 media objects and the eLearning platform provides services such as virtual and remote labs for experiments. With up to 350,000 users per month the platform is the most frequently used scientific educational service in the German-speaking Internet. In this demo we show the benefit of mapping the static eLearning contents of ChemgaPedia to a Linked Data representation for Semantic Chemistry which allows for generating dynamic eLearning paths tailored to the semantic profiles of the users.
Submitted 7 December, 2010;
originally announced December 2010.
-
ChemCloud: Chemical e-Science Information Cloud
Authors:
Alexandru Todor,
Adrian Paschke,
Stephan Heineke
Abstract:
Our Chemical e-Science Information Cloud (ChemCloud) - a Semantic Web based eScience infrastructure - integrates and automates a multitude of databases, tools and services in the domain of chemistry, pharmacy and bio-chemistry available at the Fachinformationszentrum Chemie (FIZ Chemie), at the Freie Universitaet Berlin (FUB), and on the public Web. Based on the approach of the W3C Linked Open Data initiative and the W3C Semantic Web technologies for ontologies and rules it semantically links and integrates knowledge from our W3C HCLS knowledge base hosted at the FUB, our multi-domain knowledge base DBpedia (Deutschland) implemented at FUB, which is extracted from Wikipedia (De) providing a public semantic resource for chemistry, and our well-established databases at FIZ Chemie such as ChemInform for organic reaction data, InfoTherm the leading source for thermophysical data, Chemisches Zentralblatt, the complete chemistry knowledge from 1830 to 1969, and ChemgaPedia the largest and most frequented e-Learning platform for Chemistry and related sciences in German language.
Submitted 7 December, 2010;
originally announced December 2010.
-
Process Makna - A Semantic Wiki for Scientific Workflows
Authors:
Adrian Paschke,
Zhili Zhao
Abstract:
Virtual e-Science infrastructures supporting Web-based scientific workflows are an example of knowledge-intensive, collaborative and weakly-structured processes where the interaction with the human scientists during process execution plays a central role. In this paper we propose lightweight, dynamic, user-friendly interaction with humans during execution of scientific workflows via the low-barrier approach of Semantic Wikis as an intuitive interface for non-technical scientists. Our Process Makna Semantic Wiki system is a novel combination of a business process management system adapted for scientific workflows with a Corporate Semantic Web Wiki user interface supporting knowledge-intensive human interaction tasks during scientific workflow execution.
Submitted 7 December, 2010;
originally announced December 2010.
-
A Homogeneous Reaction Rule Language for Complex Event Processing
Authors:
Adrian Paschke,
Alexander Kozlenkov,
Harold Boley
Abstract:
Event-driven automation of reactive functionalities for complex event processing is an urgent need in today's distributed service-oriented architectures and Web-based event-driven environments. An important problem to be addressed is how to correctly and efficiently capture and process the event-based behavioral, reactive logic embodied in reaction rules, and how to combine this with other conditional decision logic embodied, e.g., in derivation rules. This paper elaborates a homogeneous integration approach that combines derivation rules, reaction rules and other rule types such as integrity constraints into the general framework of logic programming, the industrial-strength version of declarative programming. We describe syntax and semantics of the language, implement a distributed web-based middleware using enterprise service technologies and illustrate its adequacy in terms of expressiveness, efficiency and scalability through examples extracted from industrial use cases. The developed reaction rule language provides expressive features such as modular ID-based updates with support for external imports and self-updates of the intensional and extensional knowledge bases, and transactions including integrity testing and roll-backs of update transition paths. It also supports distributed complex event processing, event messaging and event querying via efficient and scalable enterprise middleware technologies and event/action reasoning based on an event/action algebra implemented by an interval-based event calculus variant as a logic inference formalism.
Submitted 4 August, 2010;
originally announced August 2010.
-
Design Patterns for Complex Event Processing
Authors:
Adrian Paschke
Abstract:
Currently, engineering efficient and successful event-driven applications based on the emerging Complex Event Processing (CEP) technology is a laborious trial-and-error process. The proposed CEP design pattern approach should support CEP engineers in their design decisions to build robust and efficient CEP solutions with well-understood tradeoffs and should enable an interdisciplinary and efficient communication process about successful CEP solutions in different application domains.
Submitted 6 June, 2008;
originally announced June 2008.
-
Knowledge Representation Concepts for Automated SLA Management
Authors:
Adrian Paschke,
Martin Bichler
Abstract:
Outsourcing of complex IT infrastructure to IT service providers has increased substantially during the past years. IT service providers must be able to fulfil their service-quality commitments based upon predefined Service Level Agreements (SLAs) with the service customer. They need to manage, execute and maintain thousands of SLAs for different customers and different types of services, which needs new levels of flexibility and automation not available with the current technology. The complexity of contractual logic in SLAs requires new forms of knowledge representation to automatically draw inferences and execute contractual agreements. A logic-based approach provides several advantages including automated rule chaining allowing for compact knowledge representation as well as flexibility to adapt to rapidly changing business requirements. We suggest adequate logical formalisms for representation and enforcement of SLA rules and describe a proof-of-concept implementation. The article describes selected formalisms of the ContractLog KR and their adequacy for automated SLA management and presents results of experiments to demonstrate flexibility and scalability of the approach.
Submitted 23 November, 2006;
originally announced November 2006.
-
The Reaction RuleML Classification of the Event / Action / State Processing and Reasoning Space
Authors:
Adrian Paschke
Abstract:
Reaction RuleML is a general, practical, compact and user-friendly XML-serialized language for the family of reaction rules. In this white paper we give a review of the history of event / action / state processing and reaction rule approaches and systems in different domains, define basic concepts and give a classification of the event, action, state processing and reasoning space as well as a discussion of relevant / related work.
Submitted 10 November, 2006;
originally announced November 2006.
-
ECA-RuleML: An Approach combining ECA Rules with temporal interval-based KR Event/Action Logics and Transactional Update Logics
Authors:
Adrian Paschke
Abstract:
An important problem to be addressed within Event-Driven Architecture (EDA) is how to correctly and efficiently capture and process the event/action-based logic. This paper endeavors to bridge the gap between the Knowledge Representation (KR) approaches based on durable events/actions and such formalisms as event calculus, on one hand, and event-condition-action (ECA) reaction rules extending the approach of active databases that view events as instantaneous occurrences and/or sequences of events, on the other. We propose a formalism based on reaction rules (ECA rules) and a novel interval-based event logic and present a concrete RuleML-based syntax, semantics and implementation. We further evaluate this approach theoretically, experimentally and on an example derived from common industry use cases and illustrate its benefits.
Submitted 10 November, 2006; v1 submitted 30 October, 2006;
originally announced October 2006.
-
A Typed Hybrid Description Logic Programming Language with Polymorphic Order-Sorted DL-Typed Unification for Semantic Web Type Systems
Authors:
Adrian Paschke
Abstract:
In this paper we elaborate on a specific application in the context of hybrid description logic programs (hybrid DLPs), namely description logic Semantic Web type systems (DL-types) which are used for term typing of LP rules based on a polymorphic, order-sorted, hybrid DL-typed unification as procedural semantics of hybrid DLPs. Using Semantic Web ontologies as type systems facilitates interchange of domain-independent rules over domain boundaries via dynamically typing and mapping of explicitly defined type ontologies.
Submitted 3 April, 2007; v1 submitted 2 October, 2006;
originally announced October 2006.
-
ECA-LP / ECA-RuleML: A Homogeneous Event-Condition-Action Logic Programming Language
Authors:
Adrian Paschke
Abstract:
Event-driven reactive functionalities are an urgent need in today's distributed service-oriented applications and (Semantic) Web-based environments. An important problem to be addressed is how to correctly and efficiently capture and process the event-based behavioral, reactive logic represented as ECA rules in combination with other conditional decision logic which is represented as derivation rules. In this paper we elaborate on a homogeneous integration approach which combines derivation rules, reaction rules (ECA rules) and other rule types such as integrity constraints into the general framework of logic programming. The developed ECA-LP language provides expressive features such as ID-based updates with support for external and self-updates of the intensional and extensional knowledge, transactions including integrity testing and an event algebra to define and process complex events and actions based on a novel interval-based Event Calculus variant.
Submitted 26 September, 2006;
originally announced September 2006.
-
Rule-based Knowledge Representation for Service Level Agreement
Authors:
Adrian Paschke
Abstract:
Automated management and monitoring of service contracts like Service Level Agreements (SLAs) or higher-level policies is vital for efficient and reliable distributed service-oriented architectures (SOA) with high quality of service (QoS) levels. IT service providers need to manage, execute and maintain thousands of SLAs for different customers and different types of services, which needs new levels of flexibility and automation not available with the current technology. I propose a novel rule-based knowledge representation (KR) for SLA rules and a respective rule-based service level management (RBSLM) framework. My rule-based approach based on logic programming provides several advantages including automated rule chaining allowing for compact knowledge representation and high levels of automation as well as flexibility to adapt to rapidly changing business requirements. Therewith, I address an urgent need that service-oriented businesses have nowadays, which is to dynamically change their business and contractual logic in order to adapt to rapidly changing business environments and to overcome the restricting nature of slow change cycles.
Submitted 21 September, 2006;
originally announced September 2006.
-
Verification, Validation and Integrity of Distributed and Interchanged Rule Based Policies and Contracts in the Semantic Web
Authors:
Adrian Paschke
Abstract:
Rule-based policy and contract systems have rarely been studied in terms of their software engineering properties. This is a serious omission, because in rule-based policy or contract representation languages rules are being used as a declarative programming language to formalize real-world decision logic and to build IS production systems upon it. This paper adopts an SE methodology from extreme programming, namely test-driven development, and discusses how it can be adapted to verification, validation and integrity testing (V&V&I) of policy and contract specifications. Since the test-driven approach focuses on the behavioral aspects and the drawn conclusions instead of the structure of the rule base and the causes of faults, it is independent of the complexity of the rule language and the system under test and thus much easier to use and understand for the rule engineer and the user.
Submitted 29 September, 2006; v1 submitted 21 September, 2006;
originally announced September 2006.