-
WineGraph: A Graph Representation For Food-Wine Pairing
Authors:
Zuzanna Gawrysiak,
Agata Żywot,
Agnieszka Ławrynowicz
Abstract:
We present WineGraph, an extended version of FlavorGraph, a heterogeneous graph incorporating wine data into its structure. This integration enables food-wine pairing based on taste and sommelier-defined rules. Leveraging a food dataset comprising 500,000 reviews and a wine reviews dataset with over 130,000 entries, we computed taste descriptors for both food and wine. This information was then utilised to pair food items with wine and augment FlavorGraph with additional data. The results demonstrate the potential of heterogeneous graphs to acquire supplementary information, proving beneficial for wine pairing.
Submitted 27 June, 2024;
originally announced July 2024.
-
A Knowledge Engineering Primer
Authors:
Agnieszka Ławrynowicz
Abstract:
The aim of this primer is to introduce the subject of knowledge engineering in a concise but synthetic way to develop the reader's intuition about the area.
Submitted 25 March, 2024; v1 submitted 26 May, 2023;
originally announced May 2023.
-
TASTEset -- Recipe Dataset and Food Entities Recognition Benchmark
Authors:
Ania Wróblewska,
Agnieszka Kaliska,
Maciej Pawłowski,
Dawid Wiśniewski,
Witold Sosnowski,
Agnieszka Ławrynowicz
Abstract:
Food Computing is currently a fast-growing field of research. Natural language processing (NLP) is also increasingly essential in this field, especially for recognising food entities. However, there are still only a few well-defined tasks that serve as benchmarks for solutions in this area. We introduce a new dataset -- called TASTEset -- to bridge this gap. In this dataset, Named Entity Recognition (NER) models are expected to find or infer various types of entities helpful in processing recipes, e.g., food products, quantities and their units, names of cooking processes, physical quality of ingredients, their purpose, taste.
The dataset consists of 700 recipes with more than 13,000 entities to extract. We provide a few state-of-the-art baselines of named entity recognition models, which show that our dataset poses a solid challenge to existing models. The best model achieved an average $F_1$ score of 0.95, with per-entity-type scores ranging from 0.781 to 0.982. We share the dataset and the task to encourage progress on more in-depth and complex information extraction from recipes.
Submitted 16 April, 2022;
originally announced April 2022.
-
BigCQ: A large-scale synthetic dataset of competency question patterns formalized into SPARQL-OWL query templates
Authors:
Dawid Wiśniewski,
Jędrzej Potoniec,
Agnieszka Ławrynowicz
Abstract:
Competency Questions (CQs) are used in many ontology engineering methodologies to collect requirements and track the completeness and correctness of an ontology being constructed. Although they are frequently suggested by ontology engineering methodologies, publicly available datasets of CQs and their formalizations in ontology query languages are very scarce. Since the first efforts to automate processes utilizing CQs are being made, it is of high importance to provide large and diverse datasets to fuel these solutions. In this paper, we present BigCQ, the biggest dataset of CQ templates with their formalizations into SPARQL-OWL query templates. BigCQ is created automatically from a dataset of frequently used axiom shapes. These pairs of CQ templates and query templates can then be materialized as actual CQs and SPARQL-OWL queries when filled with resource labels and IRIs from a given ontology. We describe the dataset in detail, describe the process leading to its creation, and analyze how well the dataset covers real-world examples. We also publish the dataset as well as the scripts transforming axiom shapes into pairs of CQ patterns and SPARQL-OWL templates, to enable engineers to adapt the process to their particular needs.
Submitted 20 May, 2021;
originally announced May 2021.
-
More Effective Ontology Authoring with Test-Driven Development
Authors:
C. Maria Keet,
Kieren Davies,
Agnieszka Lawrynowicz
Abstract:
Ontology authoring is a complex process, where commonly the automated reasoner is invoked for verification of newly introduced changes, therewith amounting to a time-consuming test-last approach. Test-Driven Development (TDD) for ontology authoring is a recent test-first approach that aims to reduce authoring time and increase authoring efficiency. Current TDD testing falls short on coverage of OWL features and possible test outcomes, the rigorous foundation thereof, and evaluations to ascertain its effectiveness.
We aim to address these issues in one instantiation of TDD for ontology authoring. We first propose a succinct, logic-based model of TDD testing and present novel TDD algorithms that also cover any OWL 2 class expression for the TBox and the principal ABox assertions, and prove their correctness. The algorithms use methods from the OWL API directly such that reclassification is not necessary for test execution, therewith reducing ontology authoring time. The algorithms were implemented in TDDonto2, a Protégé plugin. TDDonto2 was evaluated on editing efficiency and by users. The editing efficiency study demonstrated that it is faster than a typical ontology authoring interface, especially for medium-sized and large ontologies. The user evaluation demonstrated that modellers make significantly fewer errors with TDDonto2 compared to the standard Protégé interface and complete their tasks better using less time. Thus, the results indicate that Test-Driven Development is a promising approach in an ontology development methodology.
Submitted 14 December, 2018;
originally announced December 2018.
-
Competency Questions and SPARQL-OWL Queries Dataset and Analysis
Authors:
Dawid Wisniewski,
Jedrzej Potoniec,
Agnieszka Lawrynowicz,
C. Maria Keet
Abstract:
Competency Questions (CQs) are natural language questions outlining and constraining the scope of knowledge represented by an ontology. Although CQs are part of several ontology engineering methodologies, we have observed that the actual publication of CQs for the available ontologies is very limited, and even scarcer is the publication of their respective formalisations in terms of, e.g., SPARQL queries. This paper aims to contribute to addressing the engineering shortcomings of using CQs in ontology development, to facilitate wider use of CQs. In order to understand the relation between CQs and the queries used to test them on an ontology, we gather, analyse, and publicly release a set of 234 CQs and their translations to SPARQL-OWL for several ontologies in different domains developed by different groups. We analysed the CQs in two principal ways. The first stage focused on a linguistic analysis of the natural language text itself, i.e., a lexico-syntactic analysis without any presuppositions of ontology elements, and a subsequent step of semantic analysis in order to find patterns. This increased diversity of CQ sources resulted in a 5-fold increase of hitherto published patterns, to 106 distinct CQ patterns, of which only a small subset is shared across the CQ sets from the different ontologies. Next, we analysed the relation between the found CQ patterns and the 46 SPARQL-OWL query signatures, which revealed that one CQ pattern may be realised by more than one SPARQL-OWL query signature, and vice versa. We hope that our work will contribute to establishing common practices, templates, automation, and user tools that will support CQ formulation, formalisation, execution, and general management.
Submitted 23 November, 2018;
originally announced November 2018.
-
ML-Schema: Exposing the Semantics of Machine Learning with Schemas and Ontologies
Authors:
Gustavo Correa Publio,
Diego Esteves,
Agnieszka Ławrynowicz,
Panče Panov,
Larisa Soldatova,
Tommaso Soru,
Joaquin Vanschoren,
Hamid Zafar
Abstract:
The ML-Schema, proposed by the W3C Machine Learning Schema Community Group, is a top-level ontology that provides a set of classes, properties, and restrictions for representing and interchanging information on machine learning algorithms, datasets, and experiments. It can be easily extended and specialized, and it is also mapped to other more domain-specific ontologies developed in the area of machine learning and data mining. In this paper we give an overview of existing state-of-the-art machine learning interchange formats and present the first release of ML-Schema, a canonical format resulting from more than seven years of experience among different research institutions. We argue that exposing the semantics of machine learning algorithms, models, and experiments through a canonical format may pave the way to better interpretability and to realistically achieving the full interoperability of experiments regardless of platform or adopted workflow solution.
Submitted 14 July, 2018;
originally announced July 2018.
-
Swift Linked Data Miner: Mining OWL 2 EL class expressions directly from online RDF datasets
Authors:
Jedrzej Potoniec,
Piotr Jakubowski,
Agnieszka Ławrynowicz
Abstract:
In this study, we present Swift Linked Data Miner, an interruptible algorithm that can directly mine an online Linked Data source (e.g., a SPARQL endpoint) for OWL 2 EL class expressions to extend an ontology with new SubClassOf: axioms. The algorithm works by downloading only a small part of the Linked Data source at a time, building a smart index in memory and swiftly iterating over the index to mine axioms. We propose a transformation function from mined axioms to RDF Data Shapes. We show, by means of a crowdsourcing experiment, that most of the axioms mined by Swift Linked Data Miner are correct and can be added to an ontology. We provide a ready-to-use Protégé plugin implementing the algorithm, to support ontology engineers in their daily modeling work.
Submitted 19 October, 2017;
originally announced October 2017.
-
The design and the performance of stratospheric mission in the search for the Schumann resonances
Authors:
Arkadiusz Papaj,
Piotr Weszka,
Marcin Bocheński,
Mateusz Michałek,
Andrzej Kułak,
Agata Kołodziejczyk,
Matt Harasymczuk,
Paweł Karbowniczek,
Aleksandra Ławrynowicz,
Joanna Kuźma,
Tomasz Brol,
Radosław A. Kycia
Abstract:
The technical details of a stratospheric balloon mission aimed at measuring the Schumann resonances are described. The gondola is designed specifically for measuring the faint effects of ELF (Extremely Low Frequency electromagnetic waves) phenomena. The prototype met the design requirements. The ELF measuring system worked properly for the entire mission; however, the level of signal amplification, which was chosen on the basis of ground-level measurements, was too high. Movement of the gondola in the Earth's magnetic field induced a signal in the antenna that saturated the measuring system. This effect will be taken into account in the planning of future missions. A large telemetry dataset was gathered during the experiment and is currently being processed. The payload also included biological material as well as electronic equipment that was tested under extreme conditions.
Submitted 2 October, 2017; v1 submitted 28 April, 2017;
originally announced April 2017.
-
Test-Driven Development of ontologies (extended version)
Authors:
C. Maria Keet,
Agnieszka Lawrynowicz
Abstract:
Emerging ontology authoring methods to add knowledge to an ontology focus on ameliorating the validation bottleneck. The verification of a newly added axiom is still a matter of trying and seeing what the reasoner says, because a systematic testbed for ontology authoring is missing. We sought to address this by introducing the approach of test-driven development for ontology authoring. We specify 36 generic tests, as TBox queries and TBox axioms tested through individuals, and structure their inner workings in an 'open box' way, which cover the OWL 2 DL language features. This is implemented as a Protégé plugin so that one can perform a TDD test as a black box test. We evaluated the two test approaches on their performance. The TBox queries were faster, and that effect is more pronounced the larger the ontology is. We provide a general sequence of a TDD process for ontology engineering as a foundation for a TDD methodology.
Submitted 19 December, 2015;
originally announced December 2015.
-
The role of semantics in mining frequent patterns from knowledge bases in description logics with rules
Authors:
Joanna Jozefowska,
Agnieszka Lawrynowicz,
Tomasz Lukaszewski
Abstract:
We propose a new method for mining frequent patterns in a language that combines both Semantic Web ontologies and rules. In particular we consider the setting of using a language that combines description logics with DL-safe rules. This setting is important for the practical application of data mining to the Semantic Web. We focus on the relation of the semantics of the representation formalism to the task of frequent pattern discovery, and for the core of our method, we propose an algorithm that exploits the semantics of the combined knowledge base. We have developed a proof-of-concept data mining implementation of this. Using this we have empirically shown that using the combined knowledge base to perform semantic tests can make data mining faster by pruning useless candidate patterns before their evaluation. We have also shown that the quality of the set of patterns produced may be improved: the patterns are more compact, and there are fewer patterns. We conclude that exploiting the semantics of a chosen representation formalism is key to the design and application of (onto-)relational frequent pattern discovery methods. Note: To appear in Theory and Practice of Logic Programming (TPLP)
Submitted 1 April, 2010; v1 submitted 13 March, 2010;
originally announced March 2010.