-
Elements of World Knowledge (EWOK): A cognition-inspired framework for evaluating basic world knowledge in language models
Authors:
Anna A. Ivanova,
Aalok Sathe,
Benjamin Lipkin,
Unnathi Kumar,
Setayesh Radkani,
Thomas H. Clark,
Carina Kauf,
Jennifer Hu,
R. T. Pramod,
Gabriel Grand,
Vivian Paulun,
Maria Ryskina,
Ekin Akyürek,
Ethan Wilcox,
Nafisa Rashid,
Leshem Choshen,
Roger Levy,
Evelina Fedorenko,
Joshua Tenenbaum,
Jacob Andreas
Abstract:
The ability to build and leverage world models is essential for a general-purpose AI agent. Testing such capabilities is hard, in part because the building blocks of world models are ill-defined. We present Elements of World Knowledge (EWOK), a framework for evaluating world modeling in language models by testing their ability to use knowledge of a concept to match a target text with a plausible/implausible context. EWOK targets specific concepts from multiple knowledge domains known to be vital for world modeling in humans. Domains range from social interactions (help/hinder) to spatial relations (left/right). Both contexts and targets are minimal pairs. Objects, agents, and locations in the items can be flexibly filled in, enabling easy generation of multiple controlled datasets. We then introduce EWOK-CORE-1.0, a dataset of 4,374 items covering 11 world knowledge domains. We evaluate 20 open-weights large language models (1.3B--70B parameters) across a battery of evaluation paradigms along with a human norming study comprising 12,480 measurements. The overall performance of all tested models is worse than human performance, with results varying drastically across domains. These data highlight simple cases where even large models fail and present rich avenues for targeted research on LLM world modeling capabilities.
Submitted 15 May, 2024;
originally announced May 2024.
-
Knowledge Engineering for Wind Energy
Authors:
Yuriy Marykovskiy,
Thomas Clark,
Justin Day,
Marcus Wiens,
Charles Henderson,
Julian Quick,
Imad Abdallah,
Anna Maria Sempreviva,
Jean-Paul Calbimonte,
Eleni Chatzi,
Sarah Barber
Abstract:
With the rapid evolution of the wind energy sector, there is an ever-increasing need to create value from the vast amounts of data made available both from within the domain, as well as from other sectors. This article addresses the challenges faced by wind energy domain experts in converting data into domain knowledge, connecting and integrating it with other sources of knowledge, and making it available for use in next generation artificially intelligent systems. To this end, this article highlights the role that knowledge engineering can play in the process of digital transformation of the wind energy sector. It presents the main concepts underpinning Knowledge-Based Systems and summarises previous work in the areas of knowledge engineering and knowledge representation in a manner that is relevant and accessible to domain experts. A systematic analysis of the current state-of-the-art on knowledge engineering in the wind energy domain is performed, with available tools put into perspective by establishing the main domain actors and their needs and identifying key problematic areas. Finally, guidelines for further development and improvement are provided.
Submitted 1 October, 2023;
originally announced October 2023.
-
A Cross-Linguistic Pressure for Uniform Information Density in Word Order
Authors:
Thomas Hikaru Clark,
Clara Meister,
Tiago Pimentel,
Michael Hahn,
Ryan Cotterell,
Richard Futrell,
Roger Levy
Abstract:
While natural languages differ widely in both canonical word order and word order flexibility, their word orders still follow shared cross-linguistic statistical patterns, often attributed to functional pressures. In the effort to identify these pressures, prior work has compared real and counterfactual word orders. Yet one functional pressure has been overlooked in such investigations: the uniform information density (UID) hypothesis, which holds that information should be spread evenly throughout an utterance. Here, we ask whether a pressure for UID may have influenced word order patterns cross-linguistically. To this end, we use computational models to test whether real orders lead to greater information uniformity than counterfactual orders. In our empirical study of 10 typologically diverse languages, we find that: (i) among SVO languages, real word orders consistently have greater uniformity than reverse word orders, and (ii) only linguistically implausible counterfactual orders consistently exceed the uniformity of real orders. These findings are compatible with a pressure for information uniformity in the development and usage of natural languages.
Submitted 9 July, 2023; v1 submitted 6 June, 2023;
originally announced June 2023.
-
From Compact Plasma Particle Sources to Advanced Accelerators with Modeling at Exascale
Authors:
Axel Huebl,
Remi Lehe,
Edoardo Zoni,
Olga Shapoval,
Ryan T. Sandberg,
Marco Garten,
Arianna Formenti,
Revathi Jambunathan,
Prabhat Kumar,
Kevin Gott,
Andrew Myers,
Weiqun Zhang,
Ann Almgren,
Chad E. Mitchell,
Ji Qiang,
David Grote,
Alexander Sinn,
Severin Diederichs,
Maxence Thevenet,
Luca Fedeli,
Thomas Clark,
Neil Zaim,
Henri Vincenti,
Jean-Luc Vay
Abstract:
Developing complex, reliable advanced accelerators requires a coordinated, extensible, and comprehensive approach in modeling, from source to the end of beam lifetime. We present highlights in Exascale Computing to scale accelerator modeling software to the requirements set for contemporary science drivers. In particular, we present the first laser-plasma modeling on an exaflop supercomputer using the US DOE Exascale Computing Project WarpX. Leveraging developments for Exascale, the new DOE SCIDAC-5 Consortium for Advanced Modeling of Particle Accelerators (CAMPA) will advance numerical algorithms and accelerate community modeling codes in a cohesive manner: from beam source, over energy boost, transport, injection, storage, to application or interaction. Such start-to-end modeling will enable the exploration of hybrid accelerators, with conventional and advanced elements, as the next step for advanced accelerator modeling. Following open community standards, we seed an open ecosystem of codes that can be readily combined with each other and machine learning frameworks. These will cover ultrafast to ultraprecise modeling for future hybrid accelerator design, even enabling virtual test stands and twins of accelerators that can be used in operations.
Submitted 18 April, 2023; v1 submitted 22 March, 2023;
originally announced March 2023.
-
Through-life Monitoring of Resource-constrained Systems and Fleets
Authors:
Felipe Montana,
Adam Hartwell,
Will Jacobs,
Visakan Kadirkamanathan,
Andrew R Mills,
Tom Clark
Abstract:
A Digital Twin (DT) is a simulation of a physical system that provides information to make decisions that add economic, social or commercial value. The behaviour of a physical system changes over time; a DT must therefore be continually updated with data from the physical system to reflect its changing behaviour. For resource-constrained systems, updating a DT is non-trivial because of challenges such as on-board learning and off-board data transfer. This paper presents a framework for updating data-driven DTs of resource-constrained systems geared towards system health monitoring. The proposed solution consists of: (1) an on-board system running a light-weight DT allowing the prioritisation and parsimonious transfer of data generated by the physical system; and (2) off-board robust updating of the DT and detection of anomalous behaviours. Two case studies are considered using a production gas turbine engine system to demonstrate the digital representation accuracy for real-world, time-varying physical systems.
Submitted 3 January, 2023;
originally announced January 2023.
-
Analyzing Wrap-Up Effects through an Information-Theoretic Lens
Authors:
Clara Meister,
Tiago Pimentel,
Thomas Hikaru Clark,
Ryan Cotterell,
Roger Levy
Abstract:
Numerous analyses of reading time (RT) data have been implemented -- all in an effort to better understand the cognitive processes driving reading comprehension. However, data measured on words at the end of a sentence -- or even at the end of a clause -- is often omitted due to the confounding factors introduced by so-called "wrap-up effects," which manifest as a skewed distribution of RTs for these words. Consequently, the understanding of the cognitive processes that might be involved in these wrap-up effects is limited. In this work, we attempt to learn more about these processes by examining the relationship between wrap-up effects and information-theoretic quantities, such as word and context surprisals. We find that the distribution of information in prior contexts is often predictive of sentence- and clause-final RTs (while not of sentence-medial RTs). This lends support to several prior hypotheses about the processes involved in wrap-up effects.
Submitted 5 January, 2024; v1 submitted 31 March, 2022;
originally announced March 2022.
-
In-flight Novelty Detection with Convolutional Neural Networks
Authors:
Adam Hartwell,
Felipe Montana,
Will Jacobs,
Visakan Kadirkamanathan,
Andrew R Mills,
Tom Clark
Abstract:
Gas turbine engines are complex machines that typically generate a vast amount of data, and require careful monitoring to allow for cost-effective preventative maintenance. In aerospace applications, returning all measured data to ground is prohibitively expensive, often causing useful, high value, data to be discarded. The ability to detect, prioritise, and return useful data in real-time is therefore vital. This paper proposes that system output measurements, described by a convolutional neural network model of normality, are prioritised in real-time for the attention of preventative maintenance decision makers.
Due to the complexity of gas turbine engine time-varying behaviours, deriving accurate physical models is difficult, and often leads to models with low prediction accuracy and incompatibility with real-time execution. Data-driven modelling is a desirable alternative producing high accuracy, asset specific models without the need for derivation from first principles.
We present a data-driven system for online detection and prioritisation of anomalous data. Biased data assessment deriving from novel operating conditions is avoided by uncertainty management integrated into the deep neural predictive model. Testing is performed on real and synthetic data, showing sensitivity to both real and synthetic faults. The system is capable of running in real-time on low-power embedded hardware and is currently in deployment on the Rolls-Royce Pearl 15 engine flight trials.
Submitted 7 December, 2021;
originally announced December 2021.
-
Mixture-of-Partitions: Infusing Large Biomedical Knowledge Graphs into BERT
Authors:
Zaiqiao Meng,
Fangyu Liu,
Thomas Hikaru Clark,
Ehsan Shareghi,
Nigel Collier
Abstract:
Infusing factual knowledge into pre-trained models is fundamental for many knowledge-intensive tasks. In this paper, we proposed Mixture-of-Partitions (MoP), an infusion approach that can handle a very large knowledge graph (KG) by partitioning it into smaller sub-graphs and infusing their specific knowledge into various BERT models using lightweight adapters. To leverage the overall factual knowledge for a target task, these sub-graph adapters are further fine-tuned along with the underlying BERT through a mixture layer. We evaluate our MoP with three biomedical BERTs (SciBERT, BioBERT, PubmedBERT) on six downstream tasks (inc. NLI, QA, Classification), and the results show that our MoP consistently enhances the underlying BERTs in task performance, and achieves new SOTA performances on five evaluated datasets.
Submitted 10 September, 2021;
originally announced September 2021.
-
Software Citation Implementation Challenges
Authors:
Daniel S. Katz,
Daina Bouquin,
Neil P. Chue Hong,
Jessica Hausman,
Catherine Jones,
Daniel Chivvis,
Tim Clark,
Mercè Crosas,
Stephan Druskat,
Martin Fenner,
Tom Gillespie,
Alejandra Gonzalez-Beltran,
Morane Gruenpeter,
Ted Habermann,
Robert Haines,
Melissa Harrison,
Edwin Henneken,
Lorraine Hwang,
Matthew B. Jones,
Alastair A. Kelly,
David N. Kennedy,
Katrin Leinweber,
Fernando Rios,
Carly B. Robinson,
Ilian Todorov
, et al. (2 additional authors not shown)
Abstract:
The main output of the FORCE11 Software Citation working group (https://www.force11.org/group/software-citation-working-group) was a paper on software citation principles (https://doi.org/10.7717/peerj-cs.86) published in September 2016. This paper laid out a set of six high-level principles for software citation (importance, credit and attribution, unique identification, persistence, accessibility, and specificity) and discussed how they could be used to implement software citation in the scholarly community. In a series of talks and other activities, we have promoted software citation using these increasingly accepted principles. At the time the initial paper was published, we also provided guidance and examples on how to make software citable, though we now realize there are unresolved problems with that guidance. The purpose of this document is to provide an explanation of current issues impacting scholarly attribution of research software, organize updated implementation guidance, and identify where best practices and solutions are still needed.
Submitted 21 May, 2019;
originally announced May 2019.
-
A Basic Model of KBS Software
Authors:
Tony Clark
Abstract:
The Euclid 6.2 project MOSES addressed quality issues in the development of military KBS. A contribution to this project was to develop a computational model of KBS that could be used to define and analyze aspects of KBS quality. Since a key characteristic of KBS is search, a computational model based on non-determinism was developed and used to express terms relating to quality. This research report describes the approach.
Submitted 17 April, 2018;
originally announced April 2018.
-
Metaclasses and Reflection in Smalltalk
Authors:
Tony Clark
Abstract:
Many Object Oriented Programming Languages provide reflective features which may be used to control the interpretive mechanism of the language. Often these features are defined with respect to a golden braid consisting of objects, classes and meta-classes. This report reviews the Smalltalk golden braid and generalizes it for multiple inheritance, leading to choices between many different inheritance strategies. The reflective features of Smalltalk cannot affect the basic mechanisms of inheritance and so an arbitrary choice must be made for multiple inheritance. A language is described in which the reflective features of Smalltalk are extended so as to allow programmer-defined inheritance strategies.
Submitted 17 April, 2018;
originally announced April 2018.
-
EBG: A Lazy Functional Programming Language Implemented on the Java Virtual Machine
Authors:
Tony Clark
Abstract:
This technical report describes the implementation of a lazy functional programming language on the Java VM.
Submitted 17 April, 2018;
originally announced April 2018.
-
A General Architecture for Heterogeneous Language Engineering and Projectional Editor Support
Authors:
Tony Clark
Abstract:
Tool support for language engineering has typically prioritised concrete syntax over abstract syntax by providing meta-languages for expressing concrete syntax and then mapping concrete to abstract structures. Text-based languages are usually specified using a BNF-like language used to generate a syntax-aware editor that includes features such as keyword completion. Similarly, graphical languages are defined using a declarative graphical syntax language, producing an editor that supports features such as shapes, graphs and edges. Projectional editors invert traditional approaches by prioritising abstract over concrete syntax. This paper describes a projectional meta-tool architecture, including general purpose abstract and concrete meta-languages, that uses declarative rules to integrate the syntax and tool support for a range of heterogeneous languages. The architecture has been implemented in Racket and the paper illustrates the architecture with concrete examples.
Submitted 10 June, 2015;
originally announced June 2015.
-
Meta-Packages: Painless Domain Specific Languages
Authors:
Tony Clark
Abstract:
Domain Specific Languages are used to provide a tailored modelling notation for a specific application domain. There are currently two main approaches to DSLs: standard notations that are tailored by adding simple properties; new notations that are designed from scratch. There are problems with both of these approaches which can be addressed by providing access to a small meta-language based on packages and classes. A meta-modelling approach based on meta-packages allows a wide range of DSLs to be defined in a standard way. The DSLs can be processed using standard object-based extension at the meta-level and existing tooling can easily be defined to adapt to the new languages. This paper introduces the concept of meta-packages and provides a simple example.
Submitted 10 June, 2015;
originally announced June 2015.
-
Model Driven Reactive Applications
Authors:
Tony Clark,
Dean Kramer,
Samia Oussena
Abstract:
Reactive applications (rapps) are of interest because of the explosion of mobile, tablet and web-based platforms. The complexity and proliferation of implementation technologies make it attractive to use model-driven techniques to develop rapp systems. This article proposes a domain specific language for rapps consisting of stereotyped class models for the structure of the application and state machine models for the application behaviour. The models are given a semantics in terms of a transformation to a calculus called Widget. The languages are introduced using an example application for mobile phones.
Submitted 10 June, 2015;
originally announced June 2015.
-
Processing XML for Domain Specific Languages
Authors:
Tony Clark
Abstract:
XML is a standard and universal language for representing information. XML processing is supported by two key frameworks: DOM and SAX. SAX is efficient, but leaves the developer to encode much of the processing. This paper introduces a language for expressing XML-based languages via grammars that can be used to process XML documents and synthesize arbitrary values. The language is declarative and shields the developer from SAX implementation details. The language is specified and an efficient implementation is defined as an abstract machine.
Submitted 10 June, 2015;
originally announced June 2015.
-
Super-Languages: Developing Languages and Applications with XMF (Second Edition)
Authors:
Tony Clark,
Paul Sammut,
James Willans
Abstract:
The aim of this book is to introduce the language XMF. This is done by defining the language, providing some examples of applications that can be written directly in the XOCL language that comes with XMF, and then by showing how XMF can be used for language engineering. The main focus of this book is on language engineering by example.
Submitted 10 June, 2015;
originally announced June 2015.
-
Applied Metamodelling: A Foundation for Language Driven Development (Third Edition)
Authors:
Tony Clark,
Paul Sammut,
James Willans
Abstract:
Modern day system developers have some serious problems to contend with. The systems they develop are becoming increasingly complex as customers demand richer functionality delivered in ever shorter timescales. They have to manage a huge diversity of implementation technologies, design techniques and development processes: everything from scripting languages to web-services to the latest 'silver bullet' design abstraction. To add to that, nothing stays still: today's 'must have' technology rapidly becomes tomorrow's legacy problem that must be managed along with everything else. How can these problems be dealt with? In this book we propose that there is a common foundation to their resolution: languages. Languages are the primary way in which system developers communicate, design and implement systems. Languages provide abstractions that can encapsulate complexity, embrace the diversity of technologies and design abstractions, and unite modern and legacy systems.
Submitted 1 May, 2015;
originally announced May 2015.
-
Report on the Aachen OCL Meeting
Authors:
Achim D. Brucker,
Dan Chiorean,
Tony Clark,
Birgit Demuth,
Martin Gogolla,
Dimitri Plotnikov,
Bernhard Rumpe,
Edward D. Willink,
Burkhart Wolff
Abstract:
As a continuation of the OCL workshop during the MODELS 2013 conference in October 2013, a number of OCL experts decided to meet in November 2013 in Aachen for two days to discuss possible short-term improvements of OCL for an upcoming OMG meeting and to envision possible future long-term developments of the language. This paper is a sort of "minutes of the meeting" and is intended to quickly inform the OCL community about the discussion topics.
Submitted 25 August, 2014;
originally announced August 2014.
-
Web Annotation as a First Class Object
Authors:
Paolo Ciccarese,
Stian Soiland-Reyes,
Tim Clark
Abstract:
Scholars have made handwritten notes and comments in books and manuscripts for centuries. Today's blogs and news sites typically invite users to express their opinions on the published content; URLs allow web resources to be shared with accompanying annotations and comments using third-party services like Twitter or Facebook. These contributions have until recently been constrained within specific services, making them second-class citizens of the Web.
Web Annotations are now emerging as fully independent Linked Data in their own right, no longer restricted to plain textual comments in application silos. Annotations can now range from bookmarks and comments, to fine-grained annotations of a selection of, for example, a section of a frame within a video stream. Technologies and standards now exist to create, publish, syndicate, mash-up and consume, finely targeted, semantically rich digital annotations on practically any content, as first-class Web citizens. This development is being driven by the need for collaboration and annotation reuse amongst domain researchers, computer scientists, scientific publishers, and scholarly content databases.
Submitted 19 March, 2019; v1 submitted 24 October, 2013;
originally announced October 2013.
-
Micropublications: a Semantic Model for Claims, Evidence, Arguments and Annotations in Biomedical Communications
Authors:
Tim Clark,
Paolo N. Ciccarese,
Carole A. Goble
Abstract:
The Micropublications semantic model for scientific claims, evidence, argumentation and annotation in biomedical publications, is a metadata model of scientific argumentation, designed to support several key requirements for exchange and value-addition of semantic metadata across the biomedical publications ecosystem.
Micropublications allow formalizing the argument structure of scientific publications so that (a) their internal structure is semantically clear and computable; (b) citation networks can be easily constructed across large corpora; (c) statements can be formalized in multiple useful abstraction models; (d) statements in one work may cite statements in another, individually; (e) support, similarity and challenge of assertions can be modelled across corpora; (f) scientific assertions, particularly in review articles, may be transitively closed to supporting evidence and methods.
The model supports natural language statements; data; methods and materials specifications; discussion and commentary; as well as challenge and disagreement. A detailed analysis of nine use cases is provided, along with an implementation in OWL 2 and SWRL, with several example instantiations in RDF.
Submitted 2 February, 2014; v1 submitted 15 May, 2013;
originally announced May 2013.
-
PAV ontology: Provenance, Authoring and Versioning
Authors:
Paolo Ciccarese,
Stian Soiland-Reyes,
Khalid Belhajjame,
Alasdair J G Gray,
Carole Goble,
Tim Clark
Abstract:
Provenance is a critical ingredient for establishing trust in published scientific content. This is true whether we are considering a data set, a computational workflow, a peer-reviewed publication or a simple scientific claim with supportive evidence. Existing vocabularies such as DC Terms and the W3C PROV-O are domain-independent and general-purpose, and they allow and encourage extensions to cover more specific needs. We identify the specific need for identifying or distinguishing between the various roles assumed by agents manipulating digital artifacts, such as author, contributor and curator.
We present the Provenance, Authoring and Versioning ontology (PAV): a lightweight ontology for capturing just enough descriptions essential for tracking the provenance, authoring and versioning of web resources. We argue that such descriptions are essential for digital scientific content. PAV distinguishes between contributors, authors and curators of content and creators of representations in addition to the provenance of originating resources that have been accessed, transformed and consumed. We explore five projects (and communities) that have adopted PAV illustrating their usage through concrete examples. Moreover, we present mappings that show how PAV extends the PROV-O ontology to support broader interoperability.
The authors strived to keep PAV lightweight and compact by including only those terms that have been demonstrated to be pragmatically useful in existing applications, and by recommending terms from existing ontologies when plausible.
We analyze and compare PAV with related approaches, namely Provenance Vocabulary, DC Terms and BIBFRAME. We identify similarities and analyze their differences with PAV, outlining strengths and weaknesses of our proposed model. We specify SKOS mappings that align PAV with DC Terms.
Submitted 6 December, 2013; v1 submitted 26 April, 2013;
originally announced April 2013.