Search | arXiv e-print repository

arXiv:2405.09605 [pdf, other]

Elements of World Knowledge (EWOK): A cognition-inspired framework for evaluating basic world knowledge in language models

Authors: Anna A. Ivanova, Aalok Sathe, Benjamin Lipkin, Unnathi Kumar, Setayesh Radkani, Thomas H. Clark, Carina Kauf, Jennifer Hu, R. T. Pramod, Gabriel Grand, Vivian Paulun, Maria Ryskina, Ekin Akyürek, Ethan Wilcox, Nafisa Rashid, Leshem Choshen, Roger Levy, Evelina Fedorenko, Joshua Tenenbaum, Jacob Andreas

Abstract: The ability to build and leverage world models is essential for a general-purpose AI agent. Testing such capabilities is hard, in part because the building blocks of world models are ill-defined. We present Elements of World Knowledge (EWOK), a framework for evaluating world modeling in language models by testing their ability to use knowledge of a concept to match a target text with a plausible/implausible context. EWOK targets specific concepts from multiple knowledge domains known to be vital for world modeling in humans. Domains range from social interactions (help/hinder) to spatial relations (left/right). Both contexts and targets are minimal pairs. Objects, agents, and locations in the items can be flexibly filled in, enabling easy generation of multiple controlled datasets. We then introduce EWOK-CORE-1.0, a dataset of 4,374 items covering 11 world knowledge domains. We evaluate 20 open-weights large language models (1.3B--70B parameters) across a battery of evaluation paradigms along with a human norming study comprising 12,480 measurements. The overall performance of all tested models is worse than human performance, with results varying drastically across domains. These data highlight simple cases where even large models fail and present rich avenues for targeted research on LLM world modeling capabilities.

Submitted 15 May, 2024; originally announced May 2024.

Comments: 21 pages (11 main), 7 figures. Authors Anna Ivanova, Aalok Sathe, Benjamin Lipkin contributed equally

arXiv:2310.00804 [pdf, other]

doi 10.5194/wes-9-883-2024

Knowledge Engineering for Wind Energy

Authors: Yuriy Marykovskiy, Thomas Clark, Justin Day, Marcus Wiens, Charles Henderson, Julian Quick, Imad Abdallah, Anna Maria Sempreviva, Jean-Paul Calbimonte, Eleni Chatzi, Sarah Barber

Abstract: With the rapid evolution of the wind energy sector, there is an ever-increasing need to create value from the vast amounts of data made available both from within the domain and from other sectors. This article addresses the challenges faced by wind energy domain experts in converting data into domain knowledge, connecting and integrating it with other sources of knowledge, and making it available for use in next-generation artificially intelligent systems. To this end, this article highlights the role that knowledge engineering can play in the process of digital transformation of the wind energy sector. It presents the main concepts underpinning Knowledge-Based Systems and summarises previous work in the areas of knowledge engineering and knowledge representation in a manner that is relevant and accessible to domain experts. A systematic analysis of the current state-of-the-art on knowledge engineering in the wind energy domain is performed, with available tools put into perspective by establishing the main domain actors and their needs and identifying key problematic areas. Finally, guidelines for further development and improvement are provided.

Submitted 1 October, 2023; originally announced October 2023.

Journal ref: Wind Energ. Sci. 9 (2024) 883-917

arXiv:2306.03734 [pdf, other]

A Cross-Linguistic Pressure for Uniform Information Density in Word Order

Authors: Thomas Hikaru Clark, Clara Meister, Tiago Pimentel, Michael Hahn, Ryan Cotterell, Richard Futrell, Roger Levy

Abstract: While natural languages differ widely in both canonical word order and word order flexibility, their word orders still follow shared cross-linguistic statistical patterns, often attributed to functional pressures. In the effort to identify these pressures, prior work has compared real and counterfactual word orders. Yet one functional pressure has been overlooked in such investigations: the uniform information density (UID) hypothesis, which holds that information should be spread evenly throughout an utterance. Here, we ask whether a pressure for UID may have influenced word order patterns cross-linguistically. To this end, we use computational models to test whether real orders lead to greater information uniformity than counterfactual orders. In our empirical study of 10 typologically diverse languages, we find that: (i) among SVO languages, real word orders consistently have greater uniformity than reverse word orders, and (ii) only linguistically implausible counterfactual orders consistently exceed the uniformity of real orders. These findings are compatible with a pressure for information uniformity in the development and usage of natural languages.

Submitted 9 July, 2023; v1 submitted 6 June, 2023; originally announced June 2023.

arXiv:2303.12873 [pdf, other]

From Compact Plasma Particle Sources to Advanced Accelerators with Modeling at Exascale

Authors: Axel Huebl, Remi Lehe, Edoardo Zoni, Olga Shapoval, Ryan T. Sandberg, Marco Garten, Arianna Formenti, Revathi Jambunathan, Prabhat Kumar, Kevin Gott, Andrew Myers, Weiqun Zhang, Ann Almgren, Chad E. Mitchell, Ji Qiang, David Grote, Alexander Sinn, Severin Diederichs, Maxence Thevenet, Luca Fedeli, Thomas Clark, Neil Zaim, Henri Vincenti, Jean-Luc Vay

Abstract: Developing complex, reliable advanced accelerators requires a coordinated, extensible, and comprehensive approach in modeling, from source to the end of beam lifetime. We present highlights in Exascale Computing to scale accelerator modeling software to the requirements set for contemporary science drivers. In particular, we present the first laser-plasma modeling on an exaflop supercomputer using the US DOE Exascale Computing Project WarpX. Leveraging developments for Exascale, the new DOE SCIDAC-5 Consortium for Advanced Modeling of Particle Accelerators (CAMPA) will advance numerical algorithms and accelerate community modeling codes in a cohesive manner: from beam source, over energy boost, transport, injection, storage, to application or interaction. Such start-to-end modeling will enable the exploration of hybrid accelerators, with conventional and advanced elements, as the next step for advanced accelerator modeling. Following open community standards, we seed an open ecosystem of codes that can be readily combined with each other and machine learning frameworks. These will cover ultrafast to ultraprecise modeling for future hybrid accelerator design, even enabling virtual test stands and twins of accelerators that can be used in operations.

Submitted 18 April, 2023; v1 submitted 22 March, 2023; originally announced March 2023.

Comments: 4 pages, 3 figures, presented at the 20th Advanced Accelerator Concepts Workshop (AAC22)

arXiv:2301.01017 [pdf, other]

Through-life Monitoring of Resource-constrained Systems and Fleets

Authors: Felipe Montana, Adam Hartwell, Will Jacobs, Visakan Kadirkamanathan, Andrew R Mills, Tom Clark

Abstract: A Digital Twin (DT) is a simulation of a physical system that provides information to make decisions that add economic, social or commercial value. The behaviour of a physical system changes over time; a DT must therefore be continually updated with data from the physical system to reflect its changing behaviour. For resource-constrained systems, updating a DT is non-trivial because of challenges such as on-board learning and off-board data transfer. This paper presents a framework for updating data-driven DTs of resource-constrained systems geared towards system health monitoring. The proposed solution consists of: (1) an on-board system running a light-weight DT allowing the prioritisation and parsimonious transfer of data generated by the physical system; and (2) off-board robust updating of the DT and detection of anomalous behaviours. Two case studies are considered using a production gas turbine engine system to demonstrate the digital representation accuracy for real-world, time-varying physical systems.

Submitted 3 January, 2023; originally announced January 2023.

arXiv:2203.17213 [pdf, other]

Analyzing Wrap-Up Effects through an Information-Theoretic Lens

Authors: Clara Meister, Tiago Pimentel, Thomas Hikaru Clark, Ryan Cotterell, Roger Levy

Abstract: Numerous analyses of reading time (RT) data have been implemented -- all in an effort to better understand the cognitive processes driving reading comprehension. However, data measured on words at the end of a sentence -- or even at the end of a clause -- is often omitted due to the confounding factors introduced by so-called "wrap-up effects," which manifest as a skewed distribution of RTs for these words. Consequently, the understanding of the cognitive processes that might be involved in these wrap-up effects is limited. In this work, we attempt to learn more about these processes by examining the relationship between wrap-up effects and information-theoretic quantities, such as word and context surprisals. We find that the distribution of information in prior contexts is often predictive of sentence- and clause-final RTs (while not of sentence-medial RTs). This lends support to several prior hypotheses about the processes involved in wrap-up effects.

Submitted 5 January, 2024; v1 submitted 31 March, 2022; originally announced March 2022.

Comments: ACL 2022 (main conference)

arXiv:2112.03765 [pdf, other]

In-flight Novelty Detection with Convolutional Neural Networks

Authors: Adam Hartwell, Felipe Montana, Will Jacobs, Visakan Kadirkamanathan, Andrew R Mills, Tom Clark

Abstract: Gas turbine engines are complex machines that typically generate a vast amount of data, and require careful monitoring to allow for cost-effective preventative maintenance. In aerospace applications, returning all measured data to ground is prohibitively expensive, often causing useful, high-value data to be discarded. The ability to detect, prioritise, and return useful data in real-time is therefore vital. This paper proposes that system output measurements, described by a convolutional neural network model of normality, are prioritised in real-time for the attention of preventative maintenance decision makers. Due to the complexity of gas turbine engine time-varying behaviours, deriving accurate physical models is difficult, and often leads to models with low prediction accuracy and incompatibility with real-time execution. Data-driven modelling is a desirable alternative producing high accuracy, asset-specific models without the need for derivation from first principles. We present a data-driven system for online detection and prioritisation of anomalous data. Biased data assessment deriving from novel operating conditions is avoided by uncertainty management integrated into the deep neural predictive model. Testing is performed on real and synthetic data, showing sensitivity to both real and synthetic faults. The system is capable of running in real-time on low-power embedded hardware and is currently in deployment on the Rolls-Royce Pearl 15 engine flight trials.

Submitted 7 December, 2021; originally announced December 2021.

arXiv:2109.04810 [pdf, other]

Mixture-of-Partitions: Infusing Large Biomedical Knowledge Graphs into BERT

Authors: Zaiqiao Meng, Fangyu Liu, Thomas Hikaru Clark, Ehsan Shareghi, Nigel Collier

Abstract: Infusing factual knowledge into pre-trained models is fundamental for many knowledge-intensive tasks. In this paper, we propose Mixture-of-Partitions (MoP), an infusion approach that can handle a very large knowledge graph (KG) by partitioning it into smaller sub-graphs and infusing their specific knowledge into various BERT models using lightweight adapters. To leverage the overall factual knowledge for a target task, these sub-graph adapters are further fine-tuned along with the underlying BERT through a mixture layer. We evaluate our MoP with three biomedical BERTs (SciBERT, BioBERT, PubmedBERT) on six downstream tasks (inc. NLI, QA, Classification), and the results show that our MoP consistently enhances the underlying BERTs in task performance, and achieves new SOTA performances on five evaluated datasets.

Submitted 10 September, 2021; originally announced September 2021.

Comments: EMNLP 2021 camera-ready version

arXiv:1905.08674 [pdf]

Software Citation Implementation Challenges

Authors: Daniel S. Katz, Daina Bouquin, Neil P. Chue Hong, Jessica Hausman, Catherine Jones, Daniel Chivvis, Tim Clark, Mercè Crosas, Stephan Druskat, Martin Fenner, Tom Gillespie, Alejandra Gonzalez-Beltran, Morane Gruenpeter, Ted Habermann, Robert Haines, Melissa Harrison, Edwin Henneken, Lorraine Hwang, Matthew B. Jones, Alastair A. Kelly, David N. Kennedy, Katrin Leinweber, Fernando Rios, Carly B. Robinson, Ilian Todorov, et al. (2 additional authors not shown)

Abstract: The main output of the FORCE11 Software Citation working group (https://www.force11.org/group/software-citation-working-group) was a paper on software citation principles (https://doi.org/10.7717/peerj-cs.86) published in September 2016. This paper laid out a set of six high-level principles for software citation (importance, credit and attribution, unique identification, persistence, accessibility, and specificity) and discussed how they could be used to implement software citation in the scholarly community. In a series of talks and other activities, we have promoted software citation using these increasingly accepted principles. At the time the initial paper was published, we also provided guidance and examples on how to make software citable, though we now realize there are unresolved problems with that guidance. The purpose of this document is to provide an explanation of current issues impacting scholarly attribution of research software, organize updated implementation guidance, and identify where best practices and solutions are still needed.

Submitted 21 May, 2019; originally announced May 2019.

arXiv:1804.07273 [pdf, ps]

A Basic Model of KBS Software

Authors: Tony Clark

Abstract: The Euclid 6.2 project MOSES addressed quality issues in the development of military KBS. A contribution to this project was to develop a computational model of KBS that could be used to define and analyze aspects of KBS quality. Since a key characteristic of KBS is search, a computational model based on non-determinism was developed and used to express terms relating to quality. This research report describes the approach.

Submitted 17 April, 2018; originally announced April 2018.

arXiv:1804.07272 [pdf, ps]

Metaclasses and Reflection in Smalltalk

Authors: Tony Clark

Abstract: Many Object Oriented Programming Languages provide reflective features which may be used to control the interpretive mechanism of the language. Often these features are defined with respect to a golden braid consisting of objects, classes and meta-classes. This report reviews the Smalltalk golden braid and generalizes it for multiple inheritance, leading to choices between many different inheritance strategies. The reflective features of Smalltalk cannot affect the basic mechanisms of inheritance and so an arbitrary choice must be made for multiple inheritance. A language is described in which the reflective features of Smalltalk are extended so as to allow programmer-defined inheritance strategies.

Submitted 17 April, 2018; originally announced April 2018.

arXiv:1804.07271 [pdf, ps]

EBG: A Lazy Functional Programming Language Implemented on the Java Virtual Machine

Authors: Tony Clark

Abstract: This technical report describes the implementation of a lazy functional programming language on the Java VM.

Submitted 17 April, 2018; originally announced April 2018.

arXiv:1506.03398 [pdf, other]

A General Architecture for Heterogeneous Language Engineering and Projectional Editor Support

Authors: Tony Clark

Abstract: Tool support for language engineering has typically prioritised concrete syntax over abstract syntax by providing meta-languages for expressing concrete syntax and then mapping concrete to abstract structures. Text-based languages are usually specified using a BNF-like language used to generate a syntax-aware editor that includes features such as keyword completion. Similarly, graphical languages are defined using a declarative graphical syntax language, producing an editor that supports features such as shapes, graphs and edges. Projectional editors invert traditional approaches by prioritising abstract over concrete syntax. This paper describes a projectional meta-tool architecture, including general purpose abstract and concrete meta-languages, that uses declarative rules to integrate the syntax and tool support for a range of heterogeneous languages. The architecture has been implemented in Racket and the paper illustrates the architecture with concrete examples.