-
KnowledgeHub: An end-to-end Tool for Assisted Scientific Discovery
Authors:
Shinnosuke Tanaka,
James Barry,
Vishnudev Kuruvanthodi,
Movina Moses,
Maxwell J. Giammona,
Nathan Herr,
Mohab Elkaref,
Geeth De Mel
Abstract:
This paper describes the KnowledgeHub tool, a scientific literature Information Extraction (IE) and Question Answering (QA) pipeline. This is achieved by supporting the ingestion of PDF documents that are converted to text and structured representations. An ontology can then be constructed where a user defines the types of entities and relationships they want to capture. A browser-based annotation tool enables annotating the contents of the PDF documents according to the ontology. Named Entity Recognition (NER) and Relation Classification (RC) models can be trained on the resulting annotations and can be used to annotate the unannotated portion of the documents. A knowledge graph is constructed from these entity and relation triples which can be queried to obtain insights from the data. Furthermore, we integrate a suite of Large Language Models (LLMs) that can be used for QA and summarisation that is grounded in the included documents via a retrieval component. KnowledgeHub is a unique tool that supports annotation, IE and QA, which gives the user full insight into the knowledge discovery pipeline.
Submitted 17 June, 2024; v1 submitted 16 May, 2024;
originally announced June 2024.
-
Encoding Seasonal Climate Predictions for Demand Forecasting with Modular Neural Network
Authors:
Smit Marvaniya,
Jitendra Singh,
Nicolas Galichet,
Fred Ochieng Otieno,
Geeth De Mel,
Kommy Weldemariam
Abstract:
Current time-series forecasting problems use short-term weather attributes as exogenous inputs. However, in specific time-series forecasting solutions (e.g., demand prediction in the supply chain), seasonal climate predictions are crucial to improving their resilience. Representing mid- to long-term seasonal climate forecasts is challenging because seasonal climate predictions are uncertain, and encoding the spatio-temporal relationship of climate forecasts with demand is complex.
We propose a novel modeling framework that efficiently encodes seasonal climate predictions to provide robust and reliable time-series forecasting for supply chain functions. The encoding framework enables effective learning of latent representations -- be it uncertain seasonal climate prediction or other time-series data (e.g., buyer patterns) -- via a modular neural network architecture. Our extensive experiments indicate that learning such representations to model seasonal climate forecasts results in an error reduction of approximately 13% to 17% across multiple real-world data sets compared to existing demand forecasting methods.
Submitted 5 September, 2023;
originally announced September 2023.
-
A framework for fostering transparency in shared artificial intelligence models by increasing visibility of contributions
Authors:
Iain Barclay,
Harrison Taylor,
Alun Preece,
Ian Taylor,
Dinesh Verma,
Geeth de Mel
Abstract:
Increased adoption of artificial intelligence (AI) systems into scientific workflows will result in an increasing technical debt as the distance between the data scientists and engineers who develop AI system components and scientists, researchers and other users grows. This could quickly become problematic, particularly where guidance or regulations change and once-acceptable best practice becomes outdated, or where data sources are later discredited as biased or inaccurate. This paper presents a novel method for deriving a quantifiable metric capable of ranking the overall transparency of the process pipelines used to generate AI systems, such that users, auditors and other stakeholders can gain confidence that they will be able to validate and trust the data sources and contributors in the AI systems that they rely on. The methodology for calculating the metric, and the type of criteria that could be used to make judgements on the visibility of contributions to systems, are evaluated through models published at ModelHub and PyTorch Hub, popular archives for sharing science resources, and are found to be helpful in driving consideration of the contributions made to generating AI systems and approaches towards effective documentation and improving transparency in machine learning assets shared within scientific communities.
Submitted 5 March, 2021;
originally announced March 2021.
-
Federated Learning for Coalition Operations
Authors:
D. Verma,
S. Calo,
S. Witherspoon,
E. Bertino,
A. Abu Jabal,
A. Swami,
G. Cirincione,
S. Julier,
G. White,
G. de Mel,
G. Pearson
Abstract:
Machine Learning in coalition settings requires combining insights available from data assets and knowledge repositories distributed across multiple coalition partners. In tactical environments, this requires sharing the assets, knowledge and models in a bandwidth-constrained environment, while staying in conformance with the privacy, security and other applicable policies for each coalition member. Federated Machine Learning provides an approach for such sharing. In its simplest version, federated machine learning could exchange training data available among the different coalition members, with each partner deciding which part of the training data from other partners to accept based on the quality and value of the offered data. In a more sophisticated version, coalition partners may exchange models learnt locally, which need to be transformed, accepted in entirety or in part based on the quality and value offered by each model, and fused together into an integrated model. In this paper, we examine the challenges present in creating federated learning solutions in coalition settings, and present the different flavors of federated learning that we have created as part of our research in the DAIS ITA. The challenges addressed include dealing with varying quality of data and models, determining the value offered by the data/model of each coalition partner, addressing the heterogeneity in data representation, labeling and AI model architecture selected by different coalition members, and handling the varying levels of trust present among members of the coalition. We also identify some open problems that remain to be addressed to create a viable solution for federated learning in coalition environments.
Submitted 14 October, 2019;
originally announced October 2019.
-
Synthetic Ground Truth Generation for Evaluating Generative Policy Models
Authors:
Daniel Cunnington,
Graham White,
Geeth de Mel
Abstract:
Generative Policy-based Models (GPMs) aim to enable a coalition of systems, be they devices or services, to adapt according to contextual changes such as environmental factors, user preferences and different tasks whilst adhering to various constraints and regulations as directed by a managing party or the collective vision of the coalition. Recent developments have proposed new architectures to realize the potential of GPMs but as the complexity of systems and their associated requirements increases, there is an emerging requirement to have scenarios and associated datasets to realistically evaluate GPMs with respect to the properties of the operating environment, be it the future battlespace or an autonomous organization. In order to address this requirement, in this paper, we present a method of applying an agile knowledge representation framework to model requirements, both individualistic and collective, that enables synthetic generation of ground truth data such that advanced GPMs can be evaluated robustly in complex environments. We also release conceptual models, annotated datasets, as well as means to extend the data generation approach so that similar datasets can be developed for varying complexities and different situations.
Submitted 26 April, 2019;
originally announced April 2019.
-
Feature-based reformulation of entities in triple pattern queries
Authors:
Amar Viswanathan,
Geeth de Mel,
James A. Hendler
Abstract:
Knowledge graphs encode uniquely identifiable entities to other entities or literal values by means of relationships, thus enabling semantically rich querying over the stored data. Typically, the semantics of such queries are crisp, thereby resulting in crisp answers. Query log statistics show that a majority of the queries issued to knowledge graphs are entity centric queries. When a user needs additional answers, the state-of-the-art in assisting users is to rewrite the original query, resulting in a set of approximations. Several strategies have been proposed in the past to address this. They typically move up the taxonomy to relax a specific element to a more generic element. Entities don't have a taxonomy and they end up being generalized. To address this issue, in this paper, we propose an entity centric reformulation strategy that utilizes schema information and entity features present in the graph to suggest rewrites. Once the features are identified, the entity in concern is reformulated as a set of features. Since entities can have a large number of features, we introduce strategies that select the top-k most relevant and informative ranked features and augment them to the original query to create a valid reformulation. We then evaluate our approach by showing that our reformulation strategy produces results that are more informative when compared with state-of-the-art.
Submitted 4 July, 2018;
originally announced July 2018.