Search | arXiv e-print repository

A Comprehensive Approach to Ensuring Quality in Spreadsheet-Based Metadata

Authors: Martin J. O'Connor, Marcos Martínez-Romero, Mete Ugur Akdogan, Josef Hardi, Mark A. Musen

Abstract: While scientists increasingly recognize the importance of metadata in describing their data, spreadsheets remain the preferred tool for supplying this information despite their limitations in ensuring compliance and quality. Various tools have been developed to address these limitations, but they suffer from their own shortcomings, such as steep learning curves and limited customization. In this p… ▽ More While scientists increasingly recognize the importance of metadata in describing their data, spreadsheets remain the preferred tool for supplying this information despite their limitations in ensuring compliance and quality. Various tools have been developed to address these limitations, but they suffer from their own shortcomings, such as steep learning curves and limited customization. In this paper, we describe an end-to-end approach that supports spreadsheet-based entry of metadata while providing rigorous compliance and quality control. Our approach employs several key strategies, including customizable templates for defining metadata, integral support for the use of controlled terminologies when defining these templates, and an interactive Web-based tool that allows users to rapidly identify and fix errors in the spreadsheet-based metadata they supply. We demonstrate how this approach is being deployed in a biomedical consortium to define and collect metadata about scientific experiments. △ Less

Submitted 14 December, 2023; originally announced December 2023.

arXiv:2208.02836 [pdf]

Modeling community standards for metadata as templates makes data FAIR

Authors: Mark A. Musen, Martin J. O'Connor, Erik Schultes, Marcos Martinez-Romero, Josef Hardi, John Graybeal

Abstract: It is challenging to determine whether datasets are findable, accessible, interoperable, and reusable (FAIR) because the FAIR Guiding Principles refer to highly idiosyncratic criteria regarding the metadata used to annotate datasets. Specifically, the FAIR principles require metadata to be "rich" and to adhere to "domain-relevant" community standards. Scientific communities should be able to defin… ▽ More It is challenging to determine whether datasets are findable, accessible, interoperable, and reusable (FAIR) because the FAIR Guiding Principles refer to highly idiosyncratic criteria regarding the metadata used to annotate datasets. Specifically, the FAIR principles require metadata to be "rich" and to adhere to "domain-relevant" community standards. Scientific communities should be able to define their own machine-actionable templates for metadata that encode these "rich," discipline-specific elements. We have explored this template-based approach in the context of two software systems. One system is the CEDAR Workbench, which investigators use to author new metadata. The other is the FAIRware Workbench, which evaluates the metadata of archived datasets for their adherence to community standards. Benefits accrue when templates for metadata become central elements in an ecosystem of tools to manage online datasets--both because the templates serve as a community reference for what constitutes FAIR data, and because they embody that perspective in a form that can be distributed among a variety of software applications to assist with data stewardship and data sharing. △ Less

Submitted 14 October, 2022; v1 submitted 4 August, 2022; originally announced August 2022.

Comments: 20 pages, 1 table, 5 figures

arXiv:2107.06396 [pdf, other]

Forecasting Thermoacoustic Instabilities in Liquid Propellant Rocket Engines Using Multimodal Bayesian Deep Learning

Authors: Ushnish Sengupta, Günther Waxenegger-Wilfing, Jan Martin, Justin Hardi, Matthew P. Juniper

Abstract: The 100 MW cryogenic liquid oxygen/hydrogen multi-injector combustor BKD operated by the DLR Institute of Space Propulsion is a research platform that allows the study of thermoacoustic instabilities under realistic conditions, representative of small upper stage rocket engines. We use data from BKD experimental campaigns in which the static chamber pressure and fuel-oxidizer ratio are varied such… ▽ More The 100 MW cryogenic liquid oxygen/hydrogen multi-injector combustor BKD operated by the DLR Institute of Space Propulsion is a research platform that allows the study of thermoacoustic instabilities under realistic conditions, representative of small upper stage rocket engines. We use data from BKD experimental campaigns in which the static chamber pressure and fuel-oxidizer ratio are varied such that the first tangential mode of the combustor is excited under some conditions. We train an autoregressive Bayesian neural network model to forecast the amplitude of the dynamic pressure time series, inputting multiple sensor measurements (injector pressure/ temperature measurements, static chamber pressure, high-frequency dynamic pressure measurements, high-frequency OH* chemiluminescence measurements) and future flow rate control signals. The Bayesian nature of our algorithms allows us to work with a dataset whose size is restricted by the expense of each experimental run, without making overconfident extrapolations. We find that the networks are able to accurately forecast the evolution of the pressure amplitude and anticipate instability events on unseen experimental runs 500 milliseconds in advance. We compare the predictive accuracy of multiple models using different combinations of sensor inputs. We find that the high-frequency dynamic pressure signal is particularly informative. We also use the technique of integrated gradients to interpret the influence of different sensor inputs on the model prediction. The negative log-likelihood of data points in the test dataset indicates that predictive uncertainties are well-characterized by our Bayesian model and simulating a sensor failure event results as expected in a dramatic increase in the epistemic component of the uncertainty. △ Less

Submitted 1 August, 2021; v1 submitted 1 July, 2021; originally announced July 2021.

arXiv:2011.14985 [pdf, other]

doi 10.1063/5.0038817

Early Detection of Thermoacoustic Instabilities in a Cryogenic Rocket Thrust Chamber using Combustion Noise Features and Machine Learning

Authors: Günther Waxenegger-Wilfing, Ushnish Sengupta, Jan Martin, Wolfgang Armbruster, Justin Hardi, Matthew Juniper, Michael Oschwald

Abstract: Combustion instabilities are particularly problematic for rocket thrust chambers because of their high energy release rates and their operation close to the structural limits. In the last decades, progress has been made in predicting high amplitude combustion instabilities but still, no reliable prediction ability is given. Reliable early warning signals are the main requirement for active combust… ▽ More Combustion instabilities are particularly problematic for rocket thrust chambers because of their high energy release rates and their operation close to the structural limits. In the last decades, progress has been made in predicting high amplitude combustion instabilities but still, no reliable prediction ability is given. Reliable early warning signals are the main requirement for active combustion control systems. In this paper, we present a data-driven method for the early detection of thermoacoustic instabilities. Recurrence quantification analysis is used to calculate characteristic combustion features from short-length time series of dynamic pressure sensor data. Features like the recurrence rate are used to train support vector machines to detect the onset of an instability a few hundred milliseconds in advance. The performance of the proposed method is investigated on experimental data from a representative LOX/H$_2$ research thrust chamber. In most cases, the method is able to timely predict two types of thermoacoustic instabilities on test data not used for training. The results are compared with state-of-the-art early warning indicators. △ Less

Submitted 25 November, 2020; originally announced November 2020.

arXiv:1903.09270 [pdf]

doi 10.1093/database/baz059

Using association rule mining and ontologies to generate metadata recommendations from multiple biomedical databases

Authors: Marcos Martínez-Romero, Martin J. O'Connor, Attila L. Egyedi, Debra Willrett, Josef Hardi, John Graybeal, Mark A. Musen

Abstract: Metadata-the machine-readable descriptions of the data-are increasingly seen as crucial for describing the vast array of biomedical datasets that are currently being deposited in public repositories. While most public repositories have firm requirements that metadata must accompany submitted datasets, the quality of those metadata is generally very poor. A key problem is that the typical metadata… ▽ More Metadata-the machine-readable descriptions of the data-are increasingly seen as crucial for describing the vast array of biomedical datasets that are currently being deposited in public repositories. While most public repositories have firm requirements that metadata must accompany submitted datasets, the quality of those metadata is generally very poor. A key problem is that the typical metadata acquisition process is onerous and time consuming, with little interactive guidance or assistance provided to users. Secondary problems include the lack of validation and sparse use of standardized terms or ontologies when authoring metadata. There is a pressing need for improvements to the metadata acquisition process that will help users to enter metadata quickly and accurately. In this paper we outline a recommendation system for metadata that aims to address this challenge. Our approach uses association rule mining to uncover hidden associations among metadata values and to represent them in the form of association rules. These rules are then used to present users with real-time recommendations when authoring metadata. The novelties of our method are that it is able to combine analyses of metadata from multiple repositories when generating recommendations and can enhance those recommendations by aligning them with ontology terms. We implemented our approach as a service integrated into the CEDAR Workbench metadata authoring platform, and evaluated it using metadata from two public biomedical repositories: US-based National Center for Biotechnology Information (NCBI) BioSample and European Bioinformatics Institute (EBI) BioSamples. The results show that our approach is able to use analyses of previous entered metadata coupled with ontology-based map**s to present users with accurate recommendations when authoring metadata. △ Less

Submitted 21 March, 2019; originally announced March 2019.

Showing 1–5 of 5 results for author: Hardi, J