Showing 1–19 of 19 results for author: Musen, M A

  1. arXiv:2404.05893

    cs.AI cs.CL cs.IR

    Use of a Structured Knowledge Base Enhances Metadata Curation by Large Language Models

    Authors: Sowmya S. Sundaram, Benjamin Solomon, Avani Khatri, Anisha Laumas, Purvesh Khatri, Mark A. Musen

    Abstract: Metadata play a crucial role in ensuring the findability, accessibility, interoperability, and reusability of datasets. This paper investigates the potential of large language models (LLMs), specifically GPT-4, to improve adherence to metadata standards. We conducted experiments on 200 random data records describing human samples relating to lung cancer from the NCBI BioSample repository, evaluati…

    Submitted 17 April, 2024; v1 submitted 8 April, 2024; originally announced April 2024.

  2. arXiv:2312.09107

    cs.DL

    A Comprehensive Approach to Ensuring Quality in Spreadsheet-Based Metadata

    Authors: Martin J. O'Connor, Marcos Martínez-Romero, Mete Ugur Akdogan, Josef Hardi, Mark A. Musen

    Abstract: While scientists increasingly recognize the importance of metadata in describing their data, spreadsheets remain the preferred tool for supplying this information despite their limitations in ensuring compliance and quality. Various tools have been developed to address these limitations, but they suffer from their own shortcomings, such as steep learning curves and limited customization. In this p…

    Submitted 14 December, 2023; originally announced December 2023.

  3. Making Metadata More FAIR Using Large Language Models

    Authors: Sowmya S. Sundaram, Mark A. Musen

    Abstract: With the global increase in experimental data artifacts, harnessing them in a unified fashion leads to a major stumbling block: bad metadata. To bridge this gap, this work presents a Natural Language Processing (NLP) informed application, called FAIRMetaText, that compares metadata. Specifically, FAIRMetaText analyzes the natural language descriptions of metadata and provides a mathematical simil…

    Submitted 24 July, 2023; originally announced July 2023.

    Journal ref: DaMaLOS 2023

  4. arXiv:2208.02836

    cs.DL

    Modeling community standards for metadata as templates makes data FAIR

    Authors: Mark A. Musen, Martin J. O'Connor, Erik Schultes, Marcos Martinez-Romero, Josef Hardi, John Graybeal

    Abstract: It is challenging to determine whether datasets are findable, accessible, interoperable, and reusable (FAIR) because the FAIR Guiding Principles refer to highly idiosyncratic criteria regarding the metadata used to annotate datasets. Specifically, the FAIR principles require metadata to be "rich" and to adhere to "domain-relevant" community standards. Scientific communities should be able to defin…

    Submitted 14 October, 2022; v1 submitted 4 August, 2022; originally announced August 2022.

    Comments: 20 pages, 1 table, 5 figures

  5. arXiv:2105.07238

    cs.IT

    Using Ethnographic Methods to Classify the Human Experience in Medicine: A Case Study of the Presence Ontology

    Authors: Amrapali Maitra, Maulik R. Kamdar, Donna M. Zulman, Marie C. Haverfield, Cati Brown-Johnson, Rachel Schwartz, Sonoo Thadaney Israni, Abraham Verghese, Mark A. Musen

    Abstract: Objective: Although social and environmental factors are central to provider-patient interactions, the data that reflect these factors can be incomplete, vague, and subjective. We sought to create a conceptual framework to describe and classify data about presence, the domain of interpersonal connection in medicine. Methods: Our top-down approach for ontology development based on the concept of re…

    Submitted 15 May, 2021; originally announced May 2021.

    Comments: 15 pages, 4 figures, 57 references

  6. arXiv:2007.14474

    q-bio.QM cs.CL cs.DL

    Construction and Usage of a Human Body Common Coordinate Framework Comprising Clinical, Semantic, and Spatial Ontologies

    Authors: Katy Börner, Ellen M. Quardokus, Bruce W. Herr II, Leonard E. Cross, Elizabeth G. Record, Yingnan Ju, Andreas D. Bueckle, James P. Sluka, Jonathan C. Silverstein, Kristen M. Browne, Sanjay Jain, Clive H. Wasserfall, Marda L. Jorgensen, Jeffrey M. Spraggins, Nathan H. Patterson, Mark A. Musen, Griffin M. Weber

    Abstract: The National Institutes of Health's (NIH) Human Biomolecular Atlas Program (HuBMAP) aims to create a comprehensive high-resolution atlas of all the cells in the healthy human body. Multiple laboratories across the United States are collecting tissue specimens from different organs of donors who vary in sex, age, and body size. Integrating and harmonizing the data derived from these samples and 'ma…

    Submitted 28 July, 2020; originally announced July 2020.

    Comments: 24 pages with SI, 6 figures, 5 tables

  7. arXiv:2006.04161

    cs.AI

    An Empirical Meta-analysis of the Life Sciences (Linked?) Open Data on the Web

    Authors: Maulik R. Kamdar, Mark A. Musen

    Abstract: While the biomedical community has published several "open data" sources in the last decade, most researchers still endure severe logistical and technical challenges to discover, query, and integrate heterogeneous data and knowledge from multiple sources. To tackle these challenges, the community has experimented with Semantic Web and linked data technologies to create the Life Sciences Linked Ope…

    Submitted 7 June, 2020; originally announced June 2020.

    Comments: Under Review at Nature Scientific Data

  8. Use of OWL and Semantic Web Technologies at Pinterest

    Authors: Rafael S. Gonçalves, Matthew Horridge, Rui Li, Yu Liu, Mark A. Musen, Csongor I. Nyulas, Evelyn Obamos, Dhananjay Shrouty, David Temple

    Abstract: Pinterest is a popular Web application that has over 250 million active users. It is a visual discovery engine for finding ideas for recipes, fashion, weddings, home decoration, and much more. In the last year, the company adopted Semantic Web technologies to create a knowledge graph that aims to represent the vast amount of content and users on Pinterest, to help both content recommendation and a…

    Submitted 3 July, 2019; originally announced July 2019.

  9. The CEDAR Workbench: An Ontology-Assisted Environment for Authoring Metadata that Describe Scientific Experiments

    Authors: Rafael S. Gonçalves, Martin J. O'Connor, Marcos Martínez-Romero, Attila L. Egyedi, Debra Willrett, John Graybeal, Mark A. Musen

    Abstract: The Center for Expanded Data Annotation and Retrieval (CEDAR) aims to revolutionize the way that metadata describing scientific experiments are authored. The software we have developed, the CEDAR Workbench, is a suite of Web-based tools and REST APIs that allows users to construct metadata templates, to fill in templates to generate high-quality metadata, and to share and manage these resources. T…

    Submitted 15 May, 2019; originally announced May 2019.

  10. Using association rule mining and ontologies to generate metadata recommendations from multiple biomedical databases

    Authors: Marcos Martínez-Romero, Martin J. O'Connor, Attila L. Egyedi, Debra Willrett, Josef Hardi, John Graybeal, Mark A. Musen

    Abstract: Metadata, the machine-readable descriptions of the data, are increasingly seen as crucial for describing the vast array of biomedical datasets that are currently being deposited in public repositories. While most public repositories have firm requirements that metadata must accompany submitted datasets, the quality of those metadata is generally very poor. A key problem is that the typical metadata…

    Submitted 21 March, 2019; originally announced March 2019.

  11. Aligning Biomedical Metadata with Ontologies Using Clustering and Embeddings

    Authors: Rafael S. Gonçalves, Maulik R. Kamdar, Mark A. Musen

    Abstract: The metadata about scientific experiments published in online repositories have been shown to suffer from a high degree of representational heterogeneity: there are often many ways to represent the same type of information, such as a geographical location via its latitude and longitude. To harness the potential that metadata have for discovering scientific data, it is crucial that they be represe…

    Submitted 16 May, 2019; v1 submitted 19 March, 2019; originally announced March 2019.

  12. HopRank: How Semantic Structure Influences Teleportation in PageRank (A Case Study on BioPortal)

    Authors: Lisette Espín-Noboa, Florian Lemmerich, Simon Walk, Markus Strohmaier, Mark A. Musen

    Abstract: This paper introduces HopRank, an algorithm for modeling human navigation on semantic networks. HopRank leverages the assumption that users know or can see the whole structure of the network. Therefore, besides following links, they also follow nodes at certain distances (i.e., k-hop neighborhoods), and not at random as suggested by PageRank, which assumes only links are known or visible. We obser…

    Submitted 15 March, 2019; v1 submitted 13 March, 2019; originally announced March 2019.

    Comments: Published at TheWebConf 2019 (WWW'19)

  13. arXiv:1902.11162

    cs.DL

    The FAIR Funder pilot programme to make it easy for funders to require and for grantees to produce FAIR Data

    Authors: P. Wittenburg, H. Pergl Sustkova, A. Montesanti, S. M. Bloemers, S. H. de Waard, M. A. Musen, J. B. Graybeal, K. M. Hettne, A. Jacobsen, R. Pergl, R. W. W. Hooft, C. Staiger, C. W. G. van Gelder, S. L. Knijnenburg, A. C. van Arkel, B. Meerman, M. D. Wilkinson, S-A Sansone, P. Rocca-Serra, P. McQuilton, A. N. Gonzalez-Beltran, G. J. C. Aben, P. Henning, S. Alencar, C. Ribeiro, et al. (35 additional authors not shown)

    Abstract: There is a growing acknowledgement in the scientific community of the importance of making experimental data machine findable, accessible, interoperable, and reusable (FAIR). Recognizing that high quality metadata are essential to make datasets FAIR, members of the GO FAIR Initiative and the Research Data Alliance (RDA) have initiated a series of workshops to encourage the creation of Metadata for…

    Submitted 6 March, 2019; v1 submitted 26 February, 2019; originally announced February 2019.

    Comments: This is a pre-print of the FAIR Funders pilot, an outcome of the first Metadata for Machines workshop, see: https://www.go-fair.org/resources/go-fair-workshop-series/metadata-for-machines-workshops/. Corresponding author: E. A Schultes, ORCID 0000-0001-8888-635X

  14. WebProtégé: A Cloud-Based Ontology Editor

    Authors: Matthew Horridge, Rafael S. Gonçalves, Csongor I. Nyulas, Tania Tudorache, Mark A. Musen

    Abstract: We present WebProtégé, a tool to develop ontologies represented in the Web Ontology Language (OWL). WebProtégé is a cloud-based application that allows users to collaboratively edit OWL ontologies, and it is available for use at https://webprotege.stanford.edu. WebProtégé currently hosts more than 68,000 OWL ontology projects and has over 50,000 user accounts. In this paper, we detail the main ne…

    Submitted 5 March, 2019; v1 submitted 21 February, 2019; originally announced February 2019.

  15. The variable quality of metadata about biological samples used in biomedical experiments

    Authors: Rafael S. Gonçalves, Mark A. Musen

    Abstract: We present an analytical study of the quality of metadata about samples used in biomedical experiments. The metadata under analysis are stored in two well-known databases: BioSample, a repository managed by the National Center for Biotechnology Information (NCBI), and BioSamples, a repository managed by the European Bioinformatics Institute (EBI). We tested whether 11.4M sample metadata records…

    Submitted 18 January, 2019; v1 submitted 17 August, 2018; originally announced August 2018.

    Comments: arXiv admin note: text overlap with arXiv:1708.01286

  16. arXiv:1708.01286

    cs.DB

    Metadata in the BioSample Online Repository are Impaired by Numerous Anomalies

    Authors: Rafael S. Gonçalves, Martin J. O'Connor, Marcos Martínez-Romero, John Graybeal, Mark A. Musen

    Abstract: The metadata about scientific experiments are crucial for finding, reproducing, and reusing the data that the metadata describe. We present a study of the quality of the metadata stored in BioSample, a repository of metadata about samples used in biomedical experiments managed by the U.S. National Center for Biotechnology Information (NCBI). We tested whether 6.6 million BioSample metadata…

    Submitted 3 August, 2017; originally announced August 2017.

  17. NCBO Ontology Recommender 2.0: An Enhanced Approach for Biomedical Ontology Recommendation

    Authors: Marcos Martinez-Romero, Clement Jonquet, Martin J. O'Connor, John Graybeal, Alejandro Pazos, Mark A. Musen

    Abstract: Biomedical researchers use ontologies to annotate their data with ontology terms, enabling better data integration and interoperability. However, the number, variety and complexity of current biomedical ontologies make it cumbersome for researchers to determine which ones to reuse for their specific needs. To overcome this problem, in 2010 the National Center for Biomedical Ontology (NCBO) release…

    Submitted 25 May, 2017; v1 submitted 17 November, 2016; originally announced November 2016.

    Comments: 29 pages, 8 figures, 11 tables

    ACM Class: I.2.4

    Journal ref: Journal of Biomedical Semantics 8 (2017) 1-22

  18. arXiv:1407.2002

    cs.SI cs.AI cs.DL physics.data-an

    Discovering Beaten Paths in Collaborative Ontology-Engineering Projects using Markov Chains

    Authors: Simon Walk, Philipp Singer, Markus Strohmaier, Tania Tudorache, Mark A. Musen, Natalya F. Noy

    Abstract: Biomedical taxonomies, thesauri and ontologies, in the form of the International Classification of Diseases (ICD) as a taxonomy or the National Cancer Institute Thesaurus as an OWL-based ontology, play a critical role in acquiring, representing and processing information about human health. With increasing adoption and relevance, biomedical ontologies have also significantly increased in size. For…

    Submitted 29 February, 2016; v1 submitted 8 July, 2014; originally announced July 2014.

    Comments: Published in the Journal of Biomedical Informatics

  19. arXiv:1303.1482

    cs.AI

    Graph-Grammar Assistance for Automated Generation of Influence Diagrams

    Authors: John W. Egar, Mark A. Musen

    Abstract: One of the most difficult aspects of modeling complex dilemmas in decision-analytic terms is composing a diagram of relevance relations from a set of domain concepts. Decision models in domains such as medicine, however, exhibit certain prototypical patterns that can guide the modeling process. Medical concepts can be classified according to semantic types that have characteristic positions and t…

    Submitted 6 March, 2013; originally announced March 2013.

    Comments: Appears in Proceedings of the Ninth Conference on Uncertainty in Artificial Intelligence (UAI1993)

    Report number: UAI-P-1993-PG-235-242