Showing 1–28 of 28 results for author: Dumontier, M

  1. arXiv:2407.00116  [pdf, other]

    cs.LG cs.AI

    Generative AI for Synthetic Data Across Multiple Medical Modalities: A Systematic Review of Recent Developments and Challenges

    Authors: Mahmoud Ibrahim, Yasmina Al Khalil, Sina Amirrajab, Chang Sun, Marcel Breeuwer, Josien Pluim, Bart Elen, Gokhan Ertaylan, Michel Dumontier

    Abstract: This paper presents a comprehensive systematic review of generative models (GANs, VAEs, DMs, and LLMs) used to synthesize various medical data types, including imaging (dermoscopic, mammographic, ultrasound, CT, MRI, and X-ray), text, time-series, and tabular data (EHR). Unlike previous narrowly focused reviews, our study encompasses a broad array of medical data modalities and explores various ge…

    Submitted 2 July, 2024; v1 submitted 27 June, 2024; originally announced July 2024.

  2. arXiv:2406.10144  [pdf, other]

    cs.AI

    Improving rule mining via embedding-based link prediction

    Authors: N'Dah Jean Kouagou, Arif Yilmaz, Michel Dumontier, Axel-Cyrille Ngonga Ngomo

    Abstract: Rule mining on knowledge graphs allows for explainable link prediction. Contrarily, embedding-based methods for link prediction are well known for their generalization capabilities, but their predictions are not interpretable. Several approaches combining the two families have been proposed in recent years. The majority of the resulting hybrid approaches are usually trained within a unified learni…

    Submitted 14 June, 2024; originally announced June 2024.

    Comments: 13 pages, 2 figures, 11 tables

  3. arXiv:2403.18572  [pdf, ps, other]

    cs.SD eess.AS

    ACES: Evaluating Automated Audio Captioning Models on the Semantics of Sounds

    Authors: Gijs Wijngaard, Elia Formisano, Bruno L. Giordano, Michel Dumontier

    Abstract: Automated Audio Captioning is a multimodal task that aims to convert audio content into natural language. The assessment of audio captioning systems is typically based on quantitative metrics applied to text data. Previous studies have employed metrics derived from machine translation and image captioning to evaluate the quality of generated audio captions. Drawing inspiration from auditory cognit…

    Submitted 27 March, 2024; originally announced March 2024.

  4. arXiv:2311.04164  [pdf, other]

    cs.CE

    Models towards Risk Behavior Prediction and Analysis: A Netherlands Case study

    Authors: Onaopepo Adekunle, Arno Riedl, Michel Dumontier

    Abstract: In many countries financial service providers have to elicit their customers' risk preferences when offering products and services. For instance, in the Netherlands pension funds will be legally obliged to factor in their clients' risk preferences when devising their investment strategies. Therefore, assessing and measuring the risk preferences of individuals is critical for the analysis of individ…

    Submitted 7 November, 2023; originally announced November 2023.

  5. arXiv:2303.07429  [pdf, other]

    cs.DL cs.CY

    FAIR Begins at home: Implementing FAIR via the Community Data Driven Insights

    Authors: Carlos Utrilla Guerrero, Maria Vivas Romero, Marc Dolman, Michel Dumontier

    Abstract: Arguments for the FAIR principles have mostly been based on appeals to values. However, the work of onboarding diverse researchers to make efficient and effective implementations of FAIR requires different appeals. In our recent effort to transform the institution into a FAIR University by 2025, here we report on the experiences of the Community of Data Driven Insights (CDDI). We describe these ex…

    Submitted 13 March, 2023; originally announced March 2023.

    Comments: Presented at the First International Conference of FAIR Digital Objects (FDO2022)

  6. arXiv:2302.01041  [pdf, other]

    cs.CY cs.AI

    TAPS Responsibility Matrix: A tool for responsible data science by design

    Authors: Visara Urovi, Remzi Celebi, Chang Sun, Linda Rieswijk, Michael Erard, Arif Yilmaz, Kody Moodley, Parveen Kumar, Michel Dumontier

    Abstract: Data science is an interdisciplinary research area where scientists are typically working with data coming from different fields. When using and analyzing data, the scientists implicitly agree to follow standards, procedures, and rules set in these fields. However, guidance on the responsibilities of the data scientists and the other involved actors in a data science project is typically missing.…

    Submitted 2 February, 2023; originally announced February 2023.

    MSC Class: I.2.1

  7. arXiv:2206.13787  [pdf, other]

    cs.LG cs.AI cs.CR cs.DS

    Improving Correlation Capture in Generating Imbalanced Data using Differentially Private Conditional GANs

    Authors: Chang Sun, Johan van Soest, Michel Dumontier

    Abstract: Despite the remarkable success of Generative Adversarial Networks (GANs) on text, images, and videos, generating high-quality tabular data is still under development owing to some unique challenges such as capturing dependencies in imbalanced data, optimizing the quality of synthetic patient data while preserving privacy. In this paper, we propose DP-CGANS, a differentially private conditional GAN…

    Submitted 28 June, 2022; originally announced June 2022.

    ACM Class: I.2; E.0

  8. Biolink Model: A Universal Schema for Knowledge Graphs in Clinical, Biomedical, and Translational Science

    Authors: Deepak R. Unni, Sierra A. T. Moxon, Michael Bada, Matthew Brush, Richard Bruskiewich, Paul Clemons, Vlado Dancik, Michel Dumontier, Karamarie Fecho, Gustavo Glusman, Jennifer J. Hadlock, Nomi L. Harris, Arpita Joshi, Tim Putman, Guangrong Qin, Stephen A. Ramsey, Kent A. Shefchek, Harold Solbrig, Karthik Soman, Anne T. Thessen, Melissa A. Haendel, Chris Bizon, Christopher J. Mungall, the Biomedical Data Translator Consortium

    Abstract: Within clinical, biomedical, and translational science, an increasing number of projects are adopting graphs for knowledge representation. Graph-based data models elucidate the interconnectedness between core biomedical concepts, enable data structures to be easily updated, and support intuitive queries, visualizations, and inference algorithms. However, knowledge discovery across these "knowledge…

    Submitted 25 March, 2022; originally announced March 2022.

  9. arXiv:2203.06732  [pdf, other]

    q-bio.QM cs.CE q-bio.MN

    BioSimulators: a central registry of simulation engines and services for recommending specific tools

    Authors: Bilal Shaikh, Lucian P. Smith, Dan Vasilescu, Gnaneswara Marupilla, Michael Wilson, Eran Agmon, Henry Agnew, Steven S. Andrews, Azraf Anwar, Moritz E. Beber, Frank T. Bergmann, David Brooks, Lutz Brusch, Laurence Calzone, Kiri Choi, Joshua Cooper, John Detloff, Brian Drawert, Michel Dumontier, G. Bard Ermentrout, James R. Faeder, Andrew P. Freiburger, Fabian Fröhlich, Akira Funahashi, Alan Garny, et al. (46 additional authors not shown)

    Abstract: Computational models have great potential to accelerate bioscience, bioengineering, and medicine. However, it remains challenging to reproduce and reuse simulations, in part, because the numerous formats and methods for simulating various subsystems and scales remain siloed by different software tools. For example, each tool must be executed through a distinct interface. To help investigators find…

    Submitted 13 March, 2022; originally announced March 2022.

    Comments: 6 pages, 2 figures

  10. arXiv:2202.11646  [pdf, other]

    cs.DC

    LUCE: A Blockchain-based data sharing platform for monitoring data license accountability and compliance

    Authors: Visara Urovi, Vikas Jaiman, Arno Angerer, Michel Dumontier

    Abstract: Easy access to data is one of the main avenues to accelerate scientific research. As a key element of scientific innovations, data sharing allows the reproduction of results, helps prevent data fabrication, falsification, and misuse. Although the research benefits from data reuse are widely acknowledged, the data collections existing today are still kept in silos. Indeed, monitoring what happens t…

    Submitted 23 February, 2022; originally announced February 2022.

    Comments: 18 pages, 11 figures. arXiv admin note: text overlap with arXiv:1908.02287

  11. User-friendly Composition of FAIR Workflows in a Notebook Environment

    Authors: Robin A Richardson, Remzi Celebi, Sven van der Burg, Djura Smits, Lars Ridder, Michel Dumontier, Tobias Kuhn

    Abstract: There has been a large focus in recent years on making assets in scientific research findable, accessible, interoperable and reusable, collectively known as the FAIR principles. A particular area of focus lies in applying these principles to scientific computational workflows. Jupyter notebooks are a very popular medium by which to program and communicate computational scientific analyses. However…

    Submitted 1 November, 2021; originally announced November 2021.

    Journal ref: Proceedings of the 11th Knowledge Capture Conference (K-CAP '21), December 2-3, 2021, Virtual Event, USA

  12. arXiv:2012.11936  [pdf, other]

    cs.AI

    Knowledge Graphs Evolution and Preservation -- A Technical Report from ISWS 2019

    Authors: Nacira Abbas, Kholoud Alghamdi, Mortaza Alinam, Francesca Alloatti, Glenda Amaral, Claudia d'Amato, Luigi Asprino, Martin Beno, Felix Bensmann, Russa Biswas, Ling Cai, Riley Capshaw, Valentina Anita Carriero, Irene Celino, Amine Dadoun, Stefano De Giorgis, Harm Delva, John Domingue, Michel Dumontier, Vincent Emonet, Marieke van Erp, Paola Espinoza Arias, Omaima Fallatah, Sebastián Ferrada, Marc Gallofré Ocaña, et al. (49 additional authors not shown)

    Abstract: One of the grand challenges discussed during the Dagstuhl Seminar "Knowledge Graphs: New Directions for Knowledge Representation on the Semantic Web" and described in its report is that of a: "Public FAIR Knowledge Graph of Everything: We increasingly see the creation of knowledge graphs that capture information about the entirety of a class of entities. [...] This grand challenge extends this fur…

    Submitted 22 December, 2020; originally announced December 2020.

  13. arXiv:1911.09531  [pdf, other]

    cs.LG cs.AI

    Towards FAIR protocols and workflows: The OpenPREDICT case study

    Authors: Remzi Celebi, Joao Rebelo Moreira, Ahmed A. Hassan, Sandeep Ayyar, Lars Ridder, Tobias Kuhn, Michel Dumontier

    Abstract: It is essential for the advancement of science that scientists and researchers share, reuse and reproduce workflows and protocols used by others. The FAIR principles are a set of guidelines that aim to maximize the value and usefulness of research data, and emphasize a number of important points regarding the means by which digital objects are found and reused by others. The question of how to app…

    Submitted 20 November, 2019; originally announced November 2019.

    Comments: Preprint. Submitted to PeerJ on 13th November 2019. 3 appendixes as PDF files

  14. arXiv:1911.03183  [pdf, other]

    cs.LG cs.CR cs.DC stat.ML

    Privacy-Preserving Generalized Linear Models using Distributed Block Coordinate Descent

    Authors: Erik-Jan van Kesteren, Chang Sun, Daniel L. Oberski, Michel Dumontier, Lianne Ippel

    Abstract: Combining data from varied sources has considerable potential for knowledge discovery: collaborating data parties can mine data in an expanded feature space, allowing them to explore a larger range of scientific questions. However, data sharing among different parties is highly restricted by legal conditions, ethical concerns, and / or data volume. Fueled by these concerns, the fields of cryptogra…

    Submitted 8 November, 2019; originally announced November 2019.

    Comments: Fully reproducible code for all results and images can be found at https://github.com/vankesteren/privacy-preserving-glm, and the software package can be found at https://github.com/vankesteren/privreg

  15. arXiv:1908.02287  [pdf, other]

    cs.DC cs.DB

    LUCE: A Blockchain Solution for monitoring data License accoUntability and CompliancE

    Authors: Andine Havelange, Michel Dumontier, Birgit Wouters, Jona Linde, David Townend, Arno Riedl, Visara Urovi

    Abstract: In this paper we present our preliminary work on monitoring data License accoUntability and CompliancE (LUCE). LUCE is a blockchain platform solution designed to stimulate data sharing and reuse, by facilitating compliance with licensing terms. The platform enables data accountability by recording the use of data and their purpose on a blockchain-supported platform. LUCE allows for individual data…

    Submitted 6 August, 2019; originally announced August 2019.

    Comments: 14 pages, 10 figures

  16. arXiv:1812.00991  [pdf]

    cs.CY

    Analyzing Partitioned FAIR Health Data Responsibly

    Authors: Chang Sun, Lianne Ippel, Birgit Wouters, Johan van Soest, Alexander Malic, Onaopepo Adekunle, Bob van den Berg, Marco Puts, Ole Mussmann, Annemarie Koster, Carla van der Kallen, David Townend, Andre Dekker, Michel Dumontier

    Abstract: It is widely anticipated that the use of health-related big data will enable further understanding and improvements in human health and wellbeing. Our current project, funded through the Dutch National Research Agenda, aims to explore the relationship between the development of diabetes and socio-economic factors such as lifestyle and health care utilization. The analysis involves combining data f…

    Submitted 2 December, 2018; originally announced December 2018.

    Comments: 6 pages, 1 figure, preliminary result, project report

    ACM Class: E.1; E.3; H.2.4; H.2.8

  17. arXiv:1809.06532  [pdf, other]

    cs.DL

    Nanopublications: A Growing Resource of Provenance-Centric Scientific Linked Data

    Authors: Tobias Kuhn, Albert Meroño-Peñuela, Alexander Malic, Jorrit H. Poelen, Allen H. Hurlbert, Emilio Centeno Ortiz, Laura I. Furlong, Núria Queralt-Rosinach, Christine Chichester, Juan M. Banda, Egon Willighagen, Friederike Ehrhart, Chris Evelo, Tareq B. Malas, Michel Dumontier

    Abstract: Nanopublications are a Linked Data format for scholarly data publishing that has received considerable uptake in the last few years. In contrast to the common Linked Data publishing practice, nanopublications work at the granular level of atomic information snippets and provide a consistent container format to attach provenance and metadata at this atomic level. While the nanopublications format i…

    Submitted 18 September, 2018; originally announced September 2018.

    Journal ref: In Proceedings of IEEE eScience 2018

  18. arXiv:1611.05204  [pdf]

    cs.SI cs.DL physics.soc-ph q-bio.OT

    The emergence and evolution of the research fronts in HIV/AIDS research

    Authors: David Fajardo-Ortiz, Malaquias Lopez-Cervantes, Luis Duran, Michel Dumontier, Miguel Lara, Hector Ochoa, Victor M Castano

    Abstract: In this paper, we have identified and analyzed the emergence, structure and dynamics of the paradigmatic research fronts that established the fundamentals of the biomedical knowledge on HIV/AIDS. A search of papers with the identifiers "HIV/AIDS", "Human Immunodeficiency Virus", "HIV-1" and "Acquired Immunodeficiency Syndrome" in the Web of Science (Thomson Reuters), was carried out. A citation ne…

    Submitted 31 May, 2017; v1 submitted 16 November, 2016; originally announced November 2016.

    Journal ref: PLoS ONE, 2017 12(5): e0178293

  19. arXiv:1609.07108  [pdf, other]

    cs.SE

    A Web API ecosystem through feature-based reuse

    Authors: Ruben Verborgh, Michel Dumontier

    Abstract: The fast-growing Web API landscape brings clients more options than ever before---in theory. In practice, they cannot easily switch between different providers offering similar functionality. We discuss a vision for developing Web APIs based on reuse of interface parts called features. Through the introduction of 5 design principles, we investigate the impact of feature-based reuse on Web APIs. Ap…

    Submitted 12 March, 2018; v1 submitted 22 September, 2016; originally announced September 2016.

  20. arXiv:1509.04513  [pdf, ps, other]

    cs.AI cs.DB

    On Reasoning with RDF Statements about Statements using Singleton Property Triples

    Authors: Vinh Nguyen, Olivier Bodenreider, Krishnaprasad Thirunarayan, Gang Fu, Evan Bolton, Núria Queralt Rosinach, Laura I. Furlong, Michel Dumontier, Amit Sheth

    Abstract: The Singleton Property (SP) approach has been proposed for representing and querying metadata about RDF triples such as provenance, time, location, and evidence. In this approach, one singleton property is created to uniquely represent a relationship in a particular context, and in general, generates a large property hierarchy in the schema. It has become the subject of important questions from Se…

    Submitted 15 September, 2015; originally announced September 2015.

  21. arXiv:1509.02822  [pdf, other]

    cs.DB cs.PF

    Exposing Provenance Metadata Using Different RDF Models

    Authors: Gang Fu, Evan Bolton, Núria Queralt Rosinach, Laura I. Furlong, Vinh Nguyen, Amit Sheth, Olivier Bodenreider, Michel Dumontier

    Abstract: A standard model for exposing structured provenance metadata of scientific assertions on the Semantic Web would increase interoperability, discoverability, reliability, as well as reproducibility for scientific discourse and evidence-based knowledge discovery. Several Resource Description Framework (RDF) models have been proposed to track provenance. However, provenance metadata may not only be ve…

    Submitted 9 September, 2015; originally announced September 2015.

  22. arXiv:1507.05408  [pdf, other]

    cs.CY

    Provenance-Centered Dataset of Drug-Drug Interactions

    Authors: Juan M. Banda, Tobias Kuhn, Nigam H. Shah, Michel Dumontier

    Abstract: Over the years several studies have demonstrated the ability to identify potential drug-drug interactions via data mining from the literature (MEDLINE), electronic health records, public databases (Drugbank), etc. While each one of these approaches is properly statistically validated, they do not take into consideration the overlap between them as one of their decision making variables. In this pa…

    Submitted 20 July, 2015; originally announced July 2015.

    Comments: In Proceedings of the 14th International Semantic Web Conference (ISWC) 2015

  23. Making Digital Artifacts on the Web Verifiable and Reliable

    Authors: Tobias Kuhn, Michel Dumontier

    Abstract: The current Web has no general mechanisms to make digital artifacts --- such as datasets, code, texts, and images --- verifiable and permanent. For digital artifacts that are supposed to be immutable, there is moreover no commonly accepted method to enforce this immutability. These shortcomings have a serious negative impact on the ability to reproduce the results of processes that rely on Web res…

    Submitted 7 July, 2015; originally announced July 2015.

    Comments: Extended version of conference paper: arXiv:1401.5775

    ACM Class: H.3.4; H.3.5

  24. arXiv:1411.2749  [pdf, other]

    cs.DL

    Publishing without Publishers: a Decentralized Approach to Dissemination, Retrieval, and Archiving of Data

    Authors: Tobias Kuhn, Christine Chichester, Michael Krauthammer, Michel Dumontier

    Abstract: Making available and archiving scientific results is for the most part still considered the task of classical publishing companies, despite the fact that classical forms of publishing centered around printed narrative articles no longer seem well-suited in the digital age. In particular, there exist currently no efficient, reliable, and agreed-upon methods for publishing scientific datasets, which…

    Submitted 22 July, 2015; v1 submitted 11 November, 2014; originally announced November 2014.

    Comments: In Proceedings of the 14th International Semantic Web Conference (ISWC) 2015

  25. Trusty URIs: Verifiable, Immutable, and Permanent Digital Artifacts for Linked Data

    Authors: Tobias Kuhn, Michel Dumontier

    Abstract: To make digital resources on the web verifiable, immutable, and permanent, we propose a technique to include cryptographic hash values in URIs. We call them trusty URIs and we show how they can be used for approaches like nanopublications to make not only specific resources but their entire reference trees verifiable. Digital artifacts can be identified not only on the byte level but on more abstr…

    Submitted 28 May, 2014; v1 submitted 16 January, 2014; originally announced January 2014.

    Comments: Small error corrected in the text (table data was correct) on page 13: "All average values are below 0.8s (0.03s for batch mode). Using Java in batch mode even requires only 1ms per file."

    ACM Class: H.3.4; H.3.5

    Journal ref: Proceedings of The Semantic Web: Trends and Challenges, 11th International Conference, ESWC 2014, Springer

  26. arXiv:1305.6800  [pdf]

    cs.DL

    Ovopub: Modular data publication with minimal provenance

    Authors: Alison Callahan, Michel Dumontier

    Abstract: With the growth of the Semantic Web as a medium for creating, consuming, mashing up and republishing data, our ability to trace any statement(s) back to their origin is becoming ever more important. Several approaches have now been proposed to associate statements with provenance, with multiple applications in data publication, attribution and argumentation. Here, we describe the ovopub, a modul…

    Submitted 29 May, 2013; originally announced May 2013.

    Comments: 11 pages, 5 figures

  27. arXiv:1202.3602  [pdf, ps, other]

    cs.AI q-bio.QM

    Towards quantitative measures in applied ontology

    Authors: Robert Hoehndorf, Michel Dumontier, Georgios V. Gkoutos

    Abstract: Applied ontology is a relatively new field which aims to apply theories and methods from diverse disciplines such as philosophy, cognitive science, linguistics and formal logics to perform or improve domain-specific tasks. To support the development of effective research methodologies for applied ontology, we critically discuss the question how its research results should be evaluated. We propose…

    Submitted 16 February, 2012; originally announced February 2012.

    Comments: Initial manuscript, submitted to FOIS 2012

  28. Towards an interoperable information infrastructure providing decision support for genomic medicine

    Authors: Matthias Samwald, Holger Stenzhorn, Michel Dumontier, M. Scott Marshall, Joanne Luciano, Klaus-Peter Adlassnig

    Abstract: Genetic dispositions play a major role in individual disease risk and treatment response. Genomic medicine, in which medical decisions are refined by genetic information of particular patients, is becoming increasingly important. Here we describe our work and future visions around the creation of a distributed infrastructure for pharmacogenetic data and medical decision support, based on industry…

    Submitted 28 September, 2011; originally announced September 2011.

    Journal ref: User Centred Networked Health Care - Proceedings of MIE 2011