-
From Large Language Models to Knowledge Graphs for Biomarker Discovery in Cancer
Authors:
Md. Rezaul Karim,
Lina Molinas Comet,
Md Shajalal,
Oya Deniz Beyan,
Dietrich Rebholz-Schuhmann,
Stefan Decker
Abstract:
Domain experts often rely on the most recent knowledge for apprehending and disseminating specific biological processes that help them design strategies for developing prevention and therapeutic decision-making in various disease scenarios. A challenging scenario for artificial intelligence (AI) is using biomedical data (e.g., texts, imaging, omics, and clinical) to provide diagnosis and treatment recommendations for cancerous conditions. Data and knowledge about biomedical entities like cancer, drugs, genes, proteins, and their mechanisms are spread across structured (knowledge bases (KBs)) and unstructured (e.g., scientific articles) sources. A large-scale knowledge graph (KG) can be constructed by integrating and extracting facts about semantically interrelated entities and relations. Such a KG not only allows exploration and question answering (QA) but also enables domain experts to deduce new knowledge. However, exploring and querying large-scale KGs is tedious for non-domain users due to their lack of understanding of the data assets and semantic technologies. In this paper, we develop a domain KG to leverage cancer-specific biomarker discovery and interactive QA. For this, we constructed a domain ontology called OncoNet Ontology (ONO), which enables semantic reasoning for validating gene-disease (different types of cancer) relations. The KG is further enriched by harmonizing the ONO, metadata, controlled vocabularies, and biomedical concepts from scientific articles by employing BioBERT- and SciBERT-based information extractors. Further, since the biomedical domain is evolving, where new findings often replace old ones, without having access to up-to-date scientific findings, there is a high chance an AI system exhibits concept drift while providing diagnosis and treatment. Therefore, we fine-tune the KG using large language models (LLMs) based on more recent articles and KBs.
Submitted 19 November, 2023; v1 submitted 12 October, 2023;
originally announced October 2023.
-
A Biomedical Knowledge Graph for Biomarker Discovery in Cancer
Authors:
Md. Rezaul Karim,
Lina Molinas Comet,
Oya Beyan,
Dietrich Rebholz-Schuhmann,
Stefan Decker
Abstract:
Structured and unstructured data and facts about drugs, genes, proteins, viruses, and their mechanisms are spread across a huge number of scientific articles. These articles are a large-scale knowledge source and can have a huge impact on disseminating knowledge about the mechanisms of certain biological processes. A domain-specific knowledge graph (KG) is an explicit conceptualization of a specific subject-matter domain represented w.r.t. semantically interrelated entities and relations. A KG can be constructed by integrating such facts and data and be used for data integration, exploration, and federated queries. However, exploring and querying large-scale KGs is tedious for certain groups of users due to a lack of knowledge about underlying data assets or semantic technologies. Such a KG will not only allow deducing new knowledge and question answering (QA) but also allow domain experts to explore. Since cross-disciplinary explanations are important for accurate diagnosis, it is important to query the KG to provide interactive explanations about learned biomarkers. Inspired by these, we construct a domain-specific KG, particularly for cancer-specific biomarker discovery. The KG is constructed by integrating cancer-related knowledge and facts from multiple sources. First, we construct a domain-specific ontology, which we call OncoNet Ontology (ONO). The ONO ontology is developed to enable semantic reasoning for verification of the predictions for relations between diseases and genes. The KG is then developed and enriched by harmonizing the ONO, additional metadata schemas, ontologies, controlled vocabularies, and additional concepts from external sources using a BERT-based information extraction method. BioBERT and SciBERT are fine-tuned with the selected articles crawled from PubMed. We list some queries and some examples of QA and of deducing knowledge based on the KG.
Submitted 23 February, 2023; v1 submitted 9 February, 2023;
originally announced February 2023.
-
Signatures of fluid and kinetic properties in the energy distributions of multicharged Ta ions from nanosecond laser-heated plasma
Authors:
F. Gobet,
M. Comet,
J-R. Marques,
V. Meot,
X. Raymond,
M. Versteegen,
J. -L. Henares,
O. Morice
Abstract:
The energy distributions of Ta ions produced in a nanosecond laser-heated plasma at 4$\times$10$^{15}$ W/cm$^{2}$ are experimentally and theoretically investigated. They are measured far from the target with an electrostatic spectrometer and charge collectors. Shadowgraphy and interferometry are used to characterize the plasma dynamics in the first nanoseconds of the plasma expansion for electron densities ranging from 10$^{18}$ to 10$^{20}$ cm$^{-3}$. The experimental data clearly show two components in the energy distributions which depend on the ion charge states. These components are discussed in light of fluid and kinetic descriptions of the expanding plasma. In particular, quantitative comparisons with calculations performed with 3D hydrodynamic (Troll) and 1D3V Particle In Cell (XooPIC) codes demonstrate that a double layer created at the plasma-vacuum interface plays a crucial role in the acceleration of the highest charge state ions at high energy.
Submitted 22 November, 2018;
originally announced November 2018.
-
Comment on electron-impact excitation cross-section measurements for He-like Xenon
Authors:
Jean-Christophe Pain,
Maxime Comet,
Christopher J Fontes
Abstract:
We discuss lower-than-predicted collisional-excitation cross-sections for helium-like xenon measured at an Electron Beam Ion Trap facility. In a review paper (H. Chen and P. Beiersdorfer, Can. J. Phys. 86, 55 (2008)), the authors find a significant effect due to the Breit interaction between the free and the bound electrons in the excitation process of He-like xenon. The authors state that the agreement between the measured and calculated cross-section values can only be found when the generalized Breit interaction is included in the calculations. We have performed new calculations with a Multi-Configuration Dirac-Fock code, as well as with the Penn State University suite of codes, and our conclusions are that the contribution of the Breit interaction is much lower than found in the calculations presented in the abovementioned article. In fact, our predictions are subsequently almost twice as large as the experimental values. We present these considerations in hopes of motivating new experimental investigations.
Submitted 14 June, 2018;
originally announced June 2018.
-
Detailed Opacity Calculations for Stellar Models
Authors:
Jean-Christophe Pain,
Franck Gilleron,
Maxime Comet
Abstract:
Radiative opacity is an important quantity in the modeling of stellar structure and evolution. In the present work we recall the role of opacity in the interpretation of pulsations of different kinds of stars. The detailed opacity code SCO-RCG for local-thermodynamic-equilibrium (LTE) plasmas is described, as well as the OPAMCDF project dedicated to the spectroscopy of LTE and non-LTE plasmas. Interpretations, with the latter codes, of several laser and Z pinch experiments in conditions relevant to astrophysical applications are also presented and our work in progress as concerns the internal solar conditions is illustrated.
Submitted 2 February, 2018;
originally announced February 2018.
-
A project based on Multi-Configuration Dirac-Fock calculations for plasma spectroscopy
Authors:
Maxime Comet,
Jean-Christophe Pain,
Franck Gilleron,
Robin Piron
Abstract:
We present a project dedicated to hot plasma spectroscopy based on a Multi-Configuration Dirac-Fock (MCDF) code, initially developed by J. Bruneau. The code is briefly described and the use of the transition-state method for plasma spectroscopy is detailed. Then an opacity code for local-thermodynamic-equilibrium plasmas using MCDF data, named OPAMCDF, is presented. Transition arrays for which the number of lines is too large to be handled in a Detailed-Line-Accounting calculation can be modeled within the Partially-Resolved-Transition-Array method or using the Unresolved-Transition-Arrays formalism in jj-coupling. An improvement of the original Partially-Resolved-Transition-Array method is presented which gives a better agreement with Detailed-Line-Accounting computations. Comparisons with some absorption and emission experimental spectra are shown. Finally, the capability of the MCDF code to compute atomic data required for collisional-radiative modeling of plasma at non-local thermodynamic equilibrium is illustrated. In addition to photoexcitation, this code can be used to calculate photoionization, electron-impact excitation and ionization cross-sections as well as autoionization rates in the Distorted-Wave or Close-Coupling approximations. Comparisons with cross-sections and rates available in the literature are discussed.
Submitted 17 October, 2017;
originally announced October 2017.
-
Detailed opacity calculations for astrophysical applications
Authors:
Jean-Christophe Pain,
Franck Gilleron,
Maxime Comet
Abstract:
Nowadays, several opacity codes are able to provide data for stellar structure models, but the computed opacities may show significant differences. In this work, we present state-of-the-art precise spectral opacity calculations, illustrated by stellar applications. The essential role of laboratory experiments to check the quality of the computed data is underlined. We review some X-ray and XUV laser and Z-pinch photo-absorption measurements as well as X-ray emission spectroscopy experiments involving hot dense plasmas produced by ultra-high-intensity laser irradiation. The measured spectra are systematically compared with the fine-structure opacity code SCO-RCG. Focus is put on iron, due to its crucial role in understanding asteroseismic observations of $β$ Cephei-type and Slowly Pulsating B stars, as well as of the Sun. For instance, in $β$ Cephei-type stars, the iron-group opacity peak excites acoustic modes through the "kappa-mechanism". Particular attention is paid to the higher-than-predicted iron opacity measured at the Sandia Z-machine at solar interior conditions. We discuss some theoretical aspects such as density effects, photo-ionization, autoionization or the "filling-the-gap" effect of highly excited states.
Submitted 6 June, 2017;
originally announced June 2017.
-
K-shell spectroscopy in hot plasmas: Stark effect, Breit interaction and QED corrections
Authors:
Jean-Christophe Pain,
Franck Gilleron,
Maxime Comet,
Dominique Gilles
Abstract:
The broadening of lines by the Stark effect is widely used for inferring electron density and temperature in plasmas. Stark-effect calculations often rely on atomic data (transition rates, energy levels, ...) that are not always exhaustive and/or are valid only for isolated atoms. In this work, we first present a recent development in the detailed opacity code SCO-RCG for K-shell spectroscopy. The approach is adapted from the work of Gilles and Peyrusse. Neglecting non-diagonal terms in dipolar and collision operators, the line profile is expressed as a sum of Voigt functions associated with the Stark components. The formalism relies on the use of parabolic coordinates and the relativistic fine-structure of Lyman lines is included by diagonalizing the Hamiltonian matrix associated with quantum states having the same principal quantum number n. The SCO-RCG code enables one to investigate plasma environment effects, the impact of the microfield distribution, the decoupling between electron and ion temperatures and the role of satellite lines (such as Li-like 1s nl n'l' - 1s2 nl, Be-like, etc.). Atomic-structure calculations have reached levels of accuracy which require evaluation of Breit interaction and many-electron quantum electro-dynamics (QED) contributions. Although much work was done for QED effects (self-energy and vacuum polarization) in hydrogenic atoms, the case of an arbitrary number of electrons is more complicated. Since exact analytic solutions do not exist, a number of heuristic methods have been used to approximate the screening of additional electrons in the self-energy part. We compare different ways of including such effects in atomic-structure codes (Slater-Condon, Multi-Configuration Dirac-Fock, etc.).
Submitted 17 October, 2016; v1 submitted 3 August, 2016;
originally announced August 2016.