Showing 1–2 of 2 results for author: Torkar, M

  1. arXiv:2209.00693  [pdf, other]

    cs.DL q-bio.OT

    A large dataset of software mentions in the biomedical literature

    Authors: Ana-Maria Istrate, Donghui Li, Dario Taraborelli, Michaela Torkar, Boris Veytsman, Ivana Williams

    Abstract: We describe the CZ Software Mentions dataset, a new dataset of software mentions in biomedical papers. Plain-text software mentions are extracted with a trained SciBERT model from several sources: the NIH PubMed Central collection and from papers provided by various publishers to the Chan Zuckerberg Initiative. The dataset provides sources, context and metadata, and, for a number of mentions, the…

    Submitted 27 September, 2022; v1 submitted 1 September, 2022; originally announced September 2022.

  2. arXiv:2204.06584  [pdf, other]

    cs.CL cs.AI

    A Distant Supervision Corpus for Extracting Biomedical Relationships Between Chemicals, Diseases and Genes

    Authors: Dongxu Zhang, Sunil Mohan, Michaela Torkar, Andrew McCallum

    Abstract: We introduce ChemDisGene, a new dataset for training and evaluating multi-class multi-label document-level biomedical relation extraction models. Our dataset contains 80k biomedical research abstracts labeled with mentions of chemicals, diseases, and genes, portions of which human experts labeled with 18 types of biomedical relationships between these entities (intended for evaluation), and the re…

    Submitted 13 April, 2022; originally announced April 2022.

    Comments: LREC 2022 (Oral)