Skip to main content

Showing 1–6 of 6 results for author: Idrissi-Yaghir, A

.
  1. arXiv:2405.11950  [pdf, other

    cs.CL cs.LG

    WisPerMed at BioLaySumm: Adapting Autoregressive Large Language Models for Lay Summarization of Scientific Articles

    Authors: Tabea M. G. Pakull, Hendrik Damm, Ahmad Idrissi-Yaghir, Henning Schäfer, Peter A. Horn, Christoph M. Friedrich

    Abstract: This paper details the efforts of the WisPerMed team in the BioLaySumm2024 Shared Task on automatic lay summarization in the biomedical domain, aimed at making scientific publications accessible to non-specialists. Large language models (LLMs), specifically the BioMistral and Llama3 models, were fine-tuned and employed to create lay summaries from complex scientific texts. The summarization perfor… ▽ More

    Submitted 20 May, 2024; originally announced May 2024.

    Comments: 4 pages, 6 figure, 3 tables, submitted to: BIONLP 2024 and Shared Tasks @ ACL 2024

  2. arXiv:2405.11255  [pdf, other

    cs.CL cs.LG

    WisPerMed at "Discharge Me!": Advancing Text Generation in Healthcare with Large Language Models, Dynamic Expert Selection, and Priming Techniques on MIMIC-IV

    Authors: Hendrik Damm, Tabea M. G. Pakull, Bahadır Eryılmaz, Helmut Becker, Ahmad Idrissi-Yaghir, Henning Schäfer, Sergej Schultenkämper, Christoph M. Friedrich

    Abstract: This study aims to leverage state of the art language models to automate generating the "Brief Hospital Course" and "Discharge Instructions" sections of Discharge Summaries from the MIMIC-IV dataset, reducing clinicians' administrative workload. We investigate how automation can improve documentation accuracy, alleviate clinician burnout, and enhance operational efficacy in healthcare facilities.… ▽ More

    Submitted 18 May, 2024; originally announced May 2024.

    Comments: 8 pages, 6 tables, 8 figures, submitted to: BioNLP 2024 and Shared Tasks @ ACL 2024

  3. arXiv:2405.10004  [pdf, other

    eess.IV cs.CV cs.LG

    ROCOv2: Radiology Objects in COntext Version 2, an Updated Multimodal Image Dataset

    Authors: Johannes Rückert, Louise Bloch, Raphael Brüngel, Ahmad Idrissi-Yaghir, Henning Schäfer, Cynthia S. Schmidt, Sven Koitka, Obioma Pelka, Asma Ben Abacha, Alba G. Seco de Herrera, Henning Müller, Peter A. Horn, Felix Nensa, Christoph M. Friedrich

    Abstract: Automated medical image analysis systems often require large amounts of training data with high quality labels, which are difficult and time consuming to generate. This paper introduces Radiology Object in COntext version 2 (ROCOv2), a multimodal dataset consisting of radiological images and associated medical concepts and captions extracted from the PMC Open Access subset. It is an updated versio… ▽ More

    Submitted 18 June, 2024; v1 submitted 16 May, 2024; originally announced May 2024.

    Comments: Accepted for Scientific Data

  4. arXiv:2404.05694  [pdf, other

    cs.CL cs.AI cs.LG

    Comprehensive Study on German Language Models for Clinical and Biomedical Text Understanding

    Authors: Ahmad Idrissi-Yaghir, Amin Dada, Henning Schäfer, Kamyar Arzideh, Giulia Baldini, Jan Trienes, Max Hasin, Jeanette Bewersdorff, Cynthia S. Schmidt, Marie Bauer, Kaleb E. Smith, Jiang Bian, Yonghui Wu, Jörg Schlötterer, Torsten Zesch, Peter A. Horn, Christin Seifert, Felix Nensa, Jens Kleesiek, Christoph M. Friedrich

    Abstract: Recent advances in natural language processing (NLP) can be largely attributed to the advent of pre-trained language models such as BERT and RoBERTa. While these models demonstrate remarkable performance on general datasets, they can struggle in specialized domains such as medicine, where unique domain-specific terminologies, domain-specific abbreviations, and varying document structures are commo… ▽ More

    Submitted 8 May, 2024; v1 submitted 8 April, 2024; originally announced April 2024.

    Comments: Accepted at LREC-COLING 2024

  5. arXiv:2310.07321  [pdf, other

    cs.CL cs.AI cs.LG

    On the Impact of Cross-Domain Data on German Language Models

    Authors: Amin Dada, Aokun Chen, Cheng Peng, Kaleb E Smith, Ahmad Idrissi-Yaghir, Constantin Marc Seibold, Jianning Li, Lars Heiliger, Xi Yang, Christoph M. Friedrich, Daniel Truhn, Jan Egger, Jiang Bian, Jens Kleesiek, Yonghui Wu

    Abstract: Traditionally, large language models have been either trained on general web crawls or domain-specific data. However, recent successes of generative large language models, have shed light on the benefits of cross-domain datasets. To examine the significance of prioritizing data diversity over quality, we present a German dataset comprising texts from five domains, along with another dataset aimed… ▽ More

    Submitted 13 October, 2023; v1 submitted 11 October, 2023; originally announced October 2023.

    Comments: 13 pages, 1 figure, accepted at Findings of the Association for Computational Linguistics: EMNLP 2023

  6. Domain Adaptation of Transformer-Based Models using Unlabeled Data for Relevance and Polarity Classification of German Customer Feedback

    Authors: Ahmad Idrissi-Yaghir, Henning Schäfer, Nadja Bauer, Christoph M. Friedrich

    Abstract: Understanding customer feedback is becoming a necessity for companies to identify problems and improve their products and services. Text classification and sentiment analysis can play a major role in analyzing this data by using a variety of machine and deep learning approaches. In this work, different transformer-based models are utilized to explore how efficient these models are when working wit… ▽ More

    Submitted 8 March, 2023; v1 submitted 12 December, 2022; originally announced December 2022.

    Comments: Complete

    Journal ref: SN Computer Science 4 (2) (2023)