
Showing 1–30 of 30 results for author: Rehm, G

  1. arXiv:2406.06366  [pdf, other]

    cs.CL

    Symmetric Dot-Product Attention for Efficient Training of BERT Language Models

    Authors: Martin Courtois, Malte Ostendorff, Leonhard Hennig, Georg Rehm

    Abstract: Initially introduced as a machine translation model, the Transformer architecture has now become the foundation for modern deep learning architecture, with applications in a wide range of fields, from computer vision to natural language processing. Nowadays, to tackle increasingly more complex tasks, Transformer-based models are stretched to enormous sizes, requiring increasingly larger training d…

    Submitted 19 June, 2024; v1 submitted 10 June, 2024; originally announced June 2024.

    Comments: to be published in Findings of the Association for Computational Linguistics: ACL 2024

  2. arXiv:2404.11726  [pdf, other]

    cs.CL

    Investigating Gender Bias in Turkish Language Models

    Authors: Orhun Caglidil, Malte Ostendorff, Georg Rehm

    Abstract: Language models are trained mostly on Web data, which often contains social stereotypes and biases that the models can inherit. This has potentially negative consequences, as models can amplify these biases in downstream tasks or applications. However, prior research has primarily focused on the English language, especially in the context of gender bias. In particular, grammatically gender-neutral…

    Submitted 17 April, 2024; originally announced April 2024.

    Comments: arXiv admin note: text overlap with arXiv:1903.10561 by other authors

  3. arXiv:2404.08443  [pdf, other]

    cs.DL cs.IR

    Toward FAIR Semantic Publishing of Research Dataset Metadata in the Open Research Knowledge Graph

    Authors: Raia Abu Ahmad, Jennifer D'Souza, Matthäus Zloch, Wolfgang Otto, Georg Rehm, Allard Oelen, Stefan Dietze, Sören Auer

    Abstract: Search engines these days can serve datasets as search results. Datasets get picked up by search technologies based on structured descriptions on their official web pages, informed by metadata ontologies such as the Dataset content type of schema.org. Despite this promotion of the content type dataset as a first-class citizen of search results, a vast proportion of datasets, particularly research…

    Submitted 12 April, 2024; originally announced April 2024.

    Comments: 8 pages, 1 figure, published in the Joint Proceedings of the Onto4FAIR 2023 Workshops

    Journal ref: In Joint Proceedings of the Onto4FAIR 2023 Workshops: Collocated with FOIS 2023 and SEMANTICS 2023. pp.23-31. https://hal.science/hal-04312604

  4. arXiv:2301.09626  [pdf, other]

    cs.CL cs.AI

    Efficient Language Model Training through Cross-Lingual and Progressive Transfer Learning

    Authors: Malte Ostendorff, Georg Rehm

    Abstract: Most Transformer language models are primarily pretrained on English text, limiting their use for other languages. As the model sizes grow, the performance gap between English and other languages with fewer compute and data resources increases even further. Consequently, more resource-efficient training methods are needed to bridge the gap for languages with fewer resources available. To address t…

    Submitted 23 January, 2023; originally announced January 2023.

  5. arXiv:2204.13943  [pdf, other]

    cs.CL cs.AI cs.HC

    User Experience Design for Automatic Credibility Assessment of News Content About COVID-19

    Authors: Konstantin Schulz, Jens Rauenbusch, Jan Fillies, Lisa Rutenburg, Dimitrios Karvelas, Georg Rehm

    Abstract: The increasingly rapid spread of information about COVID-19 on the web calls for automatic measures of quality assurance. In that context, we check the credibility of news content using selected linguistic features. We present two empirical studies to evaluate the usability of graphical interfaces that offer such credibility assessment. In a moderated qualitative interview with six participants, w…

    Submitted 29 April, 2022; originally announced April 2022.

    Comments: 25 pages, 7 figures, to be published in HCI International 2022 - Late Breaking Papers

    MSC Class: 68-04

    ACM Class: H.5.2; I.2.7

  6. arXiv:2203.14541  [pdf, other]

    cs.IR cs.CL

    Specialized Document Embeddings for Aspect-based Similarity of Research Papers

    Authors: Malte Ostendorff, Till Blume, Terry Ruas, Bela Gipp, Georg Rehm

    Abstract: Document embeddings and similarity measures underpin content-based recommender systems, whereby a document is commonly represented as a single generic embedding. However, similarity computed on single vector representations provides only one perspective on document similarity that ignores which aspects make two documents alike. To address this limitation, aspect-based similarity measures have been…

    Submitted 28 March, 2022; originally announced March 2022.

    Comments: Accepted for publication at JCDL 2022

  7. arXiv:2203.09629  [pdf, other]

    cs.CL

    HiStruct+: Improving Extractive Text Summarization with Hierarchical Structure Information

    Authors: Qian Ruan, Malte Ostendorff, Georg Rehm

    Abstract: Transformer-based language models usually treat texts as linear sequences. However, most texts also have an inherent hierarchical structure, i.e., parts of a text can be identified using their position in this hierarchy. In addition, section titles usually indicate the common topic of their respective sentences. We propose a novel approach to formulate, extract, encode and inject hierarchical stru…

    Submitted 17 March, 2022; originally announced March 2022.

    Comments: 17 pages, 3 figures, to be published in Findings ACL 2022

  8. arXiv:2202.06671  [pdf, other]

    cs.CL

    Neighborhood Contrastive Learning for Scientific Document Representations with Citation Embeddings

    Authors: Malte Ostendorff, Nils Rethmeier, Isabelle Augenstein, Bela Gipp, Georg Rehm

    Abstract: Learning scientific document representations can be substantially improved through contrastive learning objectives, where the challenge lies in creating positive and negative training samples that encode the desired similarity semantics. Prior work relies on discrete citation relations to generate contrast samples. However, discrete citations enforce a hard cut-off to similarity. This is counter-i…

    Submitted 19 October, 2022; v1 submitted 14 February, 2022; originally announced February 2022.

    Comments: Accepted to EMNLP 2022

  9. arXiv:2109.12323  [pdf]

    cs.LG

    Deep Learning-Based Detection of the Acute Respiratory Distress Syndrome: What Are the Models Learning?

    Authors: Gregory B. Rehm, Chao Wang, Irene Cortes-Puch, Chen-Nee Chuah, Jason Adams

    Abstract: The acute respiratory distress syndrome (ARDS) is a severe form of hypoxemic respiratory failure with in-hospital mortality of 35-46%. High mortality is thought to be related in part to challenges in making a prompt diagnosis, which may in turn delay implementation of evidence-based therapies. A deep neural network (DNN) algorithm utilizing unbiased ventilator waveform data (VWD) may help to impro…

    Submitted 25 September, 2021; originally announced September 2021.

  10. arXiv:2109.10224  [pdf]

    q-bio.QM cs.LG

    Clinical Validation of Single-Chamber Model-Based Algorithms Used to Estimate Respiratory Compliance

    Authors: Gregory Rehm, Jimmy Nguyen, Chelsea Gilbeau, Marc T Bomactao, Chen-Nee Chuah, Jason Adams

    Abstract: Non-invasive estimation of respiratory physiology using computational algorithms promises to be a valuable technique for future clinicians to detect detrimental changes in patient pathophysiology. However, few clinical algorithms used to non-invasively analyze lung physiology have undergone rigorous validation in a clinical setting, and are often validated either using mechanical devices, or with…

    Submitted 19 September, 2021; originally announced September 2021.

  11. arXiv:2104.13841  [pdf, other]

    cs.CL cs.IR

    Evaluating Document Representations for Content-based Legal Literature Recommendations

    Authors: Malte Ostendorff, Elliott Ash, Terry Ruas, Bela Gipp, Julian Moreno-Schneider, Georg Rehm

    Abstract: Recommender systems assist legal professionals in finding relevant literature for supporting their case. Despite its importance for the profession, legal applications do not reflect the latest advances in recommender systems and representation learning research. Simultaneously, legal recommender systems are typically evaluated in small-scale user study without any public available benchmark datase…

    Submitted 28 April, 2021; originally announced April 2021.

    Comments: Accepted for publication at ICAIL 2021

  12. arXiv:2010.06395  [pdf, other]

    cs.CL cs.IR

    Aspect-based Document Similarity for Research Papers

    Authors: Malte Ostendorff, Terry Ruas, Till Blume, Bela Gipp, Georg Rehm

    Abstract: Traditional document similarity measures provide a coarse-grained distinction between similar and dissimilar documents. Typically, they do not consider in what aspects two documents are similar. This limits the granularity of applications like recommender systems that rely on document similarity. In this paper, we extend similarity with aspect information by performing a pairwise document classifi…

    Submitted 13 October, 2020; originally announced October 2020.

    Comments: Accepted for publication at COLING 2020

  13. arXiv:2009.00345  [pdf, other]

    eess.SY physics.acc-ph

    Multi-Array Electron Beam Stabilization using Block-Circulant Transformation and Generalized Singular Value Decomposition

    Authors: Idris Kempf, Stephen R. Duncan, Paul J. Goulart, Guenther Rehm

    Abstract: We introduce a novel structured controller design for the electron beam stabilization problem of the UK's national synchrotron light source. Because changes to the synchrotron will not allow the application of existing control approaches, we develop a novel method to diagonalize the multi-input multi-output (MIMO) system. A generalized singular value decomposition (GSVD) is used to simultaneously…

    Submitted 1 September, 2020; originally announced September 2020.

  14. arXiv:2008.13428  [pdf, ps, other]

    physics.acc-ph eess.SP

    Symmetry Exploitation in Orbit Feedback Systems of Synchrotron Storage Rings

    Authors: Idris Kempf, Paul J. Goulart, Stephen R. Duncan, Guenther Rehm

    Abstract: Structural symmetries in the storage ring of synchrotrons are intentionally created during the design phase of the magnetic lattices, but they are not considered in the design of control algorithms that stabilize the beam of accelerated particles. The choice of control algorithm, however, is limited by the speed requirements of the synchrotron. Standard control algorithms for synchrotrons are base…

    Submitted 31 August, 2020; originally announced August 2020.

  15. arXiv:2004.14130  [pdf, other]

    cs.CL cs.SE

    A Workflow Manager for Complex NLP and Content Curation Pipelines

    Authors: Julián Moreno-Schneider, Peter Bourgonje, Florian Kintzel, Georg Rehm

    Abstract: We present a workflow manager for the flexible creation and customisation of NLP processing pipelines. The workflow manager addresses challenges in interoperability across various different NLP tasks and hardware-based resource usage. Based on the four key principles of generality, flexibility, scalability and efficiency, we present the first version of the workflow manager by providing details on…

    Submitted 16 April, 2020; originally announced April 2020.

    Comments: Proceedings of the 1st International Workshop on Language Technology Platforms (IWLTP 2020). To appear

  16. arXiv:2004.12195  [pdf, other]

    cs.DL cs.CL cs.HC

    QURATOR: Innovative Technologies for Content and Data Curation

    Authors: Georg Rehm, Peter Bourgonje, Stefanie Hegele, Florian Kintzel, Julián Moreno Schneider, Malte Ostendorff, Karolina Zaczynska, Armin Berger, Stefan Grill, Sören Räuchle, Jens Rauenbusch, Lisa Rutenburg, André Schmidt, Mikka Wild, Henry Hoffmann, Julian Fink, Sarah Schulz, Jurica Seva, Joachim Quantz, Joachim Böttger, Josefine Matthey, Rolf Fricke, Jan Thomsen, Adrian Paschke, Jamal Al Qundus, et al. (15 additional authors not shown)

    Abstract: In all domains and sectors, the demand for intelligent systems to support the processing and generation of digital content is rapidly increasing. The availability of vast amounts of content and the pressure to publish new content quickly and in rapid succession requires faster, more efficient and smarter processing and generation methods. With a consortium of ten partners from research and industr…

    Submitted 25 April, 2020; originally announced April 2020.

    Comments: Proceedings of QURATOR 2020: The conference for intelligent content solutions, Berlin, Germany, February 2020

  17. arXiv:2004.12190  [pdf, other]

    cs.CL

    Towards Discourse Parsing-inspired Semantic Storytelling

    Authors: Georg Rehm, Karolina Zaczynska, Julián Moreno-Schneider, Malte Ostendorff, Peter Bourgonje, Maria Berger, Jens Rauenbusch, André Schmidt, Mikka Wild

    Abstract: Previous work of ours on Semantic Storytelling uses text analytics procedures including Named Entity Recognition and Event Detection. In this paper, we outline our longer-term vision on Semantic Storytelling and describe the current conceptual and technical approach. In the project that drives our research we develop AI-based technologies that are verified by partners from industry. One long-term…

    Submitted 25 April, 2020; originally announced April 2020.

    Comments: Proceedings of QURATOR 2020: The conference for intelligent content solutions, Berlin, Germany, February 2020

  18. arXiv:2004.10283  [pdf, other]

    cs.CL

    Observations on Annotations

    Authors: Georg Rehm

    Abstract: The annotation of textual information is a fundamental activity in Linguistics and Computational Linguistics. This article presents various observations on annotations. It approaches the topic from several angles including Hypertext, Computational Linguistics and Language Technology, Artificial Intelligence and Open Science. Annotations can be examined along different dimensions. In terms of compl…

    Submitted 21 April, 2020; originally announced April 2020.

    Comments: To be published in: Annotations in Scholarly Editions and Research: Functions, Differentiation, Systematization (2020), Julia Nantke and Frederik Schlupkothen (editors). De Gruyter. In print

  19. arXiv:2004.08355  [pdf]

    cs.CL cs.AI

    Towards an Interoperable Ecosystem of AI and LT Platforms: A Roadmap for the Implementation of Different Levels of Interoperability

    Authors: Georg Rehm, Dimitrios Galanis, Penny Labropoulou, Stelios Piperidis, Martin Welß, Ricardo Usbeck, Joachim Köhler, Miltos Deligiannis, Katerina Gkirtzou, Johannes Fischer, Christian Chiarcos, Nils Feldhus, Julián Moreno-Schneider, Florian Kintzel, Elena Montiel, Víctor Rodríguez Doncel, John P. McCrae, David Laqua, Irina Patricia Theile, Christian Dittmar, Kalina Bontcheva, Ian Roberts, Andrejs Vasiljevs, Andis Lagzdiņš

    Abstract: With regard to the wider area of AI/LT platform interoperability, we concentrate on two core aspects: (1) cross-platform search and discovery of resources and services; (2) composition of cross-platform service workflows. We devise five different levels (of increasing complexity) of platform interoperability that we suggest to implement in a wider federation of AI/LT platforms. We illustrate the a…

    Submitted 17 April, 2020; originally announced April 2020.

    Comments: Proceedings of the 1st International Workshop on Language Technology Platforms (IWLTP 2020). To appear

  20. arXiv:2003.13833  [pdf]

    cs.CL cs.AI cs.DL

    The European Language Technology Landscape in 2020: Language-Centric and Human-Centric AI for Cross-Cultural Communication in Multilingual Europe

    Authors: Georg Rehm, Katrin Marheinecke, Stefanie Hegele, Stelios Piperidis, Kalina Bontcheva, Jan Hajič, Khalid Choukri, Andrejs Vasiļjevs, Gerhard Backfried, Christoph Prinz, José Manuel Gómez Pérez, Luc Meertens, Paul Lukowicz, Josef van Genabith, Andrea Lösch, Philipp Slusallek, Morten Irgens, Patrick Gatellier, Joachim Köhler, Laure Le Bars, Dimitra Anastasiou, Albina Auksoriūtė, Núria Bel, António Branco, Gerhard Budin, et al. (22 additional authors not shown)

    Abstract: Multilingualism is a cultural cornerstone of Europe and firmly anchored in the European treaties including full language equality. However, language barriers impacting business, cross-lingual and cross-cultural communication are still omnipresent. Language Technologies (LTs) are a powerful means to break down these barriers. While the last decade has seen various initiatives that created a multitu…

    Submitted 30 March, 2020; originally announced March 2020.

    Comments: Proceedings of the 12th Language Resources and Evaluation Conference (LREC 2020). To appear

  21. arXiv:2003.13551  [pdf]

    cs.CL

    European Language Grid: An Overview

    Authors: Georg Rehm, Maria Berger, Ela Elsholz, Stefanie Hegele, Florian Kintzel, Katrin Marheinecke, Stelios Piperidis, Miltos Deligiannis, Dimitris Galanis, Katerina Gkirtzou, Penny Labropoulou, Kalina Bontcheva, David Jones, Ian Roberts, Jan Hajic, Jana Hamrlová, Lukáš Kačena, Khalid Choukri, Victoria Arranz, Andrejs Vasiļjevs, Orians Anvari, Andis Lagzdiņš, Jūlija Meļņika, Gerhard Backfried, Erinç Dikici, et al. (11 additional authors not shown)

    Abstract: With 24 official EU and many additional languages, multilingualism in Europe and an inclusive Digital Single Market can only be enabled through Language Technologies (LTs). European LT business is dominated by hundreds of SMEs and a few large players. Many are world-class, with technologies that outperform the global players. However, European LT business is also fragmented, by nation states, lang…

    Submitted 30 March, 2020; originally announced March 2020.

    Comments: Proceedings of the 12th Language Resources and Evaluation Conference (LREC 2020). To appear

  22. arXiv:2003.13236  [pdf, other]

    cs.CL cs.DL

    Making Metadata Fit for Next Generation Language Technology Platforms: The Metadata Schema of the European Language Grid

    Authors: Penny Labropoulou, Katerina Gkirtzou, Maria Gavriilidou, Miltos Deligiannis, Dimitrios Galanis, Stelios Piperidis, Georg Rehm, Maria Berger, Valérie Mapelli, Mickaël Rigault, Victoria Arranz, Khalid Choukri, Gerhard Backfried, José Manuel Gómez Pérez, Andres Garcia Silva

    Abstract: The current scientific and technological landscape is characterised by the increasing availability of data resources and processing tools and services. In this setting, metadata have emerged as a key factor facilitating management, sharing and usage of such digital assets. In this paper we present ELG-SHARE, a rich metadata schema catering for the description of Language Resources and Technologies…

    Submitted 30 March, 2020; originally announced March 2020.

    Comments: Proceedings of the 12th Language Resources and Evaluation Conference (LREC 2020). To appear

  23. arXiv:2003.13032  [pdf, other]

    cs.CL

    Named Entities in Medical Case Reports: Corpus and Experiments

    Authors: Sarah Schulz, Jurica Ševa, Samuel Rodriguez, Malte Ostendorff, Georg Rehm

    Abstract: We present a new corpus comprising annotations of medical entities in case reports, originating from PubMed Central's open access library. In the case reports, we annotate cases, conditions, findings, factors and negation modifiers. Moreover, where applicable, we annotate relations between these entities. As such, this is the first corpus of this kind made available to the scientific community in…

    Submitted 29 March, 2020; originally announced March 2020.

    Comments: Proceedings of the 12th Language Resources and Evaluation Conference (LREC 2020). To appear

  24. arXiv:2003.13027  [pdf, other]

    cs.CL cs.LG

    Abstractive Text Summarization based on Language Model Conditioning and Locality Modeling

    Authors: Dmitrii Aksenov, Julián Moreno-Schneider, Peter Bourgonje, Robert Schwarzenberg, Leonhard Hennig, Georg Rehm

    Abstract: We explore to what extent knowledge about the pre-trained language model that is used is beneficial for the task of abstractive summarization. To this end, we experiment with conditioning the encoder and decoder of a Transformer-based neural model on the BERT language model. In addition, we propose a new method of BERT-windowing, which allows chunk-wise processing of texts longer than the BERT win…

    Submitted 29 March, 2020; originally announced March 2020.

    Comments: Proceedings of the 12th Language Resources and Evaluation Conference (LREC 2020). To appear

  25. arXiv:2003.13016  [pdf, ps, other]

    cs.CL cs.IR

    A Dataset of German Legal Documents for Named Entity Recognition

    Authors: Elena Leitner, Georg Rehm, Julián Moreno-Schneider

    Abstract: We describe a dataset developed for Named Entity Recognition in German federal court decisions. It consists of approx. 67,000 sentences with over 2 million tokens. The resource contains 54,000 manually annotated entities, mapped to 19 fine-grained semantic classes: person, judge, lawyer, country, city, street, landscape, organization, company, institution, court, brand, law, ordinance, European le…

    Submitted 29 March, 2020; originally announced March 2020.

    Comments: Proceedings of the 12th Language Resources and Evaluation Conference (LREC 2020). To appear

  26. arXiv:2003.12900  [pdf, other]

    cs.CL cs.AI

    Orchestrating NLP Services for the Legal Domain

    Authors: Julián Moreno-Schneider, Georg Rehm, Elena Montiel-Ponsoda, Víctor Rodriguez-Doncel, Artem Revenko, Sotirios Karampatakis, Maria Khvalchik, Christian Sageder, Jorge Gracia, Filippo Maganza

    Abstract: Legal technology is currently receiving a lot of attention from various angles. In this contribution we describe the main technical components of a system that is currently under development in the European innovation project Lynx, which includes partners from industry and research. The key contribution of this paper is a workflow manager that enables the flexible orchestration of workflows based…

    Submitted 28 March, 2020; originally announced March 2020.

    Comments: Proceedings of the 12th Language Resources and Evaluation Conference (LREC 2020). To appear

  27. arXiv:2003.09881  [pdf, other]

    cs.DL cs.CL cs.IR

    Pairwise Multi-Class Document Classification for Semantic Relations between Wikipedia Articles

    Authors: Malte Ostendorff, Terry Ruas, Moritz Schubotz, Georg Rehm, Bela Gipp

    Abstract: Many digital libraries recommend literature to their users considering the similarity between a query document and their repository. However, they often fail to distinguish what is the relationship that makes two documents alike. In this paper, we model the problem of finding the relationship between two documents as a pairwise document classification task. To find the semantic relation between do…

    Submitted 22 March, 2020; originally announced March 2020.

    Comments: Accepted at ACM/IEEE Joint Conference on Digital Libraries (JCDL 2020)

  28. arXiv:1909.08402  [pdf, other]

    cs.CL cs.IR cs.LG

    Enriching BERT with Knowledge Graph Embeddings for Document Classification

    Authors: Malte Ostendorff, Peter Bourgonje, Maria Berger, Julian Moreno-Schneider, Georg Rehm, Bela Gipp

    Abstract: In this paper, we focus on the classification of books using short descriptive texts (cover blurbs) and additional metadata. Building upon BERT, a deep neural language model, we demonstrate how to combine text representations with metadata and knowledge graph embeddings, which encode author information. Compared to the standard BERT approach we achieve considerably better results for the classific…

    Submitted 18 September, 2019; originally announced September 2019.

  29. arXiv:1904.12969  [pdf]

    cs.LG stat.ML

    Improving Mechanical Ventilator Clinical Decision Support Systems with A Machine Learning Classifier for Determining Ventilator Mode

    Authors: Gregory B. Rehm, Brooks T. Kuhn, Jimmy Nguyen, Nicholas R. Anderson, Chen-Nee Chuah, Jason Y. Adams

    Abstract: Clinical decision support systems (CDSS) will play an increasing role in improving the quality of medical care for critically ill patients. However, due to limitations in current informatics infrastructure, CDSS do not always have complete information on state of supporting physiologic monitoring devices, which can limit the input data available to CDSS. This is especially true in the use case…

    Submitted 29 April, 2019; originally announced April 2019.

  30. arXiv:1711.02181  [pdf, other]

    cs.CR

    Mobile Encryption Gateway (MEG) for Email Encryption

    Authors: Gregory B Rehm, Michael Thompson, Brad Busenius, Jennifer Fowler

    Abstract: Email cryptography applications often suffer from major problems that prevent their widespread implementation. MEG, or the Mobile Encryption Gateway, aims to fix the issues associated with email encryption by ensuring that encryption is easy to perform while still maintaining data security. MEG performs automatic decryption and encryption of all emails using PGP. Users do not need to understand the…

    Submitted 6 November, 2017; originally announced November 2017.