Showing 1–4 of 4 results for author: Pfitzmann, B

  1. arXiv:2405.10725

    cs.CL cs.IR

    INDUS: Effective and Efficient Language Models for Scientific Applications

    Authors: Bishwaranjan Bhattacharjee, Aashka Trivedi, Masayasu Muraoka, Muthukumaran Ramasubramanian, Takuma Udagawa, Iksha Gurung, Rong Zhang, Bharath Dandala, Rahul Ramachandran, Manil Maskey, Kaylin Bugbee, Mike Little, Elizabeth Fancher, Lauren Sanders, Sylvain Costes, Sergi Blanco-Cuaresma, Kelly Lockhart, Thomas Allen, Felix Grezes, Megan Ansdell, Alberto Accomazzi, Yousef El-Kurdi, Davis Wertheimer, Birgit Pfitzmann, Cesar Berrospi Ramis, et al. (9 additional authors not shown)

    Abstract: Large language models (LLMs) trained on general domain corpora showed remarkable results on natural language processing (NLP) tasks. However, previous research demonstrated LLMs trained using domain-focused corpora perform better on specialized tasks. Inspired by this pivotal insight, we developed INDUS, a comprehensive suite of LLMs tailored for the Earth science, biology, physics, heliophysics,…

    Submitted 20 May, 2024; v1 submitted 17 May, 2024; originally announced May 2024.

  2. arXiv:2308.09637

    cs.SE

    Visually Analyzing Company-wide Software Service Dependencies: An Industrial Case Study

    Authors: Sebastian Baltes, Brian Pfitzmann, Thomas Kowark, Christoph Treude, Fabian Beck

    Abstract: Managing dependencies between software services is a crucial task for any company operating cloud applications. Visualizations can help to understand and maintain these complex dependencies. In this paper, we present a force-directed service dependency visualization and filtering tool that has been developed and used within SAP. The tool's use cases include guiding service retirement as well as un…

    Submitted 22 August, 2023; v1 submitted 18 August, 2023; originally announced August 2023.

    Comments: 5 pages, 3 figures, 1 table, 11th IEEE Working Conference on Software Visualization (VISSOFT 2023)

  3. DocLangID: Improving Few-Shot Training to Identify the Language of Historical Documents

    Authors: Furkan Simsek, Brian Pfitzmann, Hendrik Raetz, Jona Otholt, Haojin Yang, Christoph Meinel

    Abstract: Language identification describes the task of recognizing the language of written text in documents. This information is crucial because it can be used to support the analysis of a document's vocabulary and context. Supervised learning methods in recent years have advanced the task of language identification. However, these methods usually require large labeled datasets, which often need to be inc…

    Submitted 3 May, 2023; originally announced May 2023.

    Comments: 6 pages (including references and excluding appendix)

  4. DocLayNet: A Large Human-Annotated Dataset for Document-Layout Analysis

    Authors: Birgit Pfitzmann, Christoph Auer, Michele Dolfi, Ahmed S Nassar, Peter W J Staar

    Abstract: Accurate document layout analysis is a key requirement for high-quality PDF document conversion. With the recent availability of public, large ground-truth datasets such as PubLayNet and DocBank, deep-learning models have proven to be very effective at layout detection and segmentation. While these datasets are of adequate size to train such models, they severely lack in layout variability since t…

    Submitted 2 June, 2022; originally announced June 2022.

    Comments: 9 pages, 6 figures, 5 tables. Accepted paper at SIGKDD 2022 conference