
Showing 1–1 of 1 results for author: Kuebart, K

  1. arXiv:2401.16845  [pdf, other]

    cs.DL

    Chronicling Germany: An Annotated Historical Newspaper Dataset

    Authors: Christian Schultze, Niklas Kerkfeld, Kara Kuebart, Princilia Weber, Moritz Wolter, Felix Selgert

    Abstract: The correct detection of article layout in historical newspaper pages remains challenging but is important for Natural Language Processing (NLP) and machine learning applications in the field of digital history. Digital newspaper portals typically provide Optical Character Recognition (OCR) text, albeit of varying quality. Unfortunately, layout information is often missing, limiting this rich so…

    Submitted 7 June, 2024; v1 submitted 30 January, 2024; originally announced January 2024.

    Comments: Dataset available at: https://gitlab.uni-bonn.de/digital-history/Chronicling-Germany-Dataset . Baseline code: https://github.com/Digital-History-Bonn/Chronicling-Germany-Code