
Showing 1–2 of 2 results for author: Krickl, M

Searching in archive cs.
  1. arXiv:2102.00583

    cs.CL

    Neural OCR Post-Hoc Correction of Historical Corpora

    Authors: Lijun Lyu, Maria Koutraki, Martin Krickl, Besnik Fetahu

    Abstract: Optical character recognition (OCR) is crucial for deeper access to historical collections. OCR needs to account for orthographic variations, typefaces, or language evolution (i.e., new letters, word spellings), as the main source of character, word, or word segmentation transcription errors. For digital corpora of historical prints, the errors are further exacerbated due to low scan quality and…

    Submitted 31 January, 2021; originally announced February 2021.

    Comments: To appear at TACL

  2. arXiv:2001.01673

    cs.DL

    Identifying Historical Travelogues in Large Text Corpora Using Machine Learning

    Authors: Jan Rörden, Doris Gruber, Martin Krickl, Bernhard Haslhofer

    Abstract: Travelogues represent an important and intensively studied source for scholars in the humanities, as they provide insights into people, cultures, and places of the past. However, existing studies rarely utilize more than a dozen primary sources, since the human capacities of working with a large number of historical sources are naturally limited. In this paper, we define the notion of travelogue a…

    Submitted 6 January, 2020; originally announced January 2020.

    Comments: 14 pages, accepted for presentation at iConference 2020