Showing 1–3 of 3 results for author: Ehrmann, M

  1. arXiv:2109.11406  [pdf, other]

    cs.CL cs.LG

    Named Entity Recognition and Classification on Historical Documents: A Survey

    Authors: Maud Ehrmann, Ahmed Hamdi, Elvys Linhares Pontes, Matteo Romanello, Antoine Doucet

    Abstract: After decades of massive digitisation, an unprecedented amount of historical documents is available in digital format, along with their machine-readable texts. While this represents a major step forward with respect to preservation and accessibility, it also opens up new opportunities in terms of content mining and the next fundamental challenge is to develop appropriate technologies to efficientl…

    Submitted 23 September, 2021; originally announced September 2021.

    Comments: 39 pages

    ACM Class: A.1; I.2.7

    Journal ref: ACM Computing Surveys 56-2 (2023) 1-47

  2. arXiv:2002.06144  [pdf, other]

    cs.CV cs.CL cs.IR cs.LG

    Combining Visual and Textual Features for Semantic Segmentation of Historical Newspapers

    Authors: Raphaël Barman, Maud Ehrmann, Simon Clematide, Sofia Ares Oliveira, Frédéric Kaplan

    Abstract: The massive amounts of digitized historical documents acquired over the last decades naturally lend themselves to automatic processing and exploration. Research work seeking to automatically process facsimiles and extract information thereby are multiplying with, as a first essential step, document layout analysis. If the identification and categorization of segments of interest in document images…

    Submitted 14 December, 2020; v1 submitted 14 February, 2020; originally announced February 2020.

    Journal ref: Journal of Data Mining & Digital Humanities, HistoInformatics, HistoInformatics (January 19, 2021) jdmdh:6107

  3. arXiv:1309.6185  [pdf]

    cs.CL

    Acronym recognition and processing in 22 languages

    Authors: Maud Ehrmann, Leonida della Rocca, Ralf Steinberger, Hristo Tanev

    Abstract: We are presenting work on recognising acronyms of the form Long-Form (Short-Form) such as "International Monetary Fund (IMF)" in millions of news articles in twenty-two languages, as part of our more general effort to recognise entities and their variants in news text and to use them for the automatic analysis of the news, including the linking of related news across languages. We show how the acr…

    Submitted 24 September, 2013; originally announced September 2013.

    ACM Class: H.3.1; H.3.3; I.2.7; I.5.4

    Journal ref: Proceedings of the 9th Conference 'Recent Advances in Natural Language Processing' (RANLP), pp. 237-244. Hissar, Bulgaria, 7-13 September 2013