Showing 1–8 of 8 results for author: Tarride, S

  1. arXiv:2404.19317

    cs.CV cs.CL

    Revisiting N-Gram Models: Their Impact in Modern Neural Networks for Handwritten Text Recognition

    Authors: Solène Tarride, Christopher Kermorvant

    Abstract: In recent advances in automatic text recognition (ATR), deep neural networks have demonstrated the ability to implicitly capture language statistics, potentially reducing the need for traditional language models. This study directly addresses whether explicit language models, specifically n-gram models, still contribute to the performance of state-of-the-art deep learning architectures in the fiel…

    Submitted 30 April, 2024; originally announced April 2024.

  2. arXiv:2404.18722

    cs.CV cs.CL

    Improving Automatic Text Recognition with Language Models in the PyLaia Open-Source Library

    Authors: Solène Tarride, Yoann Schneider, Marie Generali-Lince, Mélodie Boillet, Bastien Abadie, Christopher Kermorvant

    Abstract: PyLaia is one of the most popular open-source software for Automatic Text Recognition (ATR), delivering strong performance in terms of speed and accuracy. In this paper, we outline our recent contributions to the PyLaia library, focusing on the incorporation of reliable confidence scores and the integration of statistical language modeling during decoding. Our implementation provides an easy way t…

    Submitted 29 April, 2024; originally announced April 2024.

  3. arXiv:2404.18706

    cs.CV

    The Socface Project: Large-Scale Collection, Processing, and Analysis of a Century of French Censuses

    Authors: Mélodie Boillet, Solène Tarride, Manon Blanco, Valentin Rigal, Yoann Schneider, Bastien Abadie, Lionel Kesztenbaum, Christopher Kermorvant

    Abstract: This paper presents a complete processing workflow for extracting information from French census lists from 1836 to 1936. These lists contain information about individuals living in France and their households. We aim at extracting all the information contained in these tables using automatic handwritten table recognition. At the end of the Socface project, in which our work is taking place, the e…

    Submitted 3 June, 2024; v1 submitted 29 April, 2024; originally announced April 2024.

  4. arXiv:2404.18664

    cs.CV

    Reading Order Independent Metrics for Information Extraction in Handwritten Documents

    Authors: David Villanova-Aparisi, Solène Tarride, Carlos-D. Martínez-Hinarejos, Verónica Romero, Christopher Kermorvant, Moisés Pastor-Gadea

    Abstract: Information Extraction processes in handwritten documents tend to rely on obtaining an automatic transcription and performing Named Entity Recognition (NER) over such transcription. For this reason, in publicly available datasets, the performance of the systems is usually evaluated with metrics particular to each dataset. Moreover, most of the metrics employed are sensitive to reading order errors…

    Submitted 29 April, 2024; originally announced April 2024.

  5. arXiv:2306.10878

    cs.CV cs.AI

    Handwritten Text Recognition from Crowdsourced Annotations

    Authors: Solène Tarride, Tristan Faine, Mélodie Boillet, Harold Mouchère, Christopher Kermorvant

    Abstract: In this paper, we explore different ways of training a model for handwritten text recognition when multiple imperfect or noisy transcriptions are available. We consider various training configurations, such as selecting a single transcription, retaining all transcriptions, or computing an aggregated transcription from all available annotations. In addition, we evaluate the impact of quality-based…

    Submitted 19 June, 2023; originally announced June 2023.

    Comments: Accepted to the 7th International Workshop on Historical Document Imaging and Processing (HIP '23)

  6. Large Scale Genealogical Information Extraction From Handwritten Quebec Parish Records

    Authors: Solène Tarride, Martin Maarand, Mélodie Boillet, James McGrath, Eugénie Capel, Hélène Vézina, Christopher Kermorvant

    Abstract: This paper presents a complete workflow designed for extracting information from Quebec handwritten parish registers. The acts in these documents contain individual and family information highly valuable for genetic, demographic and social studies of the Quebec population. From an image of parish records, our workflow is able to identify the acts and extract personal information. The workflow is d…

    Submitted 27 April, 2023; originally announced April 2023.

    Journal ref: International Journal on Document Analysis and Recognition (IJDAR) (2023)

  7. arXiv:2304.13606

    cs.CV cs.DB

    SIMARA: a database for key-value information extraction from full pages

    Authors: Solène Tarride, Mélodie Boillet, Jean-François Moufflet, Christopher Kermorvant

    Abstract: We propose a new database for information extraction from historical handwritten documents. The corpus includes 5,393 finding aids from six different series, dating from the 18th-20th centuries. Finding aids are handwritten documents that contain metadata describing older archives. They are stored in the National Archives of France and are used by archivists to identify and find archival documents…

    Submitted 26 April, 2023; originally announced April 2023.

  8. arXiv:2304.13530

    cs.CV cs.AI cs.IR

    Key-value information extraction from full handwritten pages

    Authors: Solène Tarride, Mélodie Boillet, Christopher Kermorvant

    Abstract: We propose a Transformer-based approach for information extraction from digitized handwritten documents. Our approach combines, in a single model, the different steps that were so far performed by separate models: feature extraction, handwriting recognition and named entity recognition. We compare this integrated approach with traditional two-stage methods that perform handwriting recognition befo…

    Submitted 26 April, 2023; originally announced April 2023.