Showing 1–12 of 12 results for author: Puppe, F

  1. arXiv:2406.18745  [pdf, other]

    cs.LG

    QBI: Quantile-based Bias Initialization for Efficient Private Data Reconstruction in Federated Learning

    Authors: Micha V. Nowak, Tim P. Bott, David Khachaturov, Frank Puppe, Adrian Krenzer, Amar Hekalo

    Abstract: Federated learning enables the training of machine learning models on distributed data without compromising user privacy, as data remains on personal devices and only model updates, such as gradients, are shared with a central coordinator. However, recent research has shown that the central entity can perfectly reconstruct private data from shared model updates by maliciously initializing the mode…

    Submitted 26 June, 2024; originally announced June 2024.

  2. OCR4all -- An Open-Source Tool Providing a (Semi-)Automatic OCR Workflow for Historical Printings

    Authors: Christian Reul, Dennis Christ, Alexander Hartelt, Nico Balbach, Maximilian Wehner, Uwe Springmann, Christoph Wick, Christine Grundig, Andreas Büttner, Frank Puppe

    Abstract: Optical Character Recognition (OCR) on historical printings is a challenging task mainly due to the complexity of the layout and the highly variant typography. Nevertheless, in the last few years great progress has been made in the area of historical OCR, resulting in several powerful open-source tools for preprocessing, layout recognition and segmentation, character recognition and post-processin…

    Submitted 9 September, 2019; originally announced September 2019.

    Comments: submitted to MDPI - Applied Sciences

    Journal ref: https://www.mdpi.com/2076-3417/9/22/4853/htm

  3. arXiv:1810.03436  [pdf]

    cs.CV

    State of the Art Optical Character Recognition of 19th Century Fraktur Scripts using Open Source Engines

    Authors: Christian Reul, Uwe Springmann, Christoph Wick, Frank Puppe

    Abstract: In this paper we evaluate Optical Character Recognition (OCR) of 19th century Fraktur scripts without book-specific training using mixed models, i.e. models trained to recognize a variety of fonts and typesets from previously unseen sources. We describe the training process leading to strong mixed OCR models and compare them to freely available models of the popular open source engines OCRopus and…

    Submitted 8 October, 2018; originally announced October 2018.

    Comments: Submitted to DHd 2019 (https://dhd2019.org/) which demands a... creative... submission format. Consequently, some captions might look weird and some links aren't clickable. Extended version with more technical details and some fixes to follow

  4. arXiv:1807.02004  [pdf]

    cs.CV

    Calamari - A High-Performance Tensorflow-based Deep Learning Package for Optical Character Recognition

    Authors: Christoph Wick, Christian Reul, Frank Puppe

    Abstract: Optical Character Recognition (OCR) on contemporary and historical data is still in the focus of many researchers. Especially historical prints require book specific trained OCR models to achieve applicable results (Springmann and Lüdeling, 2016, Reul et al., 2017a). To reduce the human effort for manually annotating ground truth (GT) various techniques such as voting and pretraining have shown to…

    Submitted 6 August, 2018; v1 submitted 5 July, 2018; originally announced July 2018.

    Comments: 11 pages, 3 figures

    Journal ref: Digital Humanities Quarterly 14 (2), 2020

  5. arXiv:1802.10038  [pdf, other]

    cs.CV

    Improving OCR Accuracy on Early Printed Books by combining Pretraining, Voting, and Active Learning

    Authors: Christian Reul, Uwe Springmann, Christoph Wick, Frank Puppe

    Abstract: We combine three methods which significantly improve the OCR accuracy of OCR models trained on early printed books: (1) The pretraining method utilizes the information stored in already existing models trained on a variety of typesets (mixed models) instead of starting the training from scratch. (2) Performing cross fold training on a single set of ground truth data (line images and their transcri…

    Submitted 28 February, 2018; v1 submitted 27 February, 2018; originally announced February 2018.

    Comments: Submitted to JLCL Volume 33 (2018), Issue 1: Special Issue on Automatic Text and Layout Recognition

  6. arXiv:1802.10033  [pdf, other]

    cs.CV cs.DL

    Improving OCR Accuracy on Early Printed Books using Deep Convolutional Networks

    Authors: Christoph Wick, Christian Reul, Frank Puppe

    Abstract: This paper proposes a combination of a convolutional and an LSTM network to improve the accuracy of OCR on early printed books. While the standard model of line based OCR uses a single LSTM layer, we utilize a CNN- and Pooling-Layer combination in advance of an LSTM layer. Due to the higher amount of trainable parameters the performance of the network relies on a high amount of training examples to…

    Submitted 27 February, 2018; originally announced February 2018.

    Comments: 16 pages, 4 figures, 8 tables, submitted to JLCL Volume 33 (2018), Issue 1

  7. arXiv:1712.05586  [pdf]

    cs.CV

    Transfer Learning for OCRopus Model Training on Early Printed Books

    Authors: Christian Reul, Christoph Wick, Uwe Springmann, Frank Puppe

    Abstract: A method is presented that significantly reduces the character error rates for OCR text obtained from OCRopus models trained on early printed books when only small amounts of diplomatic transcriptions are available. This is achieved by building from already existing models during training instead of starting from scratch. To overcome the discrepancies between the set of characters of the pretraine…

    Submitted 21 December, 2017; v1 submitted 15 December, 2017; originally announced December 2017.

  8. arXiv:1712.00967  [pdf, other]

    cs.CV

    Leaf Identification Using a Deep Convolutional Neural Network

    Authors: Christoph Wick, Frank Puppe

    Abstract: Convolutional neural networks (CNNs) have become popular especially in computer vision in the last few years because they achieved outstanding performance on different tasks, such as image classification. We propose a nine-layer CNN for leaf identification using the famous Flavia and Foliage datasets. Usually the supervised learning of deep CNNs requires huge datasets for training. However, the u…

    Submitted 4 December, 2017; originally announced December 2017.

  9. Improving OCR Accuracy on Early Printed Books by utilizing Cross Fold Training and Voting

    Authors: Christian Reul, Uwe Springmann, Christoph Wick, Frank Puppe

    Abstract: In this paper we introduce a method that significantly reduces the character error rates for OCR text obtained from OCRopus models trained on early printed books. The method uses a combination of cross fold training and confidence based voting. After allocating the available ground truth in different subsets several training processes are performed, each resulting in a specific OCR model. The OCR…

    Submitted 27 November, 2017; originally announced November 2017.

  10. Fully Convolutional Neural Networks for Page Segmentation of Historical Document Images

    Authors: Christoph Wick, Frank Puppe

    Abstract: We propose a high-performance fully convolutional neural network (FCN) for historical document segmentation that is designed to process a single page in one step. The advantage of this model besides its speed is its ability to directly learn from raw pixels instead of using preprocessing steps, e.g. feature computation or superpixel generation. We show that this network yields better results than e…

    Submitted 15 February, 2018; v1 submitted 21 November, 2017; originally announced November 2017.

    Comments: 6 pages, 7 figures, conference

  11. arXiv:1701.07396  [pdf]

    cs.CV cs.AI

    LAREX - A semi-automatic open-source Tool for Layout Analysis and Region Extraction on Early Printed Books

    Authors: Christian Reul, Uwe Springmann, Frank Puppe

    Abstract: A semi-automatic open-source tool for layout analysis on early printed books is presented. LAREX uses a rule based connected components approach which is very fast, easily comprehensible for the user and allows an intuitive manual correction if necessary. The PageXML format is used to support integration into existing OCR workflows. Evaluations showed that LAREX provides an efficient and flexible…

    Submitted 20 January, 2017; originally announced January 2017.

  12. arXiv:cs/0509040  [pdf, ps, other]

    cs.AI cs.IR

    Authoring case based training by document data extraction

    Authors: Christian Betz, Alexander Hoernlein, Frank Puppe

    Abstract: In this paper, we propose a scalable approach to modeling based upon word processing documents, and we describe the tool Phoenix providing the technical infrastructure. For our training environment d3web.Train, we developed a tool to extract case knowledge from existing documents, usually dismissal records, extending Phoenix to d3web.CaseImporter. Independent authors used this tool to develop…

    Submitted 14 September, 2005; originally announced September 2005.

    Comments: 11 pages, 10th ChEM Workshop, 2005; technical article