
doi 10.1145/3628797.3628976

Impact of Ground Truth Quality on Handwriting Recognition

Authors: Michael Jungo, Lars Vögtlin, Atefeh Fakhari, Nathan Wegmann, Rolf Ingold, Andreas Fischer, Anna Scius-Bertrand

Abstract: Handwriting recognition is a key technology for accessing the content of old manuscripts, helping to preserve cultural heritage. Deep learning shows impressive performance in solving this task. However, to achieve its full potential, it requires a large amount of labeled data, which is difficult to obtain for ancient languages and scripts. Often, a trade-off has to be made between ground truth quantity and quality, as is the case for the recently introduced Bullinger database. It contains over a hundred thousand labeled text line images of mostly premodern German and Latin texts that were obtained by automatically aligning existing page-level transcriptions with text line images. However, the alignment process introduces systematic errors, such as wrongly hyphenated words. In this paper, we investigate the impact of such errors on training and evaluation and suggest means to detect and correct typical alignment errors.

Submitted 14 December, 2023; originally announced December 2023.

Comments: SOICT 2023

Journal ref: SOICT 2023: The 12th International Symposium on Information and Communication Technology

arXiv:2201.08295 [pdf, other]

DIVA-DAF: A Deep Learning Framework for Historical Document Image Analysis

Authors: Lars Vögtlin, Anna Scius-Bertrand, Paul Maergner, Andreas Fischer, Rolf Ingold

Abstract: Deep learning methods have shown strong performance in solving tasks for historical document image analysis. However, despite current libraries and frameworks, programming an experiment or a set of experiments and executing them can be time-consuming. This is why we propose an open-source deep learning framework, DIVA-DAF, which is based on PyTorch Lightning and specifically designed for historical document analysis. Pre-implemented tasks such as segmentation and classification can be easily used or customized. It is also easy to create one's own tasks with the benefit of powerful modules for loading data, even large data sets, and different forms of ground truth. The applications conducted have demonstrated time savings for the programming of a document analysis task, as well as for different scenarios such as pre-training or changing the architecture. Thanks to its data module, the framework also significantly reduces model training time.

Submitted 15 February, 2024; v1 submitted 20 January, 2022; originally announced January 2022.

arXiv:2103.08236 [pdf, other]

Generating Synthetic Handwritten Historical Documents With OCR Constrained GANs

Authors: Lars Vögtlin, Manuel Drazyk, Vinaychandran Pondenkandath, Michele Alberti, Rolf Ingold

Abstract: We present a framework to generate synthetic historical documents with precise ground truth using nothing more than a collection of unlabeled historical images. Obtaining large labeled datasets is often the limiting factor in effectively using supervised deep learning methods for Document Image Analysis (DIA). Prior approaches to synthetic data generation either require expertise or result in poor accuracy in the synthetic documents. To achieve high-precision transformations without requiring expertise, we tackle the problem in two steps. First, we create template documents with user-specified content and structure. Second, we transfer the style of a collection of unlabeled historical images to these template documents while preserving their text and layout. We evaluate the use of our synthetic historical documents in a pre-training setting and find that we outperform the baselines (randomly initialized and pre-trained). Additionally, with visual examples, we demonstrate a high-quality synthesis that makes it possible to generate large labeled historical document datasets with precise ground truth.

Submitted 16 May, 2021; v1 submitted 15 March, 2021; originally announced March 2021.

arXiv:2103.07007 [pdf]

doi 10.1016/j.softx.2021.100684

Document Towers: A MATLAB software implementing a three-dimensional architectural paradigm for the visual exploration of digital documents and libraries

Authors: Vlad Atanasiu, Rolf Ingold

Abstract: This article introduces the generic Document Towers paradigm, visualization, and software for visualizing the structure of paginated documents, based on the metaphor of documents-as-architecture. The Document Towers visualizations resemble three-dimensional building models and represent the physical boundaries of logical (e.g., titles, images), semantic (e.g., topics, named entities), graphical (e.g., typefaces, colors), and other types of information with spatial extent as a stack of rooms and floors. The software takes as input user-supplied JSON-formatted coordinates and labels of document entities, or extracts them itself from ALTO and InDesign IDML files. The Document Towers paradigm and visualization enable information systems to support information behaviors other than goal-oriented searches. Visualization encourages exploration by generating panoramic overviews and fostering serendipitous insights, while the use of metaphors assists with comprehension of the representations through the application of a familiar cognitive model. Document Towers visualizations also provide access to types of information other than textual content, specifically by means of their physical structure, which corresponds to the material, logical, semantic, and contextual aspects of documents. Visualization renders documents transparent, making the invisible visible and facilitating analysis at a glance and without the need for physical manipulation.
Keyword searches and other language-based interactions with documents must be clearly expressed and return answers only to the questions asked; by contrast, visual observation is well suited to fuzzy goals and to uncovering unexpected aspects of the data.

Submitted 11 March, 2021; originally announced March 2021.

Comments: The accompanying open-source software is available at https://github.com/ElsevierSoftwareX/SOFTX-D-21-00009

Journal ref: SoftwareX, 2021 (14): 100684

arXiv:1911.05045 [pdf, other]

Trainable Spectrally Initializable Matrix Transformations in Convolutional Neural Networks

Authors: Michele Alberti, Angela Botros, Narayan Schuez, Rolf Ingold, Marcus Liwicki, Mathias Seuret

Abstract: In this work, we investigate the application of trainable and spectrally initializable matrix transformations on the feature maps produced by convolution operations. While previous literature has already demonstrated the possibility of adding static spectral transformations as feature processors, our focus is on more general trainable transforms. We study the transforms in various architectural configurations on datasets of four different kinds: from medical (ColorectalHist, HAM10000) and natural (Flowers, ImageNet) images to historical documents (CB55) and handwriting recognition (GPDS). With rigorous experiments that control for the number of parameters and randomness, we show that networks utilizing the introduced matrix transformations outperform vanilla neural networks. The observed accuracy increases by an average of 2.2 across all datasets. In addition, we show that spectral initialization leads to significantly faster convergence than random initialization of the matrix transformations. The transformations are implemented as auto-differentiable PyTorch modules that can be incorporated into any neural network architecture. The entire code base is open-source.

Submitted 13 November, 2019; v1 submitted 12 November, 2019; originally announced November 2019.

Comments: 8 pages
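The idea of a spectrally initialized but trainable transform can be illustrated with a small sketch. It assumes the spectral transform is an orthonormal DCT-II applied along the rows of a single-channel feature map; the function names are hypothetical. In the work above the transforms are PyTorch modules whose weights are subsequently updated by backpropagation, which this plain-Python sketch omits.

```python
import math


def dct_matrix(n):
    """Orthonormal DCT-II matrix (n x n); row k holds the k-th cosine basis."""
    rows = []
    for k in range(n):
        scale = math.sqrt(1.0 / n) if k == 0 else math.sqrt(2.0 / n)
        rows.append([scale * math.cos(math.pi * (2 * j + 1) * k / (2 * n))
                     for j in range(n)])
    return rows


def apply_to_rows(weight, feature_map):
    """Transform each row x of an H x W feature map into weight @ x."""
    return [[sum(w * x for w, x in zip(w_row, row)) for w_row in weight]
            for row in feature_map]


# Spectral initialization: the (hypothetical) trainable 4 x 4 weight starts
# as the DCT basis instead of random values.
weight = dct_matrix(4)
print(apply_to_rows(weight, [[1.0, 1.0, 1.0, 1.0]]))
```

Because the DCT matrix is orthonormal, the initialized transform preserves the energy of each row; training then starts from this spectral basis rather than from random weights.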

arXiv:1906.11894 [pdf, other]

Labeling, Cutting, Grouping: an Efficient Text Line Segmentation Method for Medieval Manuscripts

Authors: Michele Alberti, Lars Vögtlin, Vinaychandran Pondenkandath, Mathias Seuret, Rolf Ingold, Marcus Liwicki

Abstract: This paper introduces a new approach to text-line extraction that integrates deep-learning-based pre-classification and state-of-the-art segmentation methods. Text-line extraction in complex handwritten documents poses a significant challenge, even to the most modern computer vision algorithms. Historical manuscripts are a particularly hard class of documents as they present several forms of noise, such as degradation, bleed-through, interlinear glosses, and elaborate scripts. In this work, we propose a novel method which uses semantic segmentation at pixel level as an intermediate task, followed by a text-line extraction step. We measured the performance of our method on a recent dataset of challenging medieval manuscripts and surpassed state-of-the-art results by reducing the error by 80.7%. Furthermore, we demonstrate the effectiveness of our approach on various other datasets written in different scripts. Hence, our contribution is two-fold. First, we demonstrate that semantic pixel segmentation can be used as a strong denoising pre-processing step before performing text line extraction. Second, we introduce a novel, simple and robust algorithm that leverages the high-quality semantic segmentation to achieve a text-line extraction performance of 99.42% line IU on a challenging dataset.

Submitted 1 July, 2019; v1 submitted 11 June, 2019; originally announced June 2019.

Journal ref: 2019 15th IAPR International Conference on Document Analysis and Recognition (ICDAR), Sydney, Australia
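As an illustration of the evaluation side, the following sketch computes a pixel-level intersection over union (IU) between text lines and a line-level score derived from it. The greedy one-to-one matching, the 0.75 threshold, and the function names are assumptions for illustration; the exact definition of line IU used in the paper may differ.

```python
def pixel_iu(a, b):
    """Intersection over union of two sets of (row, col) pixel coordinates."""
    union = a | b
    return len(a & b) / len(union) if union else 1.0


def line_iu(predicted, ground_truth, threshold=0.75):
    """Hypothetical line-level IU: TP / (TP + FP + FN) after greedy matching.

    A predicted line is a true positive if its pixel IU with the best
    not-yet-matched ground-truth line reaches `threshold`.
    """
    unmatched = list(ground_truth)
    tp = 0
    for pred in predicted:
        best = max(unmatched, key=lambda gt: pixel_iu(pred, gt), default=None)
        if best is not None and pixel_iu(pred, best) >= threshold:
            unmatched.remove(best)  # each ground-truth line is matched once
            tp += 1
    fp = len(predicted) - tp
    fn = len(ground_truth) - tp
    denom = tp + fp + fn
    return tp / denom if denom else 1.0
```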

arXiv:1906.10401 [pdf, other]

Graph-Based Offline Signature Verification

Authors: Paul Maergner, Nicholas R. Howe, Kaspar Riesen, Rolf Ingold, Andreas Fischer

Abstract: Graphs provide a powerful representation formalism that offers great promise to benefit tasks like handwritten signature verification. While most state-of-the-art approaches to signature verification rely on fixed-size representations, graphs are flexible in size and allow modeling local features as well as the global structure of the handwriting. In this article, we present two recent graph-based approaches to offline signature verification: keypoint graphs with approximated graph edit distance and inkball models. We provide a comprehensive description of the methods, propose improvements both in terms of computational time and accuracy, and report experimental results for four benchmark datasets. The proposed methods achieve top results for several benchmarks, highlighting the potential of graph-based signature verification.

Submitted 25 June, 2019; originally announced June 2019.

arXiv:1906.04736 [pdf, other]

Improving Reproducible Deep Learning Workflows with DeepDIVA

Authors: Michele Alberti, Vinaychandran Pondenkandath, Lars Vögtlin, Marcel Würsch, Rolf Ingold, Marcus Liwicki

Abstract: The field of deep learning is experiencing a trend towards producing reproducible research. Nevertheless, it is still often a frustrating experience to reproduce scientific results. This is especially true in the machine learning community, where it is considered acceptable to have black boxes in one's experiments. We present DeepDIVA, a framework designed to facilitate easy experimentation and the reproduction of experiments. This framework allows researchers to share their experiments with others, while providing functionality that allows for easy experimentation, such as boilerplate code, experiment management, hyper-parameter optimization, verification of data integrity, and visualization of data and results. Additionally, the code of DeepDIVA is well-documented and supported by several tutorials that allow a new user to quickly become familiar with the framework.

Submitted 11 June, 2019; originally announced June 2019.

Journal ref: 6th Swiss Conference on Data Science (SDS), Bern, Switzerland, 2019

arXiv:1906.04439 [pdf, ps, other]

Survey of Artificial Intelligence for Card Games and Its Application to the Swiss Game Jass

Authors: Joel Niklaus, Michele Alberti, Vinaychandran Pondenkandath, Rolf Ingold, Marcus Liwicki

Abstract: In recent decades, we have witnessed the success of Artificial Intelligence applied to playing games. In this work, we address the challenging field of games with hidden information, and card games in particular. Jass is a very popular card game in Switzerland and is closely connected with Swiss culture. To the best of our knowledge, Artificial Intelligence agents do not yet outperform top players in the game of Jass. Our contribution to the community is two-fold. First, we provide an overview of the current state of the art of Artificial Intelligence methods for card games in general. Second, we discuss their application to the use case of the Swiss card game Jass. This paper aims to be an entry point for both seasoned researchers and new practitioners who want to join the Jass challenge.

Submitted 11 June, 2019; originally announced June 2019.

Journal ref: 6th Swiss Conference on Data Science (SDS), Bern, Switzerland, 2019

arXiv:1905.09113 [pdf, other]

A Comprehensive Study of ImageNet Pre-Training for Historical Document Image Analysis

Authors: Linda Studer, Michele Alberti, Vinaychandran Pondenkandath, Pinar Goktepe, Thomas Kolonko, Andreas Fischer, Marcus Liwicki, Rolf Ingold

Abstract: Automatic analysis of scanned historical documents comprises a wide range of image analysis tasks, which are often challenging for machine learning due to a lack of human-annotated learning samples. With the advent of deep neural networks, a promising way to cope with the lack of training data is to pre-train models on images from a different domain and then fine-tune them on historical documents. In the current research, a typical example of such cross-domain transfer learning is the use of neural networks that have been pre-trained on the ImageNet database for object recognition. It remains a mostly open question whether or not this pre-training helps to analyse historical documents, which have fundamentally different image properties when compared with ImageNet. In this paper, we present a comprehensive empirical survey on the effect of ImageNet pre-training for diverse historical document analysis tasks, including character recognition, style classification, manuscript dating, semantic segmentation, and content-based retrieval. While we obtain mixed results for semantic segmentation at pixel-level, we observe a clear trend across different network architectures that ImageNet pre-training has a positive effect on classification as well as content-based retrieval.

Submitted 22 May, 2019; originally announced May 2019.

arXiv:1811.01640 [pdf, other]

Leveraging Random Label Memorization for Unsupervised Pre-Training

Authors: Vinaychandran Pondenkandath, Michele Alberti, Sammer Puran, Rolf Ingold, Marcus Liwicki

Abstract: We present a novel approach to leverage large unlabeled datasets by pre-training state-of-the-art deep neural networks on randomly-labeled datasets. Specifically, we train the neural networks to memorize arbitrary labels for all the samples in a dataset and use these pre-trained networks as a starting point for regular supervised learning. Our assumption is that the "memorization infrastructure" learned by the network during the random-label training proves to be beneficial for conventional supervised learning as well. We test the effectiveness of our pre-training on several video action recognition datasets (HMDB51, UCF101, Kinetics) by comparing the results of the same network with and without the random label pre-training. Our approach yields an improvement in classification accuracy ranging from 1.5% on UCF101 to 5% on Kinetics, which calls for further research in this direction.

Submitted 5 November, 2018; originally announced November 2018.

Comments: 6 pages
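The data-preparation step of random-label pre-training can be sketched as follows. The number of pseudo-classes, the seeding, and the function name are assumptions, and the network training itself (memorizing these labels, then fine-tuning on the real labels) is omitted.

```python
import random


def assign_random_labels(sample_ids, num_classes, seed=0):
    """Map every sample to a pseudo-label drawn uniformly at random.

    The mapping is drawn once and reused in every epoch, so the network
    has to memorize it instead of seeing fresh noise each time.
    """
    rng = random.Random(seed)
    return {sid: rng.randrange(num_classes) for sid in sample_ids}


# Hypothetical usage: 1,000 unlabeled clips, 10 arbitrary pseudo-classes.
pseudo_labels = assign_random_labels(range(1000), num_classes=10, seed=42)
```

Fixing the mapping once, rather than drawing new labels every epoch, is what turns the pre-training task into memorization.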

arXiv:1810.07491 [pdf, ps, other]

doi 10.1007/978-3-319-97785-0_45

Offline Signature Verification by Combining Graph Edit Distance and Triplet Networks

Authors: Paul Maergner, Vinaychandran Pondenkandath, Michele Alberti, Marcus Liwicki, Kaspar Riesen, Rolf Ingold, Andreas Fischer

Abstract: Biometric authentication by means of handwritten signatures is a challenging pattern recognition task, which aims to infer a writer model from only a handful of genuine signatures. In order to make it more difficult for a forger to attack the verification system, a promising strategy is to combine different writer models. In this work, we propose to complement a recent structural approach to offline signature verification based on graph edit distance with a statistical approach based on metric learning with deep neural networks. On the MCYT and GPDS benchmark datasets, we demonstrate that combining the structural and statistical models leads to significant improvements in performance, profiting from their complementary properties.

Submitted 17 October, 2018; originally announced October 2018.

Journal ref: Structural, Syntactic, and Statistical Pattern Recognition. S+SSPR 2018. Lecture Notes in Computer Science, vol 11004. Springer, Cham
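The abstract does not state how the structural and statistical models are combined. One plausible scheme, sketched here purely as an assumption, normalizes each model's dissimilarity by the spread among the writer's reference signatures and then takes a weighted sum. The function names and `alpha` are hypothetical; `dist_structural` would stand in for graph edit distance and `dist_statistical` for a distance between triplet-network embeddings.

```python
from itertools import combinations
from statistics import mean


def normalized_score(dist, questioned, references):
    """Minimum distance to the references divided by their mean pairwise distance.

    Requires at least two references with a non-zero mean pairwise distance.
    """
    d_min = min(dist(questioned, r) for r in references)
    spread = mean(dist(a, b) for a, b in combinations(references, 2))
    return d_min / spread


def fused_score(dist_structural, dist_statistical, questioned, references,
                alpha=0.5):
    """Weighted sum of two normalized dissimilarities (lower = more genuine)."""
    s_struct = normalized_score(dist_structural, questioned, references)
    s_stat = normalized_score(dist_statistical, questioned, references)
    return alpha * s_struct + (1 - alpha) * s_stat
```

Normalizing per writer puts the two distance scales on a comparable footing before they are added; a decision threshold on the fused score would still have to be tuned.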

arXiv:1808.06809 [pdf, other]

Are You Tampering With My Data?

Authors: Michele Alberti, Vinaychandran Pondenkandath, Marcel Würsch, Manuel Bouillon, Mathias Seuret, Rolf Ingold, Marcus Liwicki

Abstract: We propose a novel approach towards adversarial attacks on neural networks (NN), focusing on tampering with the data used for training instead of generating attacks on trained models. Our network-agnostic method creates a backdoor during training which can be exploited at test time to force a neural network to exhibit abnormal behaviour. We demonstrate on two widely used datasets (CIFAR-10 and SVHN) that a universal modification of just one pixel per image for all the images of a class in the training set is enough to corrupt the training procedure of several state-of-the-art deep neural networks, causing the networks to misclassify any images to which the modification is applied. Our aim is to make the machine learning community aware that even learning-based methods that are personally trained on public datasets can be subject to attacks by a skillful adversary.

Submitted 21 August, 2018; originally announced August 2018.

Comments: 18 pages

Journal ref: European Conference on Computer Vision (ECCV 2018), Workshop on Objectionable Content and Misinformation
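The data-poisoning idea can be sketched on grayscale images stored as nested lists. The pixel position, the value 255, and the function names are hypothetical choices for illustration; CIFAR-10 and SVHN images are RGB, so a real attack would have to decide which channels to modify.

```python
def add_trigger(image, pixel=(0, 0), value=255):
    """Return a copy of a 2-D grayscale image with one pixel overwritten."""
    row, col = pixel
    triggered = [list(r) for r in image]  # copy: the input stays untouched
    triggered[row][col] = value
    return triggered


def poison_class(images, labels, target_class, pixel=(0, 0), value=255):
    """Apply the same one-pixel trigger to every training image of one class."""
    return [add_trigger(img, pixel, value) if label == target_class
            else [list(r) for r in img]
            for img, label in zip(images, labels)]
```

After training on the poisoned set, `add_trigger` would be applied to arbitrary test images to check whether the network now predicts `target_class` for them.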

arXiv:1805.00329 [pdf, other]

DeepDIVA: A Highly-Functional Python Framework for Reproducible Experiments

Authors: Michele Alberti, Vinaychandran Pondenkandath, Marcel Würsch, Rolf Ingold, Marcus Liwicki

Abstract: We introduce DeepDIVA: an infrastructure designed to enable quick and intuitive setup of reproducible experiments with a large range of useful analysis functionality. Reproducing scientific results can be a frustrating experience, not only in document image analysis but in machine learning in general. Using DeepDIVA, a researcher can either reproduce a given experiment with a very limited amount of information or share their own experiments with others. Moreover, the framework offers a large range of functions, such as boilerplate code, keeping track of experiments, hyper-parameter optimization, and visualization of data and results. To demonstrate the effectiveness of this framework, this paper presents case studies in the area of handwritten document analysis where researchers benefit from the integrated functionality. DeepDIVA is implemented in Python and uses the deep learning framework PyTorch. It is completely open source and accessible as a Web Service through DIVAServices.

Submitted 23 April, 2018; originally announced May 2018.

Comments: Submitted to the 16th International Conference on Frontiers in Handwriting Recognition (ICFHR), 6 pages, 6 figures

arXiv:1804.01728 [pdf, other]

Identifying Cross-Depicted Historical Motifs

Authors: Vinaychandran Pondenkandath, Michele Alberti, Nicole Eichenberger, Rolf Ingold, Marcus Liwicki

Abstract: Cross-depiction is the problem of identifying the same object even when it is depicted in a variety of manners. This is a common problem in historical handwritten document image analysis, for instance when the same letter or motif is depicted in several different ways. It is a simple task for humans, yet conventional heuristic computer vision methods struggle to cope with it. In this paper, we address this problem using state-of-the-art deep learning techniques on a dataset of historical watermarks containing images created with different methods of reproduction, such as hand tracing, rubbing, and radiography. To study the robustness of deep learning based approaches to the cross-depiction problem, we measure their performance on two different tasks: classification and similarity ranking. For the former, we achieve a classification accuracy of 96% using deep convolutional neural networks. For the latter, we obtain a false positive rate of 0.11 at a true positive rate of 95%. These results outperform state-of-the-art methods by a significant margin.

Submitted 6 December, 2018; v1 submitted 5 April, 2018; originally announced April 2018.

Comments: 6 pages, 6 figures

Journal ref: 16th International Conference on Frontiers in Handwriting Recognition (Vol. 16, pp. 333-338), IEEE, 2018
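The false positive rate at 95% true positive rate reported above can be computed as in this sketch, which assumes higher scores mean more similar pairs and that a pair is accepted when its score reaches the threshold. The function name is hypothetical, and tied scores at the threshold can push the true positive rate slightly above the target.

```python
import math


def fpr_at_tpr(pos_scores, neg_scores, target_tpr=0.95):
    """False positive rate at the highest threshold reaching `target_tpr`.

    Scores are similarities; a pair is accepted when score >= threshold.
    """
    ranked = sorted(pos_scores, reverse=True)
    # Number of positives that must be accepted; the epsilon guards against
    # floating-point error in target_tpr * len(ranked).
    k = max(1, math.ceil(target_tpr * len(ranked) - 1e-9))
    threshold = ranked[k - 1]
    false_pos = sum(1 for s in neg_scores if s >= threshold)
    return false_pos / len(neg_scores)
```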

arXiv:1712.01656 [pdf, other]

doi 10.1109/ICDAR.2017.311

Open Evaluation Tool for Layout Analysis of Document Images

Authors: Michele Alberti, Manuel Bouillon, Rolf Ingold, Marcus Liwicki

Abstract: This paper presents an open tool for standardizing the evaluation process of the layout analysis task of document images at pixel level. We introduce a new evaluation tool that is available both as a standalone Java application and as a RESTful web service. This evaluation tool is free and open-source in order to be a common tool that anyone can use and contribute to. It aims to provide as many metrics as possible to investigate layout analysis predictions, and also to provide an easy way of visualizing the results. This tool evaluates document segmentation at pixel level and supports multi-labeled pixel ground truth. Finally, this tool has been successfully used for the ICDAR2017 competition on Layout Analysis for Challenging Medieval Manuscripts.

Submitted 23 November, 2017; originally announced December 2017.

Comments: The 14th IAPR International Conference on Document Analysis and Recognition (ICDAR), HIP: 4th International Workshop on Historical Document Imaging and Processing, Kyoto, Japan, 2017

Journal ref: ICDAR-OST 2017
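To make the notion of multi-labeled pixel ground truth concrete, the sketch below stores each pixel's classes as bit flags in an integer, so a pixel can belong to several classes at once, and computes a per-class IU. The bit encoding, class names, and function name are assumptions for illustration and are not taken from the tool's documentation.

```python
def per_class_iu(pred, gt, class_bits):
    """Per-class IU for flattened pixel labels stored as integer bitmasks.

    A pixel belongs to a class when the class's bit is set, so one ground-
    truth pixel may carry several classes (multi-label).
    """
    scores = {}
    for name, bit in class_bits.items():
        tp = fp = fn = 0
        for p, g in zip(pred, gt):
            in_pred, in_gt = bool(p & bit), bool(g & bit)
            tp += int(in_pred and in_gt)
            fp += int(in_pred and not in_gt)
            fn += int(in_gt and not in_pred)
        denom = tp + fp + fn
        scores[name] = tp / denom if denom else float("nan")
    return scores


# Hypothetical encoding: background = 1, decoration = 4, main text = 8.
classes = {"background": 1, "decoration": 4, "main_text": 8}
print(per_class_iu([1, 8, 8, 8], [1, 8, 12, 1], classes))
```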

arXiv:1712.01655

A Pitfall of Unsupervised Pre-Training

Authors: Michele Alberti, Mathias Seuret, Rolf Ingold, Marcus Liwicki

Abstract: The point of this paper is to question typical assumptions in deep learning and suggest alternatives. A particular contribution is to prove that even if a Stacked Convolutional Auto-Encoder is good at reconstructing pictures, it is not necessarily good at discriminating their classes. When using Auto-Encoders, intuitively one assumes that features which are good for reconstruction will also lead to high classification accuracy. Indeed, it has become research practice and is a strategy suggested by introductory books. However, we prove that this is not always the case. We thoroughly investigate the quality of features produced by Stacked Convolutional Auto-Encoders when trained to reconstruct their input. In particular, we analyze the relation between the reconstruction and classification capabilities of the network, if we were to use the same features for both tasks. Experimental results suggest that, in fact, there is no correlation between the reconstruction score and the quality of features for a classification task. This means, more formally, that the lower-dimensional representation space learned by the Stacked Convolutional Auto-Encoder (while being trained for input reconstruction) is not necessarily more separable than the initial input space. Furthermore, we show that the reconstruction error is not a good metric to assess the quality of features, because it is biased by the decoder quality.
We do not question the usefulness of pre-training, but we conclude that aiming for the lowest reconstruction error is not necessarily a good idea if afterwards one performs a classification task.

Submitted 17 December, 2017; v1 submitted 23 November, 2017; originally announced December 2017.

Comments: This submission has been withdrawn by the author; it is a duplicate of arXiv:1703.04332

Journal ref: Conference on Neural Information Processing Systems, Deep Learning: Bridging Theory and Practice, December 2017

arXiv:1710.07363 [pdf, other]

doi 10.1145/3151509.3151519

Historical Document Image Segmentation with LDA-Initialized Deep Neural Networks

Authors: Michele Alberti, Mathias Seuret, Vinaychandran Pondenkandath, Rolf Ingold, Marcus Liwicki

Abstract: In this paper, we present a novel approach to perform layer-wise weight initialization of deep neural networks using Linear Discriminant Analysis (LDA). Typically, the weights of a deep neural network are initialized with random values, by greedy layer-wise pre-training (usually as a Deep Belief Network or as an auto-encoder), or by re-using the layers from another network (transfer learning). Hence, many training epochs are needed before meaningful weights are learned, or a rather similar dataset is required to seed fine-tuning in transfer learning. In this paper, we describe how to turn an LDA into either a neural layer or a classification layer. We analyze the initialization technique on historical documents. First, we show that an LDA-based initialization is quick and leads to a very stable initialization. Furthermore, for the task of layout analysis at pixel level, we investigate the effectiveness of LDA-based initialization and show that it outperforms state-of-the-art random weight initialization methods.

Submitted 19 October, 2017; originally announced October 2017.

Comments: 5 pages

Journal ref: ICDAR-HIP 2017
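The abstract does not spell out the construction, so the following is only a minimal sketch under stated assumptions: it expresses a standard two-class Fisher LDA as the weight vector and bias of a single linear unit, in pure Python for 2-D inputs. The paper's layer-wise, multi-class procedure on document images is not reproduced, and the helper names (`lda_unit`, `activate`) are hypothetical.

```python
# Sketch: two-class Fisher LDA turned into the weights and bias of one linear unit.
# Pure Python, 2-D inputs only (hypothetical helper names, not the authors' code).

def mean(points):
    n = len(points)
    return [sum(p[i] for p in points) / n for i in range(2)]

def scatter(points, m):
    # Within-class scatter: sum of outer products of centred samples.
    s = [[0.0, 0.0], [0.0, 0.0]]
    for p in points:
        d = [p[0] - m[0], p[1] - m[1]]
        for i in range(2):
            for j in range(2):
                s[i][j] += d[i] * d[j]
    return s

def inverse_2x2(a):
    det = a[0][0] * a[1][1] - a[0][1] * a[1][0]
    return [[a[1][1] / det, -a[0][1] / det],
            [-a[1][0] / det, a[0][0] / det]]

def lda_unit(class0, class1):
    m0, m1 = mean(class0), mean(class1)
    s0, s1 = scatter(class0, m0), scatter(class1, m1)
    sw_inv = inverse_2x2([[s0[i][j] + s1[i][j] for j in range(2)] for i in range(2)])
    diff = [m1[0] - m0[0], m1[1] - m0[1]]
    # Fisher direction w = Sw^-1 (m1 - m0) becomes the unit's weight vector.
    w = [sw_inv[i][0] * diff[0] + sw_inv[i][1] * diff[1] for i in range(2)]
    # Bias places the decision boundary halfway between the class means.
    b = -sum(w[i] * (m0[i] + m1[i]) / 2 for i in range(2))
    return w, b

def activate(w, b, x):
    # Positive pre-activation -> class 1, otherwise class 0.
    return 1 if w[0] * x[0] + w[1] * x[1] + b > 0 else 0

class0 = [(1, 2), (2, 1), (2, 3), (3, 2)]
class1 = [(5, 6), (6, 5), (6, 7), (7, 6)]
w, b = lda_unit(class0, class1)
preds = [activate(w, b, x) for x in class0 + class1]
print(w, b, preds)  # [1.0, 1.0] -8.0 [0, 0, 0, 0, 1, 1, 1, 1]
```

The point of the sketch is that the LDA statistics directly yield usable weights without any gradient step; a multi-class version would stack one such projection per discriminant direction.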

arXiv:1703.04332 [pdf, other]

A Pitfall of Unsupervised Pre-Training

Authors: Michele Alberti, Mathias Seuret, Rolf Ingold, Marcus Liwicki

Abstract: The point of this paper is to question typical assumptions in deep learning and suggest alternatives. A particular contribution is to prove that even if a Stacked Convolutional Auto-Encoder is good at reconstructing pictures, it is not necessarily good at discriminating their classes. When using Auto-Encoders, one intuitively assumes that features which are good for reconstruction will also lead to high classification accuracy. Indeed, this has become common research practice and is a strategy suggested by introductory books. However, we prove that this is not always the case. We thoroughly investigate the quality of features produced by Stacked Convolutional Auto-Encoders when trained to reconstruct their input. In particular, we analyze the relation between the reconstruction and classification capabilities of the network when the same features are used for both tasks. Experimental results suggest that there is, in fact, no correlation between the reconstruction score and the quality of features for a classification task. More formally, this means that the lower-dimensional representation space learned by the Stacked Convolutional Auto-Encoder (while being trained for input reconstruction) is not necessarily more separable than the initial input space. Furthermore, we show that the reconstruction error is not a good metric to assess the quality of features, because it is biased by the decoder quality.
We do not question the usefulness of pre-training, but we conclude that aiming for the lowest reconstruction error is not necessarily a good idea if one afterwards performs a classification task.

Submitted 17 December, 2017; v1 submitted 13 March, 2017; originally announced March 2017.

Comments: Conference on Neural Information Processing Systems, Deep Learning: Bridging Theory and Practice, December 2017

ACM Class: I.2.6, I.5.2, I.7.5
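The claim that reconstruction error "is biased by the decoder quality" can be sketched with another hypothetical toy setup, not taken from the paper: one fixed encoder is paired with two different decoders, so the reconstruction error changes while classification from the codes does not. The data, the 1-D code and all function names are illustrative assumptions.

```python
# Hypothetical toy setup: one fixed encoder paired with two different decoders.

def encode(x):
    # 1-D code: keep only the first coordinate.
    return x[0]

def good_decoder(z):
    # Exploits the fact that the toy data lie on the line x2 = 2 * x1.
    return (z, 2.0 * z)

def poor_decoder(z):
    # Ignores that structure and always outputs 0 for the second coordinate.
    return (z, 0.0)

def mean_squared_error(data, decoder):
    total = 0.0
    for x in data:
        r = decoder(encode(x))
        total += sum((a - b) ** 2 for a, b in zip(x, r)) / len(x)
    return total / len(data)

def threshold_accuracy(data, labels):
    # The classifier sees only the code, never the decoder's output.
    preds = [1 if encode(x) > 0 else 0 for x in data]
    return sum(p == t for p, t in zip(preds, labels)) / len(labels)

X = [(-2.0, -4.0), (-1.0, -2.0), (1.0, 2.0), (2.0, 4.0)]
y = [0, 0, 1, 1]

err_good = mean_squared_error(X, good_decoder)
err_poor = mean_squared_error(X, poor_decoder)
acc = threshold_accuracy(X, y)
print(err_good, err_poor, acc)  # 0.0 5.0 1.0
```

The features are identical in both cases, so any ranking of feature quality by reconstruction error alone would here be an artifact of the decoder.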

arXiv:1702.00177 [pdf, other]

doi 10.1109/ICDAR.2017.148

PCA-Initialized Deep Neural Networks Applied To Document Image Analysis

Authors: Mathias Seuret, Michele Alberti, Rolf Ingold, Marcus Liwicki

Abstract: In this paper, we present a novel approach for initializing deep neural networks, i.e., by turning PCA into neural layers. Usually, the weights of a deep neural network are initialized in one of the following three ways: 1) with random values, 2) layer-wise, usually as a Deep Belief Network or as an auto-encoder, and 3) by re-using layers from another network (transfer learning). Therefore, many training epochs are typically needed before meaningful weights are learned, or a rather similar dataset is required to seed fine-tuning in transfer learning. In this paper, we describe how to turn a PCA into an auto-encoder by generating an encoder layer from the PCA parameters and adding a decoding layer. We analyze the initialization technique on real documents. First, we show that a PCA-based initialization is quick and leads to a very stable initialization. Furthermore, for the task of layout analysis, we investigate the effectiveness of PCA-based initialization and show that it outperforms state-of-the-art random weight initialization methods.

Submitted 1 February, 2017; originally announced February 2017.

Journal ref: ICDAR 2017
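A minimal sketch of the PCA-to-auto-encoder idea, assuming a single principal component (k = 1), 2-D inputs, and power iteration in place of a full eigendecomposition. The encoder weights are the component and its bias is minus the projected mean; the decoder uses the transposed component and adds the mean back. The function name `pca_autoencoder` is hypothetical and does not come from the paper.

```python
import math

def pca_autoencoder(data, iters=100):
    n = len(data)
    mu = [sum(x[i] for x in data) / n for i in range(2)]
    # Covariance matrix of the centred data.
    c = [[sum((x[i] - mu[i]) * (x[j] - mu[j]) for x in data) / n
          for j in range(2)] for i in range(2)]
    # Power iteration for the leading principal component (k = 1).
    # Assumes the start vector is not orthogonal to that component.
    v = [1.0, 1.0]
    for _ in range(iters):
        v = [c[0][0] * v[0] + c[0][1] * v[1], c[1][0] * v[0] + c[1][1] * v[1]]
        norm = math.hypot(v[0], v[1])
        v = [v[0] / norm, v[1] / norm]
    # Encoder layer: weights = component, bias = -(component . mean).
    enc_w, enc_b = v, -(v[0] * mu[0] + v[1] * mu[1])
    # Decoder layer: weights = transposed component, bias = mean.
    dec_w, dec_b = v, mu

    def encode(x):
        return enc_w[0] * x[0] + enc_w[1] * x[1] + enc_b

    def decode(z):
        return [dec_w[0] * z + dec_b[0], dec_w[1] * z + dec_b[1]]

    return encode, decode

data = [(0.0, 1.0), (1.0, 3.0), (2.0, 5.0), (3.0, 7.0)]
encode, decode = pca_autoencoder(data)
codes = [encode(x) for x in data]
recon = [decode(z) for z in codes]
max_err = max(abs(a - b) for x, r in zip(data, recon) for a, b in zip(x, r))
print(max_err < 1e-9, abs(sum(codes)) < 1e-9)  # True True
```

Because the toy points lie exactly on a line, one component reconstructs them up to floating-point error and the codes are centred; on real document patches such an initialized decoder would only approximate its input before fine-tuning.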

Showing 1–20 of 20 results for author: Ingold, R