-
An Iterative Fixpoint Semantics for MKNF Hybrid Knowledge Bases with Function Symbols
Authors:
Marco Alberti,
Riccardo Zese,
Fabrizio Riguzzi,
Evelina Lamma
Abstract:
Hybrid Knowledge Bases based on Lifschitz's logic of Minimal Knowledge with Negation as Failure are a successful approach to combine the expressivity of Description Logics and Logic Programming in a single language. Their syntax, defined by Motik and Rosati, disallows function symbols. In order to define a well-founded semantics for MKNF HKBs, Knorr et al. define a partition of the modal atoms occ…
▽ More
Hybrid Knowledge Bases based on Lifschitz's logic of Minimal Knowledge with Negation as Failure are a successful approach to combine the expressivity of Description Logics and Logic Programming in a single language. Their syntax, defined by Motik and Rosati, disallows function symbols. In order to define a well-founded semantics for MKNF HKBs, Knorr et al. define a partition of the modal atoms occurring in it, called the alternating fixpoint partition. In this paper, we propose an iterated fixpoint semantics for HKBs with function symbols. We prove that our semantics extends Knorr et al.'s, in that, for a function-free HKBs, it coincides with its alternating fixpoint partition. The proposed semantics lends itself well to a probabilistic extension with a distribution semantic approach, which is the subject of future work.
△ Less
Submitted 5 August, 2022;
originally announced August 2022.
-
CAISAR: A platform for Characterizing Artificial Intelligence Safety and Robustness
Authors:
Julien Girard-Satabin,
Michele Alberti,
François Bobot,
Zakaria Chihani,
Augustin Lemesle
Abstract:
We present CAISAR, an open-source platform under active development for the characterization of AI systems' robustness and safety. CAISAR provides a unified entry point for defining verification problems by using WhyML, the mature and expressive language of the Why3 verification platform. Moreover, CAISAR orchestrates and composes state-of-the-art machine learning verification tools which, individ…
▽ More
We present CAISAR, an open-source platform under active development for the characterization of AI systems' robustness and safety. CAISAR provides a unified entry point for defining verification problems by using WhyML, the mature and expressive language of the Why3 verification platform. Moreover, CAISAR orchestrates and composes state-of-the-art machine learning verification tools which, individually, are not able to efficiently handle all problems but, collectively, can cover a growing number of properties. Our aim is to assist, on the one hand, the V\&V process by reducing the burden of choosing the methodology tailored to a given verification problem, and on the other hand the tools developers by factorizing useful features-visualization, report generation, property description-in one platform. CAISAR will soon be available at https://git.frama-c.com/pub/caisar.
△ Less
Submitted 21 June, 2022; v1 submitted 7 June, 2022;
originally announced June 2022.
-
A Tutorial on Trusted and Untrusted non-3GPP Accesses in 5G Systems -- First Steps Towards a Unified Communications Infrastructure
Authors:
Mario Teixeira Lemes,
Antonio Marcos Alberti,
Cristiano Bonato Both,
Antonio C. de Oliveira Jr.,
Kleber Vieira Cardoso
Abstract:
Fifth-generation (5G) systems are designed to enable convergent access-agnostic service availability. This means that 5G services will be available over 5G New Radio air interface and also through other non-Third Generation Partnership Project (3GPP) access networks, e.g., IEEE 802.11 (Wi-Fi). 3GPP has recently published the Release 16 that includes trusted non-3GPP access network concept and wire…
▽ More
Fifth-generation (5G) systems are designed to enable convergent access-agnostic service availability. This means that 5G services will be available over 5G New Radio air interface and also through other non-Third Generation Partnership Project (3GPP) access networks, e.g., IEEE 802.11 (Wi-Fi). 3GPP has recently published the Release 16 that includes trusted non-3GPP access network concept and wireless wireline convergence. The main goal of this tutorial is to present an overview of access to 5G core via non-3GPP access networks specified by 3GPP until Release 16 (i.e., untrusted, trusted, and wireline access). The tutorial describes aspects of the convergence of a 5G system and these non-3GPP access networks, such as the authentication and authorization procedures and the data session establishment from the point of view of protocol stack and exchanged messages between the network functions. In order to illustrate several concepts and part of 3GPP specification, we present a basic but fully operational implementation of untrusted non-3GPP access using WLAN. We perform experiments that demonstrate how a Wi-Fi user is authorized in a 5G core and establishes user plane connectivity to a data network. Moreover, we evaluate the performance of this access in terms of time consumed, number of messages, and protocol overhead to established data sessions.
△ Less
Submitted 11 November, 2022; v1 submitted 18 September, 2021;
originally announced September 2021.
-
Generating Synthetic Handwritten Historical Documents With OCR Constrained GANs
Authors:
Lars Vögtlin,
Manuel Drazyk,
Vinaychandran Pondenkandath,
Michele Alberti,
Rolf Ingold
Abstract:
We present a framework to generate synthetic historical documents with precise ground truth using nothing more than a collection of unlabeled historical images. Obtaining large labeled datasets is often the limiting factor to effectively use supervised deep learning methods for Document Image Analysis (DIA). Prior approaches towards synthetic data generation either require expertise or result in p…
▽ More
We present a framework to generate synthetic historical documents with precise ground truth using nothing more than a collection of unlabeled historical images. Obtaining large labeled datasets is often the limiting factor to effectively use supervised deep learning methods for Document Image Analysis (DIA). Prior approaches towards synthetic data generation either require expertise or result in poor accuracy in the synthetic documents. To achieve high precision transformations without requiring expertise, we tackle the problem in two steps. First, we create template documents with user-specified content and structure. Second, we transfer the style of a collection of unlabeled historical images to these template documents while preserving their text and layout. We evaluate the use of our synthetic historical documents in a pre-training setting and find that we outperform the baselines (randomly initialized and pre-trained). Additionally, with visual examples, we demonstrate a high-quality synthesis that makes it possible to generate large labeled historical document datasets with precise ground truth.
△ Less
Submitted 16 May, 2021; v1 submitted 15 March, 2021;
originally announced March 2021.
-
Corneal Pachymetry by AS-OCT after Descemet's Membrane Endothelial Keratoplasty
Authors:
Friso G. Heslinga,
Ruben T. Lucassen,
Myrthe A. van den Berg,
Luuk van der Hoek,
Josien P. W. Pluim,
Javier Cabrerizo,
Mark Alberti,
Mitko Veta
Abstract:
Corneal thickness (pachymetry) maps can be used to monitor restoration of corneal endothelial function, for example after Descemet's membrane endothelial keratoplasty (DMEK). Automated delineation of the corneal interfaces in anterior segment optical coherence tomography (AS-OCT) can be challenging for corneas that are irregularly shaped due to pathology, or as a consequence of surgery, leading to…
▽ More
Corneal thickness (pachymetry) maps can be used to monitor restoration of corneal endothelial function, for example after Descemet's membrane endothelial keratoplasty (DMEK). Automated delineation of the corneal interfaces in anterior segment optical coherence tomography (AS-OCT) can be challenging for corneas that are irregularly shaped due to pathology, or as a consequence of surgery, leading to incorrect thickness measurements. In this research, deep learning is used to automatically delineate the corneal interfaces and measure corneal thickness with high accuracy in post-DMEK AS-OCT B-scans. Three different deep learning strategies were developed based on 960 B-scans from 50 patients. On an independent test set of 320 B-scans, corneal thickness could be measured with an error of 13.98 to 15.50 micrometer for the central 9 mm range, which is less than 3% of the average corneal thickness. The accurate thickness measurements were used to construct detailed pachymetry maps. Moreover, follow-up scans could be registered based on anatomical landmarks to obtain differential pachymetry maps. These maps may enable a more comprehensive understanding of the restoration of the endothelial function after DMEK, where thickness often varies throughout different regions of the cornea, and subsequently contribute to a standardized postoperative regime.
△ Less
Submitted 6 April, 2021; v1 submitted 15 February, 2021;
originally announced February 2021.
-
MAP Inference for Probabilistic Logic Programming
Authors:
Elena Bellodi,
Marco Alberti,
Fabrizio Riguzzi,
Riccardo Zese
Abstract:
In Probabilistic Logic Programming (PLP) the most commonly studied inference task is to compute the marginal probability of a query given a program. In this paper, we consider two other important tasks in the PLP setting: the Maximum-A-Posteriori (MAP) inference task, which determines the most likely values for a subset of the random variables given evidence on other variables, and the Most Probab…
▽ More
In Probabilistic Logic Programming (PLP) the most commonly studied inference task is to compute the marginal probability of a query given a program. In this paper, we consider two other important tasks in the PLP setting: the Maximum-A-Posteriori (MAP) inference task, which determines the most likely values for a subset of the random variables given evidence on other variables, and the Most Probable Explanation (MPE) task, the instance of MAP where the query variables are the complement of the evidence variables. We present a novel algorithm, included in the PITA reasoner, which tackles these tasks by representing each problem as a Binary Decision Diagram and applying a dynamic programming procedure on it. We compare our algorithm with the version of ProbLog that admits annotated disjunctions and can perform MAP and MPE inference. Experiments on several synthetic datasets show that PITA outperforms ProbLog in many cases.
△ Less
Submitted 1 September, 2020; v1 submitted 4 August, 2020;
originally announced August 2020.
-
Quantifying Graft Detachment after Descemet's Membrane Endothelial Keratoplasty with Deep Convolutional Neural Networks
Authors:
Friso G. Heslinga,
Mark Alberti,
Josien P. W. Pluim,
Javier Cabrerizo,
Mitko Veta
Abstract:
Purpose: We developed a method to automatically locate and quantify graft detachment after Descemet's Membrane Endothelial Keratoplasty (DMEK) in Anterior Segment Optical Coherence Tomography (AS-OCT) scans. Methods: 1280 AS-OCT B-scans were annotated by a DMEK expert. Using the annotations, a deep learning pipeline was developed to localize scleral spur, center the AS-OCT B-scans and segment the…
▽ More
Purpose: We developed a method to automatically locate and quantify graft detachment after Descemet's Membrane Endothelial Keratoplasty (DMEK) in Anterior Segment Optical Coherence Tomography (AS-OCT) scans. Methods: 1280 AS-OCT B-scans were annotated by a DMEK expert. Using the annotations, a deep learning pipeline was developed to localize scleral spur, center the AS-OCT B-scans and segment the detached graft sections. Detachment segmentation model performance was evaluated per B-scan by comparing (1) length of detachment and (2) horizontal projection of the detached sections with the expert annotations. Horizontal projections were used to construct graft detachment maps. All final evaluations were done on a test set that was set apart during training of the models. A second DMEK expert annotated the test set to determine inter-rater performance. Results: Mean scleral spur localization error was 0.155 mm, whereas the inter-rater difference was 0.090 mm. The estimated graft detachment lengths were in 69% of the cases within a 10-pixel (~150μm) difference from the ground truth (77% for the second DMEK expert). Dice scores for the horizontal projections of all B-scans with detachments were 0.896 and 0.880 for our model and the second DMEK expert respectively. Conclusion: Our deep learning model can be used to automatically and instantly localize graft detachment in AS-OCT B-scans. Horizontal detachment projections can be determined with the same accuracy as a human DMEK expert, allowing for the construction of accurate graft detachment maps. Translational Relevance: Automated localization and quantification of graft detachment can support DMEK research and standardize clinical decision making.
△ Less
Submitted 24 April, 2020;
originally announced April 2020.
-
A Bayesian Approach to Conversational Recommendation Systems
Authors:
Francesca Mangili,
Denis Broggini,
Alessandro Antonucci,
Marco Alberti,
Lorenzo Cimasoni
Abstract:
We present a conversational recommendation system based on a Bayesian approach. A probability mass function over the items is updated after any interaction with the user, with information-theoretic criteria optimally sha** the interaction and deciding when the conversation should be terminated and the most probable item consequently recommended. Dedicated elicitation techniques for the prior pro…
▽ More
We present a conversational recommendation system based on a Bayesian approach. A probability mass function over the items is updated after any interaction with the user, with information-theoretic criteria optimally sha** the interaction and deciding when the conversation should be terminated and the most probable item consequently recommended. Dedicated elicitation techniques for the prior probabilities of the parameters modeling the interactions are derived from basic structural judgements. Such prior information can be combined with historical data to discriminate items with different recommendation histories. A case study based on the application of this approach to \emph{stagend.com}, an online platform for booking entertainers, is finally discussed together with an empirical analysis showing the advantages in terms of recommendation quality and efficiency.
△ Less
Submitted 12 February, 2020;
originally announced February 2020.
-
Trainable Spectrally Initializable Matrix Transformations in Convolutional Neural Networks
Authors:
Michele Alberti,
Angela Botros,
Narayan Schuez,
Rolf Ingold,
Marcus Liwicki,
Mathias Seuret
Abstract:
In this work, we investigate the application of trainable and spectrally initializable matrix transformations on the feature maps produced by convolution operations. While previous literature has already demonstrated the possibility of adding static spectral transformations as feature processors, our focus is on more general trainable transforms. We study the transforms in various architectural co…
▽ More
In this work, we investigate the application of trainable and spectrally initializable matrix transformations on the feature maps produced by convolution operations. While previous literature has already demonstrated the possibility of adding static spectral transformations as feature processors, our focus is on more general trainable transforms. We study the transforms in various architectural configurations on four datasets of different nature: from medical (ColorectalHist, HAM10000) and natural (Flowers, ImageNet) images to historical documents (CB55) and handwriting recognition (GPDS). With rigorous experiments that control for the number of parameters and randomness, we show that networks utilizing the introduced matrix transformations outperform vanilla neural networks. The observed accuracy increases by an average of 2.2 across all datasets. In addition, we show that the benefit of spectral initialization leads to significantly faster convergence, as opposed to randomly initialized matrix transformations. The transformations are implemented as auto-differentiable PyTorch modules that can be incorporated into any neural network architecture. The entire code base is open-source.
△ Less
Submitted 13 November, 2019; v1 submitted 12 November, 2019;
originally announced November 2019.
-
Labeling, Cutting, Grou**: an Efficient Text Line Segmentation Method for Medieval Manuscripts
Authors:
Michele Alberti,
Lars Vögtlin,
Vinaychandran Pondenkandath,
Mathias Seuret,
Rolf Ingold,
Marcus Liwicki
Abstract:
This paper introduces a new way for text-line extraction by integrating deep-learning based pre-classification and state-of-the-art segmentation methods. Text-line extraction in complex handwritten documents poses a significant challenge, even to the most modern computer vision algorithms. Historical manuscripts are a particularly hard class of documents as they present several forms of noise, suc…
▽ More
This paper introduces a new way for text-line extraction by integrating deep-learning based pre-classification and state-of-the-art segmentation methods. Text-line extraction in complex handwritten documents poses a significant challenge, even to the most modern computer vision algorithms. Historical manuscripts are a particularly hard class of documents as they present several forms of noise, such as degradation, bleed-through, interlinear glosses, and elaborated scripts. In this work, we propose a novel method which uses semantic segmentation at pixel level as intermediate task, followed by a text-line extraction step. We measured the performance of our method on a recent dataset of challenging medieval manuscripts and surpassed state-of-the-art results by reducing the error by 80.7%. Furthermore, we demonstrate the effectiveness of our approach on various other datasets written in different scripts. Hence, our contribution is two-fold. First, we demonstrate that semantic pixel segmentation can be used as strong denoising pre-processing step before performing text line extraction. Second, we introduce a novel, simple and robust algorithm that leverages the high-quality semantic segmentation to achieve a text-line extraction performance of 99.42% line IU on a challenging dataset.
△ Less
Submitted 1 July, 2019; v1 submitted 11 June, 2019;
originally announced June 2019.
-
Improving Reproducible Deep Learning Workflows with DeepDIVA
Authors:
Michele Alberti,
Vinaychandran Pondenkandath,
Lars Vögtlin,
Marcel Würsch,
Rolf Ingold,
Marcus Liwicki
Abstract:
The field of deep learning is experiencing a trend towards producing reproducible research. Nevertheless, it is still often a frustrating experience to reproduce scientific results. This is especially true in the machine learning community, where it is considered acceptable to have black boxes in your experiments. We present DeepDIVA, a framework designed to facilitate easy experimentation and the…
▽ More
The field of deep learning is experiencing a trend towards producing reproducible research. Nevertheless, it is still often a frustrating experience to reproduce scientific results. This is especially true in the machine learning community, where it is considered acceptable to have black boxes in your experiments. We present DeepDIVA, a framework designed to facilitate easy experimentation and their reproduction. This framework allows researchers to share their experiments with others, while providing functionality that allows for easy experimentation, such as: boilerplate code, experiment management, hyper-parameter optimization, verification of data integrity and visualization of data and results. Additionally, the code of DeepDIVA is well-documented and supported by several tutorials that allow a new user to quickly familiarize themselves with the framework.
△ Less
Submitted 11 June, 2019;
originally announced June 2019.
-
Survey of Artificial Intelligence for Card Games and Its Application to the Swiss Game Jass
Authors:
Joel Niklaus,
Michele Alberti,
Vinaychandran Pondenkandath,
Rolf Ingold,
Marcus Liwicki
Abstract:
In the last decades we have witnessed the success of applications of Artificial Intelligence to playing games. In this work we address the challenging field of games with hidden information and card games in particular. Jass is a very popular card game in Switzerland and is closely connected with Swiss culture. To the best of our knowledge, performances of Artificial Intelligence agents in the gam…
▽ More
In the last decades we have witnessed the success of applications of Artificial Intelligence to playing games. In this work we address the challenging field of games with hidden information and card games in particular. Jass is a very popular card game in Switzerland and is closely connected with Swiss culture. To the best of our knowledge, performances of Artificial Intelligence agents in the game of Jass do not outperform top players yet. Our contribution to the community is two-fold. First, we provide an overview of the current state-of-the-art of Artificial Intelligence methods for card games in general. Second, we discuss their application to the use-case of the Swiss card game Jass. This paper aims to be an entry point for both seasoned researchers and new practitioners who want to join in the Jass challenge.
△ Less
Submitted 11 June, 2019;
originally announced June 2019.
-
A Comprehensive Study of ImageNet Pre-Training for Historical Document Image Analysis
Authors:
Linda Studer,
Michele Alberti,
Vinaychandran Pondenkandath,
Pinar Goktepe,
Thomas Kolonko,
Andreas Fischer,
Marcus Liwicki,
Rolf Ingold
Abstract:
Automatic analysis of scanned historical documents comprises a wide range of image analysis tasks, which are often challenging for machine learning due to a lack of human-annotated learning samples. With the advent of deep neural networks, a promising way to cope with the lack of training data is to pre-train models on images from a different domain and then fine-tune them on historical documents.…
▽ More
Automatic analysis of scanned historical documents comprises a wide range of image analysis tasks, which are often challenging for machine learning due to a lack of human-annotated learning samples. With the advent of deep neural networks, a promising way to cope with the lack of training data is to pre-train models on images from a different domain and then fine-tune them on historical documents. In the current research, a typical example of such cross-domain transfer learning is the use of neural networks that have been pre-trained on the ImageNet database for object recognition. It remains a mostly open question whether or not this pre-training helps to analyse historical documents, which have fundamentally different image properties when compared with ImageNet. In this paper, we present a comprehensive empirical survey on the effect of ImageNet pre-training for diverse historical document analysis tasks, including character recognition, style classification, manuscript dating, semantic segmentation, and content-based retrieval. While we obtain mixed results for semantic segmentation at pixel-level, we observe a clear trend across different network architectures that ImageNet pre-training has a positive effect on classification as well as content-based retrieval.
△ Less
Submitted 22 May, 2019;
originally announced May 2019.
-
Leveraging Random Label Memorization for Unsupervised Pre-Training
Authors:
Vinaychandran Pondenkandath,
Michele Alberti,
Sammer Puran,
Rolf Ingold,
Marcus Liwicki
Abstract:
We present a novel approach to leverage large unlabeled datasets by pre-training state-of-the-art deep neural networks on randomly-labeled datasets. Specifically, we train the neural networks to memorize arbitrary labels for all the samples in a dataset and use these pre-trained networks as a starting point for regular supervised learning. Our assumption is that the "memorization infrastructure" l…
▽ More
We present a novel approach to leverage large unlabeled datasets by pre-training state-of-the-art deep neural networks on randomly-labeled datasets. Specifically, we train the neural networks to memorize arbitrary labels for all the samples in a dataset and use these pre-trained networks as a starting point for regular supervised learning. Our assumption is that the "memorization infrastructure" learned by the network during the random-label training proves to be beneficial for the conventional supervised learning as well. We test the effectiveness of our pre-training on several video action recognition datasets (HMDB51, UCF101, Kinetics) by comparing the results of the same network with and without the random label pre-training. Our approach yields an improvement - ranging from 1.5% on UCF-101 to 5% on Kinetics - in classification accuracy, which calls for further research in this direction.
△ Less
Submitted 5 November, 2018;
originally announced November 2018.
-
Offline Signature Verification by Combining Graph Edit Distance and Triplet Networks
Authors:
Paul Maergner,
Vinaychandran Pondenkandath,
Michele Alberti,
Marcus Liwicki,
Kaspar Riesen,
Rolf Ingold,
Andreas Fischer
Abstract:
Biometric authentication by means of handwritten signatures is a challenging pattern recognition task, which aims to infer a writer model from only a handful of genuine signatures. In order to make it more difficult for a forger to attack the verification system, a promising strategy is to combine different writer models. In this work, we propose to complement a recent structural approach to offli…
▽ More
Biometric authentication by means of handwritten signatures is a challenging pattern recognition task, which aims to infer a writer model from only a handful of genuine signatures. In order to make it more difficult for a forger to attack the verification system, a promising strategy is to combine different writer models. In this work, we propose to complement a recent structural approach to offline signature verification based on graph edit distance with a statistical approach based on metric learning with deep neural networks. On the MCYT and GPDS benchmark datasets, we demonstrate that combining the structural and statistical models leads to significant improvements in performance, profiting from their complementary properties.
△ Less
Submitted 17 October, 2018;
originally announced October 2018.
-
Are You Tampering With My Data?
Authors:
Michele Alberti,
Vinaychandran Pondenkandath,
Marcel Würsch,
Manuel Bouillon,
Mathias Seuret,
Rolf Ingold,
Marcus Liwicki
Abstract:
We propose a novel approach towards adversarial attacks on neural networks (NN), focusing on tampering the data used for training instead of generating attacks on trained models. Our network-agnostic method creates a backdoor during training which can be exploited at test time to force a neural network to exhibit abnormal behaviour. We demonstrate on two widely used datasets (CIFAR-10 and SVHN) th…
▽ More
We propose a novel approach towards adversarial attacks on neural networks (NN), focusing on tampering the data used for training instead of generating attacks on trained models. Our network-agnostic method creates a backdoor during training which can be exploited at test time to force a neural network to exhibit abnormal behaviour. We demonstrate on two widely used datasets (CIFAR-10 and SVHN) that a universal modification of just one pixel per image for all the images of a class in the training set is enough to corrupt the training procedure of several state-of-the-art deep neural networks causing the networks to misclassify any images to which the modification is applied. Our aim is to bring to the attention of the machine learning community, the possibility that even learning-based methods that are personally trained on public datasets can be subject to attacks by a skillful adversary.
△ Less
Submitted 21 August, 2018;
originally announced August 2018.
-
DeepDIVA: A Highly-Functional Python Framework for Reproducible Experiments
Authors:
Michele Alberti,
Vinaychandran Pondenkandath,
Marcel Würsch,
Rolf Ingold,
Marcus Liwicki
Abstract:
We introduce DeepDIVA: an infrastructure designed to enable quick and intuitive setup of reproducible experiments with a large range of useful analysis functionality. Reproducing scientific results can be a frustrating experience, not only in document image analysis but in machine learning in general. Using DeepDIVA a researcher can either reproduce a given experiment with a very limited amount of…
▽ More
We introduce DeepDIVA: an infrastructure designed to enable quick and intuitive setup of reproducible experiments with a large range of useful analysis functionality. Reproducing scientific results can be a frustrating experience, not only in document image analysis but in machine learning in general. Using DeepDIVA a researcher can either reproduce a given experiment with a very limited amount of information or share their own experiments with others. Moreover, the framework offers a large range of functions, such as boilerplate code, kee** track of experiments, hyper-parameter optimization, and visualization of data and results. To demonstrate the effectiveness of this framework, this paper presents case studies in the area of handwritten document analysis where researchers benefit from the integrated functionality. DeepDIVA is implemented in Python and uses the deep learning framework PyTorch. It is completely open source, and accessible as Web Service through DIVAServices.
△ Less
Submitted 23 April, 2018;
originally announced May 2018.
-
Identifying Cross-Depicted Historical Motifs
Authors:
Vinaychandran Pondenkandath,
Michele Alberti,
Nicole Eichenberger,
Rolf Ingold,
Marcus Liwicki
Abstract:
Cross-depiction is the problem of identifying the same object even when it is depicted in a variety of manners. This is a common problem in handwritten historical documents image analysis, for instance when the same letter or motif is depicted in several different ways. It is a simple task for humans yet conventional heuristic computer vision methods struggle to cope with it. In this paper we addr…
▽ More
Cross-depiction is the problem of identifying the same object even when it is depicted in a variety of manners. This is a common problem in handwritten historical documents image analysis, for instance when the same letter or motif is depicted in several different ways. It is a simple task for humans yet conventional heuristic computer vision methods struggle to cope with it. In this paper we address this problem using state-of-the-art deep learning techniques on a dataset of historical watermarks containing images created with different methods of reproduction, such as hand tracing, rubbing, and radiography. To study the robustness of deep learning based approaches to the cross-depiction problem, we measure their performance on two different tasks: classification and similarity rankings. For the former we achieve a classification accuracy of 96% using deep convolutional neural networks. For the latter we have a false positive rate at 95% true positive rate of 0.11. These results outperform state-of-the-art methods by a significant margin.
△ Less
Submitted 6 December, 2018; v1 submitted 5 April, 2018;
originally announced April 2018.
-
Open Evaluation Tool for Layout Analysis of Document Images
Authors:
Michele Alberti,
Manuel Bouillon,
Rolf Ingold,
Marcus Liwicki
Abstract:
This paper presents an open tool for standardizing the evaluation process of the layout analysis task of document images at pixel level. We introduce a new evaluation tool that is both available as a standalone Java application and as a RESTful web service. This evaluation tool is free and open-source in order to be a common tool that anyone can use and contribute to. It aims at providing as many…
▽ More
This paper presents an open tool for standardizing the evaluation process of the layout analysis task of document images at pixel level. We introduce a new evaluation tool that is both available as a standalone Java application and as a RESTful web service. This evaluation tool is free and open-source in order to be a common tool that anyone can use and contribute to. It aims at providing as many metrics as possible to investigate layout analysis predictions, and also provide an easy way of visualizing the results. This tool evaluates document segmentation at pixel level, and support multi-labeled pixel ground truth. Finally, this tool has been successfully used for the ICDAR2017 competition on Layout Analysis for Challenging Medieval Manuscripts.
△ Less
Submitted 23 November, 2017;
originally announced December 2017.
-
A Pitfall of Unsupervised Pre-Training
Authors:
Michele Alberti,
Mathias Seuret,
Rolf Ingold,
Marcus Liwicki
Abstract:
The point of this paper is to question typical assumptions in deep learning and suggest alternatives. A particular contribution is to prove that even if a Stacked Convolutional Auto-Encoder is good at reconstructing pictures, it is not necessarily good at discriminating their classes. When using Auto-Encoders, intuitively one assumes that features which are good for reconstruction will also lead t…
▽ More
The point of this paper is to question typical assumptions in deep learning and suggest alternatives. A particular contribution is to prove that even if a Stacked Convolutional Auto-Encoder is good at reconstructing pictures, it is not necessarily good at discriminating their classes. When using Auto-Encoders, intuitively one assumes that features which are good for reconstruction will also lead to high classification accuracy. Indeed, it became research practice and is a suggested strategy by introductory books. However, we prove that this is not always the case. We thoroughly investigate the quality of features produced by Stacked Convolutional Auto-Encoders when trained to reconstruct their input. In particular, we analyze the relation between the reconstruction and classification capabilities of the network, if we were to use the same features for both tasks. Experimental results suggest that in fact, there is no correlation between the reconstruction score and the quality of features for a classification task. This means, more formally, that the sub-dimension representation space learned from the Stacked Convolutional Auto-Encoder (while being trained for input reconstruction) is not necessarily better separable than the initial input space. Furthermore, we show that the reconstruction error is not a good metric to assess the quality of features, because it is biased by the decoder quality. We do not question the usefulness of pre-training, but we conclude that aiming for the lowest reconstruction error is not necessarily a good idea if afterwards one performs a classification task.
△ Less
Submitted 17 December, 2017; v1 submitted 23 November, 2017;
originally announced December 2017.
-
Historical Document Image Segmentation with LDA-Initialized Deep Neural Networks
Authors:
Michele Alberti,
Mathias Seuret,
Vinaychandran Pondenkandath,
Rolf Ingold,
Marcus Liwicki
Abstract:
In this paper, we present a novel approach to perform deep neural networks layer-wise weight initialization using Linear Discriminant Analysis (LDA). Typically, the weights of a deep neural network are initialized with: random values, greedy layer-wise pre-training (usually as Deep Belief Network or as auto-encoder) or by re-using the layers from another network (transfer learning). Hence, many tr…
▽ More
In this paper, we present a novel approach to perform deep neural networks layer-wise weight initialization using Linear Discriminant Analysis (LDA). Typically, the weights of a deep neural network are initialized with: random values, greedy layer-wise pre-training (usually as Deep Belief Network or as auto-encoder) or by re-using the layers from another network (transfer learning). Hence, many training epochs are needed before meaningful weights are learned, or a rather similar dataset is required for seeding a fine-tuning of transfer learning. In this paper, we describe how to turn an LDA into either a neural layer or a classification layer. We analyze the initialization technique on historical documents. First, we show that an LDA-based initialization is quick and leads to a very stable initialization. Furthermore, for the task of layout analysis at pixel level, we investigate the effectiveness of LDA-based initialization and show that it outperforms state-of-the-art random weight initialization methods.
△ Less
Submitted 19 October, 2017;
originally announced October 2017.
-
Context Generation from Formal Specifications for C Analysis Tools
Authors:
Michele Alberti,
Julien Signoles
Abstract:
Analysis tools like abstract interpreters, symbolic execution tools and testing tools usually require a proper context to give useful results when analyzing a particular function. Such a context initializes the function parameters and global variables to comply with function requirements. However it may be error-prone to write it by hand: the handwritten context might contain bugs or not match the…
▽ More
Analysis tools like abstract interpreters, symbolic execution tools and testing tools usually require a proper context to give useful results when analyzing a particular function. Such a context initializes the function parameters and global variables to comply with function requirements. However it may be error-prone to write it by hand: the handwritten context might contain bugs or not match the intended specification. A more robust approach is to specify the context in a dedicated specification language, and hold the analysis tools to support it properly. This may mean to put significant development efforts for enhancing the tools, something that is often not feasible if ever possible.
This paper presents a way to systematically generate such a context from a formal specification of a C function. This is applied to a subset of the ACSL specification language in order to generate suitable contexts for the abstract interpretation-based value analysis plug-ins of Frama-C, a framework for analysis of code written in C. The idea here presented has been implemented in a new Frama-C plug-in which is currently in use in an operational industrial setting.
△ Less
Submitted 5 September, 2017;
originally announced September 2017.
-
A Pitfall of Unsupervised Pre-Training
Authors:
Michele Alberti,
Mathias Seuret,
Rolf Ingold,
Marcus Liwicki
Abstract:
The point of this paper is to question typical assumptions in deep learning and suggest alternatives. A particular contribution is to prove that even if a Stacked Convolutional Auto-Encoder is good at reconstructing pictures, it is not necessarily good at discriminating their classes. When using Auto-Encoders, intuitively one assumes that features which are good for reconstruction will also lead t…
▽ More
The point of this paper is to question typical assumptions in deep learning and suggest alternatives. A particular contribution is to prove that even if a Stacked Convolutional Auto-Encoder is good at reconstructing pictures, it is not necessarily good at discriminating their classes. When using Auto-Encoders, intuitively one assumes that features which are good for reconstruction will also lead to high classification accuracy. Indeed, it became research practice and is a suggested strategy by introductory books. However, we prove that this is not always the case. We thoroughly investigate the quality of features produced by Stacked Convolutional Auto-Encoders when trained to reconstruct their input. In particular, we analyze the relation between the reconstruction and classification capabilities of the network, if we were to use the same features for both tasks. Experimental results suggest that in fact, there is no correlation between the reconstruction score and the quality of features for a classification task. This means, more formally, that the sub-dimension representation space learned from the Stacked Convolutional Auto-Encoder (while being trained for input reconstruction) is not necessarily better separable than the initial input space. Furthermore, we show that the reconstruction error is not a good metric to assess the quality of features, because it is biased by the decoder quality. We do not question the usefulness of pre-training, but we conclude that aiming for the lowest reconstruction error is not necessarily a good idea if afterwards one performs a classification task.
△ Less
Submitted 17 December, 2017; v1 submitted 13 March, 2017;
originally announced March 2017.
-
PCA-Initialized Deep Neural Networks Applied To Document Image Analysis
Authors:
Mathias Seuret,
Michele Alberti,
Rolf Ingold,
Marcus Liwicki
Abstract:
In this paper, we present a novel approach for initializing deep neural networks, i.e., by turning PCA into neural layers. Usually, the initialization of the weights of a deep neural network is done in one of the three following ways: 1) with random values, 2) layer-wise, usually as Deep Belief Network or as auto-encoder, and 3) re-use of layers from another network (transfer learning). Therefore,…
▽ More
In this paper, we present a novel approach for initializing deep neural networks, i.e., by turning PCA into neural layers. Usually, the initialization of the weights of a deep neural network is done in one of the three following ways: 1) with random values, 2) layer-wise, usually as Deep Belief Network or as auto-encoder, and 3) re-use of layers from another network (transfer learning). Therefore, typically, many training epochs are needed before meaningful weights are learned, or a rather similar dataset is required for seeding a fine-tuning of transfer learning. In this paper, we describe how to turn a PCA into an auto-encoder, by generating an encoder layer of the PCA parameters and furthermore adding a decoding layer. We analyze the initialization technique on real documents. First, we show that a PCA-based initialization is quick and leads to a very stable initialization. Furthermore, for the task of layout analysis we investigate the effectiveness of PCA-based initialization and show that it outperforms state-of-the-art random weight initialization methods.
△ Less
Submitted 1 February, 2017;
originally announced February 2017.
-
On Coinductive Equivalences for Higher-Order Probabilistic Functional Programs (Long Version)
Authors:
Ugo Dal Lago,
Davide Sangiorgi,
Michele Alberti
Abstract:
We study bisimulation and context equivalence in a probabilistic $λ$-calculus. The contributions of this paper are threefold. Firstly we show a technique for proving congruence of probabilistic applicative bisimilarity. While the technique follows Howe's method, some of the technicalities are quite different, relying on non-trivial "disentangling" properties for sets of real numbers. Secondly we s…
▽ More
We study bisimulation and context equivalence in a probabilistic $λ$-calculus. The contributions of this paper are threefold. Firstly we show a technique for proving congruence of probabilistic applicative bisimilarity. While the technique follows Howe's method, some of the technicalities are quite different, relying on non-trivial "disentangling" properties for sets of real numbers. Secondly we show that, while bisimilarity is in general strictly finer than context equivalence, coincidence between the two relations is attained on pure $λ$-terms. The resulting equality is that induced by Levy-Longo trees, generally accepted as the finest extensional equivalence on pure $λ$-terms under a lazy regime. Finally, we derive a coinductive characterisation of context equivalence on the whole probabilistic language, via an extension in which terms akin to distributions may appear in redex position. Another motivation for the extension is that its operational semantics allows us to experiment with a different congruence technique, namely that of logical bisimilarity.
△ Less
Submitted 7 November, 2013;
originally announced November 2013.
-
A CHR-based Implementation of Known Arc-Consistency
Authors:
Marco Alberti,
Marco Gavanelli,
Evelina Lamma,
Paola Mello,
Michela Milano
Abstract:
In classical CLP(FD) systems, domains of variables are completely known at the beginning of the constraint propagation process. However, in systems interacting with an external environment, acquiring the whole domains of variables before the beginning of constraint propagation may cause waste of computation time, or even obsolescence of the acquired data at the time of use.
For such cases, the…
▽ More
In classical CLP(FD) systems, domains of variables are completely known at the beginning of the constraint propagation process. However, in systems interacting with an external environment, acquiring the whole domains of variables before the beginning of constraint propagation may cause waste of computation time, or even obsolescence of the acquired data at the time of use.
For such cases, the Interactive Constraint Satisfaction Problem (ICSP) model has been proposed as an extension of the CSP model, to make it possible to start constraint propagation even when domains are not fully known, performing acquisition of domain elements only when necessary, and without the need for restarting the propagation after every acquisition.
In this paper, we show how a solver for the two sorted CLP language, defined in previous work, to express ICSPs, has been implemented in the Constraint Handling Rules (CHR) language, a declarative language particularly suitable for high level implementation of constraint solvers.
△ Less
Submitted 24 August, 2004;
originally announced August 2004.